Cloning of TALE proteins: TALE vectors were assembled by a combination of REAL assembly and REAL-Fast assembly. The REAL and REAL-Fast plasmid sets do not include plasmids encoding non-canonical RVDs, so these were generated by site-directed mutagenesis of the canonical RVD plasmids. Plasmid vectors encoding one, two, or four TALE repeats (within the pUC57-ΔBsaI backbone) were ligated in a serial, hierarchical progression to assemble full TALE repeat arrays bearing the sequence of repeat variable diresidues (RVDs) required to target the DNA sequence of interest. Generally, this assembly involved restriction digestion of each N-terminal TALE repeat vector with BsaI and BamHI, followed by digestion of its neighboring C-terminal TALE repeat vector with BsaI and BamHI, and finally ligation of these neighboring repeats with T4 DNA ligase. TALEN expression vectors were digested with SacII and BamHI to obtain the DNA-binding domain comprising the Δ152 N-terminal domain, the RVD repeats, and the +63 C-terminal domain. This fragment was ligated into a modified pDONR221 vector (Invitrogen), with SacII and BamHI restriction sites internal to the attL recombination sites, to create Gateway-compatible TALE Entry clones. The TALE constructs were then transferred by Gateway recombinational cloning (LR reaction) into the pDEST15 expression vector (Invitrogen), which adds an N-terminal glutathione S-transferase (GST) tag. All clones were sequence-verified over their full length (Supplementary Data).
Custom PBM design: Target sites for each TALE protein were determined using the canonical TALE code (NI: A, HD: C, NN: G, NG: T) and are preceded by a 5′ T to create the full target site. The constant flanking regions were the same as those used in a prior custom PBM design and do not contain binding sites for any of the TALE proteins in this study.
Probe set descriptions, including the array design versions on which they are included, are provided in the Supplementary Note. The Agilent AMAD ID for this custom array is 084120.
TALE protein expression: Proteins were expressed using the PURExpress In Vitro Transcription and Translation Kit (New England Biolabs). Protein concentrations were determined by anti-GST western blots against a dilution series of recombinant GST (Sigma). Proteins were stored at 4 °C until used in PBM assays. Storage at 4 °C between protein expression and PBM experiments typically lasted one day and never exceeded three days.
PBM experiments: Briefly, custom-designed microarrays were first double-stranded by an on-slide primer extension reaction. In the PBM assay, arrays were blocked with 2% milk in PBS for 1 h, washed with 0.1% Tween-20 in PBS and 0.01% TX-100 in PBS, then incubated with protein mixture (PBS, 2% milk, 0.2 mg ml⁻¹ BSA, and 0.3 µg ml⁻¹ salmon testes DNA) for 1 h. The final concentration of TALE protein in the PBM reactions was 200 nM, unless otherwise indicated (Supplementary Table 1). Arrays were washed with 0.5% Tween-20 in PBS and 0.01% TX-100 in PBS.
Lastly, arrays were incubated for 20 min with an Alexa488-conjugated anti-GST antibody (Invitrogen A-11131), then washed with 0.05% Tween in PBS and with PBS.
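The target-site rule stated under Custom PBM design (canonical RVD code plus a leading 5′ T) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' design script: the function name `target_site` and the choice to reject non-canonical RVDs are assumptions made for this example.

```python
# Canonical RVD-to-base code as stated in the text.
CANONICAL_RVD_CODE = {"NI": "A", "HD": "C", "NN": "G", "NG": "T"}


def target_site(rvds):
    """Return the predicted target site for a list of RVDs.

    The site is the base specified by each RVD, in order, preceded by
    the 5' T. Rejecting RVDs outside the canonical code is an assumption
    of this sketch; the text does not say how they were handled here.
    """
    bases = []
    for rvd in rvds:
        if rvd not in CANONICAL_RVD_CODE:
            raise ValueError(f"non-canonical RVD: {rvd}")
        bases.append(CANONICAL_RVD_CODE[rvd])
    return "T" + "".join(bases)


if __name__ == "__main__":
    # Hypothetical five-repeat array, for illustration only.
    print(target_site(["NI", "HD", "NN", "NG", "HD"]))
```

A real design would also attach the constant flanking regions mentioned above, which are not reproduced here.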
Hybridization protocol
NA
Scan protocol
PBM arrays were scanned using a GenePix 4400A Microarray Scanner (Molecular Devices), and scan images were analyzed by GenePix Pro (Molecular Devices).
Description
Notes from NEB: PURExpress® is a reconstituted protein synthesis system based on the PUREsystem™ (Shimizu et al., 2001), in which all components needed for in vitro transcription and translation are purified from E. coli.
Assay for protein-DNA sequence specificity
Data processing
PBM data quantification: Raw data files were processed using the same general approach as that used for universal PBMs. Briefly, masliner software was used to combine Alexa488 scans at three different laser power levels and to resolve the signal intensity in spots that are saturated at high laser power settings. Cy3 scans were performed at a single laser power level. If a data set had any negative background-subtracted intensity (BSI) values (which can occur if the region surrounding a spot is brighter than the spot itself), a pseudocount was added to all BSI values for that experiment so that all values became positive. The custom PBM design included ten replicate probes for each sequence. For each experiment and for each set of probes with identical sequences, we calculated the median adjusted BSI, the median absolute deviation (MAD), and a robust standard deviation estimated from the MAD. Any individual replicate probe with a normalized adjusted BSI value more than 3 s.d. from the median of its replicate probes was omitted from subsequent analysis, to avoid confounding statistical tests or an incorrect choice of parameter settings in model fitting. For each TALE protein, we defined a background set of probes comprising all probes on the array designed to represent binding sites for other TALE proteins (not the one being assayed in a given experiment). The array median level was then calculated as the median normalized adjusted BSI of all probes in the background set. The standard deviation of the background-set signal intensities (SIs) was calculated robustly using the asymptotic approximation σ = 1.4826 × MAD. The z-score for each probe was calculated relative to the median and standard deviation of its corresponding background probes. These z-scores are a linear transformation of the median SIs for each probe; they therefore facilitate interpretation but do not affect the PWM fitting procedure, which performs its own linear scaling adjustments.
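The robust statistics described above (pseudocount shift, MAD-based outlier removal among replicates, and z-scores against a background set) can be illustrated with the following minimal Python sketch. It is not the authors' pipeline: the function names, the pseudocount size (`offset`), and the handling of a zero MAD are assumptions made for illustration, and the masliner and normalization steps are not reproduced.

```python
from statistics import median

MAD_TO_SD = 1.4826  # asymptotic factor from the text: sigma = 1.4826 x MAD


def robust_sd(values):
    """Robust standard deviation estimated from the MAD."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return MAD_TO_SD * mad


def add_pseudocount(bsi, offset=1.0):
    """Shift all BSI values so they are positive, only if any is negative.

    The size of the shift (offset above the minimum) is an assumption;
    the text does not specify the pseudocount used.
    """
    lowest = min(bsi)
    if lowest < 0:
        return [v - lowest + offset for v in bsi]
    return list(bsi)


def filter_replicates(replicates, n_sd=3.0):
    """Drop replicates more than n_sd robust s.d. from the replicate median."""
    center = median(replicates)
    sd = robust_sd(replicates)
    if sd == 0:  # assumption: keep everything when the MAD is zero
        return list(replicates)
    return [v for v in replicates if abs(v - center) <= n_sd * sd]


def z_scores(probe_medians, background):
    """z-score of each probe median relative to the background set."""
    bg_median = median(background)
    bg_sd = robust_sd(background)
    return {name: (v - bg_median) / bg_sd for name, v in probe_medians.items()}


if __name__ == "__main__":
    # Made-up intensities for ten replicate probes, one of them aberrant.
    reps = [10, 11, 9, 10, 12, 10, 9, 11, 10, 100]
    kept = filter_replicates(reps)
    print(len(kept), median(kept))
```

Because the z-score is a linear function of the probe median, it rescales the data without changing the relative ordering of probes, consistent with the remark above about PWM fitting.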