Washington University School of Medicine Microarray Core
Manufacture protocol
The Brugia malayi Version 2 (V.2) array contains 18,153 oligonucleotides, 878 based on Wuchereria bancrofti ESTs, 1,016 based on Onchocerca volvulus gene indices, 804 from the Wolbachia complete genome, and the remaining 15,455 based on B. malayi ESTs, TIGR gene indices, TIGR gene models, and L3 ESTs.
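As a quick consistency check on the composition stated above, the four probe categories can be summed and compared with the reported array size. This is only an arithmetic sketch; the dictionary keys are shorthand labels chosen here, not identifiers from the platform files.

```python
# Probe counts per source, as stated in the manufacture protocol.
# The key names are informal labels introduced for this sketch.
probe_counts = {
    "W. bancrofti ESTs": 878,
    "O. volvulus gene indices": 1016,
    "Wolbachia genome": 804,
    "B. malayi ESTs, gene indices, gene models, L3 ESTs": 15455,
}

total = sum(probe_counts.values())
print(total)  # 18153, matching the stated array size

# Share of each category, useful when interpreting cross-species signal.
for source, n in probe_counts.items():
    print(f"{source}: {n} ({100 * n / total:.1f}%)")
```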
Oligonucleotide selection: We patterned our oligo selection algorithm after PICK70 (J. DeRisi, personal communication). The algorithm takes two input files: a FASTA-format file of all genes to be used for oligo selection and a FASTA-format file of the entire genome sequence. For the input gene sequences, we bias selection of oligos to the 3'-most 1000 bases of sequence (for genes greater than 1000 bp in length) by pre-masking all sequence upstream of this cutoff.
The initial portion of the oligo selection routine masks out any repetitive sequence elements and any portions of the input gene sequences that are highly similar to other sequences (e.g. to other genes). This masking is accomplished by BLAST comparison of genes to genome, using a stringent cut-off to mask out any sequence with 70% or greater similarity to any other sequence in the genome that extends over 66% of the desired oligo length.
Once the input gene sequences are masked, they are subjected to the oligo selection algorithm. The algorithm examines each sequence one at a time, starting from base number one and looking at the first contiguous 65 bases (or whatever the desired input oligo length is) with respect to the following parameters, in order:
1. Ensures that all of the bases being considered are either A, C, G or T (e.g. no N's or masked bases are contained in the putative oligo sequence). If any base does not conform, the algorithm shifts to base two of the sequence and reiterates the check.
2. If all bases conform, examines the input bases for whether they collectively fall in the desired G/C range of the oligo. If not, the sequence is rejected and the algorithm shifts to base two of the input sequence and reiterates step one, etc.
3. If all bases and the G/C range conform, the algorithm examines the input bases for whether they collectively fall in the desired Tm range of the oligo. If not, the sequence is rejected and the algorithm shifts to base two of the input sequence and reiterates step one. The Tm calculation currently used is a nearest-neighbor calculation.
For the next series of checks, the algorithm assigns "scores" to the input oligo rather than rejecting non-conforming sequence outright. The score assigned indicates the relative deviation of the oligo from the desired parameter, where a score of zero means that the oligo conforms to the parameter.
4. If the bases conform to the desired Tm range, the algorithm examines the oligo for percentage of G/C content across the oligo, using a sliding window. If any window has a G/C content that deviates from the average G/C content of all windows by more than a specified amount (the default is 10% of the overall oligo Tm), the oligo score is incrementally increased. This ensures uniformity of G/C content across the oligo.
5. Once the sequence G/C uniformity is scored, the algorithm examines the oligo for homopolymers longer than the cut-off length set by the user (we typically stipulate that the same base cannot occur more than 4 times in a row or the oligo score is increased from zero). The number of homopolymer runs in the oligo longer than the cut-off length and the total number of bases encompassed by these runs are also determined and stored.
6. Once the homopolymer content is assessed, the algorithm calculates secondary structure potential. This is done using a FASTA comparison of the sequence to itself. The optimal score of the FASTA alignment is used as the score reported for the oligo. An optimal FASTA score of zero indicates no potential for self-annealing, and the optimal score increases as more regions of potential self-alignment are found. Our examination of many such alignments indicates that oligos with a reverse FASTA score lower than 100 lack significant secondary structure.
7. If the oligo starting at base one of the input sequence passes through all these checks, the algorithm writes that oligo and its scores to a temporary database and starts over with the next input oligo (bases 2-66 in this example). The algorithm iterates over the entire input gene and then passes to the next gene.
8. In some stretches of input sequence, multiple successive oligos will result that are offset from one another by only one base but all conform to the desired oligo parameters. In these cases, the algorithm chooses the best-scoring sequence across all parameters as the oligo representing that region.
9. In some input genes, the algorithm may not find any oligos within the desired Tm range. In these cases, the algorithm lengthens or shortens the oligo according to the input length range and reiterates at each length to determine whether the desired Tm range can be met. If not, the algorithm selects the best oligo that falls outside the range, with the corresponding Tm appearing in the final output file.
10. The final output of the algorithm is a text document containing the optimal oligo for each gene examined, including the oligo names (gene name plus oligo start position as an extension) and sequences, followed by the output of all parameter scores for that sequence.
For genes that do not result in an oligo selection, we manually examine the input sequences and then relax certain algorithmic parameters in order to select an oligo (typically we relax the constraint on similarity in the initial masking step, up to 80% identity). In some instances, no oligo that uniquely represents a gene can be found, because gene family similarities at the nucleotide level leave no sufficient length of the gene sequence unmasked.
Oligos were synthesized from the consensus sequence of selected clusters (n=3569) by standard methods by Illumina (San Diego, CA).
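The sliding-window logic of the oligo selection routine described above can be sketched in a few lines of Python. This is a minimal illustration, not the Core's implementation: the function names, the 40-60% G/C range, the 20-base uniformity window, and the reading of the uniformity threshold as an absolute deviation of 0.10 in G/C fraction are all assumptions made here. The BLAST masking, the nearest-neighbor Tm check (step 3), and the FASTA self-alignment score (step 6) are deliberately left out.

```python
import re

def premask_upstream(seq, keep=1000, mask_char="N"):
    """Mask everything upstream of the 3'-most `keep` bases (pre-masking step)."""
    if len(seq) <= keep:
        return seq
    return mask_char * (len(seq) - keep) + seq[-keep:]

def gc_fraction(oligo):
    """Fraction of bases that are G or C."""
    return (oligo.count("G") + oligo.count("C")) / len(oligo)

def homopolymer_stats(oligo, max_run=4):
    """Step 5: return (number of runs longer than max_run, bases in those runs)."""
    runs = [m.group(0) for m in re.finditer(r"A+|C+|G+|T+", oligo)]
    long_runs = [r for r in runs if len(r) > max_run]
    return len(long_runs), sum(len(r) for r in long_runs)

def gc_uniformity_score(oligo, window=20, max_dev=0.10):
    """Step 4: count windows whose G/C fraction deviates from the mean of all
    windows by more than max_dev (window and max_dev are assumed values)."""
    fracs = [gc_fraction(oligo[i:i + window])
             for i in range(len(oligo) - window + 1)]
    if not fracs:
        return 0
    mean = sum(fracs) / len(fracs)
    return sum(1 for f in fracs if abs(f - mean) > max_dev)

def candidate_oligos(seq, length=65, gc_range=(0.40, 0.60)):
    """Slide one base at a time; yield (1-based start, oligo, scores) for each
    window that passes the hard filters of steps 1 and 2."""
    seq = premask_upstream(seq.upper())
    for start in range(len(seq) - length + 1):
        oligo = seq[start:start + length]
        if set(oligo) - set("ACGT"):          # step 1: only A, C, G, T allowed
            continue
        if not gc_range[0] <= gc_fraction(oligo) <= gc_range[1]:  # step 2
            continue
        n_runs, n_bases = homopolymer_stats(oligo)
        scores = {
            "gc_uniformity": gc_uniformity_score(oligo),
            "homopolymer_runs": n_runs,
            "homopolymer_bases": n_bases,
        }
        yield start + 1, oligo, scores

def best_oligo(seq, **kwargs):
    """Steps 7-8, simplified: keep the candidate with the lowest summed score."""
    candidates = list(candidate_oligos(seq, **kwargs))
    if not candidates:
        return None
    return min(candidates, key=lambda c: sum(c[2].values()))
```

Calling `best_oligo(gene_sequence)` returns `None` when every window is rejected, which corresponds to the case in which parameters would be relaxed manually. Summing the scores with equal weight is a simplification chosen for this sketch; the protocol does not state how the individual scores are combined.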
The oligonucleotides (50 nM in 3x SSC with 0.75 M betaine) were printed in duplicate on MWG Epoxy slides (MWG Biotech Inc, High Point, NC) by a locally constructed linear servo arrayer.
Submission date
Mar 19, 2010
Last update date
Apr 06, 2010
Contact name
Uta Strübing
Organization name
University Hospital Bonn
Department
Institute for Medical Microbiology, Immunology and Parasitology
ref|NP_499556.1| NitFhit family member (nft-1) [Caenorhabditis elegans] >sp|O76463|NFT1_CAEEL Nitrilase and fragile histidine triad fusion protein NitFhit [Includes: Bis(5'-adenosyl)-triphosphatase (Diadenosine 5',5'''-P1,P3-triphosphate hydrolase) (Dinucleosidetriphosphatase) (AP3A hydrolase) (AP3Aase); Nitrilase homolog ] >pdb|1EMS|A Chain A, Crystal Structure Of The C. Elegans Nitfhit Protein >pdb|1EMS|B Chain B, Crystal Structure Of The C. Elegans Nitfhit Protein >gb|AAC39136.1| nitrilase and fragile histidine triad fusion protein NitFhit [Caenorhabditis elegans] >emb|CAB60517.1| Hypothetical protein Y56A3A.13 [Caenorhabditis elegans]
ref|NP_492111.1| Deacetylase Complex Protein family member (dcp-66) [Caenorhabditis elegans] >emb|CAA96601.1| Hypothetical protein C26C6.5a [Caenorhabditis elegans] >emb|CAB07229.1| Hypothetical protein C26C6.5a [Caenorhabditis elegans]