Microarrays were designed using approximately 26,000 publicly available ESTs from M. californianus (ES7325872 - ES738966; ES387463 - ES408175; Gracey et al., 2008) and M. galloprovincialis (AJ623313 - AJ626468; AJ516092 - AJ516921; Venier et al., 2003). Both sets of ESTs were sequenced from libraries constructed by Suppression Subtractive Hybridization (SSH) (Diatchenko et al., 1996) to enrich for transcripts expressed in response to various stressors. The sequences for each species were separately screened for vector contamination, clustered, assembled, and loaded into a PostgreSQL database using the PartiGene bioinformatics pipeline (Parkinson et al., 2004), yielding 12,961 and 1,688 putative gene transcripts (clusters) for M. californianus and M. galloprovincialis, respectively. Clusters were annotated against the UniRef90 protein database using NCBI BLAST with an e-value cutoff of 10^-8. Clusters were also annotated with Gene Ontology (GO) terms and Enzyme Commission (EC) numbers using annot8r (Schmid and Blaxter, 2008), again with an e-value cutoff of 10^-8.
Long oligonucleotide (60-mer) probes were designed against the M. californianus clusters using the probe design program YODA (Nordberg, 2005) with default parameters and a maximum of 5 probes per cluster. This yielded 43,969 unique probes against 11,506 clusters, with a mean of 2.6 probes per cluster. To measure the effect of interspecific sequence variation on probe performance, these probes were compared with the M. galloprovincialis clusters using BLAST. For each M. californianus probe that aligned to the smaller M. galloprovincialis library, a counterpart probe was designed from the corresponding M. galloprovincialis sequence. This produced 556 matched pairs of M. californianus and M. galloprovincialis probes, which differed by a mean of 4.6 nucleotides per probe.
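The divergence figure for the matched probe pairs is a per-probe mismatch count averaged over all pairs. A minimal sketch of that calculation, assuming each pair is already aligned end to end as two ungapped sequences of equal length (a BLAST alignment with gaps would need extra handling); the function names and the toy sequences are hypothetical, not the actual probes:

```python
from statistics import mean


def count_mismatches(probe_a, probe_b):
    """Count positions at which two equal-length, ungapped probe sequences differ."""
    if len(probe_a) != len(probe_b):
        raise ValueError("probes must be the same length (e.g. both 60-mers)")
    # Compare case-insensitively, base by base.
    return sum(1 for a, b in zip(probe_a.upper(), probe_b.upper()) if a != b)


def mean_divergence(pairs):
    """Mean number of mismatched bases over (californianus, galloprovincialis) probe pairs."""
    return mean(count_mismatches(a, b) for a, b in pairs)


# Hypothetical toy pairs of 60-mers: one differs at 1 base, the other at 5.
pairs = [
    ("ACGT" * 15, "ACGA" + "ACGT" * 14),
    ("A" * 60, "A" * 55 + "C" * 5),
]
print(mean_divergence(pairs))
```

Applied to the 556 real pairs, the same averaging would correspond to the reported mean of 4.6 mismatches per probe.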
Altogether, the 44,524 unique probes were randomly duplicated or triplicated to fill a 105,000-feature microarray, which was synthesized in situ by Agilent Technologies, Inc. (Santa Clara, CA, USA).
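Because every probe appears either twice or three times, the number of triplicated probes follows directly from the feature count: with n unique probes and F features, F - 2n probes receive a third copy. A minimal sketch of such a randomized layout, assuming (for illustration only) that all 105,000 features hold experimental probes and ignoring the control features a real Agilent array also carries; the function names and seed are hypothetical:

```python
import random


def replicate_counts(n_unique, n_features):
    """Return (n_duplicated, n_triplicated) so that every probe appears 2 or 3 times
    and the total number of features equals n_features."""
    n_tri = n_features - 2 * n_unique
    if not 0 <= n_tri <= n_unique:
        raise ValueError("array cannot be filled using only duplicates and triplicates")
    return n_unique - n_tri, n_tri


def random_layout(probe_ids, n_features, seed=0):
    """Give each probe 2 or 3 copies at random, then shuffle copies across features."""
    rng = random.Random(seed)
    ids = list(probe_ids)
    _, n_tri = replicate_counts(len(ids), n_features)
    # Pick, uniformly at random, which probes get a third copy.
    tripled = set(rng.sample(range(len(ids)), n_tri))
    layout = []
    for i, pid in enumerate(ids):
        layout.extend([pid] * (3 if i in tripled else 2))
    # Randomize feature positions so replicates are scattered across the array.
    rng.shuffle(layout)
    return layout


print(replicate_counts(44524, 105000))  # (28572, 15952)
```

Under that assumption, 28,572 probes would be duplicated and 15,952 triplicated (28,572 × 2 + 15,952 × 3 = 105,000).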
Description
HMS/Somero-Mytilus-105K_Agilent-v1.0
To accurately compare the transcriptomes of Mytilus galloprovincialis and M. trossulus, we developed a common microarray format that could be used for both species. The probe sequences on this microarray were generated from an outgroup species, M. californianus. M. trossulus and M. galloprovincialis diverged from M. californianus approximately 30 million years ago, but from each other only about 3.5 million years ago (Seed, 1992). Because the two target species are therefore about equally divergent from the probe sequences, heterologous hybridization to this microarray allowed us to compare the transcriptional responses of M. galloprovincialis and M. trossulus without the sequence biases inherent in a microarray designed from the sequences of either species. A limited number of probes (556) designed from M. galloprovincialis ESTs that matched M. californianus ESTs were also included on the microarray to test for the effects of sequence mismatches. To further control for sequence bias, only probes that were present in all hybridizations of all 84 samples of both M. galloprovincialis and M. trossulus were used in our analyses. The Platform data table reflects a condensed representation of the array's replicate features. The full array layout representing all of the individual features is linked as a supplementary file at the foot of this record.
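The probe filter described above keeps only probes called present in every one of the 84 hybridizations. A minimal sketch, assuming each hybridization is available as hypothetical `(probe_id, signal, detected)` records, one per feature, where `detected` is a per-feature detection flag; the record does not state how presence was called or how replicate signals were summarized, so both the median summary and the rule "present only if every replicate is detected" are illustrative assumptions:

```python
from statistics import median


def condense_replicates(features):
    """Collapse replicate features of ONE hybridization to one record per probe.

    features: iterable of (probe_id, signal, detected) tuples.
    Returns {probe_id: (median_signal, detected_in_all_replicates)}.
    """
    grouped = {}
    for probe_id, signal, detected in features:
        grouped.setdefault(probe_id, []).append((signal, detected))
    return {
        pid: (median(s for s, _ in reps), all(d for _, d in reps))
        for pid, reps in grouped.items()
    }


def present_in_all(samples):
    """Return the set of probe IDs called present in every hybridization.

    samples: {sample_name: [(probe_id, signal, detected), ...]}
    """
    condensed = [condense_replicates(features) for features in samples.values()]
    if not condensed:
        return set()
    keep = set(condensed[0])
    for table in condensed:
        # Intersect with the probes called present in this hybridization.
        keep &= {pid for pid, (_, present) in table.items() if present}
    return keep


# Hypothetical two-sample example: probe "b" fails detection in one replicate of s1.
samples = {
    "s1": [("a", 10.0, True), ("a", 12.0, True), ("b", 5.0, True), ("b", 1.0, False)],
    "s2": [("a", 8.0, True), ("a", 9.0, True), ("b", 6.0, True), ("b", 7.0, True)],
}
print(present_in_all(samples))  # {'a'}
```

Intersecting per-sample presence sets makes the filter strict by design: a single failed call in any hybridization removes the probe from the analysis.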