GEO Accession viewer

Sample GSM1411197

Status

Public on Jun 12, 2014

Title

SCA2/FOR2 (CGH)

Sample type

genomic

Channel 1

Source name

DNA from G. scandens (SCA) erythrocytes

Organism

Geospiza scandens

Characteristics

species (symbol): G. scandens (SCA)
tissue: blood
cell type: erythrocytes

Treatment protocol

Experimental

Growth protocol

Birds were captured from January to April 2009 at El Garrapatero, a lowland arid site on Santa Cruz Island, Galapagos Archipelago, Ecuador, with mist nets and banded with numbered Monel bands to track recaptures. Birds were identified, aged and sexed using size and plumage characteristics.

Extracted molecule

genomic DNA

Extraction protocol

A small blood sample (90 μl) from each bird was collected in a microcapillary tube via brachial venipuncture. Samples were stored on wet ice in the field; erythrocytes were then purified by centrifugation and the cells stored in a -20°C freezer at a field station. Following the field season, samples were placed in a -80°C freezer for longer-term storage. Erythrocyte DNA was isolated with the DNeasy Blood and Tissue Kit (Qiagen, Valencia, CA) and then stored at -80°C prior to analysis. DNA was sonicated as follows: the DNA suspension was centrifuged at 500 × g for 5 min. The DNA was then resuspended in 5 ml NIM free of EDTA, PMSF, and leupeptin and centrifuged again. A small amount of DNA suspension (1 μl) was finally suspended in 5 μl of EDTA- and protease inhibitor-free NIM medium containing 12% polyvinyl pyrrolidone (PVP; Mr 360 000) (Tateno et al. 2000) and then purified using a series of washes and centrifugations (Ward et al. 1999) from a variable number of animals per species analyzed. Equal concentrations of DNA from individual blood samples were then used to produce pools of DNA material. Two DNA pools were produced per species, each containing the same amount of DNA from different animals. The number of individuals used per pool is shown in the manuscript's Supplemental Table S6. These DNA pools were then used for CGH arrays.

Label

Cy5

Label protocol

For one sub-array of each species, DNA samples from the experimental groups were labeled with Cy5 and DNA samples from the control lineage were labeled with Cy3. For the other sub-array of each species, a dye swap was performed so that DNA samples from the experimental groups were labeled with Cy3 and DNA samples from the control lineage were labeled with Cy5.

Channel 2

Source name

DNA from Geospiza fortis (FOR) erythrocytes

Organism

Geospiza fortis

Characteristics

species (symbol): Geospiza fortis (FOR)
tissue: blood
cell type: erythrocytes

Treatment protocol

Control

Growth protocol

Extracted molecule

genomic DNA

Extraction protocol

Label

Cy3

Label protocol

Hybridization protocol

For the copy number variation analysis, a custom CGH design by Roche Nimblegen was used, consisting of a whole-genome tiling array of Zebra Finch (Taeniopygia guttata) with 720,000 probes per array. Probes ranged from 50 to 75 nucleotides in length (50-75mer), with a median probe spacing of 1395 bp. Two different comparative (CNV vs CNV) hybridization experiments were performed (2 sub-arrays) for each query species (FUL, SCA, PAR, CRA) versus the control FOR, with each sub-array including hybridizations from DNA pools from these different species. Two DNA pools were built for each species.

Scan protocol

Scanning and image acquisition were performed in-house by Nimblegen Inc.

Description

Sample name: 63132602
Processed data file: Regions_of_Loss&Gain_Finch_SCA.csv
G. scandens-SCA (experimental) sample 2 is compared to Geospiza fortis-FOR (control) sample 2

Data processing

For the CNV experiment, raw data from the Cy3 and Cy5 channels were imported into R (R Development Core Team (2010), R: A language for statistical computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org), checked for quality and converted to MA values (M=Cy5-Cy3; A=(Cy5+Cy3)/2). Within each array, probes were separated into groups by GC content and each group was normalized separately between Cy3 and Cy5 using the loess normalization procedure, so that each GC group received its own normalization curve. After each array was normalized within array, the arrays were normalized across arrays using the A quantile normalization procedure (Manikkam et al. 2012). Following normalization, the average value of each probe was calculated and three different copy number variation algorithms were applied to these probes: circular binary segmentation from the DNAcopy package (Olshen et al. 2004), CGHseg (Picard et al. 2005) and cghFlasso (Tibshirani and Wang 2008). All three algorithms were used with their default parameters. The outputs of the three algorithms were averaged, and a threshold of 0.04 was applied to this summary (the average of the log-ratio from the three algorithms) to specify gains and losses. Runs of consecutive gain or loss probes were used to identify separate CNV regions; a minimum of 3 consecutive probes was required for a region to be considered a valid CNV. The statistically significant copy number variation (CNV) regions were identified and the P-values associated with each region are presented. A cut-off of p < 10^-5 was used to select the final regions of gains and losses. The July 2008 assembly of the Zebra Finch genome (taeGut1, WUSTL v3.2.4) produced by the Genome Sequencing Center at the Washington University in St. Louis (WUSTL) School of Medicine was retrieved (WUSTL 2008).
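The MA conversion and the gain/loss calling described above can be illustrated with a minimal Python sketch. It is not the R code used for this record. It assumes that M and A are computed on log2-transformed intensities, that the 0.04 threshold is applied symmetrically (above +0.04 for gains, below -0.04 for losses), and that the input probes are already ordered along a single chromosome. The function names are hypothetical.

```python
import math

def ma_values(cy5, cy3):
    """Return (M, A) lists from raw intensities.

    Assumption: intensities are log2-transformed first, so that
    M = log2(Cy5) - log2(Cy3) and A = (log2(Cy5) + log2(Cy3)) / 2.
    """
    m, a = [], []
    for red, green in zip(cy5, cy3):
        log_red, log_green = math.log2(red), math.log2(green)
        m.append(log_red - log_green)
        a.append((log_red + log_green) / 2)
    return m, a

def call_regions(summary, threshold=0.04, min_probes=3):
    """Group consecutive probes beyond the threshold into gain/loss runs.

    Returns (first_index, last_index, "gain" | "loss") tuples and keeps
    only runs of at least `min_probes` probes. Assumption: the threshold
    is symmetric around zero.
    """
    regions = []
    start, state = None, None
    # A trailing 0.0 acts as a sentinel that closes a run ending at the last probe.
    for i, value in enumerate(list(summary) + [0.0]):
        if value > threshold:
            current = "gain"
        elif value < -threshold:
            current = "loss"
        else:
            current = None
        if current != state:
            if state is not None and i - start >= min_probes:
                regions.append((start, i - 1, state))
            start, state = i, current
    return regions
```

Calling `call_regions` on the per-probe summary values of one chromosome yields index ranges that can then be mapped back to probe coordinates.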
A seed file was constructed and a BSgenome package was forged so that the Finch DNA sequence could be used in the R code (Herve Pages, BSgenome: Infrastructure for Biostrings-based genome data packages, R package version 1.24.0). This sequence was used to design the custom tiling arrays and to perform the bioinformatics. The chromosomal locations of CNV and DMR clusters were determined with R code developed to find chromosomal locations of clusters (M. K. Skinner et al. 2012). A 2 Mb sliding window with 50,000 base intervals was used to find the associated CNV and DMR in each window. A Z-test statistical analysis with p<0.05 was used on these windows to find the ones with over-represented CNV and DMR, which were merged together to form clusters. A typical cluster region averaged approximately 3 megabases in size. The association of DMR and CNV with specific zebra finch genes and genome locations used the NCBI Gene database for zebra finch gene locations and correlated the epimutations associated (overlapped) with the genes. The three adjacent probes constituted approximately a 200 bp homology search. The KEGG pathway associations were identified using the KEGG website 'Search Pathway' tool (M. K. Skinner et al. 2012). Statistically significant over-representation was assessed with a Fisher’s exact analysis. All DMR and CNV genomic data obtained in the current study have been deposited in the NCBI public GEO database. Spearman Rank correlation coefficients were used to test for a relationship between phylogenetic distance and epigenetic and genetic changes (Whitlock and Schluter 2009).
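The 2 Mb sliding window with 50,000 base intervals can be sketched in Python as follows. This covers only the windowing and counting step. The Z-test and the merging of windows into clusters are not reproduced, because the record does not give enough detail to do so. The function name, the half-open window convention, and the truncation of windows at the chromosome end are assumptions.

```python
def window_counts(positions, chrom_length, window=2_000_000, step=50_000):
    """Count sites (e.g. CNV or DMR start positions) per sliding window.

    Windows are half-open intervals [start, end), advanced by `step` bases
    and truncated at the end of the chromosome (an assumption).
    """
    counts = []
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)
        n_sites = sum(1 for p in positions if start <= p < end)
        counts.append((start, end, n_sites))
        start += step
    return counts
```

Each returned tuple holds the window start, the window end, and the number of sites inside it. A per-window over-representation test would then operate on these counts.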
R-code data processing of raw data as described was used as the basis for conclusions in this study. Value definitions for processed data files (Regions_of_Loss&Gain_Finch_species):
#Chromosome = chromosome number
#cSTART = cluster start site
#cSTOP = cluster stop site
#RegionChanged = region size in megabases, i.e. (cSTOP-cSTART) divided by 1,000,000; a value of zero means a single probe
#minP = p-value of the region, showing how statistically significant the region is; the p-value must be p<1.0E-5 for inclusion
#AverageLogRatio = average of the log-ratio, since there are multiple arrays
#AverageSummary = average output of the three algorithms, used for the gain/loss threshold cut-off
#Gain.Loss = whether the region is a gain or a loss
#FORvs = the finch species compared to the reference species FOR
Other processed data files are included (arrayID#_segMNT.txt) but they were not used in our analysis.
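As an illustration of these value definitions, the Python sketch below reads a regions table, keeps rows with minP below 1.0E-5, and recomputes the region size in megabases as (cSTOP-cSTART)/1,000,000. The exact header spelling in the deposited CSV has not been checked, so the column names used here (without the leading "#") are an assumption. The function name is hypothetical, and the rows in the usage example are made up for demonstration only.

```python
import csv
import io

def load_regions(handle, max_p=1e-5):
    """Read a regions CSV and keep rows with minP below `max_p`.

    Assumption: the header uses the names Chromosome, cSTART, cSTOP,
    minP and Gain.Loss exactly as listed in the value definitions.
    """
    regions = []
    for row in csv.DictReader(handle):
        if float(row["minP"]) >= max_p:
            continue
        start, stop = int(row["cSTART"]), int(row["cSTOP"])
        regions.append({
            "chrom": row["Chromosome"],
            "start": start,
            "stop": stop,
            # Region size in megabases; zero means a single probe.
            "region_mb": (stop - start) / 1_000_000,
            "call": row["Gain.Loss"],
        })
    return regions

# Made-up rows for demonstration only, not values from the deposited file.
example = io.StringIO(
    "Chromosome,cSTART,cSTOP,minP,Gain.Loss\n"
    "1,1000000,1500000,1e-7,gain\n"
    "2,400,400,1e-6,loss\n"
    "3,10,5000,0.001,gain\n"
)
kept = load_regions(example)
```

With the made-up input, the third row is dropped by the p-value filter and the second row has a region size of zero, matching the single-probe case in the definitions.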

Submission date

Jun 12, 2014

Last update date

Jun 12, 2014

Contact name

Michael K Skinner

E-mail(s)

skinner@mail.wsu.edu

Organization name

WSU

Department

SBS

Street address

Abelson 507

City

Pullman

State/province

ZIP/Postal code

99163

Country

USA

Platform ID

GPL18790

Series (1)

GSE58334

Epigenetics and the Evolution of Darwin’s Finches

Supplementary file	Size	Download	File type/resource
GSM1411197_63132602_532.pair.gz	11.3 Mb	(ftp)(http)	PAIR
GSM1411197_63132602_635.pair.gz	11.3 Mb	(ftp)(http)	PAIR
GSM1411197_63132602_segMNT.txt.gz	28.6 Mb	(ftp)(http)	TXT
Processed data are available on Series record
Processed data provided as supplementary file