Status |
Public on Apr 30, 2012 |
Title |
Transcription Factor Binding Sites by ChIP-seq from ENCODE/PSU |
Project |
Mouse ENCODE
|
Organism |
Mus musculus |
Experiment type |
Genome binding/occupancy profiling by high throughput sequencing
|
Summary |
This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Ross Hardison, rch8@psu.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (genome@soe.ucsc.edu).
Rationale for the Mouse ENCODE project
Our knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in the behavior or structure of a cell or organism in response to changes in DNA sequence to infer the function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases, and other epigenetic features along genomic DNA. In general, these features are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These segments are inferred to be under negative or positive selection, respectively, and we interpret them as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function. The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results to those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with a moderate breadth. Assays identical to those used in the ENCODE project are performed in mouse cell types that are similar or homologous to those studied in the human project. Thus we will be able to discover which epigenetic features are conserved between mouse and human, and we can examine the extent to which these overlap with the DNA sequences under negative selection. The project will also determine the contribution of DNA with a function preserved across mammals versus DNA with a function in only one species. The results will have a significant impact on our understanding of the evolution of gene regulation.
Maps of Occupancy by Transcription Factors
Maps of occupancy of genomic DNA by transcription factors (TFs) are determined by ChIP-seq. This consists of two basic steps: chromatin immunoprecipitation (ChIP) is used to highly enrich genomic DNA for the segments bound by specific proteins (the antigens recognized by the antibodies), followed by massively parallel short-read sequencing to tag the enriched DNA segments. Sequencing is done on the Illumina GAIIx and HiSeq. The sequence tags are mapped back to the mouse genome (Langmead et al. 2009). A graph of the enrichment for TF binding is displayed as the "Signal" track (essentially the counts of mapped reads per interval), and the probable binding sites deduced by the MACS program (Zhang et al. 2008) are shown in the "Peaks" track. Each experiment is associated with an input signal, which represents the control condition where immunoprecipitation with non-specific immunoglobulin was performed in the same cell type. The sequence reads, quality scores, and alignment coordinates from these experiments are available for download.
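The "Signal" track is described above as essentially the count of mapped reads per interval. The following is a minimal Python sketch of that idea, not the pipeline used for this dataset. The function name `bin_read_counts`, the fixed `bin_size`, and the toy coordinates are all assumptions made for illustration.

```python
from collections import Counter


def bin_read_counts(read_starts, bin_size=10):
    """Count mapped reads per fixed-width interval on one chromosome.

    read_starts: iterable of 0-based read start coordinates (toy input).
    bin_size:    hypothetical interval width in base pairs.
    Returns a dict mapping each bin's start coordinate to its read count.
    """
    counts = Counter((pos // bin_size) * bin_size for pos in read_starts)
    return dict(counts)


# Toy example: six reads on one chromosome, 10 bp bins.
signal = bin_read_counts([3, 7, 12, 15, 18, 41], bin_size=10)
print(signal)
```

Bins with no reads are simply absent from the result, which keeps the sketch short; a real signal track would cover every interval of the genome.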
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf
|
|
|
Overall design |
Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). The chromatin immunoprecipitation followed published methods (Welch et al. 2004). Information on the antibodies used is available via the hyperlinks in the "Select subtracks" menu.
Samples passing initial quality thresholds (enrichment and depletion for positive and negative controls, if available, by quantitative PCR of ChIP material) were processed for library construction for Illumina sequencing, using the ChIP-seq Sample Preparation Kit purchased from Illumina. Starting with a 10 ng sample of ChIP DNA, DNA fragments were repaired to generate blunt ends, and a single A nucleotide was added to each end. Double-stranded Illumina adaptors were ligated to the fragments. Ligation products were amplified by 18 cycles of PCR, and the DNA between 250 and 350 bp was gel purified. Completed libraries were quantified with the Quant-iT dsDNA HS Assay Kit. The DNA library was sequenced on the Illumina Genome Analyzer II sequencing system, and more recently on the HiSeq. Cluster generation, linearization, blocking, and sequencing primer reagents were provided in the Illumina Cluster Amplification kits. All samples except the time course samples were assayed in biological replicates. The data displayed are from the pooled reads for all replicates, but individual replicates are available by download.
The resulting 36-nucleotide sequence reads (fastq files) were moved to a data library in Galaxy, and the tools implemented in Galaxy were used for further processing via workflows (Blankenberg et al. 2010). The reads were mapped to the mouse genome (mm9 assembly) using the program bowtie (Langmead et al. 2009), and the files of mapped reads for the ChIP sample and for the "input" control (no antibody) were processed by MACS (Zhang et al. 2008) to call peaks for occupancy by transcription factors, using the parameters mfold=15, bandwidth=125.
Per-replicate alignments and sequences are available for download from the downloads page (http://hgdownload.cse.ucsc.edu/goldenPath/mm9/encodeDCC/wgEncodePsuTfbs/).
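To illustrate why each ChIP sample is paired with an "input" control, here is a simplified Python sketch that compares per-interval read counts between the two after scaling for library size. This is not the MACS algorithm and does not use its mfold or bandwidth parameters. The function name, the pseudocount, the toy counts, and the cutoff of 10 are all hypothetical.

```python
def fold_enrichment(chip_counts, input_counts, chip_total, input_total,
                    pseudocount=1.0):
    """Per-interval ChIP/input ratio after library-size scaling.

    chip_counts, input_counts: dicts mapping interval start -> read count.
    chip_total, input_total:   total mapped reads in each library.
    pseudocount:               hypothetical constant that avoids division
                               by zero in intervals with no input reads.
    """
    scale = chip_total / input_total  # bring input to the ChIP depth
    bins = sorted(set(chip_counts) | set(input_counts))
    return {
        b: (chip_counts.get(b, 0) + pseudocount)
           / (input_counts.get(b, 0) * scale + pseudocount)
        for b in bins
    }


# Toy data: the input library is twice as deep as the ChIP library.
chip = {0: 2, 10: 30, 20: 3}
control = {0: 4, 10: 2, 20: 6}
ratios = fold_enrichment(chip, control, chip_total=100, input_total=200)

# Hypothetical cutoff, chosen only for this example.
enriched = [b for b, r in ratios.items() if r >= 10]
print(ratios, enriched)
```

A real peak caller additionally models fragment size and local background and assigns statistical significance; the ratio here only conveys the role of the control.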
|
Web link |
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=mm9&g=wgEncodePsuTfbs
|
|
|
Contributor(s) |
Hardison R, Weiss MJ, Blobel GA, Schuster S, Taylor J, Nekrutenko A |
BioProject |
PRJNA63475 |
|
Submission date |
Feb 23, 2012 |
Last update date |
Oct 13, 2020 |
Contact name |
ENCODE DCC |
E-mail(s) |
encode-help@lists.stanford.edu
|
Organization name |
ENCODE DCC
|
Street address |
300 Pasteur Dr
|
City |
Stanford |
State/province |
CA |
ZIP/Postal code |
94305-5120 |
Country |
USA |
|
|
Platforms (2) |
GPL9250 | Illumina Genome Analyzer II (Mus musculus) |
GPL13112 | Illumina HiSeq 2000 (Mus musculus) |
|
Samples (38)
|
GSM923570 | PSU_ChipSeq_G1E_CTCF |
GSM923571 | PSU_ChipSeq_G1E-ER4_CTCF |
GSM923572 | PSU_ChipSeq_G1E-ER4_GATA1_(SC-265) |
GSM923573 | PSU_ChipSeq_MEL_CTCF |
GSM923574 | PSU_ChipSeq_MEL_Input |
GSM923575 | PSU_ChipSeq_Erythrobl_GATA1_(SC-265) |
GSM923576 | PSU_ChipSeq_G1E-ER4_TAL1_(SC-12984) |
GSM923577 | PSU_ChipSeq_MEL_Pol2-4H8 |
GSM923578 | PSU_ChipSeq_MEL_TAL1_(SC-12984) |
GSM923579 | PSU_ChipSeq_G1E_TAL1_(SC-12984) |
GSM923580 | PSU_ChipSeq_G1E_Input |
GSM923581 | PSU_ChipSeq_G1E_GATA1_(SC-265) |
GSM923582 | PSU_ChipSeq_Erythrobl_TAL1_(SC-12984) |
GSM923583 | PSU_ChipSeq_Megakaryo_Input |
GSM923584 | PSU_ChipSeq_CH12_PAX5_(N-15) |
GSM923585 | PSU_ChipSeq_Erythrobl_Input |
GSM923586 | PSU_ChipSeq_Megakaryo_GATA1_(SC-265) |
GSM923587 | PSU_ChipSeq_G1E_GATA2_(SC-9008) |
GSM923588 | PSU_ChipSeq_G1E-ER4_GATA2_(SC-9008) |
GSM923589 | PSU_ChipSeq_G1E_Pol2-4H8 |
GSM923590 | PSU_ChipSeq_G1E-ER4_Pol2-4H8 |
GSM995436 | PSU_ChipSeq_G1E-ER4_Input_diffProtD_7hr_Timecourse |
GSM995437 | PSU_ChipSeq_G1E-ER4_Input_diffProtD_3hr_Timecourse |
GSM995438 | PSU_ChipSeq_G1E-ER4_Input_diffProtD_30hr_Timecourse |
GSM995439 | PSU_ChipSeq_G1E-ER4_Input_diffProtD_24hr_Timecourse |
GSM995440 | PSU_ChipSeq_G1E-ER4_Input_diffProtD_14hr_Timecourse |
GSM995441 | PSU_ChipSeq_G1E-ER4_Input_Timecourse |
GSM995442 | PSU_ChipSeq_G1E-ER4_GATA1_(SC-265)_diffProtD_7hr_Timecourse |
GSM995443 | PSU_ChipSeq_G1E-ER4_GATA1_(SC-265)_diffProtD_3hr_Timecourse |
GSM995444 | PSU_ChipSeq_G1E-ER4_GATA1_(SC-265)_diffProtD_14hr_Timecourse |
GSM995445 | PSU_ChipSeq_G1E-ER4_GATA1_(SC-265)_Timecourse |
GSM995446 | PSU_ChipSeq_Megakaryo_FLI1_(sc-356) |
GSM995447 | PSU_ChipSeq_Megakaryo_TAL1_(SC-12984) |
GSM995448 | PSU_ChipSeq_G1E-ER4_GATA1_(SC-265)_diffProtD_30hr_Timecourse |
GSM995449 | PSU_ChipSeq_G1E-ER4_GATA1_(SC-265)_diffProtD_24hr_Timecourse |
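The sample titles above appear to follow a regular pattern: lab and assay prefix, cell type, target (or "Input"), an optional antibody identifier in parentheses, and optional time course fields. A minimal Python sketch for splitting a title into those parts follows. The field names and the reading of the "Nhr" token as a time point in hours are assumptions inferred from the titles, not a documented naming specification.

```python
import re

# Pattern inferred from the titles listed in this record (an assumption).
TITLE_RE = re.compile(
    r"^PSU_ChipSeq_(?P<cell>[^_]+)_(?P<target>[^_]+)"
    r"(?:_\((?P<antibody>[^)]+)\))?"        # e.g. (SC-265)
    r"(?:_diffProtD_(?P<hours>\d+)hr)?"     # e.g. diffProtD_7hr
    r"(?P<timecourse>_Timecourse)?$"
)


def parse_title(title):
    """Split a sample title into a dict of hypothetical field names."""
    m = TITLE_RE.match(title)
    if m is None:
        raise ValueError(f"unrecognized sample title: {title!r}")
    return {
        "cell": m["cell"],
        "target": m["target"],
        "antibody": m["antibody"],  # None when no antibody is listed
        "hours": int(m["hours"]) if m["hours"] else None,
        "timecourse": m["timecourse"] is not None,
        "is_input": m["target"] == "Input",
    }


example = parse_title(
    "PSU_ChipSeq_G1E-ER4_GATA1_(SC-265)_diffProtD_7hr_Timecourse"
)
print(example)
```

Grouping parsed titles by cell type and the input flag is one way to pair each ChIP sample with a candidate control, although the authoritative pairing is whatever the individual sample records state.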
|
Relations |
SRA |
SRP012562 |
Supplementary file | Size | Download | File type/resource |
GSE36029_RAW.tar | 338.5 Gb | (http)(custom) | TAR (of BIGWIG, BROADPEAK) |
GSE36029_run_info.txt.gz | 1.7 Kb | (ftp)(http) | TXT |
SRA Run Selector |
Raw data are available in SRA |
Processed data provided as supplementary file |
|
|
|
|
|