Status |
Public on Nov 18, 2020 |
Title |
High dimensional association detection in large-scale genomic data |
Platform organism |
Mus musculus |
Sample organisms |
Homo sapiens; Mus musculus |
Experiment type |
Genome binding/occupancy profiling by high throughput sequencing; Third-party reanalysis
|
Summary |
Joint analyses of genomic datasets obtained in multiple different conditions are essential for understanding the biological mechanism that drives tissue-specificity and cell differentiation, but they remain computationally challenging. To address this, we introduce CLIMB (Composite LIkelihood eMpirical Bayes), a statistical methodology that learns patterns of condition-specificity present in genomic data. CLIMB provides a generic framework facilitating a host of analyses, such as clustering genomic features sharing similar condition-specific patterns and identifying which of these features are involved in cell fate commitment. Our approach improves upon existing methods by boosting statistical power to identify meaningful signals while retaining interpretability and computational tractability. We illustrate CLIMB's value on two sets of hematopoietic data: one studying CTCF ChIP-seq measured in 17 different cell populations, and another examining RNA-seq measured across constituent cell populations in three committed lineages. These analyses demonstrate that CLIMB captures biologically relevant clusters in the data and improves upon commonly used pairwise comparisons and unsupervised clusterings typical of genomic analyses.
|
Overall design |
CTCF ChIP-seq data from 17 cell populations were jointly analyzed; the resulting clusters of genomic loci were compared against ATAC-seq, H3K4me1 ChIP-seq, and H3K4me3 ChIP-seq at matching loci. RNA-seq reads were compared across differentiated states in the erythroid, myeloid, and megakaryocytic cell lineages. DNase-seq data from 38 biosamples across all autosomes were jointly reanalyzed using CLIMB and non-negative matrix factorization (see GSE156074_DNase-seq_README.txt for details).
|
Contributor(s) |
Koch H, Keller C, Giardine B, Xiang G, Zhang F, Ross H, Li Q |
Citation(s) |
36371401 |
|
Submission date |
Aug 12, 2020 |
Last update date |
Nov 30, 2022 |
Contact name |
Ross Hardison |
E-mail(s) |
rch8@psu.edu
|
Organization name |
Pennsylvania State University
|
Street address |
303 Wartik Lab
|
City |
University Park |
State/province |
PA |
ZIP/Postal code |
16802 |
Country |
USA |
|
|
Platforms (1) |
GPL19057 |
Illumina NextSeq 500 (Mus musculus) |
|
Samples (8)
|
|
Relations |
Reanalysis of |
GSE100974 |
Reanalysis of |
GSE100975 |
Reanalysis of |
GSE101018 |
Reanalysis of |
GSE101021 |
Reanalysis of |
GSE101035 |
Reanalysis of |
GSE101037 |
Reanalysis of |
GSE101052 |
Reanalysis of |
GSE101329 |
Reanalysis of |
GSE143271 |
Reanalysis of |
GSM1023418 |
Reanalysis of |
GSM1151145 |
Reanalysis of |
GSM1167572 |
Reanalysis of |
GSM1441285 |
Reanalysis of |
GSM1441286 |
Reanalysis of |
GSM1441287 |
Reanalysis of |
GSM1441289 |
Reanalysis of |
GSM1441297 |
Reanalysis of |
GSM1441321 |
Reanalysis of |
GSM1441329 |
Reanalysis of |
GSM1480825 |
Reanalysis of |
GSM1972997 |
Reanalysis of |
GSM2423428 |
Reanalysis of |
GSM2423429 |
Reanalysis of |
GSM2445202 |
Reanalysis of |
GSM722394 |
Reanalysis of |
GSM918726 |
Reanalysis of |
GSM923568 |
Reanalysis of |
GSM923571 |
Reanalysis of |
GSM923573 |
Reanalysis of |
GSM946524 |
Reanalysis of |
GSM946536 |
BioProject |
PRJNA656702 |
SRA |
SRP277149 |
Supplementary file | Size | Download | File type/resource |
GSE156074_CTCF_classes.bed.gz | 87.4 Kb | (ftp)(http) | BED |
GSE156074_DNase-seq_README.txt | 1.4 Kb | (ftp)(http) | TXT |
GSE156074_DNase_classes.bed.gz | 3.4 Mb | (ftp)(http) | BED |
GSE156074_RAW.tar | 557.4 Mb | (http)(custom) | TAR (of BW) |
GSE156074_climbCtcfSigCh12.bw | 72.2 Mb | (ftp)(http) | BW |
GSE156074_climbCtcfSigEr4.bw | 70.8 Mb | (ftp)(http) | BW |
GSE156074_climbCtcfSigEry.bw | 71.6 Mb | (ftp)(http) | BW |
GSE156074_climbCtcfSigEryfl.bw | 71.0 Mb | (ftp)(http) | BW |
GSE156074_climbCtcfSigHpc7.bw | 66.6 Mb | (ftp)(http) | BW |
GSE156074_climbCtcfSigMel.bw | 71.9 Mb | (ftp)(http) | BW |
GSE156074_climbCtcfSigMono.bw | 68.3 Mb | (ftp)(http) | BW |
GSE156074_climbCtcfSigNeu.bw | 69.9 Mb | (ftp)(http) | BW |
GSE156074_climbCtcfSigTcd4.bw | 70.1 Mb | (ftp)(http) | BW |
GSE156074_climbCtcfSigTcd8.bw | 66.1 Mb | (ftp)(http) | BW |
SRA Run Selector |
Raw data are available in SRA |
Processed data provided as supplementary file |
Processed data are available on Series record |
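
The two *_classes.bed.gz supplementary files listed above are gzipped BED files. This record does not describe their column layout, so the following is only a minimal sketch: it assumes the class label sits in the conventional BED name field (fourth column), which should be checked against the file itself or the README. The function name count_bed_classes and its label_column parameter are hypothetical helpers introduced here for illustration.

```python
import gzip
from collections import Counter


def count_bed_classes(path, label_column=3):
    """Tally BED records by the value found in one column (0-based index).

    Assumption: the class label is in the fourth column (index 3), the
    standard BED "name" field. Adjust label_column if the file differs.
    """
    counts = Counter()
    # Read plain or gzip-compressed BED, chosen by file extension.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Skip blank lines and the header lines a BED file may carry.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            fields = line.split("\t")
            if len(fields) <= label_column:
                continue  # this record has no such column
            counts[fields[label_column]] += 1
    return counts
```

After downloading a file, a call such as count_bed_classes("GSE156074_CTCF_classes.bed.gz").most_common(5) would list the five most frequent labels, provided the column assumption holds.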