Status
Public on Oct 21, 2021
Title
4DNEXK2CP3J1 (F121_d0), 4DNEXU9FJ24U (F123_d0)
Sample type
SRA
Source name
Multiple biosources: F121-6-CASTx129, F123-CASTx129
Organism
Mus musculus
Characteristics
cell_line: F121-6-CASTx129, F123-CASTx129
mouse_strain: 129/Sv X Cast
Sex: pooled male and female
description: Day 0 F121-6 or F123 cells from an embryoid body differentiation time course experiment C
tissue_source: stem cell
biosample_type: stem cells
modifications_summary: None
treatments_summary: None
url: https://data.4dnucleome.org/biosamples/4DNBSHAXZE5I/ https://data.4dnucleome.org/biosamples/4DNBSBOPWYV1/
Extracted molecule
genomic DNA
Extraction protocol
url: https://www.ncbi.nlm.nih.gov/pubmed/31536770
description: This is the updated protocol for single-cell combinatorial indexed Hi-C (sci-Hi-C) from Ramani et al. 2020.
Library strategy
Hi-C
Library source
genomic
Library selection
other
Instrument model
Illumina HiSeq 2500
Description
lab: Christine Disteche, UW
award: 1U54DK107979-01
4DN accession: 4DNEXK2CP3J1, 4DNEXU9FJ24U
description: 4DNEXK2CP3J1 - sci-Hi-C on a differentiation time course of F121-6 cells. This experiment contains undifferentiated day 0 cells; to obtain reads specific to these cells, use the label F121_d0 from the barcode file(s) 4DNFI8WQR5EV when parsing the associated FASTQs.
4DNEXU9FJ24U - sci-Hi-C on a differentiation time course of F123 cells. This experiment contains undifferentiated day 0 cells; to obtain reads specific to these cells, use the label F123_d0 from the barcode file(s) 4DNFI8WQR5EV when parsing the associated FASTQs.
submitted_by: Giancarlo Bonora
ligation_time: 240
digestion_time: 960
tagging_method: Biotin-dT
experiment_type: sci-Hi-C
ligation_volume: 0.3
digestion_enzyme: DpnII
contributing_labs: William Noble, UW; Zhijun Duan, UW; Jay Ashok Shendure, UW
crosslinking_time: 10
biosample_quantity: 10000000 cells
crosslinking_method: 3.5% Formaldehyde
fragment_size_range: 350-800
ligation_temperature: 22
average_fragment_size: 296
digestion_temperature: 37
crosslinking_temperature: 22
url: https://data.4dnucleome.org/experiments-hi-c/4DNEXK2CP3J1/ https://data.4dnucleome.org/experiments-hi-c/4DNEXU9FJ24U/
This record includes combined data from multiple BioSamples: SAMN21435439, SAMN21435432
Data processing
See the file "TREE.sci-Hi-C.txt" for an overview of the folders and files described herein.
Sci-Hi-C libraries were processed using a publicly available pipeline (Ramani V. et al. Massively multiplex single-cell Hi-C. Nat Methods. 2017;14:263–6), which was adapted to process allelically segregated reads. Reads were aligned to an N-masked C57BL/6J reference genome (mm10) in which every SNP locus (129 or cast for the F123, F121, and ES_Tsix-stop cell lines; B6 or spret for the Patski cell line) was substituted with an N to reduce mapping bias. The pipeline produced lists of contact pairs for all cells with at least 1,000 uniquely mapped contact pairs, a cis:trans ratio ≥1, and ≥95% of reads mapping to a single genome, either mouse or human (see manuscript Additional file 1: Table S1).
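The three cell-level thresholds above can be expressed as a simple filter. This is a minimal sketch, not code from the pipeline: the `CellStats` record, its field names and the `passes_qc` helper are hypothetical, and the per-cell counts are assumed to have been computed upstream. Species purity is taken as the larger of the mouse and human read fractions.

```python
from dataclasses import dataclass


@dataclass
class CellStats:
    """Hypothetical per-cell summary; field names are illustrative only."""
    cell_id: str
    unique_pairs: int   # uniquely mapped contact pairs
    cis_pairs: int      # intrachromosomal pairs
    trans_pairs: int    # interchromosomal pairs
    mouse_reads: int    # reads assigned to the mouse genome
    human_reads: int    # reads assigned to the human genome


def passes_qc(s: CellStats, min_pairs: int = 1000,
              min_cis_trans: float = 1.0, min_purity: float = 0.95) -> bool:
    """Keep cells with >=1,000 pairs, cis:trans >= 1 and >=95% single-species reads."""
    total_reads = s.mouse_reads + s.human_reads
    if s.unique_pairs < min_pairs or total_reads == 0:
        return False
    # cis / trans >= ratio, rearranged to avoid dividing by zero when trans == 0
    if s.cis_pairs < min_cis_trans * s.trans_pairs:
        return False
    purity = max(s.mouse_reads, s.human_reads) / total_reads
    return purity >= min_purity


cells = [
    CellStats("cell_a", 1500, 900, 600, 990, 10),     # passes all three filters
    CellStats("cell_b", 800, 500, 300, 800, 0),       # too few contact pairs
    CellStats("cell_c", 2000, 700, 1300, 2000, 0),    # cis:trans below 1
    CellStats("cell_d", 3000, 2000, 1000, 900, 100),  # only 90% species purity
]
print([c.cell_id for c in cells if passes_qc(c)])  # ['cell_a']
```

Writing the ratio test as a multiplication keeps cells with zero trans contacts from raising an error.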
Contact matrices for the mouse genome were generated from the valid pairs by summing counts within bins of a specified size (e.g., 500 kb). To obtain allelic information, each valid contact pair was assigned to its parental allele of origin if either end of the read pair contained at least one SNP specific to one of the parental strains. SNPs were based on Sanger Mouse Genomes Project data for mouse strain 129 and the mouse species M. spretus and M. castaneus, compared to the mm10 (B6) reference assembly. Read pairs containing no SNPs, or containing SNPs belonging to both parental strains, could not be unambiguously assigned to a single allele and were discarded.
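The binning and allele-assignment rules above can be sketched as two small functions. This is an illustration under stated assumptions, not the pipeline's implementation: the function names are hypothetical, contact pairs are assumed to be `(chrom1, pos1, chrom2, pos2)` tuples, and each read end is assumed to arrive already annotated with the set of strains whose strain-specific SNP alleles it carries (the SNP lookup itself is not shown).

```python
from collections import Counter


def bin_contacts(pairs, bin_size=500_000):
    """Sum contact pairs (chrom1, pos1, chrom2, pos2) into bins of `bin_size` bp.

    Returns a Counter keyed by an ordered pair of (chrom, bin_index) tuples,
    i.e. a sparse symmetric contact matrix that stores each bin pair once.
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        a = (chrom1, pos1 // bin_size)
        b = (chrom2, pos2 // bin_size)
        counts[(min(a, b), max(a, b))] += 1
    return counts


def assign_allele(end1_strains, end2_strains):
    """Return the single parental strain supported by SNPs on either end, else None.

    Pairs with no informative SNPs, or with SNPs from both parents, are
    ambiguous and return None (they would be discarded).
    """
    strains = set(end1_strains) | set(end2_strains)
    return next(iter(strains)) if len(strains) == 1 else None


print(assign_allele({"cast"}, set()))    # cast
print(assign_allele({"129"}, {"cast"}))  # None
```

Storing only one orientation of each bin pair avoids double counting when a matrix is later made symmetric.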
Contact decay profiles (CDPs)
-----------------------------
Contact decay profiles (CDPs) for each cell were generated by summing the intrachromosomal contact counts with respect to contact distance within exponentially increasing contact distance ranges, as described in the manuscript (Additional file 1: Fig. S2A). Specifically, bin boundaries corresponded to the series of distances 2^x to 2^(x+1) for x within the range [10, 27.75], incrementing by a step size of 0.125. For each cell, counts within the resulting 143 bins were scale normalized to account for coverage differences between cells.
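The log-binned profile described above can be sketched as follows, taking the distance ranges as 2^x to 2^(x+1) for x = 10, 10.125, ..., 27.75 (143 ranges). This is a hedged sketch rather than the authors' code: the function names are hypothetical, ranges are treated as half-open, and "scale normalized" is assumed here to mean dividing each profile by its total so it sums to 1.

```python
from bisect import bisect_left


def log_bin_ranges(start=10.0, stop=27.75, step=0.125, width=1.0):
    """Distance ranges [2**x, 2**(x + width)) for x = start, start + step, ..., stop."""
    n = int(round((stop - start) / step)) + 1  # 143 ranges with the defaults
    return [(2.0 ** (start + i * step), 2.0 ** (start + i * step + width))
            for i in range(n)]


def contact_decay_profile(distances, ranges):
    """Count intrachromosomal contacts per distance range, then scale to sum to 1.

    ASSUMPTION: "scale normalized" is interpreted as dividing by the profile total.
    """
    d = sorted(distances)
    # bisect_left(d, v) counts distances < v, so the difference counts lo <= dist < hi
    counts = [bisect_left(d, hi) - bisect_left(d, lo) for lo, hi in ranges]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


ranges = log_bin_ranges()
print(len(ranges), ranges[0])  # 143 (1024.0, 2048.0)
```

Because consecutive ranges start 0.125 apart in log2 space but span a full doubling, they overlap, so a single contact contributes to several neighbouring ranges.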
Matrices of non-allelic CDPs were generated for each cell type and time point, with each column being a cell and each row a distance range. These matrices are contained within the folder "non-allelic_contact_decay_profiles", with matrices for both genome-wide CDPs (e.g. "EBdiff_F121_d0.logbinContactDecay.mm10.txt.gz") and autosome-only CDPs (e.g. "EBdiff_F121_d0.autosomes.logbinContactDecay.mm10.txt.gz").
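A cells-by-distance matrix of this kind could be loaded with the standard library alone. This is a sketch under assumptions: the files are assumed to be gzipped, tab-separated text whose first line holds the cell IDs and whose later lines each start with a distance-range label. The exact header layout is not documented here, so the hypothetical `read_cdp_matrix` helper accepts a header with or without a leading corner cell, and treats `NA` or empty fields as NaN.

```python
import csv
import gzip


def read_cdp_matrix(path):
    """Read a gzipped, tab-separated CDP matrix (rows: distance ranges, columns: cells).

    ASSUMPTION: line 1 holds cell IDs; every later line starts with a row label.
    Returns (row_labels, cell_ids, values) with values as a list of float rows.
    """
    with gzip.open(path, "rt", newline="") as fh:
        rows = [r for r in csv.reader(fh, delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    n_values = len(body[0]) - 1 if body else 0
    # R-style tables omit the corner cell, so the header is one field shorter
    cell_ids = header if len(header) == n_values else header[1:]
    row_labels = [r[0] for r in body]
    values = [[float(x) if x not in ("NA", "") else float("nan") for x in r[1:]]
              for r in body]
    return row_labels, cell_ids, values
```

For example, `read_cdp_matrix("non-allelic_contact_decay_profiles/EBdiff_F121_d0.logbinContactDecay.mm10.txt.gz")` would return one list of floats per distance range, aligned with the returned cell IDs; check the real header layout before relying on it.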
Matrices of allelic CDPs were generated for each chromosome, as well as genome-wide, for each cell type and time point, with each column being a cell and each row a distance range. These matrices are contained within the folder "allelic_contact_decay_profiles" with subfolders for each chromosome (e.g. "allelic_contact_decay_profiles/allelicContactDecayData.chr1/EBdiff_F121_d0.alt.chr1.logbinContactDecay.mm10.txt.gz") and genome-wide allelic CDPs ("allelic_contact_decay_profiles/allelicContactDecayData.genome/EBdiff_F121_d0.alt.logbinContactDecay.mm10.txt.gz").
The allelic CDPs for each chromosome were concatenated together for allelic trajectory analysis and topic modeling. Only cells with a total contact count of at least 50 on both chr1 and chrX were included. The resulting CDP count matrices are saved to the folder "allelic_catenated_contact_decay_profiles", with one file per chromosome (e.g. "allelicContactDecayRates.catenatedAllelicCDPs.F121.chr1.tsv.gz").
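The chr1/chrX coverage requirement above amounts to a per-cell filter. In this minimal sketch the input layout (`{cell_id: {chrom: total contact count}}`) and the helper name are hypothetical. The text does not say whether the threshold applies to each homolog separately or to both combined; the sketch uses a single total per chromosome.

```python
def cells_with_coverage(contact_counts, chroms=("chr1", "chrX"), min_count=50):
    """Return sorted cell IDs with >= min_count contacts on every chromosome in `chroms`."""
    return sorted(
        cell for cell, per_chrom in contact_counts.items()
        if all(per_chrom.get(chrom, 0) >= min_count for chrom in chroms)
    )


counts = {
    "cell_1": {"chr1": 120, "chrX": 64},
    "cell_2": {"chr1": 300, "chrX": 49},  # too few chrX contacts
    "cell_3": {"chr1": 50, "chrX": 50},   # exactly at the threshold
}
print(cells_with_coverage(counts))  # ['cell_1', 'cell_3']
```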
Cell cycle analysis
-------------------
For cell cycle analysis, the cells were grouped into four clusters by k-means clustering using the Spearman correlation distance between the non-allelic CDPs aggregated across all autosomes within the 50kb to 8Mb range for each cell. The four clusters correspond to cells in different stages of the cell cycle, progressing from mitosis into different stages of interphase. The cell cycle cluster assignments of each cell are stored in the folder "autosomal_non-allelic_cell_cycle_cluster_ids", with one file for each cell type and time point (e.g. "20191105_sciHiC_allSamples_Nmasked.EBdiff_F121_d0.countThresh1000.autosomalCDPs.50kb-8Mb.prop.kclust.disteuclidean.kCenters4.tsv"). The files simply contain cell IDs and cluster IDs, e.g.:
"mES_EBdiff_D0_rep1-AACGGTCG.AATCAGAG": 3
"mES_EBdiff_D0_rep1-AACGGTCG.ACTCTACG": 2
"mES_EBdiff_D0_rep1-AACGGTCG.ACTGCTAC": 3
...
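Clustering CDPs by Spearman correlation distance can be sketched using a standard identity: after rank-transforming each profile and z-scoring the ranks (population SD), the squared Euclidean distance between two profiles equals 2n(1 − ρ), where ρ is their Spearman correlation and n the number of distance ranges. Ordinary k-means on these transformed vectors therefore clusters cells according to pairwise Spearman distance. This is an illustrative sketch, not the authors' code: the function names, the deterministic farthest-point seeding and the toy data are assumptions, and the profiles are assumed to be already restricted to the 50kb to 8Mb range.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def zscore(values):
    """Centre to mean 0 and scale to population SD 1 (all zeros if constant)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd if sd else 0.0 for v in values]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_spearman(profiles, k=4, max_iter=100):
    """Cluster profiles by Spearman distance; returns one 0-based label per profile."""
    # Squared Euclidean distance between z-scored ranks equals 2n(1 - Spearman rho)
    points = [zscore(average_ranks(p)) for p in profiles]
    # Farthest-point seeding: start from the first profile, then repeatedly add
    # the point farthest from its nearest existing centre
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave a centre in place if its cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


toy = [[1, 2, 3, 4, 5], [1, 2, 3, 5, 4], [5, 4, 3, 2, 1], [5, 4, 3, 1, 2]]
print(kmeans_spearman(toy, k=2))  # [0, 0, 1, 1]
```

Cluster numbers are arbitrary labels; mapping them to cell cycle stages requires inspecting the mean profile of each cluster.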
Long-range to mid-range difference (LMD)
----------------------------------------
To determine which chrX homolog had acquired a 3D structure characteristic of the inactive X (Xi) in cells such as F121 cells, where X-chromosome inactivation (XCI) is only partially skewed during differentiation, the difference between the total long-range (6.5Mb – 87Mb) and mid-range (85kb – 1.1Mb) contact counts was assessed for each homolog of each cell; this is referred to hereafter as the long-range to mid-range difference (LMD). The difference between the allelic LMDs of the two homologs was then calculated for each chromosome of every cell with sufficient coverage. A large LMD difference between the rebinned CDPs of chrX homologs A and B (the LMD of A minus the LMD of B) indicates that homolog A has assumed the Xi 3D structure. A threshold value (𝛿Xi) was chosen based on a 10% false positive rate (FPR), using the distribution of LMD differences between the rebinned allelic CDPs of the chr1 homologs, which would be expected to show little difference in their LMDs. Cells with an absolute LMD difference between the rebinned allelic CDPs of the chrX homologs greater than 𝛿Xi were deemed to have a chrX homolog that had assumed the Xi 3D structure, and the homolog with the higher LMD was classified as the Xi. The Xi status of each cell for each cell type and time point is saved in the "LMD_XCI_cells_countthresh50" folder. For example, "F121_d11.altXCIcells.tsv" contains the cell IDs of d11 F121 cells that have an inactive castaneus chromosome, "F121_d11.refXCIcells.tsv" contains the cell IDs of d11 F121 cells that have an inactive 129 chromosome, and "F121_d11.nonXCIcells.tsv" contains the cell IDs of d11 F121 cells where XCI status could not be confidently assigned to either X chromosome.
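The LMD calling procedure above can be sketched in three steps: compute an LMD per homolog, derive the threshold from chr1 homolog differences at a 10% FPR, then call the Xi on chrX. This sketch is not the authors' code. The function names are hypothetical, the ranges are treated as half-open intervals, and each homolog's counts are assumed to be divided by its total contacts so homologs with different coverage are comparable (the text computes LMDs from rebinned CDPs, which are coverage-normalized).

```python
def lmd(distances, mid_range=(85e3, 1.1e6), long_range=(6.5e6, 87e6)):
    """Long-range minus mid-range contacts for one homolog, as fractions of its contacts.

    ASSUMPTION: counts are normalized by the homolog's total contact count.
    """
    n = len(distances)
    if n == 0:
        return 0.0
    n_mid = sum(mid_range[0] <= d < mid_range[1] for d in distances)
    n_long = sum(long_range[0] <= d < long_range[1] for d in distances)
    return (n_long - n_mid) / n


def fpr_threshold(null_diffs, fpr=0.10):
    """Cutoff exceeded by at most a fraction `fpr` of |LMD differences| between chr1 homologs."""
    ranked = sorted(abs(x) for x in null_diffs)
    n_allowed = int(fpr * len(ranked))  # null values permitted above the cutoff
    return ranked[max(len(ranked) - n_allowed - 1, 0)]


def classify_xi(lmd_a, lmd_b, delta):
    """'A' or 'B' for the homolog called as the Xi, or None if |difference| <= delta."""
    diff = lmd_a - lmd_b
    if abs(diff) <= delta:
        return None  # not confidently assigned (a "nonXCI" cell)
    return "A" if diff > 0 else "B"  # the homolog with the higher LMD is the Xi


delta = fpr_threshold([0.01, 0.03, -0.02, 0.05, 0.02, -0.04, 0.01, 0.06, -0.01, 0.12])
print(delta)                           # 0.06
print(classify_xi(0.20, 0.05, delta))  # A
```

Taking absolute values of the chr1 differences makes the cutoff symmetric, matching the use of the absolute chrX difference when calling the Xi.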
***************
file name: 4DNFI8WQR5EV.txt
genome_assembly: GRCm38
description: Bar codes (B1) used for selection of specific reads in the following Disteche Lab sci-HiC experiments, with the experiment selection tag in parentheses: 4DNEXK2CP3J1 (F121_d0), 4DNEXU9FJ24U (F123_d0)
file_type: barcodes
file_format: txt

Submission date
Oct 10, 2021
Last update date
Oct 22, 2021
Contact name
4DN DCIC
E-mail(s)
support@4dnucleome.org
Organization name
4D Nucleome - Data Coordination and Integration Center
Street address
10 Shattuck St
City
Boston
State/province
MA
ZIP/Postal code
02115
Country
USA

Platform ID
GPL17021
Series (2)
GSE184554
Single-cell landscape of nuclear configuration and gene expression during stem cell differentiation and X inactivation
GSE185608
4DNESB7XYI9V - sci-Hi-C on mESCs differentiated to embryoid body
Relations
SRA
SRX12565941
BioSample
SAMN22486724
Supplementary file | Size | File type/resource
GSM5620575_4DNFI8WQR5EV.txt.gz | 480 b | TXT
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record