Sample GSM5620575
Status Public on Oct 21, 2021
Title 4DNEXK2CP3J1 (F121_d0), 4DNEXU9FJ24U (F123_d0)
Sample type SRA
 
Source name Multiple biosources: F121-6-CASTx129, F123-CASTx129
Organism Mus musculus
Characteristics cell_line: F121-6-CASTx129, F123-CASTx129
mouse_strain: 129/Sv X Cast
Sex: pooled male and female
description: Day 0 F121-6 or F123 cells from an embryoid body differentiation time course experiment C
tissue_source: stem cell
biosample_type: stem cells
modifications_summary: None
treatments_summary: None
url: https://data.4dnucleome.org/biosamples/4DNBSHAXZE5I/ https://data.4dnucleome.org/biosamples/4DNBSBOPWYV1/
Extracted molecule genomic DNA
Extraction protocol url: https://www.ncbi.nlm.nih.gov/pubmed/31536770
description: This is the updated protocol for single-cell combinatorial indexed Hi-C (sci-Hi-C) from Ramani et al. 2020.
 
Library strategy Hi-C
Library source genomic
Library selection other
Instrument model Illumina HiSeq 2500
 
Description lab: Christine Disteche, UW
award: 1U54DK107979-01
4DN accession: 4DNEXK2CP3J1, 4DNEXU9FJ24U
description: 4DNEXK2CP3J1 - sci-Hi-C on a differentiation time course of F121-6 cells - this experiment is undifferentiated day 0 cells and to obtain reads specific for these cells the label F121_d0 from the barcode file(s) 4DNFI8WQR5EV should be used in parsing the associated FASTQs
4DNEXU9FJ24U - sci-Hi-C on a differentiation time course of F123 cells - this experiment is undifferentiated day 0 cells and to obtain reads specific for these cells the label F123_d0 from the barcode file(s) 4DNFI8WQR5EV should be used in parsing the associated FASTQs
submitted_by: Giancarlo Bonora
ligation_time: 240
digestion_time: 960
tagging_method: Biotin-dT
experiment_type: sci-Hi-C
ligation_volume: 0.3
digestion_enzyme: DpnII
contributing_labs: William Noble, UW, Zhijun Duan, UW, Jay Ashok Shendure, UW
crosslinking_time: 10
biosample_quantity: 10000000 cells
crosslinking_method: 3.5% Formaldehyde
fragment_size_range: 350-800
ligation_temperature: 22
average_fragment_size: 296
digestion_temperature: 37
crosslinking_temperature: 22
url: https://data.4dnucleome.org/experiments-hi-c/4DNEXK2CP3J1/ https://data.4dnucleome.org/experiments-hi-c/4DNEXU9FJ24U/
This record includes combined data from multiple BioSamples: SAMN21435439, SAMN21435432
Data processing See the file "TREE.sci-Hi-C.txt" for an overview of the folders and files described herein.

Sci-Hi-C libraries were processed using a publicly available pipeline (Ramani V. et al. Massively multiplex single-cell Hi-C. Nat Methods. 2017;14:263–6), which was adapted to process allelically segregated reads. Reads were aligned to an N-masked C57BL/6J reference genome (mm10) where every SNP locus (129 or cast for the F123, F121, and ES_Tsix-stop cell lines; B6 or spret for the Patski cell line) was substituted with an N to reduce mapping bias. The pipeline produced lists of contact pairs for all cells with at least 1,000 uniquely mapped contact pairs, a cis:trans ratio ≥1 and ≥95% of reads mapping to either the mouse or human genome (see manuscript Additional file 1: Table S1).
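The per-cell inclusion criteria above can be sketched as a simple filter. This is an illustrative sketch only, not the pipeline's code: the CellStats record and its field names are hypothetical, and the 95% criterion is interpreted here as the fraction of a cell's reads that map to a single species.

```python
from dataclasses import dataclass

@dataclass
class CellStats:
    """Hypothetical per-cell summary; field names are assumptions."""
    cell_id: str
    unique_pairs: int        # uniquely mapped contact pairs
    cis_pairs: int           # intrachromosomal pairs
    trans_pairs: int         # interchromosomal pairs
    species_fraction: float  # fraction of reads mapping to one species

def passes_qc(cell, min_pairs=1000, min_cis_trans=1.0, min_species_fraction=0.95):
    """Return True if the cell meets all three thresholds quoted in the text."""
    if cell.unique_pairs < min_pairs:
        return False
    # Compare cis >= ratio * trans to avoid dividing by zero when trans == 0.
    if cell.cis_pairs < min_cis_trans * cell.trans_pairs:
        return False
    return cell.species_fraction >= min_species_fraction
```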

Contact matrices for the mouse genome were generated from the valid pairs by summing counts within bins of a specified size (e.g. 500kb). To obtain allelic information, each valid contact pair was segregated to its parental allele of origin if either end of the read pair contained at least one SNP particular to one of the parental strains. SNPs were based on Sanger data for mouse strain 129 and mouse species M. spretus and M. castaneus, compared to the mm10 B6 reference assembly. Read pairs containing no SNPs or containing SNPs belonging to both parental strains could not be unambiguously assigned to a single allele and were discarded.
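The allele assignment rule and the binning step described above might look like the following. This is a minimal sketch under assumed inputs: each read end is represented by the set of parental strain labels of the SNPs it overlaps, and each contact is a (chrom1, pos1, chrom2, pos2) tuple. The function names are hypothetical.

```python
from collections import Counter

def assign_allele(snps_end1, snps_end2):
    """Return the parental strain for a read pair, or None if ambiguous.

    snps_end1 / snps_end2: strain labels (e.g. "129", "cast") of the SNPs
    overlapped by each end of the pair; empty if the end covers no SNP.
    """
    strains = set(snps_end1) | set(snps_end2)
    if len(strains) == 1:
        return next(iter(strains))
    # No SNPs at all, or SNPs from both parents: the pair is discarded.
    return None

def bin_contacts(pairs, bin_size=500_000):
    """Sum contact counts within fixed-size bins (500 kb by default)."""
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        counts[(chrom1, pos1 // bin_size, chrom2, pos2 // bin_size)] += 1
    return counts
```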


Contact decay profiles (CDPs)
-----------------------------
Contact decay profiles (CDPs) for each cell were generated by summing the intrachromosomal contact counts with respect to contact distance within exponentially increasing contact distance ranges, as described in the manuscript (Additional file 1: Fig. S2A). Specifically, bin boundaries corresponded to the series of distances 2^x to 2^(x+1) for x within the range [10; 27.750] incrementing by a step size of 0.125. For each cell, counts within the 143 bins were scale normalized to account for coverage differences between cells.
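The bin series can be reproduced numerically: x running from 10 to 27.75 in steps of 0.125 gives (27.75 - 10) / 0.125 + 1 = 143 values, matching the 143 bins mentioned above. The sketch below follows the text literally, so each window spans a doubling of distance and neighbouring windows overlap. The width parameter, the function names, and the use of division by the total as the "scale normalization" are assumptions.

```python
import numpy as np

def cdp_bins(x_min=10.0, x_max=27.75, step=0.125, width=1.0):
    """Lower and upper distance bounds (bp) for each log2-spaced window."""
    n_bins = int(round((x_max - x_min) / step)) + 1  # 143 with the defaults
    x = x_min + step * np.arange(n_bins)
    return 2.0 ** x, 2.0 ** (x + width)

def contact_decay_profile(distances, lower, upper):
    """Count contacts per window, then scale so the profile sums to 1."""
    d = np.asarray(distances, dtype=float)
    counts = np.array(
        [np.count_nonzero((d >= lo) & (d < hi)) for lo, hi in zip(lower, upper)],
        dtype=float,
    )
    total = counts.sum()
    return counts / total if total > 0 else counts
```

Setting width=0.125 instead would give non-overlapping windows, if that is the intended reading of the bin definition.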

Matrices of non-allelic CDPs were generated for each cell type and time point, with each column being a cell and each row a distance range. These matrices are contained within the folder "non-allelic_contact_decay_profiles", which holds both genome-wide CDPs (e.g. "EBdiff_F121_d0.logbinContactDecay.mm10.txt.gz") and autosome-only CDPs (e.g. "EBdiff_F121_d0.autosomes.logbinContactDecay.mm10.txt.gz").

Matrices of allelic CDPs were generated for each chromosome, as well as genome-wide, for each cell type and time point, with each column being a cell and each row a distance range. These matrices are contained within the folder "allelic_contact_decay_profiles" with subfolders for each chromosome (e.g. "allelic_contact_decay_profiles/allelicContactDecayData.chr1/EBdiff_F121_d0.alt.chr1.logbinContactDecay.mm10.txt.gz") and genome-wide allelic CDPs ("allelic_contact_decay_profiles/allelicContactDecayData.genome/EBdiff_F121_d0.alt.logbinContactDecay.mm10.txt.gz").


The allelic CDPs for each chromosome were concatenated together for allelic trajectory analysis and topic modeling. Only cells with a total contact count of at least 50 along both chr1 and chrX were included. The resulting CDP count matrices are saved to the folder "allelic_catenated_contact_decay_profiles" with one file per chromosome (e.g. "allelicContactDecayRates.catenatedAllelicCDPs.F121.chr1.tsv.gz").


Cell cycle analysis
-------------------
For cell cycle analysis, the cells were grouped into four clusters by k-means clustering using the Spearman correlation distance between the non-allelic CDPs aggregated across all autosomes within the 50kb to 8Mb range for each cell. The four clusters correspond to cells in different stages of the cell cycle progressing from mitosis into different stages of interphase. The cell cycle cluster groups of each cell are stored in the folder "autosomal_non-allelic_cell_cycle_cluster_ids" with one file for each cell type and time point (e.g. "20191105_sciHiC_allSamples_Nmasked.EBdiff_F121_d0.countThresh1000.autosomalCDPs.50kb-8Mb.prop.kclust.disteuclidean.kCenters4.tsv"). The files simply contain cell IDs and cluster IDs, e.g.
"mES_EBdiff_D0_rep1-AACGGTCG.AATCAGAG": 3
"mES_EBdiff_D0_rep1-AACGGTCG.ACTCTACG": 2
"mES_EBdiff_D0_rep1-AACGGTCG.ACTGCTAC": 3
...
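A file in the layout excerpted above could be read into a dictionary as follows. This is a sketch that assumes every record looks exactly like the three example lines (a quoted cell ID, a colon, an integer cluster ID); the actual files may use a different delimiter, and parse_cluster_ids is a hypothetical helper name.

```python
import re

# Matches lines such as: "mES_EBdiff_D0_rep1-AACGGTCG.AATCAGAG": 3
LINE_RE = re.compile(r'^\s*"([^"]+)"\s*:\s*(\d+)\s*$')

def parse_cluster_ids(lines):
    """Map cell ID -> cluster ID, skipping lines that do not match."""
    clusters = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            clusters[match.group(1)] = int(match.group(2))
    return clusters
```

Any iterable of strings works as input, including an open text file handle.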




Long-range to mid-range difference (LMD)
----------------------------------------

To determine which chrX homolog had acquired a 3D structure characteristic of the Xi in cells such as F121 cells, where XCI is only partially skewed during differentiation, the difference between the total long-range (6.5Mb – 87Mb) and mid-range (85kb – 1.1Mb) contact counts was assessed for each homolog of each cell, referred to hereafter as the long-range to mid-range difference (LMD). The difference between the resulting allelic LMDs of the two homologs was calculated for each chromosome of every cell with sufficient coverage. A large LMD difference between the rebinned CDPs of chrX homologs A and B, say, indicates that homolog A has assumed the Xi 3D structure. A threshold value (𝛿Xi) was chosen based on a 10% FPR using the distribution of LMD differences between the rebinned allelic CDPs of the chr1 homologs, which would be expected to show little difference in their LMDs. Cells with an absolute LMD difference between the rebinned allelic CDPs of the chrX homologs greater than 𝛿Xi were deemed to have a chrX homolog that had assumed the Xi 3D structure, and the homolog with the higher LMD was classified as the Xi. The Xi status of each cell of each cell type and time point is saved in the "LMD_XCI_cells_countthresh50" folder. For example, "F121_d11.altXCIcells.tsv" contains the cell IDs of d11 F121 cells that have an inactive castaneus chromosome, "F121_d11.refXCIcells.tsv" contains the cell IDs of d11 F121 cells that have an inactive 129 chromosome, and "F121_d11.nonXCIcells.tsv" contains the cell IDs of d11 F121 cells where XCI status could not be confidently assigned to either X chromosome.
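The LMD classification can be illustrated with a short sketch. It is not the authors' implementation: it works on raw contact distances rather than rebinned CDPs, treats the range endpoints as inclusive, and reads the 10% FPR as the 90th percentile of absolute chr1 LMD differences. All function names are hypothetical. The "alt" and "ref" labels follow the output file names, where alt is the castaneus homolog and ref is the 129 homolog.

```python
import numpy as np

def lmd(contact_distances, long_range=(6.5e6, 87e6), mid_range=(85e3, 1.1e6)):
    """Long-range minus mid-range contact count for one homolog of one cell."""
    long_count = sum(long_range[0] <= d <= long_range[1] for d in contact_distances)
    mid_count = sum(mid_range[0] <= d <= mid_range[1] for d in contact_distances)
    return long_count - mid_count

def delta_xi_from_chr1(chr1_lmd_diffs, fpr=0.10):
    """Threshold such that a fraction `fpr` of chr1 differences exceed it (assumed reading)."""
    return float(np.quantile(np.abs(chr1_lmd_diffs), 1.0 - fpr))

def classify_xi(lmd_ref, lmd_alt, delta_xi):
    """Label a cell by which chrX homolog, if any, looks like the Xi."""
    diff = lmd_alt - lmd_ref
    if abs(diff) <= delta_xi:
        return "nonXCI"
    # The homolog with the higher LMD is called the Xi.
    return "altXCI" if diff > 0 else "refXCI"
```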

***************
file name: 4DNFI8WQR5EV.txt
genome_assembly: GRCm38
description: Bar codes (B1) used for selection of specific reads in the following Disteche Lab sci-HiC experiments with experiment selection tag in parens: 4DNEXK2CP3J1 (F121_d0), 4DNEXU9FJ24U (F123_d0)
file_type: barcodes
file_format: txt
 
Submission date Oct 10, 2021
Last update date Oct 22, 2021
Contact name 4DN DCIC
E-mail(s) support@4dnucleome.org
Organization name 4D Nucleome - Data Coordination and Integration Center
Street address 10 Shattuck St
City Boston
State/province MA
ZIP/Postal code 02115
Country USA
 
Platform ID GPL17021
Series (2)
GSE184554 Single-cell landscape of nuclear configuration and gene expression during stem cell differentiation and X inactivation
GSE185608 4DNESB7XYI9V - sci-Hi-C on mESCs differentiated to embryoid body
Relations
SRA SRX12565941
BioSample SAMN22486724

Supplementary file Size Download File type/resource
GSM5620575_4DNFI8WQR5EV.txt.gz 480 b (ftp)(http) TXT
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record
