GEO Accession viewer

Sample GSM5620575

Status

Public on Oct 21, 2021

Title

4DNEXK2CP3J1 (F121_d0), 4DNEXU9FJ24U (F123_d0)

Sample type

SRA

Source name

Multiple biosources: F121-6-CASTx129, F123-CASTx129

Organism

Mus musculus

Characteristics

cell_line: F121-6-CASTx129, F123-CASTx129
mouse_strain: 129/Sv X Cast
Sex: pooled male and female
description: Day 0 F121-6 or F123 cells from an embryoid body differentiation time course experiment C
tissue_source: stem cell
biosample_type: stem cells
modifications_summary: None
treatments_summary: None
url: https://data.4dnucleome.org/biosamples/4DNBSHAXZE5I/ https://data.4dnucleome.org/biosamples/4DNBSBOPWYV1/

Extracted molecule

genomic DNA

Extraction protocol

url: https://www.ncbi.nlm.nih.gov/pubmed/31536770
description: This is the updated protocol for single-cell combinatorial indexed Hi-C (sci-Hi-C) from Ramani et al. 2020.

Library strategy

Hi-C

Library source

genomic

Library selection

other

Instrument model

Illumina HiSeq 2500

Description

lab: Christine Disteche, UW
award: 1U54DK107979-01
4DN accession: 4DNEXK2CP3J1, 4DNEXU9FJ24U
description: 4DNEXK2CP3J1 - sci-Hi-C on a differentiation time course of F121-6 cells - this experiment is undifferentiated day 0 cells and to obtain reads specific for these cells the label F121_d0 from the barcode file(s) 4DNFI8WQR5EV should be used in parsing the associated FASTQs
4DNEXU9FJ24U - sci-Hi-C on a differentiation time course of F123 cells - this experiment is undifferentiated day 0 cells and to obtain reads specific for these cells the label F123_d0 from the barcode file(s) 4DNFI8WQR5EV should be used in parsing the associated FASTQs
submitted_by: Giancarlo Bonora
ligation_time: 240
digestion_time: 960
tagging_method: Biotin-dT
experiment_type: sci-Hi-C
ligation_volume: 0.3
digestion_enzyme: DpnII
contributing_labs: William Noble, UW, Zhijun Duan, UW, Jay Ashok Shendure, UW
crosslinking_time: 10
biosample_quantity: 10000000 cells
crosslinking_method: 3.5% Formaldehyde
fragment_size_range: 350-800
ligation_temperature: 22
average_fragment_size: 296
digestion_temperature: 37
crosslinking_temperature: 22
url: https://data.4dnucleome.org/experiments-hi-c/4DNEXK2CP3J1/ https://data.4dnucleome.org/experiments-hi-c/4DNEXU9FJ24U/
This record includes combined data from multiple BioSamples: SAMN21435439, SAMN21435432

Data processing

See the file "TREE.sci-Hi-C.txt" for an overview of the folders and files described herein.

Sci-Hi-C libraries were processed using a publicly available pipeline (Ramani V. et al. Massively multiplex single-cell Hi-C. Nat Methods. 2017;14:263–6), which was adapted to process allelically segregated reads. Reads were aligned to an N-masked C57BL/6J reference genome (mm10) where every SNP locus (129 or cast for the F123, F121, and ES_Tsix-stop cell lines; B6 or spret for the Patski cell line) was substituted with an N to reduce mapping bias. The pipeline resulted in lists of contact pairs for all cells with at least 1,000 uniquely mapped contact pairs, a cis:trans ratio ≥1 and ≥95% of reads mapping to either the mouse or human genome (see manuscript Additional file 1: Table S1).

Contact matrices for the mouse genome were generated from the valid pairs by summing counts within bins of a specified size (e.g. 500kb). To obtain allelic information, each valid contact pair was segregated to its parental allele of origin if either end of the read pair contained at least one SNP particular to one of the parental strains. SNPs were based on Sanger data for mouse strain 129 and mouse species M. spretus and M. castaneus, compared to the mm10 B6 reference assembly. Read pairs containing no SNPs or containing SNPs belonging to both parental strains could not be unambiguously assigned to a single allele and were discarded.
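
The binning and allele-assignment rules above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the tuple layout of a contact pair, the "ref"/"alt" strain labels, and the function names bin_contacts and assign_allele are all assumptions made for the example.

```python
from collections import Counter

BIN_SIZE = 500_000  # 500 kb, the example bin size mentioned in the text


def bin_contacts(pairs, bin_size=BIN_SIZE):
    """Count contacts per pair of genomic bins.

    `pairs` is an iterable of (chrom1, pos1, chrom2, pos2) tuples.
    This layout is an assumption made for illustration.
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        a = (chrom1, pos1 // bin_size)
        b = (chrom2, pos2 // bin_size)
        # Sort the two bins so that (a, b) and (b, a) share one key.
        counts[tuple(sorted((a, b)))] += 1
    return counts


def assign_allele(strains_end1, strains_end2):
    """Return the parental allele of a read pair, or None if ambiguous.

    Each argument is the set of parental strains (hypothetical labels
    "ref" and "alt") whose SNPs were observed on that read end.
    """
    strains = set(strains_end1) | set(strains_end2)
    if len(strains) == 1:
        return next(iter(strains))
    # No SNPs at all, or SNPs from both strains: the pair is discarded.
    return None
```

A pair with an "alt" SNP on one end and no SNP on the other is assigned to "alt"; a pair with a "ref" SNP on one end and an "alt" SNP on the other returns None.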

Contact decay profiles (CDPs)
-----------------------------
Contact decay profiles (CDPs) for each cell were generated by summing the intrachromosomal contact counts with respect to contact distance within exponentially increasing contact distance ranges, as described in the manuscript (Additional file 1: Fig. S2A). Specifically, bin boundaries corresponded to the series of distances 2^x to 2^(x+1) for x within the range [10; 27.750], incrementing by a step size of 0.125. For each cell, counts within the 143 bins were scale normalized to account for coverage differences between cells.
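
As a rough sketch of the log-scaled binning described above: stepping the exponent x from 10 to 27.75 in increments of 0.125 gives 143 values, one per bin. The code below is illustrative only. The function names are hypothetical, the `width` parameter is an assumption (1.0 follows the text literally and gives overlapping bins, while setting it equal to the step gives contiguous bins), and scaling each profile to sum to 1 is an assumed form of the scale normalization.

```python
def cdp_bins(x_min=10.0, x_max=27.75, step=0.125, width=1.0):
    """Return one (low, high) distance range per exponent x.

    width=1.0 reads the text literally (2^x to 2^(x+1));
    width=step would give contiguous, non-overlapping bins.
    """
    n = round((x_max - x_min) / step) + 1  # 143 exponents
    return [
        (2 ** (x_min + i * step), 2 ** (x_min + i * step + width))
        for i in range(n)
    ]


def contact_decay_profile(distances, bins):
    """Count contact distances per bin, then scale the counts to sum to 1.

    Scaling to proportions is an assumption about the normalization.
    With overlapping bins a distance can be counted in several bins.
    """
    counts = [sum(1 for d in distances if lo <= d < hi) for lo, hi in bins]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

Calling cdp_bins() with the defaults returns 143 ranges, the first spanning 1,024 bp to 2,048 bp.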

Matrices of non-allelic CDPs were generated for each cell type and time point, with each column being a cell and each row a distance range. These matrices are contained within the folder "non-allelic_contact_decay_profiles", which holds both genome-wide CDPs (e.g. "EBdiff_F121_d0.logbinContactDecay.mm10.txt.gz") and autosome-only CDPs (e.g. "EBdiff_F121_d0.autosomes.logbinContactDecay.mm10.txt.gz").

Matrices of allelic CDPs were generated for each chromosome, as well as genome-wide, for each cell type and time point, with each column being a cell and each row a distance range. These matrices are contained within the folder "allelic_contact_decay_profiles" with subfolders for each chromosome (e.g. "allelic_contact_decay_profiles/allelicContactDecayData.chr1/EBdiff_F121_d0.alt.chr1.logbinContactDecay.mm10.txt.gz") and genome-wide allelic CDPs ("allelic_contact_decay_profiles/allelicContactDecayData.genome/EBdiff_F121_d0.alt.logbinContactDecay.mm10.txt.gz").

The allelic CDPs for each chromosome were concatenated for allelic trajectory analysis and topic modeling. Only cells with a total contact count of at least 50 along both chr1 and chrX were included. The resulting CDP count matrices are saved to the folder "allelic_catenated_contact_decay_profiles" with one file per chromosome (e.g. "allelicContactDecayRates.catenatedAllelicCDPs.F121.chr1.tsv.gz").
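
The coverage filter above might be expressed as in the sketch below. It assumes the threshold applies to the total contact count per chromosome; the input dictionary layout and the function name are hypothetical.

```python
MIN_CONTACTS = 50  # threshold stated in the text


def cells_passing_threshold(contacts_per_cell, chroms=("chr1", "chrX"),
                            min_contacts=MIN_CONTACTS):
    """Return the IDs of cells with enough contacts on every listed chromosome.

    `contacts_per_cell` maps cell ID -> {chromosome: contact count};
    this layout is an assumption made for illustration.
    """
    return [
        cell
        for cell, per_chrom in contacts_per_cell.items()
        if all(per_chrom.get(c, 0) >= min_contacts for c in chroms)
    ]
```

A cell with 60 contacts on chr1 but only 40 on chrX would be excluded, since both chromosomes must meet the threshold.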

Cell cycle analysis
-------------------
For cell cycle analysis, the cells were grouped into four clusters by k-means clustering using the Spearman correlation distance between the non-allelic CDPs aggregated across all autosomes within the 50kb to 8Mb range for each cell. The four clusters correspond to cells in different stages of the cell cycle, progressing from mitosis into different stages of interphase. The cell cycle cluster assignment of each cell is stored in the folder "autosomal_non-allelic_cell_cycle_cluster_ids", with one file for each cell type and time point (e.g. "20191105_sciHiC_allSamples_Nmasked.EBdiff_F121_d0.countThresh1000.autosomalCDPs.50kb-8Mb.prop.kclust.disteuclidean.kCenters4.tsv"). The files simply contain cell IDs and cluster IDs, e.g.
"mES_EBdiff_D0_rep1-AACGGTCG.AATCAGAG": 3
"mES_EBdiff_D0_rep1-AACGGTCG.ACTCTACG": 2
"mES_EBdiff_D0_rep1-AACGGTCG.ACTGCTAC": 3
...
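
A minimal sketch for reading such a file into a dictionary follows. It assumes every line has exactly the quoted-ID, colon, integer layout shown in the excerpt above; the actual files may differ, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines of the form: "cell_id": 3  (layout taken from the excerpt)
LINE_RE = re.compile(r'^"(?P<cell>[^"]+)":\s*(?P<cluster>\d+)$')


def parse_cluster_ids(lines):
    """Map each cell ID to its integer cluster ID, skipping other lines."""
    clusters = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            clusters[match.group("cell")] = int(match.group("cluster"))
    return clusters


example = [
    '"mES_EBdiff_D0_rep1-AACGGTCG.AATCAGAG": 3',
    '"mES_EBdiff_D0_rep1-AACGGTCG.ACTCTACG": 2',
    '"mES_EBdiff_D0_rep1-AACGGTCG.ACTGCTAC": 3',
]
cluster_of = parse_cluster_ids(example)
cells_per_cluster = Counter(cluster_of.values())
```

Here cells_per_cluster tallies how many cells fall into each of the clusters, which is a quick sanity check after loading a file.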

Long-range to mid-range difference (LMD)
----------------------------------------

To determine which chrX homolog had acquired a 3D structure characteristic of the Xi in cells such as F121 cells, where XCI is only partially skewed during differentiation, the difference between the total long-range (6.5Mb – 87Mb) and mid-range (85kb – 1.1Mb) contact counts was assessed for each homolog of each cell, referred to hereafter as the long-range to mid-range difference (LMD). The difference between the resulting allelic LMDs of the two homologs was calculated for each chromosome of every cell with sufficient coverage. A large LMD difference between the rebinned CDPs of chrX homologs A and B, say, indicates that homolog A has assumed the Xi 3D structure. A threshold value (𝛿Xi) was chosen based on a 10% FPR using the distribution of LMD differences between the rebinned allelic CDPs of the chr1 homologs, which would be expected to show little difference in their LMDs. Cells with an absolute LMD difference between the rebinned allelic CDPs of the chrX homologs greater than 𝛿Xi were deemed to have a chrX homolog that had assumed the Xi 3D structure, and the homolog with the higher LMD was classified as the Xi.

The Xi status of each cell of each cell type and time point is saved in the "LMD_XCI_cells_countthresh50" folder. For example, "F121_d11.altXCIcells.tsv" contains the cell IDs of d11 F121 cells that have an inactive castaneus chromosome, "F121_d11.refXCIcells.tsv" contains the cell IDs of d11 F121 cells that have an inactive 129 chromosome, and "F121_d11.nonXCIcells.tsv" contains the cell IDs of d11 F121 cells where XCI status could not be confidently assigned to either X chromosome.
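
The LMD classification can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: it works on raw contact distances instead of rebinned CDPs, it derives the threshold as an empirical quantile of absolute chr1 differences, and the function names and the "refXCI"/"altXCI"/"nonXCI" return labels (borrowed from the file names) are hypothetical.

```python
import math

MID_RANGE = (85_000, 1_100_000)      # 85 kb to 1.1 Mb
LONG_RANGE = (6_500_000, 87_000_000)  # 6.5 Mb to 87 Mb


def lmd(distances):
    """Long-range minus mid-range contact count for one homolog."""
    long_count = sum(LONG_RANGE[0] <= d <= LONG_RANGE[1] for d in distances)
    mid_count = sum(MID_RANGE[0] <= d <= MID_RANGE[1] for d in distances)
    return long_count - mid_count


def threshold_from_null(null_diffs, fpr=0.10):
    """Cutoff that at most a fraction `fpr` of absolute null differences exceed.

    `null_diffs` would be the chr1 LMD differences; using an empirical
    quantile of absolute values is an assumption.
    """
    ranked = sorted(abs(d) for d in null_diffs)
    k = math.ceil((1 - fpr) * len(ranked)) - 1
    return ranked[max(k, 0)]


def classify_xi(lmd_ref, lmd_alt, delta_xi):
    """Label the homolog with the higher LMD as the Xi, if the gap is large."""
    diff = lmd_ref - lmd_alt
    if abs(diff) <= delta_xi:
        return "nonXCI"
    return "refXCI" if diff > 0 else "altXCI"
```

With this labeling, "refXCI" means the reference (129) homolog is called inactive and "altXCI" means the alternative (castaneus) homolog is, mirroring the file names in the folder.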

***************
file name: 4DNFI8WQR5EV.txt
genome_assembly: GRCm38
description: Bar codes (B1) used for selection of specific reads in the following Disteche Lab sci-HiC experiments with experiment selection tag in parens: 4DNEXK2CP3J1 (F121_d0), 4DNEXU9FJ24U (F123_d0)
file_type: barcodes
file_format: txt

Submission date

Oct 10, 2021

Last update date

Oct 22, 2021

Contact name

4DN DCIC

E-mail(s)

support@4dnucleome.org

Organization name

4D Nucleome - Data Coordination and Integration Center

Street address

10 Shattuck St

City

Boston

State/province

ZIP/Postal code

02115

Country

USA

Platform ID

GPL17021

Series (2)

GSE184554	Single-cell landscape of nuclear configuration and gene expression during stem cell differentiation and X inactivation
GSE185608	4DNESB7XYI9V - sci-Hi-C on mESCs differentiated to embryoid body

Relations

SRA

SRX12565941

BioSample

SAMN22486724

Supplementary file	Size	Download	File type/resource
GSM5620575_4DNFI8WQR5EV.txt.gz	480 b	(ftp)(http)	TXT
SRA Run Selector
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record