|
|
GEO help: Mouse over screen elements for information. |
|
Status |
Public on Jul 19, 2016 |
Title |
DarrowHuntley-2015-HIC014 |
Sample type |
SRA |
|
|
Source name |
Mouse Embryonic Kidney Fibroblasts
|
Organism |
Mus musculus |
Characteristics |
cell line: Patski cell type: Mouse Embryonic Kidney Fibroblasts protocol: in situ Hi-C
|
Growth protocol |
Cell lines were cultured according to manufacturer's instructions
|
Extracted molecule |
genomic DNA |
Extraction protocol |
Cells were crosslinked and then lysed with nuclei permeabilized but still intact. DNA was then restricted with MboI and the overhangs filled in incorporating a biotinylated base. Free ends were then ligated together in situ. Crosslinks were reversed, the DNA was sheared and then biotinylated ligation junctions were recovered with streptavidin beads. Standard Illumina library construction protocol was performed, and libraries were sequenced on the HiSeq X Ten/NextSeq/HiSeq2500 following the manufacturer's protocols.
|
|
|
Library strategy |
OTHER |
Library source |
genomic |
Library selection |
other |
Instrument model |
Illumina HiSeq 2500 |
|
|
Description |
Processed data files were not available at the time of accessioning.
|
Data processing |
The paired end reads were aligned separately using BWA against the b37 (human), mm10 (mouse), or rheMac2 (rhesus macaque). PCR duplicates, low mapping quality and unligated reads were removed using an in-house Hi-C analysis pipeline (see Rao, Huntley, et al, Cell 2014l) Contact matrices were constructed at various resolutions and normalized using an in-house Hi-C analysis pipeline (see Rao, Huntley, et al, Cell 2014) genome build: b37 (human), mm10 (mouse), rheMac2 (rhesus macaque) processed data files format and content: Contact matrices: a text file with the raw observed contact matrix in sparse matrix notation at a given resolution. Only the upper triangle of the matrix is provided (i.e. i<=j), the matrix is symmetric, so M_i,j = M_j,i. At this stage of processing, read pairs where one or both ends do not align to the reference genome have already been removed, as well as chimeric ambiguous reads (see Section II.a.2 of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014 for a definition of chimeric ambiguous reads). In addition, duplicate reads (reads where both ends align to within +/- 4bp of each other) have been removed as well (see Section II.a.3 of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014 for a full description of duplicate removal). Full details of the Hi-C processing pipeline used in this study are provided in Section II.a. of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014. processed data files format and content: Normalization files: normalization vectors that can be used to transform the raw contact matrices M into normalized matrices M*. Each file is ordered such that the first line of the normalization vector file is the norm factor for the first row/column of the corresponding raw contact matrix, the second line is the factor for the second row/column of the contact matrix, and so on. To normalize, an entry M_i,j in a *RAWobserved file, divide the entry by the corresponding norm factors for i and j. (See section II.b of the Extended Experimental Procedures of Rao, Huntley, et al., Cell, 2014 for more information about the different types of normalizations.) processed data files format and content: HiCCUPS_looplist.txt files contain loop calls generated via HiCCUPS; first three fields represent the locus participating in the loop closer to the p-end of the chromosome; fields 4-6 represent the locus participating in the loop closer to the q-end of the chromosome; field 7 represents the color used to display the feature in Juicebox (a Hi-C data visualization software, see www.aidenlab.org/juicebox); field 8 represents the observed number of counts at the loop; fields 9-12 represent the expected number of counts at the loop using four different expected models; fields 13-16 are the q-values over each of the expected values; field 17 is the number of enriched pixels that was clustered into a particular loop; field 18-19 are the centroid of the loop; field 20 is the radius of the loop processed data files format and content: Arrowhead_domainlist.txt files contain domain calls generated via Arrowhead; first 6 fields represent the boundaries of the domain; field 7 represents the color used to display the feature in Juicebox (a Hi-C data visualization software, see www.aidenlab.org/juicebox); field 8 is the corner score for the domain (see Rao, Huntley, et al); fields 9-12 are the component scores used in the Arrowhead algorithm (see Rao, Huntley, et al) processed data files format and content: merged_nodups.txt files contain filtered, "normal" contacts. Each line represents a single Hi-C read pair that has passed the alignment and duplicate removal stages. The format of each line of the file is: read_name, strand1, chromosome1, position1, fragment-index1, strand2, chromosome2 ,position2, fragment-index2, mapq1, mapq2 processed data files format and content: collisions.txt.gz files contain the contacts that have 3 or more loci.
|
|
|
Submission date |
Aug 11, 2015 |
Last update date |
May 15, 2019 |
Contact name |
Miriam Huntley |
E-mail(s) |
mhuntley@fas.harvard.edu
|
Organization name |
Harvard University
|
Street address |
29 Oxford Street
|
City |
Cambridge |
State/province |
MA |
ZIP/Postal code |
02138 |
Country |
USA |
|
|
Platform ID |
GPL17021 |
Series (1) |
GSE71831 |
Deletion of DXZ4 on the human inactive X chromosome eliminates superdomains and impairs gene silencing |
|
Relations |
BioSample |
SAMN03979957 |
SRA |
SRX1165023 |
Supplementary file |
Size |
Download |
File type/resource |
GSM1847533_DarrowHuntley-2015-HIC014_merged_nodups.txt.gz |
66.3 Gb |
(ftp)(http) |
TXT |
SRA Run Selector |
Raw data are available in SRA |
Processed data provided as supplementary file |
Processed data are available on Series record |
|
|
|
|
|