Sample GSM883635
Status Public on Aug 29, 2012
Title K562_human
Sample type SRA
Source name Human K562 cell line
Organism Homo sapiens
Characteristics cell line: K562
Treatment protocol None.
Growth protocol Cells were grown in RPMI medium with 10% FBS in a 37°C incubator with 5% CO2. See ATCC website for details.
Extracted molecule total RNA
Extraction protocol Total RNA was extracted with Trizol (Invitrogen) according to the manufacturer's instructions and treated with DNaseI (Roche). We checked on a Bioanalyzer (Agilent) that the RNA was of very high quality. 5’-monophosphate species -- mainly ribosomal RNAs -- were depleted by TEX digest (Epicentre). Our 5’-complete cDNA synthesis and selection strategy relies on the combination of two orthogonal enrichment methods: reverse-transcriptase template-switching, and cap-trapping. The template-switching approach is based on the ability of reverse-transcriptase to add linker sequences to the ends of 5’-complete cDNAs -- preferentially if they are made from capped transcripts. Cap-trapping relies on the biotinylation of capped RNA molecules and specific pulldown of their associated 5’-complete cDNAs. The library was run on a DNA HS Bioanalyzer chip for quality control, quantified by quantitative PCR, and sequenced on one lane on an Illumina GAIIx (2x76bp). Please see Supplementary Material of the original publication for a detailed protocol.
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina Genome Analyzer IIx
Description RNA (5'-complete cDNAs selected by TEX digest, template-switching and cap-trapping).
Data processing Sequencing reads alignment: The sequences corresponding to the library identification barcode (first 6 bases of read 1) and the reverse-transcription primer (first 15 bases of read 2) were trimmed prior to mapping. Trimmed reads were mapped with STAR. All uniquely mapping reads were kept. As a rescue strategy for multiply mapping reads, if all alignments for those reads started within an annotated transposon and overlapped the same gene annotation, the alignment starting in the closest transposon insertion was selected. All non-rescued multi-mappers were discarded.
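The trimming step described above can be sketched as follows. This is a minimal illustration, not the submitters' code: the function name `trim_pair` and its argument layout are hypothetical, and only the two lengths (6 bases of read 1, 15 bases of read 2) come from this record. The removed sequences are returned rather than discarded, since the reverse-transcription primer sequence is reused later as a pseudo-random barcode.

```python
# Lengths taken from the record; everything else here is an illustrative assumption.
BARCODE_LEN = 6     # library identification barcode: first 6 bases of read 1
RT_PRIMER_LEN = 15  # reverse-transcription primer: first 15 bases of read 2


def trim_pair(read1_seq, read1_qual, read2_seq, read2_qual):
    """Trim one read pair before mapping.

    Returns (barcode, rt_primer, (seq1, qual1), (seq2, qual2)), where the
    sequence/quality tuples hold the trimmed reads that would be passed to
    the aligner.
    """
    barcode = read1_seq[:BARCODE_LEN]
    rt_primer = read2_seq[:RT_PRIMER_LEN]
    trimmed1 = (read1_seq[BARCODE_LEN:], read1_qual[BARCODE_LEN:])
    trimmed2 = (read2_seq[RT_PRIMER_LEN:], read2_qual[RT_PRIMER_LEN:])
    return barcode, rt_primer, trimmed1, trimmed2
```

With 2x76bp reads, this leaves 70 bases of read 1 and 61 bases of read 2 for mapping.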

Data analysis pipeline: PCR duplicates, defined as reads sharing the same alignment coordinates (start, end and splice sites), were removed from the individual datasets. To avoid over-collapsing, we took advantage of the fact that the long random sequence (15-mer) of our reverse-transcription primer often primes with mismatches. We used this sequence as a pseudo-random barcode, allowing us to distinguish between true duplicates (same barcode) and independent identical inserts. All collapsed datasets were then combined prior to peak calling. The density of cDNA 5’ ends across the genome was determined from this combined dataset, as well as the density of coverage by second (i.e., downstream) sequencing reads. Peaks were called by a sliding window algorithm that assesses the significance of local signal enrichment given a null distribution. Downstream read coverage in the same window was used to correct for local transcript abundance, by subtracting from the raw signal a pseudocount proportional to this coverage. After FDR correction, significant windows in close proximity to each other were merged into peaks, and those were trimmed at the edges down to the first base with signal. These peaks were connected to annotated genes based on cDNA structure information. For each peak, if we could find at least 2 inserts having their 5' end in the peak and overlapping an annotated exon of a gene, the peak was functionally linked to that gene. If a peak could potentially be linked to several genes, ties were broken by removing all links that were 5-fold weaker than the strongest one. For quantification, the signal for each peak and each timepoint was derived from the uncollapsed datasets, and normalized to dataset size (defined as the total number of reads attributed to any genic TSS). We built partial transcript models by running Cufflinks separately on the set of reads coming from each peak for each given dataset, and collapsing all transcripts for each peak using Cuffmerge.
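The duplicate-collapsing rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the submitters' implementation: the function name and the dictionary fields (`chrom`, `strand`, `start`, `end`, `splice_sites`, `rt_primer`) are hypothetical. Two inserts are treated as PCR duplicates only if they share the same alignment coordinates and the same 15-base reverse-transcription primer sequence; identical coordinates with different primer sequences are kept as independent inserts.

```python
def collapse_pcr_duplicates(alignments):
    """Keep the first occurrence of each (coordinates, barcode) combination.

    Each alignment is a dict with hypothetical keys:
      chrom, strand, start, end  -- alignment coordinates of the insert
      splice_sites               -- iterable of (donor, acceptor) positions
      rt_primer                  -- 15-mer trimmed from read 2, used as a
                                    pseudo-random barcode
    """
    seen = set()
    kept = []
    for aln in alignments:
        key = (
            aln["chrom"],
            aln["strand"],
            aln["start"],
            aln["end"],
            tuple(aln["splice_sites"]),
            aln["rt_primer"],
        )
        if key not in seen:  # first time this insert/barcode pair is seen
            seen.add(key)
            kept.append(aln)
    return kept
```

Including the primer sequence in the key is what prevents over-collapsing: without it, highly expressed start sites that yield many independent but identical inserts would be reduced to a single read.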
For a more detailed description of the analysis pipeline, please refer to the original publication.
bigWig coverage by cDNA 5' ends (+ strand).
bigWig coverage by cDNA 5' ends (- strand).
K562_All_RAMPAGE_peaks.bed: BED file describing all RAMPAGE peaks.
K562_GeneTSS_RAMPAGE_peaks.bed: BED file describing all peaks attributed to annotated genes.

Genome Build: hg19 hg19
K562_All_RAMPAGE_peaks.bed: hg19
K562_GeneTSS_RAMPAGE_peaks.bed: hg19
Submission date Mar 01, 2012
Last update date May 15, 2019
Contact name Philippe Batut
Phone 516-422-4122
Organization name CSHL
Lab Gingeras
Street address 500 Sunnyside Blvd.
City Woodbury
State/province NY
ZIP/Postal code 11797
Country USA
Platform ID GPL10999
Series (2)
GSE36200 RAMPAGE dataset for the human K562 cell line
GSE36213 Profiling of transcription start site expression in Drosophila and the human K562 cell line using RAMPAGE
SRA SRX124674
BioSample SAMN00794423

Supplementary file Size Download File type/resource
GSM883635_K562_All_RAMPAGE_peaks.bed.gz 328.4 Kb (ftp)(http) BED
GSM883635_K562_GeneTSS_RAMPAGE_peaks.bed.gz 251.3 Kb (ftp)(http) BED
1.7 Mb (ftp)(http) BW
1.7 Mb (ftp)(http) BW
Raw data are available in SRA
Processed data provided as supplementary file
