Series GSE23316
Status Public on Jul 30, 2010
Title ENCODE Caltech RNA-seq
Project ENCODE
Organism Homo sapiens
Experiment type Expression profiling by high throughput sequencing
Summary These data were produced by the Wold lab at Caltech as part of the ENCODE Project. RNA-Seq is a method for mapping and quantifying the transcriptome of any organism that has a genomic DNA sequence assembly. RNA-Seq is performed by reverse-transcribing an RNA sample into cDNA, followed by high-throughput DNA sequencing. The resulting sequence reads are then informatically mapped onto the genome sequence.
For data usage terms and conditions, please refer to and
Overall design The transcriptome measurements shown on these tracks were performed on polyA-selected RNA from total cellular RNA. Data have been produced in two formats: single reads, each of which comes from one end of a randomly primed cDNA molecule; and paired-end reads, which are obtained as pairs from both ends of cDNAs resulting from random priming. The resulting sequence reads are then informatically mapped onto the genome sequence (Alignments). Those that don't map to the genome are mapped to known RNA splice junctions (Splice Sites). These mapped reads are then counted to determine their frequency of occurrence at known gene models. Sequence reads that cluster at genome locations lacking an existing transcript model are also identified informatically and quantified.
RNA-Seq is especially suited for giving information about RNA splicing patterns and for determining unequivocally the presence or absence of lower-abundance classes of RNA. As performed here, internal RNA standards are used to assist in quantification and to provide internal process controls. This RNA-Seq protocol does not specify the coding strand. As a result, there will be ambiguity at loci where both strands are transcribed. The "randomly primed" reverse transcription is apparently not fully random. This is inferred from a sequence bias in the first residues of the read population, and the bias likely contributes to the observed unevenness in sequence coverage across transcripts.
These tracks show 1x32 n.t. or 2x75 n.t. sequence reads of cDNA obtained from biological replicate samples (different culture plates) of the ENCODE cell lines. The 32 n.t. sequences were aligned to the human genome (hg18) and UCSC known-gene splice junctions using different sequence alignment programs. The 2x75 n.t. reads were mapped serially, first with the Bowtie program (Langmead et al., 2009) against the genome and UCSC known-gene splice junctions (Splice Sites). Bowtie-unmapped reads were then mapped using BLAT to find evidence of novel splicing, requiring at least 10 bp on the short side of the splice.
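Two steps of the design above lend themselves to a small illustration: the 10 bp short-side requirement for BLAT splice evidence, and the per-gene-model read counts that underlie the RPKM files. The following is a minimal Python sketch, not the pipeline actually used for this series. The function names are hypothetical, the block sizes are assumed to be the aligned block lengths of one spliced alignment (as in a PSL or BED12 record), and the RPKM formula is the commonly used definition, which this record does not itself spell out.

```python
from typing import Sequence


def short_side_ok(block_sizes: Sequence[int], min_anchor: int = 10) -> bool:
    """Check the short-side anchor rule for one spliced alignment.

    block_sizes holds the lengths (in bp) of the aligned blocks on either
    side of each splice. The alignment passes only if it is actually spliced
    (two or more blocks) and its shortest block is at least min_anchor bp.
    """
    if len(block_sizes) < 2:
        return False  # a single block carries no splice evidence
    return min(block_sizes) >= min_anchor


def rpkm(read_count: int, model_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene model per million mapped reads.

    Assumption: the standard RPKM definition; the record does not state how
    the supplementary RPKM values were computed.
    """
    if model_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (model_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # A 75 n.t. read split 66 + 9 fails the rule; 60 + 15 passes.
    print(short_side_ok([66, 9]), short_side_ok([60, 15]))
    # 100 reads on a 2 kb model out of 10 million mapped reads.
    print(rpkm(100, 2000, 10_000_000))
```

The anchor check treats every block as a potential short side, so a read spanning two junctions must have at least 10 bp in each of its three blocks.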
Web link
Contributor(s) Wold B, Myers R, Mortazavi A, Williams B, Trout D, King B, McCue K, Schaeffer L, Neff N, Pauli F, Zhang F, Reddy T, Rauch R, Schroth G, Luo S, Vermaas E
Citation missing
BioProject PRJNA30709
Submission date Jul 29, 2010
Last update date May 15, 2019
Contact name ENCODE DCC
Organization name ENCODE DCC
Street address 300 Pasteur Dr
City Stanford
State/province CA
ZIP/Postal code 94305-5120
Country USA
Platforms (1)
GPL9115 Illumina Genome Analyzer II (Homo sapiens)
Samples (36)
GSM572172 Myers_H1-hESC_cell_2x75_400_1
GSM572173 Myers_H1-hESC_cell_1x75D_ilNA_1
GSM591652 Myers_H1-hESC_cell_2x75_200_2
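The sample titles listed above follow an underscore-delimited pattern, which can be split programmatically when organizing the samples. The sketch below is illustrative only: the field names (`lab`, `cell_line`, `fraction`, `read_layout`, `library_tag`, `replicate`) are guesses based on the three titles shown here and are not documented in this record, so other samples in the series may not fit the same six-field shape.

```python
from typing import NamedTuple


class SampleTitle(NamedTuple):
    # Field names are assumptions inferred from the visible titles.
    lab: str
    cell_line: str
    fraction: str
    read_layout: str
    library_tag: str
    replicate: str


def parse_sample_title(title: str) -> SampleTitle:
    """Split a title such as 'Myers_H1-hESC_cell_2x75_400_1' into six fields."""
    parts = title.split("_")
    if len(parts) != 6:
        raise ValueError(f"expected 6 underscore-separated fields, got {len(parts)}: {title!r}")
    return SampleTitle(*parts)


if __name__ == "__main__":
    for title in (
        "Myers_H1-hESC_cell_2x75_400_1",
        "Myers_H1-hESC_cell_1x75D_ilNA_1",
        "Myers_H1-hESC_cell_2x75_200_2",
    ):
        print(parse_sample_title(title))
```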
SRA SRP003497

Download family Format
SOFT formatted family file(s) SOFT
MINiML formatted family file(s) MINiML
Series Matrix File(s) TXT

Supplementary file Size Download File type/resource
GSE23316_RAW.tar 42.5 Gb (http)(custom) TAR (of BED12, BEDGRAPH, BOWTIE, FASTA, PSL, RPKM)
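Because the archive bundles several file types, it can help to tally its members by type before extracting anything. The sketch below uses Python's standard `tarfile` module. It assumes the archive has already been downloaded locally and that member file names end in a type suffix (optionally followed by `.gz`); neither the member names nor their compression are stated in this record, and `count_member_suffixes` is a hypothetical helper.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath
from typing import BinaryIO


def count_member_suffixes(fileobj: BinaryIO) -> Counter:
    """Tally regular files in a tar archive by their final type suffix.

    A trailing '.gz' is ignored, so 'x.bed.gz' and 'y.bed' both count as '.bed'.
    """
    counts: Counter = Counter()
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other entries
            suffixes = [s.lower() for s in PurePosixPath(member.name).suffixes]
            if suffixes and suffixes[-1] == ".gz":
                suffixes = suffixes[:-1]
            counts[suffixes[-1] if suffixes else "(none)"] += 1
    return counts


# Example usage, assuming the archive sits in the working directory:
#     with open("GSE23316_RAW.tar", "rb") as handle:
#         for suffix, n in count_member_suffixes(handle).most_common():
#             print(suffix, n)
```

The function only reads member headers and names; it does not extract or decompress any of the data files.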
Processed data provided as supplementary file
Raw data are available in SRA
