GEO Accession viewer

Sample GSM2064719

Status

Public on Sep 01, 2016

Title

Animal11_Female_WholeBlood_Sequencing batch 2

Sample type

SRA

Source name

female_WholeBlood

Organism

Rattus norvegicus

Characteristics

strain: Sprague Dawley
animal id: Animal11
age: 12-13 weeks
gender: female
tissue: WholeBlood
rin: 7.6
mir_counts: 5016803

Growth protocol

Naïve Sprague Dawley rats, 12-13 weeks old, were used for organ isolation.

Extracted molecule

total RNA

Extraction protocol

RNA was purified using the miRNeasy protocol from Qiagen (Kit cat# 217004).
Library construction was performed according to Illumina’s TruSeq small RNA sample prep protocol (Catalog # RS-200-0048), and libraries were sequenced on a HiSeq 2000. Briefly, RNA was extracted from tissues using the miRNeasy protocol. Libraries were constructed following the Illumina TruSeq small RNA preparation guide, which entails ligation of adapters to the 3' and 5' ends of the RNA molecules. The libraries were quantitated using RiboGreen and size selected by gel electrophoresis; libraries consistent with the size of miRNAs were gel purified. Sequencing generated 4-5 million 30 bp single-end reads per sample.

Library strategy

miRNA-Seq

Library source

transcriptomic

Library selection

size fractionation

Instrument model

Illumina HiSeq 2000

Description

Sequencing batch 2 from sample Animal11_Female_WholeBlood
US-1459951_BC2BUMACXX_US-1459951_CTCAGA_L005_R1_001

Data processing

FastQ files were processed to remove adaptor sequence and discard reads < 17 bp in length with cutadapt v. 1.4.1 [23] with options -a TGGAATTCTCGGGTGCCAAGG --quality-base 33 -q 20 --match-read-wildcards -m 17. All trimmed reads that contained an ‘N’ were discarded. Identical sequences from the same sample were combined into a single sequence in the form expected by miRDeep2.
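The N-filtering and read-collapsing steps above can be sketched in Python as follows. This is a minimal illustration, not the submitters' code: the function names are hypothetical, and the header layout (a three-letter prefix, a running index, and `_x` followed by the read count) is an assumption about the collapsed-read form that miRDeep2 expects.

```python
from collections import Counter


def collapse_reads(reads, prefix="seq"):
    """Drop reads containing 'N' and merge identical sequences with a count.

    `reads` is an iterable of already-trimmed read sequences (strings).
    Returns a list of (header, sequence) tuples.
    """
    counts = Counter(r.upper() for r in reads if "N" not in r.upper())
    # Most abundant first; ties broken alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Assumed header layout: <prefix>_<index>_x<count>
    return [(f"{prefix}_{i}_x{n}", seq) for i, (seq, n) in enumerate(ordered)]


def to_fasta(records):
    """Render (header, sequence) tuples as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


reads = ["TGAGGTAGTAGGTTGTATAGTT"] * 3 + [
    "TGAGGTAGTAGATTGTATAGTT",
    "TGAGGTNGTAGGTTGTATAGTT",  # contains 'N', so it is discarded
]
records = collapse_reads(reads)
fasta_text = to_fasta(records)
```

Length filtering is left to cutadapt (the -m 17 option), so the sketch does not repeat it.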
quantifier.pl from the miRDeep2 package (v. 2.0.0.5) was used to generate miR alignment files (.mrd files) against known miRs from miRBase 20 with options -P -p <org>_hairpin.fa -m <org>_mature.fa -r trimmed_reads.fa -d. This was done separately for rat, mouse, human, and C. elegans known miRs.
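The per-species runs could be assembled along these lines. This is only a sketch: the helper name is hypothetical, and using the miRBase species prefixes (rno, mmu, hsa, cel) as the `<org>` value is an assumption, since the record does not say what was substituted for the placeholder.

```python
def quantifier_command(org, reads="trimmed_reads.fa"):
    """Build the quantifier.pl argument list for one species.

    `org` fills the <org> placeholder in the file names quoted above.
    """
    return [
        "quantifier.pl",
        "-P",
        "-p", f"{org}_hairpin.fa",
        "-m", f"{org}_mature.fa",
        "-r", reads,
        "-d",
    ]


# Assumed <org> values: miRBase prefixes for rat, mouse, human, C. elegans.
SPECIES = ["rno", "mmu", "hsa", "cel"]
commands = {org: quantifier_command(org) for org in SPECIES}
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where miRDeep2 is installed.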
The .mrd files were then parsed with a custom Perl script as follows. Each isomiR sequence in an alignment was associated with the corresponding mature miR identifier. If it aligned to a miRNA precursor, but not with an expected mature miR sequence, it was labeled <miR>-pre to indicate this. Usually these correspond to reverse strand miRs that have not yet been annotated as alternate mature forms. Sometimes these appear to be microRNA offset RNAs [25] and sometimes they appear to represent incompletely processed sequence. If a given sequence aligned with more than one precursor, it was associated with all potential names as a composite name. That is, a sequence that aligned to both let-7a-5p and let-7f-5p was assigned to the composite mature miR let-7a-5p;let-7f-5p to indicate for subsequent analyses that this identification was ambiguous. All sequences that had not been identified against known rat miRs were then searched for in the analysis against known mouse miRs, then against human, and finally against C. elegans miRs. This process allows for identification of conserved rodent miRs that have not yet been annotated in rat, and of conserved mammalian miRs that have not yet been annotated in rat and mouse. The comparison with C. elegans was to identify spiked-in C. elegans miRs (not used in this data set).
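The composite naming and the rat, mouse, human, C. elegans fallback order can be illustrated with a short Python sketch. The original parser was a custom Perl script that is not part of this record, so the function names, the species keys, and the input layout (a dict of per-species hits, each mapping a read sequence to a list of miR names) are all assumptions made for illustration.

```python
def composite_name(names):
    """Join every candidate miR name for one read into a single label."""
    return ";".join(sorted(set(names)))


def annotate(sequences, hits_by_species,
             order=("rat", "mouse", "human", "c_elegans")):
    """Name each read from the first species in `order` that has a hit.

    Reads with no hit in any species are left out of the result.
    """
    assigned = {}
    for seq in sequences:
        for species in order:
            names = hits_by_species.get(species, {}).get(seq)
            if names:
                assigned[seq] = composite_name(names)
                break  # stop at the first species with a match
    return assigned


# Toy hits keyed by placeholder read sequences.
hits = {
    "rat": {"READ_A": ["let-7f-5p", "let-7a-5p"]},
    "mouse": {"READ_A": ["let-7a-5p"], "READ_B": ["miR-1-3p"]},
}
names = annotate(["READ_A", "READ_B", "READ_C"], hits)
```

Here READ_A keeps its ambiguous rat assignment, READ_B falls through to mouse, and READ_C stays unnamed.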
In parallel to the above identification of known miRs, we used miRDeep2’s novel miR identification process as well. We aligned all the unique trimmed fasta reads to the Rn5 reference rat genome with miRDeep2’s mapper.pl script with default parameters and then used the miRDeep2.pl script to generate novel miR predictions and .mrd files. The .mrd files were parsed as above to assign miR names to each unique read. We then attempted to identify and merge novel predictions that were identical to known miRs as follows. If the same read sequence identified a miR in the novel analysis and in the known analysis and the known miR appeared in miRDeep2’s .mrd file, we assumed that this was a duplicate identification and changed the miR name from that reported by miRDeep2 (chr# followed by a unique ID) to that of the known miR. Otherwise, we assumed that the novel prediction might be different from the known miR identification and added it to the list of potential identifications for the sequence. So, for example, a sequence which was aligned by miRDeep2 to a predicted miR on chrX and also identified as miR-450a is identified as “chrX_48156-5p;miR-450a-5p” because miRDeep2 does not show miR-450a as aligned to this predicted miR, indicating that these are two potentially distinct sources of this sequence. Finally, the counts associated with each unique sequence are summed for each named miR (including complex, multi-named miRs) for each sample and these are reported as the miR level counts.
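The merge rule and the final summation can be sketched as below. This is an illustrative reading of the description, not the submitters' script: the function names and argument layout are hypothetical, and the boolean flag stands in for the check of whether the known miR appears in miRDeep2's .mrd entry for the novel prediction.

```python
from collections import defaultdict


def resolve_name(known_name, novel_name, known_seen_in_novel_mrd):
    """Pick the reported name for a read that may have known and novel hits."""
    if known_name and novel_name:
        if known_seen_in_novel_mrd:
            # Duplicate identification: report the known miR name only.
            return known_name
        # Possibly distinct sources: keep both, novel prediction first.
        return f"{novel_name};{known_name}"
    return known_name or novel_name


def sum_counts(read_counts, read_names):
    """Sum per-read counts into per-miR totals; unnamed reads are skipped."""
    totals = defaultdict(int)
    for seq, n in read_counts.items():
        name = read_names.get(seq)
        if name is not None:
            totals[name] += n
    return dict(totals)


ambiguous = resolve_name("miR-450a-5p", "chrX_48156-5p", False)
# Placeholder read sequences with toy counts.
totals = sum_counts(
    {"READ_A": 5, "READ_B": 2, "READ_C": 1, "READ_D": 9},
    {"READ_A": "miR-21-5p", "READ_B": "miR-21-5p", "READ_C": ambiguous},
)
```

Composite names are treated as their own keys, which matches the statement that multi-named miRs get separate miR level counts.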
Genome_build: Rn5 for novel miR identification, miRBase 20 for known miR identification
Supplementary_files_format_and_content: Tab-delimited text files with raw count information at either the mature miR level or isomiR level. Both files also record the source of the prediction that links the indicated counts to the indicated miR or isomiR.

Submission date

Feb 17, 2016

Last update date

May 15, 2019

Contact name

Aaron Thomas Smith

E-mail(s)

smithat@lilly.com

Phone

3172774712

Organization name

Eli Lilly