Status |
Public on Feb 13, 2020 |
Title |
XY_e7GW_p1_EH975 |
Sample type |
SRA |
|
|
Source name |
Testis
|
Organism |
Homo sapiens |
Characteristics |
sexe: Male; tissue: Testis; age: 7GW+1d
|
Extracted molecule |
total RNA |
Extraction protocol |
Total RNA was extracted from tissues using the RNeasy mini Kit (Qiagen), quantified using a NanoDrop™ 8000 spectrophotometer (Thermo Scientific) and quality controlled using a 2100 Electrophoresis Bioanalyzer (Agilent). Libraries of template molecules suitable for strand-specific high-throughput DNA sequencing were created using “TruSeq Stranded Total RNA with Ribo-Zero Gold Prep Kit” (catalog # RS-122-2301, Illumina Inc.). Briefly, cytoplasmic and mitochondrial ribosomal RNA (rRNA) was removed from 500 ng of total RNA using biotinylated, target-specific oligos combined with Ribo-Zero rRNA removal beads. Following purification, the RNA was fragmented using divalent cations under elevated temperature. The cleaved RNA fragments were copied into first strand cDNA using reverse transcriptase and random primers, followed by second strand cDNA synthesis using DNA Polymerase I and RNase H. The double stranded cDNA fragments were blunted using T4 DNA polymerase, Klenow DNA polymerase and T4 PNK. A single ‘A’ nucleotide was added to the 3’ ends of the blunt DNA fragments using a Klenow fragment (3' to 5' exo minus) enzyme. The cDNA fragments were ligated to double stranded adapters using T4 DNA Ligase. The ligated products were enriched by PCR amplification (30 sec at 98°C; [10 sec at 98°C, 30 sec at 60°C, 30 sec at 72°C] x 12 cycles; 5 min at 72°C). Excess PCR primers were removed by purification using AMPure XP beads (Agencourt Biosciences Corporation). Final cDNA libraries were quality-checked and quantified using a 2100 Electrophoresis Bioanalyzer (Agilent). The libraries were loaded in the flow cell at 7 pM concentration, and clusters were generated in the cBot and sequenced on the Illumina HiSeq 2500 as paired-end 2x50 base reads following Illumina's instructions. Image analysis and base calling were performed using RTA 1.17.20 and CASAVA 1.8.2.
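As a quick arithmetic check on the enrichment PCR program described above, the following sketch totals the programmed hold times. It is an illustration only: the function name `pcr_hold_seconds` is hypothetical, and ramp times between temperatures, which depend on the thermocycler, are ignored.

```python
def pcr_hold_seconds(initial, cycle_steps, cycles, final):
    """Sum the programmed hold times (in seconds) of a simple PCR program."""
    return initial + cycles * sum(cycle_steps) + final


# Program from the protocol: 30 s at 98°C; 12 x [10 s, 30 s, 30 s]; 5 min at 72°C.
total = pcr_hold_seconds(initial=30, cycle_steps=[10, 30, 30], cycles=12, final=5 * 60)
print(total, "s =", total / 60, "min")  # 1170 s = 19.5 min
```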
|
|
|
Library strategy |
RNA-Seq |
Library source |
transcriptomic |
Library selection |
cDNA |
Instrument model |
Illumina HiSeq 2500 |
|
|
Description |
Whole organ
|
Data processing |
Base calling. Image analysis and base calling were performed using RTA 1.17.20 and CASAVA 1.8.2.
Assembly of a unique set of human reference transcripts. Ensembl (Cunningham, et al., 2015) and RefSeq (Brown, et al., 2015; Pruitt, et al., 2014) transcript annotations of the hg19 release of the human genome were downloaded from the University of California Santa Cruz (UCSC) genome browser website (Rosenbloom, et al., 2015) on June 30th, 2015. Both transcript annotation files (GTF format) were subsequently merged into a combined set of non-redundant human reference transcripts (HRT) using Cuffcompare (Pollier, et al., 2013). We also defined a non-redundant dataset of human splice junctions (HSJ) extracted from alignments of human transcripts and expressed sequence tags (ESTs) provided by UCSC.
Mapping reads. RNA-seq-derived reads from each sample replicate were aligned independently to the hg19 release of the human genome sequence with TopHat (version 2.0.10) (Trapnell, et al., 2009) using previously published approaches (Chalmel, et al., 2014; Pauli, et al., 2012; Trapnell, et al., 2012; Zimmermann, et al., 2015). Briefly, TopHat was first run on each RNA-seq fastq file using the HRT and HSJ datasets to improve read mapping. The junction outputs produced by all TopHat runs were pooled and added to the HSJ dataset. TopHat was then rerun for each sample using the new HSJ dataset. The output of this second run comprised the final alignment (BAM format). Finally, BAM files corresponding to the same developmental time point of testis or ovary were merged and sorted with the samtools suite (Li, et al., 2009).
Transcriptome assembly and quantification. The transcriptome of each gonad was subsequently assembled, compared to known transcript annotations and quantified with the Cufflinks suite using default settings (Pollier, et al., 2013).
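The junction-pooling step of the two-pass mapping procedure described above can be sketched as a simple set union. This is a minimal illustration, not the submitters' code: it assumes junctions have already been parsed into `(chrom, left, right, strand)` tuples, and the name `pool_junctions` and the example coordinates are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Assumed in-memory representation of one splice junction.
Junction = Tuple[str, int, int, str]  # (chrom, left, right, strand)


def pool_junctions(known: Iterable[Junction],
                   per_run: Iterable[Iterable[Junction]]) -> List[Junction]:
    """Merge first-pass junctions from every run into the known junction set.

    Duplicates collapse automatically because junctions are stored in a set;
    the result is sorted for reproducible output.
    """
    pooled: Set[Junction] = set(known)
    for run in per_run:
        pooled.update(run)
    return sorted(pooled)


# Hypothetical example: one known junction and two first-pass runs.
known = [("chr1", 100, 200, "+")]
run_a = [("chr1", 100, 200, "+"), ("chr1", 300, 450, "+")]
run_b = [("chr2", 50, 90, "-"), ("chr1", 300, 450, "+")]
pooled = pool_junctions(known, [run_a, run_b])
print(len(pooled))  # 3 distinct junctions
```

Pooling across all samples before the second pass means a junction discovered in any one sample is available as a candidate when realigning every other sample.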
Briefly, the assembly step performed by Cufflinks using the merged alignment files yielded a set of ~93,000-192,000 transcript fragments (transfrags) for each gonad. The Cuffcompare program was then used: (i) to define a non-redundant set of 180,242 assembled transcripts by tracking Cufflinks transfrags across all experiments; and (ii) to compare the resulting transcripts to the HRT dataset (i.e. known transcript annotation). Finally, the abundance of each transcript in each experiment was assessed using Cuffdiff in fragments per kilobase of exon model per million reads mapped (FPKM). Abundance values were normalized using Cuffnorm to reduce systematic effects and to allow direct comparison between the individual samples.
Refinement of assembled transcripts. As suggested previously (Chalmel, et al., 2014; Prensner, et al., 2011), we sequentially applied four filtering steps to eliminate poor-quality quantifications and to distinguish the most robust transfrags from background signal. First, we selected 60,437 “detectable” or “expressed” transfrags, defined as those for which abundance levels were above 1 FPKM in at least one sample. Second, we selected 60,136 transcripts with a cumulative exon length ≥200 nt. Third, all transfrags that were not automatically annotated by Cuffcompare as complete match (Cuffcompare class “=”), potentially novel isoform (“j”), unknown intronic (“i”, i.e. loci falling entirely within a reference intron and without exon-exon overlap with another known locus), intergenic (“u”) or antisense (“x”) isoforms were discarded. Finally, all transcript fragments that were annotated as either novel isoforms or novel genes (class codes “j”, “i”, “u” or “x”) and that did not harbor at least two exons (multi-exonic) were filtered out. Together, this strategy produced a high-confidence set of 35,194 transcripts fulfilling these refinement conditions and corresponding to total RNA molecules expressed during testis or ovary development.
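The four refinement filters described above can be expressed as a single predicate. The sketch below is illustrative rather than the submitters' implementation: the `Transfrag` record, its field names and the example values are assumptions, while the thresholds and class codes are those stated in the text.

```python
from dataclasses import dataclass
from typing import List

KEPT_CLASSES = {"=", "j", "i", "u", "x"}  # class codes retained by filter 3
NOVEL_CLASSES = {"j", "i", "u", "x"}      # novel isoforms or novel genes


@dataclass
class Transfrag:
    transfrag_id: str
    class_code: str          # Cuffcompare class code
    exon_lengths: List[int]  # one length (nt) per exon
    fpkm: List[float]        # one abundance value per sample


def passes_refinement(t: Transfrag, min_fpkm: float = 1.0, min_length: int = 200) -> bool:
    """Apply the four filters in the order given in the text."""
    if not any(v > min_fpkm for v in t.fpkm):   # 1. above 1 FPKM in >= 1 sample
        return False
    if sum(t.exon_lengths) < min_length:        # 2. cumulative exon length >= 200 nt
        return False
    if t.class_code not in KEPT_CLASSES:        # 3. allowed class codes only
        return False
    if t.class_code in NOVEL_CLASSES and len(t.exon_lengths) < 2:
        return False                            # 4. novel transfrags must be multi-exonic
    return True


# Hypothetical transfrags exercising each filter.
examples = [
    Transfrag("T1", "=", [500], [2.0, 0.1]),       # known, single exon: kept
    Transfrag("T2", "u", [500], [5.0]),            # novel, single exon: dropped
    Transfrag("T3", "j", [120, 150], [0.5, 3.2]),  # novel, two exons, 270 nt: kept
    Transfrag("T4", "=", [150], [10.0]),           # shorter than 200 nt: dropped
    Transfrag("T5", "c", [300, 300], [4.0]),       # class code not retained: dropped
    Transfrag("T6", "x", [300, 300], [0.9, 1.0]),  # never above 1 FPKM: dropped
]
kept = [t.transfrag_id for t in examples if passes_refinement(t)]
print(kept)  # ['T1', 'T3']
```

Note that the multi-exon requirement applies only to novel classes, so a known single-exon transcript (class “=”) can still pass.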
Genome_build: hg19
Supplementary_files_format_and_content: .bedgraph files contain genome-wide read coverage statistics, in bedGraph format, obtained with the "genomeCoverageBed" utility in BEDTools (release 2.12.0); .gtf file contains the reconstructed transcripts obtained with cufflinks and subsequently classified with cuffcompare in the cufflinks suite of tools (release 2.2.1); .txt file contains the FPKM expression levels quantified with Cufflinks for each transfrag and each sample; the columns are tab-separated.
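For readers who want to inspect the coverage files, here is a minimal sketch of reading bedGraph records (chromosome, start, end, value) and summarizing coverage. It assumes the standard four-column bedGraph layout; the function names and the in-memory example lines are hypothetical, and a real file would first need to be opened (for example with `gzip.open` in text mode for a `.gz` file).

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, int, int, float]  # (chrom, start, end, value)


def parse_bedgraph(lines: Iterable[str]) -> Iterator[Record]:
    """Yield bedGraph records, skipping blank, track, browser and comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)


def coverage_summary(records: Iterable[Record]) -> Tuple[int, float]:
    """Return (number of covered bases, mean depth over those bases)."""
    covered = 0
    weighted = 0.0
    for _chrom, start, end, value in records:
        if value > 0:
            span = end - start  # bedGraph intervals are 0-based, half-open
            covered += span
            weighted += span * value
    mean_depth = weighted / covered if covered else 0.0
    return covered, mean_depth


# Hypothetical example lines.
lines = [
    "track type=bedGraph",
    "chr1\t0\t100\t0",
    "chr1\t100\t150\t4",
    "chr1\t150\t250\t1",
]
covered, mean_depth = coverage_summary(parse_bedgraph(lines))
print(covered, mean_depth)  # 150 2.0
```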
|
|
|
Submission date |
Jun 26, 2018 |
Last update date |
Feb 13, 2020 |
Contact name |
Frédéric Chalmel |
E-mail(s) |
frederic.chalmel@inserm.fr
|
Organization name |
Inserm U1085-Irset
|
Department |
Physiology and physiopathology of the urogenital tract
|
Street address |
9 avenue du Pr. Léon Bernard
|
City |
Rennes |
State/province |
France |
ZIP/Postal code |
35000 |
Country |
France |
|
|
Platform ID |
GPL16791 |
Series (1) |
GSE116278 |
Dynamics of the transcriptional landscape during human gonad development during fetal life |
|
Relations |
BioSample |
SAMN09495482 |
Supplementary file | Size | Download | File type/resource |
GSM3223669_NNRD15_S2.accepted_hits.bam.bed.gz | 272.2 Mb | (ftp)(http) | BED |
Raw data are available in SRA |
Processed data provided as supplementary file |
Processed data are available on Series record |