Status |
Public on Jul 18, 2018 |
Title |
SRX018875 |
Sample type |
SRA |
|
|
Source name |
0.4ng_library_methB_S2_2.5%ERCC_phaseV_pool15_mRNA
|
Organism |
Drosophila melanogaster |
Characteristics |
cell type: S2
attributes: ercc|phase|s2|pool
|
Extracted molecule |
total RNA |
Extraction protocol |
see original sample
|
|
|
Library strategy |
RNA-Seq |
Library source |
transcriptomic |
Library selection |
cDNA |
Instrument model |
Illumina Genome Analyzer II |
|
|
Data processing |
We created a pre-alignment pipeline to identify technical metadata and generate sample quality metrics. We downloaded FASTQs from SRA using fastq-dump (sra-tools v2.8.2) with --split-files -M 0, counted the number of reads, and estimated average read lengths. A sample was considered paired-end if fastq-dump generated two files and each file had an equal number of reads, ≥ 10,000 reads, and an average read length ≥ 10 bp. We removed individual reads shorter than 25 bp using atropos (v1.1.18) with --minimum-length 25. We simultaneously verified that samples were indeed Drosophila and estimated contamination with FastQ Screen (v0.11.3) and Bowtie 2 (v2.3.3.1) by mapping 100,000 reads to 8 references (dm6, rRNA, Wolbachia, human, yeast, E. coli, PhiX, ERCC-SRM2374). Next, we aligned all reads with Hisat2 (v2.1.0) using --max-intronlen 300000 and --known-splicesite-infile to the Drosophila melanogaster Release 6 plus ISO1 MT assembly (GCA_000001215.4). We then ran samtools (v1.7) and bamtools (v2.4.1) with default settings to generate summary statistics. We estimated various metrics with Picard CollectRnaSeqMetrics (v2.15.0) using three separate settings: STRAND=NONE, STRAND=FIRST_READ_TRANSCRIPTION_STRAND, and STRAND=SECOND_READ_TRANSCRIPTION_STRAND. These metrics allowed us to estimate library strandedness. Finally, we identified duplicates using Picard MarkDuplicates (v2.15.0).
To generate counts tables and coverage tracks, our alignment pipeline used the parameters discovered by the pre-alignment pipeline. The alignment pipeline uses the FASTQ file(s) downloaded by the pre-alignment pipeline, but trims adapter sequence and low-quality bases using atropos (v1.1.18) with -q 20 --minimum-length 25. The remaining reads were mapped using Hisat2 (v2.1.0) with --dta --max-intronlen 300000 --known-splicesite-infile and --rna-strandness set to ‘F’, ‘R’, ‘FR’, or ‘RF’ depending on the strandedness.
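The paired-end rule described in the pre-alignment pipeline can be sketched as a small Python check. This is an illustrative sketch only, not the pipeline's actual code: the `FastqSummary` type, the `is_paired_end` function, and the constant names are hypothetical, and only the thresholds (two files, equal read counts, ≥ 10,000 reads, average read length ≥ 10 bp) come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FastqSummary:
    """Read count and average read length for one FASTQ file (hypothetical helper type)."""
    n_reads: int
    avg_read_length: float


# Thresholds taken from the Data processing text.
MIN_READS = 10_000
MIN_AVG_LENGTH_BP = 10.0


def is_paired_end(files: List[FastqSummary]) -> bool:
    """Return True if the fastq-dump output looks like a paired-end sample."""
    # fastq-dump --split-files must have produced exactly two files.
    if len(files) != 2:
        return False
    first, second = files
    # Both mates must contain the same number of reads.
    if first.n_reads != second.n_reads:
        return False
    # Each file must pass the read-count and read-length thresholds.
    return all(
        f.n_reads >= MIN_READS and f.avg_read_length >= MIN_AVG_LENGTH_BP
        for f in files
    )
```

A sample that fails any of these checks would presumably be handled as single-end; the text does not say how such samples were processed further.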
We merged alignments from individual SRA runs (SRRs) to the library level (SRX) and generated gene-level, junction-level, and intergenic coverage counts using featureCounts from the Subread package (v1.5.3). Finally, we created browser tracks using bamCoverage from the deepTools package (v2.5.4) with --binSize 1 --normalizeTo1x 129000000 --ignoreForNormalization chrX.
Genome_build: Drosophila melanogaster Release 6 plus ISO1 MT (GenBank assembly accession: GCA_000001215.4)
Supplementary_files_format_and_content: Processed data files include:
*.bw are BigWig files generated using deepTools bamCoverage
*.counts are gene-level coverage counts
*.jcounts are gene-level junction counts
*.intergenic.counts are intergenic coverage counts
*.intergenic.jcounts are intergenic junction counts
Series level supplementary files:
dmel_r6-11.intergenic.gtf: intergenic GTF generated by the pipeline for estimating intergenic coverage counts.
supplemental_metadata.tsv: supplemental metadata file containing additional metadata for each sample, including QC values and various flags generated by each pipeline.
gene_counts.tsv: supplemental file containing all gene counts as a single matrix.
intergenic_counts.tsv: supplemental file containing all intergenic counts as a single matrix.
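As a rough guide to working with the *.counts files, the sketch below reads a gene-level count table into a dictionary. It assumes the files follow the standard featureCounts layout (a leading `#` comment line, then a tab-separated header starting with Geneid, Chr, Start, End, Strand, Length, with the count in the last column); that layout is an assumption about these files, not something this record states. The function name `read_gene_counts` and the toy input are hypothetical and contain no real data.

```python
import csv
import io
from typing import Dict, TextIO


def read_gene_counts(handle: TextIO) -> Dict[str, int]:
    """Map gene ID to read count, assuming featureCounts-style tabular output."""
    # Skip the '#' comment line(s) that featureCounts writes before the header.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    next(reader)  # discard the header row (Geneid, Chr, Start, End, Strand, Length, <bam>)
    counts: Dict[str, int] = {}
    for row in reader:
        if not row:  # ignore blank lines
            continue
        # First column is the gene ID; last column is the count for the single BAM.
        counts[row[0]] = int(row[-1])
    return counts


# Toy input for illustration only (made-up gene IDs and counts).
toy = (
    "# Program:featureCounts (toy header)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "gene_A\t2L\t100\t500\t+\t401\t12\n"
    "gene_B\tX\t900\t1500\t-\t601\t0\n"
)
toy_counts = read_gene_counts(io.StringIO(toy))
```

For a downloaded file such as GSM3274606_SRX018875.bam.counts.txt.gz, the handle could be opened with `gzip.open(path, "rt")` and passed to the same function.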
|
|
|
Submission date |
Jul 17, 2018 |
Last update date |
Sep 04, 2018 |
Contact name |
Brian Oliver |
E-mail(s) |
briano@nih.gov
|
Phone |
301-204-9463
|
Organization name |
NIDDK, NIH
|
Department |
LBG
|
Lab |
Developmental Genomics
|
Street address |
50 South Drive
|
City |
Bethesda |
State/province |
MD |
ZIP/Postal code |
20892 |
Country |
USA |
|
|
Platform ID |
GPL9061 |
Series (1) |
GSE117217 |
Remapping the SRA: Drosophila melanogaster RNA-Seq data from the Sequence Read Archive |
|
Relations |
Reanalysis of |
GSM516593 |
BioSample |
SAMN00010997 |
SRA |
SRX018875 |
Named Annotation |
GSM3274606_SRX018875.flybase.plus.bw |
Named Annotation |
GSM3274606_SRX018875.flybase.minus.bw |
Supplementary file | Size | Download | File type/resource |
GSM3274606_SRX018875.bam.counts.jcounts.txt.gz | 62.2 Kb | (ftp)(http) | TXT |
GSM3274606_SRX018875.bam.counts.txt.gz | 823.4 Kb | (ftp)(http) | TXT |
GSM3274606_SRX018875.bam.intergenic.counts.jcounts.txt.gz | 51.3 Kb | (ftp)(http) | TXT |
GSM3274606_SRX018875.bam.intergenic.counts.txt.gz | 144.7 Kb | (ftp)(http) | TXT |
GSM3274606_SRX018875.flybase.minus.bw | 1.3 Mb | (ftp)(http) | BW |
GSM3274606_SRX018875.flybase.plus.bw | 1.3 Mb | (ftp)(http) | BW |
SRA Run Selector |
Raw data are available in SRA |
Processed data provided as supplementary file |
Processed data are available on Series record |