GEO Accession viewer

Series GSE85458

Status

Public on Aug 11, 2016

Title

Filtering of RNA-seq datasets and differences between cell types in global coordination of splicing and proportion of highly expressed genes

Sample organism

Mus musculus

Experiment type

Expression profiling by high throughput sequencing
Third-party reanalysis

Summary

The goal of this study was to investigate whether mammalian cell types intrinsically differ in the global coordination of gene splicing and expression levels. We analyzed RNA-seq transcriptome profiles of 8 different purified mouse cell types. We found that cell types vary in the proportion of highly expressed genes and in the number of alternatively spliced transcripts expressed per gene, and that the cell types expressing more alternatively spliced variants per gene are those with a higher proportion of highly expressed genes. The cell types segregated into two clusters based on a high or low proportion of highly expressed genes. Biological functions involved in negative regulation of gene expression were enriched in the cluster of cell types with a low proportion of highly expressed genes, whereas biological functions involved in regulation of transcription and RNA splicing were enriched in the cluster with a high proportion of highly expressed genes. These data reveal specific candidate genes that may be involved in the global coordination of balance in the transcriptome.
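The two per-cell-type quantities compared above, the proportion of highly expressed genes and the number of expressed isoforms per gene, can be computed from per-gene and per-isoform FPKM values such as those in the fpkm_tracking supplementary files. The Python sketch below is illustrative only: this record does not state which FPKM cutoffs defined "highly expressed" or "expressed", so HIGH_FPKM, EXPRESSED_FPKM, the use of expressed genes as the denominator, and the helper names are all hypothetical assumptions.

```python
# Hypothetical cutoffs: this record does not state the thresholds that were used.
HIGH_FPKM = 100.0     # a gene counts as "highly expressed" at or above this FPKM
EXPRESSED_FPKM = 1.0  # a gene or isoform counts as "expressed" at or above this FPKM


def proportion_highly_expressed(gene_fpkm: dict[str, float]) -> float:
    """Fraction of expressed genes whose FPKM reaches HIGH_FPKM (assumed definition)."""
    expressed = [v for v in gene_fpkm.values() if v >= EXPRESSED_FPKM]
    if not expressed:
        return 0.0
    return sum(v >= HIGH_FPKM for v in expressed) / len(expressed)


def isoforms_per_gene(isoform_fpkm: dict[tuple[str, str], float]) -> float:
    """Mean number of expressed isoforms per gene that has at least one expressed isoform.

    Keys are (gene_id, isoform_id) pairs; values are isoform FPKMs.
    """
    counts: dict[str, int] = {}
    for (gene_id, _isoform_id), fpkm in isoform_fpkm.items():
        if fpkm >= EXPRESSED_FPKM:
            counts[gene_id] = counts.get(gene_id, 0) + 1
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)
```

For instance, with the toy values {"g1": 250.0, "g2": 12.0, "g3": 0.2, "g4": 150.0}, three genes pass the expression cutoff and two of them pass the high cutoff, so proportion_highly_expressed returns 2/3. Computing both numbers per cell type is what allows them to be compared across cell types.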

Overall design

The following samples were reprocessed and reanalyzed:

Astrocytes, GSE52564/GSM1269903/GSM1269904
Endothelial cells, GSE52564/GSM1269915/GSM1269916
Cortical neurons, GSE52564/GSM1269905/GSM1269906
Oligodendrocytes, GSE52564/GSM1269911/GSM1269912
Microglia, GSE52564/GSM1269913/GSM1269914
Megakaryocyte-erythroid progenitors, GSE40522/GSM995525
Erythroid-committed precursors Gata1 KO, GSE40522/GSM995536

Libraries for all samples included two biological replicates, were prepared from polyA-selected RNA, and were sequenced as paired-end reads (100 bp from each end) on a HiSeq 2000 sequencer (Illumina). The raw reads were reprocessed as follows. Reads were mapped to the mouse reference genome mm10 (UCSC Genome Browser) and to a comprehensive transcriptome annotation GTF file. This GTF file was assembled with the UCSC Table Browser Intersection utility by merging, in a non-redundant manner, the GENCODE M4 transcripts with those UCSC Gene Track transcripts that did not overlap the GENCODE transcripts by more than 90%. Mapping used the TopHat/Bowtie2/Cufflinks pipeline with the -g option to construct a merged GTF file that included the annotated and novel transcript structures from all samples. The merged GTF was then converted to BED with the Gtf2bed tool (Bedops), and the IntersectBed tool (Bedtools) was used to retain only the reads that mapped to it. This filtering step kept the reads that contributed to the identified gene structures and excluded noise and artifacts that mapped to the genome but did not contribute to any gene structure. Next, only uniquely mapped, properly paired reads were selected with the Samtools command view -bq 4 -bh -f2 -F12. Picard DownsampleSam was then used to randomly subsample an equal number of paired reads from each sample, providing representative samples of the same size (34.6M per sample/replicate, counted with Samtools flagstat).

The reprocessed samples were reanalyzed as follows: the TopHat/Bowtie2/Cufflinks/Cuffdiff pipeline with the -g option was used to determine normalized expression, in FPKM, for each replicate of each sample, using Cuffdiff's across-sample normalization.
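The read-level filtering and downsampling steps of the reprocessing can be illustrated in a few lines of Python. This is a minimal sketch, not the authors' code, and the helper names keep_alignment, downsample_probability, and downsample_by_name are hypothetical. keep_alignment mirrors the documented semantics of samtools view -q 4 -f 2 -F 12: keep alignments with a mapping quality of at least 4, with the proper-pair FLAG bit 0x2 set, and with neither the read-unmapped bit 0x4 nor the mate-unmapped bit 0x8 set. The downsampling helpers assume a probability-based scheme in which every record sharing a read name is kept or dropped together, in the spirit of Picard DownsampleSam's PROBABILITY parameter; such a scheme yields approximately, not exactly, the target count.

```python
import random

# SAM FLAG bits referenced by the filter (SAM specification).
PROPER_PAIR = 0x2
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
MIN_MAPQ = 4  # corresponds to "-q 4"


def keep_alignment(flag: int, mapq: int) -> bool:
    """Return True if one alignment would pass `samtools view -q 4 -f 2 -F 12`."""
    required_bits_set = (flag & PROPER_PAIR) == PROPER_PAIR                # -f 2
    forbidden_bits_clear = (flag & (READ_UNMAPPED | MATE_UNMAPPED)) == 0  # -F 12
    return required_bits_set and forbidden_bits_clear and mapq >= MIN_MAPQ


def downsample_probability(target: int, total: int) -> float:
    """Fraction of templates to keep so that roughly `target` remain (capped at 1)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return min(1.0, target / total)


def downsample_by_name(read_names: list[str], probability: float, seed: int = 42) -> list[str]:
    """Keep or drop all records of a template together (hypothetical helper).

    A single random draw is made per read name, so both mates of a pair
    share the same fate.
    """
    rng = random.Random(seed)
    decision: dict[str, bool] = {}
    kept: list[str] = []
    for name in read_names:
        if name not in decision:
            decision[name] = rng.random() < probability
        if decision[name]:
            kept.append(name)
    return kept
```

For example, downsample_probability(34_600_000, total) gives the fraction of read pairs to keep from a replicate with total read pairs, where total would come from a count such as Samtools flagstat. Deciding per read name rather than per record is what keeps mates paired after subsampling.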
After the cell types were segregated into two clusters based on a higher or lower proportion of highly expressed genes, differential expression analysis was performed between the two clusters, which were treated as two conditions. For this analysis, each replicate of each cell type was assigned to one of the two cluster groups. The Cuffdiff q-value cutoff for differential expression was set to 0.05. Software versions used: TopHat 2.0.12, Bowtie 2.2.4, Cufflinks 2.2.1, Samtools 0.1.19, Picard 1.79, Bedops 2.4.2, Bedtools 2.19.0. Analyses were performed on the Orchestra High Performance Compute Cluster at Harvard Medical School, an NIH-supported shared facility.
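Cuffdiff's documentation describes its q-values as p-values adjusted for multiple testing with the Benjamini-Hochberg procedure. The sketch below implements that standard adjustment and applies the 0.05 cutoff used here. It is not Cuffdiff's code, the function names bh_qvalues and call_significant are hypothetical, and the set of p-values being adjusted (which tests are included) affects the resulting q-values.

```python
def bh_qvalues(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def call_significant(pvalues: list[float], q_cutoff: float = 0.05) -> list[bool]:
    """Flag tests whose BH q-value falls below the cutoff."""
    return [q < q_cutoff for q in bh_qvalues(pvalues)]
```

With p-values [0.01, 0.04, 0.03, 0.5], the adjusted values are approximately [0.04, 0.0533, 0.0533, 0.5], so only the first test passes a 0.05 cutoff even though three raw p-values are below 0.05.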

Contributor(s)

Trakhtenberg EF, Pho N, Holton KM, Chittenden TW, Goldberg JL, Dong L

Citation(s)

27577089

Submission date

Aug 10, 2016

Last update date

Sep 01, 2016

Contact name

Ephraim F Trakhtenberg

E-mail(s)

trakhtenberg@uchc.edu

Organization name

University of Connecticut School of Medicine

Department

Neuroscience

Street address

263 Farmington Ave. RM L4005

City

Farmington

State/province

ZIP/Postal code

06030

Country

USA

Relations

Reanalysis of

BioProject

Download family	Format
SOFT formatted family file(s)	SOFT
MINiML formatted family file(s)	MINiML
Series Matrix File(s)	TXT

Supplementary file	Size	File type/resource
GSE85458_Cell_types_genes.fpkm_tracking.gz	3.3 Mb	FPKM_TRACKING
GSE85458_Cell_types_isoforms.fpkm_tracking.gz	14.5 Mb	FPKM_TRACKING
GSE85458_Clusters_gene_exp.diff.gz	1.6 Mb	DIFF
GSE85458_Clusters_genes.fpkm_tracking.gz	1.8 Mb	FPKM_TRACKING
GSE85458_Clusters_isoforms.fpkm_tracking.gz	5.2 Mb	FPKM_TRACKING
GSE85458_Clusters_isoforms_exp.diff.gz	3.7 Mb	DIFF
GSE85458_Sample_GSM_ids_and_protocols.txt.gz	1.4 Kb	TXT
Processed data are available on Series record
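The GSE85458_Clusters_gene_exp.diff.gz supplementary file can be filtered to the genes that pass the 0.05 q-value cutoff. This sketch assumes the file follows Cuffdiff's standard tab-separated gene_exp.diff layout, with a header row that includes columns such as gene, status, log2(fold_change), and q_value. The helper names are hypothetical, and treating only rows with status OK as tested is an assumption based on Cuffdiff's status codes.

```python
import csv
import gzip
from typing import Iterable

Q_CUTOFF = 0.05  # q-value cutoff stated in the Overall design


def significant_genes(lines: Iterable[str], q_cutoff: float = Q_CUTOFF) -> list[dict[str, str]]:
    """Return gene_exp.diff rows that were tested (status OK) and pass the q-value cutoff."""
    reader = csv.DictReader(lines, delimiter="\t")
    hits: list[dict[str, str]] = []
    for row in reader:
        if row["status"] != "OK":
            continue  # skip untested rows, e.g. NOTEST or LOWDATA
        if float(row["q_value"]) < q_cutoff:
            hits.append(row)
    return hits


def load_significant(path: str) -> list[dict[str, str]]:
    """Read a gzipped gene_exp.diff file and return its significant rows."""
    with gzip.open(path, "rt", newline="") as fh:
        return significant_genes(fh)
```

For example, load_significant("GSE85458_Clusters_gene_exp.diff.gz") would return the passing rows as dictionaries keyed by column name, assuming the file has been downloaded to the working directory. Accepting any iterable of lines keeps the parser testable without touching the file system.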