Series GSE85458
Status Public on Aug 11, 2016
Title Filtering of RNA-seq datasets and differences between cell types in global coordination of splicing and proportion of highly expressed genes
Sample organism Mus musculus
Experiment type Expression profiling by high throughput sequencing
Third-party reanalysis
Summary The goal of this study was to investigate whether mammalian cell types intrinsically differ in the global coordination of gene splicing and expression levels. We analyzed RNA-seq transcriptome profiles of 8 different purified mouse cell types. We found that cell types vary in the proportion of highly expressed genes and in the number of alternatively spliced transcripts expressed per gene, and that the cell types expressing more alternatively spliced transcript variants per gene are those with a higher proportion of highly expressed genes. Cell types segregated into two clusters based on a high or low proportion of highly expressed genes. Biological functions involved in negative regulation of gene expression were enriched in the group of cell types with a low proportion of highly expressed genes, whereas biological functions involved in regulation of transcription and RNA splicing were enriched in the group of cell types with a high proportion of highly expressed genes. These data reveal specific candidate genes that may be involved in the global coordination of balance in the transcriptome.
 
Overall design The following samples were reprocessed and reanalyzed:

Astrocytes, GSE52564/GSM1269903/GSM1269904
Endothelial cells, GSE52564/GSM1269915/GSM1269916
Cortical neurons, GSE52564/GSM1269905/GSM1269906
Oligodendrocytes, GSE52564/GSM1269911/GSM1269912
Microglia, GSE52564/GSM1269913/GSM1269914
Megakaryocyte-erythroid progenitors, GSE40522/GSM995525
Erythroid-committed precursors Gata1 KO, GSE40522/GSM995536

Libraries for all samples included two biological replicates, were prepared from polyA-selected RNA, and were sequenced as paired reads of 100 bp from each end on a HiSeq 2000 sequencer (Illumina). The raw reads were reprocessed as follows. Reads were mapped to the mouse reference genome mm10 (UCSC Genome Browser) and to a comprehensive transcriptome annotation database GTF file, which was assembled with the UCSC Table Browser intersection utility by merging, in a non-redundant manner, the GENCODE M4 transcripts with the UCSC Gene Track transcripts that did not overlap the GENCODE transcripts by more than 90%. The raw reads were mapped using the TopHat/Bowtie2/Cufflinks pipeline, with the -g option, to construct a merged GTF file that included the annotated and novel transcript structures from all samples. Then, the intersectBed tool (Bedtools) was used to retain only the reads that mapped to the merged GTF, which had been converted to BED with the gtf2bed tool (Bedops). This filtering step selected the reads that contributed to the identified gene structures and excluded noise and artifacts, that is, reads that mapped to the genome but did not contribute to any gene structure. Next, only uniquely mapped and properly paired reads were selected using the view -bq 4 -bh -f2 -F12 command (Samtools). After this step, the DownsampleSam tool (Picard) was used to randomly subsample an equal number of paired reads, which provided representative samples of the same size for all samples (34.6M per sample/replicate; reads counted with flagstat, Samtools). The reprocessed samples were reanalyzed as follows: the TopHat/Bowtie2/Cufflinks/Cuffdiff pipeline, with the -g option, was used to determine normalized expression in FPKM for each replicate of each sample, with Cuffdiff's across-sample normalization.
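The read-selection step (view -bq 4 -bh -f2 -F12) can be unpacked with a small sketch. It is not part of the submitters' pipeline; it only mimics, in plain Python, how Samtools interprets the -q, -f and -F options against a read's MAPQ and FLAG fields. The function name passes_filter and its parameter names are hypothetical. The flag bit values come from the SAM format specification. In TopHat output, MAPQ values below 4 are generally understood to mark reads with more than one reported alignment, which is presumably why -q 4 is used here to keep uniquely mapped reads.

```python
# SAM FLAG bits relevant to this filter (values from the SAM specification).
PROPER_PAIR = 0x2    # each segment properly aligned according to the aligner
UNMAPPED = 0x4       # the read itself is unmapped
MATE_UNMAPPED = 0x8  # the read's mate is unmapped


def passes_filter(flag, mapq, min_mapq=4,
                  require=PROPER_PAIR,
                  exclude=UNMAPPED | MATE_UNMAPPED):
    """Hypothetical helper mirroring `samtools view -q 4 -f 2 -F 12`."""
    if mapq < min_mapq:            # -q 4: skip alignments with MAPQ below 4
        return False
    if flag & require != require:  # -f 2: every required bit must be set
        return False
    if flag & exclude:             # -F 12: no excluded bit may be set
        return False
    return True


# FLAG 99 = paired + proper pair + mate reverse + first in pair.
print(passes_filter(flag=99, mapq=50))
# FLAG 73 = paired + mate unmapped + first in pair.
print(passes_filter(flag=73, mapq=50))
```

Because 12 is the sum of bits 4 and 8, -F12 drops a read if either it or its mate is unmapped, while -f2 additionally requires the aligner to have marked the pair as properly paired.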
After the cell types were segregated into two clusters based on a higher or lower proportion of highly expressed genes, differential expression analysis was performed between the two groups, which were treated as two conditions. For this analysis, each replicate of each cell type was assigned to one of only two cluster groups. For differential expression analysis, the Cuffdiff q-value cutoff was set to 0.05. Software versions used: Tophat 2.0.12, Bowtie 2.2.4, Cufflinks 2.2.1, Samtools 0.1.19, Picard 1.79, Bedops 2.4.2, Bedtools 2.19.0. Analyses were performed on the Orchestra High Performance Compute Cluster at Harvard Medical School, an NIH-supported shared facility.
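As an illustration of the q-value cutoff described above, the following minimal sketch filters a Cuffdiff gene_exp.diff table at 0.05. It assumes the standard Cuffdiff 2.x column names (gene, status, log2(fold_change), q_value). The function name significant_tests, the strict "less than" comparison, and the demo rows are assumptions made for illustration; the demo values are made up and are not data from this series.

```python
import csv
import io


def significant_tests(diff_file, q_cutoff=0.05):
    """Yield (gene, log2 fold change, q-value) for rows passing the cutoff."""
    reader = csv.DictReader(diff_file, delimiter="\t")
    for row in reader:
        if row["status"] != "OK":   # skip tests Cuffdiff could not perform
            continue
        q_value = float(row["q_value"])
        if q_value < q_cutoff:      # assumption: strict comparison
            yield row["gene"], float(row["log2(fold_change)"]), q_value


# Made-up rows for illustration only; "low" and "high" are hypothetical labels.
header = ["test_id", "gene_id", "gene", "locus", "sample_1", "sample_2",
          "status", "value_1", "value_2", "log2(fold_change)", "test_stat",
          "p_value", "q_value", "significant"]
rows = [
    ["XLOC_1", "XLOC_1", "GeneA", "chr1:1-100", "low", "high",
     "OK", "1", "4", "2", "3.1", "0.0001", "0.01", "yes"],
    ["XLOC_2", "XLOC_2", "GeneB", "chr2:1-100", "low", "high",
     "OK", "2", "2.1", "0.07", "0.2", "0.8", "0.9", "no"],
    ["XLOC_3", "XLOC_3", "GeneC", "chr3:1-100", "low", "high",
     "NOTEST", "0", "0", "0", "0", "1", "1", "no"],
]
DEMO = "\n".join("\t".join(r) for r in [header] + rows) + "\n"

hits = list(significant_tests(io.StringIO(DEMO)))
print(hits)
```

To run the same filter on a real file, the handle could be opened with gzip.open(path, "rt") on a downloaded copy of GSE85458_Clusters_gene_exp.diff.gz, assuming that file follows the standard Cuffdiff layout.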
 
Contributor(s) Trakhtenberg EF, Pho N, Holton KM, Chittenden TW, Goldberg JL, Dong L
Citation(s) 27577089
Submission date Aug 10, 2016
Last update date Sep 01, 2016
Contact name Ephraim F Trakhtenberg
E-mail(s) trakhtenberg@uchc.edu
Organization name University of Connecticut School of Medicine
Department Neuroscience
Street address 263 Farmington Ave. RM L4005
City Farmington
State/province CT
ZIP/Postal code 06030
Country USA
 
Relations
Reanalysis of GSM1269903
Reanalysis of GSM1269904
Reanalysis of GSM1269915
Reanalysis of GSM1269916
Reanalysis of GSM1269905
Reanalysis of GSM1269906
Reanalysis of GSM1269911
Reanalysis of GSM1269912
Reanalysis of GSM1269913
Reanalysis of GSM1269914
Reanalysis of GSM995525
Reanalysis of GSM995536
BioProject PRJNA338506


Supplementary file                               Size      File type/resource
GSE85458_Cell_types_genes.fpkm_tracking.gz       3.3 Mb    FPKM_TRACKING
GSE85458_Cell_types_isoforms.fpkm_tracking.gz    14.5 Mb   FPKM_TRACKING
GSE85458_Clusters_gene_exp.diff.gz               1.6 Mb    DIFF
GSE85458_Clusters_genes.fpkm_tracking.gz         1.8 Mb    FPKM_TRACKING
GSE85458_Clusters_isoforms.fpkm_tracking.gz      5.2 Mb    FPKM_TRACKING
GSE85458_Clusters_isoforms_exp.diff.gz           3.7 Mb    DIFF
GSE85458_Sample_GSM_ids_and_protocols.txt.gz     1.4 Kb    TXT
Processed data are available on Series record
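The FPKM_TRACKING files listed above could be explored along the lines of the sketch below, which computes, for each sample column, the fraction of expressed genes whose FPKM reaches a threshold. It assumes the usual Cufflinks layout in which every sample contributes a column named <label>_FPKM. The record does not state how "highly expressed" was defined, so the threshold argument, the function name high_expression_fraction, and the choice to count only genes with FPKM above zero as expressed are all assumptions. The demo rows are made up.

```python
import csv
import io


def high_expression_fraction(tracking_file, threshold):
    """Per sample: genes with FPKM >= threshold / genes with FPKM > 0."""
    reader = csv.DictReader(tracking_file, delimiter="\t")
    fpkm_cols = [c for c in reader.fieldnames if c.endswith("_FPKM")]
    expressed = dict.fromkeys(fpkm_cols, 0)
    high = dict.fromkeys(fpkm_cols, 0)
    for row in reader:
        for col in fpkm_cols:
            value = float(row[col])
            if value > 0:               # assumption: FPKM > 0 means expressed
                expressed[col] += 1
                if value >= threshold:  # hypothetical definition of "high"
                    high[col] += 1
    return {
        col[:-len("_FPKM")]: (high[col] / expressed[col] if expressed[col] else 0.0)
        for col in fpkm_cols
    }


# Made-up rows for illustration only; "A" and "B" are hypothetical sample labels.
demo_rows = [
    ["tracking_id", "gene_short_name", "A_FPKM", "A_status", "B_FPKM", "B_status"],
    ["g1", "Gene1", "100", "OK", "0", "OK"],
    ["g2", "Gene2", "5", "OK", "50", "OK"],
    ["g3", "Gene3", "0", "OK", "2", "OK"],
    ["g4", "Gene4", "200", "OK", "1", "OK"],
]
DEMO_TRACKING = "\n".join("\t".join(r) for r in demo_rows) + "\n"

fractions = high_expression_fraction(io.StringIO(DEMO_TRACKING), threshold=50)
print(fractions)
```

The sample labels are read from the header rather than hard-coded, since the labels used in the deposited files are not given in this record.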
