GEO Accession viewer

Sample GSM7040240

Status

Public on Nov 28, 2023

Title

P-1488

Sample type

SRA

Source name

postmortem brain tissue

Organism

Homo sapiens

Characteristics

tissue: Putamen
cell type: striatal cells
pair: 4
subject: 1488
Sex: M
race: B
age: 39
bmi: 33.1
pmi: 21.5
ph: 6.6
rin: 8.7
tissue.storage.time.mo: 127.2
dx oud: None
dx substances: None
dx comorbid: None
dur.oud: 0
blood.toxicology: No substances detected
infxn.dx: Asthma
medications.atodc: N
tobacco.atod: Y
manner.of.death: Natural
cause.of.death: Pulmonary Embolism

Extracted molecule

nuclear RNA

Extraction protocol

Nuclei were isolated from 24 samples of frozen human postmortem brain tissue. Samples weighing 40 mg were homogenized in 5 mL glass douncers with 1 mL of lysis buffer containing DAPI, using ~10 strokes per glass pestle. The homogenate was filtered through a 40 µm mesh strainer (FisherScientific #48680). Nuclei were sorted for DAPI fluorescence on a BD FACS Aria at the Boston University Flow Cytometry Core. Approximately 100,000 nuclei were sorted into 7 µL of 0.04% bovine serum albumin (MilliporeSigma #126615) in phosphate-buffered saline (ThermoFisher #10010031). Nuclei were counted with a hemocytometer and assessed for concentration and debris. 7,000 nuclei were targeted per sample, except for two samples with lower concentrations, for which 5,000 nuclei were targeted. The 10x Chromium process was performed, and next-generation sequencing libraries were prepared with the 10x Genomics single cell 3’ gene expression dual index kit.
Libraries were sequenced at the Boston University Single Cell Sequencing Core. The pool of snRNA-seq libraries was sequenced on 7 NextSeq P3 flow cells with an intermediate re-pooling scheme to optimize for 50-80% sequencing saturation and > 8,000 average UMI per cell. Between sequencing runs, we preliminarily aligned the sequencing reads as outlined below to assess quality per sample and to estimate the number of viable nuclei and sample complexity. We identified two samples, C-13291 and C-612, with low-quality QC metrics due to wetting failures: mean UMI per cell <1,000 and estimated number of cells >50,000. These samples were excluded from subsequent re-pooling and further analyses (Supplemental Table 1.3 tab STARsolo QC).
snRNA-seq
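The sample exclusion rule in the extraction protocol above can be illustrated with a minimal Python sketch. The two thresholds (mean UMI per cell below 1,000 and more than 50,000 estimated cells) come from the text; the field names, the function name, and the example values are hypothetical. The sketch also assumes that both conditions must hold together, which the record does not state explicitly.

```python
from dataclasses import dataclass

# Thresholds quoted in the protocol text.
MIN_MEAN_UMI = 1_000
MAX_ESTIMATED_CELLS = 50_000


@dataclass
class SampleQC:
    # Hypothetical field names, chosen for illustration only.
    sample_id: str
    mean_umi_per_cell: float
    estimated_cells: int


def is_low_quality(qc: SampleQC) -> bool:
    """Flag a sample whose preliminary alignment suggests a failed run.

    Assumption: a sample is excluded only when BOTH conditions hold.
    """
    return (
        qc.mean_umi_per_cell < MIN_MEAN_UMI
        and qc.estimated_cells > MAX_ESTIMATED_CELLS
    )


# Made-up values, not taken from the dataset.
samples = [
    SampleQC("sample_A", 9200.0, 6800),
    SampleQC("sample_B", 640.0, 73000),
]
excluded = [s.sample_id for s in samples if is_low_quality(s)]
print(excluded)
```

If the two conditions were meant as alternatives, replacing `and` with `or` in `is_low_quality` would be the only change needed.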

Library strategy

RNA-Seq

Library source

transcriptomic single cell

Library selection

cDNA

Instrument model

Illumina NextSeq 500

Data processing

We aligned single-nucleus RNA-seq (snRNA-seq) reads to the human genome (GRCh38.p13) for each output with the turn-key single-cell transcriptomics method STARsolo (v2.7.9a), which is several-fold faster than the Cell Ranger pipeline and equally accurate [1]. We chose parameters for STARsolo UMI quantification to closely replicate the 10x Cell Ranger pipeline v6 and used the filtered genome and gene annotation available from 10x Genomics (https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest, Human reference 2020-A). We ran STARsolo to produce pre-mRNA gene counts as well as exonic counts for nuclear RNA, and to count introns and exons separately for RNA velocity analyses (--soloFeatures GeneFull Velocyto). We used the following parameters to correct cell barcodes, de-duplicate transcripts by their unique molecular identifier (UMI), assign UMI counts to genes, and pre-filter cells that are likely empty droplets: --soloType Droplet --soloCBmatchWLtype 1MM --soloCellFilter EmptyDrops_CR --soloMultiMappers EM --soloUMIdedup 1MM_CR. We pre-processed the UMI gene x cell count matrix to reduce inherent biases in the technology. We identified likely ambient RNA contamination with SoupX [2], empty droplets with DropletQC [3], doublets with scds [4], and damaged nuclei with miQC [5]. For each of these analyses, each sample (GEM well) was analyzed separately. We ran SoupX as described to estimate the fraction of ambient RNA from both the raw (unfiltered) and filtered UMI count matrices from STARsolo, and performed ambient RNA removal aware of the cell clusters in the filtered matrix. For the SoupX analyses only, we clustered the cells with Seurat v4 [6] using FindClusters(algorithm = 2, resolution = 0.5). For DropletQC, we used the intronic and exonic UMI counts per cell per gene from STARsolo to compute the fraction of intronic UMI per cell (referred to as the nuclear fraction).
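The STARsolo options quoted in the paragraph above can be gathered into a single command line. The Python sketch below only assembles and prints the argument list; it does not run STAR. The solo options are copied from the text as written, while the function name and every path are hypothetical placeholders. Other options a real run would need (for example barcode and UMI lengths for the kit chemistry, read decompression, or thread count) are left out because the record does not list them.

```python
import shlex


def build_starsolo_command(genome_dir, cdna_fastq, barcode_fastq,
                           whitelist, out_prefix):
    """Assemble a STAR argument list with the solo options quoted in the text.

    All path arguments are hypothetical placeholders supplied by the caller.
    """
    return [
        "STAR",
        "--genomeDir", genome_dir,
        # STARsolo expects the cDNA read first, then the barcode/UMI read.
        "--readFilesIn", cdna_fastq, barcode_fastq,
        "--soloCBwhitelist", whitelist,
        "--outFileNamePrefix", out_prefix,
        # Options below are copied from the data processing description.
        "--soloType", "Droplet",
        "--soloFeatures", "GeneFull", "Velocyto",
        "--soloCBmatchWLtype", "1MM",
        "--soloCellFilter", "EmptyDrops_CR",
        "--soloMultiMappers", "EM",
        "--soloUMIdedup", "1MM_CR",
    ]


# Placeholder paths for illustration only.
command = build_starsolo_command(
    genome_dir="/path/to/star_index",
    cdna_fastq="sample_R2.fastq.gz",
    barcode_fastq="sample_R1.fastq.gz",
    whitelist="/path/to/barcode_whitelist.txt",
    out_prefix="starsolo_out/",
)
print(shlex.join(command))
```

Keeping the arguments as a list rather than one string avoids shell-quoting problems if the list is later passed to `subprocess.run`.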
We identified empty droplets with default DropletQC parameters (nf_rescue = 0.50, umi_rescue = 1000). We identified doublets with the scds hybrid algorithm, using the function cxds_bcds_hybrid to estimate doublet scores, and called doublets on cells with scds.hybrid_score > 1.0. We identified damaged cells with miQC, which uses a Bayesian EM algorithm to learn the relationship between mitochondrial UMI counts and the number of captured genes. We used a posterior probability cutoff of 0.75 to call damaged cells with miQC.
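The per-cell quantities described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DropletQC, scds, or miQC implementations: the nuclear fraction is taken to be intronic UMI divided by intronic plus exonic UMI, doublets and damaged cells are called with the two cutoffs quoted in the text using strict inequalities, and DropletQC's empty-droplet calling is not reproduced. Function names and example values are hypothetical.

```python
DOUBLET_SCORE_CUTOFF = 1.0     # scds hybrid score cutoff from the text
MIQC_POSTERIOR_CUTOFF = 0.75   # miQC posterior probability cutoff from the text


def nuclear_fraction(intronic_umi: int, exonic_umi: int) -> float:
    """Fraction of a cell's UMI that are intronic.

    Assumption: the denominator is intronic + exonic UMI.
    """
    total = intronic_umi + exonic_umi
    return intronic_umi / total if total > 0 else 0.0


def flag_cell(hybrid_score: float, miqc_posterior: float) -> dict:
    """Apply the two per-cell cutoffs quoted in the text."""
    return {
        "doublet": hybrid_score > DOUBLET_SCORE_CUTOFF,
        "damaged": miqc_posterior > MIQC_POSTERIOR_CUTOFF,
    }


# Made-up values for one cell.
nf = nuclear_fraction(intronic_umi=300, exonic_umi=100)
flags = flag_cell(hybrid_score=1.4, miqc_posterior=0.2)
print(nf, flags)
```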
To combine cells across samples, we normalized the UMI counts with the variance-stabilizing SCTransform and glmGamPoi on each sample [7,8] and jointly embedded cells across samples with reciprocal PCA integration [6], as outlined in https://satijalab.org/seurat/articles/integration_rpca.html. In this joint embedding, we over-clustered the dataset with FindClusters(algorithm = 2, resolution = 1) and removed, as clusters biased toward low-quality cells, any cluster with more than 10% of cells flagged by miQC, scds, or DropletQC.
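The cluster-level filter described above amounts to computing, for each cluster, the fraction of its cells that were flagged and dropping clusters above 10%. A minimal Python sketch follows. It assumes a cell counts as flagged if any of the three tools flagged it, which the text does not spell out; the function name, cell identifiers, and cluster labels are hypothetical.

```python
from collections import defaultdict

MAX_FLAGGED_FRACTION = 0.10  # "more than 10% of cells" from the text


def clusters_to_remove(cluster_of_cell, flagged_cells):
    """Return clusters whose flagged fraction exceeds the cutoff.

    cluster_of_cell: dict mapping cell id -> cluster label
    flagged_cells:   set of cell ids flagged by any QC tool (assumed union)
    """
    total = defaultdict(int)
    flagged = defaultdict(int)
    for cell, cluster in cluster_of_cell.items():
        total[cluster] += 1
        if cell in flagged_cells:
            flagged[cluster] += 1
    return {c for c in total if flagged[c] / total[c] > MAX_FLAGGED_FRACTION}


# Toy data: cluster 0 has 10 cells (1 flagged), cluster 1 has 5 cells (2 flagged).
cells = {f"c0_{i}": 0 for i in range(10)}
cells.update({f"c1_{i}": 1 for i in range(5)})
flagged_ids = {"c0_0", "c1_0", "c1_1"}

removed = clusters_to_remove(cells, flagged_ids)
print(removed)
```

In the toy data, cluster 0 sits exactly at 10% and is kept because the text says "more than 10%".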
We annotated our cells from the human striatum against a recently published high-resolution snRNA-seq reference dataset of the non-human primate striatum [9] using Seurat v4. We downloaded the processed, annotated monkey snRNA-seq gene UMI counts for all cells and for medium spiny neurons (MSNs) from GSE167920. He, Kleyman et al. had aligned the snRNA-seq reads to the rheMac10 genome using the GRCh38 gene annotation lifted over to rheMac10, so the gene-wise labels represent the UMI counts on the rheMac10 genome most orthologous to human. For both the full-nuclei and MSN-subset datasets, we re-processed the macaque cells with SCTransform, glmGamPoi, and reciprocal PCA with default parameters as above to enable label transfer using the most recent integration algorithms in Seurat.
To transfer cell annotations from the reference macaque striatum dataset to the human striatum cells, we performed two label transfers at increasing resolutions: one with all cells and another with just MSNs. As He, Kleyman et al. described, the differences between transcriptionally and anatomically distinct MSN subtypes are subtle, so we split the annotation into two steps to optimally annotate the cells. The first label transfer assigns the cell classes (Oligodendrocytes, MSNs, Interneurons, etc.) from the macaque to the human dataset with the Seurat functions FindTransferAnchors(reduction = "rpca") and TransferData. Next, we identified cells or cell clusters that were labeled as MSNs and transferred MSN subtype labels (D1.Striosome, D2.Striosome, etc.) from the macaque to the human dataset. We filtered out cells whose cell class or cell subtype labels had a maximum prediction score below 0.5, as these tend to represent noisy predictions due to low-quality cells from either dataset. We confirmed accurate label transfer at the cell class and cell subtype levels with published marker genes and similar proportions across subjects and samples.
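The two-step filtering logic above can be sketched in plain Python, independent of Seurat. The sketch assumes each cell already has a dictionary of per-label prediction scores for the class step and, for cells called MSNs, a second dictionary for the subtype step. The 0.5 cutoff and the label names come from the text; the function names and score values are hypothetical, and this is not the Seurat implementation.

```python
from typing import Dict, Optional, Tuple

MIN_PREDICTION_SCORE = 0.5  # cutoff quoted in the text


def confident_label(scores: Dict[str, float]) -> Optional[str]:
    """Return the top-scoring label, or None if its score is below the cutoff."""
    if not scores:
        return None
    label, best = max(scores.items(), key=lambda kv: kv[1])
    return label if best >= MIN_PREDICTION_SCORE else None


def annotate_cell(class_scores: Dict[str, float],
                  subtype_scores: Dict[str, float]
                  ) -> Optional[Tuple[str, Optional[str]]]:
    """Two-step annotation: cell class first, then MSN subtype.

    Returns None when the cell is filtered out at either step.
    """
    cell_class = confident_label(class_scores)
    if cell_class is None:
        return None
    if cell_class != "MSNs":
        return (cell_class, None)
    subtype = confident_label(subtype_scores)
    if subtype is None:
        return None
    return (cell_class, subtype)


# Made-up prediction scores for three cells.
msn = annotate_cell({"MSNs": 0.9, "Interneurons": 0.1},
                    {"D1.Striosome": 0.7, "D2.Striosome": 0.3})
oligo = annotate_cell({"Oligodendrocytes": 0.8, "MSNs": 0.2}, {})
noisy = annotate_cell({"MSNs": 0.45, "Interneurons": 0.4}, {})
print(msn, oligo, noisy)
```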
Even with the robust cutoffs that we applied to this dataset to remove likely low-quality or doublet cells, we found a residual subset of the data that contains these cells. Upon clustering, doublet cells tend to project into UMAP space as long streaks between two well-defined cell types. Low-quality cells project into UMAP space as amorphous clusters without clear boundaries. Using these embedding features, we selected these clusters with Seurat’s FindClusters(resolution = 1) function, confirmed that they have the indicative QC metrics, and removed them from analyses.
Assembly: hg38
Supplementary files format and content: Annotated Seurat h5 object with raw, processed, and integrated gene counts by cell, aggregating samples across the entire dataset
Supplementary files format and content: Annotated Scanpy h5ad object with raw, processed, and integrated gene counts by cell, aggregating samples across the entire dataset

Submission date

Feb 13, 2023

Last update date

Nov 28, 2023

Contact name

BaDoi Nguyen Phan

E-mail(s)

badoi.phan@pitt.edu

Organization name

Carnegie Mellon University

Department

Computational Biology

Lab

Pfenning Lab

Street address

5000 Forbes Ave

City

Pittsburgh

State/province

ZIP/Postal code

15213

Country

USA

Platform ID

GPL18573

Series (2)

GSE225158	Transcriptional responses of the human dorsal striatum in opioid use disorder implicates cell type-specific programs
GSE233279	Single nuclei transcriptomics of human and monkey striatum implicates DNA damage, neuroinflammation, and neurodegeneration signaling in opioid use disorder

Relations

BioSample

SAMN33270001

SRA

SRX19352033

Supplementary data files not provided

Raw data are available in SRA

Processed data are available on Series record