Sample GSM7040221
Status Public on Nov 28, 2023
Title C-1366
Sample type SRA
 
Source name postmortem brain tissue
Organism Homo sapiens
Characteristics tissue: Caudate
cell type: striatal cells
pair: 1
subject: 1366
Sex: F
race: W
age: 35
bmi: 35.2
pmi: 11
ph: 6.1
rin: 7.1
tissue.storage.time.mo: 149
dx oud: OUD
dx substances: Poly-SUD
dx comorbid: Pain Disorder
dur.oud: 8
dsm.iv.sud: Opioid Dependence; Sedative or Hypnotic or Anxiolytic Dependence
dsm.iv.psych: Pain Disorder Associated with Both Psychological Factors and a General Medical Condition
blood.toxicology: Methadone, Tramadol, Alprazolam, Diphenhydramine
infxn.dx: Asthma; Recurrent Bronchitis
medications.atodc: B, O, Z
tobacco.atod: Y
manner.of.death: Accidental
cause.of.death: Combined Drug Overdose
Extracted molecule nuclear RNA
Extraction protocol Nuclei were isolated from 24 samples of frozen human postmortem brain tissue. Samples weighing 40 mg were homogenized in 5 mL glass douncers with 1 mL of lysis buffer containing DAPI, using glass pestles with ~10 strokes per pestle. The homogenate was filtered through a 40 µm mesh strainer (FisherScientific #48680). Nuclei were sorted for DAPI fluorescence on a BD FACS Aria at the Boston University Flow Cytometry Core. Approximately 100,000 nuclei were sorted into 7 µl of 0.04% bovine serum albumin (MilliporeSigma #126615) in phosphate-buffered saline (ThermoFisher #10010031). Nuclei were counted using a hemocytometer and assessed for concentration and debris. We targeted 7,000 nuclei per sample, except for two samples with lower concentrations, for which we targeted 5,000 nuclei. Samples were processed on the 10x Chromium, and next-generation sequencing libraries were prepared using the 10x Genomics Single Cell 3’ Gene Expression dual index kit.
Libraries were sequenced at the Boston University Single Cell Sequencing Core. The pool of snRNA-seq libraries was sequenced on 7 NextSeq P3 flow cells with an intermediate re-pooling scheme to optimize for 50-80% sequencing saturation and > 8,000 average UMI per cell. Between sequencing runs, we preliminarily aligned the sequencing reads as outlined below to assess quality per sample and to estimate the number of viable nuclei and sample complexity. We identified two samples, C-13291 and C-612, with poor QC metrics due to wetting failures (mean UMI per cell <1,000 and estimated number of cells >50,000). These samples were excluded from subsequent re-pooling and further analyses (Supplemental Table 1.3 tab STARsolo QC).
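The exclusion rule for wetting failures can be sketched as a simple per-sample check. This is a minimal illustration, not the submitters' code: the `SampleQC` class, the function names, and the example metric values are hypothetical, and the sketch assumes both conditions must hold together, as the wording of the record suggests.

```python
from dataclasses import dataclass

# Thresholds quoted in the record; the constant names are illustrative.
MIN_MEAN_UMI_PER_CELL = 1_000
MAX_ESTIMATED_CELLS = 50_000


@dataclass
class SampleQC:
    """Per-sample summary from a preliminary alignment (hypothetical container)."""
    sample_id: str
    mean_umi_per_cell: float
    estimated_cells: int


def is_low_quality(qc: SampleQC) -> bool:
    """Flag a sample with low mean UMI per cell AND an inflated cell estimate."""
    return (
        qc.mean_umi_per_cell < MIN_MEAN_UMI_PER_CELL
        and qc.estimated_cells > MAX_ESTIMATED_CELLS
    )


def samples_to_keep(samples):
    """Return the IDs of samples that pass the check, preserving input order."""
    return [s.sample_id for s in samples if not is_low_quality(s)]


# Made-up metrics for illustration only; these are not values from the dataset.
example = [
    SampleQC("sample_A", mean_umi_per_cell=9500.0, estimated_cells=6800),
    SampleQC("sample_B", mean_umi_per_cell=640.0, estimated_cells=71000),
]
print(samples_to_keep(example))
```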
snRNA-seq
 
Library strategy RNA-Seq
Library source transcriptomic single cell
Library selection cDNA
Instrument model Illumina NextSeq 500
 
Data processing We aligned single-nucleus RNA-seq (snRNA-seq) reads to the human genome (GRCh38.p13) for each output with the turn-key single-cell transcriptomics method STARsolo (v2.7.9a), which is several-fold faster than the Cell Ranger pipeline and equally accurate [1]. We chose parameters for the STARsolo UMI quantification to closely replicate the 10x Cell Ranger pipeline v6 and used the filtered genome and gene annotation available from 10x Genomics (https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest, Human reference 2020-A). We ran STARsolo to allow for pre-mRNA gene counts as well as exonic counts for nuclear RNA and to separately count introns and exons for RNA velocity analyses (--soloFeatures GeneFull Velocyto). We used the following parameters to correct cell barcodes, de-duplicate transcripts by their unique molecular identifier (UMI), assign UMI counts to genes, and pre-filter cells that are likely empty droplets (--soloType Droplet --soloCBmatchWLtype 1MM --soloCellFilter EmptyDrops_CR --soloMultiMappers EM --soloUMIdedup 1MM_CR).
We pre-processed the UMI gene x cell count matrix to reduce inherent biases in the technology. We identified likely ambient RNA contamination with SoupX [2], empty droplets with DropletQC [3], doublets with scds [4], and damaged nuclei with miQC [5]. For each of these analyses, each sample (GEM well) was analyzed separately. We ran SoupX as described to estimate the fraction of ambient RNA from both the raw (unfiltered) and filtered UMI count matrices from STARsolo and to remove ambient RNA in a manner aware of the cell clusters in the filtered matrix. For the SoupX analyses only, we clustered the cells with Seurat v4 [6] using FindClusters(algorithm = 2, resolution = 0.5). For DropletQC, we used the intronic and exonic UMI counts per cell per gene from STARsolo to compute the fraction of intronic UMIs per cell (referred to as the nuclear fraction).
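The STARsolo options quoted above can be assembled into a single command line, sketched here in Python. Only the `--solo*` values come from the record; the function name and all file paths are hypothetical placeholders, `--genomeDir`, `--readFilesIn`, and `--soloCBwhitelist` are standard STAR options added so the command is complete, and options the record does not list (such as barcode and UMI lengths or compressed-input handling) are left out.

```python
import shlex


def build_starsolo_command(genome_dir, cdna_fastq, barcode_fastq, whitelist):
    """Assemble a STAR argument list using the solo options quoted in the record."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        # STARsolo expects the cDNA read first, then the barcode+UMI read.
        "--readFilesIn", cdna_fastq, barcode_fastq,
        "--soloType", "Droplet",
        "--soloCBwhitelist", whitelist,
        # Pre-mRNA (intron + exon) gene counts plus velocity-style counts.
        "--soloFeatures", "GeneFull", "Velocyto",
        "--soloCBmatchWLtype", "1MM",
        "--soloCellFilter", "EmptyDrops_CR",
        "--soloMultiMappers", "EM",
        "--soloUMIdedup", "1MM_CR",
    ]


# Hypothetical paths, for illustration only.
cmd = build_starsolo_command(
    genome_dir="/path/to/star_index",
    cdna_fastq="sample_R2.fastq",
    barcode_fastq="sample_R1.fastq",
    whitelist="/path/to/barcode_whitelist.txt",
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so it can be passed directly to `subprocess.run(cmd, check=True)` without shell quoting issues.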
We identified empty droplets with default DropletQC parameters (nf_rescue = 0.50, umi_rescue = 1000). We identified doublets with the scds hybrid algorithm, using the function cxds_bcds_hybrid to estimate doublet scores, and called doublets on cells with scds.hybrid_score > 1.0. We identified damaged cells with miQC, which uses a Bayesian EM algorithm to learn the relationship between mitochondrial UMI counts and the number of captured genes. We used a posterior probability cutoff of 0.75 to call damaged cells with miQC.
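The per-cell quantities described above reduce to a ratio and two threshold comparisons, sketched below. This is an illustration under stated assumptions and does not reimplement DropletQC, scds, or miQC: the function names are hypothetical, the inputs are assumed to be values those tools already produced, returning 0.0 for a cell with no counts is a convention chosen here, and treating the miQC cutoff as strictly greater than 0.75 is an assumption.

```python
# Cutoffs quoted in the record; the constant names are illustrative.
DOUBLET_SCORE_CUTOFF = 1.0
MIQC_POSTERIOR_CUTOFF = 0.75


def nuclear_fraction(intronic_umi: int, exonic_umi: int) -> float:
    """Fraction of a cell's UMIs that are intronic (the 'nuclear fraction')."""
    total = intronic_umi + exonic_umi
    # Convention chosen for this sketch: a cell with no counts gets 0.0.
    return intronic_umi / total if total > 0 else 0.0


def is_doublet(hybrid_score: float) -> bool:
    """Call a doublet when the scds hybrid score exceeds the cutoff."""
    return hybrid_score > DOUBLET_SCORE_CUTOFF


def is_damaged(posterior_compromised: float) -> bool:
    """Call a damaged cell when the miQC posterior exceeds the cutoff (assumed strict)."""
    return posterior_compromised > MIQC_POSTERIOR_CUTOFF


# Made-up values for a single cell, for illustration only.
print(nuclear_fraction(intronic_umi=750, exonic_umi=250))
print(is_doublet(1.3), is_damaged(0.4))
```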
To combine cells across samples, we normalized the UMI counts with the variance-stabilizing SCTransform and glmGamPoi on each sample [7,8] and jointly embedded cells across samples with reciprocal PCA integration [6], as outlined in https://satijalab.org/seurat/articles/integration_rpca.html. In this joint embedding, we over-clustered the dataset with FindClusters(algorithm = 2, resolution = 1) and removed any cluster with more than 10% of cells flagged by miQC, scds, or DropletQC, treating such clusters as biased toward low-quality cells.
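The cluster-level removal rule can be sketched as follows. The function and variable names are hypothetical, and the sketch assumes that a cell counts as flagged if any of the three tools flagged it; the record does not say whether the 10% threshold was applied to the union of flags or to each tool separately.

```python
from collections import defaultdict

# Threshold quoted in the record; the constant name is illustrative.
MAX_FLAGGED_FRACTION = 0.10


def clusters_to_remove(cluster_ids, flagged, max_fraction=MAX_FLAGGED_FRACTION):
    """Return the set of clusters in which more than max_fraction of cells are flagged.

    cluster_ids: one cluster label per cell.
    flagged: one boolean per cell, True if any QC tool flagged the cell
             (union of flags is an assumption of this sketch).
    """
    totals = defaultdict(int)
    n_flagged = defaultdict(int)
    for cluster, is_flagged in zip(cluster_ids, flagged):
        totals[cluster] += 1
        n_flagged[cluster] += bool(is_flagged)
    return {c for c in totals if n_flagged[c] / totals[c] > max_fraction}


# Toy data: cluster "a" has 2 of 10 cells flagged, cluster "b" has 1 of 10.
clusters = ["a"] * 10 + ["b"] * 10
flags = [True, True] + [False] * 8 + [True] + [False] * 9
print(clusters_to_remove(clusters, flags))
```

In the toy data only cluster "a" is removed, since exactly 10% flagged cells does not exceed the threshold.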
We annotated our cells from the human striatum against a recently published high-resolution snRNA-seq reference dataset of the non-human primate striatum [9] using Seurat v4. We downloaded the processed, annotated monkey snRNA-seq gene UMI counts for all cells and for medium spiny neurons (MSNs) from GSE167920. He, Kleyman et al. had aligned the snRNA-seq reads to the rheMac10 genome using the GRCh38 gene annotation lifted over to rheMac10, so the gene-wise labels represent the UMI counts on the rheMac10 genome most orthologous to human. For both the full-nuclei and MSN-subset datasets, we re-processed the macaque cells with SCTransform, glmGamPoi, and reciprocal PCA with default parameters as above to enable label transfer using the most recent integration algorithms in Seurat.
To transfer cell annotations from the reference macaque striatum dataset to the human striatum cells, we performed two label transfers at increasing resolution: one with all cells and another with just MSNs. As He, Kleyman et al. described, the differences between transcriptionally and anatomically distinct MSN subtypes are subtle, so we split the annotation into two steps to optimally annotate the cells. The first step transferred the cell classes (Oligodendrocytes, MSNs, Interneurons, etc.) from the macaque to the human dataset with the Seurat functions FindTransferAnchors(reduction = "rpca") and TransferData. Next, we identified cells or cell clusters that were labeled as MSNs and transferred MSN subtype labels (D1.Striosome, D2.Striosome, etc.) from the macaque to the human dataset. We filtered out cells whose cell class or cell subtype labels had maximum prediction scores below 0.5, as these tend to represent noisy predictions due to low-quality cells in either dataset. We confirmed accurate label transfer at the cell class and cell subtype levels with published marker genes and similar proportions across subjects and samples.
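The prediction-score filter applied after each label transfer can be sketched as below. This is an illustration only: the function name and the example scores are hypothetical, and the per-label scores are assumed to have been produced already by the label-transfer step.

```python
# Cutoff quoted in the record; the constant name is illustrative.
MIN_PREDICTION_SCORE = 0.5


def confident_label(scores):
    """Return the top-scoring label, or None if its score is below the cutoff.

    scores: mapping from candidate label to prediction score for one cell.
    """
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score >= MIN_PREDICTION_SCORE else None


# Made-up prediction scores for two cells, for illustration only.
clear_cell = {"MSNs": 0.82, "Oligodendrocytes": 0.10, "Interneurons": 0.08}
noisy_cell = {"MSNs": 0.40, "Oligodendrocytes": 0.35, "Interneurons": 0.25}
print(confident_label(clear_cell), confident_label(noisy_cell))
```

Cells for which the function returns `None` correspond to those filtered out; in the two-step scheme the same check would be applied once to the cell class scores and again to the MSN subtype scores.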
Even with the robust cutoffs that we applied to this dataset to remove likely low-quality or doublet cells, we found a residual subset of the data that contained such cells. Upon clustering, doublet cells tend to project into UMAP space as long streaks between two well-defined cell types, whereas low-quality cells project as amorphous clusters without clear boundaries. Using these embedding features, we selected these clusters with Seurat’s FindClusters(resolution = 1) function, confirmed that they had the indicative QC metrics, and removed them from analyses.
Assembly: hg38
Supplementary files format and content: Annotated Seurat h5 object with raw, processed, and integrated gene-by-cell counts, aggregating samples across the entire dataset
Supplementary files format and content: Annotated Scanpy h5ad object with raw, processed, and integrated gene-by-cell counts, aggregating samples across the entire dataset
 
Submission date Feb 13, 2023
Last update date Nov 28, 2023
Contact name BaDoi Nguyen Phan
E-mail(s) badoi.phan@pitt.edu
Organization name Carnegie Mellon University
Department Computational Biology
Lab Pfenning Lab
Street address 5000 Forbes Ave
City Pittsburgh
State/province PA
ZIP/Postal code 15213
Country USA
 
Platform ID GPL18573
Series (2)
GSE225158 Transcriptional responses of the human dorsal striatum in opioid use disorder implicates cell type-specific programs
GSE233279 Single nuclei transcriptomics of human and monkey striatum implicates DNA damage, neuroinflammation, and neurodegeneration signaling in opioid use disorder
Relations
BioSample SAMN33270020
SRA SRX19352014

Supplementary data files not provided
Raw data are available in SRA
Processed data are available on Series record
