GEO Accession viewer

Sample GSM4555890

Status

Public on Jul 16, 2020

Title

leptomeningeal metastasis Patient_D

Sample type

SRA

Source name

CSF cell fraction isolated from a patient with newly diagnosed leptomeningeal metastasis

Organism

Homo sapiens

Characteristics

disease state: leptomeningeal metastasis
metastatic site: leptomeninges/CSF

Extracted molecule

polyA RNA

Extraction protocol

CSF, collected with informed consent from patients under protocol IRB 13-039, was processed to separate the sample into cell-free CSF and the cellular contents of the CSF. The whole CSF sample was centrifuged at 600 x g for 5 minutes without brake at 4 °C to pellet the cells, and the supernatant was saved as cell-free CSF. The pellet was resuspended, washed twice with PBS supplemented with 0.4% BSA, and processed immediately. The cells were counted manually with a hemocytometer. scRNA-Seq was performed with the 10X Genomics system using the Chromium Single Cell 3’ Library and Gel Bead Kit V2 (catalog no. 120234). Briefly, 8,700 cells (viability 70-80%) were processed per sample, targeting recovery of ~5,000 cells with a 3.9% multiplet rate. In cases where the cell count was too low to target 5,000 cells, the maximum volume (34 µl) was loaded into the microfluidic droplet generation device. After the reverse transcription reaction, emulsions were broken and barcoded cDNA was purified with DynaBeads, followed by 12 cycles of PCR amplification. The resulting amplified barcoded cDNA library was fragmented to ~400-600 bp, ligated to sequencing adapters, and PCR-amplified to obtain sufficient material for next-generation sequencing. The final libraries were sequenced on an Illumina NovaSeq 6000 system (Read 1 – 28 cycles, Index Read – 8 cycles, and Read 2 – 96 cycles).

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

Illumina NovaSeq 6000

Data processing

Raw FASTQ files for each patient were preprocessed with the SEQC pipeline (2), using the hg38 human genome and the default SEQC parameters for 10X Genomics, to obtain the molecule count matrix. The SEQC pipeline aligns the reads to the genome, corrects barcode and unique molecular identifier (UMI) errors, resolves multi-mapping reads, and generates a molecule count matrix. SEQC also performs a number of filtering steps: (1) identification of true cells from the cumulative distribution of molecule counts per barcode, (2) removal of apoptotic cells, identified as cells with >20% of molecules derived from the mitochondria, and (3) removal of low-complexity cells, identified as cells where the detected molecules are aligned to a small subset of genes. In addition, cells with fewer than 800-1,000 molecules detected were filtered out. After filtering, we retained ~19,000 cells with a median molecule count of ~4,100 and a median gene count of ~1,200, indicating the high quality of the data. Each patient contributed from 1,800 to 5,000 single cells to this dataset.
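The mitochondrial-fraction and minimum-molecule filters described above can be illustrated with a small NumPy sketch. This is not the SEQC implementation: the function name `filter_cells`, its parameters, and the toy matrix are hypothetical, and the 800-molecule cutoff is just the lower end of the 800-1,000 range quoted in the record.

```python
import numpy as np

def filter_cells(counts, is_mito, max_mito_frac=0.20, min_molecules=800):
    """Return a boolean mask of cells that pass both filters.

    counts:  (n_cells, n_genes) array of molecule counts.
    is_mito: boolean array of length n_genes marking mitochondrial genes.
    Thresholds are assumptions taken from the values quoted in the record.
    """
    counts = np.asarray(counts, dtype=float)
    is_mito = np.asarray(is_mito, dtype=bool)
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Avoid division by zero for barcodes with no molecules at all.
    mito_frac = np.divide(mito, total, out=np.zeros_like(total), where=total > 0)
    return (mito_frac <= max_mito_frac) & (total >= min_molecules)

# Toy example: 3 cells x 4 genes; the last gene is mitochondrial.
counts = np.array([
    [500, 300, 150, 50],   # 1,000 molecules, 5% mitochondrial
    [400, 200, 100, 300],  # 1,000 molecules, 30% mitochondrial
    [100, 50, 40, 10],     # 200 molecules in total
])
is_mito = np.array([False, False, False, True])
keep = filter_cells(counts, is_mito)
filtered = counts[keep]
```

In this toy matrix only the first cell passes: the second exceeds the 20% mitochondrial threshold and the third has too few molecules.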
Cell doublets are a characteristic error source in droplet-based single-cell sequencing data, where two cells are randomly co-encapsulated with the same barcode. To remove likely doublets, we employed an unsupervised machine learning classifier (3). This classifier operates on a count matrix and creates in silico synthetic doublets to determine which cells in the input count matrix have gene expression that is best explained by the combination of distinct cell types in the matrix. Each patient was processed separately; cells with p-values <1e-7 were identified as doublets and removed. In total, we removed ~1,300 cells.
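The idea of in silico synthetic doublets can be sketched as follows. This covers only the simulation step, not the classifier cited as (3) or its p-value computation. Summing the raw counts of two randomly chosen parent cells is an assumed combination rule, and `make_synthetic_doublets` is a hypothetical helper name.

```python
import numpy as np

def make_synthetic_doublets(counts, n_doublets, seed=0):
    """Create synthetic doublets by summing the counts of two distinct cells.

    counts: (n_cells, n_genes) array of molecule counts for one patient.
    Returns the doublet count matrix and the indices of the parent cells.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    n_cells = counts.shape[0]
    # Draw two different parent cells for every synthetic doublet.
    parents = np.array(
        [rng.choice(n_cells, size=2, replace=False) for _ in range(n_doublets)],
        dtype=int,
    ).reshape(-1, 2)
    doublets = counts[parents[:, 0]] + counts[parents[:, 1]]
    return doublets, parents

# Toy example: 5 cells x 3 genes of Poisson-distributed counts.
toy_counts = np.random.default_rng(1).poisson(5, size=(5, 3))
doublets, parents = make_synthetic_doublets(toy_counts, n_doublets=4)
```

A classifier of this kind would then compare real cells against such simulated profiles; cells that look more like a simulated doublet than like a single cell type are candidates for removal.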
Filtered count matrices for each patient were median size normalized. To avoid numerical issues, counts were multiplied by 10,000 and log-transformed with a pseudocount of 1 using the SCANPY package (4). After normalization, count matrices were concatenated and batch-effect corrected with mnnCorrect (5, 6). As a reference, we selected the patient with the most cells (patient E). To facilitate identification of mutual nearest neighbors, we used the top 3,000 highly variable genes (HVGs) identified from all patients, and we opted for the non-cosine-normalized batch-effect-corrected output for downstream analyses.
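One literal reading of the normalization steps above is sketched here in plain NumPy rather than through SCANPY. The record does not state exactly how the 10,000 multiplier combines with the median scaling, so the order used below is an assumption, and both function names are hypothetical.

```python
import numpy as np

def median_size_normalize(counts):
    """Rescale every cell so its total equals the median library size.

    counts: (n_cells, n_genes) array; cells with zero molecules are assumed
    to have been filtered out beforehand.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return counts / totals * np.median(totals)

def scale_and_log(values, factor=10_000, pseudocount=1.0):
    """Multiply by a constant factor, then take log(x + pseudocount)."""
    return np.log(np.asarray(values, dtype=float) * factor + pseudocount)

# Toy example: three cells with library sizes 2, 4 and 6 (median 4).
toy = np.array([[1, 1], [2, 2], [3, 3]])
normalized = median_size_normalize(toy)
logged = scale_and_log(normalized)
```

After median size normalization every cell in the toy matrix has the same total (4), and a zero count still maps to zero after the log transform because of the pseudocount of 1.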
Using the previously selected top 3,000 HVGs, the batch-effect-corrected gene expression matrix was decomposed using randomized principal component analysis (PCA). We retained 12 principal components, chosen at the knee point (minimum radius of curvature in the eigenvalues). We clustered the PCA-reduced matrix using PhenoGraph (69), resulting in 18 clusters. The same principal components were used to construct a k-nearest-neighbor (kNN) graph based on Euclidean distance. This kNN graph was used to generate UMAP projections (66) using the SCANPY package. In addition, we applied MAGIC (7) to the PCA-reduced matrix to denoise the data and impute missing values. Note that all UMAP plots show post-MAGIC expression values.
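Choosing the number of components at the knee point can be illustrated with a finite-difference curvature estimate. The record only says "minimum radius of curvature in eigenvalues", so this is a sketch under assumptions: rescaling both axes to [0, 1] and using `numpy.gradient` are choices made here, and `knee_point` is a hypothetical helper. The minimum radius of curvature corresponds to the maximum curvature.

```python
import numpy as np

def knee_point(eigenvalues):
    """Return the 0-based index where the eigenvalue curve bends most sharply.

    eigenvalues: 1-D sequence in decreasing order with at least three
    distinct values (a constant curve has no knee and is not handled).
    """
    y = np.asarray(eigenvalues, dtype=float)
    x = np.arange(len(y), dtype=float)
    # Rescale both axes to [0, 1] so curvature does not depend on units.
    x = x / x[-1]
    y = (y - y.min()) / (y.max() - y.min())
    dy = np.gradient(y, x)     # first derivative
    d2y = np.gradient(dy, x)   # second derivative
    curvature = np.abs(d2y) / (1.0 + dy ** 2) ** 1.5
    return int(np.argmax(curvature))

# Toy spectrum: a steep drop over the first components, then a flat tail.
toy_eigenvalues = [100, 80, 60, 40, 20, 2, 1.9, 1.8, 1.7, 1.6, 1.5, 1.4]
knee = knee_point(toy_eigenvalues)
```

Because the derivatives are finite-difference estimates, the returned index lands near the bend rather than exactly on it. How the index maps to the number of retained components is a convention the record does not specify.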
Genome_build: hg38
Supplementary_files_format_and_content: count matrices in comma separated value format

Submission date

May 15, 2020

Last update date

Jul 17, 2020

Contact name

Jan Remsik

E-mail(s)

jan.remsik@vib.be

Organization name

VIB

Department

Center for Cancer Biology

Street address

Campus Gasthuisberg, Herestraat 49

City

Leuven

ZIP/Postal code

3000

Country

Belgium

Platform ID

GPL24676

Series (2)

GSE150660	Single-cell atlas of human leptomeningeal metastasis
GSE150681	Cancer cells deploy lipocalin-2 to collect limiting iron in leptomeningeal metastasis

Relations

BioSample

SAMN14933656

SRA

SRX8349621

Supplementary file	Size	Download	File type/resource
GSM4555890_Patient_D.csv.gz	4.0 Mb	(ftp)(http)	CSV
GSM4555890_Patient_D.h5ad.gz	333.8 Mb	(ftp)(http)	H5AD
Raw data are available in SRA
Processed data provided as supplementary file