Sample GSM4555887
Status Public on Jul 16, 2020
Title leptomeningeal metastasis Patient_A
Sample type SRA
 
Source name CSF cell fraction isolated from a patient with newly diagnosed leptomeningeal metastasis
Organism Homo sapiens
Characteristics disease state: leptomeningeal metastasis
metastatic site: leptomeninges/CSF
Extracted molecule polyA RNA
Extraction protocol CSF, collected with informed consent from patients under protocol IRB 13-039, was separated into cell-free CSF and the cellular fraction of the CSF. The whole CSF sample was centrifuged at 600 x g for 5 minutes without brake at 4 °C to pellet the cells, and the supernatant was saved as cell-free CSF. The pellet was resuspended, washed twice with PBS supplemented with 0.4% BSA, and processed immediately. The cells were counted manually with a hemocytometer. scRNA-Seq was performed with the 10X Genomics system using the Chromium Single Cell 3’ Library and Gel Bead Kit V2 (catalog no. 120234). Briefly, 8,700 cells (viability 70-80%) were processed per sample, targeting recovery of ~5,000 cells with a 3.9% multiplet rate. In cases where the cell count was too low to target 5,000 cells, the maximum volume (34 µl) was loaded into the microfluidic droplet generation device. After the reverse transcription reaction, emulsions were broken and barcoded cDNA was purified with DynaBeads, followed by 12 cycles of PCR amplification. The resulting amplified barcoded cDNA library was fragmented to ~400-600 bp, ligated to sequencing adapters, and PCR-amplified to obtain sufficient material for next-generation sequencing. The final libraries were sequenced on an Illumina NovaSeq 6000 system (Read 1 – 28 cycles, Index Read – 8 cycles, and Read 2 – 96 cycles).
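The loading figures quoted in the protocol imply some simple arithmetic, sketched below in Python. The variable names are illustrative and do not come from the protocol, and the expected-multiplet figure assumes that the 3.9% rate applies to the recovered cells.

```python
# Loading arithmetic based on the figures quoted in the protocol.
# Variable names are illustrative, not taken from the source.
cells_loaded = 8_700      # cells processed per sample
cells_targeted = 5_000    # approximate recovery target
multiplet_rate = 0.039    # quoted multiplet rate

# Fraction of loaded cells expected to be recovered.
recovery_fraction = cells_targeted / cells_loaded

# Assumption: the multiplet rate applies to the recovered cells.
expected_multiplets = multiplet_rate * cells_targeted

print(f"recovery fraction: {recovery_fraction:.3f}")
print(f"expected multiplets: {expected_multiplets:.0f}")
```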
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina NovaSeq 6000
 
Data processing Raw FASTQ files for each patient were preprocessed with the SEQC pipeline (2), using the hg38 human genome and the default SEQC parameters for 10X Genomics, to obtain the molecule count matrix. The SEQC pipeline aligns the reads to the genome, corrects barcode and unique molecular identifier (UMI) errors, resolves multi-mapping reads, and generates a molecule count matrix. SEQC also performs a number of filtering steps: (1) identification of true cells from the cumulative distribution of molecule counts per barcode, (2) removal of apoptotic cells, identified as cells with >20% of molecules derived from the mitochondria, and (3) removal of low-complexity cells, identified as cells where the detected molecules are aligned to a small subset of genes. In addition, cells with fewer than 800-1,000 detected molecules were filtered out. After filtering, we retained ~19,000 cells with a median molecule count of ~4,100 and a median gene count of ~1,200, indicating the high quality of the data. Each patient contributed from 1,800 to 5,000 single cells to this dataset.
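Two of the filters described above (mitochondrial fraction and minimum molecule count) can be sketched with NumPy as follows. This is not SEQC code: the function name, the `mito_genes` mask, and the default threshold of 800 molecules (the lower end of the quoted 800-1,000 range) are assumptions made for illustration.

```python
import numpy as np

def filter_cells(counts, mito_genes, max_mito_frac=0.20, min_molecules=800):
    """Return a boolean mask of cells that pass both filters.

    counts: (cells x genes) array of molecule counts.
    mito_genes: boolean array marking mitochondrial genes (hypothetical input).
    """
    counts = np.asarray(counts, dtype=float)
    mito_genes = np.asarray(mito_genes, dtype=bool)
    totals = counts.sum(axis=1)
    mito = counts[:, mito_genes].sum(axis=1)
    # Guard against division by zero for empty barcodes.
    mito_frac = np.divide(mito, totals, out=np.zeros_like(totals), where=totals > 0)
    # Keep cells at or below the mitochondrial cutoff with enough molecules.
    return (mito_frac <= max_mito_frac) & (totals >= min_molecules)

# Toy example: three cells, two genes, the second gene is mitochondrial.
toy_counts = np.array([[900, 100], [500, 500], [100, 10]])
toy_mask = filter_cells(toy_counts, mito_genes=[False, True])
```

In the toy example only the first cell passes: the second exceeds the mitochondrial cutoff and the third has too few molecules.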
Cell doublets are a characteristic error source in droplet-based single-cell sequencing data, where two cells are randomly co-encapsulated with the same barcode. To remove likely doublets, we employed an unsupervised machine learning classifier (3). This classifier operates on a count matrix and leverages the creation of in silico synthetic doublets to determine which cells in the input count matrix have gene expression that is best explained by a combination of distinct cell types in the matrix. Each patient was processed separately; cells with p-values <1e-7 were identified as doublets and removed. In total, we removed ~1,300 cells.
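The idea of in silico synthetic doublets can be illustrated with a minimal NumPy sketch that sums the counts of two distinct, randomly chosen cells. This only illustrates the doublet-simulation step in general terms; it is not the classifier cited as (3), whose implementation and p-value computation are not described in this record. The function name and parameters are hypothetical.

```python
import numpy as np

def make_synthetic_doublets(counts, n_doublets, seed=0):
    """Simulate doublets by adding the counts of two distinct random cells.

    Requires at least two cells. Returns the doublet profiles and the
    indices of the two parent cells for each doublet.
    """
    counts = np.asarray(counts)
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]
    first = rng.integers(0, n_cells, size=n_doublets)
    # An offset in [1, n_cells - 1] guarantees the second parent differs.
    offset = rng.integers(1, n_cells, size=n_doublets)
    second = (first + offset) % n_cells
    return counts[first] + counts[second], first, second

# Toy example: four cells, three genes.
toy = np.arange(12).reshape(4, 3)
doublets, parent_a, parent_b = make_synthetic_doublets(toy, n_doublets=10)
```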
Filtered count matrices for each patient were median size normalized. To avoid numerical issues, counts were multiplied by 10,000 and log transformed with a pseudocount of 1 using the SCANPY package (4). After normalization, count matrices were concatenated and batch-effect corrected with mnnCorrect (5, 6). As a reference, we selected the patient with the most cells (patient E). To facilitate identification of mutual nearest neighbors, we used the top 3,000 highly variable genes (HVGs) identified from all patients, and we opted for the non-cosine-normalized batch-effect-corrected output for downstream analyses.
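A rough sketch of size normalization followed by a log transform with a pseudocount of 1 is given below. It uses plain NumPy rather than SCANPY, and the function name is hypothetical. The record does not spell out how the median normalization and the 10,000 factor are combined, so the scaling target is exposed as a parameter: by default each cell is scaled to the median library size, and passing `target=10_000` scales each cell to 10,000 instead.

```python
import numpy as np

def size_normalize_log(counts, target=None):
    """Scale each cell to a common total, then apply log(1 + x).

    counts: (cells x genes) array; cells are assumed to have nonzero totals
    (empty barcodes should already have been filtered out).
    target: common per-cell total; defaults to the median library size.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if target is None:
        target = np.median(totals)
    scaled = counts / totals * target
    return np.log1p(scaled)  # log transform with a pseudocount of 1

# Toy example: library sizes 10, 20 and 30, so the median is 20.
toy = np.array([[1, 9], [10, 10], [15, 15]])
normalized = size_normalize_log(toy)
```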
Using the previously selected top 3,000 HVGs, the batch-effect-corrected gene expression matrix was decomposed using randomized principal component analysis (PCA). We retained 12 principal components, chosen at the knee point (minimum radius of curvature in eigenvalues). We clustered the PCA-reduced matrix using PhenoGraph (69), resulting in 18 clusters. The same principal components were used to construct a k-nearest-neighbor (kNN) graph based on Euclidean distance. This kNN graph was used to generate UMAP projections (66) using the SCANPY package. In addition, we applied MAGIC (7) to the PCA-reduced matrix to denoise the data and impute missing values. Note that all UMAP plots show post-MAGIC expression values.
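The PCA reduction and the Euclidean kNN step can be sketched in NumPy as below. This is a simplified stand-in, not the pipeline used for the dataset: it uses an exact SVD instead of randomized PCA, a brute-force pairwise distance matrix (suitable only for small inputs), and it does not reproduce the knee-point selection, PhenoGraph, UMAP, or MAGIC. Function names and the toy data are hypothetical.

```python
import numpy as np

def pca_scores(x, n_components):
    """Project rows of x onto the top principal components (exact SVD)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def knn_indices(points, k):
    """Return the indices of the k nearest neighbors by Euclidean distance."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

# Toy example: two well-separated groups of five points in four dimensions.
rng = np.random.default_rng(0)
toy = np.vstack([rng.normal(0, 1, (5, 4)), rng.normal(100, 1, (5, 4))])
scores = pca_scores(toy, n_components=2)
neighbors = knn_indices(scores, k=3)
```

In the toy data, each point's three nearest neighbors fall within its own group, which is the property a kNN graph passes on to downstream clustering and embedding.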
Genome_build: hg38
Supplementary_files_format_and_content: count matrices in comma separated value format
 
Submission date May 15, 2020
Last update date Jul 17, 2020
Contact name Jan Remsik
E-mail(s) jan.remsik@vib.be
Organization name VIB
Department Center for Cancer Biology
Street address Campus Gasthuisberg, Herestraat 49
City Leuven
ZIP/Postal code 3000
Country Belgium
 
Platform ID GPL24676
Series (2)
GSE150660 Single-cell atlas of human leptomeningeal metastasis
GSE150681 Cancer cells deploy lipocalin-2 to collect limiting iron in leptomeningeal metastasis
Relations
BioSample SAMN14933654
SRA SRX8349618

Supplementary file Size Download File type/resource
GSM4555887_Patient_A.csv.gz 9.5 Mb (ftp)(http) CSV
GSM4555887_Patient_A.h5ad.gz 694.8 Mb (ftp)(http) H5AD
Raw data are available in SRA
Processed data provided as supplementary file
