Status |
Public on Jul 16, 2020 |
Title |
leptomeningeal metastasis Patient_D |
Sample type |
SRA |
|
|
Source name |
CSF cell fraction isolated from a patient with newly diagnosed leptomeningeal metastasis
|
Organism |
Homo sapiens |
Characteristics |
disease state: leptomeningeal metastasis
metastatic site: leptomeninges/CSF
|
Extracted molecule |
polyA RNA |
Extraction protocol |
CSF, collected with informed consent from patients under protocol IRB 13-039, was separated into cell-free CSF and the cellular fraction of the CSF. The whole CSF sample was centrifuged at 600 x g for 5 minutes at 4 °C without brake to pellet the cells, and the supernatant was saved as cell-free CSF. The pellet was resuspended, washed twice with PBS supplemented with 0.4% BSA, and processed immediately. The cells were counted manually with a hemocytometer. scRNA-Seq was performed on the 10x Genomics system using the Chromium Single Cell 3’ Library and Gel Bead Kit V2 (catalog no. 120234). Briefly, 8,700 cells (viability 70-80%) were processed per sample, targeting recovery of ~5,000 cells with a 3.9% multiplet rate. In cases where the cell count was too low to target 5,000 cells, the maximum volume (34 µl) was loaded into the microfluidic droplet generation device. After the reverse transcription reaction, the emulsions were broken and the barcoded cDNA was purified with DynaBeads, followed by 12 cycles of PCR amplification. The resulting amplified barcoded cDNA library was fragmented to ~400-600 bp, ligated to sequencing adapters, and PCR-amplified to obtain sufficient material for next-generation sequencing. The final libraries were sequenced on an Illumina NovaSeq 6000 system (Read 1: 28 cycles, Index Read: 8 cycles, Read 2: 96 cycles).
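The loading figures in the protocol can be checked with simple arithmetic. The sketch below uses only the three numbers stated above (8,700 cells loaded, ~5,000 targeted, 3.9% multiplet rate); the variable names are illustrative, and the derived values are rough expectations rather than measurements from this sample.

```python
# Values taken directly from the extraction protocol text.
cells_loaded = 8_700      # cells processed per sample
cells_targeted = 5_000    # targeted recovery
multiplet_rate = 0.039    # stated multiplet rate at this target

# Implied fraction of loaded cells expected to be recovered.
recovery_fraction = cells_targeted / cells_loaded

# Expected number of recovered barcodes that contain more than one cell.
expected_multiplets = cells_targeted * multiplet_rate

print(f"implied recovery fraction: {recovery_fraction:.1%}")
print(f"expected multiplets among recovered cells: {expected_multiplets:.0f}")
```

The implied recovery is a little under 60% of the loaded cells, and roughly two hundred recovered barcodes per sample would be expected to be multiplets, which is why a dedicated doublet-removal step appears in the data processing.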
|
|
|
Library strategy |
RNA-Seq |
Library source |
transcriptomic |
Library selection |
cDNA |
Instrument model |
Illumina NovaSeq 6000 |
|
|
Data processing |
Raw FASTQ files for each patient were preprocessed with the SEQC pipeline (2), using the hg38 human genome and the default SEQC parameters for 10x Genomics data, to obtain the molecule count matrix. The SEQC pipeline aligns the reads to the genome, corrects barcode and unique molecular identifier (UMI) errors, resolves multi-mapping reads, and generates a molecule count matrix. SEQC also performs a number of filtering steps: (1) identification of true cells from the cumulative distribution of molecule counts per barcode, (2) removal of apoptotic cells, identified as cells with >20% of molecules derived from the mitochondria, and (3) removal of low-complexity cells, identified as cells in which the detected molecules align to a small subset of genes. In addition, cells with fewer than 800-1,000 detected molecules were filtered out. After filtering, we retained ~19,000 cells with a median molecule count of ~4,100 and a median gene count of ~1,200, indicating the high quality of the data. Each patient contributed between 1,800 and 5,000 single cells to this dataset. Cell doublets are a characteristic error source in droplet-based single-cell sequencing data, in which two cells are randomly co-encapsulated with the same barcode. To remove likely doublets, we employed an unsupervised machine learning classifier (3). This classifier operates on a count matrix and creates in silico synthetic doublets to determine which cells in the input count matrix have gene expression that is best explained by a combination of distinct cell types in the matrix. Each patient was processed separately; cells with p-values <1e-7 were identified as doublets and removed. In total, we removed ~1,300 cells. Filtered count matrices for each patient were median size normalized. To avoid numerical issues, counts were multiplied by 10,000 and log-transformed with a pseudocount of 1 using the SCANPY package (4).
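The threshold filters and the normalization described above can be illustrated with a minimal NumPy sketch. It is not the SEQC or SCANPY code. It assumes that "median size normalized" means rescaling each cell to the median library size. How the 10,000 multiplier combines with that scaling is not spelled out in the text, so the scaling target is left as a hypothetical `target_sum` parameter. The function names, the toy matrix, and the choice of 800 as the molecule cutoff (the text gives a range of 800-1,000) are illustrative assumptions.

```python
import numpy as np

def filter_cells(counts, is_mito, max_mito_frac=0.20, min_molecules=800):
    """Boolean mask of cells (rows) that pass the mitochondrial and depth cutoffs."""
    totals = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Guard against division by zero for empty barcodes.
    mito_frac = np.divide(mito, totals, out=np.zeros(len(totals)), where=totals > 0)
    return (totals >= min_molecules) & (mito_frac <= max_mito_frac)

def normalize_log(counts, target_sum=None):
    """Rescale each cell to target_sum (default: median library size), then log(1 + x)."""
    totals = counts.sum(axis=1, keepdims=True).astype(float)
    if target_sum is None:
        target_sum = np.median(totals)
    return np.log1p(counts / totals * target_sum)

# Toy matrix: 4 cells x 3 genes, with the last gene treated as mitochondrial.
counts = np.array([
    [900, 50, 50],     # 1,000 molecules, 5% mitochondrial: kept
    [500, 100, 400],   # 1,000 molecules, 40% mitochondrial: removed
    [100, 50, 10],     # 160 molecules: removed for low depth
    [1500, 400, 100],  # 2,000 molecules, 5% mitochondrial: kept
])
is_mito = np.array([False, False, True])

keep = filter_cells(counts, is_mito)
normalized = normalize_log(counts[keep])
print(keep)
print(normalized.round(3))
```

Passing `target_sum=10_000` instead of the default gives the per-10,000 scaling that is common in SCANPY workflows; which of the two the authors applied first is not stated here.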
After normalization, count matrices were concatenated and batch effects were corrected with mnnCorrect (5, 6). As the reference, we selected the patient with the most cells (patient E). To facilitate identification of mutual nearest neighbors, we used the top 3,000 highly variable genes (HVGs) identified across all patients, and we used the non-cosine-normalized batch-corrected output for downstream analyses. Using the same 3,000 HVGs, the batch-corrected gene expression matrix was decomposed by randomized principal component analysis (PCA). We retained 12 principal components based on the knee point (minimum radius of curvature in the eigenvalues). We clustered the PCA-reduced matrix using PhenoGraph (69), resulting in 18 clusters. The same principal components were used to construct a k-nearest-neighbor (kNN) graph based on Euclidean distance. This kNN graph was used to generate UMAP projections (66) using the SCANPY package. In addition, we applied MAGIC (7) to the PCA-reduced matrix to denoise the data and impute missing values. Note that all UMAP plots show post-MAGIC expression values.
Genome_build: hg38
Supplementary_files_format_and_content: count matrices in comma separated value format
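The dimensionality reduction and neighbor graph steps can be sketched in plain NumPy. This is an illustration under stated assumptions, not the authors' code: it uses an exact SVD in place of randomized PCA, a brute-force search in place of the SCANPY neighbor routines, and synthetic data in place of the batch-corrected matrix. The function names and the choice of k = 5 are hypothetical; only the 12 components and the Euclidean metric come from the text.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project cells (rows) onto the top principal components via exact SVD."""
    Xc = X - X.mean(axis=0)                      # center each gene
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def knn_indices(Z, k):
    """Indices of the k nearest neighbors of each row under Euclidean distance."""
    sq = (Z ** 2).sum(axis=1)
    # Squared distances preserve the Euclidean neighbor ranking.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    np.fill_diagonal(d2, np.inf)                 # a cell is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

# Synthetic stand-in: two well-separated groups of 20 cells, 50 genes each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 50)), rng.normal(8, 1, (20, 50))])

Z = pca_scores(X, n_components=12)   # 12 components, as retained in the text
neighbors = knn_indices(Z, k=5)      # k = 5 is an arbitrary illustrative choice
print(Z.shape, neighbors.shape)
```

On real data, a graph of this kind is the input to both the clustering and the UMAP projection, so the number of components and the value of k affect both results.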
|
|
|
Submission date |
May 15, 2020 |
Last update date |
Jul 17, 2020 |
Contact name |
Jan Remsik |
E-mail(s) |
jan.remsik@vib.be
|
Organization name |
VIB
|
Department |
Center for Cancer Biology
|
Street address |
Campus Gasthuisberg, Herestraat 49
|
City |
Leuven |
ZIP/Postal code |
3000 |
Country |
Belgium |
|
|
Platform ID |
GPL24676 |
Series (2) |
GSE150660 |
Single-cell atlas of human leptomeningeal metastasis |
GSE150681 |
Cancer cells deploy lipocalin-2 to collect limiting iron in leptomeningeal metastasis |
|
Relations |
BioSample |
SAMN14933656 |
SRA |
SRX8349621 |
Supplementary file | Size | Download | File type/resource |
GSM4555890_Patient_D.csv.gz | 4.0 Mb | (ftp)(http) | CSV |
GSM4555890_Patient_D.h5ad.gz | 333.8 Mb | (ftp)(http) | H5AD |
SRA Run Selector |
Raw data are available in SRA |
Processed data provided as supplementary file |