Status: Public on Aug 08, 2023
Title: Multimodal hierarchical classification allows for efficient annotation of CITE-seq data
Organism: Homo sapiens
Experiment type: Expression profiling by high throughput sequencing; Other

Summary
Single-cell RNA sequencing (scRNA-seq) is an invaluable tool for profiling cells in complex tissues and dissecting activation states that lack well-defined surface protein expression. For immune cells, however, the transcriptomic profile captured by scRNA-seq cannot always identify cell states and subsets defined by conventional flow cytometry. Emerging technologies have enabled multimodal sequencing of single cells, such as paired sequencing of the transcriptome and surface proteome by CITE-seq, but integrating these high-dimensional modalities for accurate cell-type annotation remains a challenge in the field. Here, we describe a machine learning tool called MultiModal Classifier Hierarchy (MMoCHi) for the cell-type annotation of CITE-seq data. The classifier involves four steps: 1) we use landmark registration to remove batch-related staining artifacts in CITE-seq protein expression; 2) the user defines a hierarchy of classifications based on cell type similarity and ontology, and provides markers (protein or gene expression) for identifying ground truth populations within the dataset by threshold gating; 3) progressing through this user-defined hierarchy, we train a random forest classifier using all available modalities (surface proteome and transcriptome data); and 4) we use these forests to predict cell types across the entire dataset. Applying MMoCHi to CITE-seq data of immune cells isolated from eight distinct tissue sites of two human organ donors yields high-purity cell-type annotations encompassing the broad array of immune cell states in the dataset. These include T and B cell memory subsets, macrophages and monocytes, and natural killer cells, as well as rare populations of plasmacytoid dendritic cells, innate T cells, and innate lymphoid cell subsets. We also validate the use of feature importances extracted from the classifier hierarchy to select robust genes for improved identification of T cell memory subsets by scRNA-seq.
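Landmark registration (step 1) can be illustrated without any dependencies. In the minimal sketch below, each batch's landmarks for one protein (for example, the positions of its negative and positive density peaks) are assumed to be known already; each batch is then warped piecewise-linearly so that its landmarks land on the cross-batch mean. The function names are hypothetical, peak detection is omitted, and this is not MMoCHi's implementation.

```python
from statistics import mean


def piecewise_linear_warp(x, src, dst):
    """Map x so that each landmark src[k] lands on dst[k].

    src and dst are increasing lists of equal length (at least 2). Values
    outside the outermost landmarks are extrapolated with the slope of the
    nearest segment.
    """
    i = 0
    while i < len(src) - 2 and x > src[i + 1]:
        i += 1  # advance to the segment that contains x
    x0, x1 = src[i], src[i + 1]
    y0, y1 = dst[i], dst[i + 1]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def register_batches(batches, landmarks):
    """Align one protein's expression values across batches.

    batches:   dict batch_id -> list of expression values for one ADT
    landmarks: dict batch_id -> increasing landmark positions for that batch
               (assumed already detected, e.g. [negative_peak, positive_peak])
    Returns a dict batch_id -> registered values.
    """
    ids = list(batches)
    n = len(landmarks[ids[0]])
    # Target positions: the mean of each landmark across all batches.
    target = [mean(landmarks[b][k] for b in ids) for k in range(n)]
    return {
        b: [piecewise_linear_warp(v, landmarks[b], target) for v in batches[b]]
        for b in ids
    }


if __name__ == "__main__":
    # Toy data: batch B stains more brightly than batch A.
    batches = {"A": [0.5, 1.0, 3.0, 4.0], "B": [1.5, 2.0, 5.0, 6.0]}
    landmarks = {"A": [1.0, 4.0], "B": [2.0, 6.0]}
    registered = register_batches(batches, landmarks)
    # Negative peaks (A: 1.0, B: 2.0) both map to 1.5; positive peaks to 5.0.
    print(registered)
```

Anchoring to the cross-batch mean keeps registered values on roughly the original scale; registering to a chosen reference batch would work the same way with `target` set to that batch's landmarks.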
Together, MMoCHi provides a comprehensive system of tools for the batch correction and cell-type annotation of CITE-seq data. Moreover, it offers flexibility in classification hierarchy design, allowing cell-type annotations to reflect a researcher’s specific experimental design. This flexibility also makes MMoCHi readily extendable beyond immune cell annotation and potentially adaptable to other sequencing modalities.
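The flow of steps 2 to 4 in the Summary (gate ground truth, train, predict, descend the hierarchy) can be sketched in plain Python. This is an illustration under simplifying assumptions, not MMoCHi's API: the hierarchy, marker names, and thresholds are hypothetical; ground truth at each node is the set of cells passing exactly one child's gate; and a nearest-centroid classifier stands in for the random forest, purely to keep the example dependency-free. At each node, only cells already assigned to that node are used for training and prediction.

```python
from math import dist

# Hypothetical user-defined hierarchy (illustrative markers and names only).
# Each parent node maps its child classes to threshold gates: "+" means the
# marker must be above its threshold, "-" means at or below it.
HIERARCHY = {
    "All": {
        "T cell": {"CD3": "+", "CD19": "-"},
        "B cell": {"CD19": "+", "CD3": "-"},
    },
    "T cell": {
        "CD4 T": {"CD4": "+", "CD8": "-"},
        "CD8 T": {"CD8": "+", "CD4": "-"},
    },
}
# Hypothetical gating thresholds, in the units of the (registered) data.
THRESHOLDS = {"CD3": 1.0, "CD19": 1.0, "CD4": 1.0, "CD8": 1.0}
FEATURES = list(THRESHOLDS)  # fixed feature order for the classifier


def passes_gate(cell, gate):
    """True if the cell satisfies every "+"/"-" condition in the gate."""
    return all(
        (cell[m] > THRESHOLDS[m]) if sign == "+" else (cell[m] <= THRESHOLDS[m])
        for m, sign in gate.items()
    )


def features(cell):
    """Feature vector in a fixed marker order."""
    return [cell[m] for m in FEATURES]


def train_node(cells, children):
    """Fit one centroid per child from unambiguously gated cells.

    A nearest-centroid model stands in for the random forest here.
    """
    groups = {child: [] for child in children}
    for cell in cells:
        hits = [ch for ch, gate in children.items() if passes_gate(cell, gate)]
        if len(hits) == 1:  # skip cells that pass no gate or several gates
            groups[hits[0]].append(features(cell))
    return {
        ch: [sum(col) / len(col) for col in zip(*rows)]
        for ch, rows in groups.items()
        if rows
    }


def classify(cells, root="All"):
    """Walk the hierarchy top-down, training and predicting at each node."""
    labels = [root] * len(cells)
    queue = [root]
    while queue:
        node = queue.pop(0)
        children = HIERARCHY.get(node)
        if not children:
            continue  # leaf node: nothing further to resolve
        idx = [i for i, lab in enumerate(labels) if lab == node]
        centroids = train_node([cells[i] for i in idx], children)
        if not centroids:
            continue  # no ground truth here; cells keep the parent label
        # Predict every cell at this node, including cells that passed no gate.
        for i in idx:
            x = features(cells[i])
            labels[i] = min(centroids, key=lambda ch: dist(x, centroids[ch]))
        queue.extend(children)
    return labels


if __name__ == "__main__":
    cells = [
        {"CD3": 3.0, "CD19": 0.1, "CD4": 2.5, "CD8": 0.2},
        {"CD3": 2.8, "CD19": 0.2, "CD4": 0.1, "CD8": 2.9},
        {"CD3": 0.1, "CD19": 2.7, "CD4": 0.2, "CD8": 0.1},
        {"CD3": 1.0, "CD19": 0.3, "CD4": 1.8, "CD8": 0.3},  # fails the CD3 gate
    ]
    print(classify(cells))  # ['CD4 T', 'CD8 T', 'B cell', 'CD4 T']
```

The last toy cell falls outside every gate at the root yet is still labeled, which mirrors the point of step 4: gating supplies training data, while the trained classifier annotates the whole dataset.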

Overall design
We performed CITE-seq on immune cell populations from human blood and different human organ donor tissues.

Contributor(s): Caron D, Wells S, Szabo P, Chen D, Farber D, Sims PA
Citation(s): PMID 37461466
Submission date: Apr 14, 2023
Last update date: Aug 08, 2023
Contact name: Peter A Sims
E-mail(s): pas2182@columbia.edu
Organization name: Columbia University
Street address: 3960 Broadway, Lasker 203AC
City: New York
State/province: NY
ZIP/Postal code: 10032
Country: USA

Platforms (2)
GPL18573: Illumina NextSeq 500 (Homo sapiens)
GPL24676: Illumina NovaSeq 6000 (Homo sapiens)
Samples: 102
Relations: BioProject PRJNA955827

Supplementary file                    Size       File type/resource
GSE229791_D496.adt.matrix.txt.gz      11.9 Mb    TXT
GSE229791_D496.gex.matrix.txt.gz      276.3 Mb   TXT
GSE229791_D503.adt.matrix.txt.gz      11.0 Mb    TXT
GSE229791_D503.gex.matrix.txt.gz      300.5 Mb   TXT
GSE229791_PDC101.adt.matrix.txt.gz    1.7 Mb     TXT
GSE229791_PDC101.gex.matrix.txt.gz    17.7 Mb    TXT

Raw data are available in SRA via the SRA Run Selector.
Processed data are available on the Series record.
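The processed matrices listed above are gzip-compressed text files, but their exact layout is not described on this record. The reader below is therefore a minimal sketch under stated assumptions: delimited text (tab by default), one header line of column labels, and each later line starting with a row label followed by numeric values. Whether features or cell barcodes run along the rows should be checked against the files themselves; the function name `read_matrix` is hypothetical.

```python
import csv
import gzip


def read_matrix(path, delimiter="\t"):
    """Read a gzip-compressed text matrix into (row_labels, col_labels, values).

    Assumed layout (verify against the actual file): the first line holds
    column labels, and every later line is a row label followed by numbers.
    """
    with gzip.open(path, "rt", newline="") as fh:
        reader = csv.reader(fh, delimiter=delimiter)
        header = next(reader)
        row_labels, values = [], []
        for record in reader:
            if not record:
                continue  # ignore blank lines
            row_labels.append(record[0])
            values.append([float(v) for v in record[1:]])
    # Many exports leave the top-left header cell empty (or name the index),
    # which makes the header one field longer than each row of values.
    if values and len(header) == len(values[0]) + 1:
        header = header[1:]
    return row_labels, header, values


if __name__ == "__main__":
    # Assumes the smallest ADT file has been downloaded to the working directory.
    rows, cols, vals = read_matrix("GSE229791_PDC101.adt.matrix.txt.gz")
    print(len(rows), "rows x", len(cols), "columns")
```

Nested Python lists are memory-hungry for the multi-hundred-megabyte GEX files; for those, a dataframe library is more practical, for example pandas `read_csv` with `sep="\t"` and `index_col=0`, which decompresses `.gz` input automatically.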