Series GSE207933
Status Public on Oct 01, 2022
Title Long non-coding RNAs are markers of Schistosoma mansoni single-cell populations
Organism Schistosoma mansoni
Experiment type Expression profiling by high throughput sequencing
Third-party reanalysis
Summary Schistosoma mansoni is a flatworm that causes schistosomiasis, a neglected tropical disease that affects over 200 million people worldwide. With only one drug available for treatment and no vaccine, new therapeutic targets are needed. Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides with low or no protein-coding potential. In other organisms, they have been shown to be involved in reproduction, stem cell maintenance and drug resistance, and they tend to exhibit tissue-specific expression patterns. S. mansoni expresses thousands of lncRNA genes; however, the cell-type expression patterns of lncRNAs in the parasite remain uncharacterized. Here, we have re-analyzed public single-cell RNA-sequencing (scRNA-seq) data obtained from adult S. mansoni to identify the lncRNA signature of adult schistosome cell types. A total of 8023 lncRNAs (79% of all lncRNAs) were detected. Analyses of the lncRNA expression profiles in the cells using statistically stringent criteria were performed to identify 74 lncRNA gene markers of cell clusters. Male gamete and tegument lineage clusters contained most of the cluster-specific lncRNA markers. We also identified lncRNA markers of specific neural clusters. Whole-mount in situ hybridization (WISH) and double fluorescence in situ hybridization were used to validate the cluster-specific expression of 13 out of 16 lncRNA gene markers (81%) in the male and female adult parasite tissues; for one of these 16 gene loci, probes for two different lncRNA isoforms were used, which showed differential isoform usage in testis and ovary. An atlas of the expression profiles across the cell clusters of all lncRNAs detected in our analysis is available as a public website resource. The results presented here give strong support to tissue-specific expression and to a regulated expression program of lncRNAs in S. mansoni. This will be the basis for further exploration of lncRNA genes as potential therapeutic targets.
 
Overall design We reanalyzed the scRNA-seq data from Wendt et al. (2020), SRA project PRJNA611777, to measure the expression of the lncRNAs described in Maciel et al. (2019), using a GTF file downloaded from http://verjolab.usp.br/public/schMan/schMan3/macielEtAl2019/files/ together with the genome assembly Smansoni_v7 from WormBase (Howe et al., 2017).

DATA PROCESSING SUMMARY:
We reanalyzed the scRNA-seq data from Wendt et al. (2020), SRA project PRJNA611777. Gene expression was calculated using STARsolo version 2.7.9a (Kaminow et al., 2021) with the genome assembly Smansoni_v7 from WormBase (Howe et al., 2017) and a merged gene annotation file containing protein-coding genes and pseudogenes (Schistosoma mansoni WormBase gene annotation version 16; Howe et al., 2017) and lncRNA genes (Maciel et al., 2019; GTF file downloaded from http://verjolab.usp.br/public/schMan/schMan3/macielEtAl2019/files/), with the following parameters: “--soloType CB_UMI_Simple --soloCellFilter EmptyDrops_CR --soloFeatures Gene Velocyto GeneFull --soloMultiMappers EM --soloCBwhitelist barcodes_whitelist”. For all samples except SRX7888067, we used the barcode whitelist from Cell Ranger chemistry V2; for sample SRX7888067, we used the barcode whitelist from chemistry V3.
Filtered count matrices for all samples were imported into R (R Core Team, 2018) using Seurat v4.0.6.9900 (Hao et al., 2021), and cells were removed from each matrix when the number of features was less than 500, the number of counts was less than 1000 or greater than 20000, or the percentage of mitochondrial genes was greater than 3%. Matrices from all samples were normalized using the NormalizeData function, and variable features were identified using FindVariableFeatures with the following parameters: “selection.method = 'vst', nfeatures = 2000”. Additionally, we scaled the matrices and found principal components using the functions ScaleData and RunPCA, the latter with the parameter “npcs = 100”.
To generate the combined count matrix of all samples, we used the scRNA-seq integration approach from Seurat (Stuart et al., 2019). We first identified integration features using the function SelectIntegrationFeatures; the integration anchors were then identified using the function FindIntegrationAnchors with the following parameters: “k.anchor = 20, dims = 1:78, anchor.features = features, reduction = 'rpca'”; finally, the matrices were integrated using the IntegrateData function. The integrated matrix was then scaled using the function ScaleData, and principal components were identified using the function RunPCA with the parameter “npcs = 100”. A final sparse matrix with 48094 cells was obtained, containing expression data for protein-coding genes, pseudogenes, and lncRNAs, and it was used for the subsequent procedures.
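The per-cell quality filter described above can be illustrated with a minimal sketch. This is not the submitters' code: the record states that the filtering was done in R with Seurat, and Python is used here only to make the thresholds concrete. The names `CellQC`, `passes_qc` and `filter_cells` are hypothetical, and the sketch assumes that a cell is removed when any one of the stated criteria is met and that cells exactly at a threshold are kept.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CellQC:
    """Per-cell QC metrics (hypothetical container, not a Seurat object)."""
    barcode: str
    n_features: int   # number of detected features (genes)
    n_counts: int     # total counts in the cell
    pct_mito: float   # percentage attributed to mitochondrial genes


def passes_qc(
    cell: CellQC,
    min_features: int = 500,
    min_counts: int = 1000,
    max_counts: int = 20000,
    max_pct_mito: float = 3.0,
) -> bool:
    """Return True when the cell is kept under the thresholds in the record.

    Assumption: a cell is removed if ANY criterion fails (features < 500,
    counts < 1000 or > 20000, mitochondrial percentage > 3).
    """
    return (
        cell.n_features >= min_features
        and min_counts <= cell.n_counts <= max_counts
        and cell.pct_mito <= max_pct_mito
    )


def filter_cells(cells: Iterable[CellQC]) -> List[CellQC]:
    """Keep only the cells that pass every QC criterion."""
    return [cell for cell in cells if passes_qc(cell)]


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = [
        CellQC("AAAC", 800, 5000, 1.2),    # passes all criteria
        CellQC("AAAG", 300, 5000, 1.0),    # too few features
        CellQC("AACT", 900, 25000, 0.5),   # too many counts
        CellQC("AAGT", 700, 1500, 4.5),    # mitochondrial percentage too high
    ]
    print([cell.barcode for cell in filter_cells(example)])
```

In Seurat itself, a filter of this kind is usually written as a `subset` call on the per-cell metadata columns; the sketch above only mirrors the logic of the thresholds.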

Grant Number: 18/23693-5
Grant Title: Mechanisms of action of long non-coding RNAs involved with gene activation programs in eukaryotes.
Funding source: FAPESP - The São Paulo Research Foundation
Grantee: Sergio Verjovski-Almeida
 
Contributor(s) Morales-Vicente D, Tahira AC, Verjovski-Almeida S
Citation missing
NIH grant(s)
Grant ID: R01 AI121037
Grant title: The Biology of Stem Cells in the Human Parasite Schistosoma Mansoni
Affiliation: UT SOUTHWESTERN MEDICAL CENTER
Name: James J Collins
Submission date Jul 11, 2022
Last update date Oct 01, 2022
Contact name Ana Carolina Tahira
E-mail(s) tahira.ana@gmail.com
Phone +551126673851
Organization name Instituto Butantan
Department Parasitologia
Lab Laboratório de Parasitologia, Laboratório de expressão gênica em eucariotos
Street address Avenida Vital Brasil, 1500 prédio 55
City Sao Paulo
State/province São Paulo
ZIP/Postal code 05503-900
Country Brazil
 
Relations
Reanalysis of GSE146736

Download family Format
SOFT formatted family file(s) SOFT
MINiML formatted family file(s) MINiML
Series Matrix File(s) TXT

Supplementary file Size Download File type/resource
GSE207933_Table-counts-scRNA.txt.gz 76.7 Mb (ftp)(http) TXT
GSE207933_sm_all_genes_seurat.rds.gz 2.9 Gb (ftp)(http) RDS
Processed data are available on Series record
