Sample GSM5100070
Status Public on Apr 28, 2021
Title BAL - Subject_2_3
Sample type SRA
 
Source name BAL - healthy volunteers
Organism Homo sapiens
Characteristics tissue: lung
timepoint: N/A
disease state: Uninfected
strain: N/A
pathogen: N/A
Extracted molecule polyA RNA
Extraction protocol Bronchoscopy and BAL were performed as described previously (Mwandumba HC et al., J Immunol 2004; 172:4592-4598; Jambo KC et al., Thorax 2011; 66:375-382). A cell suspension containing 5 x 10^6 BAC was centrifuged at 500 x g for 8 min at 4°C, the supernatant was discarded, and the cells were resuspended in 1 mL of chilled 90% methanol (Merck Life Science, Dorset, UK).
Fixed BAL samples were equilibrated on ice for 15 min, washed twice in rehydration buffer to remove residual methanol, stained with different HTO antibodies for 30 min at 4°C (as described above), and washed twice in cell staining buffer + 0.5 U/µL of RNase inhibitor (Sigma). Subsequently, the number of cells recovered from each sample was quantified, and the samples were mixed in a 1:1 ratio and multiplexed into the same 10X run. Libraries were generated using the same modifications described for mouse scRNA-seq samples.
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina NextSeq 500
 
Description Subject 2 and 3 for the human integrated dataset - HTO barcoded
scRNA-seq
bal_integrated.csv
bal_integrated.RDS
Data processing scRNA-seq: Raw sequencing reads from each run were processed using the software CellRanger (v. 3.0) from 10X for the mRNA libraries and CITE-Seq-Count (v. 1.4.3) (https://hoohm.github.io/CITE-seq-Count/) for the ADT and HTO libraries, to generate raw count matrices for both mRNA and proteins. Downstream analysis of the datasets was performed in Seurat (v. 3.1.4) as previously described (Stuart et al., 2019). Briefly, we first filtered out cells with < 200 unique genes per cell and cells with very high mitochondrial content (> 30% mitochondrial reads). Multiplexed samples were then demultiplexed using the MULTIseqDemux function in Seurat (McGinnis et al., 2019), doublets and empty droplets were identified and removed, and the correct identity was assigned to each sample. scRNA-seq datasets for each sample were pre-processed using the regularized negative binomial regression in Seurat (regressing out both the number of counts and the percentage of mitochondrial reads) (Hafemeister and Satija, 2019) and analyzed to identify myeloid cell subsets. Myeloid cell subsets from the different samples were then merged to generate an annotated object containing information from all the different datasets. Subsequently, the RNA slot (containing the raw counts) of the merged object was used as input for Harmony, as previously described (Korsunsky et al., 2019). Briefly, raw counts were log-normalized and the first 3000 highly variable genes were identified. The expression of these genes was scaled and centered, PCA was computed on the scaled expression values, and data integration with Harmony was performed using both “Batch” and “Infection Status” as covariates for the murine datasets and “Batch” for the human datasets. The aligned Harmony embeddings were then used to perform graph-based cluster detection using PCs (principal components) that were identified as statistically significant by the jackstraw method (Chung and Storey, 2015).
The Louvain algorithm was used for community detection (Blondel, 2008). We annotated clusters of cell types by reference-based annotation and canonical marker gene analysis, as described in the text. Trajectory and pseudotime analysis was performed in Monocle (v. 3.0), using the SeuratWrappers package in R (v. 0.2.0) to convert the integrated Seurat object. For unbiased trajectory and pseudotime analysis of the macrophage populations in both murine and human datasets, all cells previously classified as macrophages were assigned to the same partition, and trajectory/pseudotime analysis was conducted as previously described (Qiu et al., 2017). Pathway enrichment analysis was performed in g:Profiler, using a list of genes ordered by fold change as the query, the g:SCS method for multiple testing correction, and the Reactome database as the data source. Only pathways enriched with FDR < 0.05 were considered statistically significant.
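The cell-level quality filter described above (at least 200 unique genes and no more than 30% mitochondrial reads) can be illustrated with a minimal Python sketch. This is not the Seurat code used for the dataset: the cell names, the per-cell summary fields, and the helper functions are hypothetical, and the handling of cells sitting exactly on a threshold is an assumption.

```python
# Hypothetical per-cell summaries; names and values are illustrative only.
cells = {
    "cell_a": {"n_genes": 1500, "mito_reads": 120, "total_reads": 4000},
    "cell_b": {"n_genes": 150, "mito_reads": 10, "total_reads": 300},
    "cell_c": {"n_genes": 900, "mito_reads": 1400, "total_reads": 3500},
    "cell_d": {"n_genes": 200, "mito_reads": 300, "total_reads": 1000},
}

MIN_GENES = 200       # cells with < 200 unique genes are removed
MAX_MITO_PCT = 30.0   # cells with > 30% mitochondrial reads are removed


def percent_mito(cell):
    """Percentage of a cell's reads that map to mitochondrial genes."""
    return 100.0 * cell["mito_reads"] / cell["total_reads"]


def passes_qc(cell):
    """Keep a cell only if it clears both thresholds (boundaries are kept)."""
    return cell["n_genes"] >= MIN_GENES and percent_mito(cell) <= MAX_MITO_PCT


kept = sorted(name for name, cell in cells.items() if passes_qc(cell))
print(kept)
```

In this toy input, cell_b fails the gene-count threshold and cell_c fails the mitochondrial threshold, so only cell_a and cell_d are retained.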
Dual RNA-seq: Data analysis for the Dual RNA-seq datasets was performed as previously described (Pisu et al., 2020a; Pisu et al., 2020b). Briefly, low-quality reads and Illumina adapters were removed using FlexBar (v. 3.4) (Roehr et al., 2017), while remaining rRNA reads were removed using Bowtie2 (--sensitive mode) (Langmead and Salzberg, 2012) and a custom GTF file. The filtered fastq files were split using Bowtie2 (--very-sensitive mode) into species-specific files using the two reference genomes, GRCm38.94 for Mus musculus and NCBI assembly GCA_00668235.1 for Mtb Erdman. Hisat2 (v. 2.1.0) (Kim et al., 2015) was then used to align Mtb reads to the M. tuberculosis transcriptome, and raw read counts for each sample were obtained using HTSeq (v. 0.11.0) (Anders et al., 2015). The raw read count matrices obtained in this study (hspx-high and hspx-low) were then combined with the matrices obtained from samples of a previous study (Mtb in AM and Mtb in IM) (Pisu et al., 2020a) to compare bacterial responses associated with ontogeny versus activation of the host immune cell. Exploratory analysis, visualization and differential gene expression analysis were then carried out in R using DESeq2 and APEGLM for LFC estimation, as previously described (Pisu et al., 2020a). Genes with fewer than 10 raw counts across all samples were excluded from downstream analysis. The protein-protein interaction network for the Mtb genes was created in Cytoscape using the String app (v. 1.4.1). Only high-confidence interactions (co-expression, experiments, neighborhood, co-occurrence with a score > 0.7) relative to query proteins were considered.
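The low-count gene filter mentioned above can be sketched in Python as follows. This is an illustration only, not the R code used in the study: the gene names and counts are hypothetical, and reading "fewer than 10 raw counts across all samples" as a threshold on the sum over samples is an assumption.

```python
# Hypothetical raw count matrix: gene -> counts in each sample.
counts = {
    "gene_1": [0, 2, 3, 1],          # total 6, below the threshold
    "gene_2": [5, 5, 0, 0],          # total 10, exactly at the threshold
    "gene_3": [120, 98, 143, 110],   # clearly expressed
}

MIN_TOTAL_COUNTS = 10  # genes with fewer than 10 counts in total are excluded

# Keep genes whose summed raw count across all samples reaches the threshold.
filtered = {
    gene: values
    for gene, values in counts.items()
    if sum(values) >= MIN_TOTAL_COUNTS
}

print(sorted(filtered))
```

Filtering before differential expression testing reduces the number of genes that carry essentially no information.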
ATAC-seq: Paired-end sequencing reads of length 2 x 42 bp were trimmed to remove adapter sequences using cutadapt (ver. 2.5) (Martin, 2011) and quality checked using FASTQC. The reads were aligned to the mm10 (GRCm38) mouse reference genome using bwa (ver. 0.7.17) (Li and Durbin, 2009). Low-quality alignments (MAPQ < 30), secondary alignments, unmapped reads and reads with an unmapped mate were discarded using samtools. For reads with multiple alignments, only the five best alignments were retained. PCR duplicates were removed using Picard MarkDuplicates. The remaining clean alignments were analysed for fragment length distribution using the ATACseqQC R/Bioconductor library (Ou et al., 2018). Read alignments to the positive strand were shifted 4 bp downstream and alignments to the negative strand were shifted 5 bp upstream to centre the reads on the transposon binding events. Subsequently, peaks were called using the macs2 (ver. 2.1.1.20160309) callpeak command with the parameters “-p 0.01 --shift -75 --extsize 150 --nomodel -B --SPMR --keep-dup all --call-summits”. Fold enrichment and p-value tracks normalized to input were produced using the macs2 bdgcmp command (options -m FE and -m ppois, respectively). Peaks with a q-value < 0.01 were shortlisted. Peaks in the blacklisted regions specified by the ENCODE project were removed. For each of the four sample groups, viz. Uninfected AM, Infected AM, Uninfected IM and Infected IM, pooled samples were generated by pooling together reads from all individual replicates of the same group. These pooled samples were also analysed in the same manner as above. Peaks called in the pooled samples were compared against peaks called in the individual samples of the same condition using the irreproducible discovery rate (IDR) framework. High-quality reproducible peaks were shortlisted based on IDR < 0.05. Subsequently, the IDR peaks from the four groups were merged using “bedtools merge” to generate a final set of peaks (n = 127,513).
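The strand-specific read shift described above can be sketched in Python. This is a minimal illustration, not the tool used in the study: the function name and coordinates are hypothetical, and it assumes that "downstream" and "upstream" refer to increasing and decreasing reference coordinates, respectively.

```python
def shift_alignment(start, end, strand):
    """Shift an alignment to centre it on the transposon binding event.

    Positive-strand alignments move 4 bp toward higher coordinates;
    negative-strand alignments move 5 bp toward lower coordinates.
    """
    if strand == "+":
        offset = 4
    elif strand == "-":
        offset = -5
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return start + offset, end + offset


# Hypothetical 42 bp alignments on each strand.
print(shift_alignment(100, 142, "+"))
print(shift_alignment(200, 242, "-"))
```

The alignment length is unchanged by the shift; only its position moves.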
Counts of reads aligned at each peak interval in each sample were determined using the summarizeOverlaps function of the GenomicAlignments R/Bioconductor package (Lawrence et al., 2013) and TMM normalized using edgeR (Robinson et al., 2010).
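Counting reads per peak interval can be sketched in Python as below. This is a simplified stand-in, not the summarizeOverlaps function or its counting modes: it counts any overlap, assumes a single chromosome and half-open intervals, and uses hypothetical peak names and coordinates. TMM normalization is not shown.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def count_reads_per_peak(peaks, reads):
    """Count, for each named peak, the reads that overlap it by at least 1 bp."""
    counts = {name: 0 for name, _, _ in peaks}
    for r_start, r_end in reads:
        for name, p_start, p_end in peaks:
            if overlaps(r_start, r_end, p_start, p_end):
                counts[name] += 1
    return counts


# Hypothetical peaks (name, start, end) and read alignments (start, end).
peaks = [("peak_1", 100, 200), ("peak_2", 500, 650)]
reads = [(90, 132), (150, 192), (200, 242), (600, 642), (640, 682)]

peak_counts = count_reads_per_peak(peaks, reads)
print(peak_counts)
```

The read starting at position 200 is not counted for peak_1 because the intervals are treated as half-open.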
Genome_build: Mus musculus: GRCm38.94 / Homo sapiens: GRCh38.98 / M. tuberculosis: GCA_00668235.1
Supplementary_files_format_and_content: TSV files: contain raw count matrices to import into R for downstream analysis
Supplementary_files_format_and_content: CSV files: contain raw or normalized count data; they were generated from the integrated Seurat objects and contain count data from all datasets
Supplementary_files_format_and_content: RDS file: integrated Seurat objects that contain all datasets, metadata, dimensional reductions and cluster information described in the manuscript
 
Submission date Feb 22, 2021
Last update date Apr 28, 2021
Contact name Davide Pisu
E-mail(s) dp554@cornell.edu
Phone 6072620103
Organization name Cornell University
Department Microbiology and Immunology
Lab David G. Russell
Street address 930 Campus Road
City Ithaca
State/province NY
ZIP/Postal code 14853
Country USA
 
Platform ID GPL18573
Series (1)
GSE167232 Integration of M. tuberculosis phenotype with single cell RNA-seq to interrogate host macrophage heterogeneity in vivo.
Relations
BioSample SAMN18026320
SRA SRX10149676

Supplementary data files not provided
Raw data are available in SRA
Processed data are available on Series record