Sample GSM2092518
Status Public on May 18, 2016
Title TCR109A.BC7
Sample type SRA
Source name patient 3 (non-glioma), TCR-alpha repertoire of peripheral blood
Organism Homo sapiens
Characteristics individual: patient 3 (non-glioma)
tissue: peripheral blood
tcr repertoire: TCR-alpha
Extracted molecule total RNA
Extraction protocol Total RNA was isolated from cryofrozen human samples using the TissueLyzer system with Qiazol and steel beads (Qiagen). RNAseq was performed on total RNA through the Columbia Genome Center, and concentration and RIN score were obtained by BioAnalyzer. For TCR library preparation, mRNA was isolated using magnetic oligo-dT Dynabeads (Life Technologies) according to the manufacturer's protocol, and the final concentration was determined by Qubit (Life Technologies). Total RNA was obtained from 2.5x10^6 PBMCs using the Qiagen RNeasy system according to the manufacturer's instructions.
We used the commercially available iRepertoire primer set for nested amplicon rescued multiplex PCR (arm-PCR) of the complementarity-determining region 3 (CDR3) of the human TCRα and TCRβ chains and for addition of adapters for Illumina platform sequencing. Approximately 400 ng of mRNA (tumor libraries) or total RNA (PBMC libraries) was used as input. In our modification of the manufacturer’s protocol, reverse transcription was conducted with a one-step reverse transcription and amplification kit (Qiagen) at 50°C for 40 min and 95°C for 15 min, followed by seventeen non-exponential thermocycles (94°C/30s, 60°C/2min, 72°C/30s) and ten exponential thermocycles (94°C/30s, 72°C/2min). The PCR1 product was purified using AmpureX-100 magnetic beads (Agencourt). Secondary amplification of 40% of the resulting product was performed by multiplex PCR (Qiagen), allowing addition of Illumina adapter sequences, with 34 thermocycles (94°C/30s, 55°C/30s, 72°C/30s). Libraries were purified by agarose gel electrophoresis, cutting between 200 and 400 bp (predicted amplicon size 210-310 bp), gel extracted (Qiagen), and sequenced on an Illumina MiSeq, obtaining paired-end reads of 220-250 nt as described below.
Library strategy OTHER
Library source transcriptomic
Library selection other
Instrument model Illumina MiSeq
Description TCR-alpha RNA
Data processing Library strategy: TCR-seq
TCRseq libraries, error correction: Raw paired-end data (FASTQ output option on MiSeq) were processed for sequencing error correction using a custom software pipeline. Each mate pair was first validated using the Burrows-Wheeler Aligner local alignment algorithm, then fully aligned (Smith-Waterman) for site-by-site comparison. Overlapping regions were scanned for nucleotide consistency at every position. Mismatches were resolved as follows: 1) If the mismatch occurred at a mapped position, the reference allele was selected. 2) If the location was unmapped or varied from the reference, the nucleotide identity was selected from the mate-paired strands by base quality, provided the quality scores differed by at least an order of magnitude. 3) If the quality score of the site on both paired reads was similar (<10x difference), the nucleotide identity from R1 was used by default. The output reads were constructed by merging the corrected, overlapping regions of each mate pair and appending the 3’ and 5’ overhangs.
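The three mismatch rules above can be sketched as a small decision function. This is an illustrative reading, not the authors' pipeline: the function name and arguments are hypothetical, and "an order of magnitude" is interpreted here as a 10-fold difference in error probability, which corresponds to a Phred score difference of at least 10.

```python
def resolve_mismatch(base_r1, qual_r1, base_r2, qual_r2, ref_base=None):
    """Choose one base for an overlap position where the two mates disagree.

    qual_r1 / qual_r2 are Phred scores. ref_base is the reference allele,
    or None when the position is unmapped (hypothetical convention).
    """
    # Rule 1: mapped position where one mate carries the reference allele.
    if ref_base is not None and ref_base in (base_r1, base_r2):
        return ref_base
    # Rule 2: unmapped, or both mates differ from the reference.
    # Assumption: a 10x difference in error probability = 10 Phred units.
    if abs(qual_r1 - qual_r2) >= 10:
        return base_r1 if qual_r1 > qual_r2 else base_r2
    # Rule 3: similar qualities, default to R1.
    return base_r1
```

For example, `resolve_mismatch("A", 20, "G", 35)` takes the R2 base because its quality is 15 Phred units higher, while `resolve_mismatch("A", 30, "G", 35)` falls through to the R1 default.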
TCRseq libraries, demultiplexing: Internal 6-mer barcode sequences (available through the support documents at []) were identified within the merged, error-corrected reads using a string-search algorithm with linear average search time. A 14-nt sliding window scanned the read from the end proximal to the barcode location (J cassette); the first eight nucleotides of the window correspond to the end of the conserved constant region of either the alpha chain or the beta chain, and the remaining six nucleotides constitute the manufacturer barcode. Every possible 14-nucleotide combination was searched for simultaneously using a preprocessed hash table, with a single nucleotide mismatch allowed in each of the constant region and the barcode region. Each read was either assigned to a barcode or flagged as “unmatched.” All reads and their assigned barcode IDs proceeded to CDR3 motif filtering.
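A minimal sketch of the hash-table lookup described above follows. All function names are hypothetical, and the constant-region and barcode sequences in the usage lines are made up for illustration, since the real ones come from the manufacturer's documents. The scan direction is also a simplification: the sketch scans left to right and assumes the caller has oriented the read so that the barcode-proximal end comes first.

```python
from itertools import product

def one_mismatch_variants(seq, alphabet="ACGTN"):
    """Return seq plus every sequence that differs from it at one position."""
    variants = {seq}
    for i in range(len(seq)):
        for base in alphabet:
            variants.add(seq[:i] + base + seq[i + 1:])
    return variants

def build_table(constant_ends, barcodes):
    """Precompute every accepted 14-mer (8-nt constant end + 6-nt barcode).

    constant_ends: dict of chain name -> 8-mer
    barcodes: dict of barcode ID -> 6-mer
    One mismatch is allowed in each of the two regions independently.
    """
    table = {}
    for chain, const in constant_ends.items():
        for bc_id, bc in barcodes.items():
            for c, b in product(one_mismatch_variants(const),
                                one_mismatch_variants(bc)):
                table.setdefault(c + b, set()).add((chain, bc_id))
    return table

def assign_barcode(read, table, window=14):
    """Slide a 14-nt window along the read; return (chain, barcode ID)."""
    for start in range(len(read) - window + 1):
        hits = table.get(read[start:start + window])
        if hits and len(hits) == 1:  # skip ambiguous keys
            return next(iter(hits))
    return "unmatched"

# Made-up sequences, for illustration only.
table = build_table({"alpha": "ACGTACGT"},
                    {"BC1": "AAAAAA", "BC7": "CCGGTT"})
result = assign_barcode("TTTTACGTACGACCGGTAGG", table)
```

Because the mismatch variants are enumerated once up front, each window costs a single dictionary lookup, which is what keeps the average search time linear in read length.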
TCRseq libraries, in silico translation and filtering: Error-corrected output files were translated in silico as previously described, using a custom pipeline written in Perl, then filtered for sequences encoding an in-frame C/FXFG motif with no intervening stop codons, and thus a productive CDR3. This output was tabulated by V-J cassette combination, aaCDR3, and combined VJ+aaCDR3 clonotype identifier, totCDR3 (Supplementary Table S2), in which a single count represents an error-corrected R1-R2 mate pair of reads.
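The translation and motif filter can be illustrated with a short Python sketch (the original pipeline was written in Perl and is not reproduced here). The regular expression is an assumption that reads the C/FXFG motif literally as a cysteine followed, without any stop codon, by F-X-F-G; the function names are hypothetical, and the real pipeline anchors the frame on the mapped V and J cassettes rather than trying all three frames blindly.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed motif: C, then no stop codon ('*'), then F-X-F-G.
CDR3_RE = re.compile(r"C[^*]*?F[^*]FG")

def translate(nt, frame=0):
    """Translate nt in the given frame; unknown codons become 'X'."""
    codons = (nt[i:i + 3] for i in range(frame, len(nt) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def productive_cdr3(nt):
    """Return the first motif-bounded peptide found in any frame, else None."""
    for frame in range(3):
        match = CDR3_RE.search(translate(nt.upper(), frame))
        if match:
            return match.group(0)
    return None
```

Reads for which `productive_cdr3` returns `None` would be treated as non-productive and dropped before clonotype tabulation.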
TCRseq libraries, V and J cassette mapping: The presence of V or J cassettes was initially identified based on the coordinates to which the read was mapped in a GRCh37.p8 human reference, in which the beta-chain locus on Chr. 7 had been masked (location 7q34 91557-667340) and replaced by a patch from Genbank (GRCh37.p9), which offered more up-to-date sequence data on this locus. Our cassette reference contained the entire set of amino acid sequences and CDR3-associated motifs, including pseudogenes, provided by the ImMunoGeneTics (IMGT) information system, for a comprehensive analysis. Sequences were translated into all three reading frames, and the correct CDR3 sequence was identified based on the conserved cysteine in the V cassette and the conserved four-residue motif in the J cassette. To define the interval on which the TRAV and TRBV cassette family members might be resolved, we used the TCRseq data from all subjects to find, for each V, the most upstream nucleotide position in the reference sequence that was included in ≥95% of its mapped reads. Within this interval, and exclusive of the final 5 nt, the Hamming distances between members of the same family were calculated. All cases in which two or more validated cassettes were found to have a Hamming distance of 0 within the shorter of the two nt intervals were annotated as indistinguishable (e.g. TRBV6-2/3). Non-validated (pseudogene and ORF) V and J cassettes to which < 100 reads per library were mapped (across all libraries in this study) were considered spurious and discarded, while non-validated but productive, sequence-resolvable cassettes to which > 100 reads per library were mapped were included in the final data set despite their lack of previous validation (TRBV12-2, TRBV26, and TRBV7-1).
RNAseq libraries were mapped to the GRCh37 genome and Illumina iGenomes transcriptome using Tophat 2 and uniquely mapped read counts were computed using HTSeq. The resulting read counts were then normalized using the 'estimateSizeFactors' function in DESeq2.
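For readers unfamiliar with it, the normalization performed by 'estimateSizeFactors' is the median-of-ratios method. The pure-Python sketch below illustrates the calculation on a small count matrix; it is not the DESeq2 code itself, and the function name and input layout are assumptions made for the example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of per-gene rows, each a list of per-sample read counts.
    Genes with a zero count in any sample are excluded, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples
                     for row in usable]
    # Per sample: median over genes of log(count / geometric mean).
    return [
        math.exp(median(math.log(row[j]) - lgm
                        for row, lgm in zip(usable, log_geo_means)))
        for j in range(n_samples)
    ]

# Toy matrix: the second sample is sequenced twice as deeply as the first.
factors = size_factors([[10, 20], [100, 200], [5, 10], [0, 7]])
```

Dividing each sample's counts by its size factor puts the samples on a common scale; in the toy matrix the second factor is twice the first.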
Genome_build: GRCh37.p8, with TCRb locus (7q34 91557-667340) masked and replaced by a patch from Genbank (GRCh37.p9); V and J cassette sequences for reference mapping imported from IMGT
Supplementary_files_format_and_content: [TCRseq] text files in tsv format that contain the VJ cassette combination, number of reads, VJ cassette gene names, amino acid CDR3 sequence, nucleotide CDR3 sequence, and insertion length for each observed clone.
Supplementary_files_format_and_content: [RNAseq] Excel spreadsheet containing the number of uniquely mapped reads associated with each gene calculated using HTSeq.
Submission date Mar 17, 2016
Last update date May 15, 2019
Contact name Peter A Sims
Organization name Columbia University
Street address 3960 Broadway, Lasker 203AC
City New York
State/province NY
ZIP/Postal code 10032
Country USA
Platform ID GPL15520
Series (1)
GSE79338 The Glioma-Infiltrating T Cell Receptor Repertoire
BioSample SAMN04565531
SRA SRX1639596

Supplementary file Size Download File type/resource
GSM2092518_TCR109A.BC7.productive.tsv.gz 6.0 Mb (ftp)(http) TSV
Raw data are available in SRA
Processed data provided as supplementary file
