|
Status |
Public on May 18, 2016 |
Title |
TCR109B.BC8 |
Sample type |
SRA |
|
|
Source name |
patient 3 (non-glioma), TCR-beta repertoire of peripheral blood
|
Organism |
Homo sapiens |
Characteristics |
individual: patient 3 (non-glioma) tissue: peripheral blood tcr repertoire: TCR-beta
|
Extracted molecule |
total RNA |
Extraction protocol |
Total RNA was isolated from cryofrozen human samples using the TissueLyzer system with Qiazol and steel beads (Qiagen). RNAseq was performed on total RNA through the Columbia Genome Center, and concentration and RIN score obtained by BioAnalyzer. For TCR library preparation, mRNA was isolated using magnetic oligo-dT Dynabeads (Life Technologies), according to the manufacturer's protocol, and final concentration determined by Qubit (Life Technologies). Total RNA was obtained from 2.5x10^6 PBMCs using the Qiagen RNeasy system according to the manufacturer's instructions. We utilized the commercially available iRepertoire primer set for nested amplicon rescued multiplex PCR (arm-PCR) of the complementarity-determining region 3 (CDR3) of the human TCRα and TCRβ chains and addition of adaptors for Illumina platform sequencing. ~400ng of mRNA or total RNA was included for tumor libraries or PBMC libraries, respectively. In our modification of the manufacturer’s protocol, reverse transcription was conducted with a one-step reverse transcription and amplification kit (Qiagen) at 50°C for 40min, 95°C for 15min, seventeen non-exponential thermocycles (94°C/30s, 60°C/2min, 72°C/30s) and ten exponential thermocycles (94°C/30s, 72°C/2min). PCR1 product was purified using AmpureX-100 magnetic beads (Agencourt). Secondary amplification of 40% of the resulting product was performed by multiplex PCR (Qiagen), allowing addition of Illumina adapter sequences, with 34 thermocycles (94°C/30s, 55°C/30s, 72°C/30s). Libraries were purified by agarose gel electrophoresis, cutting between 200-400bp (predicted amplicon size 210-310bp), gel extracted (Qiagen), and sequenced on an Illumina MiSeq, obtaining paired-end reads 220-250nt as described below.
|
|
|
Library strategy |
OTHER |
Library source |
transcriptomic |
Library selection |
other |
Instrument model |
Illumina MiSeq |
|
|
Description |
TCR-beta RNA N03-Blood-TCRB
|
Data processing |
Library strategy: TCR-seq TCRseq libraries, error correction: Raw paired-end data (FASTQ output option on MiSeq) was processed for sequencing error correction using a custom software pipeline. Each mate-pair was first validated using the Burrows-Wheeler Aligner local alignment algorithm, then fully aligned (Smith-Waterman) for site-by-site comparison. Overlapping regions were individually scanned for nucleotide consistency at every position. Mismatches were resolved as follows: 1) If the mismatch occurred at a mapped position, the reference allele was selected. 2) If the location was unmapped or varied from the reference, the nucleotide identity was selected from the mate-paired strands by base quality, if the qualities differed by at least an order of magnitude. 3) If the quality score of the site on both paired reads was similar (<10x difference), the nucleotide identity from R1 was used by default. The output reads were constructed by merging the mate-pair overlapping, corrected regions, and appending the 3’ and 5’ overhangs. TCRseq libraries, demultiplexing: Internal 6mer barcode sequences (available through the support documents at [http://media.wix.com/ugd/c9f231_5f7f780864ec41f8a5cc57ffcb2a3277.pdf]) were identified within the merged, error-corrected reads using a string-search algorithm with linear average search time, whereby a 14-nt sliding window scanned the read from the end proximal to the barcode location (J cassette), the first eight nucleotides corresponding to the end of the conserved constant region of either alpha-chain or beta-chain, and the remaining six nucleotides constituting the manufacturer barcode. 
Every possible combination of 14 nucleotides was searched for simultaneously using a preprocessed hash table, with a single nucleotide mismatch allowed in each of the constant region and the barcode region, resulting in the assignment of each read to a barcode, or flagging it as “unmatched.” All reads and their assigned barcode ID proceeded to CDR3 motif filtering. TCRseq libraries, in silico translation and filtering: Error-corrected output files were in silico translated as previously described, using a custom pipeline written in Perl, then filtered for sequences encoding an in-frame C/FXFG motif with no intervening stop codons, and thus a productive CDR3. This output was tabulated by V-J cassette combination, aaCDR3, and combined VJ+aaCDR3 clonotype identifier, totCDR3 (Supplementary Table S2), in which a single count represents an error-corrected R1-R2 mate pair of reads. TCRseq libraries, V and J cassette mapping: The presence of V or J cassettes was initially identified based on the coordinates to which the read was mapped in a GRCh37.p8 human reference, in which the Beta-chain on Chr. 7 had been masked (location 7q34 91557-667340) and replaced by a patch from Genbank (GRCh37.p9), which offered more up-to-date sequence data on this locus. Our cassette reference contained the entire set of amino acid sequences and CDR3 associated motifs, including pseudogenes, provided by the ImMunoGeneTics (IMGT) information system, for a comprehensive analysis. Sequences were translated into all three reading frames, and the correct CDR3 sequence identified based on the conserved cysteine in the V cassette and the conserved four residue motif in the J cassette. In order to define the interval on which the TRAV and TRBV cassette family members might be resolved, we used the TCRseq data from all subjects to find the most upstream nucleotide position for each V in the reference sequence included in ≥95% of its mapped reads (for V). 
Within this interval and exclusive of the final 5nt, the Hamming distances between members of the same family were calculated. All cases in which two or more validated cassettes were found to have a Hamming distance of 0 within the shorter of the two nt intervals were annotated as indistinguishable (e.g. TRBV6-2/3). Non-validated (pseudogene and ORF) V and J cassettes to which < 100 reads per library were mapped (across all libraries in this study) were considered spurious and discarded, while non-validated but productive, sequence-resolvable cassettes to which > 100 reads per library were mapped were included in the final data set despite their lack of previous validation (TRBV12-2, TRBV26, and TRBV7-1). RNAseq libraries were mapped to the GRCh37 genome and Illumina iGenomes transcriptome using TopHat 2 and uniquely mapped read counts were computed using HTSeq. The resulting read counts were then normalized using the 'estimateSizeFactors' function in DESeq2. Genome_build: GRCh37.p8, with TCRb locus (7q34 91557-667340) masked and replaced by a patch from Genbank (GRCh37.p9); V and J cassette sequences for reference mapping imported from IMGT Supplementary_files_format_and_content: [TCRseq] text files in tsv format that contain the VJ cassette combination, number of reads, VJ cassette gene names, amino acid CDR3 sequence, nucleotide CDR3 sequence, and insertion length for each observed clone. Supplementary_files_format_and_content: [RNAseq] Excel spreadsheet containing the number of uniquely mapped reads associated with each gene calculated using HTSeq.
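The three mismatch-resolution rules in the error-correction step can be illustrated with a minimal Python sketch. It is not the submitters' pipeline, which is described only as custom software. Several points are assumptions made for illustration: qualities are taken to be Phred scores, so "an order of magnitude" is read as a difference of at least 10 Phred units (a 10-fold difference in error probability); rule 1 is read as applying when one mate agrees with the reference base; and the function name `resolve_mismatch` and its parameters are hypothetical.

```python
from typing import Optional


def resolve_mismatch(
    r1_base: str,
    r1_qual: int,
    r2_base: str,
    r2_qual: int,
    ref_base: Optional[str] = None,
) -> str:
    """Choose one base for an overlapping position where R1 and R2 disagree.

    Qualities are assumed to be Phred scores; ref_base is None when the
    position is unmapped. Both are assumptions of this sketch.
    """
    # Rule 1 (as read here): mapped position and one mate supports the
    # reference allele, so the reference allele is selected.
    if ref_base is not None and ref_base in (r1_base, r2_base):
        return ref_base

    # Rule 2: unmapped position, or both mates differ from the reference.
    # Pick the higher-quality base if the error probabilities differ by at
    # least 10x, i.e. the Phred scores differ by at least 10.
    if abs(r1_qual - r2_qual) >= 10:
        return r1_base if r1_qual > r2_qual else r2_base

    # Rule 3: qualities are similar (<10x difference), default to R1.
    return r1_base
```

For example, `resolve_mismatch("A", 12, "C", 35)` falls under rule 2 and selects the R2 base, whereas the same call with qualities 30 and 35 falls through to rule 3 and keeps the R1 base.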
|
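The demultiplexing step in the Data processing field above (a 14-nt window made of 8 constant-region nucleotides plus a 6-nt barcode, looked up in a preprocessed hash table with one mismatch tolerated in each part) can be sketched roughly as follows. This is an illustrative Python sketch, not the original implementation. The constant-region tail and barcode sequences in the usage note are made-up placeholders, the helper names are hypothetical, and scanning from the 3' end of the read is an assumption about which end is "proximal to the barcode".

```python
from itertools import product


def one_mismatch_variants(seq: str, alphabet: str = "ACGT") -> set:
    """Return seq plus every sequence that differs from it at one position."""
    variants = {seq}
    for i in range(len(seq)):
        for base in alphabet:
            variants.add(seq[:i] + base + seq[i + 1:])
    return variants


def build_lookup(constant_tail: str, barcodes: dict) -> dict:
    """Precompute every accepted 14-mer (8 nt constant + 6 nt barcode)."""
    table = {}
    ambiguous = set()
    constant_variants = one_mismatch_variants(constant_tail)
    for barcode_id, barcode in barcodes.items():
        for const, bc in product(constant_variants, one_mismatch_variants(barcode)):
            key = const + bc
            if key in table and table[key] != barcode_id:
                ambiguous.add(key)  # same 14-mer reachable from two barcodes
            table[key] = barcode_id
    for key in ambiguous:
        del table[key]
    return table


def assign_barcode(read: str, table: dict, window: int = 14) -> str:
    """Slide a window from the 3' end inward; one dict lookup per position."""
    for start in range(len(read) - window, -1, -1):
        hit = table.get(read[start:start + window])
        if hit is not None:
            return hit
    return "unmatched"
```

With placeholder inputs such as `table = build_lookup("GTCAGCCA", {"BC1": "ACGTAC", "BC2": "TTGGCA"})`, a read containing `GTCAGCCT` followed by `ACGTAA` (one mismatch in each part) is still assigned to `BC1`, while a read with two barcode mismatches is reported as `"unmatched"`. Precomputing all tolerated variants trades memory for a constant-time lookup at each window position.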
|
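The rule in the Data processing field above for annotating indistinguishable V cassettes (Hamming distance of 0 over the shorter of two intervals, excluding the final 5 nt) can be sketched in Python as below. The cassette names and sequences in the usage note are toy placeholders rather than IMGT sequences, the function names are hypothetical, and aligning the two intervals at their 3' ends before comparison is an assumption of this sketch.

```python
from itertools import combinations


def hamming_on_shorter_interval(a: str, b: str) -> int:
    """Hamming distance over the shorter interval, aligned at the 3' end."""
    n = min(len(a), len(b))
    if n == 0:
        return 0
    return sum(x != y for x, y in zip(a[-n:], b[-n:]))


def indistinguishable_pairs(cassettes: dict, trim_3prime: int = 5) -> list:
    """List cassette pairs that are identical within the resolvable interval.

    `cassettes` maps a cassette name to the nucleotide interval covered by
    its mapped reads; the final `trim_3prime` nucleotides are excluded.
    """
    trimmed = {name: seq[:-trim_3prime] for name, seq in cassettes.items()}
    return [
        (first, second)
        for first, second in combinations(sorted(trimmed), 2)
        if hamming_on_shorter_interval(trimmed[first], trimmed[second]) == 0
    ]
```

For instance, with toy inputs `{"V_a": "ACGTACGTACGGGGG", "V_b": "GTACGTACCCCCC", "V_c": "ACGTACGTTCGGGGG"}`, only `V_a` and `V_b` are reported as a pair, because they agree at every position of the shorter trimmed interval while `V_c` differs from both at one position.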
|
Submission date |
Mar 17, 2016 |
Last update date |
May 15, 2019 |
Contact name |
Peter A Sims |
E-mail(s) |
pas2182@columbia.edu
|
Organization name |
Columbia University
|
Street address |
3960 Broadway, Lasker 203AC
|
City |
New York |
State/province |
NY |
ZIP/Postal code |
10032 |
Country |
USA |
|
|
Platform ID |
GPL15520 |
Series (1) |
GSE79338 |
The Glioma-Infiltrating T Cell Receptor Repertoire |
|
Relations |
BioSample |
SAMN04565532 |
SRA |
SRX1639597 |
Supplementary file |
Size |
Download |
File type/resource |
GSM2092519_TCR109B.BC8.productive.tsv.gz |
6.2 Mb |
(ftp)(http) |
TSV |
SRA Run Selector |
Raw data are available in SRA |
Processed data provided as supplementary file |
|
|
|
|