GEO Accession viewer

Sample GSM5032995

Status

Public on Mar 13, 2021

Title

CYP2C9_abundance_PacBio_SMRT_cell_2

Sample type

SRA

Source name

Synthetic sequence

Organism

Escherichia coli

Characteristics

strain: Top10
collection_date: 01-May-2019

Growth protocol

Yeast cells were grown in YPD media supplemented with 200 μg/mL G418 until saturated, then diluted into YP media containing 2% (w/v) raffinose supplemented with 200 μg/mL G418 and grown for at least two cell doublings, then cells were inoculated to OD 0.0125 into fresh YP media containing 2% (w/v) galactose and 200 μg/mL G418 and collected after 7 doublings. HEK 293T TetBxb1BFP cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum, 100 U/mL penicillin, 0.1 mg/mL streptomycin, and 2.5 μg/mL doxycycline.

Extracted molecule

genomic DNA

Extraction protocol

Cells were collected, pelleted by centrifugation and stored at −20 °C. For the activity library, plasmid DNA was prepared using a Zymoprep Yeast Plasmid Miniprep I kit (Zymo). For the abundance library, genomic DNA was prepared using a DNeasy kit, according to the manufacturer’s instructions (Qiagen), with the addition of a 30 min incubation at 37 °C with RNase in the re-suspension step.
For the activity library: for each bin, the purified DNA was split over two 25 μL PCR reactions containing Kapa Robust master mix and CJA120/CJA124 primers to tag the barcodes with a unique molecular index (UMI). UMI-tagging PCR was performed using the following conditions: initial denaturation at 95 °C for 3 minutes, followed by two cycles of (95 °C for 20 seconds, 60 °C for 15 seconds, 72 °C for 30 seconds). The products were then purified using 1X AMPure XP beads (Beckman Coulter). Purified UMI-tagged amplicons were then amplified with various forward and reverse indexing primers to index each sample and add the P5 and P7 Illumina cluster-generating sequences. This PCR was performed with Kapa Robust and SYBR Green on a Bio-Rad MiniOpticon qPCR machine with the following conditions: 95 °C for 3 minutes, then up to 30 cycles of (95 °C for 20 seconds, 65 °C for 15 seconds, 72 °C for 30 seconds). Reactions were monitored and removed before saturation of the SYBR Green signal, at around 20 cycles. The amplicons were pooled and gel purified.
For the abundance library: for each bin, the purified DNA was split over eight 50 μL PCR reactions containing Q5 High-Fidelity Master Mix and KAM499/VKORampR 1.1 primers. The reaction conditions were 98 °C for 30 seconds, five cycles of (98 °C for 10 seconds, 65 °C for 20 seconds, 72 °C for 1 minute), and 72 °C for 2 minutes. The eight PCR reactions were pooled and purified using 1X AMPure XP (Beckman Coulter). Next, to add indices and Illumina cluster-generating sequences, the cleaned product was mixed with Q5 High-Fidelity Master Mix, VKOR_indexF_1.1, and one of the indexed reverse primers, PTEN_seq_R1a through PTEN_seq_R2a. These reactions were run with SYBR Green on a Bio-Rad MiniOpticon; reaction conditions were 95 °C for 3 minutes, 20 cycles of (95 °C for 15 seconds, 60 °C for 15 seconds, 72 °C for 30 seconds), and 72 °C for 3 minutes. The amplicons were pooled and gel purified.

Library strategy

OTHER

Library source

genomic

Library selection

other

Instrument model

Sequel II

Description

Library and barcode cassette sequencing from plasmid DNA.

Data processing

Libraries were sequenced on NextSeq 550 instruments (Illumina) and base called using the instruments' Real Time Analysis software.
Sequencing reads were converted to fastq format and de-multiplexed with bcl2fastq.
Barcode and UMI sequencing reads for all CYP2C9 activity replicates were trimmed and filtered for minimum base quality Q20 using FASTX-toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) and collapsed according to UMI sequences using a custom script. Barcode paired sequencing reads for all CYP2C9 abundance replicates were joined using PEAR software (https://cme.h-its.org/exelixis/web/software/pear/). FASTQ files from technical replicate amplification were concatenated for analysis with Enrich2.
The abundance and activity scores for CYP2C9 were calculated as follows. The count for each variant in a bin was divided by the sum of counts recorded in that bin to obtain the frequency of each variant within that bin. This calculation was repeated for every bin in each replicate experiment. The summed counts of each variant in all four bins of an experiment were divided by the summed counts of all variants in all four bins of that experiment to obtain the total frequency value for each variant in each experiment. This total frequency value was used to filter low-frequency variants out of the subsequent calculations. Next, a weighted average was calculated for each variant. Finally, for each experiment, an abundance or activity score for each variant was obtained by subjecting the weighted average of each variant to min-max normalization, using the weighted average value of synonymous variants, which was given a score of 1, and the median weighted average value for nonsense variants, which was given a score of 0. The final activity or abundance score for each variant was calculated by taking the mean of the min-max normalized scores across the replicate experiments in which it could have been observed. Only variants that were scored in two or more replicate experiments were retained in the analysis. A standard error for each score was calculated by dividing the standard deviation of the min-max normalized values for each variant by the square root of the number of replicate experiments in which it was observed.
Genome_build: None
Supplementary_files_format_and_content: The CYP2C9_activity_barcode_variant_map.tsv and CYP2C9_abundance_barcode_variant_map.tsv processed data files are tab-separated text files containing the variant-barcode map determined by PacBio sequencing for the activity and abundance libraries, respectively. The CYP2C9_activity_abundance_scores.csv processed data file is a comma-separated text file containing the CYP2C9 variant activity and abundance scores calculated as described above.
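
The score calculation described under Data processing can be illustrated with a minimal Python sketch. This is not the submitters' code, and several details are assumptions because the record does not state them: the bin weights (0.25, 0.5, 0.75, 1.0), the form of the weighted average (bin frequencies weighted and then divided by their sum), the use of the sample standard deviation, and all function names. The synonymous and nonsense reference values are passed in by the caller, since the record only specifies that the nonsense reference is a median.

```python
from math import sqrt
from statistics import mean, stdev

# ASSUMPTION: the record does not give the bin weights; these are illustrative.
BIN_WEIGHTS = (0.25, 0.5, 0.75, 1.0)


def bin_frequencies(counts_by_bin):
    """Divide each variant count by the total count of its bin.

    counts_by_bin: list of dicts (one per bin) mapping variant -> read count.
    """
    freqs = []
    for counts in counts_by_bin:
        total = sum(counts.values())
        freqs.append({v: c / total for v, c in counts.items()} if total else {})
    return freqs


def total_frequency(counts_by_bin):
    """Summed counts of a variant over all bins / summed counts of all variants."""
    grand_total = sum(sum(c.values()) for c in counts_by_bin)
    variants = set().union(*counts_by_bin)
    return {v: sum(c.get(v, 0) for c in counts_by_bin) / grand_total
            for v in variants}


def weighted_average(freqs, weights=BIN_WEIGHTS):
    """ASSUMED form: sum(weight * frequency) / sum(frequency) per variant."""
    result = {}
    for v in set().union(*freqs):
        per_bin = [f.get(v, 0.0) for f in freqs]
        denom = sum(per_bin)
        if denom > 0:
            result[v] = sum(w * f for w, f in zip(weights, per_bin)) / denom
    return result


def min_max_score(value, synonymous_ref, nonsense_ref):
    """Rescale so the nonsense reference maps to 0 and the synonymous one to 1."""
    return (value - nonsense_ref) / (synonymous_ref - nonsense_ref)


def combine_replicates(scores):
    """Mean and standard error over replicates; None if fewer than two scores."""
    n = len(scores)
    if n < 2:
        return None  # variants scored in fewer than two replicates are dropped
    # ASSUMPTION: sample standard deviation (n - 1 denominator).
    return mean(scores), stdev(scores) / sqrt(n)


# Toy counts for one experiment with four bins (made-up numbers).
toy_counts = [{"non": 10, "mis": 10}, {}, {}, {"syn": 10, "mis": 10}]
toy_wavg = weighted_average(bin_frequencies(toy_counts))
toy_score = min_max_score(toy_wavg["mis"], toy_wavg["syn"], toy_wavg["non"])
```

In the toy data, the hypothetical variant "mis" is split evenly between the lowest and highest bins, so its normalized score falls halfway between the nonsense and synonymous references. Filtering on the total frequency value would be applied before the weighted average; the threshold is not given in the record, so it is left out here.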

Submission date

Jan 24, 2021

Last update date

Mar 13, 2021

Contact name

Clara Amorosi

Organization name

University of Washington

Department

Genome Sciences

Lab

Dunham