Sample GSM4292420
Status Public on Jun 01, 2020
Title rLP5_frag_DNA1_1
Sample type SRA
 
Source name Escherichia coli
Organism Escherichia coli str. K-12 substr. MG1655
Characteristics strain/background: MG1655
Growth protocol Luria-Bertani (LB) Rich Media
Extracted molecule genomic DNA
Extraction protocol Qiagen Puregene Yeast/Bact. Kit 2
 
Library strategy OTHER
Library source genomic
Library selection other
Instrument model Illumina NextSeq 500
 
Description Sequencing DNA barcode counts. Genomic DNA extracted from an E. coli population with single barcoded promoter variants integrated into the nth-ydgR intergenic region. This is the 1st technical replicate of the 1st biological replicate.
processed data file: U00096.2_frag-rLP5_LB_expression.txt
Data processing Counts for each unique barcode in the census files were computed in Unix as follows: raw sequences were extracted from each fastq file, and the first 20 bp of each read (corresponding to the barcode) were extracted. Each barcode sequence was reverse complemented, and the entire file was sorted before counting the occurrences of each barcode. Counts were normalized as a proportion of total reads per sample, and all samples were aggregated in R.
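The counting steps above can be sketched in Python. This is an illustration only, not the submitters' pipeline (the record says the work was done with Unix tools and R). The function names `reverse_complement`, `count_barcodes`, and `normalize` are hypothetical, and the sketch assumes four-line FASTQ records and that the normalization total is the number of reads long enough to carry a barcode.

```python
from collections import Counter

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_barcodes(fastq_lines, barcode_len=20):
    """Count reverse-complemented barcodes taken from the start of each read.

    Assumes standard 4-line FASTQ records, so the sequence is the
    second line of every record.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:
            continue  # skip header, '+', and quality lines
        read = line.strip().upper()
        if len(read) >= barcode_len:
            counts[reverse_complement(read[:barcode_len])] += 1
    return counts


def normalize(counts):
    """Express each barcode count as a proportion of the sample total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {barcode: n / total for barcode, n in counts.items()}
```

In use, `count_barcodes` would be called once per fastq file (for example on an open file handle), and the normalized dictionaries for all samples would then be combined into one table.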
Barcode mapping was completed as follows: Demultiplexed reads were paired using Paired-End reAd mergeR (PEAR v0.9.1, default settings). Custom Python code was used to identify reads corresponding to perfectly synthesized promoters and their respective barcodes. Briefly, this code searched the first 150 bp of each read for perfect matches to library variants. For reads with perfect matches, the last 20 bp of each read (the barcode) were extracted, and a list was compiled mapping each barcode to its most frequently associated library variant. A single barcode appears many times in the sequencing data, and we took steps to ensure that each barcode consistently mapped to the same variant. We required that all variants mapped to a single barcode be within an edit distance (Levenshtein distance) of 5 from one another (five single-bp changes between the two sequences). We determined this number by bootstrapping a distribution of the edit distance between any two random sequences in our variant library and setting the threshold to the first percentile (1%) of this bootstrapped distribution. Additionally, each barcode had to appear at least three times to be considered for downstream analysis. This step is intended to eliminate barcodes that contain sequencing errors.
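The mapping and filtering rules above can be sketched as follows. This is not the submitters' custom Python code, which is not included in this record. All function names and parameters are hypothetical. The sketch assumes that the variant occupies exactly the first `variant_len` bases of a merged read, that "within an edit distance of 5" means a distance of at most 5, and that the minimum-count filter applies to perfect-match reads only. The percentile lookup in `bootstrap_threshold` is one simple convention among several.

```python
import random
from collections import Counter, defaultdict


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def bootstrap_threshold(variants, n_pairs=1000, percentile=1.0, seed=0):
    """Chosen percentile of edit distances between randomly drawn variant pairs."""
    rng = random.Random(seed)
    dists = sorted(levenshtein(*rng.sample(variants, 2)) for _ in range(n_pairs))
    index = min(len(dists) - 1, int(len(dists) * percentile / 100))
    return dists[index]


def map_barcodes(reads, variants, variant_len=150, barcode_len=20,
                 max_dist=5, min_count=3):
    """Map each barcode to its most frequently associated library variant."""
    variant_set = set(variants)
    seen = defaultdict(Counter)  # barcode -> Counter of matched variants
    for read in reads:
        if len(read) < variant_len + barcode_len:
            continue
        variant = read[:variant_len]
        if variant in variant_set:  # perfect matches only
            seen[read[-barcode_len:]][variant] += 1

    mapping = {}
    for barcode, hits in seen.items():
        if sum(hits.values()) < min_count:
            continue  # barcode observed too rarely
        names = list(hits)
        consistent = all(levenshtein(x, y) <= max_dist
                         for i, x in enumerate(names) for y in names[i + 1:])
        if consistent:
            mapping[barcode] = hits.most_common(1)[0][0]
    return mapping
```

The lengths are exposed as parameters so the logic can be checked on short toy sequences before it is run with the 150 bp and 20 bp values described above.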
Promoter variant expression was calculated in R by assigning barcodes to their mapped promoter, and for each promoter, dividing the sum of all RNA counts for all of its barcodes by the sum of all DNA counts for all of its barcodes.
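The expression calculation above amounts to a ratio of summed counts per promoter. A minimal Python sketch follows (the record states that the actual calculation was done in R). The function name `promoter_expression` and the dictionary inputs are hypothetical, and skipping promoters with zero DNA counts is an assumption made to avoid division by zero.

```python
from collections import defaultdict


def promoter_expression(barcode_to_promoter, rna_counts, dna_counts):
    """Return sum(RNA counts) / sum(DNA counts) for each promoter.

    barcode_to_promoter: dict mapping barcode -> promoter name
    rna_counts, dna_counts: dicts mapping barcode -> (normalized) count
    """
    rna_sum = defaultdict(float)
    dna_sum = defaultdict(float)
    for barcode, promoter in barcode_to_promoter.items():
        # Barcodes absent from a count table contribute zero.
        rna_sum[promoter] += rna_counts.get(barcode, 0)
        dna_sum[promoter] += dna_counts.get(barcode, 0)
    # Assumption: promoters with no DNA counts are left out of the result.
    return {p: rna_sum[p] / dna_sum[p] for p in dna_sum if dna_sum[p] > 0}
```

Summing before dividing weights each barcode by its abundance, which differs from averaging per-barcode RNA/DNA ratios.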
Genome_build: U00096.2
Supplementary_files_format_and_content: *txt: Tab-delimited text files
 
Submission date Jan 31, 2020
Last update date Jun 01, 2020
Contact name Guillaume Urtecho
E-mail(s) gurtecho@g.ucla.edu
Organization name University of California, Los Angeles
Department Molecular Biology
Lab Kosuri Lab
Street address 607 Charles E. Young Drive
City Los Angeles
State/province CA
ZIP/Postal code 90095
Country USA
 
Platform ID GPL21117
Series (1)
GSE144621 Genome-wide Functional Characterization of Escherichia coli Promoters and Sequence Elements Encoding Their Regulation
Relations
BioSample SAMN13957759
SRA SRX7656728

Supplementary data files not provided
Raw data are available in SRA
Processed data are available on Series record
