Status |
Public on Jan 25, 2010 |
Title |
U87MG-cell culture [AB SOLiD] |
Sample type |
SRA |
|
|
Source name |
brain tumor cell line
|
Organism |
Homo sapiens |
Characteristics |
cell line: U87MG
|
Growth protocol |
Cells were grown in DMEM with 10% FBS.
|
Extracted molecule |
genomic DNA |
Extraction protocol |
Long-Mate-Paired Library Construction: The U87MG genomic DNA 2x50bp long mate-paired library was constructed using the reagents and protocol provided by Applied Biosystems (SOLiD 3 System Library Preparation Guide). Briefly, 45ug of genomic DNA was fragmented by HydroShear (Digilab Genomic Solutions Inc) to 1.0-2.5kb. The fragmented DNA was repaired with the End-It DNA End-Repair Kit (Epicentre). Subsequently, the LMP CAP adaptor was ligated to the ends. DNA fragments of 1.2-1.7kb were selected on a 1.0% agarose gel to avoid concatamers and circularized with a biotinylated internal adaptor. Non-circularized DNA fragments were eliminated by Plasmid-Safe ATP-Dependent DNase (Epicentre), and 3ug of circularized DNA was recovered after purification. Original DNA nicks at the LMP CAP oligo/genomic insert border were translated about 100bp into the target genomic DNA by nick translation using E. coli DNA polymerase I. Fragments containing the target genomic DNA and adaptors were cleaved from the circularized DNA by single-strand specific S1 nuclease. P1 and P2 adaptors were ligated to the fragments, and the ligated mixture was used to create two separate libraries with 10 cycles of PCR amplification. Finally, 250-300bp fragments were excised from a PAGE gel to generate mate-paired sequencing libraries, with an average of about 90bp of target genomic DNA on each end, and used as emulsion PCR template.
Templated Beads Preparation: The templated beads were prepared using the reagents and protocol from the manufacturer (Applied Biosystems SOLiD 3 Templated Beads Preparation Guide).
SOLiD 3 Sequencing: The 2x50bp mate-paired sequencing was performed exactly according to the Applied Biosystems SOLiD 3 System Instrument Operation Guide, using reagents from Applied Biosystems.
|
|
|
Library strategy |
WGS |
Library source |
genomic |
Library selection |
other |
Instrument model |
AB SOLiD System 3.0 |
|
|
Description |
genomic DNA library from 10ug of source DNA extracted from cultured U87MG cells
|
Data processing |
Alignment: We used the Blat-like Fast Accurate Search Tool (BFAST, http://bfast.sourceforge.net) version 0.5.3 to align the two-base encoded reads from the ABI SOLiD to the NCBI human reference genome (build 36.1). We set parameters to use only informative keys when looking up reads in each index (BFAST parameter -K 8) and to ignore reads with too many candidate alignment locations (CALs) aggregated across all indexes (BFAST parameter -M 384); reads that mapped to more than 384 locations were categorized as ‘unmapped’. We then performed local alignment for each of the returned CALs using the local alignment algorithm included in BFAST, which simultaneously decodes the read from color space while searching for color errors (encoding errors), base changes, insertions, and deletions. We chose the "best scoring" alignment, accepting an alignment only if it was at least the equivalent edit distance of two color errors away from the next best alignment. For reference, this is approximately equivalent to a ‘mapping quality’ of 20 or better in the output of the MAQ program.
We removed duplicate reads using the alignment filtering utility found in DNAA (http://dnaa.sourceforge.net). For single-end reads and for mate-paired reads where only one end mapped, we removed duplicates based on reads having identical start positions. For mate-paired reads, we removed duplicates in which both ends had the same start positions.
Variant BED Files: Variant calls from the SAMtools pileup tool were first loaded into a SeqWare QueryEngine database and subsequently filtered to produce BED files. The filtering criteria required that a variant be seen at least 4 times and at most 60 times, with at least one observation on each strand.
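The support and strand criteria stated above can be illustrated with a minimal Python sketch. It is not the SeqWare QueryEngine filter itself: the VariantSupport record and the passes_support_filter function are hypothetical names introduced here, and the sketch assumes that the number of times a variant is "seen" equals the sum of its forward-strand and reverse-strand observations.

```python
from dataclasses import dataclass

# Thresholds taken from the record: seen at least 4 and at most 60 times.
MIN_OBSERVATIONS = 4
MAX_OBSERVATIONS = 60


@dataclass
class VariantSupport:
    """Hypothetical summary of the reads supporting one candidate variant."""
    forward_reads: int  # supporting observations on the forward strand
    reverse_reads: int  # supporting observations on the reverse strand


def passes_support_filter(v: VariantSupport) -> bool:
    """Return True if the variant meets the count and strand criteria."""
    # Assumption: total support is the sum over both strands.
    total = v.forward_reads + v.reverse_reads
    seen_on_both_strands = v.forward_reads >= 1 and v.reverse_reads >= 1
    return MIN_OBSERVATIONS <= total <= MAX_OBSERVATIONS and seen_on_both_strands
```

For example, a variant with 3 forward and 1 reverse observations passes, while one with 4 forward and 0 reverse observations fails the strand requirement. The upper bound of 60 presumably guards against regions of abnormally high coverage, although the record does not state the rationale.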
For SNVs we further required that calls be made only from reads lacking indels, and the last 5 bases of each read were ignored. This reduced the likelihood that spurious mismappings were used to predict SNVs and eliminated the lowest quality bases from consideration. For small indels (<21bp) we applied a slightly different filter: any read supporting an indel was allowed to contain only one contiguous indel, and reads were not considered if the indel occurred at either the beginning or the end of the read. These criteria, like the SNV criteria, were used to reduce the likelihood of using mismapped or locally misaligned reads in the variant calling algorithm. Eliminating reads with indels at the beginning or end of the read was intended to remove potential alignment artifacts caused by ambiguous gap introduction where too little sequence remains at the read ends to guide proper alignment. Together, these filtering criteria reduced the likelihood that sequencing errors were identified as SNV or indel variants. We used scripts available in the BFAST toolset and SeqWare Pipeline to filter and annotate the variant calls. Variants passing these filters were further annotated by their overlap with dbSNP version 129. To be considered overlapping, a variant was required to share the same genomic position as a dbSNP entry and to match the allele present in the database. Mapping to dbSNP allowed us to separate known SNPs from de novo variants.
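The dbSNP overlap rule described above (same genomic position and a matching allele) can be sketched as follows. This is an illustrative Python sketch, not the BFAST or SeqWare Pipeline scripts: the function name split_by_dbsnp, the tuple layout (chromosome, position, allele), and the dictionary representation of dbSNP are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representations, chosen only for this sketch.
Variant = Tuple[str, int, str]           # (chromosome, position, variant allele)
DbSnpIndex = Dict[Tuple[str, int], Set[str]]  # (chromosome, position) -> known alleles


def split_by_dbsnp(
    variants: Iterable[Variant], dbsnp: DbSnpIndex
) -> Tuple[List[Variant], List[Variant]]:
    """Split variants into (in dbSNP, not in dbSNP).

    A variant counts as overlapping only if a dbSNP entry exists at the
    same position AND lists the same allele.
    """
    in_dbsnp: List[Variant] = []
    not_in_dbsnp: List[Variant] = []
    for chrom, pos, allele in variants:
        known_alleles = dbsnp.get((chrom, pos), set())
        if allele in known_alleles:
            in_dbsnp.append((chrom, pos, allele))
        else:
            not_in_dbsnp.append((chrom, pos, allele))
    return in_dbsnp, not_in_dbsnp
```

With a toy index such as {("chr1", 100): {"A", "G"}}, the made-up variant ("chr1", 100, "G") is reported as in dbSNP, whereas ("chr1", 100, "T") is not, because the position matches but the allele does not. This two-way split mirrors the in_dbSNP and not_dbSNP supplementary files listed for this sample.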
|
|
|
Submission date |
Jan 21, 2010 |
Last update date |
May 15, 2019 |
Contact name |
Stanley F Nelson |
E-mail(s) |
mjclark@ucla.edu
|
Phone |
3108257920
|
Fax |
3107945446
|
URL |
http://www.genetics.ucla.edu/labs/nelson/
|
Organization name |
UCLA
|
Department |
Human Genetics
|
Lab |
Nelson Lab
|
Street address |
695 Young Dr S
|
City |
Los Angeles |
State/province |
CA |
ZIP/Postal code |
90095 |
Country |
USA |
|
|
Platform ID |
GPL9442 |
Series (1) |
GSE19986 |
U87MG Decoded: The Genomic Sequence of a Cytogenetically Aberrant Human Cancer Cell Line |
|
Relations |
SRA |
SRX015657 |
BioSample |
SAMN02196496 |
Supplementary file | Size | Download | File type/resource |
GSM499400_UCLA-NL_U87MG_SNV_in_dbSNP.txt.gz | 97.2 Mb | (ftp)(http) | TXT |
GSM499400_UCLA-NL_U87MG_SNV_not_dbSNP.txt.gz | 10.1 Mb | (ftp)(http) | TXT |
GSM499400_UCLA-NL_U87MG_indels_in_dbSNP.txt.gz | 3.4 Mb | (ftp)(http) | TXT |
GSM499400_UCLA-NL_U87MG_indels_not_dbSNP.txt.gz | 4.7 Mb | (ftp)(http) | TXT |
Processed data provided as supplementary file |
Raw data are available in SRA |