Sample GSM2941971
Status Public on Jan 19, 2018
Title Lib 1 DNA rep 3
Sample type SRA
 
Source name DNA
Organism Human immunodeficiency virus 1
Characteristics library: Lib 1
population: DNA
replicate: 3
Extracted molecule genomic DNA
Extraction protocol Trizol extraction of RNA
RNA was reverse transcribed into cDNA, and a specific region of the RNA was amplified by PCR into dsDNA. This dsDNA was then randomly fragmented before library construction for sequencing on the Illumina HiSeq 2000 in 100 paired-end read mode. Libraries were prepared by ligating preannealed Illumina multiplex adaptors P-GATCGGAAGAGCACACGTCT and ACACTCTTTCCCTACACGACGCTCTTCCGATCT to the blunted and A-tailed dsDNA. This was then amplified using primer PCR1.0 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT and an index-containing primer CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, where NNNNNN is replaced with one of the Illumina TruSeq indexes.
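As an illustration of the NNNNNN substitution described above, the following minimal sketch builds an index-containing primer from the template given in the protocol. The function name `index_primer` and the example index `ACGTAC` are hypothetical and are not taken from the record. The record does not state which TruSeq indexes were used, nor in which orientation the index is written into the primer, so the sketch simply inserts the six bases as given.

```python
# Template copied from the extraction protocol; NNNNNN marks the index position.
INDEX_PRIMER_TEMPLATE = (
    "CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
)


def index_primer(index, template=INDEX_PRIMER_TEMPLATE):
    """Return the primer with the NNNNNN placeholder replaced by `index`.

    `index` must be exactly six unambiguous DNA bases. Orientation is an
    assumption: the bases are inserted exactly as supplied.
    """
    placeholder = "N" * 6
    if len(index) != len(placeholder) or set(index) - set("ACGT"):
        raise ValueError("index must be six bases drawn from A, C, G, T")
    if template.count(placeholder) != 1:
        raise ValueError("template must contain exactly one NNNNNN placeholder")
    return template.replace(placeholder, index)


# Hypothetical index used purely for illustration.
example_primer = index_primer("ACGTAC")
```

The primer length is unchanged by the substitution, which gives a quick sanity check when preparing a set of indexed primers.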
 
Library strategy OTHER
Library source genomic
Library selection other
Instrument model Illumina HiSeq 2000
 
Description plasmid DNA
Data processing Reads were demultiplexed using novobarcode (Novocraft) with the parameters -F ILMFQ -l 6
Reads were trimmed using Trim Galore! (Babraham Bioinformatics) with the parameters -q 30 --phred64 --paired
Trimmed reads were aligned to the HIV-1 genome using novoalign (Novocraft) with the parameters -F STDFQ -o SAM -o SoftClip -r None
Data processing was performed using a Python script built on the Numpy, Pysam and Numba (Continuum Analytics) modules. Briefly, this script parses SAM files containing reads aligned to the reference sequence. It fetches alignment matches by decoding the CIGAR string, and reads are translated into sequence matches or mismatches relative to the reference sequence. It then merges the match patterns derived from aligned paired-end reads, removing redundancy and sequencing errors where overlapping paired reads disagree. Absolute nucleotide occurrences are counted using one of three modes, two of which are described here. The ‘1d’ mode counts the number of A, C, G, U at each position. The ‘2d’ mode is used for inferring the effect of single mutations on protein binding: it counts all non-redundant combinations of two positions (i, j); i ∈ I, j ∈ J; I = {1…532}; J = I \ j; and fills an array containing the number of dinucleotide occurrences for each pair of positions [number of AA, AC, … TG, TT]. Results are serialized to a binary .npy file. A separate script converts these numpy arrays into text files compatible with statistical tools written in MATLAB (MathWorks).
Genome_build: HIV-1 NL4-3
Supplementary_files_format_and_content: 1D processed text files list each genome position and the number of A, T, C, and G nucleotides found at that position. 2D processed text files list non-redundant pairs of genome positions and the number of each dinucleotide (AA, AT, AC…) found at those positions.
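The counting steps described in the data processing notes can be sketched as follows. This is not the submitters' script: it is a minimal pure-Python/NumPy illustration that omits Pysam, Numba, and the paired-end merging step. It assumes each read is already available as a hypothetical `(pos, cigar, seq)` tuple with a 0-based start position. The function names and the array layouts (`[position, base]` for 1d and `[i, j, dinucleotide]` for 2d, with bases ordered A, C, G, T) are assumptions chosen for the example.

```python
import re
from itertools import combinations

import numpy as np

BASES = "ACGT"
BASE_INDEX = {base: k for k, base in enumerate(BASES)}
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_bases(pos, cigar, seq):
    """Yield (reference_position, read_base) for every aligned base.

    `pos` is the 0-based reference position of the first aligned base.
    """
    ref, query = pos, 0
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":      # consumes reference and read
            for k in range(length):
                yield ref + k, seq[query + k]
            ref += length
            query += length
        elif op in "IS":     # insertion or soft clip: consumes read only
            query += length
        elif op in "DN":     # deletion or skip: consumes reference only
            ref += length
        # H and P consume neither sequence


def count_1d(reads, ref_len):
    """Count A, C, G, T at each reference position (shape: ref_len x 4)."""
    counts = np.zeros((ref_len, 4), dtype=np.int64)
    for pos, cigar, seq in reads:
        for i, base in aligned_bases(pos, cigar, seq):
            if base in BASE_INDEX:
                counts[i, BASE_INDEX[base]] += 1
    return counts


def count_2d(reads, ref_len):
    """Count dinucleotides for each position pair i < j covered by one read.

    Shape: ref_len x ref_len x 16, with dinucleotide index 4 * a + b
    (AA = 0, AC = 1, ... TT = 15).
    """
    counts = np.zeros((ref_len, ref_len, 16), dtype=np.int64)
    for pos, cigar, seq in reads:
        covered = [
            (i, BASE_INDEX[base])
            for i, base in aligned_bases(pos, cigar, seq)
            if base in BASE_INDEX
        ]
        for (i, a), (j, b) in combinations(covered, 2):
            counts[i, j, 4 * a + b] += 1
    return counts


# Toy input: two reads against a hypothetical 5-nt reference.
toy_reads = [(0, "4M", "ACGT"), (1, "1S2M1D1M", "TCGA")]
toy_1d = count_1d(toy_reads, ref_len=5)
toy_2d = count_2d(toy_reads, ref_len=5)
```

Only the upper triangle (i < j) of the 2d array is filled, which is one way to realize the "non-redundant pairs" mentioned above. Arrays of this kind can be written with `numpy.save`, matching the .npy serialization the notes describe.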
 
Submission date Jan 18, 2018
Last update date Jan 24, 2018
Contact name Redmond P Smyth
E-mail(s) r.smyth@ibmc-cnrs.unistra.fr
Organization name IBMC CNRS
Department Architecture et Réactivité de L'ARN
Lab Marquet-Paillart
Street address 15 rue Rene Descartes
City Strasbourg
State/province Alsace
ZIP/Postal code 67000
Country France
 
Platform ID GPL20319
Series (1)
GSE109386 In cell Mutational Interference Mapping Experiment (in cell MIME) identifies the 5’ PolyA signal as a dual regulator of HIV-1 genomic RNA production and packaging
Relations
BioSample SAMN08377322
SRA SRX3586731

Supplementary file Size Download File type/resource
GSM2941971_20_1d.txt.gz 6.5 Kb (ftp)(http) TXT
GSM2941971_20_2d.txt.gz 2.8 Mb (ftp)(http) TXT
Raw data are available in SRA
Processed data provided as supplementary file
