GEO Accession viewer

Sample GSM2941971

Status

Public on Jan 19, 2018

Title

Lib 1 DNA rep 3

Sample type

SRA

Source name

DNA

Organism

Human immunodeficiency virus 1

Characteristics

library: Lib 1
population: DNA
replicate: 3

Extracted molecule

genomic DNA

Extraction protocol

TRIzol extraction of RNA.
RNA was reverse transcribed into cDNA, and a specific region of the RNA was PCR-amplified as dsDNA. This dsDNA was then randomly fragmented before library construction for sequencing on an Illumina HiSeq 2000 in 100-nt paired-end read mode. Libraries were prepared by ligating pre-annealed Illumina multiplex adaptors P-GATCGGAAGAGCACACGTCT and ACACTCTTTCCCTACACGACGCTCTTCCGATCT to the blunted, A-tailed dsDNA. The ligation products were then amplified using primer PCR1.0 (AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT) and an index-containing primer (CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT), where NNNNNN is replaced with one of the Illumina TruSeq indexes.
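The two adaptor oligos listed above are partially complementary, which is what allows them to be pre-annealed before ligation. The short Python sketch below locates the double-stranded region by comparing the reverse complement of the first oligo with the second. It assumes the 5' "P-" prefix denotes a phosphate rather than sequence, and the helper names are hypothetical.

```python
# Illustrative sketch: find where the two adaptor oligos can base-pair.
# Assumption: "P-" marks a 5' phosphate and is dropped from the sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> str:
    """Dynamic-programming longest common substring (first maximum found)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


ADAPTOR_A = "GATCGGAAGAGCACACGTCT"  # 5'-phosphorylated oligo, "P-" removed
ADAPTOR_B = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"

# A stretch of ADAPTOR_B identical to revcomp(ADAPTOR_A) can pair with ADAPTOR_A.
duplex = longest_common_substring(reverse_complement(ADAPTOR_A), ADAPTOR_B)
print(duplex, len(duplex))
```

For these sequences the sketch finds a 12-nt duplex, GCTCTTCCGATC, after which the second oligo carries a single unpaired 3' T, consistent with ligation to A-tailed dsDNA; the remaining ends of both oligos stay unpaired.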

Library strategy

OTHER

Library source

genomic

Library selection

other

Instrument model

Illumina HiSeq 2000

Description

plasmid DNA

Data processing

Demultiplex reads using novobarcode (Novocraft) with the parameters -F ILMFQ -l 6.
Trim reads using Trim Galore! (Babraham Bioinformatics) with the parameters -q 30 --phred64 --paired.
Trimmed reads were aligned to the HIV-1 genome using novoalign (Novocraft) with the parameters -F STDFQ -o SAM -o SoftClip -r None.
Data processing was performed with a Python script using the NumPy, Pysam and Numba (Continuum Analytics) modules. Briefly, this script parses SAM files containing reads aligned to the reference sequence. It extracts alignment matches by decoding the CIGAR string, and reads are translated into sequence matches or mismatches relative to the reference sequence. It then merges the match patterns derived from aligned paired-end reads, removing redundancy and sequencing errors where overlapping paired reads disagree. Absolute nucleotide occurrences are counted using one of the following modes. The ‘1d’ mode counts the number of A, C, G and U (T in the sequenced DNA) at each position. The ‘2d’ mode is used for inferring the effect of single mutations on protein binding: it counts all non-redundant combinations of two positions $(i, j)$ with $i \in I$, $j \in J$, $I = \{1, \dots, 532\}$ and $J = I \setminus \{i\}$, and fills an array containing the number of dinucleotide occurrences for each pair of positions [number of AA, AC, …, TG, TT]. Results are serialized to a binary .npy file. A separate script converts these NumPy arrays into text files compatible with statistical tools written in MATLAB (MathWorks).
Genome_build: HIV-1 NL4-3
Supplementary_files_format_and_content: 1D processed text files list each genome position and the number of A, T, C and G nucleotides found at that position. 2D processed text files list non-redundant pairs of genome positions and the number of each dinucleotide (AA, AT, AC, …) found at these positions.
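The read-level processing steps above (3' quality trimming of Phred+64 reads at -q 30, then decoding the CIGAR strings of soft-clipped alignments into matches and mismatches) can be illustrated with a minimal pure-Python sketch. It is not the authors' script and not Trim Galore! itself: the trimming function follows the BWA-style 3' algorithm used by cutadapt, which Trim Galore! wraps; positions are 0-based; and the function names and example read are hypothetical.

```python
import re

# Illustrative sketch only; all names and the example read are hypothetical.
PHRED64_OFFSET = 64
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def decode_phred64(qual: str) -> list[int]:
    """Convert a Phred+64 quality string (as selected by --phred64) to scores."""
    return [ord(ch) - PHRED64_OFFSET for ch in qual]


def quality_trim_index(quals: list[int], cutoff: int = 30) -> int:
    """BWA-style 3' trimming: return how many bases to keep at this cutoff."""
    running, best, cut = 0, 0, len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return cut


def aligned_pairs(cigar: str, ref_start: int) -> list[tuple[int, int]]:
    """Expand a CIGAR string into (read index, reference index) pairs."""
    pairs = []
    read_pos, ref_pos = 0, ref_start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op in "M=X":  # aligned bases consume read and reference
            pairs.extend((read_pos + k, ref_pos + k) for k in range(length))
            read_pos += length
            ref_pos += length
        elif op in "IS":  # insertions and soft clips consume the read only
            read_pos += length
        elif op in "DN":  # deletions and skips consume the reference only
            ref_pos += length
        # H (hard clip) and P (padding) consume neither
    return pairs


def match_pattern(read_seq: str, cigar: str, ref_seq: str,
                  ref_start: int) -> dict[int, bool]:
    """Map each aligned reference position to True (match) or False (mismatch)."""
    return {ref: read_seq[rd] == ref_seq[ref]
            for rd, ref in aligned_pairs(cigar, ref_start)}


# Hypothetical read: two low-quality 3' bases, then a 2-nt soft clip at the 5' end.
seq, qual = "TTACGAACGG", "hhhhhhhhBB"  # 'h' = Q40, 'B' = Q2 in Phred+64
kept = seq[:quality_trim_index(decode_phred64(qual), cutoff=30)]
print(kept, match_pattern(kept, "2S6M", "ACGTACGTAC", 0))
```

Pysam exposes comparable information through `AlignedSegment.get_aligned_pairs()`, which a Pysam-based script could use instead of parsing the CIGAR string by hand.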

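The paired-read merging and the ‘1d’ and ‘2d’ counting modes described above can be sketched with NumPy. Assumptions, none taken from the original script: each merged read pair is an array of base codes (0 to 3 for A, C, G, T; -1 where a position is uncovered or the mates disagree), the 2D counts are held in a full L × L × 16 array of which only pairs i < j are filled, dinucleotides are indexed in the order AA, AC, …, TT, and the tab-separated text layouts at the end are hypothetical.

```python
import io

import numpy as np

# Illustrative sketch only; names and text layouts are hypothetical.
BASES = "ACGT"
CODE = {base: k for k, base in enumerate(BASES)}


def merge_pair(r1: dict[int, str], r2: dict[int, str], length: int) -> np.ndarray:
    """Merge mate base calls (0-based position -> base) into one code array.

    Overlapping positions are kept once; if the mates disagree, the position
    is left uncalled (-1), treating the disagreement as a sequencing error.
    """
    merged = np.full(length, -1, dtype=np.int8)
    for pos in set(r1) | set(r2):
        b1, b2 = r1.get(pos), r2.get(pos)
        if b1 is not None and b2 is not None and b1 != b2:
            continue
        merged[pos] = CODE[b1 if b1 is not None else b2]
    return merged


def count_1d(reads: np.ndarray) -> np.ndarray:
    """'1d' mode: (L, 4) counts of A, C, G, T at each position."""
    return np.stack([(reads == k).sum(axis=0) for k in range(4)], axis=1)


def count_2d(reads: np.ndarray) -> np.ndarray:
    """'2d' mode: (L, L, 16) dinucleotide counts, filled only for pairs i < j."""
    n_pos = reads.shape[1]
    counts = np.zeros((n_pos, n_pos, 16), dtype=np.int64)
    for read in reads:
        covered = np.flatnonzero(read >= 0)
        for a, i in enumerate(covered):
            js = covered[a + 1:]
            # index 4 * base_i + base_j gives AA, AC, AG, AT, CA, ..., TT
            np.add.at(counts, (i, js, 4 * read[i] + read[js]), 1)
    return counts


def write_1d_text(counts: np.ndarray, handle) -> None:
    """Hypothetical layout: 1-based position, then A, C, G, T counts."""
    positions = np.arange(1, counts.shape[0] + 1)
    np.savetxt(handle, np.column_stack([positions, counts]), fmt="%d", delimiter="\t")


def write_2d_text(counts: np.ndarray, handle) -> None:
    """Hypothetical layout: 1-based i, j, then the 16 dinucleotide counts."""
    i, j = np.triu_indices(counts.shape[0], k=1)
    table = np.column_stack([i + 1, j + 1, counts[i, j]])
    np.savetxt(handle, table, fmt="%d", delimiter="\t")


# Two hypothetical read pairs on a 4-nt reference.
reads = np.stack([
    merge_pair({0: "A", 1: "C", 2: "G"}, {2: "G", 3: "T"}, 4),
    merge_pair({0: "A", 1: "C"}, {1: "G", 2: "T"}, 4),  # mates disagree at 1
])
ones = count_1d(reads)
twos = count_2d(reads)
buffer = io.StringIO()
write_1d_text(ones, buffer)
print(buffer.getvalue())
```

`np.save` writes such arrays to the binary .npy format mentioned in the description. With $I = \{1, \dots, 532\}$ there are 532 × 531 / 2 = 141,246 non-redundant position pairs, so a packed (pairs × 16) array would be a more compact alternative to the full L × L × 16 layout used in this sketch.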
Submission date

Jan 18, 2018

Last update date

Jan 24, 2018

Contact name

Redmond P Smyth

E-mail(s)

r.smyth@ibmc-cnrs.unistra.fr

Organization name

IBMC CNRS

Department

Architecture et Réactivité de L'ARN

Lab

Marquet-Paillart

Street address

15 rue Rene Descartes

City

Strasbourg

State/province

Alsace

ZIP/Postal code

67000

Country

France

Platform ID

GPL20319

Series (1)

GSE109386

In cell Mutational Interference Mapping Experiment (in cell MIME) identifies the 5’ PolyA signal as a dual regulator of HIV-1 genomic RNA production and packaging

Relations

BioSample

SAMN08377322

SRA

SRX3586731

Supplementary file	Size	File type/resource
GSM2941971_20_1d.txt.gz	6.5 Kb	TXT
GSM2941971_20_2d.txt.gz	2.8 Mb	TXT
Raw data are available in SRA
Processed data provided as supplementary file