SRA Toolkit Documentation

Tool: vdb-dump

Usage:

vdb-dump [options] <path/file> [<path/file> ...]

vdb-dump [options] <accession>

VDB is the native format for SRA data files. vdb-dump provides rapid, highly configurable output of SRA data and is best suited to uses such as data reformatting and custom scripting/programming.

Frequently Used Options:

General:
-h	|	--help	Displays ALL options, general usage, and version information.
-V	|	--version	Displays the version of the program.
Data formatting:
-I	|	--row_id_on	Print row id.
-N	|	--colname_off	Do not print column-names.
-X	|	--in_hex	Print numbers in hex.
-E	|	--table_enum	Enumerates tables.
-T	|	--table <table(s)>	Dumps only those tables included in a comma-separated list
-o	|	--column_enum_short	Lists available columns in short form (recommended)
-O	|	--column_enum	Lists columns in extended form
		--phys	Lists physical columns
-C	|	--columns <column(s)>	Dumps only those columns included in a comma-separated list
-x	|	--exclude	Exclude the specified columns
-R	|	--rows <rows>	Dumps the specified rows
-M	|	--max_length <max_length>	Limits line length
-f	|	--format <format>	Dump format (csv,xml,json,piped,tab,fastq,fasta)
Workflow and piping:
		--output-file	Write output to this file
		--gzip	Compress output using gzip
		--bzip2	Compress output using bzip2
		--output-buffer-size	Size of the output buffer (0 = none)
		--disable-multithreading	Disable multithreading
		--option-file <file>	Read more options and parameters from the file.
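
As an illustration of how several of the options above combine, the following sketch wraps a single invocation in a shell function. The function name dump_rows and the VDB_DUMP override variable are hypothetical and exist only for this example, and the row-range form (for example 1-5) passed to -R is an assumption about the accepted syntax.

```shell
# Hypothetical helper: dump selected columns for a range of rows as CSV.
# VDB_DUMP (an assumption of this sketch) may point at the vdb-dump binary;
# if it is unset, vdb-dump is looked up in PATH.
dump_rows() {
    accession=$1   # run accession or path to a file
    rows=$2        # rows to dump, passed to -R (range syntax is assumed)
    columns=$3     # comma-separated column list, passed to -C
    "${VDB_DUMP:-vdb-dump}" -R "$rows" -C "$columns" -f csv "$accession"
}

# Usage:
#   dump_rows "$accession" 1-5 READ,QUALITY
```

Keeping the binary name in a variable makes it easy to point the helper at a specific toolkit installation without editing the function.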

Usage examples:

vdb-dump -f tab -C READ <accession> | awk '{print ">" "<accession>." NR "\n" $0}' > <accession>.fasta

Using core Linux utilities, produces a fasta format output. Note that this produces a single output file, even for paired-end data.
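
For repeated use, the awk step above could be wrapped in a small function so that the accession is passed once as an argument instead of being typed into the awk program. This is only a sketch: the name tab_to_fasta is hypothetical, and it assumes the input holds exactly one sequence per line, as in the pipeline above.

```shell
# Hypothetical helper: convert one-sequence-per-line input into FASTA records.
# $1 is the accession, used as the prefix of each record name.
tab_to_fasta() {
    # NR is the input line number, so records are named <accession>.1, .2, ...
    awk -v acc="$1" '{ print ">" acc "." NR "\n" $0 }'
}

# Usage (requires vdb-dump in PATH; acc holds the accession of interest):
#   vdb-dump -f tab -C READ "$acc" | tab_to_fasta "$acc" > "$acc.fasta"
```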

vdb-dump -f tab -C QUALITY <accession> | sed 's/,//g' | awk '{print ">" "<accession>." NR "\n" $0}' > <accession>.qual

Using core Linux utilities, produces a qual format output. Note that this produces a single output file, even for paired-end data.
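
The sed and awk steps above can also be folded into a single awk program. The sketch below is one way to do that; the function name tab_to_qual is hypothetical, and it assumes each input line carries the comma-separated quality values of one read.

```shell
# Hypothetical helper: convert comma-separated quality lines into qual records.
# $1 is the accession, used as the prefix of each record name.
tab_to_qual() {
    # gsub strips every comma from the line before the record is printed,
    # which replaces the separate sed step.
    awk -v acc="$1" '{ gsub(/,/, ""); print ">" acc "." NR "\n" $0 }'
}

# Usage (requires vdb-dump in PATH; acc holds the accession of interest):
#   vdb-dump -f tab -C QUALITY "$acc" | tab_to_qual "$acc" > "$acc.qual"
```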

vdb-dump -T REFERENCE -C SEQ_ID -N <accession> | sort -u

Pipes vdb-dump output through the Linux utility 'sort' to produce a list of unique reference sequence names, if the data are reference-compressed.

perl ref-dump.pl <accession>

Perl script available here that calls vdb-dump to output reference sequence(s) in fasta format. Note that vdb-dump must be in the user’s $PATH in order for this script to function.

Possible errors and their solutions:


…failed with curl-error 'CURLE_COULDNT_RESOLVE_HOST'…

The toolkit is attempting to contact or download data from NCBI, but is unable to connect. Please confirm that your computer or server has Internet connectivity.
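
In scripts, this error can be detected automatically so that a connectivity hint is shown next to the raw message. The wrapper below is a minimal sketch: the function name run_with_dns_hint is hypothetical, and it assumes the error text is written to standard error.

```shell
# Hypothetical wrapper: run a command and, if its stderr mentions
# CURLE_COULDNT_RESOLVE_HOST, print a connectivity hint.
# Assumption: the toolkit writes this error message to stderr.
run_with_dns_hint() {
    err_file=$(mktemp) || return 1
    status=0
    "$@" 2>"$err_file" || status=$?
    # Pass the original error output through unchanged.
    cat "$err_file" >&2
    if grep -q 'CURLE_COULDNT_RESOLVE_HOST' "$err_file"; then
        echo "hint: host name could not be resolved; check Internet connectivity" >&2
    fi
    rm -f "$err_file"
    # Preserve the exit status of the wrapped command.
    return "$status"
}

# Usage (requires vdb-dump in PATH; acc holds the accession of interest):
#   run_with_dns_hint vdb-dump -f tab -C READ "$acc"
```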