SRA Toolkit Documentation
Tool: vdb-dump
VDB is the native format for SRA data files. vdb-dump provides rapid, highly configurable output of
SRA data and is best suited to tasks such as data reformatting and custom scripting/programming.
Usage:
vdb-dump [options] <path/file> [<path/file> ...]
vdb-dump [options] <accession>
Frequently Used Options:
General:
-h | --help | Displays all options, general usage, and version information.
-V | --version | Displays the version of the program.
Data formatting:
-I | --row_id_on | Prints the row ID.
-N | --colname_off | Does not print column names.
-X | --in_hex | Prints numbers in hex.
-E | --table_enum | Enumerates tables.
-T | --table <table(s)> | Dumps only those tables included in a comma-separated list.
-o | --column_enum_short | Lists available columns in short form (recommended).
-O | --column_enum | Lists columns in extended form.
   | --phys | Lists physical columns.
-C | --columns <column(s)> | Dumps only those columns included in a comma-separated list.
-x | --exclude | Excludes the specified columns.
-R | --rows <rows> | Dumps the specified rows.
-M | --max_length <max_length> | Limits line length.
-f | --format <format> | Dump format (csv, xml, json, piped, tab, fastq, fasta).
Workflow and piping:
   | --output-file | Writes output to this file.
   | --gzip | Compresses output using gzip.
   | --bzip2 | Compresses output using bzip2.
   | --output-buffer-size | Size of the output buffer; 0 means none.
   | --disable-multithreading | Disables multithreading.
   | --option-file <file> | Reads more options and parameters from the file.
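As a rough illustration of how the formatting options combine, the sketch below wraps -R, -C, and -f in a small shell function. The function name vdb_peek, the VDB_DRY_RUN switch, and the accession SRR000001 are hypothetical and not part of the toolkit; the row-range syntax "1-5" is an assumption, so check it against the output of vdb-dump --help.

```shell
# vdb_peek ACCESSION [ROWS] [COLUMNS]
# Hypothetical helper: dump a few rows of selected columns as CSV.
# Set VDB_DRY_RUN=1 to print the command instead of running it.
vdb_peek() {
    acc="$1"
    rows="${2:-1-5}"            # assumed range syntax, e.g. rows 1 to 5
    cols="${3:-READ,QUALITY}"   # column names taken from the examples on this page
    # Rebuild the positional parameters as the full vdb-dump command line.
    set -- vdb-dump -R "$rows" -C "$cols" -f csv "$acc"
    if [ "${VDB_DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

With VDB_DRY_RUN=1 the function only shows the command it would run, which is a convenient way to review the option combination before touching real data.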
Use examples:
Using core Linux utilities, produces a fasta format output.
Note that this produces a single output file, even for paired-end data.
vdb-dump -f tab -C READ <accession> | awk '{print ">" "<accession>." NR "\n" $0}' > <accession>.fasta
Using core Linux utilities, produces a qual format output.
Note that this produces a single output file, even for paired-end data.
vdb-dump -f tab -C QUALITY <accession> | sed 's/,//g' | awk '{print ">" "<accession>." NR "\n" $0}' > <accession>.qual
Pipes vdb-dump output through the Linux utility 'sort' to produce a list
of unique reference sequence names, if the data are reference-compressed.
vdb-dump -T REFERENCE -C SEQ_ID -N <accession> | sort -u
Runs the Perl script ref-dump.pl, which calls vdb-dump to output reference sequence(s) in fasta format.
Note that vdb-dump must be in the user's $PATH in order for this script to function.
perl ref-dump.pl <accession>
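The fasta one-liner above hard-codes the accession inside the awk program. A minimal sketch of the same idea as a reusable function is shown below; it passes the record prefix to awk with -v instead. The name to_fasta is hypothetical, and the function assumes one sequence per input line, as in the READ example.

```shell
# to_fasta PREFIX
# Hypothetical helper: read one sequence per line on stdin and write
# FASTA records named PREFIX.1, PREFIX.2, ... to stdout.
to_fasta() {
    awk -v prefix="$1" '{ print ">" prefix "." NR; print $0 }'
}

# Intended use (left as a comment because it needs vdb-dump and a real accession):
#   vdb-dump -f tab -C READ <accession> | to_fasta <accession> > <accession>.fasta
```

As with the original command, this writes a single stream, even for paired-end data.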
Possible errors and their solutions:
…failed with curl-error 'CURLE_COULDNT_RESOLVE_HOST'…
The toolkit is attempting to contact or download data from NCBI, but is unable to connect.
Please confirm that your computer or server has Internet connectivity.
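One way to check connectivity outside the toolkit is to run the curl command-line client and look at its exit status. The sketch below maps a few standard curl exit codes to hints (6 is the command-line counterpart of CURLE_COULDNT_RESOLVE_HOST). The function name explain_curl_exit is hypothetical, and the URL in the usage comment is only an assumed test target; the toolkit may contact other NCBI hosts.

```shell
# explain_curl_exit CODE
# Hypothetical helper: translate a curl exit code into a short hint.
explain_curl_exit() {
    case "$1" in
        0)  echo "ok: host reachable" ;;
        6)  echo "could not resolve host: check DNS and Internet connectivity" ;;
        7)  echo "could not connect to host: check Internet connectivity" ;;
        28) echo "operation timed out: connection is slow or blocked" ;;
        *)  echo "curl failed with exit code $1" ;;
    esac
}

# Intended use (left as a comment because it needs network access):
#   curl -sS -o /dev/null --max-time 10 https://www.ncbi.nlm.nih.gov/
#   explain_curl_exit $?
```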