AGP Validation
AGP file structure and content can be validated using the agp_validate program. The on-line version of agp_validate checks that the input AGP file conforms to the AGP format Specification, checks the file for internal consistency, and generates a report of component, gap, scaffold and object statistics. The command-line version of agp_validate performs all the same checks as the on-line version but also has options to perform additional checks by comparing the AGP to sequences in FASTA files (see below).
AGP validation on-line
The on-line AGP Validation form can be used to run agp_validate on an uploaded AGP file or on AGP text pasted into the form.
agp_validate command-line program
agp_validate is available by anonymous FTP. Copy the appropriate version for your platform, then uncompress the file, rename it to "agp_validate", and set the "execute" permission (see platform-specific details).
usage overview: agp_validate [-options] [FASTA files...] [AGP files...]
- When run without any options, agp_validate performs a large number of validations on the input AGP files (see below) and also generates a report of component, gap, scaffold and object statistics.
- If component FASTA sequence files are provided, agp_validate will also check that component spans do not exceed the sequence length.
- If the component sequences are available in GenBank, then agp_validate can perform additional checks using the sequence lengths, versions, and taxonomy ID retrieved from GenBank (-alt and -species options).
- If FASTA sequences for the assembled objects are provided, agp_validate can also check that the sequences match what can be constructed from the AGP and the component sequences (-comp option).
- Information on all the available options can be obtained by executing agp_validate with the -help option.
Validations performed by agp_validate
Error level violations reported include:
- Incorrect number of columns: there should be 9 tab-separated columns.
- Non-positive integers in the following columns:
  - 2: object_beg
  - 3: object_end
  - 4: part_number
  - 6b: gap_length
  - 7a: component_beg
  - 8a: component_end
- object_end is less than object_beg.
- component_end is less than component_beg.
- The length of the span specified for the component (in columns 7a and 8a) does not match the length of the span specified for the object (in columns 2 and 3).
- The length specified for the gap (in column 6b) does not match the length of the span specified for the object (in columns 2 and 3).
- Linkage=yes with a gap_type other than scaffold or repeat.
- Object does not start with an object_beg coordinate of 1.
- Object has ranges that are non-sequential and/or overlapping.
- Object does not start with a part_number of 1.
- Object has non-sequential lines and/or lines mixed with other objects.
- Multiple objects with the same object name (column 1).
- Component orientation of 0 or na used for a non-singleton scaffold.
- Invalid terms or symbols in the following columns:
  - 5: component_type
  - 7b: gap_type
  - 8b: linkage
  - 9a: orientation
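A few of the per-line error checks listed above can be illustrated with a short Python sketch. This is not the agp_validate implementation; the function name check_agp_line and the message wording are hypothetical, and the sketch assumes that a component_type of N or U marks a gap line and that comment lines have already been skipped.

```python
GAP_COMPONENT_TYPES = {"N", "U"}  # assumption: these component_type values mark gap lines


def check_agp_line(line):
    """Return a list of error messages for one AGP data line (illustrative only)."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]

    is_gap = cols[4] in GAP_COMPONENT_TYPES
    # 0-based indices into cols of the fields that must hold positive integers.
    int_cols = {"object_beg": 1, "object_end": 2, "part_number": 3}
    if is_gap:
        int_cols["gap_length"] = 5
    else:
        int_cols["component_beg"] = 6
        int_cols["component_end"] = 7

    errors = []
    values = {}
    for name, idx in int_cols.items():
        text = cols[idx]
        if text.isascii() and text.isdigit() and int(text) > 0:
            values[name] = int(text)
        else:
            errors.append(f"{name} is not a positive integer: {text!r}")
    if errors:
        return errors  # the span checks below need valid integers

    if values["object_end"] < values["object_beg"]:
        errors.append("object_end is less than object_beg")
    object_span = values["object_end"] - values["object_beg"] + 1

    if is_gap:
        if values["gap_length"] != object_span:
            errors.append("gap_length does not match the object span")
    else:
        if values["component_end"] < values["component_beg"]:
            errors.append("component_end is less than component_beg")
        component_span = values["component_end"] - values["component_beg"] + 1
        if component_span != object_span:
            errors.append("component span does not match the object span")
    return errors
```

Calling check_agp_line on each non-comment line of a file gives an empty list for lines that pass these particular checks. Checks that span several lines, such as sequential part numbers or duplicate object names, are outside the scope of this sketch.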
Warning level violations reported include:
- Gap at the beginning or the end of an object.
- Consecutive gap lines of the same type.
- Overlapping spans used for a given component_id.
- Non-draft component_id used more than once.
- Non-draft component spans out of order.
- Extra tab character at the end of the line.
- Component type is not consistent with the line format.
- Component type is not consistent with the component_id accession.
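The first two warnings above can be sketched in Python as follows. This is only an illustration, not how agp_validate works internally: the function name gap_warnings and the message text are hypothetical, rows are assumed to be already split into 9 columns and ordered by object, and "same type" is interpreted here as an identical gap_type value (column 7b).

```python
from itertools import groupby

GAP_COMPONENT_TYPES = ("N", "U")  # assumption: these component_type values mark gap lines


def gap_warnings(rows):
    """Report gaps at object ends and consecutive same-type gaps (illustrative only)."""
    warnings = []
    # Group consecutive rows that share the same object name (column 1).
    for obj, group in groupby(rows, key=lambda row: row[0]):
        lines = list(group)
        is_gap = [row[4] in GAP_COMPONENT_TYPES for row in lines]
        if is_gap[0]:
            warnings.append(f"{obj}: gap at the beginning of the object")
        if is_gap[-1]:
            warnings.append(f"{obj}: gap at the end of the object")
        # Compare each line with the one before it within the same object.
        for i in range(1, len(lines)):
            prev, cur = lines[i - 1], lines[i]
            if is_gap[i - 1] and is_gap[i] and prev[6] == cur[6]:
                warnings.append(
                    f"{obj}: consecutive gap lines of type {cur[6]} at part {cur[3]}"
                )
    return warnings
```

An object that consists of a single gap line triggers both the beginning and the end warning in this sketch.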
Additional errors and warnings reported when optional validations are invoked:
- Invalid component_id. [-alt or -g option]
- Component is not in GenBank. [-alt option]
- component_id is ambiguous without an explicit version. [-alt option]
- component_end is greater than the sequence length. [-alt option, or FASTA files provided]
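The last check above, comparing component_end with the sequence length when FASTA files are provided, can be approximated with the Python sketch below. The function names fasta_lengths and components_past_end are hypothetical, and the sketch assumes that the first word of each FASTA definition line is identical to the component_id used in the AGP file; agp_validate itself may match identifiers differently.

```python
GAP_COMPONENT_TYPES = ("N", "U")  # assumption: these component_type values mark gap lines


def fasta_lengths(lines):
    """Map each FASTA identifier (first word after '>') to its sequence length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def components_past_end(agp_rows, lengths):
    """Return (component_id, component_end, length) for spans that exceed the sequence."""
    problems = []
    for row in agp_rows:
        if row[4] in GAP_COMPONENT_TYPES:
            continue  # gap lines carry no component span
        comp_id, comp_end = row[5], int(row[7])
        # Components without a FASTA record are skipped rather than reported.
        if comp_id in lengths and comp_end > lengths[comp_id]:
            problems.append((comp_id, comp_end, lengths[comp_id]))
    return problems
```

Both functions accept any iterable of lines or rows, so an open file handle can be passed to fasta_lengths directly.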