Create Protein Alignments using ProSplign
Introduction
This tutorial takes you through the steps to generate a protein-to-genomic sequence alignment. The underlying algorithm, ProSplign, was developed at NCBI to handle frameshifts and mRNA splicing events. Detailed documentation of this algorithm can be found at https://www.ncbi.nlm.nih.gov/sutils/static/prosplign/prosplign.html.
This tutorial assumes that you have already reviewed the Basic Operation tutorial.
Step 1: Select Sequences to Align
Open Genome Workbench and import a genomic range and a protein from GenBank: NC_000006.12:30699k-30718k, XP_011513306.1
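The range notation used above can be expanded into explicit coordinates. The following is a minimal sketch, assuming that the "k" suffix stands for thousands of bases (and "m" for millions); the function name parse_range is hypothetical and is not part of Genome Workbench.

```python
import re

def parse_range(spec):
    """Split 'ACCESSION:FROM-TO' into (accession, start, stop).

    Assumption: a trailing 'k' multiplies by 1,000 and 'm' by 1,000,000.
    """
    accession, _, span = spec.partition(":")
    match = re.fullmatch(r"(\d+)([kKmM]?)-(\d+)([kKmM]?)", span)
    if match is None:
        raise ValueError(f"unrecognized range: {spec!r}")
    scale = {"": 1, "k": 1_000, "m": 1_000_000}
    start = int(match.group(1)) * scale[match.group(2).lower()]
    stop = int(match.group(3)) * scale[match.group(4).lower()]
    return accession, start, stop

accession, start, stop = parse_range("NC_000006.12:30699k-30718k")
print(accession, start, stop, stop - start)
```

Under that assumption, the example range covers a 19,000-base span of NC_000006.12.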
Select the genomic range in the Project View and open it in Graphical Sequence View. Make sure that the Alignments track is visible.
Step 2: Generate the Alignment
Select both items in the Project View.
Right-click the selected items, and then click Run Tool.
In the Run Tool dialog, in the Alignment Creation section, select the ProSPLIGN tool.
Click Next.
ProSplign generates a pairwise alignment between a protein and a genomic sequence. Here you can select multiple genomic ranges or transcripts to be aligned to one protein. The General Options tab lets you set various options. You can choose the genomic sequence strand: ‘Plus’, ‘Minus’ or ‘Both’. For sequences that have no introns, uncheck the ‘With introns’ checkbox. The genetic code is determined automatically from the organism associated with the sequence, or you can select it manually. You can also set three scoring parameters: the frameshift cost, the gap opening cost, and the gap extension cost for one amino acid.
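To see how the three scoring parameters could interact, here is a minimal sketch of one common affine gap convention (opening cost plus a per-amino-acid extension cost). It is an illustration only: the exact formula ProSplign uses is not confirmed here, the function names are hypothetical, and the numeric values are made up rather than the tool's defaults.

```python
def gap_penalty(length_aa, gap_open, gap_extend):
    """Affine cost of one gap spanning length_aa amino acids (assumed convention)."""
    if length_aa <= 0:
        return 0
    return gap_open + gap_extend * length_aa

def total_penalty(gap_lengths_aa, n_frameshifts, frameshift, gap_open, gap_extend):
    """Sum the cost of every gap and add a fixed cost per frameshift."""
    gaps = sum(gap_penalty(n, gap_open, gap_extend) for n in gap_lengths_aa)
    return gaps + frameshift * n_frameshifts

# Hypothetical values: two gaps (1 and 4 amino acids long) and one frameshift.
print(total_penalty([1, 4], 1, frameshift=30, gap_open=10, gap_extend=1))
```

With these made-up values the two gaps cost 11 and 14 and the frameshift costs 30. Raising the opening cost relative to the extension cost favors fewer, longer gaps over many short ones.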
The Refinement Options tab lets you set options for post-processing the alignment. Refinement is enabled by default. To disable it, clear the Refine the alignment checkbox. The other checkboxes control whether only the flank regions are removed and whether Ns at the ends of good regions are removed from the full alignment.
The flank positives and the total positives values set the minimum percentage of positives that the final refined alignment will have. If the overall percentage of positives is less than the total positives value, more bad pieces are removed. Any flank whose percentage of positives is less than the flank positives value is trimmed. Good regions shorter than the minimum length of good region are also trimmed.
The minimum exon identity/positives represent the smallest percentage of exon identity/positives that may appear in the refined alignment for either a full or partial exon. The number of bases in the first and the last exon that will appear in the refined alignment will be at least the minimum flanking exon length.
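The flank-trimming idea described above can be illustrated with a simplified sketch. This is not ProSplign's actual refinement procedure: the fixed-size flank window, the function names, and the threshold values are all assumptions chosen to make the percentage-of-positives test concrete.

```python
def percent_positives(columns):
    """Percentage of alignment columns flagged as positive (True)."""
    columns = list(columns)
    if not columns:
        return 0.0
    return 100.0 * sum(columns) / len(columns)

def trim_flanks(columns, flank_len, flank_positives):
    """Drop leading/trailing windows of flank_len columns while their
    percentage of positives is below the flank_positives threshold.

    Assumption: a 'flank' is a fixed-size window at either end.
    """
    cols = list(columns)
    while len(cols) >= flank_len and percent_positives(cols[:flank_len]) < flank_positives:
        cols = cols[flank_len:]
    while len(cols) >= flank_len and percent_positives(cols[-flank_len:]) < flank_positives:
        cols = cols[:-flank_len]
    return cols

# Hypothetical alignment: a weak start, a strong core, and a weak end.
alignment = [False] * 4 + [True] * 10 + [False, True, False, False]
refined = trim_flanks(alignment, flank_len=4, flank_positives=50)
print(len(alignment), len(refined), percent_positives(refined))
```

In this toy case both weak flanks fall below the 50% threshold and are removed, leaving only the ten-column core.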
To restore both the general and the refinement parameters to their default values, click Defaults.
When you are finished choosing your settings, click Finish. The generated alignment is added to the Project.
The new alignment is also displayed in the Alignments track of the Graphical Sequence View. A tooltip appears when you hover over it.
Step 3: Cancel the Alignment Creation
When you are not sure whether the protein aligns to the forward or the reverse strand of the genomic sequence, select ‘Both’ in the ProSPLIGN dialog. ProSplign generates alignments on both strands and retains the one with the better match.
Click Next, and then Finish on the next page. The task of generating the protein alignment is listed in the Task View. Select the row and right-click to see the context menu. To cancel the task, click Cancel Task.
This interrupts the alignment creation process, and the application notifies the user that no alignments were created.
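The ‘Both’ strand option used in this step amounts to trying each strand and keeping the better result. A minimal sketch of that idea follows, assuming a hypothetical scoring function; the toy scorer below merely counts ATG triplets and stands in for a real aligner. It does not reflect how ProSplign scores alignments.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def best_strand(genomic, score_fn):
    """Score both strands with score_fn and keep the higher-scoring one."""
    plus = score_fn(genomic)
    minus = score_fn(reverse_complement(genomic))
    return ("Plus", plus) if plus >= minus else ("Minus", minus)

def toy_score(dna):
    """Hypothetical stand-in for an alignment score: count ATG triplets."""
    return dna.upper().count("ATG")

print(best_strand("CCCATCAT", toy_score))
```

Because both strands are processed, choosing ‘Both’ roughly doubles the work, which is why the task may run long enough to be cancelled from the Task View.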
Current Version is 3.8.2 (released December 12, 2022)