NCBI C++ ToolKit
Class for representing protein sequences.
#include <algo/cobalt/seq.hpp>
Public Types
typedef CNcbiMatrix<double> TFreqMatrix
  Represents the sequence as position-specific amino acid frequencies.
Public Member Functions
CSequence()
  Default constructor: build an empty sequence.
CSequence(const objects::CSeq_loc& seq, objects::CScope& scope)
  Build a sequence.
void Reset(const objects::CSeq_loc& seq, objects::CScope& scope)
  Replace the sequence represented by a CSequence object.
void Reset(int length)
  Replace the sequence with a sequence of gaps of the given length.
TFreqMatrix& GetFreqs()
  Access the list of position frequencies associated with a sequence.
const TFreqMatrix& GetFreqs() const
unsigned char* GetSequence()
  Access the raw sequence data, in ncbistdaa format.
const unsigned char* GetSequence() const
  Get the raw sequence data in ncbistdaa format.
unsigned char GetLetter(int pos) const
  Access the sequence letter at a specified position.
void SetLetter(int pos, unsigned char letter)
  Change the letter at a given position to the given letter.
unsigned char GetPrintableLetter(int pos) const
  Access the sequence letter at a specified position, and return an ASCII representation of that letter.
int GetLength() const
  Get the length of the current sequence.
void PropagateGaps(const CNWAligner::TTranscript& transcript, CNWAligner::ETranscriptSymbol gap_choice)
  Given an edit script, insert gaps into a sequence.
void InsertGaps(const vector<Uint4>& gap_locations, bool consider_gaps = false)
  Insert gaps into a sequence.
Static Public Member Functions
static void CompressSequences(vector<CSequence>& seq, vector<int> index_list)
  Given a collection of sequences, remove every position at which all sequences in a selected subset contain a gap.
static void CreateMsa(const objects::CSeq_align& seq_align, objects::CScope& scope, vector<CSequence>& msa)
  Create a vector of CSequence objects that represents the alignment in the given Seq_align.
Static Public Attributes
static const unsigned char kGapChar = 0
  The ncbistdaa code for a gap.
Private Attributes
vector<unsigned char> m_Sequence
  The sequence (ncbistdaa format).
TFreqMatrix m_Freqs
  Position-specific frequency profile corresponding to sequence.
typedef CNcbiMatrix<double> CSequence::TFreqMatrix |
CSequence::CSequence() [inline]
CSequence::CSequence(const objects::CSeq_loc& seq, objects::CScope& scope)
void CSequence::CompressSequences(vector<CSequence>& seq, vector<int> index_list) [static]
Given a collection of sequences, remove every position at which all of the sequences selected by index_list contain a gap.
seq | The list of sequences [in,out]
index_list | Indices of the elements of 'seq' that will be compressed; the other elements of 'seq' are not changed [in]
Definition at line 232 of file seq.cpp.
References i, kAlphabetSize, kGapChar, and m_Sequence.
Referenced by CMultiAligner::GetResults(), CMultiAligner::x_AlignMSAs(), and CMultiAligner::x_RealignSequences().
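The column-removal rule described above can be sketched in plain C++. This is not COBALT's code: it assumes each sequence is a bare vector of ncbistdaa codes with 0 as the gap (kGapChar), that the selected sequences all have the same length, and it leaves out the frequency profile that the real method may also update. The names Seq, kGap and CompressColumns are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: each sequence is a vector of ncbistdaa codes,
// with 0 (kGapChar) marking a gap. This is NOT the COBALT implementation.
using Seq = std::vector<unsigned char>;
const unsigned char kGap = 0;

// Remove every column in which all sequences listed in index_list hold a gap.
// Sequences not listed in index_list are left untouched.
void CompressColumns(std::vector<Seq>& seqs, const std::vector<int>& index_list)
{
    if (index_list.empty()) return;
    const std::size_t len = seqs[index_list[0]].size();
    // Assumes all selected sequences are aligned (same length).
    for (int idx : index_list) assert(seqs[idx].size() == len);

    // A column is kept if at least one selected sequence has a residue there.
    std::vector<bool> keep(len, false);
    for (std::size_t col = 0; col < len; ++col) {
        for (int idx : index_list) {
            if (seqs[idx][col] != kGap) { keep[col] = true; break; }
        }
    }
    // Rebuild each selected sequence from the kept columns only.
    for (int idx : index_list) {
        Seq compressed;
        for (std::size_t col = 0; col < len; ++col)
            if (keep[col]) compressed.push_back(seqs[idx][col]);
        seqs[idx].swap(compressed);
    }
}
```

With sequences {1, 0, 3}, {2, 0, 4}, {0, 5, 0} and index_list {0, 1}, the middle column is dropped from the first two sequences only; the third one is not touched, even though it has a residue in that column.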
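void CSequence::CreateMsa(const objects::CSeq_align& seq_align, objects::CScope& scope, vector<CSequence>& msa) [static]
Create a vector of CSequence objects that represents the alignment in the given Seq_align.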
seq_align | Alignment in ASN.1 format [in] |
scope | Scope [in] |
msa | Alignment as strings of residues and gaps [out] |
Definition at line 274 of file seq.cpp.
References _ASSERT, i, ITERATE, kAlphabetSize, kGapChar, and NON_CONST_ITERATE.
Referenced by BOOST_AUTO_TEST_CASE(), and CMultiAligner::SetInputMSAs().
TFreqMatrix& CSequence::GetFreqs() [inline]
Access the list of position frequencies associated with a sequence.
Definition at line 89 of file seq.hpp.
Referenced by CMultiAligner::x_AddRpsFreqsToCluster(), and CMultiAligner::x_MakeClusterResidueFrequencies().
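TFreqMatrix is a CNcbiMatrix<double>; a natural reading of "position-specific frequencies" is one row per sequence position and one column per residue code. The sketch below illustrates that layout with a hypothetical FreqMatrixSketch type standing in for CNcbiMatrix, an assumed alphabet of 28 codes, and a one-hot initialization (frequency 1.0 for the residue present at each position). Whether Reset() fills the profile exactly this way is an assumption, not something this page states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CNcbiMatrix<double>: rows are sequence positions,
// columns are residue codes. kAlphabetSizeSketch = 28 is an assumption.
const std::size_t kAlphabetSizeSketch = 28;

struct FreqMatrixSketch {
    std::size_t rows = 0, cols = 0;
    std::vector<double> data;  // row-major storage

    // Resize and zero-fill every entry.
    void Resize(std::size_t r, std::size_t c) { rows = r; cols = c; data.assign(r * c, 0.0); }
    double& operator()(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    double operator()(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// One plausible initialization: each position gets frequency 1.0 for the
// residue it holds and 0.0 for every other letter.
FreqMatrixSketch OneHotProfile(const std::vector<unsigned char>& seq)
{
    FreqMatrixSketch freqs;
    freqs.Resize(seq.size(), kAlphabetSizeSketch);
    for (std::size_t pos = 0; pos < seq.size(); ++pos) {
        assert(seq[pos] < kAlphabetSizeSketch);
        freqs(pos, seq[pos]) = 1.0;
    }
    return freqs;
}
```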
const TFreqMatrix& CSequence::GetFreqs() const [inline]
int CSequence::GetLength() const [inline]
Get the length of the current sequence.
Definition at line 125 of file seq.hpp.
Referenced by GetPrintableLetter(), InsertGaps(), PropagateGaps(), s_SeqToProfilePosition(), CMultiAligner::x_AlignInClusters(), x_ExpandRange(), x_GetClusterGapLocations(), x_GetProfileMatchRanges(), CMultiAligner::x_MakeClusterResidueFrequencies(), and CMultiAligner::x_MultiAlignClusters().
unsigned char CSequence::GetLetter(int pos) const [inline]
Access the sequence letter at a specified position.
pos | Position to access [in] |
Definition at line 107 of file seq.hpp.
Referenced by CEditScript::GetScore(), s_SeqToProfilePosition(), CMultiAligner::x_AddRpsFreqsToCluster(), CMultiAligner::x_AlignInClusters(), x_ExpandRange(), x_GetClusterGapLocations(), and x_GetProfileMatchRanges().
unsigned char CSequence::GetPrintableLetter(int pos) const
Access the sequence letter at a specified position, and return an ASCII representation of that letter.
pos | Position to access [in] |
Definition at line 48 of file seq.cpp.
References _ASSERT, GetLength(), m_Sequence, and val.
Referenced by CMultiAligner::x_AlignInClusters(), and CMultiAligner::x_MultiAlignClusters().
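GetPrintableLetter() maps an ncbistdaa code to an ASCII letter. A minimal sketch, assuming the standard 28-letter NCBIstdaa table (code 0 is '-', consistent with kGapChar = 0) and using a hypothetical free function PrintableLetter in place of the member; the real method may treat out-of-range codes differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Standard NCBIstdaa code table: index = code, character = ASCII letter.
// Code 0 is the gap '-', matching kGapChar = 0.
const std::string kNcbistdaaToAscii = "-ABCDEFGHIKLMNPQRSTVWXYZU*OJ";

// Hypothetical free-function version of GetPrintableLetter(): bounds are
// checked with assert, in the spirit of the original's _ASSERT.
char PrintableLetter(const std::vector<unsigned char>& seq, int pos)
{
    assert(pos >= 0 && pos < static_cast<int>(seq.size()));
    unsigned char code = seq[pos];
    assert(code < kNcbistdaaToAscii.size());
    return kNcbistdaaToAscii[code];
}
```

For the codes {12, 0, 1} this yields 'M', '-', 'A'.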
unsigned char* CSequence::GetSequence() [inline]
Access the raw sequence data, in ncbistdaa format.
Definition at line 96 of file seq.hpp.
Referenced by CMultiAligner::x_AlignInClusters().
const unsigned char* CSequence::GetSequence() const [inline]
void CSequence::InsertGaps(const vector<Uint4>& gap_locations, bool consider_gaps = false)
Insert gaps into a sequence.
A gap is inserted before each given location. The profile of the sequence is adjusted automatically, and each new gap is assigned an all-zero column of profile frequencies.
gap_locations | Locations of single gaps [in]
consider_gaps | If false, location n means "before the n-th letter"; if true, location n is a plain position in the string, and each location takes into account the gaps already added at smaller locations [in]
Definition at line 170 of file seq.cpp.
References _ASSERT, GetLength(), CNcbiMatrix< T >::GetRows(), i, kAlphabetSize, kGapChar, location, m_Freqs, m_Sequence, CNcbiMatrix< T >::Resize(), and CNcbiMatrix< T >::Swap().
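The consider_gaps == false case of InsertGaps() can be sketched as follows. Assumptions not stated on this page: locations are 0-based indices into the sequence before any insertion, a location equal to the length appends a gap, and the profile is stored as one row of frequencies per position. ProfiledSeq and InsertGapsSketch are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const unsigned char kGap = 0;  // ncbistdaa gap code (kGapChar)

// Hypothetical stand-in for a sequence plus its profile
// (one row of letter frequencies per position).
struct ProfiledSeq {
    std::vector<unsigned char> letters;
    std::vector<std::vector<double>> freqs;
};

// Sketch of the consider_gaps == false case: each location n is an index into
// the sequence as it was before any insertion (0-based, an assumption), and a
// gap is placed before that position. A location equal to the length appends
// a gap at the end. New gap rows get all-zero frequencies.
void InsertGapsSketch(ProfiledSeq& s, std::vector<std::uint32_t> locations,
                      std::size_t alphabet_size)
{
    std::sort(locations.begin(), locations.end());
    const std::size_t len = s.letters.size();
    ProfiledSeq out;
    std::size_t next = 0;  // next unprocessed location
    for (std::size_t pos = 0; pos <= len; ++pos) {
        // Emit every gap requested before this position.
        while (next < locations.size() && locations[next] == pos) {
            out.letters.push_back(kGap);
            out.freqs.push_back(std::vector<double>(alphabet_size, 0.0));
            ++next;
        }
        // Then copy the original residue and its profile row.
        if (pos < len) {
            out.letters.push_back(s.letters[pos]);
            out.freqs.push_back(s.freqs[pos]);
        }
    }
    assert(next == locations.size());  // every location must be <= len
    s = out;
}
```

Inserting at locations {1, 2} into the two-letter sequence {1, 3} gives {1, 0, 3, 0}: one gap before the second letter and one appended after it, each with a zero profile row.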
void CSequence::PropagateGaps(const CNWAligner::TTranscript& transcript, CNWAligner::ETranscriptSymbol gap_choice)
Given an edit script, insert gaps into a sequence.
The profile of the sequence is adjusted automatically, and each new gap is assigned an all-zero column of profile frequencies.
transcript | The edit script to apply [in] |
gap_choice | Which edit script character, eTS_Delete or eTS_Insert, will cause a gap to be inserted [in] |
Definition at line 133 of file seq.cpp.
References _ASSERT, GetLength(), i, kAlphabetSize, kGapChar, m_Freqs, m_Sequence, and CNcbiMatrix< T >::Swap().
Referenced by CMultiAligner::x_AlignInClusters().
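A minimal sketch of how an edit script drives gap insertion in PropagateGaps(). It assumes the transcript is spelled with CNWAligner-style characters 'M', 'R', 'I' and 'D' held in a std::string, a simplification of CNWAligner::TTranscript; each occurrence of gap_choice adds a gap and every other symbol consumes the next residue. The profile update is omitted, and PropagateGapsSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const unsigned char kGap = 0;  // ncbistdaa gap code (kGapChar)

// Sketch, assuming a transcript spelled with 'M' (match), 'R' (replace),
// 'I' (insert) and 'D' (delete). Every occurrence of gap_choice ('I' or 'D')
// puts a gap into this sequence; every other symbol consumes the next residue.
// The frequency profile is left out.
std::vector<unsigned char> PropagateGapsSketch(const std::vector<unsigned char>& seq,
                                               const std::string& transcript,
                                               char gap_choice)
{
    std::vector<unsigned char> out;
    std::size_t next = 0;  // next residue of seq to copy
    for (char op : transcript) {
        if (op == gap_choice) {
            out.push_back(kGap);
        } else {
            assert(next < seq.size());
            out.push_back(seq[next++]);
        }
    }
    assert(next == seq.size());  // the transcript must consume every residue
    return out;
}
```

Applying the same transcript "MIMD" to one sequence with gap_choice 'I' and to its partner with 'D' yields two rows of equal length, for example {1, 0, 3, 5} and {2, 4, 6, 0}.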
void CSequence::Reset(const objects::CSeq_loc& seq, objects::CScope& scope)
Replace the sequence represented by a CSequence object.
seq | The new sequence [in] |
Definition at line 87 of file seq.cpp.
References i, kAlphabetSize, m_Freqs, m_Sequence, NCBI_THROW, CNcbiMatrix< T >::Resize(), and CNcbiMatrix< T >::Set().
Referenced by CSequence().
void CSequence::Reset(int length)
Replace the sequence with a sequence of gaps of the given length.
length | Number of gaps [in] |
Definition at line 117 of file seq.cpp.
References i, kGapChar, and m_Sequence.
void CSequence::SetLetter(int pos, unsigned char letter) [inline]
const unsigned char CSequence::kGapChar = 0 [static]
The ncbistdaa code for a gap.
Definition at line 58 of file seq.hpp.
Referenced by BOOST_AUTO_TEST_CASE(), CompressSequences(), CreateMsa(), InsertGaps(), PropagateGaps(), Reset(), s_SeqToProfilePosition(), s_TestResultAlignment(), CMultiAligner::x_AddRpsFreqsToCluster(), CMultiAligner::x_AlignInClusters(), x_ExpandRange(), x_FillResidueFrequencies(), CMultiAligner::x_FindConservedColumns(), x_GetClusterGapLocations(), x_GetProfileMatchRanges(), CMultiAligner::x_GetSeqalign(), CMultiAligner::x_MultiAlignClusters(), and CMultiAligner::x_ValidateQueries().
TFreqMatrix CSequence::m_Freqs [private]
Position-specific frequency profile corresponding to sequence.
Definition at line 171 of file seq.hpp.
Referenced by InsertGaps(), PropagateGaps(), and Reset().
vector<unsigned char> CSequence::m_Sequence [private]
The sequence (ncbistdaa format).
Definition at line 170 of file seq.hpp.
Referenced by CompressSequences(), GetPrintableLetter(), InsertGaps(), PropagateGaps(), and Reset().