NCBI C++ ToolKit
CSequence Class Reference


Class for representing protein sequences.

#include <algo/cobalt/seq.hpp>


Public Types

typedef CNcbiMatrix< double > TFreqMatrix
 Represents the sequence as position-specific amino acid frequencies.

Public Member Functions

 CSequence ()
 Default constructor: build an empty sequence.

 CSequence (const objects::CSeq_loc &seq, objects::CScope &scope)
 Build a sequence.

void Reset (const objects::CSeq_loc &seq, objects::CScope &scope)
 Replace the sequence represented by a CSequence object.

void Reset (int length)
 Replace the sequence with a sequence of gaps of the given length.

TFreqMatrix & GetFreqs ()
 Access the list of position frequencies associated with a sequence.

const TFreqMatrix & GetFreqs () const

unsigned char * GetSequence ()
 Access the raw sequence data, in ncbistdaa format.

const unsigned char * GetSequence () const
 Get the raw sequence data in ncbistdaa format.

unsigned char GetLetter (int pos) const
 Access the sequence letter at a specified position.

void SetLetter (int pos, unsigned char letter)
 Change the letter at a given position to a given one.

unsigned char GetPrintableLetter (int pos) const
 Access the sequence letter at a specified position, and return an ASCII representation of that letter.

int GetLength () const
 Get the length of the current sequence.

void PropagateGaps (const CNWAligner::TTranscript &transcript, CNWAligner::ETranscriptSymbol gap_choice)
 Given an edit script, insert gaps into a sequence.

void InsertGaps (const vector< Uint4 > &gap_locations, bool consider_gaps=false)
 Insert gaps into a sequence.

Static Public Member Functions

static void CompressSequences (vector< CSequence > &seq, vector< int > index_list)
 Given a collection of sequences, remove all sequence positions where a subset of the sequences all contain a gap.

static void CreateMsa (const objects::CSeq_align &seq_align, objects::CScope &scope, vector< CSequence > &msa)
 Create a vector of CSequence objects that represents the alignment in the given Seq_align.
 

Static Public Attributes

static const unsigned char kGapChar = 0
 The ncbistdaa code for a gap.
 

Private Attributes

vector< unsigned char > m_Sequence
 The sequence (ncbistdaa format).

TFreqMatrix m_Freqs
 Position-specific frequency profile corresponding to sequence.
 

Detailed Description

Class for representing protein sequences.

Definition at line 53 of file seq.hpp.

Member Typedef Documentation

◆ TFreqMatrix

Represents the sequence as position-specific amino acid frequencies.

Matrix is of dimension (sequence length) x kAlphabetSize

Definition at line 63 of file seq.hpp.
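
The shape of such a matrix can be pictured with a standalone sketch that does not use the toolkit. Everything named here is an assumption made for illustration: `kSketchAlphabetSize` is taken to be 28 (the number of ncbistdaa codes), a vector of vectors stands in for CNcbiMatrix< double >, and the one-hot initialization is only one plausible way to fill the profile, not something this page confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed alphabet size (ncbistdaa has 28 codes); stand-in for kAlphabetSize.
constexpr int kSketchAlphabetSize = 28;

// Stand-in for CNcbiMatrix<double>: one row per sequence position,
// one column per residue code.
using FreqMatrixSketch = std::vector<std::vector<double>>;

// Build a (sequence length) x kSketchAlphabetSize profile in which each row
// puts all of its weight on the residue observed at that position.
FreqMatrixSketch MakeOneHotProfile(const std::vector<unsigned char>& seq)
{
    FreqMatrixSketch freqs(seq.size(),
                           std::vector<double>(kSketchAlphabetSize, 0.0));
    for (std::size_t i = 0; i < seq.size(); ++i) {
        assert(seq[i] < kSketchAlphabetSize);  // code must be a valid column
        freqs[i][seq[i]] = 1.0;
    }
    return freqs;
}
```

Rows index sequence positions and columns index residue codes, so a gap column inserted later would correspond to a row of all zeros.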

Constructor & Destructor Documentation

◆ CSequence() [1/2]

CSequence::CSequence ( )
inline

Default constructor: build an empty sequence.

Definition at line 67 of file seq.hpp.

◆ CSequence() [2/2]

CSequence::CSequence ( const objects::CSeq_loc &  seq,
objects::CScope &  scope 
)

Build a sequence.

Parameters
seq    The input sequence.

Definition at line 126 of file seq.cpp.

References Reset().

Member Function Documentation

◆ CompressSequences()

void CSequence::CompressSequences ( vector< CSequence > &  seq,
vector< int >  index_list 
)
static

Given a collection of sequences, remove all sequence positions where a subset of the sequences all contain a gap.

Parameters
seq           The list of sequences [in]
index_list    List of elements of 'seq' that will be compressed. The other elements of 'seq' will not be changed [in]

Definition at line 232 of file seq.cpp.

References i, kAlphabetSize, kGapChar, and m_Sequence.

Referenced by CMultiAligner::GetResults(), CMultiAligner::x_AlignMSAs(), and CMultiAligner::x_RealignSequences().
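
The column-removal rule described above can be sketched without the toolkit as follows. This is an illustration under stated assumptions, not the library's implementation: plain byte vectors stand in for CSequence objects, `CompressColumns` and `kSketchGap` are hypothetical names, all selected rows are assumed to have equal length, and any handling of the frequency profile is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr unsigned char kSketchGap = 0;  // mirrors CSequence::kGapChar

// Remove every column in which all rows named by index_list hold a gap.
// Rows not named in index_list are left untouched.
void CompressColumns(std::vector<std::vector<unsigned char>>& rows,
                     const std::vector<int>& index_list)
{
    if (index_list.empty()) return;
    const std::size_t length = rows[index_list[0]].size();
    for (int idx : index_list) assert(rows[idx].size() == length);

    std::size_t out = 0;  // next write position in the compressed rows
    for (std::size_t col = 0; col < length; ++col) {
        bool all_gaps = true;
        for (int idx : index_list) {
            if (rows[idx][col] != kSketchGap) { all_gaps = false; break; }
        }
        if (all_gaps) continue;  // drop this column
        for (int idx : index_list) rows[idx][out] = rows[idx][col];
        ++out;
    }
    for (int idx : index_list) rows[idx].resize(out);
}
```

A column survives as soon as one selected row holds a residue there, so gaps that are not shared by the whole subset remain in place.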

◆ CreateMsa()

void CSequence::CreateMsa ( const objects::CSeq_align &  seq_align,
objects::CScope &  scope,
vector< CSequence > &  msa 
)
static

Create a vector of CSequence objects that represents the alignment in the given Seq_align.

Parameters
seq_align    Alignment in ASN.1 format [in]
scope        Scope [in]
msa          Alignment as strings of residues and gaps [out]

Definition at line 274 of file seq.cpp.

References _ASSERT, i, ITERATE, kAlphabetSize, kGapChar, and NON_CONST_ITERATE.

Referenced by BOOST_AUTO_TEST_CASE(), and CMultiAligner::SetInputMSAs().

◆ GetFreqs() [1/2]

TFreqMatrix& CSequence::GetFreqs ( )
inline

Access the list of position frequencies associated with a sequence.

Returns
The frequency matrix

Definition at line 89 of file seq.hpp.

Referenced by CMultiAligner::x_AddRpsFreqsToCluster(), and CMultiAligner::x_MakeClusterResidueFrequencies().

◆ GetFreqs() [2/2]

const TFreqMatrix& CSequence::GetFreqs ( ) const
inline

Definition at line 91 of file seq.hpp.

◆ GetLength()

int CSequence::GetLength ( void  ) const
inline

Get the length of the current sequence.

◆ GetLetter()

unsigned char CSequence::GetLetter ( int  pos) const
inline

Access the sequence letter at a specified position.

Parameters
pos    Position to access [in]
Returns
The sequence letter

Definition at line 107 of file seq.hpp.

Referenced by CEditScript::GetScore(), s_SeqToProfilePosition(), CMultiAligner::x_AddRpsFreqsToCluster(), CMultiAligner::x_AlignInClusters(), x_ExpandRange(), x_GetClusterGapLocations(), and x_GetProfileMatchRanges().

◆ GetPrintableLetter()

unsigned char CSequence::GetPrintableLetter ( int  pos) const

Access the sequence letter at a specified position, and return an ASCII representation of that letter.

Parameters
pos    Position to access [in]
Returns
The ASCII sequence letter

Definition at line 48 of file seq.cpp.

References _ASSERT, GetLength(), m_Sequence, and val.

Referenced by CMultiAligner::x_AlignInClusters(), and CMultiAligner::x_MultiAlignClusters().
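
As an illustration of the ncbistdaa-to-ASCII conversion, the sketch below uses a lookup table in the standard ncbistdaa ordering, where code 0 is the gap. The names `kSketchStdaaToAscii` and `PrintableLetterSketch` are hypothetical, and this page does not say whether the toolkit itself uses a table or some other mechanism.

```cpp
#include <cassert>

// ncbistdaa residue codes 0..27 mapped to one-letter ASCII symbols.
static const char kSketchStdaaToAscii[] = "-ABCDEFGHIKLMNPQRSTVWXYZU*OJ";

// Translate one ncbistdaa code into a printable character.
unsigned char PrintableLetterSketch(unsigned char code)
{
    // sizeof counts the trailing '\0', so subtract one for the valid range.
    assert(code < sizeof(kSketchStdaaToAscii) - 1);
    return static_cast<unsigned char>(kSketchStdaaToAscii[code]);
}
```

With this table a gap (code 0) prints as '-' and code 1 prints as 'A'.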

◆ GetSequence() [1/2]

unsigned char* CSequence::GetSequence ( )
inline

Access the raw sequence data, in ncbistdaa format.

Returns
Pointer to array of sequence data

Definition at line 96 of file seq.hpp.

Referenced by CMultiAligner::x_AlignInClusters().

◆ GetSequence() [2/2]

const unsigned char* CSequence::GetSequence ( ) const
inline

Get the raw sequence data in ncbistdaa format.

Returns
Pointer to array of sequence data

Definition at line 101 of file seq.hpp.

◆ InsertGaps()

void CSequence::InsertGaps ( const vector< Uint4 > &  gap_locations,
bool  consider_gaps = false 
)

Insert gaps into a sequence.

A gap is inserted before each given location. The profile of the sequence is adjusted automatically, and new gaps are assigned a column of all-zero profile frequencies.

Parameters
gap_locations    Locations of single gaps
consider_gaps    If false, location n denotes the position before the n-th letter of the sequence. If true, each location is a plain position in the string, counted after the gaps at smaller locations have already been added

Definition at line 170 of file seq.cpp.

References _ASSERT, GetLength(), CNcbiMatrix< T >::GetRows(), i, kAlphabetSize, kGapChar, location, m_Freqs, m_Sequence, CNcbiMatrix< T >::Resize(), and CNcbiMatrix< T >::Swap().
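
One reading of the two addressing modes of consider_gaps is sketched below on a plain byte vector. The interpretation is an assumption drawn only from the parameter description: locations are taken to be zero-based and sorted in ascending order, `InsertGapsSketch` and `kSketchGap` are hypothetical names, and the accompanying profile update is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned char kSketchGap = 0;  // mirrors CSequence::kGapChar

// Return a copy of seq with one gap inserted per entry of gap_locations.
// consider_gaps == false: a location counts letters of the original sequence.
// consider_gaps == true:  a location is a position in the output string,
//                         which already contains the earlier gaps.
std::vector<unsigned char> InsertGapsSketch(
    const std::vector<unsigned char>& seq,
    const std::vector<std::uint32_t>& gap_locations,
    bool consider_gaps = false)
{
    std::vector<unsigned char> out;
    out.reserve(seq.size() + gap_locations.size());
    std::size_t src = 0;  // next unread letter of the original sequence
    for (std::uint32_t loc : gap_locations) {
        // Copy letters until the chosen coordinate reaches the location.
        while (src < seq.size() &&
               (consider_gaps ? out.size() : src) < loc) {
            out.push_back(seq[src++]);
        }
        out.push_back(kSketchGap);
    }
    while (src < seq.size()) out.push_back(seq[src++]);  // copy the tail
    return out;
}
```

Under this reading, the locations {1, 1, 3} with consider_gaps=false and the locations {1, 2, 5} with consider_gaps=true describe the same gapped result for a three-letter sequence.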

◆ PropagateGaps()

void CSequence::PropagateGaps ( const CNWAligner::TTranscript transcript,
CNWAligner::ETranscriptSymbol  gap_choice 
)

Given an edit script, insert gaps into a sequence.

The profile of the sequence is adjusted automatically, and new gaps are assigned a column of all-zero profile frequencies.

Parameters
transcript    The edit script to apply [in]
gap_choice    Which edit script character, eTS_Delete or eTS_Insert, will cause a gap to be inserted [in]

Definition at line 133 of file seq.cpp.

References _ASSERT, GetLength(), i, kAlphabetSize, kGapChar, m_Freqs, m_Sequence, and CNcbiMatrix< T >::Swap().

Referenced by CMultiAligner::x_AlignInClusters().
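
The effect of applying an edit script can be sketched on a plain byte vector. This is a standalone illustration, not the toolkit code: `EditOp` is a hypothetical stand-in for CNWAligner::ETranscriptSymbol, `PropagateGapsSketch` and `kSketchGap` are made-up names, and the sketch assumes that every transcript symbol other than gap_choice consumes one letter of the sequence. The profile update is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for CNWAligner::ETranscriptSymbol values.
enum class EditOp { Match, Replace, Insert, Delete };

constexpr unsigned char kSketchGap = 0;  // mirrors CSequence::kGapChar

// Walk the transcript: emit a gap for each symbol equal to gap_choice,
// otherwise emit the next letter of the original sequence.
std::vector<unsigned char> PropagateGapsSketch(
    const std::vector<unsigned char>& seq,
    const std::vector<EditOp>& transcript,
    EditOp gap_choice)
{
    std::vector<unsigned char> out;
    out.reserve(transcript.size());
    std::size_t src = 0;  // next unread letter of the original sequence
    for (EditOp op : transcript) {
        if (op == gap_choice) {
            out.push_back(kSketchGap);
        } else {
            assert(src < seq.size());  // transcript must not overrun seq
            out.push_back(seq[src++]);
        }
    }
    assert(src == seq.size());  // every letter must have been consumed
    return out;
}
```

The result has the same length as the transcript, which is why the two sequences of a pairwise alignment line up after each is processed with its own gap_choice.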

◆ Reset() [1/2]

void CSequence::Reset ( const objects::CSeq_loc &  seq,
objects::CScope &  scope 
)

Replace the sequence represented by a CSequence object.

Parameters
seq    The new sequence [in]

Definition at line 87 of file seq.cpp.

References i, kAlphabetSize, m_Freqs, m_Sequence, NCBI_THROW, CNcbiMatrix< T >::Resize(), and CNcbiMatrix< T >::Set().

Referenced by CSequence().

◆ Reset() [2/2]

void CSequence::Reset ( int  length)

Replace the sequence with a sequence of gaps of the given length.

Parameters
length    Number of gaps [in]

Definition at line 117 of file seq.cpp.

References i, kGapChar, and m_Sequence.

◆ SetLetter()

void CSequence::SetLetter ( int  pos,
unsigned char  letter 
)
inline

Change the letter at a given position to a given one.

Parameters
pos       Position in the sequence [in]
letter    Letter [in]

Definition at line 113 of file seq.hpp.

References letter().

Member Data Documentation

◆ kGapChar

const unsigned char CSequence::kGapChar = 0
static

The ncbistdaa code for a gap.

◆ m_Freqs

TFreqMatrix CSequence::m_Freqs
private

Position-specific frequency profile corresponding to sequence.

Definition at line 171 of file seq.hpp.

Referenced by InsertGaps(), PropagateGaps(), and Reset().

◆ m_Sequence

vector<unsigned char> CSequence::m_Sequence
private

The sequence (ncbistdaa format)

Definition at line 170 of file seq.hpp.

Referenced by CompressSequences(), GetPrintableLetter(), InsertGaps(), PropagateGaps(), and Reset().


The documentation for this class was generated from the following files:
Modified on Wed Mar 27 11:25:33 2024 by modify_doxy.py rev. 669887