NCBI C++ ToolKit
CAmbigDataBuilder Class Reference

Encode ambiguities in blast database format.

Public Member Functions

 CAmbigDataBuilder (int sz)
 Constructor.
 
int Check (int data, int offset)
 Check (and maybe store) a possibly ambiguous letter.
 
void GetAmbig (string &amb)
 Compute and return the encoded list of ambiguities.
 
void x_PackNewAmbig (string &amb, const CAmbiguousRegion &r)
 Append the 'new' encoding of one ambiguous region to a string.
 
void x_PackOldAmbig (string &amb, CAmbiguousRegion &r)
 Append the 'old' encoding of one ambiguous region to a string.
 

Private Member Functions

void x_AddAmbig (int value, int offset)
 Add an ambiguity letter.
 
int x_Random (int value)
 Pick a random letter from the set represented by an ambiguity.
 

Private Attributes

int m_Log2 [16]
 Table mapping 1, 2, 4, 8 to 0, 1, 2, 3.
 
int m_Size
 Size of the input sequence.
 
vector< CAmbiguousRegion > m_Regions
 Ambiguous regions for the sequence.
 
CRandom m_Random
 Random number generator.
 

Detailed Description

Encode ambiguities in blast database format.

This class encodes nucleotide ambiguities in blast database format from a series of ambiguous letter values and offsets.

Definition at line 154 of file writedb_convert.cpp.

Constructor & Destructor Documentation

◆ CAmbigDataBuilder()

CAmbigDataBuilder::CAmbigDataBuilder ( int  sz)
inline

Constructor.

Parameters
sz    Size of the sequence in letters. [in]

Definition at line 158 of file writedb_convert.cpp.

References i, and m_Log2.

Member Function Documentation

◆ Check()

int CAmbigDataBuilder::Check ( int  data,
int  offset 
)
inline

Check (and maybe store) a possibly ambiguous letter.

If the letter is not an ambiguity, this method converts it to BlastNA2 format and returns it. If the letter value is an ambiguity, it is added to the list of ambiguities, and a randomly selected letter value is returned. Each ambiguity letter (there are 12 for nucleotide) represents a possibility between two or more nucleotide bases. The random letter is always selected from the set of bases corresponding to the input ambiguity value.

Parameters
data    Letter value in BlastNA8. [in]
offset    Offset of letter. [in]
Returns
Value to encode as BlastNA2.

Definition at line 186 of file writedb_convert.cpp.

References _ASSERT, data, m_Log2, m_Size, offset, x_AddAmbig(), and x_Random().

Referenced by WriteDB_Ncbi4naToBinary().
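
The behaviour described for Check() can be illustrated with a small stand-alone sketch. It is not the toolkit code: `CheckLetter`, the `ambiguities` vector of (value, offset) pairs, and the use of `std::mt19937` in place of CRandom are all hypothetical. It assumes the input uses the one-bit-per-base layout implied by the m_Log2 table (1, 2, 4, 8 for the four bases) and, as a further assumption, treats the value 0 as "any base".

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Hypothetical free-function analogue of Check(); not the toolkit code.
// Returns a 2-bit base code (0..3). Ambiguous inputs are recorded in
// 'ambiguities' as (value, offset) and replaced by a random member base.
int CheckLetter(int data, int offset,
                std::vector<std::pair<int, int>>& ambiguities,
                std::mt19937& rng)
{
    assert(data >= 0 && data < 16);

    // Unambiguous letters have exactly one bit set: 1, 2, 4, 8 -> 0, 1, 2, 3.
    switch (data) {
    case 1: return 0;
    case 2: return 1;
    case 4: return 2;
    case 8: return 3;
    default: break;
    }

    // Anything else is an ambiguity: remember it for later encoding.
    ambiguities.emplace_back(data, offset);

    // Collect the bases that the ambiguity value allows.
    int candidates[4];
    int count = 0;
    for (int i = 0; i < 4; ++i) {
        if (data & (1 << i)) {
            candidates[count++] = i;
        }
    }
    if (count == 0) {
        // Assumption: a value of 0 is treated as "any base".
        for (int i = 0; i < 4; ++i) {
            candidates[count++] = i;
        }
    }

    std::uniform_int_distribution<int> dist(0, count - 1);
    return candidates[dist(rng)];
}
```

A caller would invoke this once per letter of the sequence and pack the returned 2-bit values, keeping the recorded ambiguities for a later encoding step.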

◆ GetAmbig()

void CAmbigDataBuilder::GetAmbig ( string & amb)
inline

Compute and return the encoded list of ambiguities.

The list of ambiguous regions is packed in blast database format and returned to the user. If the length of the sequence is larger than 2^24-1, or any of the ambiguous regions is larger than 0xF, the 'new' ambiguity format is used. The 'new' format allows for larger ambiguous regions at higher sequence offsets, but requires eight bytes per ambiguous region instead of the four bytes required by the 'old' format.

Parameters
amb    The ambiguity data in blast database format. [out]

Definition at line 238 of file writedb_convert.cpp.

References i, int, m_Regions, m_Size, s_AppendInt4(), x_PackNewAmbig(), and x_PackOldAmbig().

Referenced by WriteDB_Ncbi4naToBinary().
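
The format selection rule described for GetAmbig() can be written down as a short sketch. The names `NeedsNewFormat` and `RegionPayloadBytes` are hypothetical, and regions are reduced to a plain list of lengths. Only the per-region bytes are counted; any leading header word the real encoding writes is not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when the 'new' (8-byte) region encoding is
// required, following the two thresholds quoted in the description.
bool NeedsNewFormat(int sequence_size, const std::vector<int>& region_lengths)
{
    assert(sequence_size >= 0);

    const int kMaxOldSize   = (1 << 24) - 1;  // 2^24-1
    const int kMaxOldLength = 0xF;

    if (sequence_size > kMaxOldSize) {
        return true;
    }
    for (int length : region_lengths) {
        if (length > kMaxOldLength) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: bytes used by the region entries alone
// (8 per region in the 'new' format, 4 per region in the 'old' format).
std::size_t RegionPayloadBytes(int sequence_size,
                               const std::vector<int>& region_lengths)
{
    const std::size_t per_region =
        NeedsNewFormat(sequence_size, region_lengths) ? 8 : 4;
    return per_region * region_lengths.size();
}
```

Note that a single oversized region switches every region of the sequence to the 8-byte form in this sketch, since the choice is made once per sequence.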

◆ x_AddAmbig()

void CAmbigDataBuilder::x_AddAmbig ( int  value,
int  offset 
)
inline private

Add an ambiguity letter.

The internal encoding contains a list of ambiguous ranges. This method adds the given letter at the given offset to the most recent region, if possible, or creates a new region for it.

Parameters
value    Ambiguous letter to add. [in]
offset    Offset at which letter occurs. [in]

Definition at line 349 of file writedb_convert.cpp.

References m_Regions, offset, r(), and rapidjson::value.

Referenced by Check().
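
The "extend the most recent region or start a new one" logic can be sketched as follows. The `Region` struct and the `AddAmbig` function are hypothetical stand-ins for CAmbiguousRegion and x_AddAmbig(). The sketch assumes that a letter can join the last region only when it has the same value and directly follows it; any maximum region length the real class may enforce is not modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CAmbiguousRegion: a run of identical
// ambiguity letters starting at 'offset'.
struct Region {
    int value;
    int offset;
    int length;
};

// Hypothetical analogue of x_AddAmbig(): append one ambiguous letter.
void AddAmbig(std::vector<Region>& regions, int value, int offset)
{
    assert(value >= 0 && value <= 0xF);

    if (!regions.empty()) {
        Region& last = regions.back();
        // Assumption: the letter joins the last region only if it has the
        // same value and sits immediately after the region's end.
        if (last.value == value && last.offset + last.length == offset) {
            ++last.length;
            return;
        }
    }
    // Otherwise start a new single-letter region.
    regions.push_back(Region{value, offset, 1});
}
```

Because letters arrive in sequence order, only the most recent region ever needs to be examined.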

◆ x_PackNewAmbig()

void CAmbigDataBuilder::x_PackNewAmbig ( string & amb,
const CAmbiguousRegion & r 
)
inline

Append the 'new' encoding of one ambiguous region to a string.

Parameters
amb    String encoding of all ambiguous regions. [in|out]
r    Ambiguous region. [in]

Definition at line 289 of file writedb_convert.cpp.

References _ASSERT, A1, ch0, ch1, r(), and s_AppendInt4().

Referenced by GetAmbig().

◆ x_PackOldAmbig()

void CAmbigDataBuilder::x_PackOldAmbig ( string & amb,
CAmbiguousRegion & r 
)
inline

Append the 'old' encoding of one ambiguous region to a string.

Parameters
amb    String encoding of all ambiguous regions. [in|out]
r    Ambiguous region. [in]

Definition at line 317 of file writedb_convert.cpp.

References _ASSERT, A1, ch0, and r().

Referenced by GetAmbig().
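
As a rough illustration of a four-byte 'old' entry, the sketch below packs one region into a string. The bit layout is an assumption that this page does not state: 4 bits of ambiguity value, 4 bits of (length - 1), then a 24-bit offset with the most significant byte first. `PackOldRegion` is a hypothetical name, and the limits in the assertions follow the thresholds given for GetAmbig().

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not the toolkit's x_PackOldAmbig().
// Assumed layout (not stated on this page):
//   byte 0     : (value << 4) | (length - 1)
//   bytes 1..3 : 24-bit offset, most significant byte first
void PackOldRegion(std::string& amb, int value, int offset, int length)
{
    assert(value >= 0 && value <= 0xF);
    assert(length >= 1 && length <= 0xF);
    assert(offset >= 0 && offset <= 0xFFFFFF);

    amb.push_back(static_cast<char>((value << 4) | (length - 1)));
    amb.push_back(static_cast<char>((offset >> 16) & 0xFF));
    amb.push_back(static_cast<char>((offset >> 8) & 0xFF));
    amb.push_back(static_cast<char>(offset & 0xFF));
}
```

Each call appends exactly four bytes, which matches the per-region size quoted for the 'old' format.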

◆ x_Random()

int CAmbigDataBuilder::x_Random ( int  value)
inline private

Pick a random letter from the set represented by an ambiguity.

This method takes an ambiguous value as input, and returns a letter randomly chosen from the set of letters the ambiguity represents.

Parameters
value    An ambiguous letter. [in]
Returns
A non-ambiguous letter.

Definition at line 369 of file writedb_convert.cpp.

References _ASSERT, CRandom::GetRand(), i, m_Random, and rapidjson::value.

Referenced by Check().
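
One way to pick a random member of the set is to count the bits set in the ambiguity value, draw an index below that count, and walk the bits until that index is reached. The sketch below does this with `std::mt19937` standing in for CRandom. `PickBaseIndex` is a hypothetical name, and returning the bit index (0..3) rather than the bit itself is an assumption.

```cpp
#include <cassert>
#include <random>

// Hypothetical analogue of x_Random(): choose one base from a 4-bit set.
// 'mask' has one bit per base; the result is the index of a set bit.
int PickBaseIndex(int mask, std::mt19937& rng)
{
    assert(mask > 0 && mask < 16);

    // Count how many bases the ambiguity allows.
    int bit_count = 0;
    for (int i = 0; i < 4; ++i) {
        if (mask & (1 << i)) {
            ++bit_count;
        }
    }

    // Draw which of those bases to use, then find the matching bit.
    std::uniform_int_distribution<int> dist(0, bit_count - 1);
    int pick = dist(rng);
    for (int i = 0; i < 4; ++i) {
        if (mask & (1 << i)) {
            if (pick == 0) {
                return i;
            }
            --pick;
        }
    }
    return -1;  // Not reached for a valid mask.
}
```

Every allowed base is equally likely under this scheme, and a base outside the set is never returned.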

Member Data Documentation

◆ m_Log2

int CAmbigDataBuilder::m_Log2[16]
private

Table mapping 1, 2, 4, 8 to 0, 1, 2, 3.

Definition at line 415 of file writedb_convert.cpp.

Referenced by CAmbigDataBuilder(), and Check().
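
A table of this kind might be built as in the sketch below. `BuildLog2Table` is a hypothetical name, and filling the non-power-of-two entries with -1 as a "not a single base" marker is an assumption.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the m_Log2 table: entries for the single-bit
// values 1, 2, 4, 8 hold the bit index 0, 1, 2, 3; every other entry is -1
// (assumed marker for "not a single base").
std::array<int, 16> BuildLog2Table()
{
    std::array<int, 16> table;
    table.fill(-1);
    for (int bit = 0; bit < 4; ++bit) {
        table[1 << bit] = bit;
    }
    assert(table[8] == 3);
    return table;
}
```

A lookup such as `table[data]` then distinguishes an unambiguous letter (result 0..3) from an ambiguity (result -1) in a single step.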

◆ m_Random

CRandom CAmbigDataBuilder::m_Random
private

Random number generator.

Definition at line 424 of file writedb_convert.cpp.

Referenced by x_Random().

◆ m_Regions

vector<CAmbiguousRegion> CAmbigDataBuilder::m_Regions
private

Ambiguous regions for the sequence.

Definition at line 421 of file writedb_convert.cpp.

Referenced by GetAmbig(), and x_AddAmbig().

◆ m_Size

int CAmbigDataBuilder::m_Size
private

Size of the input sequence.

Definition at line 418 of file writedb_convert.cpp.

Referenced by Check(), and GetAmbig().


The documentation for this class was generated from the following file:
Modified on Wed Sep 04 15:04:26 2024 by modify_doxy.py rev. 669887