NCBI C++ ToolKit
CAmbigDataBuilder Class Reference
Encode ambiguities in blast database format.
Public Member Functions
  CAmbigDataBuilder (int sz)
    Constructor.
  int Check (int data, int offset)
    Check (and maybe store) a possibly ambiguous letter.
  void GetAmbig (string &amb)
    Compute and return the encoded list of ambiguities.
  void x_PackNewAmbig (string &amb, const CAmbiguousRegion &r)
    Append the 'new' encoding of one ambiguous region to a string.
  void x_PackOldAmbig (string &amb, CAmbiguousRegion &r)
    Append the 'old' encoding of one ambiguous region to a string.
Private Member Functions
  void x_AddAmbig (int value, int offset)
    Add an ambiguity letter.
  int x_Random (int value)
    Pick a random letter from the set represented by an ambiguity.
Private Attributes
  int m_Log2 [16]
    Table mapping the values 1, 2, 4, 8 to 0, 1, 2, 3.
  int m_Size
    Size of the input sequence.
  vector< CAmbiguousRegion > m_Regions
    Ambiguous regions for the sequence.
  CRandom m_Random
    Random number generator.
Encode ambiguities in blast database format.
This class encodes nucleotide ambiguities in blast database format from a series of ambiguous letter values and offsets.
Definition at line 154 of file writedb_convert.cpp.
CAmbigDataBuilder::CAmbigDataBuilder (int sz) [inline]
Constructor.
sz | Size of the sequence in letters. [in]
Definition at line 158 of file writedb_convert.cpp.
int CAmbigDataBuilder::Check (int data, int offset)
Check (and maybe store) a possibly ambiguous letter.
If the letter is not an ambiguity, this method converts it to Blast-NA2 format and returns it. If the letter is an ambiguity, it is added to the list of ambiguities and a randomly selected letter value is returned. Each ambiguity letter (there are 12 for nucleotide) represents a choice between two or more nucleotide bases. The random letter is always selected from the set of bases corresponding to the input ambiguity value.
data | Letter value in BlastNA8. [in]
offset | Offset of letter. [in]
Definition at line 186 of file writedb_convert.cpp.
References _ASSERT, data, m_Log2, m_Size, offset, x_AddAmbig(), and x_Random().
Referenced by WriteDB_Ncbi4naToBinary().
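The unambiguous branch of Check() can be illustrated with a minimal sketch. It assumes the input uses one bit per base, as the m_Log2 table (1, 2, 4, 8 to 0, 1, 2, 3) suggests, so a value with exactly one bit set is unambiguous. The names MakeLog2Table, IsSingleBase, and ToNa2 are hypothetical and are not part of the Toolkit.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper: build a 16-entry table that maps the one-bit
// values 1, 2, 4, 8 to 0, 1, 2, 3. Every other entry is -1.
inline std::array<int, 16> MakeLog2Table()
{
    std::array<int, 16> table;
    table.fill(-1);
    for (int bit = 0; bit < 4; ++bit) {
        table[1 << bit] = bit;
    }
    return table;
}

// Hypothetical helper: true if exactly one of the low four bits is set.
inline bool IsSingleBase(int data)
{
    return data > 0 && data < 16 && (data & (data - 1)) == 0;
}

// Hypothetical helper: two-bit code for an unambiguous letter, or -1
// when the letter is ambiguous and would need the random fallback.
inline int ToNa2(int data)
{
    static const std::array<int, 16> table = MakeLog2Table();
    return IsSingleBase(data) ? table[data] : -1;
}
```

In this sketch a return value of -1 marks the case where the real method would record the ambiguity and substitute a random base.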
void CAmbigDataBuilder::GetAmbig (string &amb) [inline]
Compute and return the encoded list of ambiguities.
The list of ambiguous regions is packed in blast database format and returned to the caller. If the sequence is longer than 2^24-1 letters, or any ambiguous region is longer than 0xF letters, the 'new' ambiguity format is used. It allows larger ambiguous regions at higher sequence offsets, but requires eight bytes per ambiguous region instead of the four bytes required by the 'old' format.
amb | The ambiguity data in blast database format. [out]
Definition at line 238 of file writedb_convert.cpp.
References i, int, m_Regions, m_Size, s_AppendInt4(), x_PackNewAmbig(), and x_PackOldAmbig().
Referenced by WriteDB_Ncbi4naToBinary().
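The format choice described for GetAmbig() can be sketched as follows. This is only an illustration of the stated thresholds. The Region struct is a hypothetical stand-in for CAmbiguousRegion, and NeedsNewFormat and RegionBytes are assumed names, not Toolkit functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CAmbiguousRegion; only the length matters here.
struct Region {
    int offset;
    int length;
};

// True when the 'new' (eight-byte) encoding is required: the sequence is
// longer than 2^24-1 or some region is longer than 0xF.
inline bool NeedsNewFormat(int seq_size, const std::vector<Region>& regions)
{
    const int kMaxOffset = 0xFFFFFF;  // 2^24 - 1
    const int kMaxLength = 0xF;
    if (seq_size > kMaxOffset) {
        return true;
    }
    for (const Region& r : regions) {
        if (r.length > kMaxLength) {
            return true;
        }
    }
    return false;
}

// Bytes needed for the region entries alone (any header is not counted).
inline std::size_t RegionBytes(int seq_size, const std::vector<Region>& regions)
{
    return regions.size() * (NeedsNewFormat(seq_size, regions) ? 8u : 4u);
}
```

A single long region is enough to switch the whole list to the eight-byte form, which is why the sketch tests every region before choosing.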
void CAmbigDataBuilder::x_AddAmbig (int value, int offset)
Add an ambiguity letter.
The internal encoding contains a list of ambiguous ranges. This method adds the given letter at the given offset to the most recent region, if possible, or creates a new region for it.
value | Ambiguous letter to add. [in]
offset | Offset at which letter occurs. [in]
Definition at line 349 of file writedb_convert.cpp.
References m_Regions, offset, r(), and rapidjson::value.
Referenced by Check().
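A minimal sketch of the append-or-create behaviour follows. It assumes that a letter can join the most recent region only when it has the same value and sits immediately after that region. This page does not state the exact condition, and any cap on region length is not modelled. Region and AddAmbig are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CAmbiguousRegion.
struct Region {
    int value;   // ambiguity letter shared by the whole run
    int offset;  // offset of the first letter in the run
    int length;  // number of letters in the run
};

// Extend the most recent region when the new letter continues it
// (assumed rule: same value, next offset); otherwise start a new region.
inline void AddAmbig(std::vector<Region>& regions, int value, int offset)
{
    if (!regions.empty()) {
        Region& last = regions.back();
        if (last.value == value && last.offset + last.length == offset) {
            ++last.length;
            return;
        }
    }
    regions.push_back(Region{value, offset, 1});
}
```

Because letters arrive in increasing offset order, checking only the last region is enough in this sketch.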
void CAmbigDataBuilder::x_PackNewAmbig (string &amb, const CAmbiguousRegion &r) [inline]
Append the 'new' encoding of one ambiguous region to a string.
amb | String encoding of all ambiguous regions. [in|out]
r | Ambiguous region. [in]
Definition at line 289 of file writedb_convert.cpp.
References _ASSERT, A1, ch0, ch1, r(), and s_AppendInt4().
Referenced by GetAmbig().
void CAmbigDataBuilder::x_PackOldAmbig (string &amb, CAmbiguousRegion &r) [inline]
Append the 'old' encoding of one ambiguous region to a string.
amb | String encoding of all ambiguous regions. [in|out]
r | Ambiguous region. [in]
Definition at line 317 of file writedb_convert.cpp.
References _ASSERT, A1, ch0, and r().
Referenced by GetAmbig().
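The limits given for the 'old' format (four bytes, region length at most 0xF, offsets at most 2^24-1) are consistent with a 4-bit value, a 4-bit length field, and a 24-bit offset sharing one 32-bit word. The sketch below packs such a word. The field order and the most-significant-byte-first layout are assumptions, as is whether the length field holds the length or the length minus one. PackOldSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed layout (not confirmed by this page):
//   bits 31..28  ambiguity value
//   bits 27..24  length field
//   bits 23..0   offset
// The word is appended most significant byte first.
inline void PackOldSketch(std::string& amb,
                          unsigned value,
                          unsigned len_field,
                          std::uint32_t offset)
{
    assert(value <= 0xF);
    assert(len_field <= 0xF);
    assert(offset <= 0xFFFFFF);

    const std::uint32_t word = (static_cast<std::uint32_t>(value) << 28)
                             | (static_cast<std::uint32_t>(len_field) << 24)
                             | offset;
    for (int shift = 24; shift >= 0; shift -= 8) {
        amb.push_back(static_cast<char>((word >> shift) & 0xFF));
    }
}
```

Each call appends exactly four bytes, matching the per-region cost stated for the 'old' format.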
int CAmbigDataBuilder::x_Random (int value)
Pick a random letter from the set represented by an ambiguity.
This method takes an ambiguous value as input, and returns a letter randomly chosen from the set of letters the ambiguity represents.
value | An ambiguous letter. [in]
Definition at line 369 of file writedb_convert.cpp.
References _ASSERT, CRandom::GetRand(), i, m_Random, and rapidjson::value.
Referenced by Check().
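The random choice can be sketched with the standard library in place of CRandom. The sketch assumes the ambiguity is a 4-bit mask with one bit per candidate base and returns the chosen base as a one-bit value. Whether the real method returns that value or its two-bit code is not stated here, and a zero mask is not handled. PickRandomBase is a hypothetical name.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: choose one set bit of a 4-bit ambiguity mask
// uniformly at random and return it as a one-bit value (1, 2, 4, or 8).
// std::mt19937 is used here instead of the Toolkit's CRandom.
inline int PickRandomBase(int mask, std::mt19937& rng)
{
    assert(mask > 0 && mask < 16);

    int candidates[4];
    int count = 0;
    for (int i = 0; i < 4; ++i) {
        if (mask & (1 << i)) {
            candidates[count++] = 1 << i;
        }
    }

    std::uniform_int_distribution<int> pick(0, count - 1);
    return candidates[pick(rng)];
}
```

The result is always a member of the input set, which is the property the method description requires.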
int CAmbigDataBuilder::m_Log2 [16] [private]
Table mapping the values 1, 2, 4, 8 to 0, 1, 2, 3.
Definition at line 415 of file writedb_convert.cpp.
Referenced by CAmbigDataBuilder(), and Check().
CRandom CAmbigDataBuilder::m_Random [private]
Random number generator.
Definition at line 424 of file writedb_convert.cpp.
Referenced by x_Random().
vector< CAmbiguousRegion > CAmbigDataBuilder::m_Regions [private]
Ambiguous regions for the sequence.
Definition at line 421 of file writedb_convert.cpp.
Referenced by GetAmbig(), and x_AddAmbig().
int CAmbigDataBuilder::m_Size [private]
Size of the input sequence.
Definition at line 418 of file writedb_convert.cpp.
Referenced by Check(), and GetAmbig().