NCBI C++ ToolKit
Data conversion tools for CWriteDB and associated code.
#include <objects/seq/seq__.hpp>
#include <objects/blastdb/blastdb__.hpp>
#include <objmgr/bioseq_handle.hpp>
#include <objmgr/seq_vector.hpp>
Functions
USING_SCOPE (objects)
Import definitions from the objects namespace.
void WriteDB_StdaaToBinary (const CSeq_inst &si, string &seq)
Build blast db protein format from Stdaa protein Seq-inst.
void WriteDB_EaaToBinary (const CSeq_inst &si, string &seq)
Build blast db protein format from Eaa protein Seq-inst.
void WriteDB_IupacaaToBinary (const CSeq_inst &si, string &seq)
Build blast db protein format from Iupacaa protein Seq-inst.
void WriteDB_Ncbi2naToBinary (const CSeq_inst &si, string &seq)
Build blast db nucleotide format from Ncbi2na Seq-inst.
void WriteDB_Ncbi4naToBinary (const CSeq_inst &seqinst, string &seq, string &amb)
Build blast db nucleotide format from Ncbi4na Seq-inst.
void WriteDB_Ncbi4naToBinary (const char *ncbi4na, int byte_length, int base_length, string &seq, string &amb)
Build binary blast2na + ambig encoding based on ncbi4na input.
void WriteDB_IupacnaToBinary (const CSeq_inst &si, string &seq, string &amb)
Build blast db nucleotide format from Iupacna Seq-inst.
void s_AppendInt4 (string &outp, int x)
Append a value to a string as a 4 byte big-endian integer.
void s_WriteInt4 (ostream &str, int x)
Write a four byte integer to a stream in big-endian format.
void s_WriteInt8LE (ostream &str, Uint8 x)
Write an eight byte integer to a stream in little-endian format.
void s_WriteInt8BE (ostream &str, Uint8 x)
Write an eight byte integer to a stream in big-endian format.
void s_WriteString (ostream &str, const string &s)
Write a length-prefixed string to a stream.
Data conversion tools for CWriteDB and associated code.
Defines classes: CAmbiguousRegion
Implemented for: UNIX, MS-Windows
Definition in file writedb_convert.hpp.
Append a value to a string as a 4 byte big-endian integer.
x | Value to append. |
outp | String to modify. |
Definition at line 143 of file writedb_convert.hpp.
References buf.
Referenced by CAmbigDataBuilder::GetAmbig(), and CAmbigDataBuilder::x_PackNewAmbig().
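The byte order described for s_AppendInt4 can be illustrated with a minimal sketch. AppendInt4BE is a hypothetical stand-in written for this example, not the toolkit's implementation; it assumes only what the description states, namely that four bytes are appended with the most significant byte first.

```cpp
#include <string>

// Hypothetical stand-in for s_AppendInt4 (illustration only):
// appends x to outp as four bytes, most significant byte first.
inline void AppendInt4BE(std::string& outp, int x)
{
    // Work on the unsigned bit pattern so shifts are well defined
    // for negative inputs.
    const unsigned int u = static_cast<unsigned int>(x);

    char buf[4];
    buf[0] = static_cast<char>((u >> 24) & 0xFF);
    buf[1] = static_cast<char>((u >> 16) & 0xFF);
    buf[2] = static_cast<char>((u >> 8) & 0xFF);
    buf[3] = static_cast<char>(u & 0xFF);

    outp.append(buf, 4);
}
```

Appending 0x01020304 to an empty string under this sketch yields the bytes 01 02 03 04 in that order, regardless of the host's native byte order.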
inline void s_WriteInt4 (ostream &str, int x)
Write a four byte integer to a stream in big endian format.
str | Stream to write to. |
x | Integer to write. |
Definition at line 157 of file writedb_convert.hpp.
Referenced by s_WriteString(), CBinaryListBuilder::Write(), CWriteDB_File::WriteInt4(), and CWriteDB_IndexFile::x_Flush().
inline void s_WriteInt8BE (ostream &str, Uint8 x)
Write an eight byte integer to a stream in big-endian format.
str | Stream to write to. |
x | Integer to write. |
Definition at line 189 of file writedb_convert.hpp.
Referenced by CBinaryListBuilder::Write(), and CWriteDB_File::WriteInt8().
inline void s_WriteInt8LE (ostream &str, Uint8 x)
Write an eight byte integer to a stream in little-endian format.
str | Stream to write to. |
x | Integer to write. |
Definition at line 171 of file writedb_convert.hpp.
Referenced by CWriteDB_IndexFile::x_Flush().
Write a length-prefixed string to a stream.
This method writes a string to a stream, prefixing the string with its length, written as a big-endian four byte integer.
str | Stream to write to. |
s | String to write. |
Definition at line 211 of file writedb_convert.hpp.
References s_WriteInt4(), and str().
Referenced by CWriteDB_IndexFile::x_Flush().
USING_SCOPE (objects)
Import definitions from the objects namespace.
Build blast db protein format from Eaa protein Seq-inst.
The data is converted and returned in the string.
si | Seq-inst containing data in NcbiEaa format. [in] |
seq | Sequence in blast db disk format. [out] |
Definition at line 539 of file writedb_convert.cpp.
References _ASSERT, CSeqConvert::Convert(), CSeqUtil::e_Ncbieaa, CSeqUtil::e_Ncbistdaa, and si.
Referenced by CWriteDB_Impl::x_CookSequence().
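According to the references above, the real function delegates the Ncbieaa to Ncbistdaa conversion to CSeqConvert::Convert(). The sketch below only illustrates what such a conversion amounts to: each amino acid letter is replaced by its small integer code. EaaToStdaa and the table are written for this example; the table follows the commonly documented Ncbistdaa ordering and should be verified against the toolkit, and mapping unknown letters to X is an assumption of the sketch.

```cpp
#include <string>

// Ncbistdaa code (index) to Ncbieaa letter. Assumed ordering; verify
// against the toolkit before relying on it.
static const char kStdaaToEaa[] = "-ABCDEFGHIKLMNPQRSTVWXYZU*OJ";
static const int  kStdaaCount   = 28;
static const int  kStdaaX       = 21;   // code of 'X' in the table above

// Hypothetical illustration: convert letters to Ncbistdaa codes.
inline std::string EaaToStdaa(const std::string& eaa)
{
    std::string out;
    out.reserve(eaa.size());

    for (char c : eaa) {
        int code = kStdaaX;   // assumption: unknown letters become 'X'
        for (int i = 0; i < kStdaaCount; ++i) {
            if (kStdaaToEaa[i] == c) {
                code = i;
                break;
            }
        }
        out.push_back(static_cast<char>(code));
    }
    return out;
}
```

The output has the same length as the input, one byte per residue, which is why the protein converters take no separate ambiguity argument.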
Build blast db protein format from Iupacaa protein Seq-inst.
The data is converted and returned in the string.
si | Seq-inst containing data in Iupacaa format. [in] |
seq | Sequence in blast db disk format. [out] |
Definition at line 554 of file writedb_convert.cpp.
References _ASSERT, CSeqConvert::Convert(), CSeqUtil::e_Iupacaa, CSeqUtil::e_Ncbistdaa, and si.
Referenced by CWriteDB_Impl::x_CookSequence().
Build blast db nucleotide format from Iupacna Seq-inst.
The data is compressed to ncbi2na, the length remainder is coded into the last byte, and ambiguous region data is produced.
si | Seq-inst containing data in Iupacna format. [in] |
seq | Sequence in blast db disk format. [out] |
amb | Ambiguities in blast db disk format. [out] |
Definition at line 590 of file writedb_convert.cpp.
References _ASSERT, CSeqConvert::Convert(), CSeqUtil::e_Iupacna, CSeqUtil::e_Ncbi4na, si, tmp, and WriteDB_Ncbi4naToBinary().
Referenced by CWriteDB_Impl::x_CookSequence().
Build blast db nucleotide format from Ncbi2na Seq-inst.
The data is in the correct format, and can be copied as-is, but the length remainder must be coded into the last byte. It is not necessary to deal with ambiguities - if there were any, ncbi2na would not be the input format.
si | Seq-inst containing data in Ncbi2na format. [in] |
seq | Sequence in blast db disk format. [out] |
Definition at line 569 of file writedb_convert.cpp.
References _ASSERT, base_length, s_DivideRoundUp(), and si.
Referenced by CWriteDB_Impl::x_CookSequence().
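The "length remainder coded into the last byte" step can be sketched as follows. This is an illustration under assumptions, not the toolkit's code: it assumes the disk form holds base_length / 4 + 1 bytes, that the low two bits of the final byte store base_length modulo 4, and that a sequence whose length is a multiple of four therefore gains one extra byte. PackNcbi2naWithRemainder is a hypothetical name.

```cpp
#include <string>

// Hypothetical illustration of remainder coding for packed ncbi2na data
// (four bases per byte, first base in the two highest bits).
inline std::string PackNcbi2naWithRemainder(const std::string& ncbi2na,
                                            int base_length)
{
    const int data_bytes  = (base_length + 3) / 4;   // bytes holding bases
    const int blast_bytes = base_length / 4 + 1;     // assumed disk size
    const int remainder   = base_length % 4;

    // Copy the packed bases as-is, then make room for the final byte.
    std::string seq(ncbi2na, 0, data_bytes);
    seq.resize(blast_bytes, '\0');

    // Keep the bases in the upper six bits and store the remainder in the
    // lower two. With remainder 0 the final byte is the extra, empty one.
    unsigned char last = static_cast<unsigned char>(seq[blast_bytes - 1]);
    last = static_cast<unsigned char>((last & 0xFC) | remainder);
    seq[blast_bytes - 1] = static_cast<char>(last);

    return seq;
}
```

Under these assumptions, a five base sequence packed as 0x1B 0x40 becomes 0x1B 0x41, and a four base sequence packed as 0x1B becomes 0x1B 0x00.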
void WriteDB_Ncbi4naToBinary (const char *ncbi4na, int byte_length, int base_length, string &seq, string &amb)
Build binary blast2na + ambig encoding based on ncbi4na input.
ncbi4na | Input data with possible ambiguities. |
byte_length | Number of bytes in the input data. |
base_length | Valid nucleotide bases in the input data. |
seq | Sequence data in blast db format. |
amb | Ambiguity data in blast db format. |
Build blast db nucleotide format from Ncbi4na data in memory.
For a given sequence in ncbi4na format, the blast database format data is constructed; this consists of ncbi2na format with values in ambiguous locations selected randomly, plus the precise values of the ambiguous regions encoded in a separate string.
ncbi4na | Pointer to Ncbi4na format sequence data. [in] |
byte_length | Length of ncbi4na data in bytes. [in] |
base_length | Number of letters of valid data. [in] |
seq | Sequence in blast db disk format. [out] |
amb | Ambiguities in blast db disk format. [out] |
Definition at line 444 of file writedb_convert.cpp.
References _ASSERT, base_length, CAmbigDataBuilder::Check(), ctable, CAmbigDataBuilder::GetAmbig(), i, s_BuildNa4ToNa2Table(), and s_DivideRoundUp().
Referenced by WriteDB_IupacnaToBinary(), and WriteDB_Ncbi4naToBinary().
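The conversion described above can be sketched in simplified form. Everything here is written for illustration: AmbigRun, PickBase and Ncbi4naTo2naSketch are hypothetical names, the real ambiguity string is produced by CAmbigDataBuilder in a binary layout this sketch does not reproduce, and treating the gap code 0 as "any base" is an assumption. The sketch relies on the standard ncbi4na bit assignment (A = 1, C = 2, G = 4, T = 8, two bases per byte with the first in the high nibble).

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical record of one run of identical ambiguous codes.
struct AmbigRun {
    int offset;   // index of the first base in the run
    int length;   // number of bases in the run
    int value;    // ncbi4na code shared by the run
};

inline bool IsUnambiguous(int code4)
{
    return code4 == 1 || code4 == 2 || code4 == 4 || code4 == 8;
}

// Pick a 2-bit base (A=0, C=1, G=2, T=3) allowed by the 4-bit code,
// choosing randomly when more than one base is allowed.
inline int PickBase(int code4)
{
    int candidates[4];
    int n = 0;
    for (int b = 0; b < 4; ++b) {
        if (code4 & (1 << b)) candidates[n++] = b;
    }
    if (n == 0) {   // assumption: gap code 0 may become any base
        for (int b = 0; b < 4; ++b) candidates[n++] = b;
    }
    return candidates[std::rand() % n];
}

inline void Ncbi4naTo2naSketch(const std::string& ncbi4na, int base_length,
                               std::string& seq, std::vector<AmbigRun>& amb)
{
    seq.assign(base_length / 4 + 1, '\0');
    amb.clear();

    for (int i = 0; i < base_length; ++i) {
        const unsigned char byte = static_cast<unsigned char>(ncbi4na[i / 2]);
        const int code4 = (i % 2 == 0) ? (byte >> 4) : (byte & 0x0F);

        // Pack the chosen base, first base in the highest two bits.
        const int shift = 6 - 2 * (i % 4);
        const unsigned char cur = static_cast<unsigned char>(seq[i / 4]);
        seq[i / 4] = static_cast<char>(cur | (PickBase(code4) << shift));

        if (!IsUnambiguous(code4)) {
            // Extend the previous run when it is adjacent and identical.
            if (!amb.empty() && amb.back().value == code4 &&
                amb.back().offset + amb.back().length == i) {
                ++amb.back().length;
            } else {
                amb.push_back(AmbigRun{i, 1, code4});
            }
        }
    }

    // Assumed remainder coding: low two bits of the final byte.
    const unsigned char last = static_cast<unsigned char>(seq.back());
    seq.back() = static_cast<char>(last | (base_length % 4));
}
```

Because the substituted bases are random, only the unambiguous positions of seq are reproducible between runs; the exact original codes are recoverable from the recorded runs.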
Build blast db nucleotide format from Ncbi4na Seq-inst.
The data is compressed to ncbi2na, the length remainder is coded into the last byte, and ambiguous region data is produced.
si | Seq-inst containing data in Ncbi4na format. [in] |
seq | Sequence in blast db disk format. [out] |
amb | Ambiguities in blast db disk format. [out] |
Definition at line 520 of file writedb_convert.cpp.
References base_length, CAliasBase< TPrim >::Get(), CSeq_inst_Base::GetLength(), CSeq_data_Base::GetNcbi4na(), CSeq_inst_Base::GetSeq_data(), and WriteDB_Ncbi4naToBinary().
Referenced by CWriteDB_Impl::x_CookSequence().
Build blast db protein format from Stdaa protein Seq-inst.
No conversion is actually done here, because this is already the correct format for disk. Instead the sequence data is just copied from the Seq-inst to the string.
si | Seq-inst containing data in NcbiStdaa format. [in] |
seq | Sequence in blast db disk format. [out] |
Definition at line 530 of file writedb_convert.cpp.
Referenced by CWriteDB_Impl::x_CookSequence().