NCBI C++ ToolKit
writedb_convert.cpp File Reference

Data conversion tools for CWriteDB and associated code.

#include <ncbi_pch.hpp>
#include <util/sequtil/sequtil_convert.hpp>
#include <util/random_gen.hpp>
#include <objtools/blast/seqdb_writer/writedb_general.hpp>
#include <objtools/blast/seqdb_writer/writedb_convert.hpp>
#include <iostream>

Classes

class  CAmbiguousRegion
 Ambiguous portion of a sequence.
 
class  CAmbigDataBuilder
 Encode ambiguities in blast database format.
 

Functions

 USING_SCOPE (std)
 
vector< unsigned char > s_BuildNa4ToNa2Table ()
 Builds a table from NA4 to NA2 (with ambiguities marked as 0xFF).
 
void WriteDB_Ncbi4naToBinary (const char *ncbi4na, int byte_length, int base_length, string &seq, string &amb)
 Build blast db nucleotide format (ncbi2na data plus ambiguity encoding) from ncbi4na input.
 
void WriteDB_Ncbi4naToBinary (const CSeq_inst &seqinst, string &seq, string &amb)
 Build blast db nucleotide format from Ncbi4na Seq-inst.
 
void WriteDB_StdaaToBinary (const CSeq_inst &si, string &seq)
 Build blast db protein format from Stdaa protein Seq-inst.
 
void WriteDB_EaaToBinary (const CSeq_inst &si, string &seq)
 Build blast db protein format from Eaa protein Seq-inst.
 
void WriteDB_IupacaaToBinary (const CSeq_inst &si, string &seq)
 Build blast db protein format from Iupacaa protein Seq-inst.
 
void WriteDB_Ncbi2naToBinary (const CSeq_inst &si, string &seq)
 Build blast db nucleotide format from Ncbi2na Seq-inst.
 
void WriteDB_IupacnaToBinary (const CSeq_inst &si, string &seq, string &amb)
 Build blast db nucleotide format from Iupacna Seq-inst.
 

Detailed Description

Data conversion tools for CWriteDB and associated code.

The file also defines helper classes used by WriteDB to encode ambiguities.

Definition in file writedb_convert.cpp.

Function Documentation

◆ s_BuildNa4ToNa2Table()

inline vector<unsigned char> s_BuildNa4ToNa2Table ()

Builds a table from NA4 to NA2 (with ambiguities marked as 0xFF).

Returns
A vector indexed by NA4 value, with values from 0-3 or 0xFF.

Definition at line 429 of file writedb_convert.cpp.

References ctable, and i.

Referenced by WriteDB_Ncbi4naToBinary().
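The table lookup can be sketched as below. This is a hypothetical stand-in (`BuildNa4ToNa2TableSketch` is not a toolkit function); it assumes the standard NCBI encodings, where an ncbi4na value is a 4-bit mask (A=1, C=2, G=4, T=8) and an ncbi2na value is 0-3 (A, C, G, T), and it assumes one table entry per 4-bit value, with gap (0) also treated as ambiguous.

```cpp
#include <vector>

// Hypothetical sketch, not the toolkit's s_BuildNa4ToNa2Table().
// Assumed encodings: ncbi4na values are 4-bit masks A=1, C=2, G=4, T=8
// (0 = gap, 15 = N); ncbi2na values are 0-3 for A, C, G, T.
std::vector<unsigned char> BuildNa4ToNa2TableSketch()
{
    // Start with every NA4 value marked as ambiguous.
    std::vector<unsigned char> table(16, 0xFF);

    // Only the single-bit masks denote exactly one base.
    table[1] = 0;  // A
    table[2] = 1;  // C
    table[4] = 2;  // G
    table[8] = 3;  // T
    return table;
}
```

Any NA4 value with zero or more than one bit set (gap, N, R, Y and so on) maps to 0xFF, which lets a caller detect an ambiguous position with a single comparison.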

◆ USING_SCOPE()

USING_SCOPE ( std  )

◆ WriteDB_EaaToBinary()

void WriteDB_EaaToBinary (const CSeq_inst &si, string &seq)

Build blast db protein format from Eaa protein Seq-inst.

The data is converted to NcbiStdaa and returned in seq.

Parameters
si: Seq-inst containing data in NcbiEaa format. [in]
seq: Sequence in blast db disk format. [out]

Definition at line 539 of file writedb_convert.cpp.

References _ASSERT, CSeqConvert::Convert(), CSeqUtil::e_Ncbieaa, CSeqUtil::e_Ncbistdaa, and si.

Referenced by CWriteDB_Impl::x_CookSequence().
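The conversion itself is done by CSeqConvert::Convert(); as an illustration only, the NcbiEaa-to-NcbiStdaa step amounts to a per-residue table lookup. The sketch below assumes the usual 28-letter NCBIstdaa ordering; `EaaToStdaaSketch` and `kStdaaAlphabet` are hypothetical names, and throwing on an unknown letter is a choice made here, not documented toolkit behavior.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical sketch; the toolkit itself calls CSeqConvert::Convert().
// Assumed NCBIstdaa ordering: code k is the letter at position k
// (0 = gap '-', 1 = 'A', ..., 25 = '*' stop, 26 = 'O', 27 = 'J').
static const std::string kStdaaAlphabet = "-ABCDEFGHIKLMNPQRSTVWXYZU*OJ";

// Convert NCBIeaa (one ASCII letter per residue) to NCBIstdaa bytes.
std::string EaaToStdaaSketch(const std::string& eaa)
{
    std::string stdaa;
    stdaa.reserve(eaa.size());
    for (char residue : eaa) {
        std::string::size_type code = kStdaaAlphabet.find(residue);
        if (code == std::string::npos) {
            throw std::invalid_argument("residue not in NCBIstdaa alphabet");
        }
        stdaa.push_back(static_cast<char>(code));
    }
    return stdaa;
}
```

For example, "ACDW*" becomes the bytes 1, 3, 4, 20, 25 under this assumed ordering.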

◆ WriteDB_IupacaaToBinary()

void WriteDB_IupacaaToBinary (const CSeq_inst &si, string &seq)

Build blast db protein format from Iupacaa protein Seq-inst.

The data is converted to NcbiStdaa and returned in seq.

Parameters
si: Seq-inst containing data in Iupacaa format. [in]
seq: Sequence in blast db disk format. [out]

Definition at line 554 of file writedb_convert.cpp.

References _ASSERT, CSeqConvert::Convert(), CSeqUtil::e_Iupacaa, CSeqUtil::e_Ncbistdaa, and si.

Referenced by CWriteDB_Impl::x_CookSequence().

◆ WriteDB_IupacnaToBinary()

void WriteDB_IupacnaToBinary (const CSeq_inst &si, string &seq, string &amb)

Build blast db nucleotide format from Iupacna Seq-inst.

The data is compressed to ncbi2na, the length remainder is coded into the last byte, and ambiguous region data is produced.

Parameters
si: Seq-inst containing data in Iupacna format. [in]
seq: Sequence in blast db disk format. [out]
amb: Ambiguities in blast db disk format. [out]

Definition at line 590 of file writedb_convert.cpp.

References _ASSERT, CSeqConvert::Convert(), CSeqUtil::e_Iupacna, CSeqUtil::e_Ncbi4na, si, tmp, and WriteDB_Ncbi4naToBinary().

Referenced by CWriteDB_Impl::x_CookSequence().
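Judging from its references, WriteDB_IupacnaToBinary() appears to convert Iupacna to Ncbi4na with CSeqConvert::Convert() and then pass the result to WriteDB_Ncbi4naToBinary(). A minimal sketch of that first step follows; it assumes the standard ncbi4na codes (each IUPAC letter is the bitwise OR of A=1, C=2, G=4, T=8) and packing of two bases per byte with the first base in the high nibble. `IupacnaToNcbi4naSketch` is a hypothetical name.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical sketch of the Iupacna -> Ncbi4na step.
// Assumed ncbi4na code k is the IUPAC letter at position k, e.g.
// M = A|C = 3, R = A|G = 5, N = A|C|G|T = 15, '-' (gap) = 0.
static const std::string kNa4Alphabet = "-ACMGRSVTWYHKDBN";

std::string IupacnaToNcbi4naSketch(const std::string& iupacna)
{
    // Two bases per byte; an odd trailing slot stays 0.
    std::string packed((iupacna.size() + 1) / 2, '\0');
    for (std::string::size_type i = 0; i < iupacna.size(); ++i) {
        std::string::size_type code = kNa4Alphabet.find(iupacna[i]);
        if (code == std::string::npos) {
            throw std::invalid_argument("not an IUPAC nucleotide letter");
        }
        // First base of each pair goes into the high nibble, second into the low one.
        unsigned int shift = (i % 2 == 0) ? 4u : 0u;
        unsigned char byte = static_cast<unsigned char>(packed[i / 2]);
        byte = static_cast<unsigned char>(byte | (code << shift));
        packed[i / 2] = static_cast<char>(byte);
    }
    return packed;
}
```

Under these assumptions "ACGT" packs to the bytes 0x12 0x48, and "ACN" to 0x12 0xF0, where the low nibble of the last byte is padding.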

◆ WriteDB_Ncbi2naToBinary()

void WriteDB_Ncbi2naToBinary (const CSeq_inst &si, string &seq)

Build blast db nucleotide format from Ncbi2na Seq-inst.

The data is already in the correct format and can be copied as-is, but the length remainder must be coded into the last byte. Ambiguities need no handling: if there were any, the input would not be in ncbi2na format.

Parameters
si: Seq-inst containing data in Ncbi2na format. [in]
seq: Sequence in blast db disk format. [out]

Definition at line 569 of file writedb_convert.cpp.

References _ASSERT, base_length, s_DivideRoundUp(), and si.

Referenced by CWriteDB_Impl::x_CookSequence().
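The length-remainder coding can be sketched as follows. The layout is an assumption based on the blast db nucleotide format as we understand it: ncbi2na data packs four bases per byte (first base in the two high bits), the two low bits of the final byte hold the number of bases stored in that byte, and when the length is a multiple of 4 a zero byte is appended. `Ncbi2naToDiskSketch` is a hypothetical name, and it expects `ncbi2na` to hold at least ceil(base_length / 4) bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of the assumed remainder coding, not toolkit code.
// Precondition: ncbi2na holds at least ceil(base_length / 4) bytes.
std::string Ncbi2naToDiskSketch(const std::string& ncbi2na, int base_length)
{
    // Copy the bytes that carry sequence data unchanged.
    std::string seq = ncbi2na.substr(0, static_cast<std::size_t>((base_length + 3) / 4));
    int remainder = base_length % 4;

    if (remainder == 0) {
        // Every data byte is full: append a terminator byte whose count is 0.
        seq.push_back('\0');
    } else {
        // The last byte's low two bits are unused padding: store the count there.
        unsigned char last = static_cast<unsigned char>(seq.back());
        last = static_cast<unsigned char>((last & 0xFC) | remainder);
        seq.back() = static_cast<char>(last);
    }
    return seq;
}
```

For example, 4 bases occupy two bytes on disk (one data byte and a zero byte), while 5 or 6 bases also occupy two bytes, with the count 1 or 2 in the low bits of the second byte.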

◆ WriteDB_Ncbi4naToBinary() [1/2]

void WriteDB_Ncbi4naToBinary (const char *ncbi4na, int byte_length, int base_length, string &seq, string &amb)

Build blast db nucleotide format (ncbi2na data plus ambiguity encoding) from Ncbi4na data in memory.

For a given sequence in ncbi4na format, this function constructs the blast database format data: the sequence in ncbi2na format, with a randomly selected value at each ambiguous position, plus the exact values of the ambiguous regions encoded in a separate string.

Parameters
ncbi4na: Pointer to Ncbi4na format sequence data, possibly containing ambiguities. [in]
byte_length: Length of ncbi4na data in bytes. [in]
base_length: Number of valid nucleotide bases in the data. [in]
seq: Sequence in blast db disk format. [out]
amb: Ambiguities in blast db disk format. [out]

Definition at line 444 of file writedb_convert.cpp.

References _ASSERT, base_length, CAmbigDataBuilder::Check(), ctable, CAmbigDataBuilder::GetAmbig(), i, s_BuildNa4ToNa2Table(), and s_DivideRoundUp().

Referenced by WriteDB_IupacnaToBinary(), and WriteDB_Ncbi4naToBinary().
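The core loop described above can be sketched as below, under several assumptions. The sketch covers only the ncbi2na substitution and the detection of ambiguous runs: it leaves the output unpacked (one code per base), omits the length-remainder byte, ignores byte_length, and records runs in a hypothetical `AmbigRunSketch` struct rather than the real ambiguity encoding built by CAmbigDataBuilder. Filler values are drawn uniformly from the four bases using a caller-supplied `std::mt19937`; the toolkit's actual random source and choice rule are not documented here. `Ncbi4naTo2naSketch` is likewise hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical record of a run of identical ambiguous ncbi4na codes.
// This is NOT the on-disk ambiguity encoding built by CAmbigDataBuilder.
struct AmbigRunSketch {
    std::size_t start;   // 0-based position of the first base in the run
    std::size_t length;  // number of consecutive bases in the run
    unsigned char na4;   // ncbi4na code shared by the run
};

// Unpacks ncbi4na (two bases per byte, high nibble first), emits one ncbi2na
// code (0-3) per base, and substitutes a random base at ambiguous positions.
std::vector<unsigned char> Ncbi4naTo2naSketch(const std::string& ncbi4na,
                                              std::size_t base_length,
                                              std::vector<AmbigRunSketch>& runs,
                                              std::mt19937& rng)
{
    // NA4 -> NA2 lookup; 0xFF marks an ambiguous (or gap) code.
    static const unsigned char kNa4ToNa2[16] = {
        0xFF, 0, 1, 0xFF, 2, 0xFF, 0xFF, 0xFF,
        3, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
    std::uniform_int_distribution<int> pick(0, 3);
    std::vector<unsigned char> na2;
    na2.reserve(base_length);
    runs.clear();

    for (std::size_t i = 0; i < base_length; ++i) {
        // Even positions live in the high nibble, odd ones in the low nibble.
        unsigned char byte = static_cast<unsigned char>(ncbi4na[i / 2]);
        unsigned char na4 = static_cast<unsigned char>((i % 2 == 0) ? (byte >> 4) : (byte & 0x0F));
        unsigned char code = kNa4ToNa2[na4];
        if (code == 0xFF) {
            code = static_cast<unsigned char>(pick(rng));  // arbitrary filler base
            bool extends = !runs.empty() && runs.back().na4 == na4 &&
                           runs.back().start + runs.back().length == i;
            if (extends) {
                ++runs.back().length;                       // same code, adjacent position
            } else {
                runs.push_back(AmbigRunSketch{i, 1, na4});  // start a new run
            }
        }
        na2.push_back(code);
    }
    return na2;
}
```

Keeping the exact ambiguity codes in a separate structure is what lets the ncbi2na stream stay a fixed 2 bits per base; a reader that needs the true sequence overlays the recorded runs on top of the filler values.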

◆ WriteDB_Ncbi4naToBinary() [2/2]

void WriteDB_Ncbi4naToBinary (const CSeq_inst &seqinst, string &seq, string &amb)

Build blast db nucleotide format from Ncbi4na Seq-inst.

The data is compressed to ncbi2na, the length remainder is coded into the last byte, and ambiguous region data is produced.

Parameters
seqinst: Seq-inst containing data in Ncbi4na format. [in]
seq: Sequence in blast db disk format. [out]
amb: Ambiguities in blast db disk format. [out]

Definition at line 520 of file writedb_convert.cpp.

References base_length, CAliasBase< TPrim >::Get(), CSeq_inst_Base::GetLength(), CSeq_data_Base::GetNcbi4na(), CSeq_inst_Base::GetSeq_data(), and WriteDB_Ncbi4naToBinary().

Referenced by CWriteDB_Impl::x_CookSequence().

◆ WriteDB_StdaaToBinary()

void WriteDB_StdaaToBinary (const CSeq_inst &si, string &seq)

Build blast db protein format from Stdaa protein Seq-inst.

No conversion is needed, because NcbiStdaa is already the on-disk format; the sequence data is simply copied from the Seq-inst into the string.

Parameters
si: Seq-inst containing data in NcbiStdaa format. [in]
seq: Sequence in blast db disk format. [out]

Definition at line 530 of file writedb_convert.cpp.

References _ASSERT, and si.

Referenced by CWriteDB_Impl::x_CookSequence().

Modified on Thu Feb 22 17:13:29 2024 by modify_doxy.py rev. 669887