NCBI C++ ToolKit
#include <objtools/blast/seqdb_writer/writedb.hpp>
Public Types
enum | ESeqType { eProtein = 0 , eNucleotide = 1 } |
Sequence types.
enum | EIndexType { eNoIndex = 0 , eSparseIndex = 0x1 , eFullIndex = 0x2 , eAddTrace = 0x4 , eFullWithTrace = eFullIndex | eAddTrace , eDefault = eFullIndex | eAddTrace , eAddHash = 0x100 } |
Whether and what kind of indices to build.
typedef int | TIndexType |
Bitwise OR of "EIndexType".
Public Types inherited from CObject
enum | EAllocFillMode { eAllocFillNone = 1 , eAllocFillZero , eAllocFillPattern } |
Control filling of newly allocated memory.
typedef CObjectCounterLocker | TLockerType |
Default locker type for CRef.
typedef atomic< Uint8 > | TCounter |
Counter type is CAtomicCounter.
typedef Uint8 | TCount |
Alias for value type of counter.
Public Member Functions
CWriteDB (const string &dbname, ESeqType seqtype, const string &title, int itype=eDefault, bool parse_ids=true, bool long_ids=false, bool use_gi_mask=false, EBlastDbVersion dbver=eBDB_Version4, bool limit_defline=false, Uint8 oid_masks=EOidMaskType::fNone, bool scan_bioseq_4_cfastareader_usrobj=false) |
Constructor.
~CWriteDB () |
Destructor.
void | AddSequence (const CBioseq &bs) |
Add a sequence as a CBioseq.
void | AddSequence (const CBioseq &bs, CSeqVector &sv) |
Add a sequence as a CBioseq.
void | AddSequence (const CBioseq_Handle &bsh) |
Add a sequence as a CBioseq.
void | AddSequence (const CTempString &sequence, const CTempString &ambiguities="") |
Add a sequence as raw data.
void | SetPig (int pig) |
Set the PIG to be used for the sequence.
void | SetDeflines (const CBlast_def_line_set &deflines) |
Set the deflines to be used for the sequence.
int | RegisterMaskAlgorithm (EBlast_filter_program program, const string &options=string(), const string &name=string()) |
Register a type of filtering data found in this database.
int | RegisterMaskAlgorithm (const string &id, const string &description=string(), const string &options=string()) |
Register a type of filtering data found in this database.
void | SetMaskData (const CMaskedRangesVector &ranges, const vector< TGi > &gis) |
Set filtering data for a sequence.
void | ListVolumes (vector< string > &vols) |
List Volumes.
void | ListFiles (vector< string > &files) |
List Filenames.
void | Close () |
Close the Database.
void | SetMaxFileSize (Uint8 sz) |
Set maximum size for output files.
void | SetMaxVolumeLetters (Uint8 letters) |
Set maximum letters for output volumes.
void | SetMaskedLetters (const string &masked) |
Set letters that should not be used in sequences.
int | FindColumn (const string &title) const |
Find an existing column.
int | CreateUserColumn (const string &title) |
Set up a user-defined CWriteDB column.
void | AddColumnMetaData (int col_id, const string &key, const string &value) |
Add meta data to a user-defined column.
CBlastDbBlob & | SetBlobData (int column_id) |
Add blob data to a user-defined column.
Public Member Functions inherited from CObject
CObject (void) |
Constructor.
CObject (const CObject &src) |
Copy constructor.
virtual | ~CObject (void) |
Destructor.
CObject & | operator= (const CObject &src) THROWS_NONE |
Assignment operator.
bool | CanBeDeleted (void) const THROWS_NONE |
Check if object can be deleted.
bool | IsAllocatedInPool (void) const THROWS_NONE |
Check if object is allocated in memory pool (not system heap).
bool | Referenced (void) const THROWS_NONE |
Check if object is referenced.
bool | ReferencedOnlyOnce (void) const THROWS_NONE |
Check if object is referenced only once.
void | AddReference (void) const |
Add reference to object.
void | RemoveReference (void) const |
Remove reference to object.
void | ReleaseReference (void) const |
Remove reference without deleting object.
virtual void | DoNotDeleteThisObject (void) |
Mark this object as not allocated in heap – do not delete this object.
virtual void | DoDeleteThisObject (void) |
Mark this object as allocated in heap – object can be deleted.
void * | operator new (size_t size) |
Define new operator for memory allocation.
void * | operator new[] (size_t size) |
Define new[] operator for 'array' memory allocation.
void | operator delete (void *ptr) |
Define delete operator for memory deallocation.
void | operator delete[] (void *ptr) |
Define delete[] operator for memory deallocation.
void * | operator new (size_t size, void *place) |
Define new operator.
void | operator delete (void *ptr, void *place) |
Define delete operator.
void * | operator new (size_t size, CObjectMemoryPool *place) |
Define new operator using memory pool.
void | operator delete (void *ptr, CObjectMemoryPool *place) |
Define delete operator.
virtual void | DebugDump (CDebugDumpContext ddc, unsigned int depth) const |
Define method for dumping debug information.
Public Member Functions inherited from CDebugDumpable
CDebugDumpable (void) | |
virtual | ~CDebugDumpable (void) |
void | DebugDumpText (ostream &out, const string &bundle, unsigned int depth) const |
void | DebugDumpFormat (CDebugDumpFormatter &ddf, const string &bundle, unsigned int depth) const |
void | DumpToConsole (void) const |
Static Public Member Functions
static CRef< CBlast_def_line_set > | ExtractBioseqDeflines (const CBioseq &bs, bool parse_ids=true, bool long_ids=false, bool scan_bioseq_4_cfastareader_usrobj=false) |
Extract Deflines From Bioseq.
Static Public Member Functions inherited from CObject
static NCBI_XNCBI_EXPORT void | ThrowNullPointerException (void) |
Define method to throw null pointer exception.
static NCBI_XNCBI_EXPORT void | ThrowNullPointerException (const type_info &type) |
static EAllocFillMode | GetAllocFillMode (void) |
static void | SetAllocFillMode (EAllocFillMode mode) |
static void | SetAllocFillMode (const string &value) |
Set mode from configuration parameter value.
Static Public Member Functions inherited from CDebugDumpable
static void | EnableDebugDump (bool on) |
Protected Attributes
CWriteDB_Impl * | m_Impl |
Implementation object.
Additional Inherited Members
Static Public Attributes inherited from CObject
static const TCount | eCounterBitsCanBeDeleted = 1 << 0 |
Define possible object states.
static const TCount | eCounterBitsInPlainHeap = 1 << 1 |
Heap signature was found.
static const TCount | eCounterBitsPlaceMask |
Mask for 'in heap' state flags.
static const int | eCounterStep = 1 << 2 |
Skip over the "in heap" bits.
static const TCount | eCounterValid = TCount(1) << (sizeof(TCount) * 8 - 2) |
Minimal value for valid objects (reference counter is zero). Must be a single bit value.
static const TCount | eCounterStateMask |
Valid object, and object in heap.
Protected Member Functions inherited from CObject
virtual void | DeleteThis (void) |
Virtual method "deleting" this object.
User interface class for BLAST databases.
This class provides the top-level interface for BLAST database users. It defines access to the database components by calling methods on objects that represent the various database files, such as the index, header, sequence, and alias files.
Definition at line 91 of file writedb.hpp.
typedef int CWriteDB::TIndexType |
Bitwise OR of "EIndexType".
Definition at line 128 of file writedb.hpp.
enum CWriteDB::EIndexType |
Whether and what kind of indices to build.
Definition at line 104 of file writedb.hpp.
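Because the index options are bit flags, a TIndexType value is built by OR-ing enumerators together. The sketch below is only an illustration: it copies the enumerator values from the listing above into a local stand-in enum instead of including writedb.hpp, and the names `EIndexTypeMirror`, `TIndexTypeMirror`, `HasFlag`, and `DefaultWithHash` are hypothetical.

```cpp
// Local mirror of CWriteDB::EIndexType; values copied from the listing above.
// Stand-in for illustration only; real code would use CWriteDB::EIndexType.
enum EIndexTypeMirror {
    eNoIndex       = 0,
    eSparseIndex   = 0x1,
    eFullIndex     = 0x2,
    eAddTrace      = 0x4,
    eFullWithTrace = eFullIndex | eAddTrace,
    eDefault       = eFullIndex | eAddTrace,
    eAddHash       = 0x100
};

typedef int TIndexTypeMirror;  // mirrors "typedef int TIndexType"

// Hypothetical helper: true if every bit of `flag` is set in `itype`.
// Note that eNoIndex (0) has no bits, so it trivially tests as "set".
inline bool HasFlag(TIndexTypeMirror itype, EIndexTypeMirror flag)
{
    return (itype & flag) == flag;
}

// The default index selection with the eAddHash bit added.
inline TIndexTypeMirror DefaultWithHash()
{
    return eDefault | eAddHash;
}
```

With these values, eDefault and eFullWithTrace are the same number (0x6), and `DefaultWithHash()` evaluates to 0x106. A value like this is what the constructor's `itype` parameter expects.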
enum CWriteDB::ESeqType |
Sequence types.
Enumerator | |
---|---|
eProtein | Protein database. |
eNucleotide | Nucleotide database. |
Definition at line 95 of file writedb.hpp.
CWriteDB::CWriteDB | ( | const string & | dbname, |
ESeqType | seqtype, | ||
const string & | title, | ||
int | itype = eDefault , |
||
bool | parse_ids = true , |
||
bool | long_ids = false , |
||
bool | use_gi_mask = false , |
||
EBlastDbVersion | dbver = eBDB_Version4 , |
||
bool | limit_defline = false , |
||
Uint8 | oid_masks = EOidMaskType::fNone , |
||
bool | scan_bioseq_4_cfastareader_usrobj = false |
||
) |
Constructor.
Starts construction of a blast database.
dbname | A list of database or alias names, separated by spaces. [in] |
seqtype | Specify eProtein or eNucleotide. [in] |
title | The database title. [in] |
itype | Indicates the type of indices to build if specified. [in] |
parse_ids | If true, generate ISAM files. [in] |
long_ids | If true, assume long sequence ids (database|accession) when parsing string ids. [in] |
use_gi_mask | If true, generate GI-based mask files. [in] |
dbver | Version of BLAST database to generate. [in] |
scan_bioseq_4_cfastareader_usrobj | If true, scan the Bioseq objects for a CFastaReader-created User-object containing a defline. [in] |
Definition at line 49 of file writedb.cpp.
CWriteDB::~CWriteDB | ( | ) |
Destructor.
This will return resources acquired by this object, and call Close() if it has not already been called.
Definition at line 74 of file writedb.cpp.
References m_Impl.
void CWriteDB::AddColumnMetaData | ( | int | col_id, |
const string & | key, | ||
const string & | value | ||
) |
Add meta data to a user-defined column.
In addition to normal blob data, database columns can store a "dictionary" of user-defined metadata in key/value form. This method adds one such key/value pair to the column. Specifying a key a second time replaces the previous value. Using this mechanism to store large amounts of data may have a negative impact on performance.
col_id | Specifies the column to add this metadata to. |
key | A unique key string. |
value | A value string. |
Definition at line 185 of file writedb.cpp.
References CWriteDB_Impl::AddColumnMetaData(), ncbi::grid::netcache::search::fields::key, m_Impl, and rapidjson::value.
Referenced by CBuildDatabase::AddSequences().
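The replacement rule for repeated keys can be pictured with a small stand-in dictionary. This is a sketch of the documented semantics only, not the toolkit's implementation; the class name `ColumnMetaDataModel` and its methods are hypothetical.

```cpp
#include <map>
#include <string>

// Stand-in for one column's metadata dictionary (illustration only).
class ColumnMetaDataModel {
public:
    // Mirrors the documented rule: adding an existing key replaces its value.
    void Add(const std::string& key, const std::string& value)
    {
        m_Meta[key] = value;
    }

    // Read-only view of the stored key/value pairs.
    const std::map<std::string, std::string>& Get() const
    {
        return m_Meta;
    }

private:
    std::map<std::string, std::string> m_Meta;
};
```

Adding the key "source" twice leaves a single entry holding the second value, which is the behavior to expect when AddColumnMetaData() is called twice with the same key.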
void CWriteDB::AddSequence | ( | const CBioseq & | bs | ) |
Add a sequence as a CBioseq.
This adds the sequence data in the specified CBioseq to the database. If the CBioseq contains deflines, they will also be used unless there is a call to SetDeflines() or AddDefline(). Note that the CBioseq will be held by CWriteDB at least until the next sequence is provided. If this method is used, the CBioseq is expected to contain sequence data accessible via GetInst().GetSeq_data(). If this might not be true, it may be better to use the version of this function that also takes a CSeqVector.
bs | The sequence and related data as a CBioseq. [in] |
Definition at line 79 of file writedb.cpp.
References CWriteDB_Impl::AddSequence(), and m_Impl.
Referenced by CBuildDatabase::AddSequences(), BOOST_AUTO_TEST_CASE(), s_DupIdsBioseq(), s_DupIdsRaw(), CBuildDatabase::x_DupLocal(), CBuildDatabase::x_EditAndAddBioseq(), and CMakeProfileDBApp::x_MakeVol().
void CWriteDB::AddSequence | ( | const CBioseq & | bs, |
CSeqVector & | sv | ||
) |
Add a sequence as a CBioseq.
This adds the sequence data in the specified CSeqVector, and the meta data in the specified CBioseq, to the database. If the CBioseq contains deflines, they will also be used unless there is a call to SetDeflines() or AddDefline(). Note that the CBioseq will be held by CWriteDB at least until the next sequence is provided. This version will use the CSeqVector if the sequence data is not found in the CBioseq.
bs | A CBioseq containing meta data for the sequence. [in] |
sv | The sequence data for the sequence. [in] |
Definition at line 89 of file writedb.cpp.
References CWriteDB_Impl::AddSequence(), and m_Impl.
void CWriteDB::AddSequence | ( | const CBioseq_Handle & | bsh | ) |
Add a sequence as a CBioseq.
This adds the sequence found in the given CBioseq_Handle to the database.
bsh | The sequence and related data as a CBioseq_Handle. [in] |
Definition at line 84 of file writedb.cpp.
References CWriteDB_Impl::AddSequence(), and m_Impl.
void CWriteDB::AddSequence | ( | const CTempString & | sequence, |
const CTempString & | ambiguities = "" |
||
) |
Add a sequence as raw data.
This adds a sequence provided as raw sequence data. The raw data must be (and is assumed to be) encoded correctly for the format of database being produced. For protein databases, the ambiguities string should be empty (and is thus optional). If this version of AddSequence() is used, the user must also provide one or more deflines with SetDeflines() or AddDefline() calls.
sequence | The sequence data as a string of bytes. [in] |
ambiguities | The ambiguity data as a string of bytes. [in] |
Definition at line 109 of file writedb.cpp.
References a, CWriteDB_Impl::AddSequence(), ambig(), CTempString::data(), CTempString::length(), and m_Impl.
void CWriteDB::Close | ( | void | ) |
Close the Database.
Flush all data to disk and close any open files.
Definition at line 104 of file writedb.cpp.
References CWriteDB_Impl::Close(), and m_Impl.
Referenced by BOOST_AUTO_TEST_CASE(), CBuildDatabase::EndBuild(), s_DupSequencesTest(), and CMakeProfileDBApp::x_MakeVol().
int CWriteDB::CreateUserColumn | ( | const string & | title | ) |
Set up a user-defined CWriteDB column.
This method creates a user-defined column associated with this database. The column is indexed by OID and contains arbitrary binary data, which is supplied using the SetBlobData() method. The title parameter identifies the column and must be unique within this database. Because tables are accessed by title, it is not necessary to permanently associate file extensions with specific purposes or data types. The return value of this method is an integer that identifies this column for the purpose of inserting blob data. (The number of columns allowed is currently limited due to the file naming scheme, and some columns are used for built-in purposes.)
title | Name identifying this column. |
Definition at line 180 of file writedb.cpp.
References CWriteDB_Impl::CreateColumn(), and m_Impl.
Referenced by CBuildDatabase::AddSequences().
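static CRef< CBlast_def_line_set > CWriteDB::ExtractBioseqDeflines | ( | const CBioseq & | bs, |
bool | parse_ids = true , |
||
bool | long_ids = false , |
||
bool | scan_bioseq_4_cfastareader_usrobj = false |
||
) |
static |
Extract Deflines From Bioseq.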
Deflines are extracted from the CBioseq and returned to the user. The caller can then modify or inspect the deflines, and apply them to a sequence with SetDeflines().
bs | The bioseq from which to extract a defline set. [in] |
parse_ids | If true, parse sequence ids. [in] |
long_ids | If true, use long sequence ids (database|accession). [in] |
scan_bioseq_4_cfastareader_usrobj | [in] If true, scan the Bioseq objects for a CFastaReader-created User-object containing a defline |
Definition at line 129 of file writedb.cpp.
References CWriteDB_Impl::ExtractBioseqDeflines().
Referenced by BOOST_AUTO_TEST_CASE(), CBuildDatabase::x_EditAndAddBioseq(), and CMakeProfileDBApp::x_MakeVol().
int CWriteDB::FindColumn | ( | const string & | title | ) | const |
Find an existing column.
This looks for an existing column with the specified title and returns the column ID if found.
title | The column title to look for. |
Definition at line 175 of file writedb.cpp.
References CWriteDB_Impl::FindColumn(), and m_Impl.
Referenced by CBuildDatabase::AddSequences().
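Since column titles must be unique within a database, a common pattern is to look a title up first and create the column only when it is missing. The sketch below is written as a template so it compiles without the toolkit; it relies only on the FindColumn() and CreateUserColumn() signatures listed on this page. The helper `FindOrCreateColumn` and the `FakeWriter` stand-in are hypothetical. One assumption is made loudly: this page does not say what FindColumn() returns when the title is absent, so the code assumes a negative value and should be checked against writedb.hpp before use.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Works with any writer exposing FindColumn() and CreateUserColumn() as
// listed on this page (CWriteDB is the intended type).
// ASSUMPTION: FindColumn() signals "not found" with a negative value.
template <class TWriter>
int FindOrCreateColumn(TWriter& db, const std::string& title)
{
    int col_id = db.FindColumn(title);
    if (col_id < 0) {
        col_id = db.CreateUserColumn(title);
    }
    return col_id;
}

// Minimal stand-in writer, used only to exercise the helper without the toolkit.
struct FakeWriter {
    std::vector<std::string> titles;

    int FindColumn(const std::string& title) const
    {
        for (std::size_t i = 0; i < titles.size(); ++i) {
            if (titles[i] == title) {
                return static_cast<int>(i);
            }
        }
        return -1;  // matches the assumption stated above
    }

    int CreateUserColumn(const std::string& title)
    {
        titles.push_back(title);
        return static_cast<int>(titles.size()) - 1;
    }
};
```

Calling `FindOrCreateColumn` twice with the same title returns the same identifier and creates the column only once.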
void CWriteDB::ListFiles | ( | vector< string > & | files | ) |
List Filenames.
Returns a list of the files constructed by this class; the returned list may not be complete until Close() has been called.
files | The set of resolved database path names. [out] |
Definition at line 146 of file writedb.cpp.
References CWriteDB_Impl::ListFiles(), and m_Impl.
Referenced by BOOST_AUTO_TEST_CASE(), s_DupSequencesTest(), s_WrapUpDb(), and CBuildDatabase::x_EndBuild().
void CWriteDB::ListVolumes | ( | vector< string > & | vols | ) |
List Volumes.
Returns the base names of all volumes constructed by this class; the returned list may not be complete until Close() has been called.
vols | The set of volumes produced by this class. [out] |
Definition at line 141 of file writedb.cpp.
References CWriteDB_Impl::ListVolumes(), and m_Impl.
Referenced by BOOST_AUTO_TEST_CASE(), and CBuildDatabase::x_EndBuild().
int CWriteDB::RegisterMaskAlgorithm | ( | const string & | id, |
const string & | description = string() , |
||
const string & | options = string() |
||
) |
Register a type of filtering data found in this database.
id | A string to identify the masking data. [in] |
description | Details about the masking data. [in] |
options | Algorithm options provided to the program. [in] |
int CWriteDB::RegisterMaskAlgorithm | ( | EBlast_filter_program | program, |
const string & | options = string() , |
||
const string & | name = string() |
||
) |
Register a type of filtering data found in this database.
program | Program used to produce this masking data. [in] |
options | Algorithm options provided to the program. [in] |
name | Name of the GI-based mask. [in] |
Referenced by CBuildDatabase::RegisterMaskingAlgorithm().
CBlastDbBlob & CWriteDB::SetBlobData | ( | int | column_id | ) |
Add blob data to a user-defined column.
To add data to a user-defined blob column, call this method, providing the column handle. A blob object will be returned; the user data should be stored in this object. The data can be stored any time up to the next call to an AddSequence() method (just as with any other per-sequence data), but accessing the returned object after that point is incorrect and has undefined consequences.
column_id | Identifier for a user-defined column. |
Definition at line 190 of file writedb.cpp.
References m_Impl, and CWriteDB_Impl::SetBlobData().
Referenced by CBuildDatabase::AddSequences().
void CWriteDB::SetDeflines | ( | const CBlast_def_line_set & | deflines | ) |
Set the deflines to be used for the sequence.
This method sets all the deflines at once as a complete set, overriding any deflines provided by AddSequence(). If this method is used with the CBioseq version of AddSequence, it replaces the deflines found in the CBioseq.
deflines | Deflines to use for this sequence. [in] |
Definition at line 94 of file writedb.cpp.
References m_Impl, and CWriteDB_Impl::SetDeflines().
Referenced by CBuildDatabase::AddSequences(), BOOST_AUTO_TEST_CASE(), s_DupIdsBioseq(), s_DupIdsRaw(), CBuildDatabase::x_DupLocal(), CBuildDatabase::x_EditAndAddBioseq(), and CMakeProfileDBApp::x_MakeVol().
void CWriteDB::SetMaskData | ( | const CMaskedRangesVector & | ranges, |
const vector< TGi > & | gis | ||
) |
Set filtering data for a sequence.
This method specifies filtered regions for this sequence. A sequence may have filtering data from one or more algorithms. For each algorithm_id value specified in ranges, a description should be added to the database using RegisterMaskAlgorithm(). This must be done before the first call to SetMaskData() that uses the algorithm id for a non-empty offset range list.
ranges | Filtered ranges for this sequence and algorithm. |
gis | GIs associated with this sequence. |
Definition at line 169 of file writedb.cpp.
References m_Impl, and CWriteDB_Impl::SetMaskData().
Referenced by CBuildDatabase::AddSequences(), and CBuildDatabase::x_AddMasksForSeqId().
void CWriteDB::SetMaskedLetters | ( | const string & | masked | ) |
Set letters that should not be used in sequences.
This method specifies letters that should not be used in the resulting database. The masked letters are expected to be specified in an IUPAC (alphabetic) encoding, and will be replaced by 'X' (for protein) when the sequences are packed. This method should be called before any sequences are added. This method only works with protein (the motivating case cannot happen with nucleotide).
masked | Letters to exclude. [in] |
Definition at line 136 of file writedb.cpp.
References m_Impl, and CWriteDB_Impl::SetMaskedLetters().
Referenced by CBuildDatabase::SetMaskLetters().
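The effect on a protein sequence can be modeled in a few lines. This is a sketch of the documented substitution only: the toolkit applies it while packing sequences, and whether matching is case-sensitive there is not stated on this page, so the model below simply compares characters exactly. The function name `ApplyMaskedLetters` is hypothetical.

```cpp
#include <string>

// Model of the documented behavior for protein data: every letter listed in
// `masked` is replaced by 'X'. Exact character comparison is an assumption.
inline std::string ApplyMaskedLetters(std::string iupac_protein,
                                      const std::string& masked)
{
    for (char& c : iupac_protein) {
        if (masked.find(c) != std::string::npos) {
            c = 'X';
        }
    }
    return iupac_protein;
}
```

For example, masking "JO" turns "MJKOL" into "MXKXL", while an empty mask leaves the sequence unchanged.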
void CWriteDB::SetMaxFileSize | ( | Uint8 | sz | ) |
Set maximum size for output files.
The provided size is applied as a limit on the size of output files. If adding a sequence would cause any output file to exceed this size, the volume is closed and a new volume is started (unless the current volume is empty, in which case the size limit is ignored and a one-sequence volume is created). The default value is 2^30-1. There is also a hard limit required by the database format.
sz | Maximum size in bytes of any volume component file. [in] |
Definition at line 118 of file writedb.cpp.
References m_Impl, and CWriteDB_Impl::SetMaxFileSize().
Referenced by CBuildDatabase::CBuildDatabase(), CBuildDatabase::SetMaxFileSize(), and CMakeProfileDBApp::x_InitOutputDb().
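The rollover rule described for SetMaxFileSize() can be sketched for a single output file. This is a simplified model under stated assumptions: a real volume has several component files and a format-imposed hard limit, neither of which is modeled here, and the names `PackIntoVolumes` and `kDefaultMaxFileSize` are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Default limit quoted in the description: 2^30 - 1 bytes.
const std::uint64_t kDefaultMaxFileSize = (std::uint64_t(1) << 30) - 1;

// Returns the number of bytes placed in each volume. A new volume is started
// when adding a sequence would exceed the limit, unless the current volume is
// empty, in which case the sequence is accepted as a one-sequence volume.
inline std::vector<std::uint64_t>
PackIntoVolumes(const std::vector<std::uint64_t>& seq_bytes,
                std::uint64_t max_file_size)
{
    std::vector<std::uint64_t> volumes;
    std::uint64_t current = 0;
    bool has_data = false;

    for (std::uint64_t sz : seq_bytes) {
        if (has_data && current + sz > max_file_size) {
            volumes.push_back(current);  // close the current volume
            current = 0;
            has_data = false;
        }
        current += sz;
        has_data = true;
    }
    if (has_data) {
        volumes.push_back(current);
    }
    return volumes;
}
```

With a limit of 100 and sequence sizes 40, 40, 40, 150, 10, the model produces volumes of 80, 40, 150, and 10 bytes: the 150-byte sequence exceeds the limit but still gets a volume of its own because that volume was empty when it arrived.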
void CWriteDB::SetMaxVolumeLetters | ( | Uint8 | letters | ) |
Set maximum letters for output volumes.
The provided size is applied as a limit on the size of output volumes. If adding a sequence would cause a volume to exceed this many protein or nucleotide letters (*not* bytes), the volume is closed and a new volume is started (unless the volume is currently empty). There is no default, but there is a hard limit required by the format definition. Ambiguity encoding is not counted toward this limit.
letters | Maximum letters to pack in one volume. [in] |
Definition at line 123 of file writedb.cpp.
References m_Impl, and CWriteDB_Impl::SetMaxVolumeLetters().
Referenced by BOOST_AUTO_TEST_CASE().
void CWriteDB::SetPig | ( | int | pig | ) |
Set the PIG to be used for the sequence.
For proteins, this sets the PIG of the protein sequence.
pig | PIG identifier as an integer. [in] |
Definition at line 99 of file writedb.cpp.
References m_Impl, and CWriteDB_Impl::SetPig().
Referenced by BOOST_AUTO_TEST_CASE(), and CBuildDatabase::x_AddPig().
CWriteDB_Impl* CWriteDB::m_Impl |
protected |
Implementation object.
Definition at line 447 of file writedb.hpp.
Referenced by AddColumnMetaData(), AddSequence(), Close(), CreateUserColumn(), CWriteDB(), FindColumn(), ListFiles(), ListVolumes(), SetBlobData(), SetDeflines(), SetMaskData(), SetMaskedLetters(), SetMaxFileSize(), SetMaxVolumeLetters(), SetPig(), and ~CWriteDB().