NCBI C++ ToolKit
This file defines several SeqDB utility functions related to byte order and file system portability.
#include <objtools/blast/seqdb_reader/seqdbcommon.hpp>
#include <corelib/ncbi_bswap.hpp>
#include <map>
Classes
class | CSeqDB_Substring |
String slicing.
class | CSeqDB_BaseName |
CSeqDB_BaseName.
class | CSeqDB_FileName |
CSeqDB_FileName.
class | CSeqDB_DirName |
CSeqDB_DirName.
class | CSeqDB_BasePath |
CSeqDB_BasePath.
class | CSeqDB_Path |
CSeqDB_Path.
struct | SSeqDBSlice |
OID-Range type to simplify interfaces.
class | CSeqDBIntCache< TValue > |
Simple int-keyed cache.
Macros
#define | IS_POWER_OF_TWO(x) (((x) & ((x)-1)) == 0) |
Discretely tests whether an integer is a power of two.
#define | ALIGNED_TO_POW2(x, y) (! ((x) & (0-y))) |
Checks if a number is congruent to zero, modulo a power of 2.
#define | PTR_ALIGNED_TO_SELF_SIZE(x) (IS_POWER_OF_TWO(sizeof(*x)) && ALIGNED_TO_POW2((size_t)(x), sizeof(*x))) |
Is the provided pointer aligned to the size (which must be a power of two) of the type to which it points?
#define | SEQDB_ISEOL(x) (((x) == '\n') || ((x) == '\r')) |
Macro for EOL chars.
#define | SEQDB_FILE_ASSERT(YESNO) |
#define | FENCE_SENTRY 201 |
Fence Sentry value, which is placed at either end of ranges of data that are included in partially fetched sequences; this only applies to CSeqDBExpert objects, where SetOffsetRanges() has been called.
Functions
template<typename T >
T | SeqDB_GetStdOrdUnaligned (const T *stdord_obj) |
Reads a network order integer and returns a value.
template<typename T >
T | SeqDB_GetBrokenUnaligned (const T *stdord_obj) |
Read an unaligned integer into memory.
template<typename T >
T | SeqDB_GetStdOrd (const T *stdord_obj) |
Read a network order integer value.
template<typename T >
T | SeqDB_GetBroken (const T *stdord_obj) |
Read a non-network-order integer value.
void | s_SeqDB_QuickAssign (string &dst, const char *bp, const char *ep) |
Higher Performance String Assignment.
void | s_SeqDB_QuickAssign (string &dst, const string &src) |
Higher Performance String Assignment.
bool | SeqDB_SplitString (CSeqDB_Substring &buffer, CSeqDB_Substring &front, char delim) |
Parse a prefix from a substring.
void | SeqDB_CombinePath (const CSeqDB_Substring &path, const CSeqDB_Substring &file, const CSeqDB_Substring *extn, string &outp) |
Combine a filesystem path and file name.
CSeqDB_Substring | SeqDB_RemoveFileName (CSeqDB_Substring s) |
Returns a path minus filename.
CSeqDB_Substring | SeqDB_RemoveDirName (CSeqDB_Substring s) |
Returns a filename minus greedy path.
CSeqDB_Substring | SeqDB_RemoveExtn (CSeqDB_Substring s) |
Returns a filename minus extension.
void | SeqDB_ConvertOSPath (string &dbs) |
Change path delimiters to platform preferred kind in-place.
string | SeqDB_FindBlastDBPath (const string &file_name, char dbtype, string *sp, bool exact, CSeqDBAtlas &atlas) |
Finds a file in the search path.
void | SeqDB_JoinDelim (string &a, const string &b, const string &delim) |
Join two strings with a delimiter.
void | SeqDB_ThrowException (CSeqDBException::EErrCode code, const string &msg) |
Throw a SeqDB exception; this is separated into a function primarily to allow a breakpoint to be set.
void | SeqDB_FileIntegrityAssert (const string &file, int line, const string &text) |
Report file corruption by throwing an eFile CSeqDBException.
void | SeqDB_CombineAndQuote (const vector< string > &dbs, string &dbname) |
Combine and quote list of database names.
void | SeqDB_SplitQuoted (const string &dbname, vector< CSeqDB_Substring > &dbs, bool keep_quote=false) |
Split a combined and quoted list of database names.
template<class T , class U >
const U & | SeqDB_MapFind (const std::map< T, U > &m, const T &k, const U &dflt) |
Find a map value or return a default.
template<class T , class U >
int | SeqDB_VectorAssign (const T &data, vector< U > &v) |
Copy into a vector efficiently.
This file defines several SeqDB utility functions related to byte order and file system portability.
Implemented for: UNIX, MS-Windows
Definition in file seqdbgeneral.hpp.
#define ALIGNED_TO_POW2 | ( | x, | |
y | |||
) | (! ((x) & (0-y))) |
Checks if a number is congruent to zero, modulo a power of 2.
Definition at line 130 of file seqdbgeneral.hpp.
#define FENCE_SENTRY 201 |
Fence Sentry value, which is placed at either end of ranges of data that are included in partially fetched sequences; this only applies to CSeqDBExpert objects, where SetOffsetRanges() has been called.
Definition at line 1229 of file seqdbgeneral.hpp.
#define IS_POWER_OF_TWO | ( | x | ) | (((x) & ((x)-1)) == 0) |
Discretely tests whether an integer is a power of two.
Definition at line 127 of file seqdbgeneral.hpp.
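The test relies on a power of two having exactly one set bit, so that subtracting one clears that bit and sets all lower ones. The sketch below copies the macro text from the definition above and exercises it. As written, the macro also reports zero as a power of two, so a hypothetical wrapper (`IsNonZeroPowerOfTwo`, not part of SeqDB) is shown for callers that need to exclude zero.

```cpp
#include <cstddef>

// Macro text copied from the definition above.
#define IS_POWER_OF_TWO(x) (((x) & ((x)-1)) == 0)

// Hypothetical helper (not part of SeqDB) that additionally rejects zero,
// which the bare macro accepts because 0 & (0 - 1) == 0.
constexpr bool IsNonZeroPowerOfTwo(std::size_t x)
{
    return x != 0 && IS_POWER_OF_TWO(x);
}
```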
#define PTR_ALIGNED_TO_SELF_SIZE | ( | x | ) | (IS_POWER_OF_TWO(sizeof(*x)) && ALIGNED_TO_POW2((size_t)(x), sizeof(*x))) |
Is the provided pointer aligned to the size (which must be a power of two) of the type to which it points?
Definition at line 134 of file seqdbgeneral.hpp.
#define SEQDB_FILE_ASSERT | ( | YESNO | ) |
Definition at line 1123 of file seqdbgeneral.hpp.
#define SEQDB_ISEOL | ( | x | ) | (((x) == '\n') || ((x) == '\r')) |
Macro for EOL chars.
Definition at line 192 of file seqdbgeneral.hpp.
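void s_SeqDB_QuickAssign | ( | string & | dst, |
const char * | bp, | ||
const char * | ep | ||
) |
Higher Performance String Assignment.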
Gcc's default assignment and modifier methods (insert, operator = and operator += for instance) for strings do not use the capacity doubling technique (i.e. as used by vector::push_back()) until the length is about the size of a disk block. For our purposes, they often should use doubling. The following assignment function provides the doubling functionality for assignment. I use the assign(char*,char*) overload because it does not discard excess capacity.
dst | Destination of assigned data. |
bp | Start of memory containing new value. |
ep | End of memory containing new value. |
Definition at line 210 of file seqdbgeneral.hpp.
Referenced by CSeqDB_BasePath::Assign(), CSeqDB_Path::Assign(), CSeqDB_Substring::GetStringQuick(), CSeqDB_BasePath::operator=(), CSeqDB_DirName::operator=(), CSeqDB_Path::operator=(), s_ConvertV4toV5(), s_SeqDB_QuickAssign(), s_SeqDB_ReadLine(), and SeqDB_JoinDelim().
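The doubling strategy described above can be sketched roughly as follows. This is a minimal illustration, not the SeqDB source: the name `QuickAssign` and the starting capacity of 16 are assumptions made for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of doubling-based assignment; the function name and
// the initial capacity of 16 are assumptions, not taken from SeqDB.
inline void QuickAssign(std::string& dst, const char* bp, const char* ep)
{
    const std::size_t length = static_cast<std::size_t>(ep - bp);

    if (dst.capacity() < length) {
        // Grow geometrically so that a series of slowly growing
        // assignments triggers only a logarithmic number of reallocations.
        std::size_t newcap = dst.capacity() ? dst.capacity() : 16;
        while (newcap < length) {
            newcap <<= 1;
        }
        dst.reserve(newcap);
    }

    // assign(first, last) can reuse the buffer when it is already large enough.
    dst.assign(bp, ep);
}
```

Reserving before assigning is what keeps the extra capacity available for the next, slightly longer value.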
void s_SeqDB_QuickAssign | ( | string & | dst, |
const string & | src | ||
) |
Higher Performance String Assignment.
String to string assignment, using the above function.
dst | Destination string. |
src | Input string. |
Definition at line 235 of file seqdbgeneral.hpp.
References s_SeqDB_QuickAssign().
void SeqDB_CombineAndQuote | ( | const vector< string > & | dbs, |
string & | dbname | ||
) |
Combine and quote list of database names.
dbs | Database names to combine. |
dbname | Combined database name. |
Definition at line 1717 of file seqdbcommon.cpp.
Referenced by CSeqDB::CSeqDB(), and CMakeBlastDBApp::x_ProcessInputData().
void SeqDB_CombinePath | ( | const CSeqDB_Substring & | path, |
const CSeqDB_Substring & | file, | ||
const CSeqDB_Substring * | extn, | ||
string & | outp | ||
) |
Combine a filesystem path and file name.
Combine a provided filesystem path and a file name. If either string is empty, the other is returned. Conceptually, the first path might be the current working directory and the second path is a filename. So, if the second path starts with "/", the first path is ignored. Care is taken to avoid duplicated delimiters: if the first path ends with the delimiter character, another delimiter will not be added between the strings. The delimiter used varies from operating system to operating system, and is adjusted accordingly. If a file extension is specified, it will also be appended.
path | The filesystem path to use |
file | The name of the file (may include path components) |
extn | The file extension (without the "."), or NULL if none. |
outp | A returned string containing the combined path and file name |
Definition at line 131 of file seqdbcommon.cpp.
References CSeqDB_Substring::Empty(), CSeqDB_Substring::GetBegin(), CSeqDB_Substring::GetEnd(), CDirEntry::GetPathSeparator(), CSeqDB_Substring::GetString(), isalpha(), and CSeqDB_Substring::Size().
Referenced by CSeqDB_BasePath::CSeqDB_BasePath(), CSeqDB_Path::CSeqDB_Path(), CSeqDBLMDBSet::CSeqDBLMDBSet(), CSeqDB_Path::ReplaceFilename(), s_SeqDB_TryPaths(), and CVDBAliasNode::x_ResolveVDBList().
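The combination rules above can be illustrated with a simplified sketch. Everything here is an assumption made for the example: the name `CombinePath`, the use of `std::string` in place of `CSeqDB_Substring`, the caller-supplied delimiter, and the choice not to append the extension when `file` is empty. Platform-specific cases such as drive letters are ignored.

```cpp
#include <string>

// Simplified, hypothetical sketch of the rules described above.
// Not the SeqDB implementation.
inline std::string CombinePath(const std::string& path,
                               const std::string& file,
                               const std::string* extn,
                               char delim = '/')
{
    if (file.empty()) {
        return path;                 // assumption: no extension is added here
    }

    std::string out;
    if (path.empty() || file.front() == delim) {
        out = file;                  // no directory given, or file is absolute
    } else {
        out = path;
        if (out.back() != delim) {
            out += delim;            // add exactly one delimiter
        }
        out += file;
    }

    if (extn != nullptr) {
        out += '.';
        out += *extn;
    }
    return out;
}
```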
void SeqDB_ConvertOSPath | ( | string & | dbs | ) |
Change path delimiters to platform preferred kind in-place.
The path is modified in place. The 'Convert' interface is more efficient for cases where the new path would be assigned to the same string object. Delimiter conversion should be called by SeqDB at least once on any path received from the user, or via filesystem sources such as alias files.
dbs | This string will be changed in-place. |
Definition at line 284 of file seqdbcommon.cpp.
References CDirEntry::GetPathSeparator(), and i.
Referenced by CSeqDB_BaseName::FixDelimiters(), CSeqDB_FileName::FixDelimiters(), CSeqDB_BasePath::FixDelimiters(), s_Tokenize(), and SeqDB_MakeOSPath().
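As a rough illustration of in-place delimiter conversion, the sketch below rewrites both "/" and "\" to a preferred separator. The function name and the explicit `preferred` parameter are assumptions for the example; the toolkit function takes only the string and obtains the separator from CDirEntry::GetPathSeparator().

```cpp
#include <algorithm>
#include <string>

// Hypothetical sketch: rewrite both common delimiters to `preferred`,
// modifying the string in place so no second string is allocated.
inline void ConvertOSPath(std::string& path, char preferred)
{
    std::replace_if(path.begin(), path.end(),
                    [](char c) { return c == '/' || c == '\\'; },
                    preferred);
}
```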
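void SeqDB_FileIntegrityAssert | ( | const string & | file, |
int | line, | ||
const string & | text | ||
) |
Report file corruption by throwing an eFile CSeqDBException.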
This function is only called in the case of validation failure, and is used in code paths where the validation failure may be related to file corruption or filesystem problems. File data is considered a user input, so checks for corrupt files are treated as input validation. This means that (1) checks that may fail because of file corruption are not disabled in release (non-debug) builds, and (2) an exception (rather than an abort) is used. Note that this function does not test the asserted condition itself, so it should only be called in case of failure.
file | Name of the file containing the assert. |
line | The line the assert is on. |
text | The text version of the asserted condition. |
Definition at line 2255 of file seqdbcommon.cpp.
References CSeqDBException::eFileErr, file, NStr::IntToString(), msg(), SeqDB_ThrowException(), and text().
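The division of labor between a checking macro and a failure-only reporting function might look like the sketch below. The names `CHECK_FILE` and `FileIntegrityFail` are hypothetical stand-ins for SEQDB_FILE_ASSERT and SeqDB_FileIntegrityAssert, the message format is invented, and `std::runtime_error` replaces CSeqDBException.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the reporting function: it never tests the
// condition, it only builds a message and throws.
inline void FileIntegrityFail(const std::string& file, int line,
                              const std::string& text)
{
    throw std::runtime_error("Validation failed: [" + text + "] at " +
                             file + ":" + std::to_string(line));
}

// The macro tests the condition, so the function above is reached only on
// failure. Unlike assert(), this check is not compiled out by NDEBUG.
#define CHECK_FILE(cond)                                     \
    do {                                                     \
        if (!(cond)) {                                       \
            FileIntegrityFail(__FILE__, __LINE__, #cond);    \
        }                                                    \
    } while (0)
```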
string SeqDB_FindBlastDBPath | ( | const string & | file_name, |
char | dbtype, | ||
string * | sp, | ||
bool | exact, | ||
CSeqDBAtlas & | atlas | ||
) |
Finds a file in the search path.
This function resolves the full name of a file. It searches for a file of the provided base name and returns the provided name with the full path attached. If the exact flag is set, the file is assumed to have any extension it may need, and none is added for searching or stripped from the return value. If exact is not set, the file is assumed to end in ".pin", ".nin", ".pal", or ".nal", and if such a file is found, that extension is stripped from the returned string. Furthermore, when exact is false, only file extensions relevant to the dbtype are considered. Thus, if dbtype is set to 'p' for protein, only ".pin" and ".pal" are checked for; if it is set to 'n' for nucleotide, only ".nin" and ".nal" are considered.
The places where the file may be found depend on the search path. The search path consists of the current working directory, the contents of the BLASTDB environment variable, and the BLASTDB member of the BLAST group of settings in the NCBI meta-registry. This registry is an interface to settings found in (for example) a ".ncbirc" file found in the user's home directory (but several paths are usually checked). Finally, if the provided file_name starts with the default path delimiter (which is OS dependent, but for example, "/" on Linux), the path will be taken to be absolute, and the search path will not affect the results.
file_name | File base name for which to search |
dbtype | Database type: 'p' for protein or 'n' for nucleotide |
sp | If non-null, the ":" delimited search path is returned here |
exact | If true, the file_name already includes any needed extension |
atlas | The memory management layer. |
Definition at line 416 of file seqdbcommon.cpp.
References dbname(), CSeqDBAtlas::GetSearchPath(), and s_SeqDB_FindBlastDBPath().
Referenced by CSeqDBAliasSets::x_FindBlastDBPath(), and CSeqDBAliasNode::x_ResolveNames().
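To make the extension rules concrete, the sketch below lists the file names that would be probed in one search-path directory. `CandidateNames` is a hypothetical helper, not a SeqDB function, and the order in which the alias and index extensions are tried is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: names to probe for a given base name.
// dbtype is 'p' (protein) or 'n' (nucleotide).
inline std::vector<std::string>
CandidateNames(const std::string& file_name, char dbtype, bool exact)
{
    if (exact) {
        return { file_name };            // the name is used exactly as given
    }
    const std::string t(1, dbtype);
    // Assumed order: alias file (.pal/.nal) before index file (.pin/.nin).
    return { file_name + "." + t + "al", file_name + "." + t + "in" };
}
```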
template<typename T >
T SeqDB_GetBroken | ( | const T * | stdord_obj | ) |
Read a non-network-order integer value.
stdord_obj | Location in memory of integer. |
Definition at line 179 of file seqdbgeneral.hpp.
References PTR_ALIGNED_TO_SELF_SIZE, and SeqDB_GetBrokenUnaligned().
Referenced by CSeqDBRawFile::ReadSwapped().
template<typename T >
T SeqDB_GetBrokenUnaligned | ( | const T * | stdord_obj | ) |
Read an unaligned integer into memory.
This template builds a function that reads an integer (on any platform) by reading one byte at a time and assembling the value. The word "Broken" refers to the fact that the integer in question is in the opposite of network byte order, and this function is called in those cases. (Currently, this only happens for the 8 byte volume size stored in the index file.)
stdord_obj | Location of non-network-order object. |
Definition at line 104 of file seqdbgeneral.hpp.
References T.
Referenced by SeqDB_GetBroken().
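Byte-at-a-time assembly of a value stored in the opposite of network order (that is, least significant byte first) can be sketched as below. `ReadLittleEndian` is a hypothetical name, and the sketch assumes unsigned integer types of at most 8 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: assemble a least-significant-byte-first value one
// byte at a time, so neither alignment nor host endianness matters.
template <typename T>
T ReadLittleEndian(const void* p)
{
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    std::uint64_t value = 0;
    // Start from the most significant byte, which is stored last.
    for (std::size_t i = sizeof(T); i-- > 0; ) {
        value = (value << 8) | bytes[i];
    }
    return static_cast<T>(value);
}
```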
template<typename T >
T SeqDB_GetStdOrd | ( | const T * | stdord_obj | ) |
Read a network order integer value.
stdord_obj | Location in memory of network order integer. |
Definition at line 170 of file seqdbgeneral.hpp.
References SeqDB_GetStdOrdUnaligned().
Referenced by CTaxDBFileInfo::CTaxDBFileInfo(), CSeqDBIdxFile::GetAmbStartEnd(), CSeqDB::GetDate(), CSeqDBIdxFile::GetHdrStartEnd(), CSeqDBTaxId::GetOffset(), CSeqDBGiIndex::GetSeqGI(), CSeqDBIdxFile::GetSeqStart(), CSeqDBIdxFile::GetSeqStartEnd(), CSeqDBTaxId::GetTaxId(), CSeqDBRawFile::ReadSwapped(), s_ConvertV4toV5(), SeqDB_ReadBinaryGiList(), SeqDB_ReadMemoryGiList(), SeqDB_ReadMemoryPigList(), SeqDB_ReadMemoryTaxIdList(), SeqDB_ReadMemoryTiList(), SeqDB_UnpackAmbiguities(), CSeqDBIsam::x_DiffSample(), CSeqDBVol::x_GetAmbChar(), CSeqDBIsam::x_GetIndexKeyOffset(), CSeqDBIsam::x_GetNumericData(), CSeqDBIsam::x_GetNumericKey(), CSeqDBIsam::x_InitSearch(), CSeqDBIsam::x_LoadIndex(), and CSeqDBIsam::x_LoadPage().
template<typename T >
T SeqDB_GetStdOrdUnaligned | ( | const T * | stdord_obj | ) |
Reads a network order integer and returns a value.
Integer types stored in platform-independent blast database files usually have network byte order. This template builds a function which reads such an integer and returns its value. It may or may not need to swap the integer, depending on the endianness of the platform. If the integer is not aligned to a multiple of the size of the data type, it will still be read byte-wise rather than word-wise. This is done to avoid bus errors on some platforms.
Definition at line 59 of file seqdbgeneral.hpp.
References _ASSERT, CByteSwap::GetInt2(), CByteSwap::GetInt4(), CByteSwap::GetInt8(), and T.
Referenced by SeqDB_GetStdOrd().
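One portable way to read a network order integer that may be unaligned is to copy its bytes and swap them only on little-endian hosts. The sketch below uses that approach with standard library calls; it is not the toolkit's implementation, which relies on CByteSwap, and the names `HostIsLittleEndian` and `ReadNetworkOrder` are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Detect host byte order at run time by inspecting the first byte of 1.
inline bool HostIsLittleEndian()
{
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical sketch: byte-wise copy avoids unaligned word loads, and the
// swap happens only when the host order differs from network order.
template <typename T>
T ReadNetworkOrder(const void* p)
{
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, p, sizeof(T));
    if (HostIsLittleEndian()) {
        std::reverse(bytes, bytes + sizeof(T));
    }
    T value;
    std::memcpy(&value, bytes, sizeof(T));
    return value;
}
```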
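void SeqDB_JoinDelim | ( | string & | a, |
const string & | b, | ||
const string & | delim | ||
) |
Join two strings with a delimiter.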
This function returns whichever of two provided strings is non-empty. If both are non-empty, they are joined with a delimiter placed between them. It is intended for use when combining strings, such as a space delimited list of database volumes. It is probably not suitable for joining file system paths with filenames (use something like SeqDB_CombinePath).
a | First component and returned path |
b | Second component |
delim | The delimiter to use when joining elements |
Definition at line 480 of file seqdbcommon.cpp.
References a, b, and s_SeqDB_QuickAssign().
Referenced by CSeqDB_TitleWalker::AddString().
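The joining rule can be sketched as follows; `JoinDelim` is a hypothetical `std::string` version written for illustration, with the result left in the first argument as in the parameter table above.

```cpp
#include <string>

// Hypothetical sketch of the joining rule; the result is left in `a`.
inline void JoinDelim(std::string& a, const std::string& b,
                      const std::string& delim)
{
    if (b.empty()) {
        return;             // nothing to add; `a` is already the answer
    }
    if (a.empty()) {
        a = b;              // only `b` is non-empty
        return;
    }
    // Both non-empty: reserve once, then append delimiter and `b`.
    a.reserve(a.size() + delim.size() + b.size());
    a += delim;
    a += b;
}
```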
const U& SeqDB_MapFind | ( | const std::map< T, U > & | m, |
const T & | k, | ||
const U & | dflt | ||
) |
Find a map value or return a default.
This is similar to operator[], except that it works for constant maps, and takes an arbitrary default value when the value is not found (for std::map, the default value is always TValue()).
m | The map from which to read values. |
k | The key for which to search. |
dflt | The value to return if the key was not found. |
Definition at line 1243 of file seqdbgeneral.hpp.
References map_checker< Container >::end().
Referenced by CSeqDB::GetColumnValue(), CSeqDB_ColumnReader::GetValue(), and CSeqDBImpl::x_GetColumnId().
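A lookup with a caller-supplied default can be written with `std::map::find`, which, unlike `operator[]`, is available on constant maps and never inserts. The sketch below uses the hypothetical name `MapFind`. Because it returns a reference, the default passed in must outlive the use of the result.

```cpp
#include <map>

// Hypothetical sketch of a const-safe lookup with an arbitrary default.
// Returns a reference to the stored value, or to `dflt` if the key is absent.
template <class K, class V>
const V& MapFind(const std::map<K, V>& m, const K& k, const V& dflt)
{
    typename std::map<K, V>::const_iterator it = m.find(k);
    return it == m.end() ? dflt : it->second;
}
```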
CSeqDB_Substring SeqDB_RemoveDirName | ( | CSeqDB_Substring | s | ) |
Returns a filename minus greedy path.
Substring version. This returns the part of a file name after the last path delimiter, or the whole path if no delimiter was found.
s | Input path |
Definition at line 50 of file seqdbcommon.cpp.
References CSeqDB_Substring::EraseFront(), CSeqDB_Substring::FindLastOf(), and CDirEntry::GetPathSeparator().
Referenced by CSeqDB_BasePath::FindBaseName(), CSeqDB_Path::FindBaseName(), CSeqDB_Path::FindFileName(), and CMakeProfileDBApp::x_CreateAliasFile().
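The behavior described for SeqDB_RemoveDirName corresponds to a search for the last delimiter. Below is a minimal `std::string` sketch under that reading; `RemoveDirName` is a hypothetical name, and the `delim` parameter stands in for CDirEntry::GetPathSeparator().

```cpp
#include <string>

// Hypothetical sketch: keep only what follows the last delimiter, or the
// whole input when no delimiter is present.
inline std::string RemoveDirName(const std::string& s, char delim = '/')
{
    const std::string::size_type pos = s.rfind(delim);
    return pos == std::string::npos ? s : s.substr(pos + 1);
}
```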
CSeqDB_Substring SeqDB_RemoveExtn | ( | CSeqDB_Substring | s | ) |
Returns a filename minus extension.
This returns the input with its file extension removed, or the whole input if no extension was found.
s | Input path |
Definition at line 76 of file seqdbcommon.cpp.
References CSeqDB_Substring::GetEnd(), CSeqDB_Substring::Resize(), and CSeqDB_Substring::Size().
Referenced by CSeqDB_Path::FindBaseName(), and CSeqDB_Path::FindBasePath().
CSeqDB_Substring SeqDB_RemoveFileName | ( | CSeqDB_Substring | s | ) |
Returns a path minus filename.
Substring version. This returns the part of a file path before the last path delimiter, or the whole path if no delimiter was found.
s | Input path |
Definition at line 62 of file seqdbcommon.cpp.
References CSeqDB_Substring::Clear(), CSeqDB_Substring::FindLastOf(), CDirEntry::GetPathSeparator(), and CSeqDB_Substring::Resize().
Referenced by CSeqDB_BasePath::FindDirName(), and CSeqDB_Path::FindDirName().
void SeqDB_SplitQuoted | ( | const string & | dbname, |
vector< CSeqDB_Substring > & | dbs, | ||
bool | keep_quote = false |
||
) |
Split a combined and quoted list of database names.
dbname | Combined database name. |
dbs | Database names extracted from dbname. |
Definition at line 1762 of file seqdbcommon.cpp.
Referenced by CAlignFormatUtil::GetBlastDbInfo(), CSeqDBAliasNode::GetMaskList(), s_Tokenize(), CMakeBlastDBApp::x_BuildDatabase(), CMakeBlastDBApp::x_ProcessInputData(), and CSeqDBAliasNode::x_Tokenize().
bool SeqDB_SplitString | ( | CSeqDB_Substring & | buffer, |
CSeqDB_Substring & | front, | ||
char | delim | ||
) |
Parse a prefix from a substring.
The `buffer' argument is searched for a character. If found, the region before the delimiter is returned in `front' and the region after the delimiter is returned in `buffer', and true is returned. If not found, neither argument changes and false is returned.
buffer | Source data to search and remainder if found. [in|out] |
front | Region before delim if found. [out] |
delim | Character for which to search. [in] |
Definition at line 113 of file seqdbcommon.cpp.
References buffer, ctll::front(), and i.
Referenced by CSeqDBTaxInfo::GetTaxNames().
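The prefix-splitting contract can be illustrated with `std::string` in place of `CSeqDB_Substring`. `SplitString` here is a hypothetical sketch, and it assumes the first occurrence of the delimiter is the one used.

```cpp
#include <string>

// Hypothetical std::string version of the prefix-splitting contract.
inline bool SplitString(std::string& buffer, std::string& front, char delim)
{
    const std::string::size_type pos = buffer.find(delim);
    if (pos == std::string::npos) {
        return false;                    // both arguments are left untouched
    }
    front.assign(buffer, 0, pos);        // text before the delimiter
    buffer.erase(0, pos + 1);            // keep only the text after it
    return true;
}
```

Calling it repeatedly on the same buffer peels off one field per call, which suits delimited records.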
void SeqDB_ThrowException | ( | CSeqDBException::EErrCode | code, |
const string & | msg | ||
) |
Throw a SeqDB exception; this is separated into a function primarily to allow a breakpoint to be set.
Definition at line 70 of file seqdbatlas.cpp.
References CSeqDBException::eArgErr, CSeqDBException::eFileErr, msg(), and NCBI_THROW.
Referenced by SeqDB_CheckLength(), and SeqDB_FileIntegrityAssert().
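template<class T , class U >
int SeqDB_VectorAssign | ( | const T & | data, |
vector< U > & | v | ||
) |
Copy into a vector efficiently.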
This copies data into a vector which may not be empty beforehand. It is more efficient than freeing the vector for cases like vector<string>, where the existing string buffers may be large enough to hold the new elements. The vector is NOT resized downward but the caller may do a resize() if needed. This design was chosen because for some types (such as vector<string>), more efficient code can be written if element destruction/construction is avoided. The number of elements assigned is returned.
data | Data source usable by ITERATE and *iter. |
v | Vector to copy the data into. |
Definition at line 1269 of file seqdbgeneral.hpp.
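The copy strategy described above (overwrite in place, append the remainder, never shrink) can be sketched as below. `VectorAssign` is a hypothetical name, and a range-based for loop is used in place of the toolkit's ITERATE macro.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: overwrite existing elements in place, append any
// extras, never shrink the vector, and return the number of elements copied.
template <class Container, class U>
int VectorAssign(const Container& data, std::vector<U>& v)
{
    std::size_t i = 0;
    for (const auto& item : data) {
        if (i < v.size()) {
            v[i] = item;        // reuse the existing element and its buffer
        } else {
            v.push_back(item);
        }
        ++i;
    }
    return static_cast<int>(i);
}
```

A caller that wants the vector to hold exactly the copied elements can follow the call with `v.resize(n)`, where `n` is the returned count.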