NCBI C++ ToolKit
seqdbgeneral.hpp File Reference

This file defines several SeqDB utility functions related to byte order and file system portability.

#include <objtools/blast/seqdb_reader/seqdbcommon.hpp>
#include <corelib/ncbi_bswap.hpp>
#include <map>

Classes

class  CSeqDB_Substring
 String slicing.
 
class  CSeqDB_BaseName
 CSeqDB_BaseName.
 
class  CSeqDB_FileName
 CSeqDB_FileName.
 
class  CSeqDB_DirName
 CSeqDB_DirName.
 
class  CSeqDB_BasePath
 CSeqDB_BasePath.
 
class  CSeqDB_Path
 CSeqDB_Path.
 
struct  SSeqDBSlice
 OID-Range type to simplify interfaces.
 
class  CSeqDBIntCache< TValue >
 Simple int-keyed cache.
 

Macros

#define IS_POWER_OF_TWO(x)   (((x) & ((x)-1)) == 0)
 Discretely tests whether an integer is a power of two.
 
#define ALIGNED_TO_POW2(x, y)   (! ((x) & (0-y)))
 Checks if a number is congruent to zero, modulo a power of 2.
 
#define PTR_ALIGNED_TO_SELF_SIZE(x)    (IS_POWER_OF_TWO(sizeof(*x)) && ALIGNED_TO_POW2((size_t)(x), sizeof(*x)))
 Is the provided pointer aligned to the size (which must be a power of two) of the type to which it points?
 
#define SEQDB_ISEOL(x)   (((x) == '\n') || ((x) == '\r'))
 Macro for EOL chars.
 
#define SEQDB_FILE_ASSERT(YESNO)
 
#define FENCE_SENTRY   201
 Fence Sentry value, which is placed at either end of ranges of data that are included in partially fetched sequences; this only applies to CSeqDBExpert objects, where SetOffsetRanges() has been called.
 

Functions

template<typename T >
T SeqDB_GetStdOrdUnaligned (const T *stdord_obj)
 Reads a network order integer and returns a value.
 
template<typename T >
T SeqDB_GetBrokenUnaligned (const T *stdord_obj)
 Read an unaligned integer into memory.
 
template<typename T >
T SeqDB_GetStdOrd (const T *stdord_obj)
 Read a network order integer value.
 
template<typename T >
T SeqDB_GetBroken (const T *stdord_obj)
 Read a non-network-order integer value.
 
void s_SeqDB_QuickAssign (string &dst, const char *bp, const char *ep)
 Higher Performance String Assignment.
 
void s_SeqDB_QuickAssign (string &dst, const string &src)
 Higher Performance String Assignment.
 
bool SeqDB_SplitString (CSeqDB_Substring &buffer, CSeqDB_Substring &front, char delim)
 Parse a prefix from a substring.
 
void SeqDB_CombinePath (const CSeqDB_Substring &path, const CSeqDB_Substring &file, const CSeqDB_Substring *extn, string &outp)
 Combine a filesystem path and file name.
 
CSeqDB_Substring SeqDB_RemoveFileName (CSeqDB_Substring s)
 Returns a path minus filename.
 
CSeqDB_Substring SeqDB_RemoveDirName (CSeqDB_Substring s)
 Returns a filename minus greedy path.
 
CSeqDB_Substring SeqDB_RemoveExtn (CSeqDB_Substring s)
 Returns a path minus file extension.
 
void SeqDB_ConvertOSPath (string &dbs)
 Change path delimiters to platform preferred kind in-place.
 
string SeqDB_FindBlastDBPath (const string &file_name, char dbtype, string *sp, bool exact, CSeqDBAtlas &atlas)
 Finds a file in the search path.
 
void SeqDB_JoinDelim (string &a, const string &b, const string &delim)
 Join two strings with a delimiter.
 
void SeqDB_ThrowException (CSeqDBException::EErrCode code, const string &msg)
 Throw a SeqDB exception; this is separated into a function primarily to allow a breakpoint to be set.
 
void SeqDB_FileIntegrityAssert (const string &file, int line, const string &text)
 Report file corruption by throwing an eFileErr CSeqDBException.
 
void SeqDB_CombineAndQuote (const vector< string > &dbs, string &dbname)
 Combine and quote list of database names.
 
void SeqDB_SplitQuoted (const string &dbname, vector< CSeqDB_Substring > &dbs, bool keep_quote=false)
 Split a combined (quoted) list of database names.
 
template<class T , class U >
const U& SeqDB_MapFind (const std::map< T, U > &m, const T &k, const U &dflt)
 Find a map value or return a default.
 
template<class T , class U >
int SeqDB_VectorAssign (const T &data, vector< U > &v)
 Copy into a vector efficiently.
 

Detailed Description

This file defines several SeqDB utility functions related to byte order and file system portability.

Implemented for: UNIX, MS-Windows

Definition in file seqdbgeneral.hpp.

Macro Definition Documentation

◆ ALIGNED_TO_POW2

#define ALIGNED_TO_POW2 (   x,
y 
)    (! ((x) & (0-y)))

Checks if a number is congruent to zero, modulo a power of 2.

Definition at line 130 of file seqdbgeneral.hpp.

◆ FENCE_SENTRY

#define FENCE_SENTRY   201

Fence Sentry value, which is placed at either end of ranges of data that are included in partially fetched sequences; this only applies to CSeqDBExpert objects, where SetOffsetRanges() has been called.

Note
this value is repeated in blast_util.h

Definition at line 1229 of file seqdbgeneral.hpp.

◆ IS_POWER_OF_TWO

#define IS_POWER_OF_TWO (   x)    (((x) & ((x)-1)) == 0)

Discretely tests whether an integer is a power of two.

Definition at line 127 of file seqdbgeneral.hpp.
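
The bit trick can be tried in isolation. The following is a minimal sketch using a local copy of the macro; the wrapper name IsPow2Sketch is hypothetical and not part of the header. Note that the expression also evaluates to true for 0, so a caller that needs a strict power of two would have to exclude zero separately.

```cpp
#include <cassert>
#include <cstddef>

// Local copy of the macro as documented above.
#define IS_POWER_OF_TWO(x)   (((x) & ((x)-1)) == 0)

// Hypothetical wrapper, for illustration only.
// A power of two has a single set bit; subtracting 1 clears that bit and
// sets all lower bits, so the AND of the two values is zero.
inline bool IsPow2Sketch(std::size_t n)
{
    return IS_POWER_OF_TWO(n);
}
```

For example, IsPow2Sketch(8) is true and IsPow2Sketch(12) is false, while IsPow2Sketch(0) is also true.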

◆ PTR_ALIGNED_TO_SELF_SIZE

#define PTR_ALIGNED_TO_SELF_SIZE (   x)     (IS_POWER_OF_TWO(sizeof(*x)) && ALIGNED_TO_POW2((size_t)(x), sizeof(*x)))

Is the provided pointer aligned to the size (which must be a power of two) of the type to which it points?

Definition at line 134 of file seqdbgeneral.hpp.

◆ SEQDB_FILE_ASSERT

#define SEQDB_FILE_ASSERT (   YESNO)
Value:
do { \
if (! (YESNO)) { \
SeqDB_FileIntegrityAssert(__FILE__, __LINE__, (#YESNO)); \
} \
} while(0)

Definition at line 1123 of file seqdbgeneral.hpp.
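
To illustrate the shape of this macro outside the toolkit, here is a stand-alone sketch. It assumes std::runtime_error as a stand-in for CSeqDBException, and the names FileIntegrityAssertSketch, FILE_ASSERT_SKETCH and CheckHeaderSketch are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for SeqDB_FileIntegrityAssert(). The real function throws a
// CSeqDBException; std::runtime_error is used here only for illustration.
inline void FileIntegrityAssertSketch(const std::string& file, int line,
                                      const std::string& text)
{
    throw std::runtime_error("Validation failed: [" + text + "] at " +
                             file + ":" + std::to_string(line));
}

// Same shape as SEQDB_FILE_ASSERT: do/while(0) makes the macro behave as a
// single statement, and #YESNO stringizes the failed condition.
#define FILE_ASSERT_SKETCH(YESNO) \
    do { \
        if (! (YESNO)) { \
            FileIntegrityAssertSketch(__FILE__, __LINE__, (#YESNO)); \
        } \
    } while (0)

// Hypothetical caller: returns true if the check threw.
inline bool CheckHeaderSketch(int header_length)
{
    try {
        FILE_ASSERT_SKETCH(header_length > 0);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Because the check throws rather than aborts, a caller can treat a corrupt file like any other invalid input.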

◆ SEQDB_ISEOL

#define SEQDB_ISEOL (   x)    (((x) == '\n') || ((x) == '\r'))

Macro for EOL chars.

Definition at line 192 of file seqdbgeneral.hpp.

Function Documentation

◆ s_SeqDB_QuickAssign() [1/2]

void s_SeqDB_QuickAssign ( string &  dst,
const char *  bp,
const char *  ep 
)
inline

Higher Performance String Assignment.

GCC's default assignment and modifier methods for strings (insert, operator= and operator+=, for instance) do not use the capacity doubling technique (as used by vector::push_back()) until the length is about the size of a disk block. For our purposes, they often should use doubling. This assignment function provides the doubling functionality for assignment. I use the assign(char*,char*) overload because it does not discard excess capacity.

Parameters
dst  Destination of assigned data.
bp  Start of memory containing new value.
ep  End of memory containing new value.

Definition at line 210 of file seqdbgeneral.hpp.

Referenced by CSeqDB_BasePath::Assign(), CSeqDB_Path::Assign(), CSeqDB_Substring::GetStringQuick(), CSeqDB_BasePath::operator=(), CSeqDB_DirName::operator=(), CSeqDB_Path::operator=(), s_ConvertV4toV5(), s_SeqDB_QuickAssign(), s_SeqDB_ReadLine(), and SeqDB_JoinDelim().
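
The doubling idea described above can be sketched roughly as follows. This is not the toolkit implementation; the name QuickAssignSketch and the seed capacity of 16 are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of capacity doubling on assignment.
inline void QuickAssignSketch(std::string& dst, const char* bp, const char* ep)
{
    assert(bp <= ep);
    const std::size_t length = static_cast<std::size_t>(ep - bp);

    if (dst.capacity() < length) {
        // Start from the current capacity (or an assumed small seed) and
        // double until the new value fits.
        std::size_t newcap = dst.capacity() ? dst.capacity() : 16;
        while (newcap < length) {
            newcap <<= 1;
        }
        dst.reserve(newcap);
    }

    // Copy the range [bp, ep) into the (possibly enlarged) buffer.
    dst.assign(bp, ep);
}
```

Repeated assignments of slowly growing values then trigger a reallocation only when the capacity is exhausted, instead of on each small increase.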

◆ s_SeqDB_QuickAssign() [2/2]

void s_SeqDB_QuickAssign ( string &  dst,
const string &  src 
)
inline

Higher Performance String Assignment.

String to string assignment, using the above function.

Parameters
dst  Destination string.
src  Input string.

Definition at line 235 of file seqdbgeneral.hpp.

References s_SeqDB_QuickAssign().

◆ SeqDB_CombineAndQuote()

void SeqDB_CombineAndQuote ( const vector< string > &  dbs,
string &  dbname 
)

Combine and quote list of database names.

Parameters
dbs  Database names to combine.
dbname  Combined database name.

Definition at line 1717 of file seqdbcommon.cpp.

Referenced by CSeqDB::CSeqDB(), and CMakeBlastDBApp::x_ProcessInputData().

◆ SeqDB_CombinePath()

void SeqDB_CombinePath ( const CSeqDB_Substring &  path,
const CSeqDB_Substring &  file,
const CSeqDB_Substring *  extn,
string &  outp 
)

Combine a filesystem path and file name.

Combine a provided filesystem path and a file name. If either string is empty, the other is returned. Conceptually, the first path might be the current working directory and the second a filename, so if the second path starts with "/", the first path is ignored. Care is taken to avoid duplicated delimiters: if the first path already ends with the delimiter character, another delimiter is not added between the strings. The delimiter varies from operating system to operating system and is adjusted accordingly. If a file extension is specified, it is also appended.

Parameters
path  The filesystem path to use
file  The name of the file (may include path components)
extn  The file extension (without the "."), or NULL if none.
outp  A returned string containing the combined path and file name

Definition at line 131 of file seqdbcommon.cpp.

References CSeqDB_Substring::Empty(), CSeqDB_Substring::GetBegin(), CSeqDB_Substring::GetEnd(), CDirEntry::GetPathSeparator(), CSeqDB_Substring::GetString(), isalpha(), and CSeqDB_Substring::Size().

Referenced by CSeqDB_BasePath::CSeqDB_BasePath(), CSeqDB_Path::CSeqDB_Path(), CSeqDBLMDBSet::CSeqDBLMDBSet(), CSeqDB_Path::ReplaceFilename(), s_SeqDB_TryPaths(), and CVDBAliasNode::x_ResolveVDBList().
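
The combining rules can be modelled with plain std::string. The sketch below is a simplified illustration, not the toolkit code: it assumes '/' as the only delimiter, returns the result instead of using an output parameter, and assumes that an empty file name yields the path with no extension appended. The name CombinePathSketch is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model of the combining rules described above.
inline std::string CombinePathSketch(const std::string& path,
                                     const std::string& file,
                                     const std::string* extn)
{
    const char delim = '/';   // assumption: POSIX-style delimiter only

    if (file.empty()) {
        return path;          // assumption: nothing to extend, return the path
    }

    std::string outp;
    if (path.empty() || file[0] == delim) {
        // Empty path, or the file is absolute: the path is ignored.
        outp = file;
    } else {
        outp = path;
        if (outp.back() != delim) {
            outp += delim;    // add a delimiter only if one is missing
        }
        outp += file;
    }

    if (extn != nullptr) {
        outp += '.';
        outp += *extn;
    }
    return outp;
}
```

With these assumptions, combining "/db" and "nr" with extension "pin" gives "/db/nr.pin", and combining "/db" with "/abs/nr" gives "/abs/nr".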

◆ SeqDB_ConvertOSPath()

void SeqDB_ConvertOSPath ( string &  dbs)

Change path delimiters to platform preferred kind in-place.

The path is modified in place. The 'Convert' interface is more efficient for cases where the new path would be assigned to the same string object. Delimiter conversion should be called by SeqDB at least once on any path received from the user, or via filesystem sources such as alias files.

Parameters
dbs  This string will be changed in-place.

Definition at line 284 of file seqdbcommon.cpp.

References CDirEntry::GetPathSeparator(), and i.

Referenced by CSeqDB_BaseName::FixDelimiters(), CSeqDB_FileName::FixDelimiters(), CSeqDB_BasePath::FixDelimiters(), s_Tokenize(), and SeqDB_MakeOSPath().
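
An in-place delimiter conversion of this kind might look like the sketch below. The real function obtains the preferred delimiter from the toolkit (the References list CDirEntry::GetPathSeparator()); here it is passed in as a parameter so the example stands alone, and the name ConvertOSPathSketch is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: rewrite both common delimiters to the preferred one,
// modifying the string in place so no second string is allocated.
inline void ConvertOSPathSketch(std::string& dbs, char preferred)
{
    assert(preferred == '/' || preferred == '\\');
    for (char& c : dbs) {
        if (c == '/' || c == '\\') {
            c = preferred;
        }
    }
}
```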

◆ SeqDB_FileIntegrityAssert()

void SeqDB_FileIntegrityAssert ( const string &  file,
int  line,
const string &  text 
)

Report file corruption by throwing an eFileErr CSeqDBException.

This function is only called in the case of validation failure, and is used in code paths where the validation failure may be related to file corruption or filesystem problems. File data is considered a user input, so checks for corrupt file are treated as input validation. This means that (1) checks that may be caused by file corruption scenarios are not disabled in debug mode, and (2) an exception (rather than an abort) is used. Note that this function does not check the assert, so it should only be called in case of failure.

Parameters
file  Name of the file containing the assert.
line  The line the assert is on.
text  The text version of the asserted condition.

Definition at line 2255 of file seqdbcommon.cpp.

References CSeqDBException::eFileErr, file, NStr::IntToString(), msg(), SeqDB_ThrowException(), and text().

◆ SeqDB_FindBlastDBPath()

string SeqDB_FindBlastDBPath ( const string &  file_name,
char  dbtype,
string *  sp,
bool  exact,
CSeqDBAtlas &  atlas 
)

Finds a file in the search path.

This function resolves the full name of a file. It searches for a file of the provided base name and returns the provided name with the full path attached.

If the exact flag is set, the file is assumed to have any extension it may need, and none is added for searching or stripped from the return value. If exact is not set, the file is assumed to end in ".pin", ".nin", ".pal", or ".nal", and if such a file is found, that extension is stripped from the returned string. Furthermore, when exact is false, only file extensions relevant to the dbtype are considered: if dbtype is set to 'p' for protein, only ".pin" and ".pal" are checked for; if it is set to nucleotide, only ".nin" and ".nal" are considered.

The places where the file may be found depend on the search path. The search path consists of the current working directory, the contents of the BLASTDB environment variable, and the BLASTDB member of the BLAST group of settings in the NCBI meta-registry. This registry is an interface to settings found in (for example) a ".ncbirc" file in the user's home directory (but several paths are usually checked). Finally, if the provided file_name starts with the default path delimiter (which is OS dependent; for example, "/" on Linux), the path is taken to be absolute, and the search path does not affect the results.

Parameters
file_name  File base name for which to search
dbtype  Database type: 'p' for protein, 'n' for nucleotide
sp  If non-null, the ":" delimited search path is returned here
exact  If true, the file_name already includes any needed extension
atlas  The memory management layer.
Returns
Fully qualified filename and path, minus extension

Definition at line 416 of file seqdbcommon.cpp.

References dbname(), CSeqDBAtlas::GetSearchPath(), and s_SeqDB_FindBlastDBPath().

Referenced by CSeqDBAliasSets::x_FindBlastDBPath(), and CSeqDBAliasNode::x_ResolveNames().
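
The extension rules above can be summarized in a small stand-alone sketch that lists the file names a search would consider in each directory. It does not touch the filesystem or the search path, the name CandidateNamesSketch is hypothetical, and the order in which the alias (".pal"/".nal") and index (".pin"/".nin") files are tried is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: which file names would be probed for a given input.
inline std::vector<std::string> CandidateNamesSketch(const std::string& file_name,
                                                     char dbtype, bool exact)
{
    std::vector<std::string> names;
    if (exact) {
        // The name already carries its extension; use it unchanged.
        names.push_back(file_name);
        return names;
    }
    assert(dbtype == 'p' || dbtype == 'n');
    const std::string t(1, dbtype);
    names.push_back(file_name + "." + t + "al");   // alias file (order assumed)
    names.push_back(file_name + "." + t + "in");   // index file
    return names;
}
```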

◆ SeqDB_GetBroken()

template<typename T >
T SeqDB_GetBroken ( const T *  stdord_obj)
inline

Read a non-network-order integer value.

Parameters
stdord_obj  Location in memory of integer.
Returns
Value of the integer read from memory.

Definition at line 179 of file seqdbgeneral.hpp.

References PTR_ALIGNED_TO_SELF_SIZE, and SeqDB_GetBrokenUnaligned().

Referenced by CSeqDBRawFile::ReadSwapped().

◆ SeqDB_GetBrokenUnaligned()

template<typename T >
T SeqDB_GetBrokenUnaligned ( const T *  stdord_obj)
inline

Read an unaligned integer into memory.

This template builds a function that reads an integer (on any platform) by reading one byte at a time and assembling the value. The word "Broken" refers to the fact that the integer in question is in the opposite of network byte order, and this function is called in those cases. (Currently, this only happens for the 8 byte volume size stored in the index file.)

Parameters
stdord_obj  Location of non-network-order object.
Returns
Value of that object.

Definition at line 104 of file seqdbgeneral.hpp.

References T.

Referenced by SeqDB_GetBroken().
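
Byte-at-a-time assembly of a non-network-order (little-endian) value can be sketched as follows. This is a stand-alone illustration rather than the toolkit code: the name GetLittleEndianSketch is hypothetical, the input is taken as a byte pointer, and T is assumed to be an unsigned integer type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: assemble a little-endian unsigned integer one byte at
// a time, so neither the host byte order nor the pointer alignment matters.
template <typename T>
inline T GetLittleEndianSketch(const unsigned char* p)
{
    T value = 0;
    // The most significant byte is stored last, so walk backwards.
    for (std::size_t i = sizeof(T); i-- > 0; ) {
        value = static_cast<T>((value << 8) | p[i]);
    }
    return value;
}
```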

◆ SeqDB_GetStdOrd()

template<typename T >
T SeqDB_GetStdOrd ( const T *  stdord_obj)
inline

◆ SeqDB_GetStdOrdUnaligned()

template<typename T >
T SeqDB_GetStdOrdUnaligned ( const T *  stdord_obj)
inline

Reads a network order integer and returns a value.

Integer types stored in platform-independent blast database files usually have network byte order. This template builds a function which reads such an integer and returns its value. It may or may not need to swap the integer, depending on the endianness of the platform. If the integer is not aligned to a multiple of the size of the data type, it will still be read byte-wise rather than word-wise. This is done to avoid bus errors on some platforms.

Definition at line 59 of file seqdbgeneral.hpp.

References _ASSERT, CByteSwap::GetInt2(), CByteSwap::GetInt4(), CByteSwap::GetInt8(), and T.

Referenced by SeqDB_GetStdOrd().
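
The byte-wise read of a network order (big-endian) value can be sketched in a few lines. This is a stand-alone illustration and not the toolkit code, which per the References above relies on CByteSwap; the name GetNetworkOrderSketch is hypothetical, and T is assumed to be an unsigned integer type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: read a big-endian unsigned integer byte by byte.
// Only single-byte loads are used, so an unaligned address cannot cause a
// bus error, and the result is the same on any host byte order.
template <typename T>
inline T GetNetworkOrderSketch(const unsigned char* p)
{
    T value = 0;
    // The most significant byte is stored first.
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        value = static_cast<T>((value << 8) | p[i]);
    }
    return value;
}
```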

◆ SeqDB_JoinDelim()

void SeqDB_JoinDelim ( string &  a,
const string &  b,
const string &  delim 
)

Join two strings with a delimiter.

This function returns whichever of two provided strings is non-empty. If both are non-empty, they are joined with a delimiter placed between them. It is intended for use when combining strings, such as a space delimited list of database volumes. It is probably not suitable for joining file system paths with filenames (use something like SeqDB_CombinePath).

Parameters
a  First component and returned path
b  Second component
delim  The delimiter to use when joining elements

Definition at line 480 of file seqdbcommon.cpp.

References a, b, and s_SeqDB_QuickAssign().

Referenced by CSeqDB_TitleWalker::AddString().
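
The joining rule is small enough to sketch directly. The version below is an illustration with plain std::string operations (the toolkit version uses s_SeqDB_QuickAssign() according to the References); the name JoinDelimSketch is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: append b to a, inserting delim only when both
// strings are non-empty.
inline void JoinDelimSketch(std::string& a, const std::string& b,
                            const std::string& delim)
{
    if (b.empty()) {
        return;          // nothing to add; a is already the result
    }
    if (a.empty()) {
        a = b;           // only b is non-empty
        return;
    }
    a += delim;
    a += b;
}
```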

◆ SeqDB_MapFind()

template<class T , class U >
const U& SeqDB_MapFind ( const std::map< T, U > &  m,
const T &  k,
const U &  dflt 
)

Find a map value or return a default.

This is similar to operator[], except that it works for constant maps, and takes an arbitrary default value when the value is not found (for std::map, the default value is always TValue()).

Parameters
m  The map from which to read values.
k  The key for which to search.
dflt  The value to return if the key was not found.
Returns
The value corresponding to k or a reference to dflt.

Definition at line 1243 of file seqdbgeneral.hpp.

References map_checker< Container >::end().

Referenced by CSeqDB::GetColumnValue(), CSeqDB_ColumnReader::GetValue(), and CSeqDBImpl::x_GetColumnId().
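
A find-or-default helper of this shape can be sketched as below; the name MapFindSketch is hypothetical. Because the result is a reference that may refer to dflt, the default passed in has to outlive any use of the returned reference.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch: look up k in a const map without inserting, and fall
// back to the caller-supplied default when the key is absent.
template <class T, class U>
const U& MapFindSketch(const std::map<T, U>& m, const T& k, const U& dflt)
{
    typename std::map<T, U>::const_iterator iter = m.find(k);
    return (iter == m.end()) ? dflt : iter->second;
}
```

Unlike operator[], this never modifies the map, which is why it works on constant maps.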

◆ SeqDB_RemoveDirName()

CSeqDB_Substring SeqDB_RemoveDirName ( CSeqDB_Substring  s)

Returns a filename minus greedy path.

Substring version. This returns the part of a file name after the last path delimiter, or the whole path if no delimiter was found.

Parameters
s  Input path
Returns
Filename portion of path

Definition at line 50 of file seqdbcommon.cpp.

References CSeqDB_Substring::EraseFront(), CSeqDB_Substring::FindLastOf(), and CDirEntry::GetPathSeparator().

Referenced by CSeqDB_BasePath::FindBaseName(), CSeqDB_Path::FindBaseName(), CSeqDB_Path::FindFileName(), and CMakeProfileDBApp::x_CreateAliasFile().
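
The "greedy" behaviour (everything up to and including the last delimiter is removed) can be illustrated with std::string. This sketch assumes '/' as the default delimiter and copies the result, whereas the toolkit version works on a CSeqDB_Substring; the name RemoveDirNameSketch is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: keep only the part after the last path delimiter.
inline std::string RemoveDirNameSketch(const std::string& s, char delim = '/')
{
    const std::string::size_type pos = s.find_last_of(delim);
    // No delimiter: the whole input is already a file name.
    return (pos == std::string::npos) ? s : s.substr(pos + 1);
}
```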

◆ SeqDB_RemoveExtn()

CSeqDB_Substring SeqDB_RemoveExtn ( CSeqDB_Substring  s)

Returns a path minus file extension.

This returns the input path with its trailing file extension removed, or the whole path if no extension was found.

Parameters
s  Input path
Returns
Path minus file extension

Definition at line 76 of file seqdbcommon.cpp.

References CSeqDB_Substring::GetEnd(), CSeqDB_Substring::Resize(), and CSeqDB_Substring::Size().

Referenced by CSeqDB_Path::FindBaseName(), and CSeqDB_Path::FindBasePath().

◆ SeqDB_RemoveFileName()

CSeqDB_Substring SeqDB_RemoveFileName ( CSeqDB_Substring  s)

Returns a path minus filename.

Substring version. This returns the part of a file path before the last path delimiter, or the whole path if no delimiter was found.

Parameters
s  Input path
Returns
Path minus filename

Definition at line 62 of file seqdbcommon.cpp.

References CSeqDB_Substring::Clear(), CSeqDB_Substring::FindLastOf(), CDirEntry::GetPathSeparator(), and CSeqDB_Substring::Resize().

Referenced by CSeqDB_BasePath::FindDirName(), and CSeqDB_Path::FindDirName().

◆ SeqDB_SplitQuoted()

void SeqDB_SplitQuoted ( const string &  dbname,
vector< CSeqDB_Substring > &  dbs,
bool  keep_quote = false 
)

Split a combined (quoted) list of database names.

Parameters
dbname  Combined database name.
dbs  Receives the individual database names.

Definition at line 1762 of file seqdbcommon.cpp.

References dbname(), and i.

Referenced by CAlignFormatUtil::GetBlastDbInfo(), CSeqDBAliasNode::GetMaskList(), s_Tokenize(), CMakeBlastDBApp::x_BuildDatabase(), CMakeBlastDBApp::x_ProcessInputData(), and CSeqDBAliasNode::x_Tokenize().

◆ SeqDB_SplitString()

bool SeqDB_SplitString ( CSeqDB_Substring &  buffer,
CSeqDB_Substring &  front,
char  delim 
)

Parse a prefix from a substring.

The `buffer' argument is searched for a character. If found, the region before the delimiter is returned in `front' and the region after the delimiter is returned in `buffer', and true is returned. If not found, neither argument changes and false is returned.

Parameters
buffer  Source data to search and remainder if found. [in|out]
front  Region before delim if found. [out]
delim  Character for which to search. [in]
Returns
true if the character was found, false otherwise.

Definition at line 113 of file seqdbcommon.cpp.

References buffer, ctll::front(), and i.

Referenced by CSeqDBTaxInfo::GetTaxNames().
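
The prefix-parsing contract can be sketched with std::string in place of CSeqDB_Substring. The name SplitStringSketch is hypothetical, and splitting at the first occurrence of the delimiter is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: peel off the text before the first delim.
// On success, front receives the prefix and buffer keeps the remainder;
// on failure, neither argument is modified.
inline bool SplitStringSketch(std::string& buffer, std::string& front, char delim)
{
    const std::string::size_type pos = buffer.find(delim);
    if (pos == std::string::npos) {
        return false;
    }
    front = buffer.substr(0, pos);
    buffer.erase(0, pos + 1);   // drop the prefix and the delimiter itself
    return true;
}
```

Calling it in a loop until it returns false walks through the delimited fields, with the final field left in buffer.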

◆ SeqDB_ThrowException()

void SeqDB_ThrowException ( CSeqDBException::EErrCode  code,
const string &  msg 
)

Throw a SeqDB exception; this is separated into a function primarily to allow a breakpoint to be set.

Definition at line 70 of file seqdbatlas.cpp.

References CSeqDBException::eArgErr, CSeqDBException::eFileErr, msg(), and NCBI_THROW.

Referenced by SeqDB_CheckLength(), and SeqDB_FileIntegrityAssert().

◆ SeqDB_VectorAssign()

template<class T , class U >
int SeqDB_VectorAssign ( const T &  data,
vector< U > &  v 
)

Copy into a vector efficiently.

This copies data into a vector which may not be empty beforehand. It is more efficient than freeing the vector for cases like vector<string>, where the existing string buffers may be large enough to hold the new elements. The vector is NOT resized downward but the caller may do a resize() if needed. This design was chosen because for some types (such as vector<string>), more efficient code can be written if element destruction/construction is avoided. The number of elements assigned is returned.

Parameters
data  Data source usable by ITERATE and *iter.
v  Vector to copy the data into.
Returns
The number of elements copied.

Definition at line 1269 of file seqdbgeneral.hpp.

References data, i, ITERATE, and T.
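
The element-reuse strategy can be sketched as follows. This is an illustration with a plain iterator loop in place of the toolkit's ITERATE macro; the name VectorAssignSketch is hypothetical, and T is assumed to be a standard container.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: overwrite existing elements in place and append only
// when the source is longer than the vector. The vector is never shrunk.
template <class T, class U>
int VectorAssignSketch(const T& data, std::vector<U>& v)
{
    std::size_t i = 0;
    for (typename T::const_iterator iter = data.begin();
         iter != data.end(); ++iter, ++i) {
        if (i < v.size()) {
            v[i] = *iter;          // reuse the existing element (and its buffer)
        } else {
            v.push_back(*iter);    // grow only when needed
        }
    }
    return static_cast<int>(i);
}
```

A caller that wants the vector to hold exactly the copied elements can follow the call with v.resize(n), where n is the returned count.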

Modified on Fri Sep 20 14:58:13 2024 by modify_doxy.py rev. 669887