NCBI C++ ToolKit
CSeqDBIsam Class Reference

CSeqDBIsam. More...

#include <objtools/blast/seqdb_reader/impl/seqdbisam.hpp>

Classes

class  SIsamKey
 Stores a key for an ISAM file. More...
 

Public Types

enum  EIsamDbType {
  eNumeric = 0 , eNumericNoData = 1 , eString = 2 , eStringDatabase = 3 ,
  eStringBin = 4 , eNumericLongId = 5
}
 Types of database this class can access. More...
 
typedef CSeqDBGiList::SGiOid TGiOid
 Import the type representing one GI, OID association. More...
 
typedef CSeqDBAtlas::TIndx TIndx
 Type which is large enough to span the bytes of an ISAM file. More...
 
typedef int TOid
 This class works with OIDs relative to a specific volume. More...
 
typedef Int8 TTi
 Identifier type for trace databases. More...
 
typedef Int8 TId
 Type large enough to hold any numerical ID. More...
 
- Public Types inherited from CObject
enum  EAllocFillMode { eAllocFillNone = 1 , eAllocFillZero , eAllocFillPattern }
 Control filling of newly allocated memory. More...
 
typedef CObjectCounterLocker TLockerType
 Default locker type for CRef. More...
 
typedef atomic< Uint8 > TCounter
 Counter type is CAtomicCounter. More...
 
typedef Uint8 TCount
 Alias for value type of counter. More...
 

Public Member Functions

 CSeqDBIsam (CSeqDBAtlas &atlas, const string &dbname, char prot_nucl, char file_ext_char, ESeqDBIdType ident_type)
 Constructor. More...
 
 ~CSeqDBIsam ()
 Destructor. More...
 
bool PigToOid (TPig pig, TOid &oid)
 PIG translation. More...
 
bool IdToOid (Int8 id, TOid &oid)
 GI or TI translation. More...
 
void IdsToOids (int vol_start, int vol_end, CSeqDBGiList &ids)
 Translate Gis and Tis to Oids for the given ID list. More...
 
void IdsToOids (int vol_start, int vol_end, CSeqDBNegativeList &ids)
 Compute list of included OIDs based on a negative ID list. More...
 
void StringToOids (const string &acc, vector< TOid > &oids, bool adjusted, bool &version_check)
 String translation. More...
 
bool SeqidToOid (const string &acc, TOid &oid)
 Seq-id translation. More...
 
void HashToOids (unsigned hash, vector< TOid > &oids)
 Sequence hash lookup. More...
 
void UnLease ()
 Return any memory held by this object to the atlas. More...
 
void GetIdBounds (Int8 &low_id, Int8 &high_id, int &count)
 Get Numeric Bounds. More...
 
void GetIdBounds (string &low_id, string &high_id, int &count)
 Get String Bounds. More...
 
- Public Member Functions inherited from CObject
 CObject (void)
 Constructor. More...
 
 CObject (const CObject &src)
 Copy constructor. More...
 
virtual ~CObject (void)
 Destructor. More...
 
CObject & operator= (const CObject &src) THROWS_NONE
 Assignment operator. More...
 
bool CanBeDeleted (void) const THROWS_NONE
 Check if object can be deleted. More...
 
bool IsAllocatedInPool (void) const THROWS_NONE
 Check if object is allocated in memory pool (not system heap) More...
 
bool Referenced (void) const THROWS_NONE
 Check if object is referenced. More...
 
bool ReferencedOnlyOnce (void) const THROWS_NONE
 Check if object is referenced only once. More...
 
void AddReference (void) const
 Add reference to object. More...
 
void RemoveReference (void) const
 Remove reference to object. More...
 
void ReleaseReference (void) const
 Remove reference without deleting object. More...
 
virtual void DoNotDeleteThisObject (void)
 Mark this object as not allocated in heap – do not delete this object. More...
 
virtual void DoDeleteThisObject (void)
 Mark this object as allocated in heap – object can be deleted. More...
 
void * operator new (size_t size)
 Define new operator for memory allocation. More...
 
void * operator new[] (size_t size)
 Define new[] operator for 'array' memory allocation. More...
 
void operator delete (void *ptr)
 Define delete operator for memory deallocation. More...
 
void operator delete[] (void *ptr)
 Define delete[] operator for memory deallocation. More...
 
void * operator new (size_t size, void *place)
 Define new operator. More...
 
void operator delete (void *ptr, void *place)
 Define delete operator. More...
 
void * operator new (size_t size, CObjectMemoryPool *place)
 Define new operator using memory pool. More...
 
void operator delete (void *ptr, CObjectMemoryPool *place)
 Define delete operator. More...
 
virtual void DebugDump (CDebugDumpContext ddc, unsigned int depth) const
 Define method for dumping debug information. More...
 
- Public Member Functions inherited from CDebugDumpable
 CDebugDumpable (void)
 
virtual ~CDebugDumpable (void)
 
void DebugDumpText (ostream &out, const string &bundle, unsigned int depth) const
 
void DebugDumpFormat (CDebugDumpFormatter &ddf, const string &bundle, unsigned int depth) const
 
void DumpToConsole (void) const
 

Static Public Member Functions

static bool IndexExists (const string &dbname, char prot_nucl, char file_ext_char)
 Check if a given ISAM index exists. More...
 
- Static Public Member Functions inherited from CObject
static NCBI_XNCBI_EXPORT void ThrowNullPointerException (void)
 Define method to throw null pointer exception. More...
 
static NCBI_XNCBI_EXPORT void ThrowNullPointerException (const type_info &type)
 
static EAllocFillMode GetAllocFillMode (void)
 
static void SetAllocFillMode (EAllocFillMode mode)
 
static void SetAllocFillMode (const string &value)
 Set mode from configuration parameter value. More...
 
- Static Public Member Functions inherited from CDebugDumpable
static void EnableDebugDump (bool on)
 

Private Types

enum  EErrorCode {
  eNotFound = 1 , eNoError = 0 , eBadVersion = -10 , eBadType = -11 ,
  eWrongFile = -12 , eInitFailed = -13
}
 Exit conditions occurring in this code. More...
 

Private Member Functions

template<class T >
void x_LoadIndex (CSeqDBFileMemMap &lease, vector< T > &keys, vector< TIndx > &offs)
 Load and extract all index samples into array at once. More...
 
template<class T >
void x_LoadData (CSeqDBFileMemMap &lease, vector< T > &keys, vector< int > &vals, int num_keys, TIndx begin)
 Load and extract a data page into array at once. More...
 
template<class T >
void x_TranslateGiList (int vol_start, CSeqDBGiList &gis)
 GiList Translation. More...
 
bool x_IdentToOid (Int8 id, TOid &oid)
 Numeric identifier lookup. More...
 
EErrorCode x_SearchIndexNumeric (Int8 Number, int *Data, Uint4 *Index, Int4 &SampleNum, bool &done)
 Index file search. More...
 
void x_SearchNegativeMulti (int vol_start, int vol_end, CSeqDBNegativeList &gis, bool use_tis)
 Negative ID List Translation. More...
 
void x_SearchNegativeMultiSeq (int vol_start, int vol_end, CSeqDBNegativeList &gis)
 
EErrorCode x_SearchDataNumeric (Int8 Number, int *Data, Uint4 *Index, Int4 SampleNum)
 Data file search. More...
 
EErrorCode x_NumericSearch (Int8 Number, int *Data, Uint4 *Index)
 Numeric identifier lookup. More...
 
EErrorCode x_StringSearch (const string &term_in, vector< string > &term_out, vector< string > &value_out, vector< TIndx > &index_out)
 String identifier lookup. More...
 
EErrorCode x_InitSearch (void)
 Initialize the search object. More...
 
int x_GetPageNumElements (Int4 SampleNum, Int4 *Start)
 Determine the number of elements in the data page. More...
 
bool x_SparseStringToOids (const string &acc, vector< int > &oids, bool adjusted)
 Lookup a string in a sparse table. More...
 
int x_DiffCharLease (const string &term_in, CSeqDBFileMemMap &lease, const string &file_name, TIndx file_length, Uint4 at_least, TIndx KeyOffset, bool ignore_case)
 Find the first character to differ in two strings. More...
 
int x_DiffChar (const string &term_in, const char *begin, const char *end, bool ignore_case)
 Find the first character to differ in two strings. More...
 
void x_ExtractData (const char *key_start, const char *entry_end, vector< string > &key_out, vector< string > &data_out)
 Extract the data from a key-value pair in memory. More...
 
TIndx x_GetIndexKeyOffset (TIndx sample_offset, Uint4 sample_num)
 Get the offset of the specified sample. More...
 
void x_GetIndexString (TIndx key_offset, int length, string &prefix, bool trim_to_null)
 Read a string from the index file. More...
 
int x_DiffSample (const string &term_in, Uint4 SampleNum, TIndx &KeyOffset)
 Find the first character to differ in two strings. More...
 
void x_ExtractAllData (const string &term_in, TIndx sample_index, vector< TIndx > &indices_out, vector< string > &keys_out, vector< string > &data_out)
 Find matches in the given page of a string ISAM file. More...
 
void x_ExtractPageData (const string &term_in, TIndx page_index, const char *beginp, const char *endp, vector< TIndx > &indices_out, vector< string > &keys_out, vector< string > &data_out)
 Find matches in the given memory area of a string ISAM file. More...
 
void x_LoadPage (TIndx SampleNum1, TIndx SampleNum2, const char **beginp, const char **endp)
 Map a page into memory. More...
 
int x_TestNumericSample (CSeqDBFileMemMap &index_lease, int index, Int8 key_in, Int8 &key_out, int &data_out)
 Test a sample key value from a numeric index. More...
 
void x_GetNumericSample (CSeqDBFileMemMap &index_lease, int index, Int8 &key_out, int &data_out)
 Get a sample key value from a numeric index. More...
 
bool x_FindInNegativeList (CSeqDBNegativeList &ids, int &index, Int8 key, bool use_tis)
 Find ID in the negative GI list using PBS. More...
 
bool x_FindInNegativeList (CSeqDBNegativeList &ids, int &index, string key)
 
void x_MapDataPage (int sample_index, int &start, int &num_elements, const void **data_page_begin)
 Map a data page. More...
 
void x_GetDataElement (const void *dpage, int index, Int8 &key, int &data)
 Get a particular data element from a data page. More...
 
void x_GetDataElement (const void *dpage, int index, string &key, int &data)
 
void x_FindIndexBounds ()
 Find the least and greatest keys in this ISAM file. More...
 
bool x_OutOfBounds (Int8 key)
 Check whether a numeric key is within this volume's bounds. More...
 
bool x_OutOfBounds (string key)
 Check whether a string key is within this volume's bounds. More...
 
Uint8 x_GetNumericKey (const void *p)
 
int x_GetNumericData (const void *p)
 
void x_LoadStringData (const char *begin, string &key, int &data)
 
template<>
void x_LoadIndex (CSeqDBFileMemMap &lease, vector< TGi > &keys, vector< TIndx > &offs)
 Load and extract all index samples into array at once. More...
 
template<>
void x_LoadData (CSeqDBFileMemMap &lease, vector< TGi > &keys, vector< int > &vals, int num_keys, TIndx begin)
 Load and extract a data page into array at once. More...
 
template<>
void x_LoadIndex (CSeqDBFileMemMap &lease, vector< string > &keys, vector< TIndx > &offs)
 
template<>
void x_LoadData (CSeqDBFileMemMap &lease, vector< string > &keys, vector< int > &vals, int num_keys, TIndx begin)
 

Static Private Member Functions

static void x_Lower (string &s)
 Converts a string to lower case. More...
 
static Int8 x_GetId (CSeqDBNegativeList &ids, int index, bool use_tis)
 Fetch a GI or TI from a GI list. More...
 
static string x_GetId (CSeqDBNegativeList &ids, int index)
 
static void x_MakeFilenames (const string &dbname, char prot_nucl, char file_ext_char, string &index_name, string &data_name)
 Make filenames for ISAM file. More...
 

Private Attributes

CSeqDBAtlas & m_Atlas
 The memory management layer. More...
 
ESeqDBIdType m_IdentType
 The type of identifier this class uses. More...
 
CSeqDBFileMemMap m_IndexLease
 A persistent lease on the ISAM index file. More...
 
CSeqDBFileMemMap m_DataLease
 A persistent lease on the ISAM data file. More...
 
int m_Type
 The format type of database files found (eNumeric or eString). More...
 
string m_DataFname
 The filename of the ISAM data file. More...
 
string m_IndexFname
 The filename of the ISAM index file. More...
 
TIndx m_DataFileLength
 The length of the ISAM data file. More...
 
TIndx m_IndexFileLength
 The length of the ISAM index file. More...
 
Int4 m_NumTerms
 Number of terms in database. More...
 
Int4 m_NumSamples
 Number of samples in ISAM index. More...
 
Int4 m_PageSize
 Page size of ISAM index. More...
 
Int4 m_MaxLineSize
 Maximum string length in the database. More...
 
Int4 m_IdxOption
 Options set by upper layer. More...
 
bool m_Initialized
 Flag indicating whether initialization has been done. More...
 
TIndx m_KeySampleOffset
 Offset of samples in index file. More...
 
bool m_TestNonUnique
 Check whether data for string ISAM is sorted. More...
 
char * m_FileStart
 Pointer to index file if no memmap. More...
 
Int4 m_FirstOffset
 First and last offsets of last page. More...
 
Int4 m_LastOffset
 First and last offsets of last page. More...
 
SIsamKey m_FirstKey
 First volume key. More...
 
SIsamKey m_LastKey
 Last volume key. More...
 
bool m_LongId
 Use Uint8 for the key. More...
 
int m_TermSize
 Size of the numeric key-data pair. More...
 

Additional Inherited Members

- Static Public Attributes inherited from CObject
static const TCount eCounterBitsCanBeDeleted = 1 << 0
 Define possible object states. More...
 
static const TCount eCounterBitsInPlainHeap = 1 << 1
 Heap signature was found. More...
 
static const TCount eCounterBitsPlaceMask
 Mask for 'in heap' state flags. More...
 
static const int eCounterStep = 1 << 2
 Skip over the "in heap" bits. More...
 
static const TCount eCounterValid = TCount(1) << (sizeof(TCount) * 8 - 2)
 Minimal value for valid objects (reference counter is zero) Must be a single bit value. More...
 
static const TCount eCounterStateMask
 Valid object, and object in heap. More...
 
- Protected Member Functions inherited from CObject
virtual void DeleteThis (void)
 Virtual method "deleting" this object. More...
 

Detailed Description

CSeqDBIsam.

Manages one ISAM file, which will translate either PIGs, GIs, or Accessions to OIDs. Translation in the other direction is done in the CSeqDBVol code. Files managed by this class include those with the extensions pni, pnd, ppi, ppd, psi, psd, nsi, nsd, nni, and nnd. Each instance of this object will manage one pair of these files, including one whose name ends in 'i' and one whose name ends in 'd'.

Definition at line 127 of file seqdbisam.hpp.
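
The file-pairing rule described above can be sketched as follows. This is a minimal illustration, not the toolkit's x_MakeFilenames(): the helper name MakeIsamFilenames is hypothetical, and the "dbname.XYi" / "dbname.XYd" layout (a dot, then the protein/nucleotide character, then the identifier character) is an assumption inferred from the extensions listed in the description.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: builds the names of one ISAM index/data file pair.
//   prot_nucl:     'p' (protein) or 'n' (nucleotide)
//   file_ext_char: 's' (string), 'n' (GI), or 'p' (PIG)
// Assumption: the extension is prot_nucl + file_ext_char + 'i' or 'd'.
std::pair<std::string, std::string>
MakeIsamFilenames(const std::string& dbname, char prot_nucl, char file_ext_char)
{
    assert(prot_nucl == 'p' || prot_nucl == 'n');
    const std::string stem = dbname + '.' + prot_nucl + file_ext_char;
    return { stem + 'i', stem + 'd' };  // first: index file, second: data file
}
```

Under these assumptions, a protein volume named "nr" with a GI index would map to the pair "nr.pni" and "nr.pnd".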

Member Typedef Documentation

◆ TGiOid

Import the type representing one GI, OID association.

Definition at line 130 of file seqdbisam.hpp.

◆ TId

Type large enough to hold any numerical ID.

Definition at line 158 of file seqdbisam.hpp.

◆ TIndx

Type which is large enough to span the bytes of an ISAM file.

Definition at line 143 of file seqdbisam.hpp.

◆ TOid

This class works with OIDs relative to a specific volume.

Definition at line 146 of file seqdbisam.hpp.

◆ TTi

Identifier type for trace databases (TIs).

For comparison, PIGs are the identifiers for numeric indices over protein volumes, and GIs (genomic IDs) are the most common numerical identifier.

Definition at line 155 of file seqdbisam.hpp.

Member Enumeration Documentation

◆ EErrorCode

enum CSeqDBIsam::EErrorCode
private

Exit conditions occurring in this code.

Enumerator
eNotFound 

The key was not found.

eNoError 

Lookup was successful.

eBadVersion 

The format version of the ISAM file is unsupported.

eBadType 

The requested ISAM type did not match the file.

eWrongFile 

The file was not found, or was the wrong length.

eInitFailed 

Definition at line 489 of file seqdbisam.hpp.

◆ EIsamDbType

Types of database this class can access.

Enumerator
eNumeric 

Numeric database with Key/Value pairs in the index file.

eNumericNoData 

This type is not supported.

eString 

String database type used here.

eStringDatabase 

This type is not supported.

eStringBin 

This type is not supported.

eNumericLongId 

Definition at line 133 of file seqdbisam.hpp.

Constructor & Destructor Documentation

◆ CSeqDBIsam()

CSeqDBIsam::CSeqDBIsam ( CSeqDBAtlas &  atlas,
const string &  dbname,
char  prot_nucl,
char  file_ext_char,
ESeqDBIdType  ident_type 
)

Constructor.

An ISAM file object corresponds to an index file and a data file, and converts identifiers (string, GI, or PIG) into OIDs relative to a particular database volume.

Parameters
atlas          The memory management object. [in]
dbname         The name of the volume's files (minus the extension). [in]
prot_nucl      Whether the sequences are protein or nucleotide. [in]
file_ext_char  This is 's', 'n', or 'p', for string, GI, or PIG, respectively. [in]
ident_type     The type of identifiers this database translates. [in]

Definition at line 1102 of file seqdbisam.cpp.

References dbname(), DEFAULT_NISAM_SIZE, DEFAULT_SISAM_SIZE, eGiId, eHashId, eNoError, eNumeric, ePigId, eString, eStringId, eTiId, CSeqDBFileMemMap::Init(), m_DataFname, m_DataLease, m_IndexFname, m_IndexLease, m_Initialized, m_PageSize, m_Type, NCBI_THROW, x_FindIndexBounds(), x_InitSearch(), and x_MakeFilenames().

◆ ~CSeqDBIsam()

CSeqDBIsam::~CSeqDBIsam ( )

Destructor.

Releases all resources associated with this object.

Definition at line 1210 of file seqdbisam.cpp.

References UnLease().

Member Function Documentation

◆ GetIdBounds() [1/2]

void CSeqDBIsam::GetIdBounds ( Int8 &  low_id,
Int8 &  high_id,
int &  count 
)

Get Numeric Bounds.

Fetch the lowest, highest, and total number of numeric keys in the database index. If the operation fails, zero will be returned for count.

Parameters
low_id   Lowest numeric id value in database. [out]
high_id  Highest numeric id value in database. [out]
count    Number of numeric id values in database. [out]

Definition at line 1624 of file seqdbisam.cpp.

References CSeqDBIsam::SIsamKey::GetNumeric(), CSeqDBIsam::SIsamKey::IsSet(), m_FirstKey, m_Initialized, m_LastKey, and m_NumTerms.

Referenced by CSeqDBVol::GetGiBounds(), CSeqDBVol::GetPigBounds(), and CSeqDBVol::GetStringBounds().

◆ GetIdBounds() [2/2]

void CSeqDBIsam::GetIdBounds ( string &  low_id,
string &  high_id,
int &  count 
)

Get String Bounds.

Fetch the lowest, highest, and total number of string keys in the database index. If the operation fails, zero will be returned for count.

Parameters
low_id   Lowest string id value in database. [out]
high_id  Highest string id value in database. [out]
count    Number of string id values in database. [out]

Definition at line 1645 of file seqdbisam.cpp.

References CSeqDBIsam::SIsamKey::GetString(), CSeqDBIsam::SIsamKey::IsSet(), m_FirstKey, m_Initialized, m_LastKey, and m_NumTerms.

◆ HashToOids()

void CSeqDBIsam::HashToOids ( unsigned  hash,
vector< TOid > &  oids 
)

Sequence hash lookup.

This method tries to find sequences associated with a given sequence hash value. The provided value is numeric, but the ISAM file uses a string format because string searches can return multiple results per key, and there may be multiple OIDs for a given hash value due to identical sequences and collisions.

Parameters
hash  The sequence hash value to look up. [in]
oids  The returned oids. [out]

Definition at line 1666 of file seqdbisam.cpp.

References _ASSERT, eHashId, eNoError, eNotFound, ITERATE, ncbi::grid::netcache::search::fields::key, m_IdentType, m_Initialized, NStr::UIntToString(), and x_StringSearch().

Referenced by CSeqDBVol::HashToOids().
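
The idea of looking up a numeric hash through a string-keyed index can be sketched as below. This is an illustration only: a std::multimap stands in for the string ISAM file, and the names StringIndex and HashToOidsSketch are hypothetical. Formatting the hash as a decimal string is an assumption suggested by the reference to NStr::UIntToString().

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for a string ISAM file: one key may map to several OIDs.
using StringIndex = std::multimap<std::string, int>;

// Hypothetical sketch: format the numeric hash as a decimal string key,
// then collect every OID stored under that key (there may be several,
// because of identical sequences and hash collisions).
std::vector<int> HashToOidsSketch(const StringIndex& index, unsigned hash)
{
    std::vector<int> oids;
    const auto range = index.equal_range(std::to_string(hash));
    for (auto it = range.first; it != range.second; ++it) {
        oids.push_back(it->second);
    }
    return oids;
}
```

Returning a vector rather than a single OID mirrors the reason the description gives for using a string index here.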

◆ IdsToOids() [1/2]

void CSeqDBIsam::IdsToOids ( int  vol_start,
int  vol_end,
CSeqDBGiList &  ids 
)

Translate Gis and Tis to Oids for the given ID list.

This method iterates over a vector of Gi/OID and/or Ti/OID pairs. For each pair where the OID is -1, the GI or TI will be looked up in the ISAM file, and (if found) the correct OID will be stored (otherwise the -1 will remain). This method will normally be called once for each volume.

Parameters
vol_start  The starting OID of this volume. [in]
vol_end    The first OID past the end of this volume. [in]
ids        The set of GI-OID or TI-OID pairs. [in|out]

Definition at line 1387 of file seqdbisam.cpp.

References eGiId, ePigId, eStringId, eTiId, m_IdentType, and NCBI_THROW.

Referenced by CSeqDBVol::IdsToOids().
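
The per-volume translation pass described above can be sketched as follows. This is a simplified illustration: GiOid is a hypothetical stand-in for the real pair type, a std::map replaces the ISAM file, and adding vol_start to a volume-relative OID is an assumption about how volume-relative results are combined.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for one ID/OID association.
struct GiOid {
    std::int64_t id;
    int oid;   // -1 means "not translated yet"
};

// Sketch of one volume's pass: only entries whose OID is still -1 are
// looked up; entries that are not found keep their -1.
// Assumption: volume_index holds volume-relative OIDs, offset by vol_start.
void TranslateIds(const std::map<std::int64_t, int>& volume_index,
                  int vol_start, int vol_end, std::vector<GiOid>& ids)
{
    assert(vol_start <= vol_end);
    for (GiOid& entry : ids) {
        if (entry.oid != -1) {
            continue;  // already resolved, for example by an earlier volume
        }
        const auto it = volume_index.find(entry.id);
        if (it != volume_index.end() && vol_start + it->second < vol_end) {
            entry.oid = vol_start + it->second;
        }
    }
}
```

Calling such a function once per volume leaves -1 only for IDs that no volume contains.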

◆ IdsToOids() [2/2]

void CSeqDBIsam::IdsToOids ( int  vol_start,
int  vol_end,
CSeqDBNegativeList &  ids 
)

Compute list of included OIDs based on a negative ID list.

This method iterates over a vector of Gis or Tis, along with the corresponding ISAM file for this volume. Each OID found in the ISAM file is marked in the negative ID list. If the GI or TI is not mentioned in the negative ID list, the OID will be marked as an 'included' OID in the ID list (that OID will be searched). The OIDs for IDs that are found in the negative ID list will be marked as 'visible' OIDs. When this process is done for all volumes, the SeqDB object will use all OIDs that are either marked as 'included' or NOT marked as 'visible'. The 'visible' list is needed because otherwise iteration would skip OIDs that do not have GIs or TIs (whichever is being iterated). To use this method, this volume must have an ISAM file matching the negative ID list's identifier type, or an exception will be thrown.

Parameters
vol_start  The starting OID of this volume. [in]
vol_end    The first OID past the end of this volume. [in]
ids        The set of GI-OID pairs. [in|out]

Definition at line 1420 of file seqdbisam.cpp.

References _ASSERT, eGiId, eStringId, eTiId, CSeqDBNegativeList::GetNumGis(), CSeqDBNegativeList::GetNumSis(), CSeqDBNegativeList::GetNumTis(), CSeqDBNegativeList::InsureOrder(), m_IdentType, x_SearchNegativeMulti(), and x_SearchNegativeMultiSeq().
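
The 'included' / 'visible' rule can be sketched as below. This is an interpretation, not the toolkit's code: NegativeMarks, MarkOids, and IsSearched are hypothetical names, and the sketch assumes that 'visible' is set for OIDs whose ID appears in the negative list, which is what the final rule (use an OID if it is 'included' or not 'visible') requires.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical per-OID marks collected while scanning the ISAM entries.
struct NegativeMarks {
    std::vector<bool> included;  // OID has an ID that is NOT in the negative list
    std::vector<bool> visible;   // OID has an ID that IS in the negative list (assumed)
};

// isam_entries: (ID, OID) pairs as they would be read from the ISAM file.
NegativeMarks MarkOids(const std::vector<std::pair<std::int64_t, int>>& isam_entries,
                       const std::set<std::int64_t>& negative_ids, int num_oids)
{
    NegativeMarks marks{ std::vector<bool>(num_oids, false),
                         std::vector<bool>(num_oids, false) };
    for (const auto& entry : isam_entries) {
        assert(entry.second >= 0 && entry.second < num_oids);
        if (negative_ids.count(entry.first) != 0) {
            marks.visible[entry.second] = true;
        } else {
            marks.included[entry.second] = true;
        }
    }
    return marks;
}

// Final rule from the description: use OIDs that are 'included' or not 'visible'.
bool IsSearched(const NegativeMarks& marks, int oid)
{
    return marks.included[oid] || !marks.visible[oid];
}
```

With this rule, an OID that has no ID of the iterated type at all is never marked and is therefore still searched, which is the case the 'visible' list exists to handle.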

◆ IdToOid()

bool CSeqDBIsam::IdToOid ( Int8  id,
TOid &  oid 
)
inline

GI or TI translation.

A GI or TI identifier is translated to an OID. GI identifiers are used for all types of sequences. TI identifiers are used primarily for nucleotide data in the Trace DBs. Multiple GIs may indicate the same sequence of bases and the same OID, but TIs are usually unique.

Parameters
id   The GI or TI to look up. [in]
oid  The returned oid. [out]
Returns
true if the GI or TI was found

Definition at line 225 of file seqdbisam.hpp.

References _ASSERT, eGiId, eTiId, m_IdentType, and x_IdentToOid().

Referenced by BOOST_AUTO_TEST_CASE(), CSeqDBVol::GiToOid(), CTestThread::Main(), CSeqDBVol::TiToOid(), and CSeqDBVol::x_StringToOids().

◆ IndexExists()

bool CSeqDBIsam::IndexExists ( const string &  dbname,
char  prot_nucl,
char  file_ext_char 
)
static

Check if a given ISAM index exists.

Parameters
dbname         Base name of the database volume.
prot_nucl      'p' or 'n' for protein or nucleotide.
file_ext_char  Identifier symbol; 's' for string, etc.

Definition at line 1200 of file seqdbisam.cpp.

References dbname(), CFile::Exists(), and x_MakeFilenames().

Referenced by CSeqDBVol::GetGi(), CSeqDBVol::x_OpenGiFile(), CSeqDBVol::x_OpenHashFile(), CSeqDBVol::x_OpenPigFile(), CSeqDBVol::x_OpenStrFile(), and CSeqDBVol::x_OpenTiFile().

◆ PigToOid()

bool CSeqDBIsam::PigToOid ( TPig  pig,
TOid &  oid 
)
inline

PIG translation.

A PIG identifier is translated to an OID. PIG identifiers are used exclusively for protein sequences. One PIG corresponds to exactly one sequence of amino acids, and vice versa. They are also stable; the sequence a PIG points to will never be changed.

Parameters
pig  The PIG to look up. [in]
oid  The returned oid. [out]
Returns
true if the PIG was found

Definition at line 203 of file seqdbisam.hpp.

References _ASSERT, ePigId, m_IdentType, and x_IdentToOid().

Referenced by CSeqDBVol::PigToOid(), and CSeqDBVol::x_StringToOids().

◆ SeqidToOid()

bool CSeqDBIsam::SeqidToOid ( const string &  acc,
TOid &  oid 
)

Seq-id translation.

A Seq-id identifier (serialized to a string) is translated into an OID. This routine will attempt to simplify the seqid so as to use the faster numeric lookup techniques whenever possible.

Parameters
acc  A string containing the Seq-id. [in]
oid  The returned oid. [out]

◆ StringToOids()

void CSeqDBIsam::StringToOids ( const string &  acc,
vector< TOid > &  oids,
bool  adjusted,
bool &  version_check 
)

String translation.

A string id is translated to one or more OIDs. String ids are used by some groups which produce sequence data. In some cases, the string may correspond to more than one OID. For this reason, the OIDs are returned in a vector. The string provided is looked up in several ways. If it contains a pipe character ("|") the data will be interpreted as a SeqID. This routine can use faster lookup mechanisms if the simplification routines were able to recognize the sequence as one of several types that have numerical indices. The version_check flag is needed to support sparse indexing. If version_check is true, and the string has a version, and the lookup fails, this method will try to remove the version and search again. On return from this method version_check will be set to true if and only if the first search failed and the versionless search succeeded. CSeqDBVol::x_CheckVersions() can then be called to verify the OIDs; see that method for more information about this scenario.

Parameters
acc            The string to look up. [in]
oids           The returned oids. [out]
adjusted       Whether the simplification adjusted the string. [in|out]
version_check  If the version can be stripped [in] and if it was [out].

Definition at line 1235 of file seqdbisam.cpp.

References _ASSERT, CSeq_id::AsFastaString(), eNoError, eNotFound, eStringId, CSeq_id::fParse_AnyLocal, CSeq_id::fParse_RawText, isdigit(), ITERATE, m_IdentType, m_Initialized, ncbi::grid::netcache::search::fields::size, and x_StringSearch().

Referenced by CSeqDBVol::x_StringToOids().
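
The version_check contract described above can be sketched as follows. This is only an illustration of the in/out flag behaviour: a std::multimap replaces the ISAM file, the function names are hypothetical, and treating everything after the last '.' as the version is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using StringIndex = std::multimap<std::string, int>;  // stand-in for the ISAM file

// Collect all OIDs stored under an exact key.
static std::vector<int> Lookup(const StringIndex& index, const std::string& key)
{
    std::vector<int> oids;
    const auto range = index.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
        oids.push_back(it->second);
    }
    return oids;
}

// version_check [in]:  whether the version may be stripped on failure.
// version_check [out]: true only if the first search failed and the
//                      versionless search succeeded.
std::vector<int> LookupWithVersionFallback(const StringIndex& index,
                                           const std::string& acc,
                                           bool& version_check)
{
    const bool may_strip = version_check;
    version_check = false;

    std::vector<int> oids = Lookup(index, acc);
    if (!oids.empty() || !may_strip) {
        return oids;
    }
    // Assumption: the version is the suffix after the last '.'.
    const std::string::size_type dot = acc.rfind('.');
    if (dot == std::string::npos || dot == 0) {
        return oids;  // nothing to strip
    }
    oids = Lookup(index, acc.substr(0, dot));
    version_check = !oids.empty();
    return oids;
}
```

When the flag comes back true, the OIDs were matched without their version, which is the situation in which the description suggests verifying them with CSeqDBVol::x_CheckVersions().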

◆ UnLease()

void CSeqDBIsam::UnLease ( )

Return any memory held by this object to the atlas.

Definition at line 1215 of file seqdbisam.cpp.

References CSeqDBFileMemMap::Clear(), m_DataLease, and m_IndexLease.

Referenced by BOOST_AUTO_TEST_CASE(), CSeqDBVol::UnLease(), and ~CSeqDBIsam().

◆ x_DiffChar()

Int4 CSeqDBIsam::x_DiffChar ( const string &  term_in,
const char *  begin,
const char *  end,
bool  ignore_case 
)
private

Find the first character to differ in two strings.

This finds the index of the first character to differ in a meaningful way between two strings. One of the strings is a term that is passed in; the other is a range of memory represented by two pointers.

Parameters
term_in      The key string to compare against.
begin        A pointer to the start of the second string.
end          A pointer to the end of the second string.
ignore_case  Whether to treat the search as case-insensitive.
Returns
The position of the first difference.

Definition at line 589 of file seqdbisam.cpp.

References ch1, ch2, ENDS_ISAM_KEY(), i, int, result, s_SeqDBIsam_NullifyEOLs(), and toupper().

Referenced by x_DiffCharLease(), x_ExtractAllData(), and x_ExtractPageData().
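
A first-difference comparison between a term and a memory range can be sketched as below. This is a simplified illustration, not the toolkit's implementation: DiffCharSketch is a hypothetical name, returning -1 for "no difference" is an assumed convention, and the special handling of ISAM key terminators that the real code applies is left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the index of the first position where term and [begin, end)
// differ, or -1 when they are equal (the -1 convention is an assumption).
int DiffCharSketch(const std::string& term, const char* begin,
                   const char* end, bool ignore_case)
{
    assert(begin <= end);
    const std::size_t range_len = static_cast<std::size_t>(end - begin);
    for (std::size_t i = 0; i < term.size(); ++i) {
        if (i >= range_len) {
            return static_cast<int>(i);  // the memory range ended first
        }
        unsigned char a = static_cast<unsigned char>(term[i]);
        unsigned char b = static_cast<unsigned char>(begin[i]);
        if (ignore_case) {
            a = static_cast<unsigned char>(std::toupper(a));
            b = static_cast<unsigned char>(std::toupper(b));
        }
        if (a != b) {
            return static_cast<int>(i);
        }
    }
    // The term ended; any remaining bytes in the range count as a difference.
    return (term.size() == range_len) ? -1 : static_cast<int>(term.size());
}
```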

◆ x_DiffCharLease()

int CSeqDBIsam::x_DiffCharLease ( const string &  term_in,
CSeqDBFileMemMap &  lease,
const string &  file_name,
TIndx  file_length,
Uint4  at_least,
TIndx  KeyOffset,
bool  ignore_case 
)
private

Find the first character to differ in two strings.

This finds the index of the first character to differ in a meaningful way between two strings. One of the strings is a term that is passed in; the other is assumed to be located in the ISAM table, a lease to which is passed to this function.

Parameters
term_in      The key string to compare against.
lease        A lease to hold the data in the ISAM table file.
file_name    The name of the ISAM file to work with.
file_length  The length of the file named by file_name.
at_least     Try to get at least this many bytes.
KeyOffset    The location of the key in the leased file.
ignore_case  Whether to treat the search as case-insensitive.
Returns
The position of the first difference.

Definition at line 516 of file seqdbisam.cpp.

References file_name, CSeqDBFileMemMap::GetFileDataPtr(), int, result, and x_DiffChar().

Referenced by x_DiffSample().

◆ x_DiffSample()

int CSeqDBIsam::x_DiffSample ( const string &  term_in,
Uint4  SampleNum,
TIndx &  KeyOffset 
)
private

Find the first character to differ in two strings.

This finds the index of the first character to differ between two strings. The first string is provided, the second is one of the sample strings, indicated by the index of that sample value.

Parameters
term_in    The key string to compare against.
SampleNum  Selects which sample to compare with.
KeyOffset  The returned offset of the key that was used.

Definition at line 863 of file seqdbisam.cpp.

References CSeqDBFileMemMap::GetFileDataPtr(), m_IndexFileLength, m_IndexFname, m_IndexLease, m_KeySampleOffset, m_MaxLineSize, m_NumSamples, m_PageSize, MEMORY_ONLY_PAGE_SIZE, SeqDB_GetStdOrd(), and x_DiffCharLease().

Referenced by x_StringSearch().

◆ x_ExtractAllData()

void CSeqDBIsam::x_ExtractAllData ( const string &  term_in,
TIndx  sample_index,
vector< TIndx > &  indices_out,
vector< string > &  keys_out,
vector< string > &  data_out 
)
private

Find matches in the given page of a string ISAM file.

This searches the area around a specific page of the data file to find all matches to term_in. The results are returned in vectors. This method may search multiple pages.

Parameters
term_in       The key string to compare against.
sample_index  Selects which page to search.
indices_out   The index of each match.
keys_out      The key of each match.
data_out      The value of each match.

Definition at line 688 of file seqdbisam.cpp.

References m_NumSamples, m_PageSize, s_SeqDBIsam_NullifyEOLs(), x_DiffChar(), x_ExtractPageData(), and x_LoadPage().

Referenced by x_StringSearch().

◆ x_ExtractData()

void CSeqDBIsam::x_ExtractData ( const char *  key_start,
const char *  entry_end,
vector< string > &  key_out,
vector< string > &  data_out 
)
private

Extract the data from a key-value pair in memory.

Given pointers to a location in mapped memory, and the end of the mapped data, this finds the key and data values for the object at that location.

Parameters
key_start  A pointer to the beginning of the key-value pair in memory.
entry_end  A pointer to the end of the mapped area of memory.
key_out    A string holding the ISAM entry's key.
data_out   A string holding the ISAM entry's value.

Definition at line 793 of file seqdbisam.cpp.

References ISAM_DATA_CHAR, and s_SeqDBIsam_NullifyEOLs().

Referenced by x_ExtractPageData(), and x_FindIndexBounds().
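
Splitting one key-value entry out of a mapped buffer can be sketched as follows. The layout used here is an assumption for illustration: the key and value are separated by a single marker byte (kDataSep, assumed to play the role of ISAM_DATA_CHAR with value 0x02), and an entry ends at a NUL or end-of-line character. ExtractEntry is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Assumed separator between key and value; stands in for ISAM_DATA_CHAR.
const char kDataSep = '\x02';

// Returns (key, value) for the entry starting at key_start. The scan stops
// at the end of the mapped area or at the first NUL / end-of-line byte.
std::pair<std::string, std::string>
ExtractEntry(const char* key_start, const char* map_end)
{
    assert(key_start <= map_end);
    const char* p = key_start;
    const char* sep = nullptr;
    while (p < map_end && *p != '\0' && *p != '\n' && *p != '\r') {
        if (*p == kDataSep && sep == nullptr) {
            sep = p;  // remember where the key ends
        }
        ++p;
    }
    if (sep == nullptr) {
        return { std::string(key_start, p), std::string() };  // key with no value
    }
    return { std::string(key_start, sep), std::string(sep + 1, p) };
}
```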

◆ x_ExtractPageData()

void CSeqDBIsam::x_ExtractPageData ( const string &  term_in,
TIndx  page_index,
const char *  beginp,
const char *  endp,
vector< TIndx > &  indices_out,
vector< string > &  keys_out,
vector< string > &  data_out 
)
private

Find matches in the given memory area of a string ISAM file.

This searches the specified section of memory to find all matches to term_in. The results are returned in vectors.

Parameters
term_in      The key string to compare against.
page_index   Selects which page to search.
beginp       Pointer to the start of the memory area.
endp         Pointer to the end of the memory area.
indices_out  The index of each match.
keys_out     The key of each match.
data_out     The value of each match.

Definition at line 634 of file seqdbisam.cpp.

References s_SeqDBIsam_NullifyEOLs(), x_DiffChar(), and x_ExtractData().

Referenced by x_ExtractAllData(), and x_StringSearch().

◆ x_FindIndexBounds()

void CSeqDBIsam::x_FindIndexBounds ( )
private

◆ x_FindInNegativeList() [1/2]

bool CSeqDBIsam::x_FindInNegativeList ( CSeqDBNegativeList &  ids,
int &  index,
Int8  key,
bool  use_tis 
)
inline private

Find ID in the negative GI list using PBS.

Use parabolic binary search to find the specified ID in the negative ID list. The 'index' value is the index to start the search at (this must refer to an index at or before the target data if the search is to succeed). Whether the search was successful or not, the index will be moved forward past any elements with values less than 'key'.

Parameters
ids  Negative ID list. [in|out]
index  Index into negative ID list. [in|out]
key  Key for which to search. [in]
use_tis  If true, search for a TI, else for a GI. [in]
Returns
True if the search found the ID.

Definition at line 1428 of file seqdbisam.hpp.

References ncbi::grid::netcache::search::fields::key, CSeqDBNegativeList::ListSize(), and x_GetId().

Referenced by x_SearchNegativeMulti(), and x_SearchNegativeMultiSeq().
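
A minimal sketch of this kind of forward-only search over a plain sorted vector, assuming "parabolic binary search" means stepping forward with a jump that doubles while the probed element is still below the key. FindInSortedList is a hypothetical helper, not part of CSeqDBIsam, and it reads a std::vector where the real method goes through x_GetId().

```cpp
#include <cstdint>
#include <vector>

// Advance `index` through the sorted list `ids` until ids[index] >= key
// (or the end is reached), then report whether ids[index] == key.
bool FindInSortedList(const std::vector<std::int64_t>& ids, int& index,
                      std::int64_t key)
{
    const int size = static_cast<int>(ids.size());
    while (index < size && ids[index] < key) {
        ++index;
        // Gallop: keep doubling the jump while the probe is still below key.
        int jump = 2;
        while (index + jump < size && ids[index + jump] < key) {
            index += jump;
            jump += jump;
        }
    }
    return index < size && ids[index] == key;
}
```

Because the index only moves forward, a caller that looks up keys in ascending order can reuse the same index across calls and never rescans the earlier part of the list.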

◆ x_FindInNegativeList() [2/2]

bool CSeqDBIsam::x_FindInNegativeList ( CSeqDBNegativeList ids,
int index,
string  key 
)
inline private

◆ x_GetDataElement() [1/2]

void CSeqDBIsam::x_GetDataElement ( const void *  dpage,
int  index,
Int8 key,
int data 
)
inline private

Get a particular data element from a data page.

Parameters
dpage  A pointer to that page in memory. [in]
index  The index of the element to fetch. [in]
key  The returned key. [out]
data  The returned value. [out]

Definition at line 1509 of file seqdbisam.hpp.

References ncbi::grid::netcache::search::fields::key, m_TermSize, x_GetNumericData(), and x_GetNumericKey().

Referenced by x_FindIndexBounds(), and x_SearchNegativeMulti().

◆ x_GetDataElement() [2/2]

void CSeqDBIsam::x_GetDataElement ( const void *  dpage,
int  index,
string key,
int data 
)
inline private

◆ x_GetId() [1/2]

static string CSeqDBIsam::x_GetId ( CSeqDBNegativeList ids,
int  index 
)
inline static private

Definition at line 1158 of file seqdbisam.hpp.

References CSeqDBNegativeList::GetSi().

◆ x_GetId() [2/2]

static Int8 CSeqDBIsam::x_GetId ( CSeqDBNegativeList ids,
int  index,
bool  use_tis 
)
inline static private

Fetch a GI or TI from a GI list.

Definition at line 1151 of file seqdbisam.hpp.

References CSeqDBNegativeList::GetGi(), CSeqDBNegativeList::GetTi(), and GI_TO.

Referenced by x_FindInNegativeList().

◆ x_GetIndexKeyOffset()

CSeqDBIsam::TIndx CSeqDBIsam::x_GetIndexKeyOffset ( TIndx  sample_offset,
Uint4  sample_num 
)
private

Get the offset of the specified sample.

For string ISAM indices, the index file contains a table of offsets of the index file samples. This function gets the offset of the specified sample in the index file's table.

Parameters
sample_offset  The offset into the file of the set of samples.
sample_num  The index of the sample to get.
locked  This thread's lock holder object.
Returns
The offset of the sample in the index file.

Definition at line 823 of file seqdbisam.cpp.

References CSeqDBFileMemMap::GetFileDataPtr(), m_IndexLease, and SeqDB_GetStdOrd().

Referenced by x_StringSearch().

◆ x_GetIndexString()

void CSeqDBIsam::x_GetIndexString ( TIndx  key_offset,
int  length,
string prefix,
bool  trim_to_null 
)
private

Read a string from the index file.

Given an offset into the index file, and a maximum length, this function returns the bytes in a string object.

Parameters
key_offset  The offset into the file of the first byte.
length  The maximum number of bytes to get.
prefix  The string in which to return the data.
trim_to_null  Whether to search for a null and return only that much data.
locked  This thread's lock holder object.

Definition at line 836 of file seqdbisam.cpp.

References CSeqDBFileMemMap::GetFileDataPtr(), i, m_IndexLease, and str().

Referenced by x_StringSearch().
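
The effect of trim_to_null can be illustrated with a small sketch over an in-memory buffer. ReadIndexString is a hypothetical free function; unlike the real method it returns the string by value instead of filling an output parameter, and it takes a raw pointer instead of using m_IndexLease.

```cpp
#include <cstddef>
#include <string>

// Copy up to `length` bytes starting at data + key_offset into a string.
// If trim_to_null is true, stop at the first NUL byte within that range.
// The caller must guarantee that the whole range is readable.
std::string ReadIndexString(const char* data, std::size_t key_offset,
                            int length, bool trim_to_null)
{
    if (length < 0) {
        length = 0;  // treat a negative length as "nothing to read"
    }
    const char* start = data + key_offset;
    std::size_t n = static_cast<std::size_t>(length);
    if (trim_to_null) {
        for (std::size_t i = 0; i < n; ++i) {
            if (start[i] == '\0') {
                n = i;
                break;
            }
        }
    }
    return std::string(start, n);
}
```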

◆ x_GetNumericData()

int CSeqDBIsam::x_GetNumericData ( const void *  p)
inline private

◆ x_GetNumericKey()

Uint8 CSeqDBIsam::x_GetNumericKey ( const void *  p)
inline private

◆ x_GetNumericSample()

void CSeqDBIsam::x_GetNumericSample ( CSeqDBFileMemMap index_lease,
int  index,
Int8 key_out,
int data_out 
)
inline private

Get a sample key value from a numeric index.

Given the index of a sample value, this code will get the key. If data values are stored in the index file, the corresponding data value will also be returned. The offset of the data block is computed and returned as well.

Parameters
index_lease  The memory lease to use with the index file.
index  The index of the sample to get.
key_out  The key found will be returned here.
data_out  If an exact match, the data found will be returned here.

Definition at line 1315 of file seqdbisam.hpp.

References CSeqDBFileMemMap::GetFileDataPtr(), m_KeySampleOffset, m_TermSize, x_GetNumericData(), and x_GetNumericKey().

◆ x_GetPageNumElements()

Int4 CSeqDBIsam::x_GetPageNumElements ( Int4  SampleNum,
Int4 Start 
)
private

Determine the number of elements in the data page.

The number of elements is determined based on whether this is the last page and the configured page size.

Parameters
SampleNum  Which data page will be searched.
Start  The returned index of the start of the page.
Returns
The number of elements in this data page.

Definition at line 123 of file seqdbisam.cpp.

References m_NumSamples, m_NumTerms, and m_PageSize.

Referenced by x_MapDataPage(), and x_SearchDataNumeric().
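
The page-size rule can be sketched as follows, assuming every page except the last holds exactly the configured page size and the last page holds whatever terms remain. PageNumElements is a hypothetical free function; its page_size, num_samples and num_terms parameters stand in for the members m_PageSize, m_NumSamples and m_NumTerms.

```cpp
// Returns the number of elements on page `sample_num` and stores the index
// of the page's first element in `start`.
int PageNumElements(int sample_num, int page_size, int num_samples,
                    int num_terms, int& start)
{
    start = sample_num * page_size;
    if (sample_num + 1 == num_samples) {
        return num_terms - start;  // last page: whatever remains
    }
    return page_size;              // every other page is full
}
```

For example, with 10 terms, a page size of 4 and 3 samples, pages 0 and 1 hold 4 elements each and page 2 holds the remaining 2.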

◆ x_IdentToOid()

bool CSeqDBIsam::x_IdentToOid ( Int8  id,
TOid oid 
)
private

Numeric identifier lookup.

Given a numeric identifier, this routine finds the OID.

Parameters
id  The GI or PIG identifier to look up.
oid  The returned oid.
locked  The lock holder object for this thread.
Returns
true if the identifier was found.

Definition at line 1221 of file seqdbisam.cpp.

References eNoError, and x_NumericSearch().

Referenced by IdToOid(), and PigToOid().

◆ x_InitSearch()

CSeqDBIsam::EErrorCode CSeqDBIsam::x_InitSearch ( void  )
private

Initialize the search object.

The first identifier search sets up the object by calling this function, which reads the metadata from the index file and sets all the fields needed for ISAM lookups.

Parameters
locked  The lock holder object for this thread.
Returns
A non-zero error on failure, or eNoError on success.

Definition at line 59 of file seqdbisam.cpp.

References eBadType, eBadVersion, eNoError, eNumeric, eNumericLongId, eWrongFile, CSeqDBFileMemMap::GetFileDataPtr(), CSeqDBAtlas::GetFileSizeL(), ISAM_VERSION, m_Atlas, m_DataFileLength, m_DataFname, m_IdxOption, m_IndexFileLength, m_IndexFname, m_IndexLease, m_Initialized, m_KeySampleOffset, m_LongId, m_MaxLineSize, m_NumSamples, m_NumTerms, m_PageSize, m_TermSize, m_Type, MEMORY_ONLY_PAGE_SIZE, and SeqDB_GetStdOrd().

Referenced by CSeqDBIsam().

◆ x_LoadData() [1/3]

template<>
void CSeqDBIsam::x_LoadData ( CSeqDBFileMemMap lease,
vector< string > &  keys,
vector< int > &  vals,
int  num_keys,
TIndx  begin 
)
inline private

Definition at line 1406 of file seqdbisam.hpp.

References CSeqDBFileMemMap::GetFileDataPtr(), and NStr::StringToUInt().

◆ x_LoadData() [2/3]

template<class T >
void CSeqDBIsam::x_LoadData ( CSeqDBFileMemMap lease,
vector< T > &  keys,
vector< int > &  vals,
int  num_keys,
TIndx  begin 
)
inline private

Load and extract a data page into array at once.

Definition at line 519 of file seqdbisam.hpp.

References CSeqDBFileMemMap::GetFileDataPtr(), m_TermSize, T, x_GetNumericData(), and x_GetNumericKey().

Referenced by x_SearchNegativeMultiSeq(), and x_TranslateGiList().

◆ x_LoadData() [3/3]

template<>
void CSeqDBIsam::x_LoadData ( CSeqDBFileMemMap lease,
vector< TGi > &  keys,
vector< int > &  vals,
int  num_keys,
TIndx  begin 
)
inline private

Load and extract a data page into array at once.

Definition at line 1353 of file seqdbisam.hpp.

References CSeqDBFileMemMap::GetFileDataPtr(), GI_FROM, m_TermSize, x_GetNumericData(), and x_GetNumericKey().

◆ x_LoadIndex() [1/3]

template<>
void CSeqDBIsam::x_LoadIndex ( CSeqDBFileMemMap lease,
vector< string > &  keys,
vector< TIndx > &  offs 
)
inline private

◆ x_LoadIndex() [2/3]

template<class T >
void CSeqDBIsam::x_LoadIndex ( CSeqDBFileMemMap lease,
vector< T > &  keys,
vector< TIndx > &  offs 
)
inline private

Load and extract all index samples into array at once.

Definition at line 500 of file seqdbisam.hpp.

References CSeqDBFileMemMap::GetFileDataPtr(), m_KeySampleOffset, m_NumSamples, m_NumTerms, m_PageSize, m_TermSize, T, and x_GetNumericKey().

Referenced by x_SearchNegativeMultiSeq(), and x_TranslateGiList().

◆ x_LoadIndex() [3/3]

template<>
void CSeqDBIsam::x_LoadIndex ( CSeqDBFileMemMap lease,
vector< TGi > &  keys,
vector< TIndx > &  offs 
)
inline private

Load and extract all index samples into array at once.

Definition at line 1333 of file seqdbisam.hpp.

References CSeqDBFileMemMap::GetFileDataPtr(), GI_FROM, m_KeySampleOffset, m_NumSamples, m_NumTerms, m_PageSize, m_TermSize, and x_GetNumericKey().

◆ x_LoadPage()

void CSeqDBIsam::x_LoadPage ( TIndx  SampleNum1,
TIndx  SampleNum2,
const char **  beginp,
const char **  endp 
)
private

Map a page into memory.

Given two indices, this method maps into memory the area starting at the beginning of the first index and extending to the end of the other. (If the indices are equal, only one page would be mapped.)

Parameters
SampleNum1  The first page index.
SampleNum2  The second page index.
beginp  The returned starting offset of the mapped area.
endp  The returned ending offset of the mapped area.
locked  This thread's lock holder object.

Definition at line 899 of file seqdbisam.cpp.

References _ASSERT, CSeqDBFileMemMap::GetFileDataPtr(), m_DataFname, m_DataLease, m_IndexLease, m_KeySampleOffset, and SeqDB_GetStdOrd().

Referenced by x_ExtractAllData(), x_FindIndexBounds(), and x_StringSearch().

◆ x_LoadStringData()

void CSeqDBIsam::x_LoadStringData ( const char *  begin,
string key,
int data 
)
inline private

Definition at line 1267 of file seqdbisam.hpp.

References ncbi::grid::netcache::search::fields::key, string, and NStr::StringToUInt().

Referenced by x_GetDataElement().

◆ x_Lower()

static void CSeqDBIsam::x_Lower ( string s)
inline static private

Converts a string to lower case.

Definition at line 1143 of file seqdbisam.hpp.

References i, and tolower().

Referenced by x_FindIndexBounds(), and x_OutOfBounds().

◆ x_MakeFilenames()

void CSeqDBIsam::x_MakeFilenames ( const string dbname,
char  prot_nucl,
char  file_ext_char,
string index_name,
string data_name 
)
static private

Make filenames for ISAM file.

Parameters
dbname  Base name of the database volume. [in]
prot_nucl  'p' or 'n' for protein or nucleotide. [in]
file_ext_char  Identifier symbol; 's' for string, etc. [in]
index_name  Filename of ISAM index file. [out]
data_name  Filename of ISAM data file. [out]

Definition at line 1172 of file seqdbisam.cpp.

References dbname(), isalpha(), and NCBI_THROW.

Referenced by CSeqDBIsam(), and IndexExists().
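
As an illustration of how the two filenames could be derived from the inputs, here is a sketch that assumes the pattern dbname + '.' + prot_nucl + file_ext_char followed by 'i' for the index file and 'd' for the data file. That suffix scheme is an assumption and is not stated on this page. MakeIsamFilenames is a hypothetical function, and std::invalid_argument stands in for the NCBI_THROW call that the real method uses.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Build the index and data filenames for one ISAM file pair.
void MakeIsamFilenames(const std::string& dbname, char prot_nucl,
                       char file_ext_char,
                       std::string& index_name, std::string& data_name)
{
    // Reject an empty base name or non-alphabetic selector characters.
    if (dbname.empty() ||
        !std::isalpha(static_cast<unsigned char>(prot_nucl)) ||
        !std::isalpha(static_cast<unsigned char>(file_ext_char))) {
        throw std::invalid_argument("argument not valid");
    }
    const std::string stem = dbname + '.' + prot_nucl + file_ext_char;
    index_name = stem + 'i';  // assumed suffix for the index file
    data_name  = stem + 'd';  // assumed suffix for the data file
}
```

Under these assumptions, a protein volume named "nr" with file_ext_char 'n' would map to "nr.pni" and "nr.pnd".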

◆ x_MapDataPage()

void CSeqDBIsam::x_MapDataPage ( int  sample_index,
int start,
int num_elements,
const void **  data_page_begin 
)
inline private

Map a data page.

The caller provides an index into the sample file. The page of data is mapped, and a pointer is returned. In addition, the starting index (start) of the data is returned, along with the number of elements in that page.

Parameters
sample_index  Index into the index (i.e. pni) file. [in]
start  Index of first element of the page. [out]
num_elements  Number of elements in the page. [out]
data_page_begin  Pointer to the returned data. [out]
locked  The lock holder object for this thread. [out]

Definition at line 1493 of file seqdbisam.hpp.

References CSeqDBFileMemMap::GetFileDataPtr(), m_DataFname, m_DataLease, m_TermSize, and x_GetPageNumElements().

Referenced by x_FindIndexBounds(), and x_SearchNegativeMulti().

◆ x_NumericSearch()

CSeqDBIsam::EErrorCode CSeqDBIsam::x_NumericSearch ( Int8  Number,
int Data,
Uint4 Index 
)
private

Numeric identifier lookup.

Given a numeric identifier, this routine finds the OID.

Parameters
Number  The GI or PIG identifier to look up.
Data  The returned OID.
Index  The returned location in the ISAM table, or NULL.
locked  The lock holder object for this thread.
Returns
A non-zero error on failure, or eNoError on success.

Definition at line 498 of file seqdbisam.cpp.

References done, x_SearchDataNumeric(), and x_SearchIndexNumeric().

Referenced by x_IdentToOid().
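
The two-step structure (an index search followed by a data-page search) can be sketched with in-memory vectors. This is an illustration of a sampled two-level lookup in general, not the actual file layout: NumericIsam, BuildIsam and NumericLookup are hypothetical names, and the sketch always descends into the data page even when the sample itself matches.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a numeric ISAM index/data file pair.
struct NumericIsam {
    std::vector<std::pair<std::int64_t, int>> terms;  // sorted (key, OID) pairs
    std::vector<std::int64_t> samples;                // first key of each page
    std::size_t page_size = 1;
};

// Sort the terms and record the first key of every page as a sample.
NumericIsam BuildIsam(std::vector<std::pair<std::int64_t, int>> terms,
                      std::size_t page_size)
{
    NumericIsam isam;
    isam.page_size = page_size > 0 ? page_size : 1;
    std::sort(terms.begin(), terms.end());
    isam.terms = std::move(terms);
    for (std::size_t i = 0; i < isam.terms.size(); i += isam.page_size) {
        isam.samples.push_back(isam.terms[i].first);
    }
    return isam;
}

// Returns true and sets oid if key is present; leaves oid untouched otherwise.
bool NumericLookup(const NumericIsam& isam, std::int64_t key, int& oid)
{
    // Index step: find the last page whose first key is <= key.
    auto next = std::upper_bound(isam.samples.begin(), isam.samples.end(), key);
    if (next == isam.samples.begin()) {
        return false;  // key sorts before the first page
    }
    const std::size_t page =
        static_cast<std::size_t>(next - isam.samples.begin()) - 1;

    // Data step: binary search inside that one page only.
    const std::size_t lo = page * isam.page_size;
    const std::size_t hi = std::min(lo + isam.page_size, isam.terms.size());
    auto first = isam.terms.begin() + static_cast<std::ptrdiff_t>(lo);
    auto last  = isam.terms.begin() + static_cast<std::ptrdiff_t>(hi);
    auto hit = std::lower_bound(first, last, key,
        [](const std::pair<std::int64_t, int>& term, std::int64_t k) {
            return term.first < k;
        });
    if (hit == last || hit->first != key) {
        return false;
    }
    oid = hit->second;
    return true;
}
```

Only the samples and one data page are touched per lookup, which is the point of keeping a sparse index in front of a larger sorted data file.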

◆ x_OutOfBounds() [1/2]

bool CSeqDBIsam::x_OutOfBounds ( Int8  key)
private

Check whether a numeric key is within this volume's bounds.

Parameters
key  The key for which to do the check.
locked  The lock holder object for this thread.

Definition at line 1584 of file seqdbisam.cpp.

References _ASSERT, eNumeric, CSeqDBIsam::SIsamKey::IsSet(), ncbi::grid::netcache::search::fields::key, m_FirstKey, m_LastKey, m_Type, CSeqDBIsam::SIsamKey::OutsideFirstBound(), and CSeqDBIsam::SIsamKey::OutsideLastBound().

Referenced by x_SearchIndexNumeric(), and x_StringSearch().

◆ x_OutOfBounds() [2/2]

bool CSeqDBIsam::x_OutOfBounds ( string  key)
private

Check whether a string key is within this volume's bounds.

Parameters
key  The key for which to do the check.
locked  The lock holder object for this thread.

Definition at line 1603 of file seqdbisam.cpp.

References _ASSERT, eString, CSeqDBIsam::SIsamKey::IsSet(), ncbi::grid::netcache::search::fields::key, m_FirstKey, m_LastKey, m_Type, CSeqDBIsam::SIsamKey::OutsideFirstBound(), CSeqDBIsam::SIsamKey::OutsideLastBound(), and x_Lower().

◆ x_SearchDataNumeric()

CSeqDBIsam::EErrorCode CSeqDBIsam::x_SearchDataNumeric ( Int8  Number,
int Data,
Uint4 Index,
Int4  SampleNum 
)
private

Data file search.

Given a numeric identifier, this routine finds the OID in the data file.

Parameters
Number  The GI or PIG identifier to look up.
Data  The returned OID.
Index  The returned location in the ISAM table, or NULL.
SampleNum  The location of the page in the data file to search.
locked  The lock holder object for this thread.
Returns
A non-zero error on failure, or eNoError on success.

Definition at line 421 of file seqdbisam.cpp.

References _ASSERT, eNoError, eNotFound, eNumericNoData, first(), CSeqDBFileMemMap::GetFileDataPtr(), last(), m_DataFname, m_DataLease, m_TermSize, m_Type, NULL, x_GetNumericData(), x_GetNumericKey(), and x_GetPageNumElements().

Referenced by x_NumericSearch().

◆ x_SearchIndexNumeric()

CSeqDBIsam::EErrorCode CSeqDBIsam::x_SearchIndexNumeric ( Int8  Number,
int Data,
Uint4 Index,
Int4 SampleNum,
bool done 
)
private

Index file search.

Given a numeric identifier, this routine finds the OID or the page in the data file where the OID can be found.

Parameters
Number  The GI or PIG identifier to look up.
Data  The returned OID.
Index  The returned location in the ISAM table, or NULL.
SampleNum  The returned location in the data file if not done.
done  true if the OID was found.
locked
Returns
A non-zero error on failure, or eNoError on success.

Definition at line 140 of file seqdbisam.cpp.

References _ASSERT, done, eInitFailed, eNoError, eNotFound, eNumericNoData, CSeqDBFileMemMap::GetFileDataPtr(), m_IndexFname, m_IndexLease, m_Initialized, m_KeySampleOffset, m_NumSamples, m_PageSize, m_TermSize, m_Type, NULL, x_GetNumericData(), x_GetNumericKey(), and x_OutOfBounds().

Referenced by x_NumericSearch().

◆ x_SearchNegativeMulti()

void CSeqDBIsam::x_SearchNegativeMulti ( int  vol_start,
int  vol_end,
CSeqDBNegativeList gis,
bool  use_tis 
)
private

Negative ID List Translation.

Given a Negative ID list, this routine turns on the bits for the OIDs found in the volume but not in the negated ID list.

Parameters
vol_start  The starting OID for this ISAM file's database volume.
vol_end  The ending OID for this ISAM file's database volume.
gis  The Negative ID list to translate.
use_tis  Iterate over TIs if true (GIs otherwise).
locked  The lock holder object for this thread.

Definition at line 219 of file seqdbisam.cpp.

References _ASSERT, CSeqDBNegativeList::AddIncludedOid(), CSeqDBNegativeList::AddVisibleOid(), eNumericNoData, CSeqDBNegativeList::GetNumGis(), CSeqDBNegativeList::GetNumTis(), i, m_Initialized, m_NumSamples, m_Type, NCBI_THROW, x_FindInNegativeList(), x_GetDataElement(), and x_MapDataPage().

Referenced by IdsToOids().
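
The idea of "turning on bits for OIDs not in the negated list" can be sketched as a single merge pass over two sorted sequences. This is a simplified illustration: IncludeOidsNotListed is a hypothetical function, the volume is modeled as a vector of (ID, OID) pairs, and the distinction the real code draws between AddIncludedOid() and AddVisibleOid() is not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// terms:        (ID, volume-relative OID) pairs sorted by ID.
// negative_ids: sorted IDs to exclude.
// Returns one flag per OID in [0, num_oids); true means the OID is kept.
std::vector<bool> IncludeOidsNotListed(
    const std::vector<std::pair<std::int64_t, int>>& terms,
    const std::vector<std::int64_t>& negative_ids,
    int num_oids)
{
    const std::size_t count =
        num_oids > 0 ? static_cast<std::size_t>(num_oids) : 0;
    std::vector<bool> included(count, false);
    std::size_t n = 0;  // cursor into negative_ids; only moves forward
    for (const auto& term : terms) {
        while (n < negative_ids.size() && negative_ids[n] < term.first) {
            ++n;
        }
        const bool listed =
            n < negative_ids.size() && negative_ids[n] == term.first;
        if (!listed && term.second >= 0 &&
            static_cast<std::size_t>(term.second) < count) {
            included[static_cast<std::size_t>(term.second)] = true;
        }
    }
    return included;
}
```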

◆ x_SearchNegativeMultiSeq()

void CSeqDBIsam::x_SearchNegativeMultiSeq ( int  vol_start,
int  vol_end,
CSeqDBNegativeList gis 
)
private

◆ x_SparseStringToOids()

bool CSeqDBIsam::x_SparseStringToOids ( const string acc,
vector< int > &  oids,
bool  adjusted 
)
private

Look up a string in a sparse table.

This does string lookup in a sparse string table. There is no support (code) for this since there are currently no examples of this kind of table to test against.

Parameters
acc  The string to look up.
oids  The returned oids found by the search.
adjusted  Whether the key was changed by the identifier simplification logic.
locked  The lock holder object for this thread.
Returns
true if results were found

Definition at line 1377 of file seqdbisam.cpp.

References _TROUBLE.

◆ x_StringSearch()

CSeqDBIsam::EErrorCode CSeqDBIsam::x_StringSearch ( const string term_in,
vector< string > &  term_out,
vector< string > &  value_out,
vector< TIndx > &  index_out 
)
private

String identifier lookup.

Given a string identifier, this routine finds the OID(s).

Parameters
term_in  The string identifier to look up.
term_out  The returned keys (as strings).
value_out  The returned oids (as strings).
index_out  The locations where the matches were found.
locked  The lock holder object for this thread.
Returns
A non-zero error on failure, or eNoError on success.

Definition at line 934 of file seqdbisam.cpp.

References NStr::CompareNocase(), eInitFailed, eNoError, eNotFound, CSeqDBFileMemMap::GetFileDataPtr(), int, m_IndexFileLength, m_IndexLease, m_Initialized, m_KeySampleOffset, m_MaxLineSize, m_NumSamples, m_PageSize, MEMORY_ONLY_PAGE_SIZE, prefix, tolower(), x_DiffSample(), x_ExtractAllData(), x_ExtractPageData(), x_GetIndexKeyOffset(), x_GetIndexString(), x_LoadPage(), and x_OutOfBounds().

Referenced by HashToOids(), and StringToOids().

◆ x_TestNumericSample()

int CSeqDBIsam::x_TestNumericSample ( CSeqDBFileMemMap index_lease,
int  index,
Int8  key_in,
Int8 key_out,
int data_out 
)
inline private

Test a sample key value from a numeric index.

This method reads the key value of an index file sample element from a numeric index file. The calling code should ensure that the data is mapped in, and that the file type is correct. The key value found will be compared to the search key. This method will return 0 for an exact match, -1 if the key is less than the sample, or 1 if the key is greater. If the match is exact, it will also return the data in data_out.

Parameters
index_lease  The memory lease to use with the index file.
index  The index of the sample to get.
key_in  The key for which the user is searching.
key_out  The key found will be returned here.
data_out  If an exact match, the data found will be returned here.
Returns
-1, 0, or 1 when key_in is less than, equal to, or greater than key_out, respectively.

Definition at line 1284 of file seqdbisam.hpp.

References CSeqDBFileMemMap::GetFileDataPtr(), m_KeySampleOffset, m_TermSize, x_GetNumericData(), and x_GetNumericKey().
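
The return convention can be shown with a small sketch that takes the sample as a plain struct instead of reading it from a mapped index file. Sample and TestSample are hypothetical names introduced for illustration.

```cpp
#include <cstdint>

// Hypothetical in-memory sample: a key plus its associated data value.
struct Sample {
    std::int64_t key;
    int data;
};

// Compare key_in against the sample's key. key_out always receives the
// sample key; data_out is written only when the match is exact.
int TestSample(const Sample& sample, std::int64_t key_in,
               std::int64_t& key_out, int& data_out)
{
    key_out = sample.key;
    if (key_in < key_out) {
        return -1;
    }
    if (key_in > key_out) {
        return 1;
    }
    data_out = sample.data;
    return 0;
}
```

A binary search over the samples can use the sign of the result to decide which half to keep and stop as soon as it sees 0.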

◆ x_TranslateGiList()

template<class T >
void CSeqDBIsam::x_TranslateGiList ( int  vol_start,
CSeqDBGiList gis 
)
inline private

GiList Translation.

Given a GI list, this routine finds the OID for each ID in the list that does not already have a translation.

Parameters
vol_start  The starting OID for this ISAM file's database volume.
gis  The GI list to translate.
locked  The lock holder object for this thread.

Definition at line 549 of file seqdbisam.hpp.

References CSeqDBGiList::eGi, CSeqDBGiList::GetKey(), CSeqDBGiList::GetSize(), CSeqDBGiList::InsureOrder(), m_DataLease, m_IndexLease, m_Initialized, m_NumSamples, m_NumTerms, m_PageSize, NCBI_THROW, T, x_LoadData(), and x_LoadIndex().

Member Data Documentation

◆ m_Atlas

CSeqDBAtlas& CSeqDBIsam::m_Atlas
private

The memory management layer.

Definition at line 1180 of file seqdbisam.hpp.

Referenced by x_InitSearch().

◆ m_DataFileLength

TIndx CSeqDBIsam::m_DataFileLength
private

The length of the ISAM data file.

Definition at line 1203 of file seqdbisam.hpp.

Referenced by x_InitSearch().

◆ m_DataFname

string CSeqDBIsam::m_DataFname
private

The filename of the ISAM data file.

Definition at line 1197 of file seqdbisam.hpp.

Referenced by CSeqDBIsam(), x_InitSearch(), x_LoadPage(), x_MapDataPage(), and x_SearchDataNumeric().

◆ m_DataLease

CSeqDBFileMemMap CSeqDBIsam::m_DataLease
private

A persistent lease on the ISAM data file.

Definition at line 1190 of file seqdbisam.hpp.

Referenced by CSeqDBIsam(), UnLease(), x_LoadPage(), x_MapDataPage(), x_SearchDataNumeric(), x_SearchNegativeMultiSeq(), and x_TranslateGiList().

◆ m_FileStart

char* CSeqDBIsam::m_FileStart
private

Pointer to index file if no memmap.

Definition at line 1233 of file seqdbisam.hpp.

◆ m_FirstKey

SIsamKey CSeqDBIsam::m_FirstKey
private

First volume key.

Definition at line 1242 of file seqdbisam.hpp.

Referenced by GetIdBounds(), x_FindIndexBounds(), and x_OutOfBounds().

◆ m_FirstOffset

Int4 CSeqDBIsam::m_FirstOffset
private

First and last offsets of last page.

Definition at line 1236 of file seqdbisam.hpp.

◆ m_IdentType

ESeqDBIdType CSeqDBIsam::m_IdentType
private

The type of identifier this class uses.

Definition at line 1183 of file seqdbisam.hpp.

Referenced by HashToOids(), IdsToOids(), IdToOid(), PigToOid(), and StringToOids().

◆ m_IdxOption

Int4 CSeqDBIsam::m_IdxOption
private

Options set by upper layer.

Definition at line 1221 of file seqdbisam.hpp.

Referenced by x_InitSearch().

◆ m_IndexFileLength

TIndx CSeqDBIsam::m_IndexFileLength
private

The length of the ISAM index file.

Definition at line 1206 of file seqdbisam.hpp.

Referenced by x_DiffSample(), x_InitSearch(), and x_StringSearch().

◆ m_IndexFname

string CSeqDBIsam::m_IndexFname
private

The filename of the ISAM index file.

Definition at line 1200 of file seqdbisam.hpp.

Referenced by CSeqDBIsam(), x_DiffSample(), x_InitSearch(), and x_SearchIndexNumeric().

◆ m_IndexLease

CSeqDBFileMemMap CSeqDBIsam::m_IndexLease
private

◆ m_Initialized

bool CSeqDBIsam::m_Initialized
private

◆ m_KeySampleOffset

TIndx CSeqDBIsam::m_KeySampleOffset
private

◆ m_LastKey

SIsamKey CSeqDBIsam::m_LastKey
private

Last volume key.

Definition at line 1245 of file seqdbisam.hpp.

Referenced by GetIdBounds(), x_FindIndexBounds(), and x_OutOfBounds().

◆ m_LastOffset

Int4 CSeqDBIsam::m_LastOffset
private

First and last offsets of last page.

Definition at line 1239 of file seqdbisam.hpp.

◆ m_LongId

bool CSeqDBIsam::m_LongId
private

Use Uint8 for the key.

Definition at line 1248 of file seqdbisam.hpp.

Referenced by x_GetNumericData(), x_GetNumericKey(), and x_InitSearch().

◆ m_MaxLineSize

Int4 CSeqDBIsam::m_MaxLineSize
private

Maximum string length in the database.

Definition at line 1218 of file seqdbisam.hpp.

Referenced by x_DiffSample(), x_InitSearch(), and x_StringSearch().

◆ m_NumSamples

Int4 CSeqDBIsam::m_NumSamples
private

◆ m_NumTerms

Int4 CSeqDBIsam::m_NumTerms
private

Number of terms in database.

Definition at line 1209 of file seqdbisam.hpp.

Referenced by GetIdBounds(), x_GetPageNumElements(), x_InitSearch(), x_LoadIndex(), x_SearchNegativeMultiSeq(), and x_TranslateGiList().

◆ m_PageSize

Int4 CSeqDBIsam::m_PageSize
private

◆ m_TermSize

int CSeqDBIsam::m_TermSize
private

◆ m_TestNonUnique

bool CSeqDBIsam::m_TestNonUnique
private

Check if the data for a string ISAM file is sorted.

Definition at line 1230 of file seqdbisam.hpp.

◆ m_Type

int CSeqDBIsam::m_Type
private

The format type of database files found (eNumeric or eString).

Definition at line 1194 of file seqdbisam.hpp.

Referenced by CSeqDBIsam(), x_FindIndexBounds(), x_InitSearch(), x_OutOfBounds(), x_SearchDataNumeric(), x_SearchIndexNumeric(), and x_SearchNegativeMulti().


The documentation for this class was generated from the following files:
Modified on Thu Nov 30 04:52:58 2023 by modify_doxy.py rev. 669887