NCBI C++ ToolKit
CGeneInfoFileReader Class Reference

CGeneInfoFileReader.

#include <objtools/blast/gene_info_reader/gene_info_reader.hpp>

Public Member Functions

 CGeneInfoFileReader (const string &strGi2GeneFile, const string &strGene2OffsetFile, const string &strGi2OffsetFile, const string &strAllGeneDataFile, const string &strGene2GiFile, bool bGiToOffsetLookup=true)
 Construct using direct paths.
 
 CGeneInfoFileReader (bool bGiToOffsetLookup=true)
 Construct using paths read from an environment variable.
 
virtual ~CGeneInfoFileReader ()
 Destructor.
 
virtual bool GetGeneIdsForGi (TGi gi, TGeneIdList &geneIdList)
 GetGeneIdsForGi implementation, see IGeneInfoInput.
 
virtual bool GetRNAGisForGeneId (int geneId, TGiList &giList)
 GetRNAGisForGeneId implementation, see IGeneInfoInput.
 
virtual bool GetProteinGisForGeneId (int geneId, TGiList &giList)
 GetProteinGisForGeneId implementation, see IGeneInfoInput.
 
virtual bool GetGenomicGisForGeneId (int geneId, TGiList &giList)
 GetGenomicGisForGeneId implementation, see IGeneInfoInput.
 
virtual bool GetGeneInfoForGi (TGi gi, TGeneInfoList &infoList)
 GetGeneInfoForGi implementation, see IGeneInfoInput.
 
virtual bool GetGeneInfoForId (int geneId, TGeneInfoList &infoList)
 GetGeneInfoForId implementation, see IGeneInfoInput.
 
Public Member Functions inherited from IGeneInfoInput
virtual ~IGeneInfoInput ()
 Destructor.
 

Private Member Functions

void x_MapMemFiles ()
 Memory-map all the files.
 
void x_UnmapMemFiles ()
 Unmap all the memory-mapped files.
 
bool x_GiToGeneId (TGi gi, list< int > &listGeneIds)
 Fill the Gene ID list given a Gi.
 
bool x_GeneIdToOffset (int geneId, int &nOffset)
 Set the offset value given a Gene ID.
 
bool x_GiToOffset (TGi gi, list< int > &listOffsets)
 Fill the offset list given a Gi.
 
bool x_GeneIdToGi (int geneId, int iGiField, list< TGi > &listGis)
 Fill the Gi list given a Gene ID and the Gi field index, which identifies the Gi type to be read from the file.
 
bool x_OffsetToInfo (int nOffset, CRef< CGeneInfo > &info)
 Read Gene data at the given offset and create the info object.
 

Private Attributes

string m_strGi2GeneFile
 Path to the Gi to Gene ID file.
 
string m_strGene2OffsetFile
 Path to the Gene ID to Offset file.
 
string m_strGi2OffsetFile
 Path to the Gi to Offset file.
 
string m_strGene2GiFile
 Path to the Gene ID to Gi file.
 
string m_strAllGeneDataFile
 Path to the file containing all the Gene data.
 
bool m_bGiToOffsetLookup
 Perform Gi to Offset lookups directly.
 
unique_ptr< CMemoryFile > m_memGi2GeneFile
 Memory-mapped Gi to Gene ID file.
 
unique_ptr< CMemoryFile > m_memGene2OffsetFile
 Memory-mapped Gene ID to Offset file.
 
unique_ptr< CMemoryFile > m_memGi2OffsetFile
 Memory-mapped Gi to Offset file.
 
unique_ptr< CMemoryFile > m_memGene2GiFile
 Memory-mapped Gene ID to Gi file.
 
CNcbiIfstream m_inAllData
 Input stream for the Gene data file.
 
TGeneIdToGeneInfoMap m_mapIdToInfo
 Cached map of looked up Gene Info objects.
 

Additional Inherited Members

Public Types inherited from IGeneInfoInput
typedef list< TGi > TGiList
 List of Gis.
 
typedef list< int > TGeneIdList
 List of Gene IDs.
 
typedef map< int, CRef< CGeneInfo > > TGeneIdToGeneInfoMap
 Gene ID to Gene Information map.
 
typedef vector< CRef< CGeneInfo > > TGeneInfoList
 List of Gene Information objects.
 
Static Public Member Functions inherited from CGeneFileUtils
static bool CheckDirExistence (const string &strDir)
 Check if a directory exists, given its name.
 
static bool CheckExistence (const string &strFile)
 Check if a file exists, given its name.
 
static Int8 GetLength (const string &strFile)
 Get the length of a file, given its name.
 
static bool OpenTextInputFile (const string &strFileName, CNcbiIfstream &in)
 Open the given text file for reading.
 
static bool OpenBinaryInputFile (const string &strFileName, CNcbiIfstream &in)
 Open the given binary file for reading.
 
static bool OpenTextOutputFile (const string &strFileName, CNcbiOfstream &out)
 Open the given text file for writing.
 
static bool OpenBinaryOutputFile (const string &strFileName, CNcbiOfstream &out)
 Open the given binary file for writing.
 
static void WriteRecord (CNcbiOfstream &out, STwoIntRecord &record)
 Write a pair of integers to the file.
 
static void ReadRecord (CNcbiIfstream &in, STwoIntRecord &record)
 Read a pair of integers from the file.
 
template<int k_nFields>
static void WriteRecord (CNcbiOfstream &out, SMultiIntRecord< k_nFields > &record)
 Write an n-tuple of integers to the file.
 
template<int k_nFields>
static void ReadRecord (CNcbiIfstream &in, SMultiIntRecord< k_nFields > &record)
 Read an n-tuple of integers from the file.
 
static void WriteGeneInfo (CNcbiOfstream &out, CRef< CGeneInfo > info, int &nCurrentOffset)
 Write a Gene info object to the file.
 
static void ReadGeneInfo (CNcbiIfstream &in, int nOffset, CRef< CGeneInfo > &info)
 Read a Gene info object from the file.
 

Detailed Description

CGeneInfoFileReader.

Class implementing the IGeneInfoInput interface using binary files.

CGeneInfoFileReader reads and memory-maps sorted binary files for fast Gi to Gene ID, Gene ID to Gene Info, Gi to Gene Info, and Gene ID to Gi conversions. Each Gene Info lookup uses two files: an index file containing (Gi, Offset) or (Gene ID, Offset) pairs, and a data file containing all the Gene data. The lookup proceeds in two steps: first, the offset into the Gene data is obtained from the index; then the Gene data line at that offset is read and parsed, and the corresponding CGeneInfo object is constructed. The paths to the pre-computed, sorted files are either passed directly to the constructor or read from a path stored in an environment variable (the preferred approach).
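The two-step lookup described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the toolkit's implementation: IndexRecord, FindOffsets and ReadLineAt are hypothetical names, the index is modeled as an in-memory vector rather than a memory-mapped file, and the data file is modeled as any seekable stream.

```cpp
#include <algorithm>
#include <cassert>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one fixed-size (key, offset) record of a sorted index file.
struct IndexRecord {
    int key;     // a Gi or a Gene ID
    int offset;  // byte offset of the matching line in the data file
};

// Step 1: binary-search the sorted index and collect the offsets of all records with this key.
std::vector<int> FindOffsets(const std::vector<IndexRecord>& index, int key)
{
    auto byKey = [](const IndexRecord& a, const IndexRecord& b) { return a.key < b.key; };
    assert(std::is_sorted(index.begin(), index.end(), byKey));  // the index must be pre-sorted
    auto range = std::equal_range(index.begin(), index.end(), IndexRecord{key, 0}, byKey);
    std::vector<int> offsets;
    for (auto it = range.first; it != range.second; ++it) {
        offsets.push_back(it->offset);
    }
    return offsets;
}

// Step 2: seek to the offset and read one line of data; parsing the line is left out.
bool ReadLineAt(std::istream& data, int offset, std::string& line)
{
    data.clear();  // reset a previous end-of-file state before seeking
    data.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
    return static_cast<bool>(std::getline(data, line));
}
```

Because the index records are fixed-size and sorted, step 1 costs a logarithmic number of comparisons, and step 2 touches only the one line that is needed.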

Definition at line 85 of file gene_info_reader.hpp.

Constructor & Destructor Documentation

◆ CGeneInfoFileReader() [1/2]

CGeneInfoFileReader::CGeneInfoFileReader ( const string &  strGi2GeneFile,
const string &  strGene2OffsetFile,
const string &  strGi2OffsetFile,
const string &  strAllGeneDataFile,
const string &  strGene2GiFile,
bool  bGiToOffsetLookup = true 
)

Construct using direct paths.

This version of the constructor takes the paths to the pre-computed binary files and attempts to open and map the files.

Parameters
    strGi2GeneFile        Path to the Gi to Gene ID file.
    strGene2OffsetFile    Path to the Gene ID to Offset file.
    strGi2OffsetFile      Path to the Gi to Offset file.
    strAllGeneDataFile    Path to the Gene data file.
    strGene2GiFile        Path to the Gene ID to Gi file.
    bGiToOffsetLookup     Perform Gi to Offset lookups directly.

Definition at line 401 of file gene_info_reader.cpp.

References m_inAllData, m_strAllGeneDataFile, NCBI_THROW, CGeneFileUtils::OpenBinaryInputFile(), and x_MapMemFiles().

◆ CGeneInfoFileReader() [2/2]

CGeneInfoFileReader::CGeneInfoFileReader ( bool  bGiToOffsetLookup = true)

Construct using paths read from an environment variable.

This version of the constructor reads the paths to the pre-computed binary files from an environment variable and attempts to open and map the files.

Parameters
    bGiToOffsetLookup     Perform Gi to Offset lookups directly.

Definition at line 467 of file gene_info_reader.cpp.

References CDirEntry::AddTrailingPathSeparator(), CGeneFileUtils::CheckDirExistence(), GENE_ALL_GENE_DATA_FILE_NAME, GENE_GENE2GI_FILE_NAME, GENE_GENE2OFFSET_FILE_NAME, GENE_GI2GENE_FILE_NAME, GENE_GI2OFFSET_FILE_NAME, m_inAllData, m_strAllGeneDataFile, m_strGene2GiFile, m_strGene2OffsetFile, m_strGi2GeneFile, m_strGi2OffsetFile, NCBI_THROW, CGeneFileUtils::OpenBinaryInputFile(), s_FindPathToGeneInfoFiles(), and x_MapMemFiles().
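As a rough sketch of what this constructor's path resolution involves (read a directory from the environment, make sure it ends with a separator, then append the five file names), consider the following. Everything named here is an assumption for illustration: the page does not give the environment variable's name or the values of the GENE_*_FILE_NAME constants, so the file names below are hypothetical, as are GeneInfoPaths, WithTrailingSeparator, BuildPaths and PathsFromEnv. The sketch also assumes '/' as the path separator.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical bundle of the five resolved paths.
struct GeneInfoPaths {
    std::string gi2Gene;
    std::string gene2Offset;
    std::string gi2Offset;
    std::string gene2Gi;
    std::string allGeneData;
};

// Append a trailing separator if the directory does not already end with one.
std::string WithTrailingSeparator(std::string dir)
{
    if (!dir.empty() && dir.back() != '/') {
        dir += '/';
    }
    return dir;
}

// Join the directory with placeholder file names (not the toolkit's real names).
GeneInfoPaths BuildPaths(const std::string& dir)
{
    const std::string base = WithTrailingSeparator(dir);
    return GeneInfoPaths{base + "gi2gene.bin",
                         base + "gene2offset.bin",
                         base + "gi2offset.bin",
                         base + "gene2gi.bin",
                         base + "all_gene_data.txt"};
}

// Read the directory from the named environment variable; fail loudly if it is unset or empty.
GeneInfoPaths PathsFromEnv(const char* envVarName)
{
    assert(envVarName != nullptr);
    const char* dir = std::getenv(envVarName);
    if (dir == nullptr || *dir == '\0') {
        throw std::runtime_error(std::string("environment variable not set: ") + envVarName);
    }
    return BuildPaths(dir);
}
```

Throwing when the variable is missing mirrors the NCBI_THROW reference listed above only in spirit; the exception type used by the toolkit is not shown on this page.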

◆ ~CGeneInfoFileReader()

CGeneInfoFileReader::~CGeneInfoFileReader ( )
virtual

Destructor.

Definition at line 497 of file gene_info_reader.cpp.

References x_UnmapMemFiles().

Member Function Documentation

◆ GetGeneIdsForGi()

bool CGeneInfoFileReader::GetGeneIdsForGi ( TGi  gi,
TGeneIdList &  geneIdList 
)
virtual

GetGeneIdsForGi implementation, see IGeneInfoInput.

Implements IGeneInfoInput.

Definition at line 502 of file gene_info_reader.cpp.

References x_GiToGeneId().

Referenced by CReadFilesApp::Run(), and s_CheckGiToGeneConsistency().

◆ GetGeneInfoForGi()

bool CGeneInfoFileReader::GetGeneInfoForGi ( TGi  gi,
TGeneInfoList &  infoList 
)
virtual

GetGeneInfoForGi implementation, see IGeneInfoInput.

◆ GetGeneInfoForId()

bool CGeneInfoFileReader::GetGeneInfoForId ( int  geneId,
TGeneInfoList &  infoList 
)
virtual

GetGeneInfoForId implementation, see IGeneInfoInput.

◆ GetGenomicGisForGeneId()

bool CGeneInfoFileReader::GetGenomicGisForGeneId ( int  geneId,
TGiList &  giList 
)
virtual

GetGenomicGisForGeneId implementation, see IGeneInfoInput.

Implements IGeneInfoInput.

Definition at line 520 of file gene_info_reader.cpp.

References k_iGenomicGiField, and x_GeneIdToGi().

Referenced by CReadFilesApp::Run(), and s_CheckGiToGeneConsistency().

◆ GetProteinGisForGeneId()

bool CGeneInfoFileReader::GetProteinGisForGeneId ( int  geneId,
TGiList &  giList 
)
virtual

GetProteinGisForGeneId implementation, see IGeneInfoInput.

Implements IGeneInfoInput.

Definition at line 514 of file gene_info_reader.cpp.

References k_iProteinGiField, and x_GeneIdToGi().

Referenced by CReadFilesApp::Run(), and s_CheckGiToGeneConsistency().

◆ GetRNAGisForGeneId()

bool CGeneInfoFileReader::GetRNAGisForGeneId ( int  geneId,
TGiList &  giList 
)
virtual

GetRNAGisForGeneId implementation, see IGeneInfoInput.

Implements IGeneInfoInput.

Definition at line 508 of file gene_info_reader.cpp.

References k_iRNAGiField, and x_GeneIdToGi().

Referenced by CReadFilesApp::Run(), and s_CheckGiToGeneConsistency().

◆ x_GeneIdToGi()

bool CGeneInfoFileReader::x_GeneIdToGi ( int  geneId,
int  iGiField,
list< TGi > &  listGis 
)
private

Fill the Gi list given a Gene ID and the Gi field index, which identifies the Gi type to be read from the file.

Definition at line 369 of file gene_info_reader.cpp.

References m_memGene2GiFile, NCBI_THROW, s_GetMemFilePtrAndLength(), and s_SearchSortedArrayGis().

Referenced by GetGenomicGisForGeneId(), GetProteinGisForGeneId(), and GetRNAGisForGeneId().
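The field-index idea can be illustrated with a small, self-contained sketch. The record layout here is an assumption, not taken from this page: field 0 is taken to hold the Gene ID, fields 1 to 3 the RNA, protein and genomic Gis, and 0 is taken to mean "no Gi of this type in this record". MultiIntRecord, kNumFields and GeneIdToGis are hypothetical names, and plain int stands in for TGi.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <list>
#include <vector>

// Hypothetical record layout: [Gene ID, RNA Gi, protein Gi, genomic Gi].
constexpr int kNumFields = 4;
using MultiIntRecord = std::array<int, kNumFields>;

// Collect the non-zero values of one Gi field from every record that matches the Gene ID.
// The records must be sorted by Gene ID (field 0). Returns true if at least one Gi was added.
bool GeneIdToGis(const std::vector<MultiIntRecord>& sorted, int geneId, int giField,
                 std::list<int>& gis)
{
    assert(giField > 0 && giField < kNumFields);
    auto byGeneId = [](const MultiIntRecord& a, const MultiIntRecord& b) { return a[0] < b[0]; };
    MultiIntRecord probe{};  // zero-initialized; only field 0 matters for the search
    probe[0] = geneId;
    auto range = std::equal_range(sorted.begin(), sorted.end(), probe, byGeneId);
    bool found = false;
    for (auto it = range.first; it != range.second; ++it) {
        const int gi = (*it)[giField];
        if (gi != 0) {  // skip records that carry no Gi of the requested type
            gis.push_back(gi);
            found = true;
        }
    }
    return found;
}
```

With this shape, the three public getters reduce to one call each that differs only in the field index, which matches how GetRNAGisForGeneId(), GetProteinGisForGeneId() and GetGenomicGisForGeneId() all reference x_GeneIdToGi() with a different k_i...GiField constant.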

◆ x_GeneIdToOffset()

bool CGeneInfoFileReader::x_GeneIdToOffset ( int  geneId,
int &  nOffset 
)
private

Set the offset value given a Gene ID.

Definition at line 317 of file gene_info_reader.cpp.

References m_memGene2OffsetFile, NCBI_THROW, s_GetField(), s_GetMemFilePtrAndLength(), and s_SearchSortedArray().

Referenced by GetGeneInfoForId().

◆ x_GiToGeneId()

bool CGeneInfoFileReader::x_GiToGeneId ( TGi  gi,
list< int > &  listGeneIds 
)
private

Fill the Gene ID list given a Gi.

Definition at line 296 of file gene_info_reader.cpp.

References GI_TO, m_memGi2GeneFile, NCBI_THROW, s_GetMemFilePtrAndLength(), and s_SearchSortedArray().

Referenced by GetGeneIdsForGi(), and GetGeneInfoForGi().

◆ x_GiToOffset()

bool CGeneInfoFileReader::x_GiToOffset ( TGi  gi,
list< int > &  listOffsets 
)
private

Fill the offset list given a Gi.

Definition at line 342 of file gene_info_reader.cpp.

References GI_TO, m_bGiToOffsetLookup, m_memGi2OffsetFile, NCBI_THROW, s_GetMemFilePtrAndLength(), and s_SearchSortedArray().

Referenced by GetGeneInfoForGi().

◆ x_MapMemFiles()

void CGeneInfoFileReader::x_MapMemFiles ( )
private

Memory-map all the files.

◆ x_OffsetToInfo()

bool CGeneInfoFileReader::x_OffsetToInfo ( int  nOffset,
CRef< CGeneInfo > &  info 
)
private

Read Gene data at the given offset and create the info object.

Definition at line 392 of file gene_info_reader.cpp.

References info, m_inAllData, and CGeneFileUtils::ReadGeneInfo().

Referenced by GetGeneInfoForGi(), and GetGeneInfoForId().

◆ x_UnmapMemFiles()

void CGeneInfoFileReader::x_UnmapMemFiles ( )
private

Unmap all the memory-mapped files.

Definition at line 281 of file gene_info_reader.cpp.

References m_memGene2GiFile, m_memGene2OffsetFile, m_memGi2GeneFile, and m_memGi2OffsetFile.

Referenced by ~CGeneInfoFileReader().

Member Data Documentation

◆ m_bGiToOffsetLookup

bool CGeneInfoFileReader::m_bGiToOffsetLookup
private

Perform Gi to Offset lookups directly.

Definition at line 105 of file gene_info_reader.hpp.

Referenced by GetGeneInfoForGi(), x_GiToOffset(), and x_MapMemFiles().
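One plausible reading of this flag, given that GetGeneInfoForGi() is listed as referencing both x_GiToGeneId() and x_GiToOffset(), is that it selects between a direct Gi to Offset lookup and an indirect Gi to Gene ID to Offset chain. The sketch below illustrates that reading only; the page does not show the actual control flow. Indexes and OffsetsForGi are hypothetical names, and in-memory maps stand in for the sorted index files.

```cpp
#include <cassert>
#include <map>
#include <vector>

using OffsetList = std::vector<int>;

// Hypothetical in-memory stand-ins for three of the sorted index files.
struct Indexes {
    std::multimap<int, int> giToOffset;   // Gi -> offset (direct index)
    std::multimap<int, int> giToGeneId;   // Gi -> Gene ID
    std::map<int, int> geneIdToOffset;    // Gene ID -> offset
};

// Resolve a Gi to data-file offsets, either directly or through the Gene ID indexes.
OffsetList OffsetsForGi(const Indexes& idx, int gi, bool directLookup)
{
    OffsetList offsets;
    if (directLookup) {
        // One search: Gi -> offsets.
        auto range = idx.giToOffset.equal_range(gi);
        for (auto it = range.first; it != range.second; ++it) {
            offsets.push_back(it->second);
        }
    } else {
        // Two searches: Gi -> Gene IDs, then each Gene ID -> offset.
        auto range = idx.giToGeneId.equal_range(gi);
        for (auto it = range.first; it != range.second; ++it) {
            auto hit = idx.geneIdToOffset.find(it->second);
            if (hit != idx.geneIdToOffset.end()) {
                offsets.push_back(hit->second);
            }
        }
    }
    return offsets;
}
```

Under this reading, the direct path trades one extra index file for fewer searches per query, while both paths yield the same offsets when the indexes are consistent.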

◆ m_inAllData

CNcbiIfstream CGeneInfoFileReader::m_inAllData
private

Input stream for the Gene data file.

Definition at line 120 of file gene_info_reader.hpp.

Referenced by CGeneInfoFileReader(), and x_OffsetToInfo().

◆ m_mapIdToInfo

TGeneIdToGeneInfoMap CGeneInfoFileReader::m_mapIdToInfo
private

Cached map of looked up Gene Info objects.

Definition at line 123 of file gene_info_reader.hpp.

Referenced by GetGeneInfoForId().
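A cache of this kind is typically a look-aside map keyed by Gene ID: consult the map first, and fall back to the slower file read only on a miss. The following is a generic sketch of that pattern, not the toolkit's code. GeneInfo, CachedGeneLookup and the loader callback are hypothetical; std::shared_ptr stands in for CRef, and whether the real class caches misses is not stated on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for CGeneInfo.
struct GeneInfo {
    int geneId;
    std::string description;
};

// Look-aside cache: each Gene ID is loaded at most once, then served from the map.
class CachedGeneLookup {
public:
    using Loader = std::function<std::shared_ptr<GeneInfo>(int)>;

    explicit CachedGeneLookup(Loader loader) : m_loader(std::move(loader))
    {
        assert(m_loader != nullptr);
    }

    // Returns the cached object if present; otherwise loads it and caches a successful result.
    std::shared_ptr<GeneInfo> Get(int geneId)
    {
        auto it = m_cache.find(geneId);
        if (it != m_cache.end()) {
            return it->second;
        }
        std::shared_ptr<GeneInfo> info = m_loader(geneId);
        if (info) {
            m_cache.emplace(geneId, info);  // failed lookups are not cached in this sketch
        }
        return info;
    }

    std::size_t CachedCount() const { return m_cache.size(); }

private:
    Loader m_loader;
    std::map<int, std::shared_ptr<GeneInfo>> m_cache;
};
```

The loader would wrap the offset lookup and the read from the data file, so repeated queries for the same Gene ID avoid both the seek and the parse.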

◆ m_memGene2GiFile

unique_ptr<CMemoryFile> CGeneInfoFileReader::m_memGene2GiFile
private

Memory-mapped Gene ID to Gi file.

Definition at line 117 of file gene_info_reader.hpp.

Referenced by x_GeneIdToGi(), x_MapMemFiles(), and x_UnmapMemFiles().

◆ m_memGene2OffsetFile

unique_ptr<CMemoryFile> CGeneInfoFileReader::m_memGene2OffsetFile
private

Memory-mapped Gene ID to Offset file.

Definition at line 111 of file gene_info_reader.hpp.

Referenced by x_GeneIdToOffset(), x_MapMemFiles(), and x_UnmapMemFiles().

◆ m_memGi2GeneFile

unique_ptr<CMemoryFile> CGeneInfoFileReader::m_memGi2GeneFile
private

Memory-mapped Gi to Gene ID file.

Definition at line 108 of file gene_info_reader.hpp.

Referenced by x_GiToGeneId(), x_MapMemFiles(), and x_UnmapMemFiles().

◆ m_memGi2OffsetFile

unique_ptr<CMemoryFile> CGeneInfoFileReader::m_memGi2OffsetFile
private

Memory-mapped Gi to Offset file.

Definition at line 114 of file gene_info_reader.hpp.

Referenced by x_GiToOffset(), x_MapMemFiles(), and x_UnmapMemFiles().

◆ m_strAllGeneDataFile

string CGeneInfoFileReader::m_strAllGeneDataFile
private

Path to the file containing all the Gene data.

Definition at line 102 of file gene_info_reader.hpp.

Referenced by CGeneInfoFileReader().

◆ m_strGene2GiFile

string CGeneInfoFileReader::m_strGene2GiFile
private

Path to the Gene ID to Gi file.

Definition at line 99 of file gene_info_reader.hpp.

Referenced by CGeneInfoFileReader(), and x_MapMemFiles().

◆ m_strGene2OffsetFile

string CGeneInfoFileReader::m_strGene2OffsetFile
private

Path to the Gene ID to Offset file.

Definition at line 93 of file gene_info_reader.hpp.

Referenced by CGeneInfoFileReader(), and x_MapMemFiles().

◆ m_strGi2GeneFile

string CGeneInfoFileReader::m_strGi2GeneFile
private

Path to the Gi to Gene ID file.

Definition at line 90 of file gene_info_reader.hpp.

Referenced by CGeneInfoFileReader(), and x_MapMemFiles().

◆ m_strGi2OffsetFile

string CGeneInfoFileReader::m_strGi2OffsetFile
private

Path to the Gi to Offset file.

Definition at line 96 of file gene_info_reader.hpp.

Referenced by CGeneInfoFileReader(), and x_MapMemFiles().


The documentation for this class was generated from the following files: gene_info_reader.hpp and gene_info_reader.cpp
Modified on Mon Feb 26 04:02:37 2024 by modify_doxy.py rev. 669887