NCBI C++ ToolKit
blast_aalookup.h File Reference

Routines for creating protein BLAST lookup tables.

#include <algo/blast/core/ncbi_std.h>
#include <algo/blast/core/blast_def.h>
#include <algo/blast/core/blast_lookup.h>
#include <algo/blast/core/blast_options.h>
#include <algo/blast/core/blast_rps.h>
#include <algo/blast/core/blast_stat.h>

Classes

struct  AaLookupBackboneCell
 structure defining one cell of the compacted lookup table

struct  AaLookupSmallboneCell
 structure defining one cell of the small (i.e., use short) lookup table

struct  BlastAaLookupTable
 The basic lookup table structure for blastp searches.

struct  CompressedOverflowCell
 cell in list for holding query offsets

struct  CompressedMixedOffsets
 "alternative" structure of CompressedLookupBackboneCell storage

struct  CompressedLookupBackboneCell
 structure for hashtable of indexed query offsets

struct  BlastCompressedAaLookupTable
 The lookup table structure for protein searches using a compressed alphabet.

struct  RPSBackboneCell
 structure defining one cell of the RPS lookup table

struct  RPSBucket
 structure used for bucket sorting offsets retrieved from the RPS blast lookup table.

struct  BlastRPSLookupTable
 The basic lookup table structure for RPS blast searches.
 

Macros

#define AA_HITS_PER_CELL   3
 maximum number of hits in one lookup table cell

#define COMPRESSED_HITS_PER_BACKBONE_CELL   4
 number of query offsets to store in a backbone cell

#define COMPRESSED_HITS_CELL_MASK   0x03

#define COMPRESSED_HITS_PER_OVERFLOW_CELL   4
 number of query offsets to store in an overflow cell

#define COMPRESSED_OVERFLOW_CELLS_IN_BANK   209710
 number of cells in one bank of cells

#define COMPRESSED_OVERFLOW_MAX_BANKS   1024
 The maximum number of banks (usually fewer than 10 are needed; memory will run out before this is insufficient)

#define RPS_HITS_PER_CELL   3
 maximum number of hits in an RPS backbone cell; this may be redundant (have the same value as AA_HITS_PER_CELL) but must be separate to guarantee binary compatibility with existing RPS blast databases

#define RPS_BUCKET_SIZE   2048
 The number of regions into which the concatenated RPS blast database is split via bucket sorting.
 

Typedefs

typedef struct AaLookupBackboneCell AaLookupBackboneCell
 structure defining one cell of the compacted lookup table

typedef struct AaLookupSmallboneCell AaLookupSmallboneCell
 structure defining one cell of the small (i.e., use short) lookup table

typedef struct BlastAaLookupTable BlastAaLookupTable
 The basic lookup table structure for blastp searches.

typedef struct CompressedOverflowCell CompressedOverflowCell
 cell in list for holding query offsets

typedef struct CompressedMixedOffsets CompressedMixedOffsets
 "alternative" structure of CompressedLookupBackboneCell storage

typedef struct CompressedLookupBackboneCell CompressedLookupBackboneCell
 structure for hashtable of indexed query offsets

typedef struct BlastCompressedAaLookupTable BlastCompressedAaLookupTable
 The lookup table structure for protein searches using a compressed alphabet.

typedef struct RPSBackboneCell RPSBackboneCell
 structure defining one cell of the RPS lookup table

typedef struct RPSBucket RPSBucket
 structure used for bucket sorting offsets retrieved from the RPS blast lookup table.

typedef struct BlastRPSLookupTable BlastRPSLookupTable
 The basic lookup table structure for RPS blast searches.
 

Enumerations

enum  EBoneType { eBackbone = 0 , eSmallbone = 1 }
 types of cells
 

Functions

Int4 BlastAaLookupFinalize (BlastAaLookupTable *lookup, EBoneType bone_type)
 Pack the data structures comprising a protein lookup table into their final form.

Int4 BlastAaLookupTableNew (const LookupTableOptions *opt, BlastAaLookupTable **lut)
 Create a new protein lookup table.

BlastAaLookupTable * BlastAaLookupTableDestruct (BlastAaLookupTable *lookup)
 Free the lookup table.

void BlastAaLookupIndexQuery (BlastAaLookupTable *lookup, Int4 **matrix, BLAST_SequenceBlk *query, BlastSeqLoc *unmasked_regions, Int4 query_bias)
 Index a protein query.

Int4 BlastCompressedAaLookupTableNew (BLAST_SequenceBlk *query, BlastSeqLoc *locations, BlastCompressedAaLookupTable **lut, const LookupTableOptions *opt, BlastScoreBlk *sbp)
 Create a new compressed protein lookup table.

BlastCompressedAaLookupTable * BlastCompressedAaLookupTableDestruct (BlastCompressedAaLookupTable *lookup)
 Free the compressed lookup table.

static NCBI_INLINE Int4 s_ComputeCompressedIndex (Int4 wordsize, const Uint1 *word, Int4 compressed_alphabet_size, Int4 *skip, BlastCompressedAaLookupTable *lookup)
 Convert a word to use a compressed alphabet.

Int2 RPSLookupTableNew (const BlastRPSInfo *rps_info, BlastRPSLookupTable **lut)
 Create a new RPS blast lookup table.

BlastRPSLookupTable * RPSLookupTableDestruct (BlastRPSLookupTable *lookup)
 Free the lookup table.
 

Detailed Description

Routines for creating protein BLAST lookup tables.

Contains definitions and prototypes for the lookup table construction phase of blastp and RPS blast.

Definition in file blast_aalookup.h.

Macro Definition Documentation

◆ AA_HITS_PER_CELL

#define AA_HITS_PER_CELL   3

maximum number of hits in one lookup table cell

Definition at line 50 of file blast_aalookup.h.

◆ COMPRESSED_HITS_CELL_MASK

#define COMPRESSED_HITS_CELL_MASK   0x03

Definition at line 190 of file blast_aalookup.h.

◆ COMPRESSED_HITS_PER_BACKBONE_CELL

#define COMPRESSED_HITS_PER_BACKBONE_CELL   4

number of query offsets to store in a backbone cell

Definition at line 189 of file blast_aalookup.h.
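
COMPRESSED_HITS_CELL_MASK (0x03) is one less than COMPRESSED_HITS_PER_BACKBONE_CELL (4), the usual arrangement for reducing a count modulo a power of two with a bitwise AND. The sketch below only illustrates that arithmetic relationship. This page does not state how the mask is used, so the helper name slot_in_cell and its purpose are assumptions.

```c
#include <assert.h>

#define COMPRESSED_HITS_PER_BACKBONE_CELL 4
#define COMPRESSED_HITS_CELL_MASK 0x03

/* Hypothetical helper (not part of the toolkit): position of the n-th
 * stored offset inside a four-entry cell. For non-negative n, masking
 * with 0x03 gives the same result as n % 4. */
static int slot_in_cell(int n)
{
    assert(n >= 0);
    return n & COMPRESSED_HITS_CELL_MASK;
}
```

Under this reading, the mask stays valid only while the cell size remains a power of two.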

◆ COMPRESSED_HITS_PER_OVERFLOW_CELL

#define COMPRESSED_HITS_PER_OVERFLOW_CELL   4

number of query offsets to store in an overflow cell

Definition at line 193 of file blast_aalookup.h.

◆ COMPRESSED_OVERFLOW_CELLS_IN_BANK

#define COMPRESSED_OVERFLOW_CELLS_IN_BANK   209710

number of cells in one bank of cells

Definition at line 196 of file blast_aalookup.h.

◆ COMPRESSED_OVERFLOW_MAX_BANKS

#define COMPRESSED_OVERFLOW_MAX_BANKS   1024

The maximum number of banks (usually fewer than 10 are needed; memory will run out before this is insufficient)

Definition at line 200 of file blast_aalookup.h.
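
To put the three overflow constants in proportion, the sketch below multiplies them out. It assumes each overflow cell holds exactly COMPRESSED_HITS_PER_OVERFLOW_CELL query offsets and ignores any per-cell bookkeeping; the function name overflow_offset_capacity is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define COMPRESSED_HITS_PER_OVERFLOW_CELL 4
#define COMPRESSED_OVERFLOW_CELLS_IN_BANK 209710
#define COMPRESSED_OVERFLOW_MAX_BANKS 1024

/* Hypothetical helper: number of query offsets that fit in the
 * overflow cells of `banks` banks. Computed in 64 bits for headroom. */
static int64_t overflow_offset_capacity(int banks)
{
    assert(banks >= 0 && banks <= COMPRESSED_OVERFLOW_MAX_BANKS);
    return (int64_t)banks * COMPRESSED_OVERFLOW_CELLS_IN_BANK
                          * COMPRESSED_HITS_PER_OVERFLOW_CELL;
}
```

Under these assumptions one bank holds 838,840 offsets, ten banks hold 8,388,400, and the full 1024 banks would hold 858,972,160.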

◆ RPS_BUCKET_SIZE

#define RPS_BUCKET_SIZE   2048

The number of regions into which the concatenated RPS blast database is split via bucket sorting.

Definition at line 351 of file blast_aalookup.h.

◆ RPS_HITS_PER_CELL

#define RPS_HITS_PER_CELL   3

maximum number of hits in an RPS backbone cell; this may be redundant (have the same value as AA_HITS_PER_CELL) but must be separate to guarantee binary compatibility with existing RPS blast databases

Definition at line 334 of file blast_aalookup.h.

Typedef Documentation

◆ AaLookupBackboneCell

structure defining one cell of the compacted lookup table

◆ AaLookupSmallboneCell

structure defining one cell of the small (i.e., use short) lookup table

◆ BlastAaLookupTable

The basic lookup table structure for blastp searches.

◆ BlastCompressedAaLookupTable

The lookup table structure for protein searches using a compressed alphabet.

◆ BlastRPSLookupTable

The basic lookup table structure for RPS blast searches.

◆ CompressedLookupBackboneCell

structure for hashtable of indexed query offsets

◆ CompressedMixedOffsets

"alternative" structure of CompressedLookupBackboneCell storage

◆ CompressedOverflowCell

cell in list for holding query offsets

◆ RPSBackboneCell

structure defining one cell of the RPS lookup table

◆ RPSBucket

typedef struct RPSBucket RPSBucket

structure used for bucket sorting offsets retrieved from the RPS blast lookup table.

Enumeration Type Documentation

◆ EBoneType

enum EBoneType

types of cells

Enumerator
eBackbone 
eSmallbone 

Definition at line 88 of file blast_aalookup.h.

Function Documentation

◆ BlastAaLookupFinalize()

Int4 BlastAaLookupFinalize ( BlastAaLookupTable * lookup,
EBoneType  bone_type 
)

Pack the data structures comprising a protein lookup table into their final form.

◆ BlastAaLookupIndexQuery()

void BlastAaLookupIndexQuery ( BlastAaLookupTable * lookup,
Int4 **  matrix,
BLAST_SequenceBlk * query,
BlastSeqLoc * unmasked_regions,
Int4  query_bias 
)

Index a protein query.

Parameters
lookup            the lookup table [in/modified]
matrix            the substitution matrix [in]
query             the array of queries to index
unmasked_regions  a BlastSeqLoc* which points to a (list of) integer pair(s) which specify the unmasked region(s) of the query [in]
query_bias        number added to each offset put into lookup table (only used for RPS blast database creation, otherwise 0) [in]

Definition at line 413 of file blast_aalookup.c.

References ASSERT, location, lookup(), NULL, query, s_AddNeighboringWords(), and s_AddPSSMNeighboringWords().

Referenced by LookupTableWrapInit_MT(), and CMakeProfileDBApp::x_RPSUpdateLookup().

◆ BlastAaLookupTableDestruct()

BlastAaLookupTable* BlastAaLookupTableDestruct ( BlastAaLookupTable * lookup )

Free the lookup table.

Parameters
lookup  The lookup table structure to be freed
Returns
NULL

Definition at line 257 of file blast_aalookup.c.

References lookup(), NULL, and sfree.

Referenced by LookupTableWrapFree(), CMakeProfileDBApp::x_RPS_DbClose(), and CMakeProfileDBApp::CRPS_DbInfo::~CRPS_DbInfo().
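
Because the destructor returns NULL, a caller can free the table and clear its own pointer in one assignment. The sketch below illustrates that idiom with stand-in types so that it compiles on its own; DemoTable, DemoTableDestruct and demo_roundtrip are hypothetical and are not toolkit names.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in type for illustration only; NOT the real BlastAaLookupTable. */
typedef struct DemoTable {
    int *cells;
} DemoTable;

/* Stand-in destructor following the documented contract:
 * release the structure and always return NULL. */
static DemoTable *DemoTableDestruct(DemoTable *t)
{
    if (t != NULL) {
        free(t->cells);
        free(t);
    }
    return NULL;
}

/* Allocate a table, then free it and clear the pointer in one statement. */
static DemoTable *demo_roundtrip(void)
{
    DemoTable *t = calloc(1, sizeof *t);
    if (t != NULL)
        t->cells = calloc(8, sizeof *t->cells);
    t = DemoTableDestruct(t);   /* t is NULL afterwards, never dangling */
    return t;
}
```

With the real API the equivalent call would take the form lookup = BlastAaLookupTableDestruct(lookup);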

◆ BlastAaLookupTableNew()

Int4 BlastAaLookupTableNew ( const LookupTableOptions * opt,
BlastAaLookupTable **  lut 
)

Create a new protein lookup table.

Parameters
opt  pointer to lookup table options structure [in]
lut  handle to lookup table structure [in/modified]
Returns
0 if successful, nonzero on failure

Definition at line 227 of file blast_aalookup.c.

References ASSERT, BLASTAA_SIZE, calloc(), i, ilog2(), lookup(), NULL, LookupTableOptions::threshold, and LookupTableOptions::word_size.

Referenced by LookupTableWrapInit_MT(), and CMakeProfileDBApp::x_RPSAddFirstSequence().
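
The constructor reports status through its return value and hands the new table back through the lut handle. The sketch below shows that calling convention with stand-in types; DemoOptions, DemoTable and DemoTableNew are hypothetical, and the two option fields are borrowed from the LookupTableOptions members listed under References above. The real function's allocation and validation logic is not documented on this page and is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in types for illustration; NOT the toolkit's structures. */
typedef struct DemoOptions { int word_size; int threshold; } DemoOptions;
typedef struct DemoTable   { int word_size; int threshold; } DemoTable;

/* Stand-in constructor: stores the new table in *lut and returns 0 on
 * success; returns nonzero and leaves *lut NULL on failure. */
static int DemoTableNew(const DemoOptions *opt, DemoTable **lut)
{
    DemoTable *t;

    assert(lut != NULL);
    *lut = NULL;
    if (opt == NULL)
        return -1;

    t = calloc(1, sizeof *t);
    if (t == NULL)
        return -1;

    t->word_size = opt->word_size;
    t->threshold = opt->threshold;
    *lut = t;
    return 0;
}
```

A caller checks the return value before touching the handle, since only a zero status guarantees that the handle points at a usable table.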

◆ BlastCompressedAaLookupTableDestruct()

BlastCompressedAaLookupTable* BlastCompressedAaLookupTableDestruct ( BlastCompressedAaLookupTable * lookup )

Free the compressed lookup table.

Parameters
lookup  The lookup table structure to be freed
Returns
NULL

Definition at line 1359 of file blast_aalookup.c.

References free(), i, lookup(), NULL, and sfree.

Referenced by LookupTableWrapFree().

◆ BlastCompressedAaLookupTableNew()

Int4 BlastCompressedAaLookupTableNew ( BLAST_SequenceBlk * query,
BlastSeqLoc * locations,
BlastCompressedAaLookupTable **  lut,
const LookupTableOptions * opt,
BlastScoreBlk * sbp 
)

Create a new compressed protein lookup table.

Parameters
query      The query sequence block (if concatenated sequence, the individual strands/sequences must be separated by a sentinel byte) [in]
locations  The locations to be included in the lookup table, e.g. [0,length-1] for full sequence. NULL means no sequence. [in]
lut        Pointer to the lookup table to be created [out]
opt        Options for lookup table creation [in]
sbp        pointer to score matrix information [in]
Returns
0 if successful, nonzero on failure

Definition at line 1270 of file blast_aalookup.c.

References ASSERT, BLASTAA_SIZE, calloc(), SCompressedAlphabet::compress_table, COMPRESSED_OVERFLOW_CELLS_IN_BANK, COMPRESSED_OVERFLOW_MAX_BANKS, SBlastScoreMatrix::data, i, iexp(), letter(), lookup(), malloc(), SCompressedAlphabet::matrix, NULL, query, s_CompressedAddNeighboringWords(), s_CompressedLookupFinalize(), SCompressedAlphabetFree(), SCompressedAlphabetNew(), LookupTableOptions::threshold, and LookupTableOptions::word_size.

Referenced by LookupTableWrapInit_MT().

◆ RPSLookupTableDestruct()

BlastRPSLookupTable* RPSLookupTableDestruct ( BlastRPSLookupTable * lookup )

Free the lookup table.

Parameters
lookup  The lookup table structure to free; note that the rps_backbone and rps_seq_offsets fields are not freed by this call, since they may refer to memory-mapped arrays
Returns
NULL

Definition at line 212 of file blast_aalookup.c.

References i, lookup(), NULL, and sfree.

Referenced by LookupTableWrapFree().

◆ RPSLookupTableNew()

Int2 RPSLookupTableNew ( const BlastRPSInfo * rps_info,
BlastRPSLookupTable **  lut 
)

Create a new RPS blast lookup table.

◆ s_ComputeCompressedIndex()

static NCBI_INLINE Int4 s_ComputeCompressedIndex ( Int4  wordsize,
const Uint1 * word,
Int4  compressed_alphabet_size,
Int4 * skip,
BlastCompressedAaLookupTable * lookup 
)

Convert a word to use a compressed alphabet.

The letters in the word are reversed compared to the original order.

Parameters
wordsize                  Number of consecutive letters in a word [in]
compressed_alphabet_size  Number of letters in compressed alphabet [in]
word                      Sequence in "regular" AA alphabet [in]
skip                      If a letter is encountered that cannot be compressed, the offset from word[] where index computation can begin again [out]
lookup                    Translation tables etc [in]
Returns
Index calculated from scratch [out]

Definition at line 304 of file blast_aalookup.h.

References i, and lookup().

Referenced by s_BlastCompressedAaScanSubject(), and s_CompressedLookupAddUnencoded().

Modified on Fri Sep 20 14:57:28 2024 by modify_doxy.py rev. 669887