NCBI C++ ToolKit
Routines for creating protein BLAST lookup tables.
#include <algo/blast/core/ncbi_std.h>
#include <algo/blast/core/blast_def.h>
#include <algo/blast/core/blast_lookup.h>
#include <algo/blast/core/blast_options.h>
#include <algo/blast/core/blast_rps.h>
#include <algo/blast/core/blast_stat.h>
Classes
struct AaLookupBackboneCell
structure defining one cell of the compacted lookup table
struct AaLookupSmallboneCell
structure defining one cell of the small lookup table (i.e., one that uses short integers)
struct BlastAaLookupTable
The basic lookup table structure for blastp searches.
struct CompressedOverflowCell
cell in a list holding query offsets
struct CompressedMixedOffsets
"alternative" structure for CompressedLookupBackboneCell storage
struct CompressedLookupBackboneCell
structure for a hashtable of indexed query offsets
struct BlastCompressedAaLookupTable
The lookup table structure for protein searches using a compressed alphabet.
struct RPSBackboneCell
structure defining one cell of the RPS lookup table
struct RPSBucket
structure used for bucket sorting offsets retrieved from the RPS blast lookup table.
struct BlastRPSLookupTable
The basic lookup table structure for RPS blast searches.
Macros
#define AA_HITS_PER_CELL 3
maximum number of hits in one lookup table cell
#define COMPRESSED_HITS_PER_BACKBONE_CELL 4
number of query offsets to store in a backbone cell
#define COMPRESSED_HITS_CELL_MASK 0x03
#define COMPRESSED_HITS_PER_OVERFLOW_CELL 4
number of query offsets to store in an overflow cell
#define COMPRESSED_OVERFLOW_CELLS_IN_BANK 209710
number of cells in one bank of cells
#define COMPRESSED_OVERFLOW_MAX_BANKS 1024
The maximum number of banks (usually fewer than 10 are needed; memory will run out before this limit is reached)
#define RPS_HITS_PER_CELL 3
maximum number of hits in an RPS backbone cell; this may be redundant (it has the same value as AA_HITS_PER_CELL) but must be kept separate to guarantee binary compatibility with existing RPS blast databases
#define RPS_BUCKET_SIZE 2048
The number of regions into which the concatenated RPS blast database is split via bucket sorting.
Typedefs
typedef struct AaLookupBackboneCell AaLookupBackboneCell
structure defining one cell of the compacted lookup table
typedef struct AaLookupSmallboneCell AaLookupSmallboneCell
structure defining one cell of the small lookup table (i.e., one that uses short integers)
typedef struct BlastAaLookupTable BlastAaLookupTable
The basic lookup table structure for blastp searches.
typedef struct CompressedOverflowCell CompressedOverflowCell
cell in a list holding query offsets
typedef struct CompressedMixedOffsets CompressedMixedOffsets
"alternative" structure for CompressedLookupBackboneCell storage
typedef struct CompressedLookupBackboneCell CompressedLookupBackboneCell
structure for a hashtable of indexed query offsets
typedef struct BlastCompressedAaLookupTable BlastCompressedAaLookupTable
The lookup table structure for protein searches using a compressed alphabet.
typedef struct RPSBackboneCell RPSBackboneCell
structure defining one cell of the RPS lookup table
typedef struct RPSBucket RPSBucket
structure used for bucket sorting offsets retrieved from the RPS blast lookup table.
typedef struct BlastRPSLookupTable BlastRPSLookupTable
The basic lookup table structure for RPS blast searches.
Enumerations
enum EBoneType { eBackbone = 0, eSmallbone = 1 }
types of cells
Routines for creating protein BLAST lookup tables.
Contains definitions and prototypes for the lookup table construction phase of blastp and RPS blast.
Definition in file blast_aalookup.h.
#define AA_HITS_PER_CELL 3
maximum number of hits in one lookup table cell
Definition at line 50 of file blast_aalookup.h.
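A fixed per-cell capacity such as AA_HITS_PER_CELL typically lets a backbone cell keep a few query offsets inline and switch to a separate overflow array only when a word has more hits. The sketch below illustrates that idea under stated assumptions: the field names num_used, payload, entries and overflow_cursor come from the cross-references of BlastAaLookupFinalize, but the exact layout, the SketchBackboneCell type and the cell_pack helper are hypothetical and not the toolkit's definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t Int4;            /* stand-in for the toolkit's Int4 */

#define AA_HITS_PER_CELL 3

/* Hypothetical cell layout: hits are stored inline while they fit;
   otherwise the cell records where its hits start in an overflow array. */
typedef struct SketchBackboneCell {
    Int4 num_used;                       /* total hits for this word */
    union {
        Int4 entries[AA_HITS_PER_CELL];  /* inline query offsets */
        Int4 overflow_cursor;            /* start index in overflow array */
    } payload;
} SketchBackboneCell;

/* Pack `count` offsets for one word into `cell`. If they do not all fit
   inline, every offset is appended to `overflow` at *overflow_used. */
static void cell_pack(SketchBackboneCell *cell, const Int4 *offsets,
                      Int4 count, Int4 *overflow, Int4 *overflow_used)
{
    Int4 i;
    assert(count >= 0);
    cell->num_used = count;
    if (count <= AA_HITS_PER_CELL) {
        for (i = 0; i < count; i++)
            cell->payload.entries[i] = offsets[i];
    } else {
        cell->payload.overflow_cursor = *overflow_used;
        for (i = 0; i < count; i++)
            overflow[(*overflow_used)++] = offsets[i];
    }
}
```

The union keeps the cell small: a cell never needs the inline slots and the overflow cursor at the same time, so num_used alone tells a reader which member is valid.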
#define COMPRESSED_HITS_CELL_MASK 0x03
Definition at line 190 of file blast_aalookup.h.
#define COMPRESSED_HITS_PER_BACKBONE_CELL 4
number of query offsets to store in a backbone cell
Definition at line 189 of file blast_aalookup.h.
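COMPRESSED_HITS_CELL_MASK has no description here, but its value 0x03 equals COMPRESSED_HITS_PER_BACKBONE_CELL - 1. That suggests (an assumption, not stated in this header) the usual power-of-two trick of reducing a count modulo the cell capacity with a bitwise AND. The helpers slot_in_cell and mask_matches_modulo below are hypothetical and only check that equivalence.

```c
#include <assert.h>

#define COMPRESSED_HITS_PER_BACKBONE_CELL 4
#define COMPRESSED_HITS_CELL_MASK 0x03

/* For a power-of-two capacity, AND-ing with (capacity - 1) yields the
   same value as the remainder, for non-negative operands. */
static int slot_in_cell(unsigned int hit_count)
{
    return (int)(hit_count & COMPRESSED_HITS_CELL_MASK);
}

/* Returns 1 if mask and modulo agree for every n in [0, limit). */
static int mask_matches_modulo(unsigned int limit)
{
    unsigned int n;
    /* The trick is only valid because the capacity is a power of two. */
    assert((COMPRESSED_HITS_PER_BACKBONE_CELL &
            (COMPRESSED_HITS_PER_BACKBONE_CELL - 1)) == 0);
    for (n = 0; n < limit; n++)
        if (slot_in_cell(n) != (int)(n % COMPRESSED_HITS_PER_BACKBONE_CELL))
            return 0;
    return 1;
}
```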
#define COMPRESSED_HITS_PER_OVERFLOW_CELL 4
number of query offsets to store in an overflow cell
Definition at line 193 of file blast_aalookup.h.
#define COMPRESSED_OVERFLOW_CELLS_IN_BANK 209710
number of cells in one bank of cells
Definition at line 196 of file blast_aalookup.h.
#define COMPRESSED_OVERFLOW_MAX_BANKS 1024
The maximum number of banks (usually fewer than 10 are needed; memory will run out before this limit is reached)
Definition at line 200 of file blast_aalookup.h.
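Taken together, the three overflow constants bound the overflow storage at 209,710 cells per bank × 1,024 banks × 4 offsets per cell = 858,972,160 query offsets. The sketch below shows that arithmetic plus one plausible way a bank-of-arrays allocator could map a global cell number to a (bank, position) pair. The helpers locate_cell and max_overflow_offsets are hypothetical; this header does not say how the toolkit indexes its banks.

```c
#include <assert.h>
#include <stdint.h>

#define COMPRESSED_HITS_PER_OVERFLOW_CELL 4
#define COMPRESSED_OVERFLOW_CELLS_IN_BANK 209710
#define COMPRESSED_OVERFLOW_MAX_BANKS 1024

/* Hypothetical helper: split a global overflow-cell number into the bank
   that holds it and its position inside that bank. */
static void locate_cell(int64_t global_cell, int64_t *bank, int64_t *pos)
{
    assert(global_cell >= 0);
    *bank = global_cell / COMPRESSED_OVERFLOW_CELLS_IN_BANK;
    *pos  = global_cell % COMPRESSED_OVERFLOW_CELLS_IN_BANK;
}

/* Upper bound on the number of query offsets all banks together can hold;
   the first cast makes the whole product 64-bit. */
static int64_t max_overflow_offsets(void)
{
    return (int64_t)COMPRESSED_OVERFLOW_CELLS_IN_BANK *
           COMPRESSED_OVERFLOW_MAX_BANKS *
           COMPRESSED_HITS_PER_OVERFLOW_CELL;
}
```

Allocating in fixed-size banks avoids a single huge reallocation as the overflow list grows; only a small table of bank pointers needs to expand.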
#define RPS_BUCKET_SIZE 2048
The number of regions into which the concatenated RPS blast database is split via bucket sorting.
Definition at line 351 of file blast_aalookup.h.
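Bucket sorting distributes items into a fixed set of buckets by key range, so later passes can work on one region of the database at a time. The sketch below distributes (query offset, subject offset) pairs into buckets keyed by subject offset. It assumes each bucket covers a fixed-width range of database positions, passed in as bucket_width; this header does not state how bucket boundaries are derived from RPS_BUCKET_SIZE. The names num_filled, num_alloc and offset_pairs come from the RPSBucket cross-references of RPSLookupTableNew, while SketchPair, SketchBucket and bucket_add are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int32_t Int4;

typedef struct SketchPair { Int4 q_off; Int4 s_off; } SketchPair;

typedef struct SketchBucket {
    Int4 num_filled;          /* pairs currently stored */
    Int4 num_alloc;           /* capacity of offset_pairs */
    SketchPair *offset_pairs; /* growable array */
} SketchBucket;

/* Append one pair to the bucket whose range contains s_off.
   Returns 0 on success, -1 if the bucket could not be grown. */
static int bucket_add(SketchBucket *buckets, Int4 num_buckets,
                      Int4 bucket_width, Int4 q_off, Int4 s_off)
{
    SketchBucket *b;
    Int4 idx;
    assert(bucket_width > 0 && s_off >= 0);
    idx = s_off / bucket_width;
    assert(idx < num_buckets);
    b = &buckets[idx];
    if (b->num_filled == b->num_alloc) {
        /* Double the capacity (starting at 8) when the bucket is full. */
        Int4 new_alloc = b->num_alloc ? 2 * b->num_alloc : 8;
        SketchPair *p = realloc(b->offset_pairs, new_alloc * sizeof *p);
        if (p == NULL)
            return -1;
        b->offset_pairs = p;
        b->num_alloc = new_alloc;
    }
    b->offset_pairs[b->num_filled].q_off = q_off;
    b->offset_pairs[b->num_filled].s_off = s_off;
    b->num_filled++;
    return 0;
}
```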
#define RPS_HITS_PER_CELL 3
maximum number of hits in an RPS backbone cell; this may be redundant (it has the same value as AA_HITS_PER_CELL) but must be kept separate to guarantee binary compatibility with existing RPS blast databases
Definition at line 334 of file blast_aalookup.h.
typedef struct AaLookupBackboneCell AaLookupBackboneCell
structure defining one cell of the compacted lookup table
typedef struct AaLookupSmallboneCell AaLookupSmallboneCell
structure defining one cell of the small lookup table (i.e., one that uses short integers)
typedef struct BlastAaLookupTable BlastAaLookupTable
The basic lookup table structure for blastp searches.
typedef struct BlastCompressedAaLookupTable BlastCompressedAaLookupTable
The lookup table structure for protein searches using a compressed alphabet.
typedef struct BlastRPSLookupTable BlastRPSLookupTable
The basic lookup table structure for RPS blast searches.
typedef struct CompressedLookupBackboneCell CompressedLookupBackboneCell
structure for a hashtable of indexed query offsets
typedef struct CompressedMixedOffsets CompressedMixedOffsets
"alternative" structure for CompressedLookupBackboneCell storage
typedef struct CompressedOverflowCell CompressedOverflowCell
cell in a list holding query offsets
typedef struct RPSBackboneCell RPSBackboneCell
structure defining one cell of the RPS lookup table
typedef struct RPSBucket RPSBucket
structure used for bucket sorting offsets retrieved from the RPS blast lookup table.
enum EBoneType { eBackbone = 0, eSmallbone = 1 }
types of cells
Int4 BlastAaLookupFinalize(BlastAaLookupTable *lookup, EBoneType bone_type)
Pack the data structures comprising a protein lookup table into their final form.
lookup: the lookup table [in]
Definition at line 267 of file blast_aalookup.c.
References AA_HITS_PER_CELL, ASSERT, calloc(), eBackbone, AaLookupBackboneCell::entries, AaLookupSmallboneCell::entries, i, lookup(), NULL, AaLookupBackboneCell::num_used, AaLookupSmallboneCell::num_used, AaLookupBackboneCell::overflow_cursor, AaLookupSmallboneCell::overflow_cursor, AaLookupBackboneCell::payload, AaLookupSmallboneCell::payload, PV_ARRAY_BTS, PV_ARRAY_TYPE, PV_SET, and sfree.
Referenced by LookupTableWrapInit_MT(), and CMakeProfileDBApp::x_RPS_DbClose().
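The cross-references of BlastAaLookupFinalize include PV_ARRAY_TYPE, PV_ARRAY_BTS and PV_SET, which points to a presence vector: a bit array with one bit per backbone cell, set when that cell holds at least one hit, so a scanner can reject empty cells with a single bit test. The sketch below builds such a vector. The SKETCH_PV_* macros are hypothetical stand-ins; their definitions (32-bit words, a shift of 5) are assumptions of this sketch, not the toolkit's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t SketchPvWord;   /* stand-in for PV_ARRAY_TYPE */
#define SKETCH_PV_BTS 5          /* log2 of the bits per word (32) */
#define SKETCH_PV_MASK ((1u << SKETCH_PV_BTS) - 1)

/* Set / test bit i: the high bits of i pick the word, the low bits the
   position inside it. */
#define SKETCH_PV_SET(pv, i) \
    ((pv)[(i) >> SKETCH_PV_BTS] |= (SketchPvWord)1 << ((i) & SKETCH_PV_MASK))
#define SKETCH_PV_TEST(pv, i) \
    (((pv)[(i) >> SKETCH_PV_BTS] >> ((i) & SKETCH_PV_MASK)) & 1u)

/* Build a presence vector with one bit per backbone cell, set when the
   cell's hit count is positive. Returns NULL on allocation failure;
   the caller frees the result. */
static SketchPvWord *build_pv(const int32_t *num_used, int32_t backbone_size)
{
    int32_t i;
    size_t words;
    SketchPvWord *pv;
    assert(backbone_size >= 0);
    words = ((size_t)backbone_size >> SKETCH_PV_BTS) + 1;
    pv = calloc(words, sizeof *pv);
    if (pv == NULL)
        return NULL;
    for (i = 0; i < backbone_size; i++)
        if (num_used[i] > 0)
            SKETCH_PV_SET(pv, i);
    return pv;
}
```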
void BlastAaLookupIndexQuery(BlastAaLookupTable *lookup, Int4 **matrix, BLAST_SequenceBlk *query, BlastSeqLoc *unmasked_regions, Int4 query_bias)
Index a protein query.
lookup: the lookup table [in/modified]
matrix: the substitution matrix [in]
query: the array of queries to index
unmasked_regions: a BlastSeqLoc* pointing to a list of one or more integer pairs that specify the unmasked region(s) of the query [in]
query_bias: number added to each offset put into the lookup table (only used for RPS blast database creation, otherwise 0) [in]
Definition at line 413 of file blast_aalookup.c.
References ASSERT, location, lookup(), NULL, query, s_AddNeighboringWords(), and s_AddPSSMNeighboringWords().
Referenced by LookupTableWrapInit_MT(), and CMakeProfileDBApp::x_RPSUpdateLookup().
BlastAaLookupTable *BlastAaLookupTableDestruct(BlastAaLookupTable *lookup)
Free the lookup table.
lookup: The lookup table structure to be freed
Definition at line 257 of file blast_aalookup.c.
References lookup(), NULL, and sfree.
Referenced by LookupTableWrapFree(), CMakeProfileDBApp::x_RPS_DbClose(), and CMakeProfileDBApp::CRPS_DbInfo::~CRPS_DbInfo().
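The Destruct routines on this page return a pointer of the table's own type, and their cross-references include sfree and NULL. That is consistent with the common C idiom of freeing an object and returning NULL, so a caller can release and clear its handle in one statement. The sketch below shows that idiom on a hypothetical ToyTable with a local SKETCH_SFREE macro; it illustrates the pattern under that assumption and is not the toolkit's implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Local stand-in for an sfree-style macro: free, then null the pointer. */
#define SKETCH_SFREE(p) do { free(p); (p) = NULL; } while (0)

typedef struct ToyTable {
    int *backbone;   /* owned buffer */
    int *overflow;   /* owned buffer */
} ToyTable;

static ToyTable *ToyTableNew(size_t n)
{
    ToyTable *t;
    assert(n > 0);
    t = calloc(1, sizeof *t);
    if (t == NULL)
        return NULL;
    t->backbone = calloc(n, sizeof *t->backbone);
    t->overflow = calloc(n, sizeof *t->overflow);
    if (t->backbone == NULL || t->overflow == NULL) {
        free(t->backbone);
        free(t->overflow);
        free(t);
        return NULL;
    }
    return t;
}

/* Free everything the table owns and return NULL, so callers can write
   t = ToyTableDestruct(t); and never keep a dangling pointer. */
static ToyTable *ToyTableDestruct(ToyTable *t)
{
    if (t == NULL)
        return NULL;
    SKETCH_SFREE(t->backbone);
    SKETCH_SFREE(t->overflow);
    SKETCH_SFREE(t);
    return NULL;
}
```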
Int4 BlastAaLookupTableNew(const LookupTableOptions *opt, BlastAaLookupTable **lut)
Create a new protein lookup table.
opt: pointer to lookup table options structure [in]
lut: handle to lookup table structure [in/modified]
Definition at line 227 of file blast_aalookup.c.
References ASSERT, BLASTAA_SIZE, calloc(), i, ilog2(), lookup(), NULL, LookupTableOptions::threshold, and LookupTableOptions::word_size.
Referenced by LookupTableWrapInit_MT(), and CMakeProfileDBApp::x_RPSAddFirstSequence().
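BlastAaLookupTableNew references BLASTAA_SIZE, ilog2() and LookupTableOptions::word_size, which is consistent with packing each residue of a word into a fixed number of bits and sizing the backbone to cover every possible word index. The sketch below works through that arithmetic. The alphabet size of 28 is an assumed value for BLASTAA_SIZE, and both the bit width (enough bits for the largest letter code) and the letter order inside the index are assumptions of this sketch; all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t Int4;

#define SKETCH_ALPHABET_SIZE 28  /* assumed value of BLASTAA_SIZE */

/* floor(log2(x)) for x >= 1. */
static Int4 sketch_ilog2(Int4 x)
{
    Int4 lg = 0;
    assert(x >= 1);
    while (x >>= 1)
        lg++;
    return lg;
}

/* Bits needed to represent the largest letter code, alphabet_size - 1. */
static Int4 sketch_charsize(Int4 alphabet_size)
{
    assert(alphabet_size >= 2);
    return sketch_ilog2(alphabet_size - 1) + 1;
}

/* Pack word_size letters into one index, charsize bits per letter,
   with earlier letters in the higher bits. */
static Int4 sketch_word_index(const unsigned char *word, Int4 word_size,
                              Int4 charsize)
{
    Int4 i, index = 0;
    for (i = 0; i < word_size; i++)
        index = (index << charsize) | word[i];
    return index;
}

/* Number of backbone entries needed so that the largest possible word
   (every letter equal to alphabet_size - 1) still has a valid index. */
static Int4 sketch_backbone_size(Int4 alphabet_size, Int4 word_size)
{
    Int4 i, charsize = sketch_charsize(alphabet_size), max_index = 0;
    for (i = 0; i < word_size; i++)
        max_index = (max_index << charsize) | (alphabet_size - 1);
    return max_index + 1;
}
```

With 28 letters each code fits in 5 bits, so a 3-letter word needs 15 bits and the backbone needs 28,540 entries (largest index 27·2^10 + 27·2^5 + 27 = 28,539). Some of those entries correspond to no real word, which is the price paid for computing the index with shifts and ORs only.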
BlastCompressedAaLookupTable *BlastCompressedAaLookupTableDestruct(BlastCompressedAaLookupTable *lookup)
Free the compressed lookup table.
lookup: The lookup table structure to be freed
Definition at line 1359 of file blast_aalookup.c.
References free(), i, lookup(), NULL, and sfree.
Referenced by LookupTableWrapFree().
Int4 BlastCompressedAaLookupTableNew(BLAST_SequenceBlk *query, BlastSeqLoc *locations, BlastCompressedAaLookupTable **lut, const LookupTableOptions *opt, BlastScoreBlk *sbp)
Create a new compressed protein lookup table.
query: The query sequence block (if it is a concatenated sequence, the individual strands/sequences must be separated by a sentinel byte) [in]
locations: The locations to be included in the lookup table, e.g. [0, length-1] for the full sequence; NULL means no sequence [in]
lut: Pointer to the lookup table to be created [out]
opt: Options for lookup table creation [in]
sbp: pointer to score matrix information [in]
Definition at line 1270 of file blast_aalookup.c.
References ASSERT, BLASTAA_SIZE, calloc(), SCompressedAlphabet::compress_table, COMPRESSED_OVERFLOW_CELLS_IN_BANK, COMPRESSED_OVERFLOW_MAX_BANKS, SBlastScoreMatrix::data, i, iexp(), letter(), lookup(), malloc(), SCompressedAlphabet::matrix, NULL, query, s_CompressedAddNeighboringWords(), s_CompressedLookupFinalize(), SCompressedAlphabetFree(), SCompressedAlphabetNew(), LookupTableOptions::threshold, and LookupTableOptions::word_size.
Referenced by LookupTableWrapInit_MT().
BlastRPSLookupTable *RPSLookupTableDestruct(BlastRPSLookupTable *lookup)
Free the lookup table.
lookup: The lookup table structure to free; note that the rps_backbone and rps_seq_offsets fields are not freed by this call, since they may refer to memory-mapped arrays
Definition at line 212 of file blast_aalookup.c.
References i, lookup(), NULL, and sfree.
Referenced by LookupTableWrapFree().
Int2 RPSLookupTableNew(const BlastRPSInfo *rps_info, BlastRPSLookupTable **lut)
Create a new RPS blast lookup table.
rps_info: pointer to structure with RPS setup information [in]
lut: handle to lookup table [in/modified]
Definition at line 124 of file blast_aalookup.c.
References ASSERT, BLAST_WORDSIZE_PROT, calloc(), i, ilog2(), info, lookup(), BlastRPSLookupFileHeader::magic_number, BlastRPSProfileHeader::magic_number, malloc(), NULL, RPSBucket::num_alloc, RPSBucket::num_filled, BlastRPSProfileHeader::num_profiles, RPSBucket::offset_pairs, BlastRPSLookupFileHeader::overflow_hits, PV_ARRAY_BTS, PV_ARRAY_TYPE, PV_SET, RPS_BUCKET_SIZE, RPS_MAGIC_NUM, RPS_MAGIC_NUM_28, BlastRPSLookupFileHeader::start_of_backbone, and BlastRPSProfileHeader::start_offsets.
Referenced by LookupTableWrapInit_MT().
static |
Convert a word to use a compressed alphabet.
The letters in the word are reversed compared to the original order.
wordsize: Number of consecutive letters in a word [in]
compressed_alphabet_size: Number of letters in compressed alphabet [in]
word: Sequence in "regular" AA alphabet [in]
skip: If a letter is encountered that cannot be compressed, the offset from word[] where index computation can begin again [out]
lookup: Translation tables etc. [in]
Definition at line 304 of file blast_aalookup.h.
Referenced by s_BlastCompressedAaScanSubject(), and s_CompressedLookupAddUnencoded().
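The static routine documented above maps each residue through a translation table and combines the results into one index, reporting through skip where scanning may resume if a residue cannot be compressed. The sketch below is one reading of that description, using positional digits in base compressed_alphabet_size (the cross-references of BlastCompressedAaLookupTableNew include iexp(), which fits that reading). It assumes the first residue becomes the least significant digit, so the digits appear reversed relative to the word, and that an uncompressible residue is marked with -1 in the table. SketchCompressTables and sketch_compressed_index are hypothetical, and the return convention is an assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t Int4;

/* Hypothetical translation table: maps a regular residue code to its
   compressed letter, or -1 if the residue cannot be compressed. */
typedef struct SketchCompressTables {
    const signed char *compress_table;
} SketchCompressTables;

/* Base-`compressed_alphabet_size` index of the word, first residue as the
   least significant digit. On an uncompressible residue at word[i],
   sets *skip = i + 1 (the next possible word start) and returns -1;
   otherwise sets *skip = 0 and returns the index. */
static Int4 sketch_compressed_index(Int4 wordsize, const unsigned char *word,
                                    Int4 compressed_alphabet_size, Int4 *skip,
                                    const SketchCompressTables *lookup)
{
    Int4 i, index = 0, place = 1;
    assert(wordsize > 0 && compressed_alphabet_size > 1);
    for (i = 0; i < wordsize; i++) {
        Int4 letter = lookup->compress_table[word[i]];
        if (letter < 0) {
            *skip = i + 1;   /* any word containing word[i] is invalid */
            return -1;
        }
        index += letter * place;
        place *= compressed_alphabet_size;
    }
    *skip = 0;
    return index;
}
```

Reporting skip lets a scanner jump past every word that would contain the bad residue instead of recomputing and rejecting each of them in turn.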