NCBI C++ ToolKit
phi_lookup.c File Reference

Functions for accessing the lookup table for PHI-BLAST.

#include <algo/blast/core/phi_lookup.h>
#include <algo/blast/core/blast_encoding.h>
#include <algo/blast/core/blast_util.h>
#include "pattern_priv.h"

Functions

static void s_FindPrefixAndSuffixPos (Int4 *S, Int4 mask, Int4 mask2, Uint4 *prefixPos, Uint4 *suffixPos)
 Set up matches for words that encode 4 DNA characters; figure out for each of 256 possible DNA 4-mers, where a prefix matches the pattern and where a suffix matches the pattern.

static void s_InitDNAPattern (SPHIPatternSearchBlk *pattern_blk)
 Initialize mask and other arrays for DNA patterns.

static Int4 s_ExpandPattern (Int4 *inputPatternMasked, Uint1 *inputPattern, Int4 length, Int4 maxLength)
 Determine the length of the pattern after it has been expanded for efficient searching.

static Int4 s_PackPattern (Uint1 *inputPattern, Int4 length)
 Pack the next length bytes of inputPattern into a bit vector where the bit is 1 if and only if the byte is non-0.

static void s_PackLongPattern (Int4 numPlaces, Uint1 *inputPattern, SPHIPatternSearchBlk *pattern_blk)
 Pack the bit representation of the inputPattern into the array pattern_blk->match_maskL.

static Int4 s_NumOfOne (Int4 a)
 Return the number of 1 bits in the base 2 representation of a number a.

static void s_PackVeryLongPattern (Int4 *inputPatternMasked, Int4 numPlacesInPattern, SPHIPatternSearchBlk *pattern_blk)
 Sets up fields in SPHIPatternSearchBlk structure when pattern is very long.

static SPHIPatternSearchBlk * s_PatternSearchItemsInit ()
 Allocates the SPHIPatternSearchBlk structure.

static void s_MakePatternUpperCase (char *pattern_in, char *pattern_out, int length)
 Convert the string representation of a PHI-BLAST pattern to uppercase.

Int2 SPHIPatternSearchBlkNew (char *pattern_in, Boolean is_dna, BlastScoreBlk *sbp, SPHIPatternSearchBlk **pattern_blk_out, Blast_Message **error_msg)
 Initialize the pattern items structure, serving as a "pseudo" lookup table in a PHI BLAST search.

SPHIPatternSearchBlk * SPHIPatternSearchBlkFree (SPHIPatternSearchBlk *lut)
 Deallocate memory for the PHI BLAST lookup table.

Int4 PHIBlastScanSubject (const LookupTableWrap *lookup_wrap, const BLAST_SequenceBlk *query_blk, const BLAST_SequenceBlk *subject_blk, Int4 *offset_ptr, BlastOffsetPair *offset_pairs, Int4 array_size)
 Implementation of the ScanSubject function for PHI BLAST.
 

Variables

const int kMaskAaAlphabetBits = (1 << BLASTAA_SIZE) - 1
 Masks all bits corresponding to the amino acid alphabet, i.e. the first 26 bits of an integer number.
 

Detailed Description

Functions for accessing the lookup table for PHI-BLAST.

Definition in file phi_lookup.c.

Function Documentation

◆ PHIBlastScanSubject()

Int4 PHIBlastScanSubject ( const LookupTableWrap *  lookup_wrap,
const BLAST_SequenceBlk *  query_blk,
const BLAST_SequenceBlk *  subject_blk,
Int4 *  offset_ptr,
BlastOffsetPair *  offset_pairs,
Int4  array_size 
)

Implementation of the ScanSubject function for PHI BLAST.

Scans the subject sequence from "offset" to the end of the sequence.

Parameters
lookup_wrap  PHI BLAST lookup table [in]
query_blk  Query sequence [in]
subject_blk  Subject sequence [in]
offset_ptr  Next offset in subject - set to end of sequence [out]
offset_pairs  Starts and stops for pattern occurrences in subject [out]
array_size  Not used.
Returns
Number of pattern occurrences found.

Definition at line 725 of file phi_lookup.c.

References ASSERT, ePhiLookupTable, ePhiNaLookupTable, FindPatternHits(), BLAST_SequenceBlk::length, LookupTableWrap::lut, LookupTableWrap::lut_type, PHI_MAX_HIT, BLAST_SequenceBlk::sequence, and subject.

Referenced by BOOST_AUTO_TEST_CASE(), PHIBlastWordFinder(), and CSeedTop::Run().

◆ s_ExpandPattern()

static Int4 s_ExpandPattern ( Int4 *  inputPatternMasked,
Uint1 *  inputPattern,
Int4  length,
Int4  maxLength 
)
static

Determine the length of the pattern after it has been expanded for efficient searching.

The expansion process concatenates all the patterns formed by enumerating every combination of variable-size regions. For example, A-x(2,5)-B-C expands to a concatenation of patterns of length 5, 6, 7 and 8. If the sum of concatenated pattern lengths exceeds PHI_MAX_PATTERN_LENGTH, the pattern is treated as very long.

Parameters
inputPatternMasked  Masked input pattern [in]
inputPattern  Input pattern [in]
length  Length of inputPattern [in]
maxLength  Limit on how long inputPattern can get [in]
Returns
The length of the expanded pattern, or -1 if the pattern is treated as very long.

Definition at line 141 of file phi_lookup.c.

References i, kMaskAaAlphabetBits, PHI_MAX_PATTERN_LENGTH, and t.

Referenced by SPHIPatternSearchBlkNew().
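
The arithmetic behind the A-x(2,5)-B-C example can be sketched as follows. This is only an illustration for a pattern with a single variable-size region, not the code of s_ExpandPattern(); the function name expanded_length_single_gap and its limit parameter are hypothetical, and the real function works on masked pattern arrays and may handle several variable regions.

```c
#include <stdint.h>

/* Hypothetical helper, not part of phi_lookup.c.
 * Sums the lengths of all expansions of a pattern that has `fixed`
 * fixed positions and one x(min_gap,max_gap) region.
 * Returns -1 as soon as the running total exceeds `limit`
 * (an assumed stand-in for a maximum pattern length). */
int32_t expanded_length_single_gap(int32_t fixed, int32_t min_gap,
                                   int32_t max_gap, int32_t limit)
{
    int32_t total = 0;
    for (int32_t gap = min_gap; gap <= max_gap; gap++) {
        total += fixed + gap;      /* one expansion per gap size */
        if (total > limit)
            return -1;             /* treated as "very long" */
    }
    return total;
}
```

For A-x(2,5)-B-C there are 3 fixed positions and gap sizes 2 through 5, so the expansions have lengths 5, 6, 7 and 8 and the concatenation has length 26.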

◆ s_FindPrefixAndSuffixPos()

static void s_FindPrefixAndSuffixPos ( Int4 *  S,
Int4  mask,
Int4  mask2,
Uint4 *  prefixPos,
Uint4 *  suffixPos 
)
static

Set up matches for words that encode 4 DNA characters; figure out for each of 256 possible DNA 4-mers, where a prefix matches the pattern and where a suffix matches the pattern.

Masks are used to do the calculations with bit arithmetic.

Parameters
S  Array of words [in]
mask  Has 1 bits for whatever lengths of string the pattern can match [in]
mask2  Has 4 1 bits corresponding to the last 4 positions of a match [in]
prefixPos  Saved prefix position [out]
suffixPos  Saved suffix position [out]

Definition at line 56 of file phi_lookup.c.

References i, mask, NCBI2NA_UNPACK_BASE, PHI_ASCII_SIZE, S, and tmp.

Referenced by s_InitDNAPattern().
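
To make the "256 possible DNA 4-mers" concrete, the sketch below unpacks one byte into four 2-bit base codes. It assumes the first base sits in the two high-order bits of the byte; the helper name unpack_4mer is hypothetical and is not the toolkit's NCBI2NA_UNPACK_BASE macro, nor does it reproduce the prefix/suffix matching itself.

```c
#include <stdint.h>

/* Hypothetical helper, not part of phi_lookup.c.
 * Splits a byte holding four 2-bit base codes into bases[0..3].
 * Assumption: bases[0] comes from the two high-order bits. */
void unpack_4mer(uint8_t word, uint8_t bases[4])
{
    for (int i = 0; i < 4; i++) {
        /* shift the i-th base down to the low two bits, then mask */
        bases[i] = (uint8_t)((word >> (2 * (3 - i))) & 3);
    }
}
```

Since each base takes 2 bits, a byte can hold 4^4 = 256 distinct 4-mers, which is why a table indexed by byte value covers every case.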

◆ s_InitDNAPattern()

static void s_InitDNAPattern ( SPHIPatternSearchBlk *  pattern_blk)
static

◆ s_MakePatternUpperCase()

static void s_MakePatternUpperCase ( char *  pattern_in,
char *  pattern_out,
int  length 
)
static

Convert the string representation of a PHI-BLAST pattern to uppercase.

Parameters
pattern_in  The input pattern [in]
pattern_out  The converted pattern [out]
length  Length of the pattern [in]

Definition at line 370 of file phi_lookup.c.

References ASSERT, and toupper().

Referenced by SPHIPatternSearchBlkNew().
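
A minimal sketch of such a conversion using toupper() is shown below. The name make_upper_copy is hypothetical, and this version assumes the caller supplies an output buffer of at least length bytes and handles any terminating NUL itself; whether the toolkit function does the same is not stated here.

```c
#include <ctype.h>

/* Hypothetical helper, not part of phi_lookup.c.
 * Copies `length` characters from `in` to `out`, uppercasing each one.
 * Assumption: no NUL terminator is written. */
void make_upper_copy(const char *in, char *out, int length)
{
    for (int i = 0; i < length; i++) {
        /* cast to unsigned char: toupper() is undefined for negative values */
        out[i] = (char)toupper((unsigned char)in[i]);
    }
}
```

Characters that have no uppercase form, such as the punctuation in a pattern like a-x(2,5)-b, pass through unchanged.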

◆ s_NumOfOne()

static Int4 s_NumOfOne ( Int4  a)
static

Return the number of 1 bits in the base 2 representation of a number a.

Parameters
a  Value to count bits in [in]

Definition at line 268 of file phi_lookup.c.

References a.

Referenced by s_PackVeryLongPattern().
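
One common way to count 1 bits is to clear the lowest set bit repeatedly. The sketch below illustrates that technique only; the name count_one_bits is hypothetical and the toolkit function may use a different loop.

```c
#include <stdint.h>

/* Hypothetical helper, not part of phi_lookup.c.
 * Counts the 1 bits in the two's-complement representation of a. */
int32_t count_one_bits(int32_t a)
{
    uint32_t v = (uint32_t)a;   /* work unsigned to avoid sign issues */
    int32_t count = 0;
    while (v != 0) {
        v &= v - 1;             /* clears the lowest set bit */
        count++;
    }
    return count;
}
```

The loop runs once per set bit, so sparse masks are counted quickly.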

◆ s_PackLongPattern()

static void s_PackLongPattern ( Int4  numPlaces,
Uint1 *  inputPattern,
SPHIPatternSearchBlk *  pattern_blk 
)
static

Pack the bit representation of the inputPattern into the array pattern_blk->match_maskL.

Also packs pattern_blk->bitPatternByLetter.

Parameters
numPlaces  Number of positions in inputPattern [in]
inputPattern  Input pattern [in]
pattern_blk  The structure containing pattern search information. [in] [out]

Definition at line 231 of file phi_lookup.c.

References SLongPatternItems::bitPatternByLetter, BLASTAA_SIZE, i, SLongPatternItems::inputPatternMasked, SLongPatternItems::match_maskL, SPHIPatternSearchBlk::multi_word_items, SLongPatternItems::numWords, and PHI_BITS_PACKED_PER_WORD.

Referenced by SPHIPatternSearchBlkNew().

◆ s_PackPattern()

static Int4 s_PackPattern ( Uint1 *  inputPattern,
Int4  length 
)
static

Pack the next length bytes of inputPattern into a bit vector where the bit is 1 if and only if the byte is non-0.

Parameters
inputPattern  Input pattern [in]
length  Number of bytes to pack [in]
Returns
Packed bit vector.

Definition at line 211 of file phi_lookup.c.

References i.

Referenced by SPHIPatternSearchBlkNew().
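
The packing rule (bit is 1 if and only if the byte is non-0) can be sketched as below. The bit order is an assumption: here byte i maps to bit i, counting from the least significant bit. The name pack_nonzero_bytes is hypothetical, and the cap at 32 positions is a safety choice of this sketch.

```c
#include <stdint.h>

/* Hypothetical helper, not part of phi_lookup.c.
 * Builds a bit vector in which bit i is 1 iff bytes[i] is non-zero.
 * Assumption: byte 0 maps to the least significant bit. */
int32_t pack_nonzero_bytes(const uint8_t *bytes, int32_t length)
{
    uint32_t packed = 0;
    for (int32_t i = 0; i < length && i < 32; i++) {
        if (bytes[i] != 0)
            packed |= (uint32_t)1 << i;
    }
    return (int32_t)packed;
}
```

For the bytes {1, 0, 7, 0, 255}, positions 0, 2 and 4 are non-zero, so the result under this bit order is 1 + 4 + 16 = 21.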

◆ s_PackVeryLongPattern()

static void s_PackVeryLongPattern ( Int4 *  inputPatternMasked,
Int4  numPlacesInPattern,
SPHIPatternSearchBlk *  pattern_blk 
)
static

Sets up fields in SPHIPatternSearchBlk structure when pattern is very long.

Parameters
inputPatternMasked  Array of pattern bit masks [in]
numPlacesInPattern  Number of bit masks for the pattern [in]
pattern_blk  Structure to do the setup for [in] [out]

Definition at line 286 of file phi_lookup.c.

References BLASTAA_SIZE, calloc(), SLongPatternItems::extra_long_items, SLongPatternItems::match_maskL, SPHIPatternSearchBlk::multi_word_items, SExtraLongPatternItems::numPlacesInWord, SLongPatternItems::numWords, PHI_BITS_PACKED_PER_WORD, s_NumOfOne(), SLongPatternItems::SLL, SExtraLongPatternItems::spacing, and SExtraLongPatternItems::whichMostSpecific.

Referenced by SPHIPatternSearchBlkNew().

◆ s_PatternSearchItemsInit()

static SPHIPatternSearchBlk* s_PatternSearchItemsInit ( )
static

◆ SPHIPatternSearchBlkFree()

SPHIPatternSearchBlk* SPHIPatternSearchBlkFree ( SPHIPatternSearchBlk *  pattern_blk)

◆ SPHIPatternSearchBlkNew()

Int2 SPHIPatternSearchBlkNew ( char *  pattern,
Boolean  is_dna,
BlastScoreBlk *  sbp,
SPHIPatternSearchBlk **  pattern_blk,
Blast_Message **  error_msg 
)

Variable Documentation

◆ kMaskAaAlphabetBits

const int kMaskAaAlphabetBits = (1 << BLASTAA_SIZE) - 1

Masks all bits corresponding to the amino acid alphabet, i.e. the first 26 bits of an integer number.

Definition at line 41 of file phi_lookup.c.

Referenced by s_ExpandPattern(), s_PHIBlastAlignPatterns(), and SPHIPatternSearchBlkNew().
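
The expression (1 << N) - 1 yields a value whose N low-order bits are all 1. The sketch below shows this with a hypothetical helper, low_bits_mask; the count 26 used in the note after it is taken from the description above, while the actual value of BLASTAA_SIZE comes from the toolkit headers and may differ.

```c
#include <stdint.h>

/* Hypothetical helper, not part of phi_lookup.c.
 * Returns a mask with the n low-order bits set.
 * n >= 32 is handled separately because shifting a 32-bit value
 * by 32 or more is undefined in C. */
uint32_t low_bits_mask(unsigned n)
{
    return (n >= 32) ? UINT32_MAX : (((uint32_t)1 << n) - 1);
}
```

With n = 26 the mask is 0x3FFFFFF; AND-ing a word with it keeps only the bits that stand for alphabet letters and clears any flag bits stored above them.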

Modified on Thu Apr 11 15:04:13 2024 by modify_doxy.py rev. 669887