NCBI C++ ToolKit
blast_tune.c File Reference

Routines that compute a blastn word size appropriate for finding, with high probability, alignments with specified length and percent identity.

#include <algo/blast/core/blast_util.h>
#include <algo/blast/core/blast_tune.h>

Classes

struct  MatrixData
 Structure containing intermediate data to be processed
 

Macros

#define TARGET_HIT_PROB   0.98
 The probability that a random alignment will be found.
 
#define SWAP_MATRIX(a, b)
 Swap two matrices by swapping pointers to them
 

Typedefs

typedef struct MatrixData MatrixData
 Structure containing intermediate data to be processed
 

Functions

static Int2 s_MatrixDataInit (MatrixData *m)
 Initialize intermediate state.
 
static void s_MatrixDataFree (MatrixData *m)
 Free previously allocated scratch data.
 
static Int2 s_MatrixDataReset (MatrixData *m, Int4 new_word_size, double percent_identity)
 Set up for the next calculation of hit probability.
 
static void s_SetInitialMatrix (double *matrix, Int4 matrix_dim, double identity)
 Loads the initial value for matrix exponentiation.
 
static void s_MatrixMultiply (double *a, double identity, double *prod, Int4 dim)
 Multiply the current exponentiated matrix by the original state transition matrix.
 
static void s_MatrixSquare (double *a, double *prod, Int4 dim)
 Multiply a square matrix by itself.
 
static Int2 s_FindHitProbability (MatrixData *m, Int4 word_size, double min_percent_identity, Int4 min_align_length)
 For fixed word size and alignment properties, compute the probability that blastn with that word size will find a seed within a random alignment.
 
static Int4 s_FindWordSize (MatrixData *m, double min_percent_identity, Int4 min_align_length)
 For specified alignment properties, compute the blastn word size that will cause random alignments with those properties to be found with specified (high) probability.
 
Int4 BLAST_FindBestNucleotideWordSize (double min_percent_identity, Int4 min_align_length)
 Given a minimum amount of identity and the minimum desired length of nucleotide alignments, find the largest blastn word size that will find random instances of those alignments with high probability.
 

Detailed Description

Routines that compute a blastn word size appropriate for finding, with high probability, alignments with specified length and percent identity.

Definition in file blast_tune.c.

Macro Definition Documentation

◆ SWAP_MATRIX

#define SWAP_MATRIX(a, b)
Value:
{ \
double *tmp = (a); \
(a) = (b); \
(b) = tmp; \
}

Swap two matrices by swapping pointers to them.

Definition at line 250 of file blast_tune.c.
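
The pointer swap is what lets an iterative computation alternate between a "current" buffer and a scratch buffer without copying any matrix data. The sketch below illustrates that pattern only; `SWAP_MATRIX_SKETCH` and `scale_rounds` are hypothetical names, and the `do { } while (0)` wrapper is an addition for statement safety, not the form used in blast_tune.c.

```c
#include <stddef.h>

/* Same idea as the documented macro, wrapped in do/while(0) so that it
 * behaves as a single statement (an assumption, not the source's form). */
#define SWAP_MATRIX_SKETCH(a, b) \
    do {                         \
        double *tmp_ = (a);      \
        (a) = (b);               \
        (b) = tmp_;              \
    } while (0)

/* Hypothetical helper: doubles every element `rounds` times. Each round
 * writes into the scratch buffer and then swaps the two pointers, so no
 * element is ever copied back. Returns the buffer holding the result. */
double *scale_rounds(double *cur, double *scratch, size_t n, int rounds)
{
    for (int r = 0; r < rounds; r++) {
        for (size_t i = 0; i < n; i++)
            scratch[i] = 2.0 * cur[i];
        SWAP_MATRIX_SKETCH(cur, scratch);
    }
    return cur;
}
```

Because the result may end up in either buffer, the caller has to use the returned pointer rather than assume which array holds the final values.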

◆ TARGET_HIT_PROB

#define TARGET_HIT_PROB   0.98

The probability that a random alignment will be found.

Given particulars about the alignment, we will attempt to compute the largest blastn word size that has at least this probability of finding a random alignment.

Definition at line 57 of file blast_tune.c.

Typedef Documentation

◆ MatrixData

typedef struct MatrixData MatrixData

Structure containing intermediate data to be processed.

Function Documentation

◆ s_FindHitProbability()

static Int2 s_FindHitProbability (MatrixData *m, Int4 word_size, double min_percent_identity, Int4 min_align_length)

For fixed word size and alignment properties, compute the probability that blastn with that word size will find a seed within a random alignment.

Parameters
    m                     Space for the Markov chain calculation [in][out]
    word_size             The blastn word size [in]
    min_percent_identity  How much identity is expected in random alignments. Less identity means the probability of finding such alignments is decreased [in]
    min_align_length      The smallest alignment length desired. Longer length gives blastn more leeway to find seeds and increases the computed probability that alignments will be found [in]
Returns
    0 if the probability was successfully computed

Definition at line 270 of file blast_tune.c.

References MatrixData::hit_probability, mask, MatrixData::matrix_dim, MatrixData::percent_identity, MatrixData::power_matrix, MatrixData::prod_matrix, s_MatrixDataReset(), s_MatrixMultiply(), s_MatrixSquare(), s_SetInitialMatrix(), and SWAP_MATRIX.

Referenced by s_FindWordSize().
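
To make the quantity concrete, here is a minimal sketch of the hit probability under one plausible model: each alignment column is a match independently with probability `identity`, and a seed is found once `word_size` consecutive matches occur. The chain has `word_size + 1` states (the current run length, with the last state absorbing). This sketch steps a state vector once per column instead of exponentiating the transition matrix as the routine above does; under the assumed chain both give the same number. The function name `hit_probability` and the chain layout are assumptions, not taken from blast_tune.c.

```c
#include <stdlib.h>

/* Probability that an alignment of align_length columns, each matching
 * independently with probability `identity`, contains a run of at least
 * word_size matches. Returns -1.0 on bad input or allocation failure. */
double hit_probability(int word_size, double identity, int align_length)
{
    if (word_size < 1 || align_length < 0 ||
        identity < 0.0 || identity > 1.0)
        return -1.0;

    int dim = word_size + 1;               /* run lengths 0..word_size */
    double *state = calloc((size_t)dim, sizeof *state);
    double *next = calloc((size_t)dim, sizeof *next);
    if (state == NULL || next == NULL) {
        free(state);
        free(next);
        return -1.0;
    }

    state[0] = 1.0;                        /* start with no matches seen */
    for (int step = 0; step < align_length; step++) {
        double transient = 0.0;            /* mass not yet absorbed */
        for (int k = 0; k < word_size; k++)
            transient += state[k];

        next[0] = (1.0 - identity) * transient;   /* mismatch resets run */
        for (int k = 1; k < word_size; k++)
            next[k] = identity * state[k - 1];    /* match extends run */
        next[word_size] = state[word_size] +
                          identity * state[word_size - 1];  /* absorbed */

        double *tmp = state;               /* swap buffers, no copying */
        state = next;
        next = tmp;
    }

    double result = state[word_size];
    free(state);
    free(next);
    return result;
}
```

For example, with `word_size = 2`, `identity = 0.5` and `align_length = 3`, three of the eight equally likely match/mismatch patterns contain two consecutive matches, so the function returns 0.375.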

◆ s_FindWordSize()

static Int4 s_FindWordSize (MatrixData *m, double min_percent_identity, Int4 min_align_length)

For specified alignment properties, compute the blastn word size that will cause random alignments with those properties to be found with specified (high) probability.

Parameters
    m                     Space for the Markov chain calculation [in][out]
    min_percent_identity  How much identity is expected in random alignments [in]
    min_align_length      The smallest alignment length desired [in]
Returns
    The optimal word size, or zero if the optimization process failed

Definition at line 330 of file blast_tune.c.

References fabs, MatrixData::hit_probability, MIN, s_FindHitProbability(), and TARGET_HIT_PROB.

Referenced by BLAST_FindBestNucleotideWordSize().
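
Since a longer word can only make a seed harder to find, the hit probability does not increase with word size, and the largest acceptable word size can be located by bisection. The sketch below shows that search over a table of precomputed probabilities purely for illustration; a real search would compute each probability on demand. The search strategy, the function name `largest_word_size`, and the use of 0 to signal "no word size qualifies" are assumptions and may differ from what s_FindWordSize() actually does.

```c
/* prob[w] holds the hit probability for word size w, for every w in
 * [min_w, max_w], and is assumed to be non-increasing in w.
 * Returns the largest w with prob[w] >= target, or 0 if there is none. */
int largest_word_size(const double *prob, int min_w, int max_w,
                      double target)
{
    if (min_w < 1 || max_w < min_w || prob[min_w] < target)
        return 0;

    int lo = min_w;       /* invariant: prob[lo] >= target */
    int hi = max_w + 1;   /* invariant: hi is past the end or fails */
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (prob[mid] >= target)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

With a target of 0.98, a table whose entries for word sizes 1 through 6 are 1.0, 0.999, 0.99, 0.985, 0.97 and 0.9 yields word size 4.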

◆ s_MatrixDataFree()

static void s_MatrixDataFree (MatrixData *m)

Free previously allocated scratch data.

Parameters
    m  Pointer to intermediate state [in][out]

Definition at line 76 of file blast_tune.c.

References NULL, MatrixData::power_matrix, MatrixData::prod_matrix, and sfree.

Referenced by BLAST_FindBestNucleotideWordSize().

◆ s_MatrixDataInit()

static Int2 s_MatrixDataInit (MatrixData *m)

Initialize intermediate state.

Note that memory for the matrices gets allocated later.

Parameters
    m  Pointer to intermediate state [in][out]
Returns
    -1 if m is NULL, zero otherwise

Definition at line 64 of file blast_tune.c.

References NULL.

Referenced by BLAST_FindBestNucleotideWordSize().

◆ s_MatrixDataReset()

static Int2 s_MatrixDataReset (MatrixData *m, Int4 new_word_size, double percent_identity)

Set up for the next calculation of hit probability.

Parameters
    m                 Space for the Markov chain calculation [in][out]
    new_word_size     The blastn word size to be used for the current test. The internally generated matrix has dimension one larger than this [in]
    percent_identity  The desired amount of identity in alignments. A fractional number (0...1) [in]
Returns
    0 if successful

Definition at line 93 of file blast_tune.c.

References MatrixData::hit_probability, MatrixData::matrix_dim, MatrixData::matrix_dim_alloc, NULL, MatrixData::percent_identity, MatrixData::power_matrix, MatrixData::prod_matrix, and sfree.

Referenced by s_FindHitProbability().
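
The member list above (`matrix_dim` alongside `matrix_dim_alloc`) suggests a grow-on-demand scheme in which the scratch matrices are reallocated only when a larger dimension is requested. The following is a rough sketch of that pattern, not the actual routine: the struct `ScratchSketch`, its field types, and the function `scratch_reset` are hypothetical stand-ins that reuse the documented member names.

```c
#include <stdlib.h>

/* Hypothetical stand-in for MatrixData; field types are assumptions. */
typedef struct {
    double hit_probability;
    double percent_identity;
    int matrix_dim;          /* dimension in use for the current test */
    int matrix_dim_alloc;    /* dimension the buffers can hold */
    double *power_matrix;
    double *prod_matrix;
} ScratchSketch;

/* Prepare the scratch state for a new word size. Buffers are reallocated
 * only when the required dimension exceeds what is already allocated.
 * Returns 0 on success, -1 on bad input or allocation failure. */
int scratch_reset(ScratchSketch *m, int new_word_size,
                  double percent_identity)
{
    if (m == NULL || new_word_size < 1)
        return -1;

    int dim = new_word_size + 1;   /* one state per run length 0..w */
    m->hit_probability = 0.0;
    m->percent_identity = percent_identity;
    m->matrix_dim = dim;

    if (dim > m->matrix_dim_alloc) {
        size_t count = (size_t)dim * (size_t)dim;
        free(m->power_matrix);
        free(m->prod_matrix);
        m->power_matrix = malloc(count * sizeof(double));
        m->prod_matrix = malloc(count * sizeof(double));
        if (m->power_matrix == NULL || m->prod_matrix == NULL) {
            free(m->power_matrix);
            free(m->prod_matrix);
            m->power_matrix = NULL;
            m->prod_matrix = NULL;
            m->matrix_dim_alloc = 0;
            return -1;
        }
        m->matrix_dim_alloc = dim;
    }
    return 0;
}
```

The struct must be zero-initialized before the first call so that the initial `free` calls receive NULL pointers. Resetting to a smaller word size afterwards reuses the existing buffers.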

◆ s_MatrixMultiply()

static void s_MatrixMultiply (double *a, double identity, double *prod, Int4 dim)

Multiply the current exponentiated matrix by the original state transition matrix.

Since the latter is very sparse and has a regular structure, this operation is essentially instantaneous compared to an ordinary matrix-matrix multiply.

Parameters
    a         Matrix to multiply [in]
    identity  The desired amount of identity in alignments. A fractional number (0...1). Note that this is the only information needed to create the state transition matrix, and its structure is sufficiently regular that the matrix can be implicitly used [in]
    prod      Space for the matrix product [out]
    dim       The dimension of all matrices [in]

Definition at line 162 of file blast_tune.c.

References a, and i.

Referenced by s_FindHitProbability().
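
To illustrate why the sparse product is cheap, the sketch below multiplies a dense matrix by an implicit transition matrix T that is never stored. It assumes a row-major layout and a particular chain: from state k (a run of k matches, k < dim - 1) a match moves to state k + 1 with probability `identity` and a mismatch returns to state 0, while the last state is absorbing. That layout, the orientation of T, and the name `multiply_by_transition` are assumptions; the real routine may arrange the matrix differently.

```c
/* Computes prod = a * T in O(dim^2) time, where T is the implicit
 * transition matrix described above. `a` and `prod` are dim x dim,
 * row-major, and must not overlap. */
void multiply_by_transition(const double *a, double identity,
                            double *prod, int dim)
{
    int last = dim - 1;                    /* index of absorbing state */

    for (int r = 0; r < dim; r++) {
        const double *row = a + r * dim;
        double *out = prod + r * dim;

        if (last == 0) {                   /* T is the 1x1 identity */
            out[0] = row[0];
            continue;
        }

        /* Column 0 of T holds (1 - identity) in every transient row. */
        double transient = 0.0;
        for (int k = 0; k < last; k++)
            transient += row[k];
        out[0] = (1.0 - identity) * transient;

        /* Column j of T has a single entry, T[j-1][j] = identity. */
        for (int j = 1; j < last; j++)
            out[j] = identity * row[j - 1];

        /* Last column: T[last-1][last] = identity, T[last][last] = 1. */
        out[last] = identity * row[last - 1] + row[last];
    }
}
```

Each output row costs one pass over the input row, so the whole product is quadratic in `dim` rather than cubic as in a general dense multiply.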

◆ s_MatrixSquare()

static void s_MatrixSquare (double *a, double *prod, Int4 dim)

Multiply a square matrix by itself.

Parameters
    a     The matrix [in]
    prod  Space to store the product [out]
    dim   The matrix dimension [in]

Definition at line 211 of file blast_tune.c.

References a, and i.

Referenced by s_FindHitProbability().
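
For reference, a plain dense squaring of a row-major matrix might look like the sketch below. The storage order and the name `matrix_square` are assumptions, and the real routine may order its loops differently for performance.

```c
/* Computes prod = a * a for a dim x dim row-major matrix.
 * `a` and `prod` must not overlap. */
void matrix_square(const double *a, double *prod, int dim)
{
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            double sum = 0.0;
            for (int k = 0; k < dim; k++)
                sum += a[i * dim + k] * a[k * dim + j];
            prod[i * dim + j] = sum;
        }
    }
}
```

Squaring is the step that makes exponentiation fast: raising a matrix to the n-th power by repeated squaring needs on the order of log2(n) dense products instead of n.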

◆ s_SetInitialMatrix()

static void s_SetInitialMatrix (double *matrix, Int4 matrix_dim, double identity)

Loads the initial value for matrix exponentiation.

This is the starting Markov chain described in the reference.

Parameters
    matrix      The matrix to be initialized [in][out]
    matrix_dim  Dimension of the matrix [in]
    identity    The desired amount of identity in alignments. A fractional number (0...1) [in]

Definition at line 132 of file blast_tune.c.

References i, and row.

Referenced by s_FindHitProbability().
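
As an illustration of what such a starting matrix could contain, the sketch below fills a row-major transition matrix for a run-length chain: state k means "the last k columns were matches", a match advances to state k + 1, a mismatch returns to state 0, and the final state (a full word) is absorbing. This chain and the name `set_transition_matrix` are assumptions made for the example; the reference cited by the source is not reproduced here, and the actual layout may differ.

```c
/* Fills a dim x dim row-major matrix where entry [i][j] is the assumed
 * probability of moving from state i to state j in one alignment column. */
void set_transition_matrix(double *matrix, int dim, double identity)
{
    for (int i = 0; i < dim; i++)
        for (int j = 0; j < dim; j++)
            matrix[i * dim + j] = 0.0;

    for (int i = 0; i < dim - 1; i++) {
        matrix[i * dim + 0] = 1.0 - identity;   /* mismatch resets the run */
        matrix[i * dim + (i + 1)] = identity;   /* match extends the run */
    }

    matrix[(dim - 1) * dim + (dim - 1)] = 1.0;  /* full word: absorbing */
}
```

Every row sums to 1, and under this layout the top-right entry of the matrix raised to the alignment length would be the probability of having reached the absorbing state from an empty run.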

Modified on Fri Sep 20 14:58:20 2024 by modify_doxy.py rev. 669887