NCBI C++ ToolKit

Routines that compute a blastn word size appropriate for finding, with high probability, alignments with specified length and percent identity.
Classes
struct  MatrixData
Structure containing intermediate data to be processed.
Macros
#define  TARGET_HIT_PROB 0.98
The probability that a random alignment will be found.
#define  SWAP_MATRIX(a, b)
Swap two matrices by swapping pointers to them.
Typedefs
typedef struct MatrixData  MatrixData
Structure containing intermediate data to be processed.
Functions
static Int2  s_MatrixDataInit (MatrixData *m)
Initialize intermediate state.
static void  s_MatrixDataFree (MatrixData *m)
Free previously allocated scratch data.
static Int2  s_MatrixDataReset (MatrixData *m, Int4 new_word_size, double percent_identity)
Set up for the next calculation of hit probability.
static void  s_SetInitialMatrix (double *matrix, Int4 matrix_dim, double identity)
Loads the initial value for matrix exponentiation.
static void  s_MatrixMultiply (double *a, double identity, double *prod, Int4 dim)
Multiply the current exponentiated matrix by the original state transition matrix.
static void  s_MatrixSquare (double *a, double *prod, Int4 dim)
Multiply a square matrix by itself.
static Int2  s_FindHitProbability (MatrixData *m, Int4 word_size, double min_percent_identity, Int4 min_align_length)
For fixed word size and alignment properties, compute the probability that blastn with that word size will find a seed within a random alignment.
static Int4  s_FindWordSize (MatrixData *m, double min_percent_identity, Int4 min_align_length)
For specified alignment properties, compute the blastn word size that will cause random alignments with those properties to be found with specified (high) probability.
Int4  BLAST_FindBestNucleotideWordSize (double min_percent_identity, Int4 min_align_length)
Given a minimum amount of identity and the minimum desired length of nucleotide alignments, find the largest blastn word size that will find random instances of those alignments with high probability.
Definition in file blast_tune.c.
#define SWAP_MATRIX(a, b)
Swap two matrices by swapping pointers to them.
Definition at line 250 of file blast_tune.c.
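The macro body is not reproduced on this page. A minimal sketch of a pointer-swapping macro with this signature is shown below, assuming both arguments are `double *` lvalues. The expansion and the helper `swap_demo` are illustrations only and may differ from the definition in blast_tune.c.

```c
#include <assert.h>

/* Hypothetical expansion: exchange two double* variables so that the
   matrix contents they point to are never copied. */
#define SWAP_MATRIX(a, b)       \
    do {                        \
        double *tmp_ = (a);     \
        (a) = (b);              \
        (b) = tmp_;             \
    } while (0)

/* Illustrative helper (not part of the toolkit): swaps two pointers and
   returns 1 if each now refers to the other's original buffer. */
static int swap_demo(void)
{
    double x[1] = { 1.0 };
    double y[1] = { 2.0 };
    double *p = x;
    double *q = y;

    SWAP_MATRIX(p, q);
    assert(p == y && q == x);   /* pointers exchanged, data untouched */
    return p[0] == 2.0 && q[0] == 1.0;
}
```

Swapping pointers costs the same regardless of matrix dimension, which matters when a product buffer and a power buffer trade roles on every iteration.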
#define TARGET_HIT_PROB 0.98 
The probability that a random alignment will be found.
Given particulars about the alignment, we will attempt to compute the largest blastn word size that has at least this probability of finding a random alignment.
Definition at line 57 of file blast_tune.c.
typedef struct MatrixData MatrixData 
Structure containing intermediate data to be processed.

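static Int2 s_FindHitProbability (MatrixData *m, Int4 word_size, double min_percent_identity, Int4 min_align_length)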
For fixed word size and alignment properties, compute the probability that blastn with that word size will find a seed within a random alignment.
m  Space for the Markov chain calculation [in][out] 
word_size  The blastn word size [in] 
min_percent_identity  How much identity is expected in random alignments. Less identity means the probability of finding such alignments is decreased [in] 
min_align_length  The smallest alignment length desired. Longer length gives blastn more leeway to find seeds and increases the computed probability that alignments will be found [in] 
Definition at line 270 of file blast_tune.c.
References MatrixData::hit_probability, mask, MatrixData::matrix_dim, MatrixData::percent_identity, MatrixData::power_matrix, MatrixData::prod_matrix, s_MatrixDataReset(), s_MatrixMultiply(), s_MatrixSquare(), s_SetInitialMatrix(), and SWAP_MATRIX.
Referenced by s_FindWordSize().
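The calculation described above can be sketched as follows. This is a minimal illustration, not the toolkit's code: it assumes each alignment column matches independently with probability `identity`, models the length of the current run of matches as a Markov chain with `word_size + 1` states (the last one absorbing), and raises the transition matrix to the power `align_length` by repeated squaring. The names `mat_mul` and `hit_probability` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Dense product c = a * b for dim x dim row-major matrices. */
static void mat_mul(const double *a, const double *b, double *c, int dim)
{
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            double sum = 0.0;
            for (int k = 0; k < dim; k++)
                sum += a[i * dim + k] * b[k * dim + j];
            c[i * dim + j] = sum;
        }
    }
}

/* Hypothetical helper: probability that align_length independent columns
   contain a run of at least word_size consecutive matches.
   Returns -1.0 if memory cannot be allocated. */
static double hit_probability(int word_size, double identity, int align_length)
{
    assert(word_size > 0 && align_length >= 0);
    int dim = word_size + 1;
    size_t n = (size_t)dim * (size_t)dim;
    double *base = calloc(n, sizeof *base);     /* transition matrix   */
    double *result = calloc(n, sizeof *result); /* accumulated power   */
    double *tmp = calloc(n, sizeof *tmp);       /* scratch for product */
    double prob = -1.0;

    if (base && result && tmp) {
        /* State i = current run length; a mismatch returns to state 0. */
        for (int i = 0; i < word_size; i++) {
            base[i * dim] = 1.0 - identity;
            base[i * dim + i + 1] = identity;
        }
        base[word_size * dim + word_size] = 1.0;  /* seed found: absorbing */

        for (int i = 0; i < dim; i++)
            result[i * dim + i] = 1.0;            /* start from identity */

        /* Binary exponentiation: one squaring per bit of align_length. */
        for (int e = align_length; e > 0; e >>= 1) {
            if (e & 1) {
                mat_mul(result, base, tmp, dim);
                memcpy(result, tmp, n * sizeof *result);
            }
            mat_mul(base, base, tmp, dim);
            memcpy(base, tmp, n * sizeof *base);
        }
        prob = result[word_size];  /* row 0 (empty run) to absorbing state */
    }
    free(base);
    free(result);
    free(tmp);
    return prob;
}
```

For example, `hit_probability(2, 0.9, 2)` evaluates to 0.9 * 0.9 = 0.81 up to rounding, since both columns must match, and any alignment shorter than the word size yields 0.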

static Int4 s_FindWordSize (MatrixData *m, double min_percent_identity, Int4 min_align_length)
For specified alignment properties, compute the blastn word size that will cause random alignments with those properties to be found with specified (high) probability.
m  Space for the Markov chain calculation [in][out] 
min_percent_identity  How much identity is expected in random alignments [in] 
min_align_length  The smallest alignment length desired [in] 
Definition at line 330 of file blast_tune.c.
References fabs, MatrixData::hit_probability, MIN, s_FindHitProbability(), and TARGET_HIT_PROB.
Referenced by BLAST_FindBestNucleotideWordSize().
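One way to realize such a search is sketched below, under stated assumptions. Because the hit probability can only fall as the word size grows, scanning candidates from largest to smallest and stopping at the first one that reaches TARGET_HIT_PROB gives the largest acceptable word size. The bounds `min_w` and `max_w`, the helper names, and the linear scan itself are hypothetical; the toolkit may search the range differently. The probability is computed here with a state vector rather than matrix exponentiation, purely to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

#define TARGET_HIT_PROB 0.98

/* Probability of at least one run of word_size matches among
   align_length independent columns. Returns -1.0 on allocation failure. */
static double run_probability(int word_size, double identity, int align_length)
{
    assert(word_size > 0 && align_length >= 0);
    double *state = calloc((size_t)word_size + 1, sizeof *state);
    double hit;

    if (state == NULL)
        return -1.0;
    state[0] = 1.0;                       /* start with an empty run */
    for (int step = 0; step < align_length; step++) {
        double unfinished = 0.0;          /* mass not yet absorbed */
        for (int i = 0; i < word_size; i++)
            unfinished += state[i];
        /* A match extends every run by one; update from the top down so
           each entry still sees the previous step's value below it. */
        state[word_size] += identity * state[word_size - 1];
        for (int i = word_size - 1; i > 0; i--)
            state[i] = identity * state[i - 1];
        /* A mismatch resets any unfinished run. */
        state[0] = (1.0 - identity) * unfinished;
    }
    hit = state[word_size];
    free(state);
    return hit;
}

/* Largest word size in [min_w, max_w] whose hit probability reaches the
   target, or 0 if no candidate qualifies. */
static int find_word_size(double identity, int align_length,
                          int min_w, int max_w)
{
    for (int w = max_w; w >= min_w; w--) {
        if (run_probability(w, identity, align_length) >= TARGET_HIT_PROB)
            return w;
    }
    return 0;
}
```

With perfect identity the answer is the alignment length itself: `find_word_size(1.0, 20, 4, 50)` returns 20, because a run of 20 matches is certain and a run of 21 is impossible.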

static void s_MatrixDataFree (MatrixData *m)
Free previously allocated scratch data.
m  pointer to intermediate state [in][out] 
Definition at line 76 of file blast_tune.c.
References NULL, MatrixData::power_matrix, MatrixData::prod_matrix, and sfree.
Referenced by BLAST_FindBestNucleotideWordSize().

static Int2 s_MatrixDataInit (MatrixData *m)
Initialize intermediate state.
Note that memory for the matrices gets allocated later.
m  pointer to intermediate state [in][out] 
Definition at line 64 of file blast_tune.c.
References NULL.
Referenced by BLAST_FindBestNucleotideWordSize().

static Int2 s_MatrixDataReset (MatrixData *m, Int4 new_word_size, double percent_identity)
Set up for the next calculation of hit probability.
m  Space for the Markov chain calculation [in][out] 
new_word_size  The blastn word size to be used for the current test. The internally generated matrix has dimension one larger than this [in] 
percent_identity  The desired amount of identity in alignments. A fractional number (0...1) [in] 
Definition at line 93 of file blast_tune.c.
References MatrixData::hit_probability, MatrixData::matrix_dim, MatrixData::matrix_dim_alloc, NULL, MatrixData::percent_identity, MatrixData::power_matrix, MatrixData::prod_matrix, and sfree.
Referenced by s_FindHitProbability().

static void s_MatrixMultiply (double *a, double identity, double *prod, Int4 dim)
Multiply the current exponentiated matrix by the original state transition matrix.
Since the latter is very sparse and has a regular structure, this operation is essentially instantaneous compared to an ordinary matrix-matrix multiply.
a  Matrix to multiply [in] 
identity  The desired amount of identity in alignments. A fractional number (0...1). Note that this is the only information needed to create the state transition matrix, and its structure is sufficiently regular that the matrix can be implicitly used [in] 
prod  space for the matrix product [out] 
dim  The dimension of all matrices [in] 
Definition at line 162 of file blast_tune.c.
Referenced by s_FindHitProbability().
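The saving from sparsity can be illustrated with a sketch. It assumes a particular layout that this page does not confirm: row-major storage, a right-multiplication `prod = a * T`, and a transition matrix `T` in which each non-final state `i` moves to state `i + 1` with probability `identity` and back to state 0 otherwise, while the final state is absorbing. Under those assumptions each output row needs only one pass over the input row, so the cost is proportional to `dim * dim` rather than `dim * dim * dim`. The name `sparse_multiply` is hypothetical.

```c
#include <assert.h>

/* Hypothetical sketch: prod = a * T, where T is never stored.
   T[i][0]     = 1 - identity   for i < dim - 1  (mismatch resets the run)
   T[i][i + 1] = identity       for i < dim - 1  (match extends the run)
   T[dim-1][dim-1] = 1                           (absorbing state)       */
static void sparse_multiply(const double *a, double identity,
                            double *prod, int dim)
{
    assert(dim >= 2 && a != prod);
    int last = dim - 1;

    for (int r = 0; r < dim; r++) {
        const double *row = a + r * dim;
        double *out = prod + r * dim;
        double unfinished = 0.0;

        /* Column 0 collects the mismatch mass of every non-final state. */
        for (int i = 0; i < last; i++)
            unfinished += row[i];
        out[0] = (1.0 - identity) * unfinished;

        /* Interior columns receive only the match mass from the left. */
        for (int j = 1; j < last; j++)
            out[j] = identity * row[j - 1];

        /* Final column: newly completed runs plus mass already absorbed. */
        out[last] = identity * row[last - 1] + row[last];
    }
}
```

Multiplying the identity matrix by the implicit `T` returns `T` itself, which is a convenient check of the assumed layout.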

static void s_MatrixSquare (double *a, double *prod, Int4 dim)
Multiply a square matrix by itself.
a  The matrix [in] 
prod  Space to store the product [out] 
dim  The matrix dimension [in] 
Definition at line 211 of file blast_tune.c.
Referenced by s_FindHitProbability().

static void s_SetInitialMatrix (double *matrix, Int4 matrix_dim, double identity)
Loads the initial value for matrix exponentiation.
This is the starting Markov chain described in the reference.
matrix  The matrix to be initialized [in][out] 
matrix_dim  Dimension of the matrix [in] 
identity  The desired amount of identity in alignments. A fractional number (0...1) [in] 
Definition at line 132 of file blast_tune.c.
Referenced by s_FindHitProbability().
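The reference mentioned above is not included on this page, so the exact matrix layout is not shown here. A plausible sketch, assuming a row-major square matrix whose state `i` records a run of `i` consecutive matches and whose last state is absorbing, is given below. The name `set_initial_matrix` and the layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: fill a matrix_dim x matrix_dim row-major matrix
   with the one-step transition probabilities of the run-length chain. */
static void set_initial_matrix(double *matrix, int matrix_dim, double identity)
{
    assert(matrix != NULL && matrix_dim >= 2);
    int last = matrix_dim - 1;

    for (int i = 0; i < matrix_dim * matrix_dim; i++)
        matrix[i] = 0.0;

    for (int i = 0; i < last; i++) {
        matrix[i * matrix_dim] = 1.0 - identity;     /* mismatch: back to 0 */
        matrix[i * matrix_dim + i + 1] = identity;   /* match: run grows    */
    }
    matrix[last * matrix_dim + last] = 1.0;          /* full word: absorbing */
}
```

Every row sums to 1, as required of a transition matrix, and only the value of `identity` is needed to build it.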