NCBI C++ ToolKit
Routines that compute a blastn word size appropriate for finding, with high probability, alignments with specified length and percent identity.
Classes
struct | MatrixData
Structure containing intermediate data to be processed.
Macros
#define | TARGET_HIT_PROB 0.98
The probability that a random alignment will be found.
#define | SWAP_MATRIX(a, b)
Swap two matrices by swapping pointers to them.
Typedefs
typedef struct MatrixData | MatrixData
Structure containing intermediate data to be processed.
Functions
static Int2 | s_MatrixDataInit (MatrixData *m)
Initialize intermediate state.
static void | s_MatrixDataFree (MatrixData *m)
Free previously allocated scratch data.
static Int2 | s_MatrixDataReset (MatrixData *m, Int4 new_word_size, double percent_identity)
Set up for the next calculation of hit probability.
static void | s_SetInitialMatrix (double *matrix, Int4 matrix_dim, double identity)
Loads the initial value for matrix exponentiation.
static void | s_MatrixMultiply (double *a, double identity, double *prod, Int4 dim)
Multiply the current exponentiated matrix by the original state transition matrix.
static void | s_MatrixSquare (double *a, double *prod, Int4 dim)
Multiply a square matrix by itself.
static Int2 | s_FindHitProbability (MatrixData *m, Int4 word_size, double min_percent_identity, Int4 min_align_length)
For fixed word size and alignment properties, compute the probability that blastn with that word size will find a seed within a random alignment.
static Int4 | s_FindWordSize (MatrixData *m, double min_percent_identity, Int4 min_align_length)
For specified alignment properties, compute the blastn word size that will cause random alignments with those properties to be found with specified (high) probability.
Int4 | BLAST_FindBestNucleotideWordSize (double min_percent_identity, Int4 min_align_length)
Given a minimum amount of identity and the minimum desired length of nucleotide alignments, find the largest blastn word size that will find random instances of those alignments with high probability.
Definition in file blast_tune.c.
#define SWAP_MATRIX(a, b)
Swap two matrices by swapping pointers to them.
Definition at line 250 of file blast_tune.c.
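Swapping pointers instead of copying matrix contents is useful when a computation alternates between a "current" buffer and a "scratch" buffer. The following is a minimal sketch of that pattern, not the body of SWAP_MATRIX itself; the names SWAP_DOUBLE_PTRS and swap_demo are hypothetical and exist only for illustration.

```c
/* Hypothetical stand-in for SWAP_MATRIX: exchanges two double* variables.
 * The do/while(0) wrapper makes the macro behave like a single statement. */
#define SWAP_DOUBLE_PTRS(a, b) \
    do {                       \
        double *tmp_ = (a);    \
        (a) = (b);             \
        (b) = tmp_;            \
    } while (0)

/* Ping-pong between two buffers: each step writes its result into the
 * scratch buffer, then the pointers are swapped so that `cur` always
 * names the most recent result. No element is ever copied. */
static double swap_demo(void)
{
    double buf1[2] = {1.0, 2.0};
    double buf2[2] = {0.0, 0.0};
    double *cur = buf1;
    double *scratch = buf2;
    int step;

    for (step = 0; step < 3; step++) {
        scratch[0] = cur[0] + cur[1];
        scratch[1] = cur[0];
        SWAP_DOUBLE_PTRS(cur, scratch);
    }
    return cur[0];
}
```

The swap costs three pointer assignments regardless of matrix size, which is why it is preferable to copying a full matrix after every multiplication.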
#define TARGET_HIT_PROB 0.98
The probability that a random alignment will be found.
Given particulars about the alignment, we will attempt to compute the largest blastn word size that has at least this probability of finding a random alignment.
Definition at line 57 of file blast_tune.c.
typedef struct MatrixData MatrixData
Structure containing intermediate data to be processed.
static Int2 s_FindHitProbability (MatrixData *m, Int4 word_size, double min_percent_identity, Int4 min_align_length)
For fixed word size and alignment properties, compute the probability that blastn with that word size will find a seed within a random alignment.
m | Space for the Markov chain calculation [in][out]
word_size | The blastn word size [in]
min_percent_identity | How much identity is expected in random alignments. Lower identity decreases the probability of finding such alignments [in]
min_align_length | The smallest alignment length desired. A longer length gives blastn more leeway to find seeds and increases the computed probability that alignments will be found [in]
Definition at line 270 of file blast_tune.c.
References MatrixData::hit_probability, mask, MatrixData::matrix_dim, MatrixData::percent_identity, MatrixData::power_matrix, MatrixData::prod_matrix, s_MatrixDataReset(), s_MatrixMultiply(), s_MatrixSquare(), s_SetInitialMatrix(), and SWAP_MATRIX.
Referenced by s_FindWordSize().
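To make the quantity being computed concrete, here is a minimal sketch under a stated assumption: each alignment position is an independent match with probability equal to the identity, and a seed is found when word_size consecutive positions match. The sketch advances a Markov chain state vector one position at a time rather than exponentiating a matrix, so it illustrates the model only and is not the toolkit's implementation. The function name run_hit_probability is hypothetical.

```c
#include <stdlib.h>

/* Probability that `length` independent positions, each a match with
 * probability `identity`, contain at least `word_size` consecutive matches.
 * state[i] (i < word_size) is the probability that the current run of
 * matches has length i and no seed has been found yet; state[word_size]
 * is the absorbing "seed found" state.
 * Returns -1.0 on invalid input or allocation failure. */
static double run_hit_probability(int word_size, double identity, int length)
{
    double *state;
    double hit;
    int pos, i;

    if (word_size < 1 || length < 0 || identity < 0.0 || identity > 1.0)
        return -1.0;

    state = (double *)calloc((size_t)word_size + 1, sizeof(double));
    if (state == NULL)
        return -1.0;
    state[0] = 1.0;                     /* start with an empty run */

    for (pos = 0; pos < length; pos++) {
        double not_hit = 0.0;
        for (i = 0; i < word_size; i++)
            not_hit += state[i];

        /* a match extends the longest open run into the absorbing state */
        state[word_size] += identity * state[word_size - 1];
        /* a match extends every shorter run by one (descending order so
         * that old values are read before being overwritten) */
        for (i = word_size - 1; i > 0; i--)
            state[i] = identity * state[i - 1];
        /* a mismatch resets any open run to length zero */
        state[0] = (1.0 - identity) * not_hit;
    }

    hit = state[word_size];
    free(state);
    return hit;
}
```

For example, with word_size 2, identity 0.5 and length 3, the favorable outcomes are three of the eight equally likely match patterns, so the function yields 0.375.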
static Int4 s_FindWordSize (MatrixData *m, double min_percent_identity, Int4 min_align_length)
For specified alignment properties, compute the blastn word size that will cause random alignments with those properties to be found with specified (high) probability.
m | Space for the Markov chain calculation [in][out]
min_percent_identity | How much identity is expected in random alignments [in]
min_align_length | The smallest alignment length desired [in]
Definition at line 330 of file blast_tune.c.
References fabs, MatrixData::hit_probability, MIN, s_FindHitProbability(), and TARGET_HIT_PROB.
Referenced by BLAST_FindBestNucleotideWordSize().
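The search can be pictured as follows. This is a sketch only: it scans candidate word sizes from largest to smallest and returns the first one whose hit probability reaches the target, relying on the fact that the probability cannot increase as the word grows. The search strategy actually used by s_FindWordSize may differ. The names sketch_hit_prob and sketch_find_word_size and the min_word/max_word bounds are hypothetical, and the probability model (independent matches, seed equals a run of word-size matches) is an assumption.

```c
#include <stdlib.h>

#define SKETCH_TARGET_PROB 0.98   /* same value as TARGET_HIT_PROB */

/* Probability of at least w consecutive matches in n positions, using the
 * recurrence q[i] = q[i-1] - (1-p) * p^w * q[i-w-1] for the probability
 * q[i] that the first i positions contain no such run.
 * Returns -1.0 on invalid input or allocation failure. */
static double sketch_hit_prob(int w, double p, int n)
{
    double *q;
    double pw = 1.0;
    double result;
    int i;

    if (w < 1)
        return -1.0;
    if (n < w)
        return 0.0;                 /* too short to hold a full word */

    q = (double *)malloc(((size_t)n + 1) * sizeof(double));
    if (q == NULL)
        return -1.0;

    for (i = 0; i < w; i++) {
        q[i] = 1.0;                 /* no run possible in fewer than w positions */
        pw *= p;                    /* accumulates p^w */
    }
    q[w] = 1.0 - pw;
    for (i = w + 1; i <= n; i++)
        q[i] = q[i - 1] - (1.0 - p) * pw * q[i - w - 1];

    result = 1.0 - q[n];
    free(q);
    return result;
}

/* Largest word size in [min_word, max_word] whose hit probability is at
 * least SKETCH_TARGET_PROB; falls back to min_word if none qualifies. */
static int sketch_find_word_size(double identity, int align_length,
                                 int min_word, int max_word)
{
    int w;
    for (w = max_word; w > min_word; w--) {
        if (sketch_hit_prob(w, identity, align_length) >= SKETCH_TARGET_PROB)
            return w;
    }
    return min_word;
}
```

With identity 1.0 every position matches, so the largest usable word equals the alignment length; with low identity the scan falls through to the lower bound.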
static void s_MatrixDataFree (MatrixData *m)
Free previously allocated scratch data.
m | Pointer to intermediate state [in][out]
Definition at line 76 of file blast_tune.c.
References NULL, MatrixData::power_matrix, MatrixData::prod_matrix, and sfree.
Referenced by BLAST_FindBestNucleotideWordSize().
static Int2 s_MatrixDataInit (MatrixData *m)
Initialize intermediate state.
Note that memory for the matrices gets allocated later.
m | Pointer to intermediate state [in][out]
Definition at line 64 of file blast_tune.c.
References NULL.
Referenced by BLAST_FindBestNucleotideWordSize().
static Int2 s_MatrixDataReset (MatrixData *m, Int4 new_word_size, double percent_identity)
Set up for the next calculation of hit probability.
m | Space for the Markov chain calculation [in][out]
new_word_size | The blastn word size to be used for the current test. The internally generated matrix has dimension one larger than this [in]
percent_identity | The desired amount of identity in alignments. A fractional number (0...1) [in]
Definition at line 93 of file blast_tune.c.
References MatrixData::hit_probability, MatrixData::matrix_dim, MatrixData::matrix_dim_alloc, NULL, MatrixData::percent_identity, MatrixData::power_matrix, MatrixData::prod_matrix, and sfree.
Referenced by s_FindHitProbability().
static void s_MatrixMultiply (double *a, double identity, double *prod, Int4 dim)
Multiply the current exponentiated matrix by the original state transition matrix.
Since the latter is very sparse and has a regular structure, this operation is essentially instantaneous compared to an ordinary matrix-matrix multiply.
a | Matrix to multiply [in]
identity | The desired amount of identity in alignments. A fractional number (0...1). Note that this is the only information needed to create the state transition matrix, and its structure is sufficiently regular that the matrix can be used implicitly [in]
prod | Space for the matrix product [out]
dim | The dimension of all matrices [in]
Definition at line 162 of file blast_tune.c.
Referenced by s_FindHitProbability().
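The following sketch shows how a product with a sparse, regular transition matrix can be formed without ever storing that matrix. It assumes a particular structure that is not confirmed by this page: row-major storage, states 0 to dim-2 holding the current run length of matches, state dim-1 absorbing, a match moving state i to state i+1, and a mismatch returning to state 0. The function name sketch_multiply_by_transition is hypothetical.

```c
/* prod = a * T, where T is the implicit (dim x dim) transition matrix:
 *   T[i][0]     = 1 - identity   for i < dim-1   (mismatch resets the run)
 *   T[i][i+1]   = identity       for i < dim-1   (match extends the run)
 *   T[dim-1][dim-1] = 1                          (absorbing state)
 * Matrices are row-major, dim must be at least 2, and prod must not
 * alias a. Cost is O(dim^2) instead of O(dim^3) for a dense product. */
static void sketch_multiply_by_transition(const double *a, double identity,
                                          double *prod, int dim)
{
    int last = dim - 1;
    int row, col;

    for (row = 0; row < dim; row++) {
        const double *src = a + row * dim;
        double *dst = prod + row * dim;
        double open_mass = 0.0;

        /* column 0 of T collects (1 - identity) from every open state */
        for (col = 0; col < last; col++)
            open_mass += src[col];
        dst[0] = (1.0 - identity) * open_mass;

        /* each interior column receives identity times its left neighbor */
        for (col = 1; col < last; col++)
            dst[col] = identity * src[col - 1];

        /* absorbing column keeps its mass and gains from the longest run */
        dst[last] = src[last] + identity * src[last - 1];
    }
}
```

Multiplying the identity matrix by the implicit matrix returns the transition matrix itself, which is a convenient way to check the assumed structure.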
static void s_MatrixSquare (double *a, double *prod, Int4 dim)
Multiply a square matrix by itself.
a | The matrix [in]
prod | Space to store the product [out]
dim | The matrix dimension [in]
Definition at line 211 of file blast_tune.c.
Referenced by s_FindHitProbability().
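Squaring is the building block of exponentiation by repeated squaring: squaring a matrix k times yields its 2^k-th power. A minimal sketch of a dense row-major square follows; the name sketch_matrix_square is hypothetical and the loop order is a plain textbook choice, not necessarily what the toolkit uses.

```c
/* prod = a * a for a dense (dim x dim) row-major matrix.
 * prod must not alias a, since every entry of a is read many times. */
static void sketch_matrix_square(const double *a, double *prod, int dim)
{
    int i, j, k;

    for (i = 0; i < dim; i++) {
        for (j = 0; j < dim; j++) {
            double sum = 0.0;
            for (k = 0; k < dim; k++)
                sum += a[i * dim + k] * a[k * dim + j];
            prod[i * dim + j] = sum;
        }
    }
}
```

Because the output must live in a separate buffer, a caller that squares repeatedly needs two buffers and a way to exchange them between steps.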
static void s_SetInitialMatrix (double *matrix, Int4 matrix_dim, double identity)
Loads the initial value for matrix exponentiation.
This is the starting Markov chain described in the reference.
matrix | The matrix to be initialized [in][out]
matrix_dim | Dimension of the matrix [in]
identity | The desired amount of identity in alignments. A fractional number (0...1) [in]
Definition at line 132 of file blast_tune.c.
Referenced by s_FindHitProbability().
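As an illustration of what such a starting matrix might look like, the sketch below fills an explicit transition matrix for a run-of-matches Markov chain. The layout is an assumption, not taken from blast_tune.c: row-major storage, rows as source states, the last state absorbing. The name sketch_set_transition_matrix is hypothetical.

```c
#include <string.h>

/* Fill a (dim x dim) row-major matrix with one-step transition
 * probabilities. From state `row` (current run of `row` matches):
 *   a mismatch (1 - identity) returns to state 0,
 *   a match (identity) advances to state row + 1.
 * The last state means "full word seen" and maps only to itself.
 * dim must be at least 2. */
static void sketch_set_transition_matrix(double *matrix, int dim,
                                         double identity)
{
    int last = dim - 1;
    int row;

    /* all-zero bytes represent 0.0 for IEEE 754 doubles */
    memset(matrix, 0, (size_t)dim * (size_t)dim * sizeof(double));

    for (row = 0; row < last; row++) {
        matrix[row * dim] = 1.0 - identity;
        matrix[row * dim + row + 1] = identity;
    }
    matrix[last * dim + last] = 1.0;
}
```

Every row sums to 1, and each row has at most two nonzero entries, which is the kind of sparsity that makes an implicit multiplication possible.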