NCBI C++ ToolKit
blast_psi_priv.c File Reference

Definitions for functions in private interface for Position Iterated BLAST API. More...

#include "blast_psi_priv.h"
#include "blast_posit.h"
#include <algo/blast/core/ncbi_math.h>
#include <algo/blast/core/blast_util.h>
#include <algo/blast/composition_adjustment/composition_constants.h>
#include "blast_dynarray.h"
#include <algo/blast/composition_adjustment/matrix_frequency_data.h>

Classes

struct  _PSIAlignmentTraits
 Auxiliary structure to maintain information about two aligned regions between the query and a subject sequence. More...
 

Macros

#define DEFINE_COPY_MATRIX_FUNCTION(type)
 Implements the generic copy matrix functions. More...
 
#define EFFECTIVE_ALPHABET   20
 size of alphabet used for pseudocounts calculations More...
 
#define SEQUENCE_WEIGHTS_CHECK__ABORT_ON_FAILURE   0
 The following define enables/disables the _PSICheckSequenceWeights function's abort statement in the case when the sequence weights are not being checked. More...
 
#define MAX_IND_OBSERVATIONS   400
 max number of independent observations for pseudocount calculation More...
 
#define PSEUDO_MAX   1000000
 effective infinity More...
 

Typedefs

typedef enum _EPSIPurgeFsmState _EPSIPurgeFsmState
 Defines the states of the finite state machine used in s_PSIPurgeSimilarAlignments. More...
 
typedef struct _PSIAlignmentTraits _PSIAlignmentTraits
 Auxiliary structure to maintain information about two aligned regions between the query and a subject sequence. More...
 

Enumerations

enum  _EPSIPurgeFsmState { eCounting , eResting }
 Defines the states of the finite state machine used in s_PSIPurgeSimilarAlignments. More...
 

Functions

void ** _PSIAllocateMatrix (unsigned int ncols, unsigned int nrows, unsigned int data_type_sz)
 Generic 2 dimensional matrix allocator. More...
 
void ** _PSIDeallocateMatrix (void **matrix, unsigned int ncols)
 Generic 2 dimensional matrix deallocator. More...
 
void _PSICopyMatrix_int (int **dest, int **src, unsigned int ncols, unsigned int nrows)
 Copies src matrix into dest matrix, both of which must be int matrices with dimensions ncols by nrows. More...
 
void _PSICopyMatrix_double (double **dest, double **src, unsigned int ncols, unsigned int nrows)
 Copies src matrix into dest matrix, both of which must be double matrices with dimensions ncols by nrows. More...
 
_PSIPackedMsa * _PSIPackedMsaNew (const PSIMsa *msa)
 Allocates and initializes the compact version of the PSIMsa structure (makes a deep copy) for internal use by the PSSM engine. More...
 
_PSIPackedMsa * _PSIPackedMsaFree (_PSIPackedMsa *msa)
 Deallocates the _PSIPackedMsa data structure. More...
 
unsigned int _PSIPackedMsaGetNumberOfAlignedSeqs (const _PSIPackedMsa *msa)
 Retrieve the number of aligned sequences in the compact multiple sequence alignment. More...
 
_PSIMsa * _PSIMsaNew (const _PSIPackedMsa *msa, Uint4 alphabet_size)
 Allocates and initializes the internal version of the PSIMsa structure (makes a deep copy) for internal use by the PSSM engine. More...
 
_PSIMsa * _PSIMsaFree (_PSIMsa *msa)
 Deallocates the _PSIMsa data structure. More...
 
_PSIInternalPssmData * _PSIInternalPssmDataNew (Uint4 query_length, Uint4 alphabet_size)
 Allocates a new _PSIInternalPssmData structure. More...
 
_PSIInternalPssmData * _PSIInternalPssmDataFree (_PSIInternalPssmData *pssm_data)
 Deallocates the _PSIInternalPssmData structure. More...
 
_PSIAlignedBlock * _PSIAlignedBlockNew (Uint4 query_length)
 Allocates and initializes the _PSIAlignedBlock structure. More...
 
_PSIAlignedBlock * _PSIAlignedBlockFree (_PSIAlignedBlock *aligned_blocks)
 Deallocates the _PSIAlignedBlock structure. More...
 
_PSISequenceWeights * _PSISequenceWeightsNew (const PSIMsaDimensions *dimensions, const BlastScoreBlk *sbp)
 Allocates and initializes the _PSISequenceWeights structure. More...
 
_PSISequenceWeights * _PSISequenceWeightsFree (_PSISequenceWeights *seq_weights)
 Deallocates the _PSISequenceWeights structure. More...
 
static int s_PSIValidateNoGapsInQuery (const _PSIMsa *msa)
 Validate that there are no gaps in the query sequence. More...
 
static int s_PSIValidateNoFlankingGaps (const _PSIMsa *msa)
 Validate that there are no flanking gaps in the multiple sequence alignment. More...
 
static int s_PSIValidateAlignedColumns (const _PSIMsa *msa)
 Validate that there are no unaligned columns or columns which only contain gaps in the multiple sequence alignment. More...
 
static int s_PSIValidateParticipatingSequences (const _PSIMsa *msa)
 Verify that after purging biased sequences in multiple sequence alignment there are still sequences participating in the multiple sequences alignment. More...
 
void _PSIStructureGroupCustomization (_PSIMsa *msa)
 Enable NCBI structure group customization to discard the query sequence, as this really isn't the result of a PSI-BLAST iteration, but rather an artificial consensus sequence of the multiple sequence alignment constructed by them. More...
 
int _PSIValidateMSA_StructureGroup (const _PSIMsa *msa)
 Structure group validation function for multiple sequence alignment structure. More...
 
int _PSIValidateMSA (const _PSIMsa *msa, Boolean ignore_unaligned_positions)
 Main validation function for multiple sequence alignment structure. More...
 
int _PSIValidateCdMSA (const PSICdMsa *cd_msa, Uint4 alphabet_size)
 Validation of multiple alignment of conserved domains structure. More...
 
static void s_PSIPurgeSelfHits (_PSIPackedMsa *msa)
 Remove those sequences which are identical to the query sequence. More...
 
static void s_PSIPurgeNearIdenticalAlignments (_PSIPackedMsa *msa)
 Keeps only one copy of any aligned sequences which are >kPSINearIdentical% identical to one another. More...
 
static void s_PSIPurgeSimilarAlignments (_PSIPackedMsa *msa, Uint4 seq_index1, Uint4 seq_index2, double max_percent_identity)
 This function compares the sequences in the msa->cell structure indexed by seq_index1 and seq_index2. More...
 
int _PSIPurgeBiasedSegments (_PSIPackedMsa *msa)
 Main function for keeping only those selected sequences for PSSM construction (stage 2). More...
 
void _PSIUpdatePositionCounts (_PSIMsa *msa)
 Counts the number of sequences matching the query per query position (columns of the multiple alignment) as well as the number of residues present in each position of the query. More...
 
static NCBI_INLINE void _PSIResetAlignmentTraits (_PSIAlignmentTraits *traits, Uint4 position)
 Resets the traits structure to restart finite state machine. More...
 
static NCBI_INLINE void _handleNeitherAligned (_PSIAlignmentTraits *traits, _EPSIPurgeFsmState *state, _PSIPackedMsa *msa, Uint4 seq_index, double max_percent_identity)
 Handles the event in which neither position is aligned. FIXME: document better. More...
 
static NCBI_INLINE void _handleBothAlignedSameResidueNoX (_PSIAlignmentTraits *traits, _EPSIPurgeFsmState *state)
 Handle event when both positions are aligned, using the same residue, but this residue is not X. More...
 
static NCBI_INLINE void _handleEitherAlignedEitherX (_PSIAlignmentTraits *traits, _EPSIPurgeFsmState *state)
 Handle the event when either position is aligned and either is X. More...
 
static NCBI_INLINE void _handleEitherAlignedNeitherX (_PSIAlignmentTraits *traits, _EPSIPurgeFsmState *state, Uint4 position)
 Handle the event when either position is aligned and neither is X. More...
 
static void _PSIGetLeftExtents (const _PSIMsa *msa, Uint4 seq_index)
 Computes the left extents for the sequence identified by seq_index. More...
 
static void _PSIGetRightExtents (const _PSIMsa *msa, Uint4 seq_index)
 Computes the right extents for the sequence identified by seq_index. More...
 
static void _PSIComputePositionExtents (const _PSIMsa *msa, Uint4 seq_index, _PSIAlignedBlock *aligned_blocks)
 Computes the aligned blocks' extents for each position for the sequence identified by seq_index. More...
 
static void _PSIComputeAlignedRegionLengths (const _PSIMsa *msa, _PSIAlignedBlock *aligned_blocks)
 Calculates the aligned blocks lengths in the multiple sequence alignment data structure. More...
 
int _PSIComputeAlignmentBlocks (const _PSIMsa *msa, _PSIAlignedBlock *aligned_blocks)
 Main function to compute aligned blocks' properties for each position within multiple alignment (stage 3). Corresponds to posit.c:posComputeExtents. More...
 
static void _PSIGetAlignedSequencesForPosition (const _PSIMsa *msa, Uint4 position, SDynamicUint4Array *aligned_sequences)
 Populates the array aligned_sequences with the indices of the sequences which are part of the multiple sequence alignment at the requested position. More...
 
static void _PSICalculateNormalizedSequenceWeights (const _PSIMsa *msa, const _PSIAlignedBlock *aligned_blocks, Uint4 position, const SDynamicUint4Array *aligned_seqs, _PSISequenceWeights *seq_weights)
 Calculates the position-based weights using a modified version of Henikoff's algorithm presented in "Position-based sequence weights". More...
 
static void _PSICalculateMatchWeights (const _PSIMsa *msa, Uint4 position, const SDynamicUint4Array *aligned_seqs, _PSISequenceWeights *seq_weights)
 Calculate the weighted observed sequence weights. More...
 
static void _PSISpreadGapWeights (const _PSIMsa *msa, _PSISequenceWeights *seq_weights, Boolean nsg_compatibility_mode)
 Uses disperse method of spreading the gap weights. More...
 
static int _PSICheckSequenceWeights (const _PSIMsa *msa, const _PSISequenceWeights *seq_weights, Boolean nsg_compatibility_mode)
 Verifies that the sequence weights for each column of the PSSM add up to 1.0. More...
 
int _PSIComputeSequenceWeights (const _PSIMsa *msa, const _PSIAlignedBlock *aligned_blocks, Boolean nsg_compatibility_mode, _PSISequenceWeights *seq_weights)
 Main function to calculate the sequence weights. More...
 
static void s_PSIComputeFrequenciesFromCDsCleanup (double *sum_weights)
 
int _PSIComputeFrequenciesFromCDs (const PSICdMsa *cd_msa, BlastScoreBlk *sbp, const PSIBlastOptions *options, _PSISequenceWeights *seq_weights)
 Main function to calculate CD weights and combine weighted residue counts from matched CDs. More...
 
static void s_initializeExpNumObservations (double *expno, const double *backgroundProbabilities)
 Initialize the expected number of observations using the background probabilities for this matrix. Calculate exp. More...
 
static double s_effectiveObservations (const _PSIAlignedBlock *align_blk, const _PSISequenceWeights *seq_weights, int columnNumber, int queryLength, const double *expno)
 A method to estimate the effective number of observations in the interval for the specified columnNumber; copy of posit.c:effectiveObservations. More...
 
static double s_columnSpecificPseudocounts (const _PSISequenceWeights *posSearch, int columnNumber, const double *backgroundProbabilities, const double observations)
 copy of posit.c:columnSpecificPseudocounts More...
 
int _PSIComputeFreqRatios (const _PSIMsa *msa, const _PSISequenceWeights *seq_weights, const BlastScoreBlk *sbp, const _PSIAlignedBlock *aligned_blocks, Int4 pseudo_count, Boolean nsg_compatibility_mode, _PSIInternalPssmData *internal_pssm)
 Main function to compute the PSSM's frequency ratios (stage 5). More...
 
int _PSIComputeFreqRatiosFromCDs (const PSICdMsa *cd_msa, const _PSISequenceWeights *seq_weights, const BlastScoreBlk *sbp, Int4 pseudo_count, _PSIInternalPssmData *internal_pssm)
 Main function to compute CD-based PSSM's frequency ratios. More...
 
double * _PSICalculateInformationContentFromScoreMatrix (Int4 **score_mat, const double *std_prob, const Uint1 *query, Uint4 query_length, Uint4 alphabet_sz, double lambda)
 Calculates the information content from the scoring matrix. More...
 
double * _PSICalculateInformationContentFromFreqRatios (double **freq_ratios, const double *std_prob, Uint4 query_length, Uint4 alphabet_sz)
 Calculates the information content from the residue frequencies calculated in stage 5 of the PSSM creation algorithm. Corresponds to posit.c:posFreqsToInformation. More...
 
int _PSIConvertFreqRatiosToPSSM (_PSIInternalPssmData *internal_pssm, const Uint1 *query, const BlastScoreBlk *sbp, const double *std_probs)
 Converts the PSSM's frequency ratios obtained in the previous stage to a PSSM of scores. More...
 
int _PSIScaleMatrix (const Uint1 *query, const double *std_probs, _PSIInternalPssmData *internal_pssm, BlastScoreBlk *sbp)
 Scales the PSSM (stage 7) More...
 
int _IMPALAScaleMatrix (const Uint1 *query, const double *std_probs, _PSIInternalPssmData *internal_pssm, BlastScoreBlk *sbp, double scaling_factor)
 Provides a similar function to _PSIScaleMatrix but it performs the scaling as IMPALA did, i.e. More...
 
Uint4 _PSISequenceLengthWithoutX (const Uint1 *seq, Uint4 length)
 Calculates the length of the sequence without including any 'X' residues. More...
 
Blast_ScoreFreq * _PSIComputeScoreProbabilities (const int **pssm, const Uint1 *query, Uint4 query_length, const double *std_probs, const BlastScoreBlk *sbp)
 Compute the probabilities for each score in the PSSM. More...
 
void _PSIUpdateLambdaK (const int **pssm, const Uint1 *query, Uint4 query_length, const double *std_probs, BlastScoreBlk *sbp)
 Updates the Karlin-Altschul parameters based on the query sequence and PSSM's score frequencies. More...
 
static void s_PSIDiscardIfUnused (_PSIPackedMsa *msa, unsigned int seq_index)
 Check if we still need this sequence. More...
 
int _PSIPurgeAlignedRegion (_PSIPackedMsa *msa, unsigned int seq_index, unsigned int start, unsigned int stop)
 Marks the (start, stop] region corresponding to sequence seq_index in alignment so that it is not further considered for PSSM calculation. More...
 
int _PSISaveDiagnostics (const _PSIMsa *msa, const _PSIAlignedBlock *aligned_block, const _PSISequenceWeights *seq_weights, const _PSIInternalPssmData *internal_pssm, PSIDiagnosticsResponse *diagnostics)
 Collects diagnostic information from the process of creating the PSSM. More...
 
int _PSISaveCDDiagnostics (const PSICdMsa *cd_msa, const _PSISequenceWeights *seq_weights, const _PSIInternalPssmData *internal_pssm, PSIDiagnosticsResponse *diagnostics)
 Collects diagnostic information from the process of creating the CDD-based PSSM. More...
 
static void s_fillColumnProbabilities (double *probabilities, const _PSISequenceWeights *posSearch, Int4 columnNumber)
 Reorders in the same manner as returned by Blast_GetMatrixBackgroundFreq; this function is a copy of posit.c:fillColumnProbabilities. More...
 
static void s_adjustColumnProbabilities (double *initialProbabilities, double *probabilitiesToReturn, double standardWeight, const double *standardProbabilities, double observations)
 Adjust the probabilities by assigning observations weight to initialProbabilities and standardWeight to standardProbabilities; copy of posit.c:adjustColumnProbabilities. More...
 
static double s_computeRelativeEntropy (const double *newDistribution, const double *backgroundProbabilities)
 Compute relative entropy of first distribution to second distribution; a copy of posit.c:computeRelativeEntropy. More...
 

Variables

const double kPSINearIdentical = 0.94
 Percent identity threshold for discarding near-identical matches. More...
 
const double kPSIIdentical = 1.0
 Percent identity threshold for discarding identical matches. More...
 
const unsigned int kQueryIndex = 0
 Index into multiple sequence alignment structure for the query sequence. More...
 
const double kEpsilon = 0.0001
 Small constant to test against 0. More...
 
const int kPSIScaleFactor = 200
 Successor to POSIT_SCALE_FACTOR. More...
 
const double kPositScalingPercent = 0.05
 Constant used in scaling PSSM routines: Successor to POSIT_PERCENT. More...
 
const Uint4 kPositScalingNumIterations = 10
 Constant used in scaling PSSM routines: Successor to POSIT_NUM_ITERATIONS. More...
 
const double kPosEpsilon = 0.0001
 minimum return value of s_computeRelativeEntropy More...
 

Detailed Description

Definitions for functions in private interface for Position Iterated BLAST API.

Definition in file blast_psi_priv.c.

Macro Definition Documentation

◆ DEFINE_COPY_MATRIX_FUNCTION

#define DEFINE_COPY_MATRIX_FUNCTION (   type)
Value:
void _PSICopyMatrix_##type(type** dest, type** src, \
                           unsigned int ncols, unsigned int nrows) \
{ \
    unsigned int i = 0; \
    unsigned int j = 0; \
    ASSERT(dest); \
    ASSERT(src); \
    for (i = 0; i < ncols; i++) { \
        for (j = 0; j < nrows; j++) { \
            dest[i][j] = src[i][j]; \
        } \
    } \
} \

Implements the generic copy matrix functions.

Prototypes must be defined in the header file manually following the naming convention for _PSICopyMatrix_int

Definition at line 106 of file blast_psi_priv.c.
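
As an illustration of how the macro is meant to be used, the sketch below instantiates one copy function per element type through token pasting. It is a standalone imitation, not the toolkit code: the macro name DEMO_DEFINE_COPY_MATRIX_FUNCTION and the prefix DemoCopyMatrix_ are hypothetical, and the standard assert() stands in for the toolkit's ASSERT.

```c
#include <assert.h>

/* Standalone imitation of DEFINE_COPY_MATRIX_FUNCTION. DemoCopyMatrix_ is a
 * hypothetical prefix; assert() replaces the toolkit's ASSERT macro. */
#define DEMO_DEFINE_COPY_MATRIX_FUNCTION(type)                          \
void DemoCopyMatrix_##type(type** dest, type** src,                     \
                           unsigned int ncols, unsigned int nrows)      \
{                                                                       \
    unsigned int i = 0;                                                 \
    unsigned int j = 0;                                                 \
    assert(dest);                                                       \
    assert(src);                                                        \
    for (i = 0; i < ncols; i++) {                                       \
        for (j = 0; j < nrows; j++) {                                   \
            dest[i][j] = src[i][j];                                     \
        }                                                               \
    }                                                                   \
}

/* Prototypes are written by hand, following the naming convention. */
void DemoCopyMatrix_int(int** dest, int** src,
                        unsigned int ncols, unsigned int nrows);
void DemoCopyMatrix_double(double** dest, double** src,
                           unsigned int ncols, unsigned int nrows);

/* One expansion per element type generates one function. */
DEMO_DEFINE_COPY_MATRIX_FUNCTION(int)
DEMO_DEFINE_COPY_MATRIX_FUNCTION(double)
```

Note that the first index runs over columns and the second over rows, matching the ncols by nrows layout used throughout this file.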

◆ EFFECTIVE_ALPHABET

#define EFFECTIVE_ALPHABET   20

size of alphabet used for pseudocounts calculations

Definition at line 550 of file blast_psi_priv.c.

◆ MAX_IND_OBSERVATIONS

#define MAX_IND_OBSERVATIONS   400

max number of independent observations for pseudocount calculation

Definition at line 2065 of file blast_psi_priv.c.

◆ PSEUDO_MAX

#define PSEUDO_MAX   1000000

effective infinity

Definition at line 2066 of file blast_psi_priv.c.

◆ SEQUENCE_WEIGHTS_CHECK__ABORT_ON_FAILURE

#define SEQUENCE_WEIGHTS_CHECK__ABORT_ON_FAILURE   0

The following define enables/disables the _PSICheckSequenceWeights function's abort statement in the case when the sequence weights are not being checked.

When this is enabled, abort() will be invoked if none of the sequence weights are checked to be in the proper range. The C toolkit code silently ignores this situation, so it's implemented that way here for backwards compatibility.

Definition at line 1972 of file blast_psi_priv.c.

Typedef Documentation

◆ _EPSIPurgeFsmState

Defines the states of the finite state machine used in s_PSIPurgeSimilarAlignments.

Successor to posit.c's POS_COUNTING and POS_RESTING

◆ _PSIAlignmentTraits

Auxiliary structure to maintain information about two aligned regions between the query and a subject sequence.

It is used to store the data manipulated by the finite state machine used in s_PSIPurgeSimilarAlignments.

Enumeration Type Documentation

◆ _EPSIPurgeFsmState

Defines the states of the finite state machine used in s_PSIPurgeSimilarAlignments.

Successor to posit.c's POS_COUNTING and POS_RESTING

Enumerator
eCounting 
eResting 

Definition at line 1031 of file blast_psi_priv.c.
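
To make the role of the two states more concrete, here is a heavily simplified sketch of a counting/resting state machine that walks two gapped sequences and flags aligned regions whose percent identity reaches a threshold. It is an assumption-laden illustration, not the logic of s_PSIPurgeSimilarAlignments: the names prefixed with Demo or demo_ are hypothetical, sequences are plain C strings with '-' for unaligned positions, and the function only counts regions instead of purging them.

```c
#include <assert.h>

typedef enum { eDemoCounting, eDemoResting } DemoFsmState;

/* Hypothetical counterpart of the traits kept while a region is counted. */
typedef struct {
    unsigned int start;             /* first position of the region */
    unsigned int effective_length;  /* aligned positions that are not X */
    unsigned int n_x_residues;      /* positions where either residue is X */
    unsigned int n_identical;       /* positions with identical residues */
} DemoTraits;

static void demo_reset(DemoTraits* t, unsigned int position)
{
    t->start = position;
    t->effective_length = 0;
    t->n_x_residues = 0;
    t->n_identical = 0;
}

/* Returns 1 when the finished region meets the identity threshold. */
static int demo_too_similar(const DemoTraits* t, double max_identity)
{
    if (t->effective_length == 0) {
        return 0;
    }
    return (double) t->n_identical / t->effective_length >= max_identity;
}

/* Counts the regions of seq1/seq2 whose identity is >= max_identity. */
unsigned int demo_count_similar_regions(const char* seq1, const char* seq2,
                                        unsigned int length,
                                        double max_identity)
{
    DemoFsmState state = eDemoResting;
    DemoTraits traits;
    unsigned int pos;
    unsigned int flagged = 0;

    assert(seq1 && seq2);
    demo_reset(&traits, 0);
    for (pos = 0; pos < length; pos++) {
        int aligned1 = (seq1[pos] != '-');
        int aligned2 = (seq2[pos] != '-');
        if (!aligned1 && !aligned2) {
            /* Neither aligned: close the current region, if any. */
            if (state == eDemoCounting) {
                flagged += demo_too_similar(&traits, max_identity);
                state = eDemoResting;
            }
            continue;
        }
        if (state == eDemoResting) {
            /* Either aligned: start counting a new region here. */
            demo_reset(&traits, pos);
            state = eDemoCounting;
        }
        if (seq1[pos] == 'X' || seq2[pos] == 'X') {
            traits.n_x_residues++;
        } else {
            traits.effective_length++;
            if (seq1[pos] == seq2[pos]) {
                traits.n_identical++;
            }
        }
    }
    if (state == eDemoCounting) {
        flagged += demo_too_similar(&traits, max_identity);
    }
    return flagged;
}
```

With the sequences "ACDE--ACDE" and "ACDE--AAAA" and a threshold of 0.94, only the first region (4 of 4 identical) is flagged; the second (1 of 4) is not.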

Function Documentation

◆ _handleBothAlignedSameResidueNoX()

static NCBI_INLINE void _handleBothAlignedSameResidueNoX ( _PSIAlignmentTraits *  traits,
_EPSIPurgeFsmState *  state 
)
static

Handle event when both positions are aligned, using the same residue, but this residue is not X.

Definition at line 1118 of file blast_psi_priv.c.

References abort(), ASSERT, eCounting, eResting, and _PSIAlignmentTraits::n_identical.

Referenced by s_PSIPurgeSimilarAlignments().

◆ _handleEitherAlignedEitherX()

static NCBI_INLINE void _handleEitherAlignedEitherX ( _PSIAlignmentTraits *  traits,
_EPSIPurgeFsmState *  state 
)
static

Handle the event when either position is aligned and either is X.

Definition at line 1140 of file blast_psi_priv.c.

References abort(), ASSERT, eCounting, eResting, and _PSIAlignmentTraits::n_x_residues.

Referenced by s_PSIPurgeSimilarAlignments().

◆ _handleEitherAlignedNeitherX()

static NCBI_INLINE void _handleEitherAlignedNeitherX ( _PSIAlignmentTraits *  traits,
_EPSIPurgeFsmState *  state,
Uint4  position 
)
static

Handle the event when either position is aligned and neither is X.

Definition at line 1162 of file blast_psi_priv.c.

References _PSIResetAlignmentTraits(), abort(), ASSERT, eCounting, _PSIAlignmentTraits::effective_length, and eResting.

Referenced by s_PSIPurgeSimilarAlignments().

◆ _handleNeitherAligned()

static NCBI_INLINE void _handleNeitherAligned ( _PSIAlignmentTraits *  traits,
_EPSIPurgeFsmState *  state,
_PSIPackedMsa *  msa,
Uint4  seq_index,
double  max_percent_identity 
)
static

◆ _IMPALAScaleMatrix()

int _IMPALAScaleMatrix ( const Uint1 *  query,
const double *  std_probs,
_PSIInternalPssmData *  internal_pssm,
BlastScoreBlk *  sbp,
double  scaling_factor 
)

Provides a similar function to _PSIScaleMatrix, but performs the scaling as IMPALA did, i.e., it allows a scaling factor to be specified and, when calculating the score probabilities, the query length includes 'X' residues.

Definition at line 2599 of file blast_psi_priv.c.

References _PSICopyMatrix_int(), _PSIInternalPssmData::freq_ratios, Kappa_compactSearchItemsFree(), Kappa_compactSearchItemsNew(), Kappa_impalaScaling(), Kappa_posSearchItemsFree(), Kappa_posSearchItemsNew(), BlastScoreBlk::name, _PSIInternalPssmData::ncols, _PSIInternalPssmData::nrows, NULL, PSI_SUCCESS, _PSIInternalPssmData::pssm, query, _PSIInternalPssmData::scaled_pssm, and TRUE.

Referenced by _PSICreateAndScalePssmFromFrequencyRatios().

◆ _PSIAlignedBlockFree()

_PSIAlignedBlock* _PSIAlignedBlockFree ( _PSIAlignedBlock *  aligned_blocks)

Deallocates the _PSIAlignedBlock structure.

Parameters
aligned_blocks   data structure to deallocate [in]
Returns
NULL

Definition at line 532 of file blast_psi_priv.c.

References NULL, _PSIAlignedBlock::pos_extnt, sfree, and _PSIAlignedBlock::size.

Referenced by _PSIAlignedBlockNew(), Deleter< _PSIAlignedBlock >::Delete(), and s_PSICreatePssmCleanUp().

◆ _PSIAlignedBlockNew()

_PSIAlignedBlock* _PSIAlignedBlockNew ( Uint4  query_length)

Allocates and initializes the _PSIAlignedBlock structure.

Parameters
query_length   length of the query sequence of the multiple sequence alignment [in]
Returns
newly allocated structure or NULL in case of memory allocation failure

Definition at line 501 of file blast_psi_priv.c.

References _PSIAlignedBlockFree(), calloc(), i, SSeqRange::left, malloc(), NULL, _PSIAlignedBlock::pos_extnt, SSeqRange::right, and _PSIAlignedBlock::size.

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmWithDiagnostics().

◆ _PSIAllocateMatrix()

void** _PSIAllocateMatrix ( unsigned int  ncols,
unsigned int  nrows,
unsigned int  data_type_sz 
)

Generic 2 dimensional matrix allocator.

Allocates an ncols by nrows matrix with cells of size data_type_sz. Must be freed using _PSIDeallocateMatrix.

Parameters
ncols   number of columns in matrix [in]
nrows   number of rows in matrix [in]
data_type_sz   size of the data type (in bytes) to allocate for each element in the matrix [in]
Returns
pointer to allocated memory or NULL in case of failure

Definition at line 66 of file blast_psi_priv.c.

References _PSIDeallocateMatrix(), calloc(), i, malloc(), and NULL.

Referenced by _PSIInternalPssmDataNew(), _PSIMatrixFrequencyRatiosNew(), _PSIMsaNew(), _PSIPackedMsaNew(), _PSISequenceWeightsNew(), Kappa_posSearchItemsNew(), CRedoAlignmentTestFixture::loadPssmFromFile(), PSIDiagnosticsResponseNew(), PSIMatrixNew(), PSIMsaNew(), RPSRescalePssm(), s_RPSComputeTraceback(), s_RPSFillFreqRatiosInPsiMatrix(), SBlastScoreMatrixNew(), and SPsiBlastScoreMatrixNew().
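
The allocate/deallocate pairing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit's implementation: demo_allocate_matrix and demo_deallocate_matrix are hypothetical names, and the sketch zero-fills every cell and sweeps all columns on failure, which may differ from what _PSIAllocateMatrix does internally.

```c
#include <assert.h>
#include <stdlib.h>

/* Frees a matrix built by demo_allocate_matrix. Always returns NULL so the
 * caller can write m = demo_deallocate_matrix(m, ncols). */
void** demo_deallocate_matrix(void** matrix, unsigned int ncols)
{
    unsigned int i;
    if (!matrix) {
        return NULL;
    }
    for (i = 0; i < ncols; i++) {
        free(matrix[i]);  /* free(NULL) is a no-op */
    }
    free(matrix);
    return NULL;
}

/* Allocates ncols zero-filled columns of nrows cells of data_type_sz bytes. */
void** demo_allocate_matrix(unsigned int ncols, unsigned int nrows,
                            unsigned int data_type_sz)
{
    unsigned int i;
    void** matrix;

    assert(data_type_sz > 0);
    matrix = (void**) calloc(ncols, sizeof(void*));
    if (!matrix) {
        return NULL;
    }
    for (i = 0; i < ncols; i++) {
        matrix[i] = calloc(nrows, data_type_sz);
        if (!matrix[i]) {
            /* Slots not yet allocated are still NULL, so a full sweep is safe. */
            return demo_deallocate_matrix(matrix, ncols);
        }
    }
    return matrix;
}
```

A caller would cast the result to the element type, for example `int** m = (int**) demo_allocate_matrix(3, 4, sizeof(int));`, and index it as `m[column][row]`.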

◆ _PSICalculateInformationContentFromFreqRatios()

double* _PSICalculateInformationContentFromFreqRatios ( double **  freq_ratios,
const double *  std_prob,
Uint4  query_length,
Uint4  alphabet_sz 
)

Calculates the information content from the residue frequencies calculated in stage 5 of the PSSM creation algorithm. Corresponds to posit.c:posFreqsToInformation.

See also
_PSIComputeFreqRatios: stage 5
Parameters
freq_ratios   matrix of frequency ratios (dimensions: query_length x alphabet_sz) (const) [in]
std_prob   standard residue probabilities [in]
query_length   length of the query [in]
alphabet_sz   length of the alphabet used by the query [in]
Returns
array of length query_length containing the information content per query position or NULL on error (e.g.: out-of-memory or NULL parameters)

Definition at line 2341 of file blast_psi_priv.c.

References calloc(), kEpsilon, log, NCBIMATH_LN2, NULL, and r().

Referenced by _PSISaveCDDiagnostics(), and _PSISaveDiagnostics().
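
As a rough guide to what is being computed, the per-position information content can be read as a relative entropy, in bits, between a column's residue frequencies and the standard probabilities. The formula below is a sketch inferred from the references to log, NCBIMATH_LN2 and kEpsilon, not a transcription of the source. The symbols are notation introduced here: f_{p,r} is the freq_ratios entry for position p and residue r, P_r is the std_prob entry for residue r, and I_p is the returned value for position p.

```latex
I_p \;=\; \sum_{\substack{r \,:\, P_r > \varepsilon \\ f_{p,r}/P_r > \varepsilon}}
      f_{p,r}\,\log_2\!\frac{f_{p,r}}{P_r},
\qquad \varepsilon = \texttt{kEpsilon} = 10^{-4}
```

The guards on the sum are assumed to skip residues with negligible background probability or negligible ratio, which would otherwise produce a division by zero or the logarithm of zero.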

◆ _PSICalculateInformationContentFromScoreMatrix()

double* _PSICalculateInformationContentFromScoreMatrix ( Int4 **  score_mat,
const double *  std_prob,
const Uint1 *  query,
Uint4  query_length,
Uint4  alphabet_sz,
double  lambda 
)

Calculates the information content from the scoring matrix.

Parameters
score_mat   alphabet by alphabet_sz matrix of scores (const) [in]
std_prob   standard residue probabilities [in]
query   query sequence [in]
query_length   length of the query [in]
alphabet_sz   length of the alphabet used by the query [in]
lambda   lambda parameter [in] FIXME documentation
Returns
array of length query_length containing the information content per query position or NULL on error (e.g.: out-of-memory or NULL parameters)

Definition at line 2299 of file blast_psi_priv.c.

References calloc(), kEpsilon, lambda(), log, NCBIMATH_LN2, NULL, query, r(), and tmp.

◆ _PSICalculateMatchWeights()

static void _PSICalculateMatchWeights ( const _PSIMsa *  msa,
Uint4  position,
const SDynamicUint4Array *  aligned_seqs,
_PSISequenceWeights *  seq_weights 
)
static

Calculate the weighted observed sequence weights.

Parameters
msa   multiple sequence alignment data structure [in]
position   position of the query to calculate the sequence weights for [in]
aligned_seqs   array containing the indices of the sequences participating in the multiple sequence alignment at the requested position [in]
seq_weights   sequence weights data structure [in|out]

Definition at line 1875 of file blast_psi_priv.c.

References AMINOACID_TO_NCBISTDAA, ASSERT, _PSIMsa::cell, SDynamicUint4Array::data, _PSISequenceWeights::gapless_column_weights, _PSIMsaCell::letter, _PSISequenceWeights::match_weights, _PSISequenceWeights::norm_seq_weights, and SDynamicUint4Array::num_used.

Referenced by _PSIComputeSequenceWeights().

◆ _PSICalculateNormalizedSequenceWeights()

static void _PSICalculateNormalizedSequenceWeights ( const _PSIMsa *  msa,
const _PSIAlignedBlock *  aligned_blocks,
Uint4  position,
const SDynamicUint4Array *  aligned_seqs,
_PSISequenceWeights *  seq_weights 
)
static

Calculates the position-based weights using a modified version of Henikoff's algorithm presented in "Position-based sequence weights".

Skipped optimization about identical previous sets.

Parameters
msa   multiple sequence alignment data structure [in]
aligned_blocks   aligned regions' extents [in]
position   position of the query to calculate the sequence weights for [in]
aligned_seqs   array containing the indices of the sequences participating in the multiple sequence alignment at the requested position [in]
seq_weights   sequence weights data structure [out]

Definition at line 1750 of file blast_psi_priv.c.

References AMINOACID_TO_NCBISTDAA, ASSERT, BLASTAA_SIZE, _PSIMsa::cell, SDynamicUint4Array::data, EFFECTIVE_ALPHABET, FALSE, i, SSeqRange::left, _PSIMsaCell::letter, MIN, _PSISequenceWeights::norm_seq_weights, SDynamicUint4Array::num_used, _PSIAlignedBlock::pos_extnt, _PSISequenceWeights::posDistinctDistrib, SSeqRange::right, _PSISequenceWeights::row_sigma, _PSISequenceWeights::sigma, and TRUE.

Referenced by _PSIComputeSequenceWeights().
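
For orientation, the textbook (unmodified) position-based weighting scheme can be sketched as below: in each column, a sequence receives 1 / (k * n), where k is the number of distinct residues in the column and n is the number of sequences sharing that sequence's residue; the totals are then normalized to sum to 1. This is not the modified variant implemented by _PSICalculateNormalizedSequenceWeights, which works per position over aligned-block extents. The name demo_henikoff_weights and the plain-string alignment layout are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define DEMO_ALPHABET 256  /* one counter per possible unsigned char value */

/* Computes normalized position-based weights for an ungapped alignment.
 * msa[s] is sequence s; weights must have room for num_seqs entries. */
void demo_henikoff_weights(const char* const* msa, unsigned int num_seqs,
                           unsigned int num_cols, double* weights)
{
    unsigned int s, c;
    double total = 0.0;

    assert(msa && weights);
    for (s = 0; s < num_seqs; s++) {
        weights[s] = 0.0;
    }
    for (c = 0; c < num_cols; c++) {
        unsigned int counts[DEMO_ALPHABET];
        unsigned int distinct = 0;

        /* First pass: count residues and distinct residue types. */
        memset(counts, 0, sizeof(counts));
        for (s = 0; s < num_seqs; s++) {
            unsigned char res = (unsigned char) msa[s][c];
            if (counts[res]++ == 0) {
                distinct++;
            }
        }
        /* Second pass: rarer residues earn a larger share of the column. */
        for (s = 0; s < num_seqs; s++) {
            unsigned char res = (unsigned char) msa[s][c];
            weights[s] += 1.0 / ((double) distinct * counts[res]);
        }
    }
    /* Normalize so that the weights sum to 1. */
    for (s = 0; s < num_seqs; s++) {
        total += weights[s];
    }
    if (total > 0.0) {
        for (s = 0; s < num_seqs; s++) {
            weights[s] /= total;
        }
    }
}
```

For the three sequences "AA", "AA" and "AC", the divergent third sequence ends up with weight 5/12 and the two identical ones with 7/24 each.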

◆ _PSICheckSequenceWeights()

static int _PSICheckSequenceWeights ( const _PSIMsa *  msa,
const _PSISequenceWeights *  seq_weights,
Boolean  nsg_compatibility_mode 
)
static

Verifies that the sequence weights for each column of the PSSM add up to 1.0.

Parameters
msa   multiple sequence alignment data structure [in]
seq_weights   sequence weights data structure [in]
nsg_compatibility_mode   set to true to emulate the structure group's use of PSSM engine in the cddumper application. By default should be FALSE [in]
Returns
PSIERR_BADSEQWEIGHTS in case of failure, PSI_SUCCESS otherwise

Definition at line 1977 of file blast_psi_priv.c.

References _PSIMsa::alphabet_size, AMINOACID_TO_NCBISTDAA, ASSERT, assert, _PSIMsa::cell, _PSIMsa::dimensions, FALSE, kQueryIndex, _PSIMsaCell::letter, _PSISequenceWeights::match_weights, _PSIMsa::num_matching_seqs, PSI_SUCCESS, PSIERR_BADSEQWEIGHTS, PSIMsaDimensions::query_length, and TRUE.

Referenced by _PSIComputeSequenceWeights().
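
The core of such a check can be sketched as a per-column sum compared against 1.0 within a tolerance. The example is a simplification under stated assumptions: demo_check_column_weights, DEMO_SUCCESS and DEMO_BAD_WEIGHTS are hypothetical stand-ins, the tolerance is passed in by the caller, and the sketch checks every column, whereas the real function is not guaranteed to do so (see SEQUENCE_WEIGHTS_CHECK__ABORT_ON_FAILURE above).

```c
#include <assert.h>

#define DEMO_SUCCESS      0
#define DEMO_BAD_WEIGHTS  (-1)  /* hypothetical stand-in for PSIERR_BADSEQWEIGHTS */

/* Returns DEMO_SUCCESS when every column of match_weights sums to 1.0
 * within +/- tolerance; match_weights is indexed [column][residue]. */
int demo_check_column_weights(double** match_weights, unsigned int ncols,
                              unsigned int alphabet_size, double tolerance)
{
    unsigned int pos, res;

    assert(match_weights);
    for (pos = 0; pos < ncols; pos++) {
        double sum = 0.0;
        double diff;
        for (res = 0; res < alphabet_size; res++) {
            sum += match_weights[pos][res];
        }
        diff = sum - 1.0;
        if (diff < 0.0) {
            diff = -diff;  /* absolute value without needing libm */
        }
        if (diff > tolerance) {
            return DEMO_BAD_WEIGHTS;
        }
    }
    return DEMO_SUCCESS;
}
```

A tolerance such as kEpsilon (0.0001) is a natural choice, since exact equality is rarely achievable after floating-point normalization.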

◆ _PSIComputeAlignedRegionLengths()

static void _PSIComputeAlignedRegionLengths ( const _PSIMsa *  msa,
_PSIAlignedBlock *  aligned_blocks 
)
static

Calculates the aligned blocks lengths in the multiple sequence alignment data structure.

Parameters
msa   multiple sequence alignment data structure [in]
aligned_blocks   aligned regions' extents [in|out]

Definition at line 1421 of file blast_psi_priv.c.

References AMINOACID_TO_NCBISTDAA, ASSERT, _PSIMsa::dimensions, i, SSeqRange::left, _PSIAlignedBlock::pos_extnt, _PSIMsa::query, PSIMsaDimensions::query_length, SSeqRange::right, and _PSIAlignedBlock::size.

Referenced by _PSIComputeAlignmentBlocks().

◆ _PSIComputeAlignmentBlocks()

int _PSIComputeAlignmentBlocks ( const _PSIMsa *  msa,
_PSIAlignedBlock *  aligned_block 
)

Main function to compute aligned blocks' properties for each position within multiple alignment (stage 3). Corresponds to posit.c:posComputeExtents.

Parameters
msa   multiple sequence alignment data structure [in]
aligned_block   data structure describing the aligned blocks' properties for each position of the multiple sequence alignment [out]
Returns
PSIERR_BADPARAM if arguments are NULL, PSI_SUCCESS otherwise

Definition at line 1297 of file blast_psi_priv.c.

References _PSIComputeAlignedRegionLengths(), _PSIComputePositionExtents(), _PSIGetLeftExtents(), _PSIGetRightExtents(), _PSIMsa::dimensions, kQueryIndex, PSIMsaDimensions::num_seqs, PSI_SUCCESS, and PSIERR_BADPARAM.

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmWithDiagnostics().

◆ _PSIComputeFreqRatios()

int _PSIComputeFreqRatios ( const _PSIMsa *  msa,
const _PSISequenceWeights *  seq_weights,
const BlastScoreBlk *  sbp,
const _PSIAlignedBlock *  aligned_blocks,
Int4  pseudo_count,
Boolean  nsg_compatibility_mode,
_PSIInternalPssmData *  internal_pssm 
)

Main function to compute the PSSM's frequency ratios (stage 5).

Implements formula 2 in Nucleic Acids Research, 2001, Vol 29, No 14. Corresponds to posit.c:posComputePseudoFreqs

Parameters
msa   multiple sequence alignment data structure [in]
seq_weights   data structure containing the data needed to compute the sequence weights [in]
sbp   score block structure initialized for the scoring system used with the query sequence [in]
aligned_blocks   data structure describing the aligned blocks' properties for each position of the multiple sequence alignment [in]
pseudo_count   pseudo count constant [in]
nsg_compatibility_mode   set to true to emulate the structure group's use of PSSM engine in the cddumper application. By default should be FALSE [in]
internal_pssm   PSSM being computed [out]
Returns
PSIERR_BADPARAM if arguments are NULL, PSI_SUCCESS otherwise

Definition at line 2071 of file blast_psi_priv.c.

References _PSIMatrixFrequencyRatiosFree(), _PSIMatrixFrequencyRatiosNew(), BlastScoreBlk::alphabet_size, _PSIMsa::alphabet_size, AMINOACID_TO_NCBISTDAA, ASSERT, Blast_GetMatrixBackgroundFreq(), BLAST_SCORE_MIN, _PSIMsa::cell, SBlastScoreMatrix::data, SFreqRatios::data, _PSIMsa::dimensions, _PSIInternalPssmData::freq_ratios, i, _PSISequenceWeights::independent_observations, kEpsilon, kQueryIndex, _PSIMsaCell::letter, _PSISequenceWeights::match_weights, BlastScoreBlk::matrix, MAX_IND_OBSERVATIONS, BlastScoreBlk::name, NULL, PSEUDO_MAX, _PSIInternalPssmData::pseudocounts, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_UNKNOWN, PSIMsaDimensions::query_length, r(), s_columnSpecificPseudocounts(), s_effectiveObservations(), s_initializeExpNumObservations(), and _PSISequenceWeights::std_prob.

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmWithDiagnostics().
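
Formula 2 itself is not reproduced on this page. The widely published PSI-BLAST pseudocount estimate blends observed and pseudocount-derived frequencies as Q_i = (alpha * f_i + beta * g_i) / (alpha + beta); assuming that is the formula meant, the blend for one column can be sketched as below. The function name demo_blend_frequencies and its parameter layout are hypothetical, and the computation of g from the substitution matrix, as well as the column-specific choice of alpha and beta, are left out.

```c
#include <assert.h>

/* Blends observed frequencies f with pseudocount-derived frequencies g:
 *   Q[i] = (alpha * f[i] + beta * g[i]) / (alpha + beta)
 * alpha weights the observed data, beta weights the pseudocounts. */
void demo_blend_frequencies(const double* f, const double* g,
                            unsigned int alphabet_size,
                            double alpha, double beta, double* Q)
{
    unsigned int i;

    assert(f && g && Q);
    assert(alpha + beta > 0.0);
    for (i = 0; i < alphabet_size; i++) {
        Q[i] = (alpha * f[i] + beta * g[i]) / (alpha + beta);
    }
}
```

With alpha = 3 and beta = 1, a column observed as f = {1.0, 0.0} and pseudocount frequencies g = {0.5, 0.5} blends to Q = {0.875, 0.125}: the more independent observations a column has, the less the pseudocounts pull it toward the prior.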

◆ _PSIComputeFreqRatiosFromCDs()

int _PSIComputeFreqRatiosFromCDs ( const PSICdMsa *  cd_msa,
const _PSISequenceWeights *  seq_weights,
const BlastScoreBlk *  sbp,
Int4  pseudo_count,
_PSIInternalPssmData *  internal_pssm 
)

Main function to compute CD-based PSSM's frequency ratios.

Parameters
cd_msa  multiple alignment of CDs [in]
seq_weights  contains weighted residue frequencies and effective number of observations [in]
sbp  initialized score block data structure [in]
pseudo_count  pseudo count constant [in]
internal_pssm  PSSM [out]
Returns
status

Definition at line 2185 of file blast_psi_priv.c.

References _PSIMatrixFrequencyRatiosFree(), _PSIMatrixFrequencyRatiosNew(), BlastScoreBlk::alphabet_size, AMINOACID_TO_NCBISTDAA, ASSERT, Blast_GetMatrixBackgroundFreq(), BLAST_SCORE_MIN, SBlastScoreMatrix::data, SFreqRatios::data, PSICdMsa::dimensions, _PSIInternalPssmData::freq_ratios, i, _PSISequenceWeights::independent_observations, kEpsilon, _PSISequenceWeights::match_weights, BlastScoreBlk::matrix, MAX, BlastScoreBlk::name, NULL, PSEUDO_MAX, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_OUTOFMEM, PSICdMsa::query, PSIMsaDimensions::query_length, r(), s_columnSpecificPseudocounts(), and _PSISequenceWeights::std_prob.

Referenced by BOOST_AUTO_TEST_CASE(), PSICreatePssmFromCDD(), and s_TestCreatePssmFromFreqs().

◆ _PSIComputeFrequenciesFromCDs()

int _PSIComputeFrequenciesFromCDs ( const PSICdMsa cd_msa,
BlastScoreBlk sbp,
const PSIBlastOptions options,
_PSISequenceWeights seq_weights 
)

Main function to calculate CD weights and combine weighted residue counts from matched CDs.

Parameters
cd_msa  multiple alignment of conserved domains data structure [in]
sbp  BLAST score block [in]
options  CDD-related options [in]
seq_weights  data structure with CD frequencies [out]

Definition at line 1651 of file blast_psi_priv.c.

References BlastScoreBlk::alphabet_size, AMINOACID_TO_NCBISTDAA, ASSERT, PSICdMsaCell::data, PSICdMsa::dimensions, fabs, _PSISequenceWeights::independent_observations, PSICdMsaCellData::iobsr, PSICdMsaCell::is_aligned, malloc(), _PSISequenceWeights::match_weights, MIN, PSICdMsa::msa, NULL, PSIMsaDimensions::num_seqs, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_OUTOFMEM, PSICdMsa::query, PSIMsaDimensions::query_length, s_PSIComputeFrequenciesFromCDsCleanup(), and PSICdMsaCellData::wfreqs.

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmFromCDD().

◆ _PSIComputePositionExtents()

static void _PSIComputePositionExtents ( const _PSIMsa msa,
Uint4  seq_index,
_PSIAlignedBlock aligned_blocks 
)
static

Computes the aligned blocks' extents for each position of the sequence identified by seq_index.

Parameters
msa  multiple sequence alignment data structure [in]
seq_index  index of the sequence of interest [in]
aligned_blocks  aligned regions' extents [out]

Definition at line 1387 of file blast_psi_priv.c.

References AMINOACID_TO_NCBISTDAA, ASSERT, _PSIMsa::cell, _PSIMsa::dimensions, _PSIMsaCell::extents, i, bm::is_aligned(), SSeqRange::left, letter(), MAX, MIN, NULL, _PSIAlignedBlock::pos_extnt, PSIMsaDimensions::query_length, and SSeqRange::right.

Referenced by _PSIComputeAlignmentBlocks().

◆ _PSIComputeScoreProbabilities()

Blast_ScoreFreq* _PSIComputeScoreProbabilities ( const int **  pssm,
const Uint1 query,
Uint4  query_length,
const double *  std_probs,
const BlastScoreBlk sbp 
)

Compute the probabilities for each score in the PSSM.

This is only valid for protein sequences. FIXME: Should this be moved to blast_stat.[hc]? Used in kappa.c in notposfillSfp().

Parameters
pssm  PSSM for which to compute the score probabilities [in]
query  query sequence for the PSSM above in ncbistdaa encoding [in]
query_length  length of the query sequence above [in]
std_probs  array containing the standard background residue probabilities [in]
sbp  score block structure initialized for the scoring system used with the query sequence [in]
Returns
structure containing the score frequencies, or NULL in case of error
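As a rough illustration of what "probabilities for each score" means, the sketch below accumulates, for every non-X query column, the background probability of each residue into the bin of that residue's PSSM score, normalized by the number of non-X columns. It is a simplified sketch under stated assumptions, not the toolkit implementation: the function name, the flat output array, the explicit score range, and the value 21 for 'X' in ncbistdaa are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed ncbistdaa code for the residue 'X'; not taken from this page. */
#define SKETCH_X_RESIDUE 21

/* Hypothetical helper: fills sprob[s - score_min] with the probability of
 * observing score s. pssm is indexed as pssm[position][residue].
 * Returns 0 on success, -1 on invalid input. */
int sketch_score_probabilities(const int **pssm,
                               const unsigned char *query,
                               unsigned int query_length,
                               unsigned int alphabet_size,
                               const double *std_probs,
                               int score_min,
                               int score_max,
                               double *sprob)
{
    unsigned int p, r, effective_length = 0;
    int s;

    if (!pssm || !query || !std_probs || !sprob || score_max < score_min)
        return -1;
    assert(alphabet_size > 0);

    for (s = score_min; s <= score_max; s++)
        sprob[s - score_min] = 0.0;

    /* Columns whose query residue is X do not contribute. */
    for (p = 0; p < query_length; p++)
        if (query[p] != SKETCH_X_RESIDUE)
            effective_length++;
    if (effective_length == 0)
        return -1;

    for (p = 0; p < query_length; p++) {
        if (query[p] == SKETCH_X_RESIDUE)
            continue;
        for (r = 0; r < alphabet_size; r++) {
            const int score = pssm[p][r];
            if (score < score_min || score > score_max)
                continue;  /* scores outside the tracked range are ignored */
            sprob[score - score_min] += std_probs[r] / effective_length;
        }
    }
    return 0;
}
```

If std_probs sums to 1 and every score falls inside the tracked range, the resulting bins also sum to 1.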

Definition at line 2647 of file blast_psi_priv.c.

References _PSISequenceLengthWithoutX(), BlastScoreBlk::alphabet_code, AMINOACID_TO_NCBISTDAA, ASSERT, Blast_GetStdAlphabet(), BLAST_SCORE_MAX, BLAST_SCORE_MIN, Blast_ScoreFreqNew(), BLASTAA_SEQ_CODE, BLASTAA_SIZE, kScore, MAX, MIN, NULL, Blast_ScoreFreq::obs_max, Blast_ScoreFreq::obs_min, query, r(), Blast_ScoreFreq::score_avg, and Blast_ScoreFreq::sprob.

Referenced by _PSIUpdateLambdaK().

◆ _PSIComputeSequenceWeights()

int _PSIComputeSequenceWeights ( const _PSIMsa msa,
const _PSIAlignedBlock aligned_blocks,
Boolean  nsg_compatibility_mode,
_PSISequenceWeights seq_weights 
)

Main function to calculate the sequence weights.

Should be called with the return value of _PSIComputeAlignmentBlocks (stage 4). Corresponds to posit.c:posComputeSequenceWeights.

Parameters
msa  multiple sequence alignment data structure [in]
aligned_blocks  data structure describing the aligned blocks' properties for each position of the multiple sequence alignment [in]
nsg_compatibility_mode  set to true to emulate the structure group's use of the PSSM engine in the cddumper application. Should be FALSE by default [in]
seq_weights  data structure containing the data needed to compute the sequence weights [out]
Returns
PSIERR_BADPARAM if arguments are NULL, PSIERR_OUTOFMEM in case of memory allocation failure, PSIERR_BADSEQWEIGHTS if the sequence weights fail to add up to 1.0, PSI_SUCCESS otherwise

Definition at line 1552 of file blast_psi_priv.c.

References _PSICalculateMatchWeights(), _PSICalculateNormalizedSequenceWeights(), _PSICheckSequenceWeights(), _PSIGetAlignedSequencesForPosition(), _PSISpreadGapWeights(), ASSERT, _PSIMsa::dimensions, DynamicUint4Array_AreEqual(), DynamicUint4Array_Copy(), DynamicUint4Array_Dup(), DynamicUint4ArrayFree(), DynamicUint4ArrayNewEx(), EFFECTIVE_ALPHABET, _PSISequenceWeights::norm_seq_weights, _PSIMsa::num_matching_seqs, PSIMsaDimensions::num_seqs, SDynamicUint4Array::num_used, _PSISequenceWeights::posDistinctDistrib, _PSISequenceWeights::posNumParticipating, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_OUTOFMEM, PSIMsaDimensions::query_length, _PSISequenceWeights::row_sigma, _PSISequenceWeights::sigma, and _PSIAlignedBlock::size.

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmWithDiagnostics().

◆ _PSIConvertFreqRatiosToPSSM()

int _PSIConvertFreqRatiosToPSSM ( _PSIInternalPssmData internal_pssm,
const Uint1 query,
const BlastScoreBlk sbp,
const double *  std_probs 
)

Converts the PSSM's frequency ratios obtained in the previous stage to a PSSM of scores.

(stage 6)

Parameters
internal_pssm  PSSM being computed [in|out]
query  query sequence in ncbistdaa encoding. The length of this sequence is read from internal_pssm->ncols [in]
sbp  score block structure initialized for the scoring system used with the query sequence [in]
std_probs  array containing the standard residue probabilities [in]
Returns
PSIERR_BADPARAM if arguments are NULL, PSI_SUCCESS otherwise
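The core of such a conversion is a log-odds transform. The formula below is a hedged sketch inferred from the identifiers this function references (BlastScoreBlk::kbp_ideal, Blast_KarlinBlk::Lambda, kPSIScaleFactor, BLAST_Nint, log), not a transcription of the source. The symbols are introduced for this example only: R_{i,a} is the frequency ratio for residue a in column i, \lambda is the ideal lambda, c is the scale factor, and nint rounds to the nearest integer.

```latex
s_{i,a} = \operatorname{nint}\!\left( \frac{\ln R_{i,a}}{\lambda} \right),
\qquad
s^{\mathrm{scaled}}_{i,a} = \operatorname{nint}\!\left( \frac{c \, \ln R_{i,a}}{\lambda} \right)
```

A ratio of zero has no finite logarithm. The reference to BLAST_SCORE_MIN suggests that such entries receive that sentinel score, but the exact special cases are not spelled out on this page.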

Definition at line 2392 of file blast_psi_priv.c.

References _PSIMatrixFrequencyRatiosFree(), _PSIMatrixFrequencyRatiosNew(), BlastScoreBlk::alphabet_size, AMINOACID_TO_NCBISTDAA, SFreqRatios::bit_scale_factor, BLAST_Nint(), BLAST_SCORE_MIN, SBlastScoreMatrix::data, SFreqRatios::data, FALSE, _PSIInternalPssmData::freq_ratios, i, int, BlastScoreBlk::kbp_ideal, kEpsilon, kPSIScaleFactor, Blast_KarlinBlk::Lambda, log, BlastScoreBlk::matrix, BlastScoreBlk::name, NCBIMATH_LN2, _PSIInternalPssmData::ncols, NULL, PSI_SUCCESS, PSIERR_BADPARAM, _PSIInternalPssmData::pssm, query, _PSIInternalPssmData::scaled_pssm, tmp, and TRUE.

Referenced by _PSICreateAndScalePssmFromFrequencyRatios(), BOOST_AUTO_TEST_CASE(), s_ScalePosMatrix(), and s_TestCreatePssmFromFreqs().

◆ _PSICopyMatrix_double()

void _PSICopyMatrix_double ( double **  dest,
double **  src,
unsigned int  ncols,
unsigned int  nrows 
)

Copies src matrix into dest matrix, both of which must be double matrices with dimensions ncols by nrows.

Parameters
dest  Destination matrix [out]
src  Source matrix [in]
ncols  Number of columns to copy [in]
nrows  Number of rows to copy [in]
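A minimal sketch of an element-wise copy with this signature follows. It assumes that the matrices are arrays of column pointers, so the first index is the column, which is consistent with the ncols-first argument order but is still an assumption. The function name is hypothetical, and the real function is generated by the DEFINE_COPY_MATRIX_FUNCTION macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the generated copy function: copies every
 * element of src into dest, treating both as matrix[column][row]. */
void sketch_copy_matrix_double(double **dest, double **src,
                               unsigned int ncols, unsigned int nrows)
{
    unsigned int i, j;

    assert(dest != NULL);
    assert(src != NULL);

    for (i = 0; i < ncols; i++)
        for (j = 0; j < nrows; j++)
            dest[i][j] = src[i][j];
}
```

Both matrices must already be allocated with at least ncols by nrows elements; the copy itself allocates nothing.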

Definition at line 124 of file blast_psi_priv.c.

Referenced by PSICreatePssmFromFrequencyRatios(), and s_ScalePosMatrix().

◆ _PSICopyMatrix_int()

void _PSICopyMatrix_int ( int **  dest,
int **  src,
unsigned int  ncols,
unsigned int  nrows 
)

Copies src matrix into dest matrix, both of which must be int matrices with dimensions ncols by nrows.

Parameters
dest  Destination matrix [out]
src  Source matrix [in]
ncols  Number of columns to copy [in]
nrows  Number of rows to copy [in]

Definition at line 123 of file blast_psi_priv.c.

Referenced by _IMPALAScaleMatrix(), s_PSISavePssm(), s_ScalePosMatrix(), and CRedoAlignmentTestFixture::setupPositionBasedBlastScoreBlk().

◆ _PSIDeallocateMatrix()

void** _PSIDeallocateMatrix ( void **  matrix,
unsigned int  ncols 
)

Generic 2 dimensional matrix deallocator.

Deallocates the memory allocated by _PSIAllocateMatrix.

Parameters
matrix  matrix to deallocate [in]
ncols  number of columns in the matrix [in]
Returns
NULL
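Returning NULL lets the caller free a matrix and clear its pointer in one assignment. The sketch below shows that idiom with a matching allocator. Both functions are hypothetical stand-ins that assume a column-pointer layout and zero-initialized storage; they are not the toolkit's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator: an array of ncols column pointers, each pointing
 * to nrows zero-initialized elements of elem_size bytes. */
void **sketch_allocate_matrix(unsigned int ncols, unsigned int nrows,
                              size_t elem_size)
{
    void **matrix;
    unsigned int i;

    assert(elem_size > 0);
    matrix = (void **)malloc(ncols * sizeof(void *));
    if (matrix == NULL)
        return NULL;

    for (i = 0; i < ncols; i++) {
        matrix[i] = calloc(nrows, elem_size);
        if (matrix[i] == NULL) {
            /* Undo the partial allocation before reporting failure. */
            while (i > 0)
                free(matrix[--i]);
            free(matrix);
            return NULL;
        }
    }
    return matrix;
}

/* Hypothetical deallocator: frees every column, then the pointer array.
 * Always returns NULL so the result can be assigned back to the variable. */
void **sketch_deallocate_matrix(void **matrix, unsigned int ncols)
{
    unsigned int i;

    if (matrix == NULL)
        return NULL;
    for (i = 0; i < ncols; i++)
        free(matrix[i]);
    free(matrix);
    return NULL;
}
```

A caller would write `m = sketch_deallocate_matrix(m, ncols);` so that no dangling pointer is left behind.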

Definition at line 88 of file blast_psi_priv.c.

References i, NULL, and sfree.

Referenced by _PSIAllocateMatrix(), _PSIInternalPssmDataFree(), _PSIMatrixFrequencyRatiosFree(), _PSIMsaFree(), _PSIPackedMsaFree(), _PSISequenceWeightsFree(), Kappa_posSearchItemsFree(), CRedoAlignmentTestFixture::loadPssmFromFile(), PSIDiagnosticsResponseFree(), PSIMatrixFree(), PSIMsaFree(), s_RPSComputeTraceback(), SBlastScoreMatrixFree(), CRedoAlignmentTestFixture::setupPositionBasedBlastScoreBlk(), and SPsiBlastScoreMatrixFree().

◆ _PSIGetAlignedSequencesForPosition()

static void _PSIGetAlignedSequencesForPosition ( const _PSIMsa msa,
Uint4  position,
SDynamicUint4Array aligned_sequences 
)
static

Populates the array aligned_sequences with the indices of the sequences which are part of the multiple sequence alignment at the requested position.

Parameters
msa  multiple sequence alignment data structure [in]
position  position of interest [in]
aligned_sequences  array which will contain the indices of the sequences aligned at the requested position. This array must have size greater than or equal to the number of sequences + 1 in the multiple alignment data structure (msa->dimensions->num_seqs + 1) [out]
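The selection described above can be sketched with plain arrays. The cell type, the row-major flat layout, and the function name are assumptions made for this example; the real code works on _PSIMsa and SDynamicUint4Array and may apply further checks on the residue letter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced version of an alignment cell. */
typedef struct SketchMsaCell {
    unsigned char letter;  /* residue code */
    int is_aligned;        /* non-zero if this cell is part of the alignment */
} SketchMsaCell;

/* Writes the row indices aligned at `position` into `indices`, which must
 * have room for num_rows entries. cells is row-major: row * query_length + pos.
 * Returns the number of indices written. */
unsigned int sketch_aligned_sequences_at(const SketchMsaCell *cells,
                                         unsigned int num_rows,
                                         unsigned int query_length,
                                         unsigned int position,
                                         unsigned int *indices)
{
    unsigned int i, n = 0;

    assert(cells != NULL && indices != NULL);
    assert(position < query_length);

    for (i = 0; i < num_rows; i++)
        if (cells[i * query_length + position].is_aligned)
            indices[n++] = i;
    return n;
}
```

Here num_rows plays the role of num_seqs + 1, since the query occupies one row in addition to the aligned sequences.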

Definition at line 1904 of file blast_psi_priv.c.

References AMINOACID_TO_NCBISTDAA, ASSERT, _PSIMsa::cell, _PSIMsa::dimensions, DynamicUint4Array_Append(), i, _PSIMsaCell::is_aligned, _PSIMsaCell::letter, SDynamicUint4Array::num_allocated, PSIMsaDimensions::num_seqs, and SDynamicUint4Array::num_used.

Referenced by _PSIComputeSequenceWeights().

◆ _PSIGetLeftExtents()

static void _PSIGetLeftExtents ( const _PSIMsa msa,
Uint4  seq_index 
)
static

Computes the left extents for the sequence identified by seq_index.

Parameters
msa  multiple sequence alignment data structure [in]
seq_index  index of the sequence of interest [in]
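One plausible reading of "left extent" is the start of the contiguous aligned run that contains a position. The sketch below implements that reading on plain arrays; the interpretation, the function name, and the array-based interface are assumptions, and the toolkit function may treat gaps or the first position differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: for every aligned position, stores in left[] the index
 * where its contiguous aligned run begins. Unaligned positions are untouched. */
void sketch_left_extents(const int *is_aligned, int *left,
                         unsigned int query_length)
{
    unsigned int curr;

    assert(is_aligned != NULL && left != NULL);

    for (curr = 0; curr < query_length; curr++) {
        if (!is_aligned[curr])
            continue;
        if (curr > 0 && is_aligned[curr - 1])
            left[curr] = left[curr - 1];  /* extend the run begun earlier */
        else
            left[curr] = (int)curr;       /* a new run starts here */
    }
}
```

Under the same reading, right extents would follow from the mirror-image scan from the last position down to the first.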

Definition at line 1319 of file blast_psi_priv.c.

References AMINOACID_TO_NCBISTDAA, ASSERT, _PSIMsa::cell, _PSIMsa::dimensions, _PSIMsaCell::extents, bm::is_aligned(), SSeqRange::left, letter(), NULL, prev(), and PSIMsaDimensions::query_length.

Referenced by _PSIComputeAlignmentBlocks().

◆ _PSIGetRightExtents()

static void _PSIGetRightExtents ( const _PSIMsa msa,
Uint4  seq_index 
)
static

Computes the right extents for the sequence identified by seq_index.

Parameters
msa  multiple sequence alignment data structure [in]
seq_index  index of the sequence of interest [in]

Definition at line 1353 of file blast_psi_priv.c.

References AMINOACID_TO_NCBISTDAA, ASSERT, _PSIMsa::cell, _PSIMsa::dimensions, _PSIMsaCell::extents, bm::is_aligned(), last(), letter(), NULL, PSIMsaDimensions::query_length, and SSeqRange::right.

Referenced by _PSIComputeAlignmentBlocks().

◆ _PSIInternalPssmDataFree()

_PSIInternalPssmData* _PSIInternalPssmDataFree ( _PSIInternalPssmData pssm)

◆ _PSIInternalPssmDataNew()

_PSIInternalPssmData* _PSIInternalPssmDataNew ( Uint4  query_length,
Uint4  alphabet_size 
)

Allocates a new _PSIInternalPssmData structure.

Parameters
query_length  number of columns for the PSSM [in]
alphabet_size  number of rows for the PSSM [in]
Returns
newly allocated structure or NULL in case of memory allocation failure

Definition at line 425 of file blast_psi_priv.c.

References _PSIAllocateMatrix(), _PSIInternalPssmDataFree(), calloc(), _PSIInternalPssmData::freq_ratios, _PSIInternalPssmData::ncols, _PSIInternalPssmData::nrows, NULL, _PSIInternalPssmData::pseudocounts, _PSIInternalPssmData::pssm, and _PSIInternalPssmData::scaled_pssm.

Referenced by BOOST_AUTO_TEST_CASE(), PSICreatePssmFromCDD(), PSICreatePssmFromFrequencyRatios(), PSICreatePssmWithDiagnostics(), s_ScalePosMatrix(), and s_TestCreatePssmFromFreqs().

◆ _PSIMsaFree()

_PSIMsa* _PSIMsaFree ( _PSIMsa msa)

Deallocates the _PSIMsa data structure.

Parameters
msa  multiple sequence alignment data structure to deallocate [in]
Returns
NULL

Definition at line 389 of file blast_psi_priv.c.

References _PSIDeallocateMatrix(), _PSIMsa::cell, _PSIMsa::dimensions, NULL, _PSIMsa::num_matching_seqs, PSIMsaDimensions::num_seqs, _PSIMsa::query, PSIMsaDimensions::query_length, _PSIMsa::residue_counts, and sfree.

Referenced by _PSIMsaNew(), Deleter< _PSIMsa >::Delete(), and s_PSICreatePssmCleanUp().

◆ _PSIMsaNew()

_PSIMsa* _PSIMsaNew ( const _PSIPackedMsa packed_msa,
Uint4  alphabet_size 
)

Allocates and initializes the internal version of the PSIMsa structure (makes a deep copy) for internal use by the PSSM engine.

Parameters
packed_msa  compact multiple sequence alignment data structure [in]
alphabet_size  number of elements in the alphabet that makes up the aligned characters in the multiple sequence alignment [in]
Returns
newly allocated structure or NULL in case of memory allocation failure

Definition at line 308 of file blast_psi_priv.c.

References _PSIAllocateMatrix(), _PSIMsaFree(), _PSIPackedMsaGetNumberOfAlignedSeqs(), _PSIUpdatePositionCounts(), _PSIMsa::alphabet_size, ASSERT, calloc(), _PSIMsa::cell, _PSIPackedMsa::data, _PSIPackedMsa::dimensions, _PSIMsa::dimensions, _PSIMsaCell::extents, _PSIPackedMsaCell::is_aligned, _PSIMsaCell::is_aligned, IS_residue, kQueryIndex, SSeqRange::left, _PSIPackedMsaCell::letter, _PSIMsaCell::letter, malloc(), NULL, _PSIMsa::num_matching_seqs, PSIMsaDimensions::num_seqs, _PSIMsa::query, PSIMsaDimensions::query_length, _PSIMsa::residue_counts, SSeqRange::right, and _PSIPackedMsa::use_sequence.

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmWithDiagnostics().

◆ _PSIPackedMsaFree()

_PSIPackedMsa* _PSIPackedMsaFree ( _PSIPackedMsa msa)

Deallocates the _PSIPackedMsa data structure.

Parameters
msa  compact multiple sequence alignment data structure to deallocate [in]
Returns
NULL

Definition at line 183 of file blast_psi_priv.c.

References _PSIDeallocateMatrix(), _PSIPackedMsa::data, _PSIPackedMsa::dimensions, NULL, PSIMsaDimensions::num_seqs, sfree, and _PSIPackedMsa::use_sequence.

Referenced by _PSIPackedMsaNew(), Deleter< _PSIPackedMsa >::Delete(), PSICreatePssmWithDiagnostics(), and s_PSICreatePssmCleanUp().

◆ _PSIPackedMsaGetNumberOfAlignedSeqs()

unsigned int _PSIPackedMsaGetNumberOfAlignedSeqs ( const _PSIPackedMsa msa)

Retrieve the number of aligned sequences in the compact multiple sequence alignment.

Parameters
msa  compact multiple sequence alignment data structure to examine [in]
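Since this function references only the dimensions and the use_sequence flags, the count plausibly amounts to tallying the rows that are still marked for use. A minimal sketch follows, assuming a plain int flag array and a hypothetical function name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the rows whose use flag is set.
 * num_rows corresponds to num_seqs + 1 (the query plus aligned sequences). */
unsigned int sketch_count_used_sequences(const int *use_sequence,
                                         unsigned int num_rows)
{
    unsigned int i, count = 0;

    assert(use_sequence != NULL);

    for (i = 0; i < num_rows; i++)
        if (use_sequence[i])
            count++;
    return count;
}
```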

Definition at line 209 of file blast_psi_priv.c.

References _PSIPackedMsa::dimensions, i, PSIMsaDimensions::num_seqs, and _PSIPackedMsa::use_sequence.

Referenced by _PSIMsaNew().

◆ _PSIPackedMsaNew()

_PSIPackedMsa* _PSIPackedMsaNew ( const PSIMsa msa)

Allocates and initializes the compact version of the PSIMsa structure (makes a deep copy) for internal use by the PSSM engine.

Parameters
msa  multiple sequence alignment data structure provided by the user [in]
Returns
newly allocated structure or NULL in case of memory allocation failure

Definition at line 129 of file blast_psi_priv.c.

References _PSIAllocateMatrix(), _PSIPackedMsaFree(), ASSERT, BLASTAA_SIZE, Boolean, calloc(), _PSIPackedMsa::data, _PSIPackedMsa::dimensions, _PSIPackedMsaCell::is_aligned, _PSIPackedMsaCell::letter, malloc(), NULL, TRUE, and _PSIPackedMsa::use_sequence.

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmWithDiagnostics().

◆ _PSIPurgeAlignedRegion()

int _PSIPurgeAlignedRegion ( _PSIPackedMsa msa,
unsigned int  seq_index,
unsigned int  start,
unsigned int  stop 
)

Marks the (start, stop] region corresponding to sequence seq_index in alignment so that it is not further considered for PSSM calculation.

Note that the query sequence cannot be purged.

Parameters
msa  multiple sequence alignment data [in|out]
seq_index  index of the sequence of interest in the alignment [in]
start  start of the region to remove [in]
stop  stop of the region to remove [in]
Returns
PSIERR_BADPARAM if no alignment is given, or if seq_index or stop are invalid, PSI_SUCCESS otherwise

Definition at line 2781 of file blast_psi_priv.c.

References _PSIPackedMsa::data, _PSIPackedMsa::dimensions, FALSE, i, _PSIPackedMsaCell::is_aligned, _PSIPackedMsaCell::letter, NULL, PSIMsaDimensions::num_seqs, PSI_SUCCESS, PSIERR_BADPARAM, PSIMsaDimensions::query_length, and s_PSIDiscardIfUnused().

Referenced by _handleNeitherAligned().

◆ _PSIPurgeBiasedSegments()

int _PSIPurgeBiasedSegments ( _PSIPackedMsa msa)

Main function for keeping only those selected sequences for PSSM construction (stage 2).

After this function the multiple sequence alignment data will not be modified.

See also
implementation of PSICreatePssmWithDiagnostics
Parameters
msa  multiple sequence alignment data structure [in]
Returns
PSIERR_BADPARAM if alignment is NULL; PSI_SUCCESS otherwise

Definition at line 948 of file blast_psi_priv.c.

References PSI_SUCCESS, PSIERR_BADPARAM, s_PSIPurgeNearIdenticalAlignments(), and s_PSIPurgeSelfHits().

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmWithDiagnostics().

◆ _PSIResetAlignmentTraits()

static NCBI_INLINE void _PSIResetAlignmentTraits ( _PSIAlignmentTraits traits,
Uint4  position 
)
static

Resets the traits structure to restart finite state machine.

Parameters
traits  structure to reset [in|out]
position  position in the multiple sequence alignment to which the traits structure is initialized [in]

Definition at line 1067 of file blast_psi_priv.c.

References ASSERT, and _PSIAlignmentTraits::start.

Referenced by _handleEitherAlignedNeitherX(), and s_PSIPurgeSimilarAlignments().

◆ _PSISaveCDDiagnostics()

int _PSISaveCDDiagnostics ( const PSICdMsa msa,
const _PSISequenceWeights seq_weights,
const _PSIInternalPssmData internal_pssm,
PSIDiagnosticsResponse diagnostics 
)

Collects diagnostic information from the process of creating the CDD-based PSSM.

Parameters
cd_msa  multiple alignment of CDs data structure [in]
seq_weights  sequence weights data structure [in]
internal_pssm  structure containing PSSM's frequency ratios [in]
diagnostics  output parameter [out]
Returns
PSI_SUCCESS on success, PSIERR_OUTOFMEM if memory allocation fails or PSIERR_BADPARAM if any of its arguments is NULL

Definition at line 2912 of file blast_psi_priv.c.

References _PSICalculateInformationContentFromFreqRatios(), PSIDiagnosticsResponse::alphabet_size, ASSERT, PSICdMsa::dimensions, _PSIInternalPssmData::freq_ratios, PSIDiagnosticsResponse::frequency_ratios, PSIDiagnosticsResponse::independent_observations, _PSISequenceWeights::independent_observations, info, PSIDiagnosticsResponse::information_content, _PSISequenceWeights::match_weights, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_OUTOFMEM, PSIMsaDimensions::query_length, PSIDiagnosticsResponse::query_length, r(), sfree, _PSISequenceWeights::std_prob, and PSIDiagnosticsResponse::weighted_residue_freqs.

Referenced by PSICreatePssmFromCDD().

◆ _PSISaveDiagnostics()

int _PSISaveDiagnostics ( const _PSIMsa msa,
const _PSIAlignedBlock aligned_block,
const _PSISequenceWeights seq_weights,
const _PSIInternalPssmData internal_pssm,
PSIDiagnosticsResponse diagnostics 
)

Collects diagnostic information from the process of creating the PSSM.

Parameters
msa  multiple sequence alignment data structure [in]
aligned_block  aligned regions' extents [in]
seq_weights  sequence weights data structure [in]
internal_pssm  structure containing PSSM's frequency ratios [in]
diagnostics  output parameter [out]
Returns
PSI_SUCCESS on success, PSIERR_OUTOFMEM if memory allocation fails or PSIERR_BADPARAM if any of its arguments is NULL

Definition at line 2809 of file blast_psi_priv.c.

References _PSICalculateInformationContentFromFreqRatios(), PSIDiagnosticsResponse::alphabet_size, AMINOACID_TO_NCBISTDAA, ASSERT, _PSIMsa::cell, _PSIMsa::dimensions, _PSIInternalPssmData::freq_ratios, PSIDiagnosticsResponse::frequency_ratios, PSIDiagnosticsResponse::gapless_column_weights, _PSISequenceWeights::gapless_column_weights, PSIDiagnosticsResponse::independent_observations, _PSISequenceWeights::independent_observations, info, PSIDiagnosticsResponse::information_content, PSIDiagnosticsResponse::interval_sizes, _PSIMsaCell::letter, _PSISequenceWeights::match_weights, PSIDiagnosticsResponse::num_matching_seqs, _PSIMsa::num_matching_seqs, _PSIInternalPssmData::pseudocounts, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_OUTOFMEM, PSIMsaDimensions::query_length, PSIDiagnosticsResponse::query_length, r(), _PSIMsa::residue_counts, PSIDiagnosticsResponse::residue_freqs, sfree, PSIDiagnosticsResponse::sigma, _PSISequenceWeights::sigma, _PSIAlignedBlock::size, _PSISequenceWeights::std_prob, and PSIDiagnosticsResponse::weighted_residue_freqs.

Referenced by PSICreatePssmWithDiagnostics().

◆ _PSIScaleMatrix()

int _PSIScaleMatrix ( const Uint1 query,
const double *  std_probs,
_PSIInternalPssmData internal_pssm,
BlastScoreBlk sbp 
)

Scales the PSSM (stage 7)

Parameters
query  query sequence in ncbistdaa encoding. The length of this sequence is read from internal_pssm->ncols [in]
std_probs  array containing the standard background residue probabilities [in]
internal_pssm  PSSM being computed [in|out]
sbp  score block structure initialized for the scoring system used with the query sequence [in|out]
Returns
PSIERR_BADPARAM if arguments are NULL, PSIERR_POSITIVEAVGSCORE if the average score of the generated PSSM is positive, PSI_SUCCESS otherwise

Definition at line 2481 of file blast_psi_priv.c.

References _PSIUpdateLambdaK(), ASSERT, BLAST_Nint(), BLAST_SCORE_MIN, FALSE, i, int, BlastScoreBlk::kbp_ideal, BlastScoreBlk::kbp_psi, kPositScalingNumIterations, kPositScalingPercent, kPSIScaleFactor, Blast_KarlinBlk::Lambda, _PSIInternalPssmData::ncols, _PSIInternalPssmData::nrows, NULL, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_POSITIVEAVGSCORE, _PSIInternalPssmData::pssm, query, _PSIInternalPssmData::scaled_pssm, and TRUE.

Referenced by _PSICreateAndScalePssmFromFrequencyRatios(), and BOOST_AUTO_TEST_CASE().

◆ _PSISequenceLengthWithoutX()

Uint4 _PSISequenceLengthWithoutX ( const Uint1 seq,
Uint4  length 
)

Calculates the length of the sequence without including any 'X' residues.

Used in kappa.c.

Parameters
seq  sequence to examine [in]
length  length of the sequence above [in]
Returns
number of non-X residues in the sequence
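The count can be sketched in a few lines. The function name is hypothetical, and the value 21 for 'X' in the ncbistdaa encoding is an assumption of this example; the toolkit looks the code up through AMINOACID_TO_NCBISTDAA.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed ncbistdaa code for the residue 'X'; not taken from this page. */
#define SKETCH_X_RESIDUE 21

/* Hypothetical helper: returns how many residues in seq are not X. */
unsigned int sketch_length_without_x(const unsigned char *seq,
                                     unsigned int length)
{
    unsigned int i, count = 0;

    assert(seq != NULL);

    for (i = 0; i < length; i++)
        if (seq[i] != SKETCH_X_RESIDUE)
            count++;
    return count;
}
```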

Definition at line 2629 of file blast_psi_priv.c.

References AMINOACID_TO_NCBISTDAA, ASSERT, and i.

Referenced by _PSIComputeScoreProbabilities().

◆ _PSISequenceWeightsFree()

_PSISequenceWeights* _PSISequenceWeightsFree ( _PSISequenceWeights seq_weights)

◆ _PSISequenceWeightsNew()

_PSISequenceWeights* _PSISequenceWeightsNew ( const PSIMsaDimensions dims,
const BlastScoreBlk sbp 
)

◆ _PSISpreadGapWeights()

static void _PSISpreadGapWeights ( const _PSIMsa msa,
_PSISequenceWeights seq_weights,
Boolean  nsg_compatibility_mode 
)
static

Uses disperse method of spreading the gap weights.

Parameters
msa  multiple sequence alignment data structure [in]
seq_weights  sequence weights data structure [in|out]
nsg_compatibility_mode  set to true to emulate the structure group's use of the PSSM engine in the cddumper application. Should be FALSE by default [in]

Definition at line 1932 of file blast_psi_priv.c.

References _PSIMsa::alphabet_size, AMINOACID_TO_NCBISTDAA, ASSERT, _PSIMsa::cell, _PSIMsa::dimensions, kEpsilon, kQueryIndex, _PSIMsaCell::letter, _PSISequenceWeights::match_weights, _PSIMsa::num_matching_seqs, PSIMsaDimensions::query_length, and _PSISequenceWeights::std_prob.

Referenced by _PSIComputeSequenceWeights().

◆ _PSIStructureGroupCustomization()

void _PSIStructureGroupCustomization ( _PSIMsa msa)

Enable NCBI structure group customization to discard the query sequence, as this really isn't the result of a PSI-BLAST iteration, but rather an artificial consensus sequence of the multiple sequence alignment constructed by them.

This should be called after _PSIPurgeBiasedSegments.

Definition at line 800 of file blast_psi_priv.c.

References _PSIUpdatePositionCounts(), _PSIMsa::cell, _PSIMsa::dimensions, FALSE, i, _PSIMsaCell::is_aligned, kQueryIndex, _PSIMsaCell::letter, and PSIMsaDimensions::query_length.

Referenced by PSICreatePssmWithDiagnostics().

◆ _PSIUpdateLambdaK()

void _PSIUpdateLambdaK ( const int **  pssm,
const Uint1 query,
Uint4  query_length,
const double *  std_probs,
BlastScoreBlk sbp 
)

Updates the Karlin-Altschul parameters based on the query sequence and PSSM's score frequencies.

Port of blastool.c's updateLambdaK

Parameters
pssm  PSSM [in]
query  query sequence in ncbistdaa encoding [in]
query_length  length of the query sequence above [in]
std_probs  array containing the standard background residue probabilities [in]
sbp  Score block structure where the calculated lambda and K will be returned [in|out]

Definition at line 2732 of file blast_psi_priv.c.

References _PSIComputeScoreProbabilities(), ASSERT, Blast_KarlinBlkUngappedCalc(), Blast_ScoreFreqFree(), Blast_KarlinBlk::K, BlastScoreBlk::kbp_gap_psi, BlastScoreBlk::kbp_gap_std, BlastScoreBlk::kbp_ideal, BlastScoreBlk::kbp_psi, log, Blast_KarlinBlk::logK, and query.

Referenced by _PSIScaleMatrix(), and impalaScaleMatrix().

◆ _PSIUpdatePositionCounts()

void _PSIUpdatePositionCounts ( _PSIMsa msa)

Counts the number of sequences matching the query per query position (columns of the multiple alignment) as well as the number of residues present in each position of the query.

Should be called after the multiple alignment data has been purged of biased sequences.

Parameters
msa  multiple sequence alignment structure [in|out]
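The two tallies can be sketched on flat arrays as follows. The cell type, the row-major layout, the flat residue_counts array, and the function name are assumptions made for this example rather than the _PSIMsa layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced version of an alignment cell. */
typedef struct SketchCountCell {
    unsigned char letter;  /* residue code */
    int is_aligned;        /* non-zero if this cell is part of the alignment */
} SketchCountCell;

/* Recomputes, for every query position p:
 *   num_matching_seqs[p]                  rows aligned at p
 *   residue_counts[p * alphabet_size + r] rows showing residue r at p
 * cells is row-major: row * query_length + position. */
void sketch_update_position_counts(const SketchCountCell *cells,
                                   unsigned int num_rows,
                                   unsigned int query_length,
                                   unsigned int alphabet_size,
                                   unsigned int *num_matching_seqs,
                                   unsigned int *residue_counts)
{
    unsigned int s, p;

    assert(cells != NULL && num_matching_seqs != NULL && residue_counts != NULL);

    /* Start from zero so the function can be called again after a purge. */
    memset(num_matching_seqs, 0, query_length * sizeof(*num_matching_seqs));
    memset(residue_counts, 0,
           (size_t)query_length * alphabet_size * sizeof(*residue_counts));

    for (s = 0; s < num_rows; s++) {
        for (p = 0; p < query_length; p++) {
            const SketchCountCell *c = &cells[s * query_length + p];
            if (!c->is_aligned || c->letter >= alphabet_size)
                continue;  /* skip unaligned cells and out-of-range residues */
            residue_counts[p * alphabet_size + c->letter]++;
            num_matching_seqs[p]++;
        }
    }
}
```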

Definition at line 991 of file blast_psi_priv.c.

References _PSIMsa::alphabet_size, ASSERT, _PSIMsa::cell, _PSIMsa::dimensions, _PSIMsaCell::is_aligned, _PSIMsaCell::letter, _PSIMsa::num_matching_seqs, PSIMsaDimensions::num_seqs, PSIMsaDimensions::query_length, and _PSIMsa::residue_counts.

Referenced by _PSIMsaNew(), and _PSIStructureGroupCustomization().

◆ _PSIValidateCdMSA()

int _PSIValidateCdMSA ( const PSICdMsa cd_msa,
Uint4  alphabet_size 
)

Validation of multiple alignment of conserved domains structure.

Parameters
cd_msa  multiple alignment of CDs [in]
alphabet_size  alphabet size [in]
Returns
One of the errors defined above if validation fails or bad parameter is passed in, else PSI_SUCCESS

Definition at line 862 of file blast_psi_priv.c.

References AMINOACID_TO_NCBISTDAA, PSICdMsaCell::data, PSICdMsa::dimensions, fabs, PSICdMsaCellData::iobsr, PSICdMsaCell::is_aligned, kEpsilon, PSICdMsa::msa, PSIMsaDimensions::num_seqs, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_BADPROFILE, PSIERR_GAPINQUERY, PSICdMsa::query, PSIMsaDimensions::query_length, and PSICdMsaCellData::wfreqs.

Referenced by PSICreatePssmFromCDD().

◆ _PSIValidateMSA()

int _PSIValidateMSA ( const _PSIMsa msa,
Boolean  ignored_unaligned_positions 
)

Main validation function for multiple sequence alignment structure.

Should be called after _PSIPurgeBiasedSegments.

Parameters
msa  multiple sequence alignment data structure [in]
ignored_unaligned_positions  determines whether the unaligned positions test should be performed or not [in]
Returns
One of the errors defined above if validation fails or bad parameter is passed in, else PSI_SUCCESS

Definition at line 828 of file blast_psi_priv.c.

References PSI_SUCCESS, PSIERR_BADPARAM, s_PSIValidateAlignedColumns(), s_PSIValidateNoFlankingGaps(), s_PSIValidateNoGapsInQuery(), and s_PSIValidateParticipatingSequences().

Referenced by PSICreatePssmWithDiagnostics().

◆ _PSIValidateMSA_StructureGroup()

int _PSIValidateMSA_StructureGroup ( const _PSIMsa msa)

Structure group validation function for multiple sequence alignment structure.

Should be called after _PSIStructureGroupCustomization.

Parameters
msa  multiple sequence alignment data structure [in]
Returns
One of the errors defined above if validation fails or bad parameter is passed in, else PSI_SUCCESS

Definition at line 811 of file blast_psi_priv.c.

References PSI_SUCCESS, PSIERR_BADPARAM, and s_PSIValidateParticipatingSequences().

Referenced by PSICreatePssmWithDiagnostics().

◆ s_adjustColumnProbabilities()

static void s_adjustColumnProbabilities ( double *  initialProbabilities,
double *  probabilitiesToReturn,
double  standardWeight,
const double *  standardProbabilities,
double  observations 
)
static

Adjusts the probabilities by assigning weight observations to initialProbabilities and weight standardWeight to standardProbabilities. Copy of posit.c:adjustColumnProbabilities.

Parameters
initialProbabilities  starting probabilities [in]
probabilitiesToReturn  return value [out]
standardWeight  small number of pseudocounts to avoid 0 probabilities [in]
standardProbabilities  background probabilities [in]
observations  expected number of observations [in]
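The weighting described above amounts to a normalized mixture of the two distributions. The sketch below shows that mixture over a 20-letter alphabet; the function name and the normalization by the overall sum are assumptions of this example, not a transcription of the source.

```c
#include <assert.h>
#include <stddef.h>

/* Alphabet size assumed for this sketch (the page defines EFFECTIVE_ALPHABET as 20). */
#define SKETCH_ALPHABET 20

/* Hypothetical helper: result[c] is proportional to
 * initial[c] * observations + standard[c] * standard_weight,
 * rescaled so that the result sums to 1. */
void sketch_adjust_column_probabilities(const double *initial,
                                        double *result,
                                        double standard_weight,
                                        const double *standard,
                                        double observations)
{
    double sums[SKETCH_ALPHABET];
    double total = 0.0;
    int c;

    assert(initial != NULL && result != NULL && standard != NULL);

    for (c = 0; c < SKETCH_ALPHABET; c++) {
        sums[c] = initial[c] * observations + standard[c] * standard_weight;
        total += sums[c];
    }
    assert(total > 0.0);

    for (c = 0; c < SKETCH_ALPHABET; c++)
        result[c] = sums[c] / total;
}
```

Because standard_weight is positive and the background probabilities are non-zero, no residue ends up with probability 0 even if it was never observed in the column.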

Definition at line 3014 of file blast_psi_priv.c.

References EFFECTIVE_ALPHABET.

Referenced by s_columnSpecificPseudocounts().

◆ s_columnSpecificPseudocounts()

static double s_columnSpecificPseudocounts ( const _PSISequenceWeights posSearch,
int  columnNumber,
const double *  backgroundProbabilities,
const double  observations 
)
static

copy of posit.c:columnSpecificPseudocounts

Parameters
posSearch  data structure of sequence weights [in]
columnNumber  column in the PSSM [in]
backgroundProbabilities  residue background probs [in]
observations  for each column an estimate of observed residues [in]

Definition at line 3086 of file blast_psi_priv.c.

References EFFECTIVE_ALPHABET, kPosEpsilon, PSEUDO_MAX, s_adjustColumnProbabilities(), s_computeRelativeEntropy(), and s_fillColumnProbabilities().

Referenced by _PSIComputeFreqRatios(), and _PSIComputeFreqRatiosFromCDs().

◆ s_computeRelativeEntropy()

static double s_computeRelativeEntropy ( const double *  newDistribution,
const double *  backgroundProbabilities 
)
static

Computes the relative entropy of the first distribution with respect to the second distribution. A copy of posit.c:computeRelativeEntropy.

Parameters
newDistribution  working set [in]
backgroundProbabilities  standard set [in]
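For reference, the standard definition of relative entropy (Kullback-Leibler divergence) over a 20-letter alphabet is given below. Writing p for newDistribution and q for backgroundProbabilities is notation introduced for this example, and the natural logarithm is assumed because the function references the C log routine.

```latex
D(p \,\|\, q) = \sum_{i=1}^{20} p_i \,\ln\frac{p_i}{q_i}
```

Terms with p_i = 0 contribute nothing to the sum. The reference to kPosEpsilon suggests that very small values are handled specially, but the exact threshold behavior is not described on this page.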

Definition at line 3045 of file blast_psi_priv.c.

References EFFECTIVE_ALPHABET, kPosEpsilon, and log.

Referenced by s_columnSpecificPseudocounts().

◆ s_effectiveObservations()

static double s_effectiveObservations ( const _PSIAlignedBlock align_blk,
const _PSISequenceWeights seq_weights,
int  columnNumber,
int  queryLength,
const double *  expno 
)
static

Estimates the effective number of observations in the interval for the specified columnNumber. Copy of posit.c:effectiveObservations.

Parameters
align_blk  data structure describing the aligned blocks [in]
seq_weights  data structure of sequence weights [in]
columnNumber  column in the PSSM [in]
queryLength  length of the query sequence [in]
expno  table of expectations [in]

Definition at line 3121 of file blast_psi_priv.c.

References ASSERT, EFFECTIVE_ALPHABET, i, SSeqRange::left, MAX, MAX_IND_OBSERVATIONS, MIN, _PSIAlignedBlock::pos_extnt, _PSISequenceWeights::posDistinctDistrib, _PSISequenceWeights::posNumParticipating, and SSeqRange::right.

Referenced by _PSIComputeFreqRatios().

◆ s_fillColumnProbabilities()

static void s_fillColumnProbabilities ( double *  probabilities,
const _PSISequenceWeights posSearch,
Int4  columnNumber 
)
static

Reorders in the same manner as returned by Blast_GetMatrixBackgroundFreq. This function is a copy of posit.c:fillColumnProbabilities.

Definition at line 2972 of file blast_psi_priv.c.

References EFFECTIVE_ALPHABET, and _PSISequenceWeights::match_weights.

Referenced by s_columnSpecificPseudocounts().

◆ s_initializeExpNumObservations()

static void s_initializeExpNumObservations ( double *  expno,
const double *  backgroundProbabilities 
)
static

Initializes the expected number of observations, using the background probabilities for this matrix.

Calculates the expected number of distinct amino acids as a function of independent trials. A copy of posit.c:initializeExpNumObservations.

Parameters
expno  table of expectations [out]
backgroundProbabilities  residue background probs [in]

Definition at line 3068 of file blast_psi_priv.c.

References EFFECTIVE_ALPHABET, log, and MAX_IND_OBSERVATIONS.

Referenced by _PSIComputeFreqRatios().
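As an illustration of the quantity described above: for j independent draws from background probabilities p_k, the expected number of distinct letters seen is the sum over k of 1 - (1 - p_k)^j. The sketch below fills a table with that value for each j. It is not the toolkit's implementation; the names are hypothetical, and it uses running products where the reference to log suggests the original computes the powers differently.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ALPHABET 20    /* mirrors EFFECTIVE_ALPHABET */
#define SKETCH_MAX_OBS  400   /* mirrors MAX_IND_OBSERVATIONS */

/* Fill expno[j], for 0 <= j < SKETCH_MAX_OBS, with the expected number
 * of distinct letters observed in j independent draws from background[]. */
static void sketch_init_exp_num_observations(double *expno,
                                             const double *background)
{
    double miss[SKETCH_ALPHABET];   /* miss[k] = (1 - p_k)^j for current j */
    int j, k;

    assert(expno != NULL && background != NULL);
    for (k = 0; k < SKETCH_ALPHABET; k++)
        miss[k] = 1.0;
    expno[0] = 0.0;                 /* no draws, no letters observed */
    for (j = 1; j < SKETCH_MAX_OBS; j++) {
        double missed_total = 0.0;
        for (k = 0; k < SKETCH_ALPHABET; k++) {
            miss[k] *= 1.0 - background[k];
            missed_total += miss[k];
        }
        expno[j] = SKETCH_ALPHABET - missed_total;
    }
}
```

The table increases with j and approaches the alphabet size, which is what makes it usable for mapping an observed count of distinct residues back to a number of independent observations.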

◆ s_PSIComputeFrequenciesFromCDsCleanup()

static void s_PSIComputeFrequenciesFromCDsCleanup ( double *  sum_weights)
static

Definition at line 1643 of file blast_psi_priv.c.

References sfree.

Referenced by _PSIComputeFrequenciesFromCDs().

◆ s_PSIDiscardIfUnused()

static void s_PSIDiscardIfUnused ( _PSIPackedMsa msa,
unsigned int  seq_index 
)
static

◆ s_PSIPurgeNearIdenticalAlignments()

static void s_PSIPurgeNearIdenticalAlignments ( _PSIPackedMsa msa)
static

Keeps only one copy of any aligned sequences that are more than kPSINearIdentical (0.94, i.e. 94%) identical to one another.

Parameters
msa  multiple sequence alignment data structure [in]

Definition at line 973 of file blast_psi_priv.c.

References ASSERT, _PSIPackedMsa::dimensions, i, kPSINearIdentical, PSIMsaDimensions::num_seqs, and s_PSIPurgeSimilarAlignments().

Referenced by _PSIPurgeBiasedSegments().

◆ s_PSIPurgeSelfHits()

static void s_PSIPurgeSelfHits ( _PSIPackedMsa msa)
static

Remove those sequences which are identical to the query sequence.

Parameters
msa  multiple sequence alignment data structure [in]

Definition at line 961 of file blast_psi_priv.c.

References ASSERT, _PSIPackedMsa::dimensions, kPSIIdentical, kQueryIndex, PSIMsaDimensions::num_seqs, and s_PSIPurgeSimilarAlignments().

Referenced by _PSIPurgeBiasedSegments().

◆ s_PSIPurgeSimilarAlignments()

static void s_PSIPurgeSimilarAlignments ( _PSIPackedMsa msa,
Uint4  seq_index1,
Uint4  seq_index2,
double  max_percent_identity 
)
static

This function compares the sequences in the msa->cell structure indexed by seq_index1 and seq_index2.

If it finds aligned regions that have a greater percent identity than max_percent_identity, it removes the sequence identified by seq_index2. FIXME: needs more descriptive name

Parameters
msa  multiple sequence alignment data structure [in]
seq_index1  index of the sequence of interest [in]
seq_index2  index of the sequence of interest [in]
max_percent_identity  percent identity needed to drop sequence identified by seq_index2 from the multiple sequence alignment data structure [in]

Definition at line 1186 of file blast_psi_priv.c.

References _handleBothAlignedSameResidueNoX(), _handleEitherAlignedEitherX(), _handleEitherAlignedNeitherX(), _handleNeitherAligned(), _PSIResetAlignmentTraits(), AMINOACID_TO_NCBISTDAA, _PSIPackedMsa::data, _PSIPackedMsa::dimensions, eCounting, FALSE, _PSIPackedMsaCell::is_aligned, kQueryIndex, _PSIPackedMsaCell::letter, PSIMsaDimensions::query_length, and _PSIPackedMsa::use_sequence.

Referenced by s_PSIPurgeNearIdenticalAlignments(), and s_PSIPurgeSelfHits().
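A simplified sketch of the purge decision, under stated assumptions. The references above suggest the real function walks the alignment with a finite state machine and evaluates aligned regions individually; this version instead computes one identity fraction over all positions where both sequences are aligned. SketchPairCell, sketch_identity and sketch_should_purge are hypothetical names, and the inclusive comparison is an assumption chosen so that a threshold of 1.0 (as in kPSIIdentical) can be reached at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one cell of the packed MSA. */
typedef struct {
    unsigned char letter;   /* residue at this position */
    int is_aligned;         /* nonzero if the sequence is aligned here */
} SketchPairCell;

/* Fraction of identical residues over the positions where both sequences
 * are aligned; 0.0 when they share no aligned positions. */
static double sketch_identity(const SketchPairCell *a,
                              const SketchPairCell *b, size_t length)
{
    size_t aligned = 0, identical = 0, p;

    assert(a != NULL && b != NULL);
    for (p = 0; p < length; p++) {
        if (a[p].is_aligned && b[p].is_aligned) {
            aligned++;
            if (a[p].letter == b[p].letter)
                identical++;
        }
    }
    return aligned ? (double)identical / (double)aligned : 0.0;
}

/* Returns 1 when the second sequence should be dropped.
 * Assumption: the threshold comparison is inclusive. */
static int sketch_should_purge(const SketchPairCell *a,
                               const SketchPairCell *b, size_t length,
                               double max_percent_identity)
{
    return sketch_identity(a, b, length) >= max_percent_identity;
}
```

Note that the threshold is a fraction between 0 and 1, consistent with the values of kPSIIdentical and kPSINearIdentical, despite the "percent" in the parameter name.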

◆ s_PSIValidateAlignedColumns()

static int s_PSIValidateAlignedColumns ( const _PSIMsa msa)
static

Validate that there are no unaligned columns or columns which only contain gaps in the multiple sequence alignment.

Note that this test is a bit redundant with s_PSIValidateNoGapsInQuery(), but it is left in here just in case the query sequence is manually disabled (normally it shouldn't be, but we have seen cases where this is done).

Parameters
msa  multiple sequence alignment data structure [in]
Returns
PSIERR_UNALIGNEDCOLUMN or PSIERR_COLUMNOFGAPS if validation fails, else PSI_SUCCESS

Definition at line 754 of file blast_psi_priv.c.

References AMINOACID_TO_NCBISTDAA, ASSERT, _PSIMsa::cell, _PSIMsa::dimensions, FALSE, _PSIMsaCell::is_aligned, kQueryIndex, _PSIMsaCell::letter, PSIMsaDimensions::num_seqs, PSI_SUCCESS, PSIERR_COLUMNOFGAPS, PSIERR_UNALIGNEDCOLUMN, PSIMsaDimensions::query_length, and TRUE.

Referenced by _PSIValidateMSA().
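The column check described above can be sketched as follows. This is an illustration, not the toolkit's code: it stores the alignment as a flat row-major array, uses the character '-' as the gap residue (the toolkit works with NCBIstdaa codes via AMINOACID_TO_NCBISTDAA), and returns placeholder status codes rather than the real PSIERR_* values. All names starting with Sketch or SKETCH_ are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder status codes; the real PSIERR_* values are not assumed. */
enum {
    SKETCH_COLUMNS_OK = 0,
    SKETCH_UNALIGNED_COLUMN = -1,
    SKETCH_COLUMN_OF_GAPS = -2
};

#define SKETCH_GAP '-'   /* assumption: gap shown as a character */

typedef struct {
    unsigned char letter;
    int is_aligned;
} SketchColumnCell;

/* cells holds num_rows * query_length entries in row-major order, with
 * the query as row 0. Every column must have at least one aligned
 * sequence and at least one aligned residue that is not a gap. */
static int sketch_validate_columns(const SketchColumnCell *cells,
                                   size_t num_rows, size_t query_length)
{
    size_t p, s;

    assert(cells != NULL);
    for (p = 0; p < query_length; p++) {
        int found_aligned = 0, found_residue = 0;
        for (s = 0; s < num_rows; s++) {
            const SketchColumnCell *cell = &cells[s * query_length + p];
            if (cell->is_aligned) {
                found_aligned = 1;
                if (cell->letter != SKETCH_GAP) {
                    found_residue = 1;
                    break;
                }
            }
        }
        if (!found_aligned)
            return SKETCH_UNALIGNED_COLUMN;
        if (!found_residue)
            return SKETCH_COLUMN_OF_GAPS;
    }
    return SKETCH_COLUMNS_OK;
}
```

Because the query row is included in the scan, a normal alignment with a gap-free query passes trivially, which is why the description calls this test partly redundant with the query gap check.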

◆ s_PSIValidateNoFlankingGaps()

static int s_PSIValidateNoFlankingGaps ( const _PSIMsa msa)
static

Validate that there are no flanking gaps in the multiple sequence alignment.

Parameters
msa  multiple sequence alignment data structure [in]
Returns
PSIERR_STARTINGGAP or PSIERR_ENDINGGAP if validation fails, else PSI_SUCCESS

Definition at line 704 of file blast_psi_priv.c.

References AMINOACID_TO_NCBISTDAA, ASSERT, _PSIMsa::cell, _PSIMsa::dimensions, _PSIMsaCell::is_aligned, _PSIMsaCell::letter, PSIMsaDimensions::num_seqs, PSI_SUCCESS, PSIERR_ENDINGGAP, PSIERR_STARTINGGAP, and PSIMsaDimensions::query_length.

Referenced by _PSIValidateMSA().
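One way to read "no flanking gaps" is that, for every sequence, the first and the last aligned positions must hold a residue rather than a gap. The sketch below implements that reading as an assumption; it is not taken from the toolkit source. The cell type, the '-' gap character and the status codes are hypothetical placeholders, and the order in which the two errors are reported when both occur may differ from the real function.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder status codes; the real PSIERR_* values are not assumed. */
enum {
    SKETCH_FLANKS_OK = 0,
    SKETCH_STARTING_GAP = -1,
    SKETCH_ENDING_GAP = -2
};

#define SKETCH_GAP '-'   /* assumption: gap shown as a character */

typedef struct {
    unsigned char letter;
    int is_aligned;
} SketchFlankCell;

/* cells holds num_rows * query_length entries in row-major order. */
static int sketch_validate_no_flanking_gaps(const SketchFlankCell *cells,
                                            size_t num_rows,
                                            size_t query_length)
{
    size_t s, p;

    assert(cells != NULL);
    for (s = 0; s < num_rows; s++) {
        const SketchFlankCell *row = &cells[s * query_length];

        /* First aligned position, scanning left to right. */
        for (p = 0; p < query_length; p++) {
            if (row[p].is_aligned) {
                if (row[p].letter == SKETCH_GAP)
                    return SKETCH_STARTING_GAP;
                break;
            }
        }
        /* Last aligned position, scanning right to left. */
        for (p = query_length; p > 0; p--) {
            if (row[p - 1].is_aligned) {
                if (row[p - 1].letter == SKETCH_GAP)
                    return SKETCH_ENDING_GAP;
                break;
            }
        }
    }
    return SKETCH_FLANKS_OK;
}
```

Gaps in unaligned positions are ignored, and internal gaps between two residues are allowed; only the outermost aligned positions are inspected.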

◆ s_PSIValidateNoGapsInQuery()

static int s_PSIValidateNoGapsInQuery ( const _PSIMsa msa)
static

Validate that there are no gaps in the query sequence.

Parameters
msa  multiple sequence alignment data structure [in]
Returns
PSIERR_GAPINQUERY if validation fails, else PSI_SUCCESS

Definition at line 683 of file blast_psi_priv.c.

References AMINOACID_TO_NCBISTDAA, ASSERT, _PSIMsa::cell, _PSIMsa::dimensions, kQueryIndex, _PSIMsaCell::letter, PSI_SUCCESS, PSIERR_GAPINQUERY, _PSIMsa::query, and PSIMsaDimensions::query_length.

Referenced by _PSIValidateMSA().

◆ s_PSIValidateParticipatingSequences()

static int s_PSIValidateParticipatingSequences ( const _PSIMsa msa)
static

Verify that after purging biased sequences in the multiple sequence alignment, there are still sequences participating in the multiple sequence alignment.

Parameters
msa  multiple sequence alignment structure [in]
Returns
PSIERR_NOALIGNEDSEQS if validation fails, else PSI_SUCCESS

Definition at line 793 of file blast_psi_priv.c.

References ASSERT, _PSIMsa::dimensions, PSIMsaDimensions::num_seqs, PSI_SUCCESS, and PSIERR_NOALIGNEDSEQS.

Referenced by _PSIValidateMSA(), and _PSIValidateMSA_StructureGroup().

Variable Documentation

◆ kEpsilon

const double kEpsilon = 0.0001

◆ kPosEpsilon

const double kPosEpsilon = 0.0001

minimum return value of s_computeRelativeEntropy

Definition at line 3038 of file blast_psi_priv.c.

Referenced by s_columnSpecificPseudocounts(), s_computeRelativeEntropy(), and s_GetPosBasedStartFreqRatios().

◆ kPositScalingNumIterations

const Uint4 kPositScalingNumIterations = 10

Constant used in scaling PSSM routines: Successor to POSIT_NUM_ITERATIONS.

Definition at line 61 of file blast_psi_priv.c.

Referenced by _PSIScaleMatrix(), and impalaScaleMatrix().

◆ kPositScalingPercent

const double kPositScalingPercent = 0.05

Constant used in scaling PSSM routines: Successor to POSIT_PERCENT.

Definition at line 60 of file blast_psi_priv.c.

Referenced by _PSIScaleMatrix(), and impalaScaleMatrix().

◆ kPSIIdentical

const double kPSIIdentical = 1.0

Percent identity threshold for discarding identical matches.

Definition at line 56 of file blast_psi_priv.c.

Referenced by s_PSIPurgeSelfHits().

◆ kPSINearIdentical

const double kPSINearIdentical = 0.94

Percent identity threshold for discarding near-identical matches.

Definition at line 55 of file blast_psi_priv.c.

Referenced by s_PSIPurgeNearIdenticalAlignments(), and CPssmInputTestData::SetupNearIdenticalHits().

◆ kPSIScaleFactor

const int kPSIScaleFactor = 200

Successor to POSIT_SCALE_FACTOR.

Definition at line 59 of file blast_psi_priv.c.

Referenced by _PSIConvertFreqRatiosToPSSM(), _PSIScaleMatrix(), and impalaScaleMatrix().

◆ kQueryIndex

const unsigned int kQueryIndex = 0
Modified on Sat May 25 14:16:58 2024 by modify_doxy.py rev. 669887