NCBI C++ ToolKit
blast_psi_priv.h File Reference

Private interface for the Position-Specific Iterated BLAST API; contains the PSSM generation engine.

#include <algo/blast/core/ncbi_std.h>
#include <algo/blast/core/blast_stat.h>
#include <algo/blast/core/blast_psi.h>
#include "matrix_freq_ratios.h"


Classes

struct  _PSIPackedMsaCell
 Compact version of the PSIMsaCell structure.

struct  _PSIPackedMsa
 Compact version of PSIMsa structure.

struct  _PSIMsaCell
 Internal data structure to represent a position in the multiple sequence alignment data structure.

struct  _PSIMsa
 Internal multiple alignment data structure used by the PSSM engine.

struct  _PSIInternalPssmData
 Internal representation of a PSSM in various stages of its creation and its dimensions.

struct  _PSIAlignedBlock
 This structure keeps track of the regions aligned between the query sequence and those that were not purged.

struct  _PSISequenceWeights
 Internal data structure to keep computed sequence weights.
 

Macros

#define PSI_SUCCESS   (0)
 Successful operation.

#define PSIERR_BADPARAM   (-1)
 Bad parameter used in function.

#define PSIERR_OUTOFMEM   (-2)
 Out of memory.

#define PSIERR_BADSEQWEIGHTS   (-3)
 Sequence weights do not add to 1.

#define PSIERR_NOFREQRATIOS   (-4)
 No frequency ratios were found for the given scoring matrix.

#define PSIERR_POSITIVEAVGSCORE   (-5)
 Positive average score found when scaling matrix.

#define PSIERR_NOALIGNEDSEQS   (-6)
 After purge stage of PSSM creation, no sequences are left.

#define PSIERR_GAPINQUERY   (-7)
 GAP residue found in query sequence.

#define PSIERR_UNALIGNEDCOLUMN   (-8)
 Found an entire column with no participating sequences.

#define PSIERR_COLUMNOFGAPS   (-9)
 Found an entire column full of GAP residues.

#define PSIERR_STARTINGGAP   (-10)
 Found flanking gap at start of alignment.

#define PSIERR_ENDINGGAP   (-11)
 Found flanking gap at end of alignment.

#define PSIERR_BADPROFILE   (-12)
 Errors in conserved domain profile.

#define PSIERR_UNKNOWN   (-255)
 Unknown error.
 

Typedefs

typedef struct _PSIPackedMsaCell _PSIPackedMsaCell
 Compact version of the PSIMsaCell structure.

typedef struct _PSIPackedMsa _PSIPackedMsa
 Compact version of PSIMsa structure.

typedef struct _PSIMsaCell _PSIMsaCell
 Internal data structure to represent a position in the multiple sequence alignment data structure.

typedef struct _PSIMsa _PSIMsa
 Internal multiple alignment data structure used by the PSSM engine.

typedef struct _PSIInternalPssmData _PSIInternalPssmData
 Internal representation of a PSSM in various stages of its creation and its dimensions.

typedef struct _PSIAlignedBlock _PSIAlignedBlock
 This structure keeps track of the regions aligned between the query sequence and those that were not purged.

typedef struct _PSISequenceWeights _PSISequenceWeights
 Internal data structure to keep computed sequence weights.
 

Functions

void ** _PSIAllocateMatrix (unsigned int ncols, unsigned int nrows, unsigned int data_type_sz)
 Generic 2 dimensional matrix allocator.

void ** _PSIDeallocateMatrix (void **matrix, unsigned int ncols)
 Generic 2 dimensional matrix deallocator.

void _PSICopyMatrix_int (int **dest, int **src, unsigned int ncols, unsigned int nrows)
 Copies src matrix into dest matrix, both of which must be int matrices with dimensions ncols by nrows.

void _PSICopyMatrix_double (double **dest, double **src, unsigned int ncols, unsigned int nrows)
 Copies src matrix into dest matrix, both of which must be double matrices with dimensions ncols by nrows.

_PSIPackedMsa * _PSIPackedMsaNew (const PSIMsa *msa)
 Allocates and initializes the compact version of the PSIMsa structure (makes a deep copy) for internal use by the PSSM engine.

_PSIPackedMsa * _PSIPackedMsaFree (_PSIPackedMsa *msa)
 Deallocates the _PSIPackedMsa data structure.

unsigned int _PSIPackedMsaGetNumberOfAlignedSeqs (const _PSIPackedMsa *msa)
 Retrieves the number of aligned sequences in the compact multiple sequence alignment.

_PSIMsa * _PSIMsaNew (const _PSIPackedMsa *packed_msa, Uint4 alphabet_size)
 Allocates and initializes the internal version of the PSIMsa structure (makes a deep copy) for internal use by the PSSM engine.

_PSIMsa * _PSIMsaFree (_PSIMsa *msa)
 Deallocates the _PSIMsa data structure.

_PSIInternalPssmData * _PSIInternalPssmDataNew (Uint4 query_length, Uint4 alphabet_size)
 Allocates a new _PSIInternalPssmData structure.

_PSIInternalPssmData * _PSIInternalPssmDataFree (_PSIInternalPssmData *pssm)
 Deallocates the _PSIInternalPssmData structure.

_PSIAlignedBlock * _PSIAlignedBlockNew (Uint4 query_length)
 Allocates and initializes the _PSIAlignedBlock structure.

_PSIAlignedBlock * _PSIAlignedBlockFree (_PSIAlignedBlock *aligned_blocks)
 Deallocates the _PSIAlignedBlock structure.

_PSISequenceWeights * _PSISequenceWeightsNew (const PSIMsaDimensions *dims, const BlastScoreBlk *sbp)
 Allocates and initializes the _PSISequenceWeights structure.

_PSISequenceWeights * _PSISequenceWeightsFree (_PSISequenceWeights *seq_weights)
 Deallocates the _PSISequenceWeights structure.

int _PSIPurgeBiasedSegments (_PSIPackedMsa *msa)
 Main function for keeping only those selected sequences for PSSM construction (stage 2).

int _PSIValidateMSA (const _PSIMsa *msa, Boolean ignored_unaligned_positions)
 Main validation function for multiple sequence alignment structure.

int _PSIComputeAlignmentBlocks (const _PSIMsa *msa, _PSIAlignedBlock *aligned_block)
 Main function to compute aligned blocks' properties for each position within the multiple alignment (stage 3). Corresponds to posit.c:posComputeExtents.

int _PSIComputeSequenceWeights (const _PSIMsa *msa, const _PSIAlignedBlock *aligned_blocks, Boolean nsg_compatibility_mode, _PSISequenceWeights *seq_weights)
 Main function to calculate the sequence weights (stage 4).

int _PSIComputeFrequenciesFromCDs (const PSICdMsa *cd_msa, BlastScoreBlk *sbp, const PSIBlastOptions *options, _PSISequenceWeights *seq_weights)
 Main function to calculate CD weights and combine weighted residue counts from matched CDs.

int _PSIComputeFreqRatios (const _PSIMsa *msa, const _PSISequenceWeights *seq_weights, const BlastScoreBlk *sbp, const _PSIAlignedBlock *aligned_blocks, Int4 pseudo_count, Boolean nsg_compatibility_mode, _PSIInternalPssmData *internal_pssm)
 Main function to compute the PSSM's frequency ratios (stage 5).

int _PSIComputeFreqRatiosFromCDs (const PSICdMsa *cd_msa, const _PSISequenceWeights *seq_weights, const BlastScoreBlk *sbp, Int4 pseudo_count, _PSIInternalPssmData *internal_pssm)
 Main function to compute CD-based PSSM's frequency ratios.

int _PSIConvertFreqRatiosToPSSM (_PSIInternalPssmData *internal_pssm, const Uint1 *query, const BlastScoreBlk *sbp, const double *std_probs)
 Converts the PSSM's frequency ratios obtained in the previous stage to a PSSM of scores (stage 6).

int _PSIScaleMatrix (const Uint1 *query, const double *std_probs, _PSIInternalPssmData *internal_pssm, BlastScoreBlk *sbp)
 Scales the PSSM (stage 7).

void _PSIUpdateLambdaK (const int **pssm, const Uint1 *query, Uint4 query_length, const double *std_probs, BlastScoreBlk *sbp)
 Updates the Karlin-Altschul parameters based on the query sequence and PSSM's score frequencies.

int _IMPALAScaleMatrix (const Uint1 *query, const double *std_probs, _PSIInternalPssmData *internal_pssm, BlastScoreBlk *sbp, double scaling_factor)
 Provides a function similar to _PSIScaleMatrix, but performs the scaling as IMPALA did, i.e. it allows a scaling factor to be specified, and the query length used when calculating the score probabilities includes 'X' residues.

int _PSIPurgeAlignedRegion (_PSIPackedMsa *msa, unsigned int seq_index, unsigned int start, unsigned int stop)
 Marks the (start, stop] region corresponding to sequence seq_index in alignment so that it is not further considered for PSSM calculation.

void _PSIUpdatePositionCounts (_PSIMsa *msa)
 Counts the number of sequences matching the query per query position (columns of the multiple alignment) as well as the number of residues present in each position of the query.

Uint4 _PSISequenceLengthWithoutX (const Uint1 *seq, Uint4 length)
 Calculates the length of the sequence without including any 'X' residues.

Blast_ScoreFreq * _PSIComputeScoreProbabilities (const int **pssm, const Uint1 *query, Uint4 query_length, const double *std_probs, const BlastScoreBlk *sbp)
 Computes the probabilities for each score in the PSSM.

int _PSISaveDiagnostics (const _PSIMsa *msa, const _PSIAlignedBlock *aligned_block, const _PSISequenceWeights *seq_weights, const _PSIInternalPssmData *internal_pssm, PSIDiagnosticsResponse *diagnostics)
 Collects diagnostic information from the process of creating the PSSM.

int _PSISaveCDDiagnostics (const PSICdMsa *msa, const _PSISequenceWeights *seq_weights, const _PSIInternalPssmData *internal_pssm, PSIDiagnosticsResponse *diagnostics)
 Collects diagnostic information from the process of creating the CDD-based PSSM.

double * _PSICalculateInformationContentFromScoreMatrix (Int4 **score_mat, const double *std_prob, const Uint1 *query, Uint4 query_length, Uint4 alphabet_sz, double lambda)
 Calculates the information content from the scoring matrix.

double * _PSICalculateInformationContentFromFreqRatios (double **freq_ratios, const double *std_prob, Uint4 query_length, Uint4 alphabet_sz)
 Calculates the information content from the residue frequencies calculated in stage 5 of the PSSM creation algorithm. Corresponds to posit.c:posFreqsToInformation.

void _PSIStructureGroupCustomization (_PSIMsa *msa)
 Enables NCBI structure group customization, which discards the query sequence, because it is not the result of a PSI-BLAST iteration but an artificial consensus sequence of the multiple sequence alignment constructed by the structure group.

int _PSIValidateMSA_StructureGroup (const _PSIMsa *msa)
 Structure group validation function for multiple sequence alignment structure.

int _PSIValidateCdMSA (const PSICdMsa *cd_msa, Uint4 alphabet_size)
 Validation of multiple alignment of conserved domains structure.
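To make the contract of _PSISequenceLengthWithoutX concrete, here is a minimal sketch of the count it is documented to return: the residues of a sequence that are not 'X'. The name demo_length_without_x and the DemoUint1 typedef are hypothetical stand-ins, and the NCBIstdaa code 21 for 'X' is an assumption labeled in the code rather than something this header states.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char DemoUint1;   /* hypothetical stand-in for Uint1 */

/* Assumed NCBIstdaa code for the unknown residue 'X'. */
#define DEMO_NCBISTDAA_X 21

/* Hypothetical sketch: number of residues in seq[0..length) that are
 * not 'X'. */
static unsigned int demo_length_without_x(const DemoUint1 *seq,
                                          unsigned int length)
{
    unsigned int i, count = 0;
    assert(seq != NULL || length == 0);
    for (i = 0; i < length; i++) {
        if (seq[i] != DEMO_NCBISTDAA_X) {
            count++;
        }
    }
    return count;
}
```

The _IMPALAScaleMatrix entry states that IMPALA-style scaling includes 'X' residues in the query length, so a count like this one is presumably what distinguishes the non-IMPALA path.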
 

Variables

const double kPSINearIdentical
 Percent identity threshold for discarding near-identical matches.

const double kPSIIdentical
 Percent identity threshold for discarding identical matches.

const unsigned int kQueryIndex
 Index into multiple sequence alignment structure for the query sequence.

const double kEpsilon
 Small constant to test against 0.

const int kPSIScaleFactor
 Successor to POSIT_SCALE_FACTOR.

const double kPositScalingPercent
 Constant used in scaling PSSM routines: Successor to POSIT_PERCENT.

const Uint4 kPositScalingNumIterations
 Constant used in scaling PSSM routines: Successor to POSIT_NUM_ITERATIONS.
 

Detailed Description

Private interface for the Position-Specific Iterated BLAST API; contains the PSSM generation engine.

Calculating PSSMs from Seq-aligns is a multi-stage process. These stages
include:
1) Processing the Seq-align
     Examine the alignment and extract information about aligned
     characters; performed at the API level.
2) Purging biased sequences: construct the multiple sequence alignment M
     as described on page 3395 [1]; performed at the core level. Custom
     selection of sequences should be performed at the API level.
3) Computing the extents of the alignment: M_C as described on page
     3395 [1]
4) Computing sequence weights
5) Computing residue frequencies
6) Converting residue frequencies to a PSSM
7) Scaling the resulting PSSM

Definition in file blast_psi_priv.h.
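The core stages share a simple error-handling contract: most stage functions return an int status that is either PSI_SUCCESS or a negative PSIERR_* code, and a driver stops at the first failure. The sketch below shows only that control flow. DemoPssmContext, DemoStageFn, demo_stage and demo_run_pipeline are hypothetical stand-ins; a real driver would call the _PSI* stage functions declared in this header, each with its own arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Status values as documented in blast_psi_priv.h. */
#define PSI_SUCCESS            (0)
#define PSIERR_NOALIGNEDSEQS   (-6)

/* Hypothetical context passed to every stage in this sketch. */
typedef struct DemoPssmContext {
    unsigned int stages_run;   /* number of stages executed so far */
    unsigned int fail_at;      /* 1-based stage that should fail; 0 = none */
} DemoPssmContext;

typedef int (*DemoStageFn)(DemoPssmContext *ctx);

/* Stub stage: records that it ran and optionally fails the way the purge
 * stage can (no sequences left). */
static int demo_stage(DemoPssmContext *ctx)
{
    ctx->stages_run++;
    if (ctx->fail_at == ctx->stages_run) {
        return PSIERR_NOALIGNEDSEQS;
    }
    return PSI_SUCCESS;
}

/* Runs the stages in order and returns the first non-success status
 * unchanged, so the caller can report it. */
static int demo_run_pipeline(DemoStageFn const *stages, size_t n,
                             DemoPssmContext *ctx)
{
    size_t i;
    for (i = 0; i < n; i++) {
        int status = stages[i](ctx);
        if (status != PSI_SUCCESS) {
            return status;
        }
    }
    return PSI_SUCCESS;
}
```

With six stubs standing in for core stages 2 through 7, a failure in the first one (the purge) stops the pipeline immediately and its PSIERR_NOALIGNEDSEQS code reaches the caller.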

Macro Definition Documentation

◆ PSI_SUCCESS

#define PSI_SUCCESS   (0)

Successful operation.

Definition at line 366 of file blast_psi_priv.h.

◆ PSIERR_BADPARAM

#define PSIERR_BADPARAM   (-1)

Bad parameter used in function.

Definition at line 368 of file blast_psi_priv.h.

◆ PSIERR_BADPROFILE

#define PSIERR_BADPROFILE   (-12)

Errors in conserved domain profile.

Definition at line 390 of file blast_psi_priv.h.

◆ PSIERR_BADSEQWEIGHTS

#define PSIERR_BADSEQWEIGHTS   (-3)

Sequence weights do not add to 1.

Definition at line 372 of file blast_psi_priv.h.

◆ PSIERR_COLUMNOFGAPS

#define PSIERR_COLUMNOFGAPS   (-9)

Found an entire column full of GAP residues.

Definition at line 384 of file blast_psi_priv.h.

◆ PSIERR_ENDINGGAP

#define PSIERR_ENDINGGAP   (-11)

Found flanking gap at end of alignment.

Definition at line 388 of file blast_psi_priv.h.

◆ PSIERR_GAPINQUERY

#define PSIERR_GAPINQUERY   (-7)

GAP residue found in query sequence.

Definition at line 380 of file blast_psi_priv.h.

◆ PSIERR_NOALIGNEDSEQS

#define PSIERR_NOALIGNEDSEQS   (-6)

After purge stage of PSSM creation, no sequences are left.

Definition at line 378 of file blast_psi_priv.h.

◆ PSIERR_NOFREQRATIOS

#define PSIERR_NOFREQRATIOS   (-4)

No frequency ratios were found for the given scoring matrix.

Definition at line 374 of file blast_psi_priv.h.

◆ PSIERR_OUTOFMEM

#define PSIERR_OUTOFMEM   (-2)

Out of memory.

Definition at line 370 of file blast_psi_priv.h.

◆ PSIERR_POSITIVEAVGSCORE

#define PSIERR_POSITIVEAVGSCORE   (-5)

Positive average score found when scaling matrix.

Definition at line 376 of file blast_psi_priv.h.

◆ PSIERR_STARTINGGAP

#define PSIERR_STARTINGGAP   (-10)

Found flanking gap at start of alignment.

Definition at line 386 of file blast_psi_priv.h.

◆ PSIERR_UNALIGNEDCOLUMN

#define PSIERR_UNALIGNEDCOLUMN   (-8)

Found an entire column with no participating sequences.

Definition at line 382 of file blast_psi_priv.h.

◆ PSIERR_UNKNOWN

#define PSIERR_UNKNOWN   (-255)

Unknown error.

Definition at line 392 of file blast_psi_priv.h.
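Because every stage reports failure through these codes, callers often want a readable message. The header does not declare such a helper, so demo_psi_strerror below is hypothetical. It maps each documented value to the one-line description given above, and anything unrecognized (including PSIERR_UNKNOWN) falls through to "Unknown error."

```c
#include <assert.h>
#include <string.h>

/* Status values as documented in blast_psi_priv.h. */
#define PSI_SUCCESS              (0)
#define PSIERR_BADPARAM          (-1)
#define PSIERR_OUTOFMEM          (-2)
#define PSIERR_BADSEQWEIGHTS     (-3)
#define PSIERR_NOFREQRATIOS      (-4)
#define PSIERR_POSITIVEAVGSCORE  (-5)
#define PSIERR_NOALIGNEDSEQS     (-6)
#define PSIERR_GAPINQUERY        (-7)
#define PSIERR_UNALIGNEDCOLUMN   (-8)
#define PSIERR_COLUMNOFGAPS      (-9)
#define PSIERR_STARTINGGAP       (-10)
#define PSIERR_ENDINGGAP         (-11)
#define PSIERR_BADPROFILE        (-12)
#define PSIERR_UNKNOWN           (-255)

/* Hypothetical helper (not part of the toolkit). */
static const char *demo_psi_strerror(int status)
{
    switch (status) {
    case PSI_SUCCESS:             return "Successful operation.";
    case PSIERR_BADPARAM:         return "Bad parameter used in function.";
    case PSIERR_OUTOFMEM:         return "Out of memory.";
    case PSIERR_BADSEQWEIGHTS:    return "Sequence weights do not add to 1.";
    case PSIERR_NOFREQRATIOS:
        return "No frequency ratios were found for the given scoring matrix.";
    case PSIERR_POSITIVEAVGSCORE:
        return "Positive average score found when scaling matrix.";
    case PSIERR_NOALIGNEDSEQS:
        return "After purge stage of PSSM creation, no sequences are left.";
    case PSIERR_GAPINQUERY:       return "GAP residue found in query sequence.";
    case PSIERR_UNALIGNEDCOLUMN:
        return "Found an entire column with no participating sequences.";
    case PSIERR_COLUMNOFGAPS:
        return "Found an entire column full of GAP residues.";
    case PSIERR_STARTINGGAP:  return "Found flanking gap at start of alignment.";
    case PSIERR_ENDINGGAP:    return "Found flanking gap at end of alignment.";
    case PSIERR_BADPROFILE:   return "Errors in conserved domain profile.";
    case PSIERR_UNKNOWN:      /* falls through */
    default:                  return "Unknown error.";
    }
}
```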

Typedef Documentation

◆ _PSIAlignedBlock

This structure keeps track of the regions aligned between the query sequence and those that were not purged.

It is used when calculating the sequence weights (replaces posExtents in the old code).

◆ _PSIInternalPssmData

Internal representation of a PSSM in various stages of its creation and its dimensions.

◆ _PSIMsa

typedef struct _PSIMsa _PSIMsa

Internal multiple alignment data structure used by the PSSM engine.

◆ _PSIMsaCell

typedef struct _PSIMsaCell _PSIMsaCell

Internal data structure to represent a position in the multiple sequence alignment data structure.

See also
_PSIMsa

◆ _PSIPackedMsa

typedef struct _PSIPackedMsa _PSIPackedMsa

Compact version of PSIMsa structure.

◆ _PSIPackedMsaCell

Compact version of the PSIMsaCell structure.

◆ _PSISequenceWeights

Internal data structure to keep computed sequence weights.

Function Documentation

◆ _IMPALAScaleMatrix()

int _IMPALAScaleMatrix ( const Uint1 *  query,
const double *  std_probs,
_PSIInternalPssmData *  internal_pssm,
BlastScoreBlk *  sbp,
double  scaling_factor 
)

Provides a function similar to _PSIScaleMatrix, but performs the scaling as IMPALA did, i.e. it allows a scaling factor to be specified, and the query length used when calculating the score probabilities includes 'X' residues.

Definition at line 2599 of file blast_psi_priv.c.

References _PSICopyMatrix_int(), _PSIInternalPssmData::freq_ratios, Kappa_compactSearchItemsFree(), Kappa_compactSearchItemsNew(), Kappa_impalaScaling(), Kappa_posSearchItemsFree(), Kappa_posSearchItemsNew(), BlastScoreBlk::name, _PSIInternalPssmData::ncols, _PSIInternalPssmData::nrows, NULL, PSI_SUCCESS, _PSIInternalPssmData::pssm, query, _PSIInternalPssmData::scaled_pssm, and TRUE.

Referenced by _PSICreateAndScalePssmFromFrequencyRatios().

◆ _PSIAlignedBlockFree()

_PSIAlignedBlock* _PSIAlignedBlockFree ( _PSIAlignedBlock *  aligned_blocks)

Deallocates the _PSIAlignedBlock structure.

Parameters
aligned_blocks: data structure to deallocate [in]
Returns
NULL

Definition at line 532 of file blast_psi_priv.c.

References NULL, _PSIAlignedBlock::pos_extnt, sfree, and _PSIAlignedBlock::size.

Referenced by _PSIAlignedBlockNew(), Deleter< _PSIAlignedBlock >::Delete(), and s_PSICreatePssmCleanUp().

◆ _PSIAlignedBlockNew()

_PSIAlignedBlock* _PSIAlignedBlockNew ( Uint4  query_length)

Allocates and initializes the _PSIAlignedBlock structure.

Parameters
query_length: length of the query sequence of the multiple sequence alignment [in]
Returns
newly allocated structure or NULL in case of memory allocation failure

Definition at line 501 of file blast_psi_priv.c.

References _PSIAlignedBlockFree(), calloc(), i, SSeqRange::left, malloc(), NULL, _PSIAlignedBlock::pos_extnt, SSeqRange::right, and _PSIAlignedBlock::size.

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmWithDiagnostics().

◆ _PSIAllocateMatrix()

void** _PSIAllocateMatrix ( unsigned int  ncols,
unsigned int  nrows,
unsigned int  data_type_sz 
)

Generic 2 dimensional matrix allocator.

Allocates an ncols by nrows matrix with cells of size data_type_sz. Must be freed using _PSIDeallocateMatrix.

Parameters
ncols: number of columns in matrix [in]
nrows: number of rows in matrix [in]
data_type_sz: size of the data type (in bytes) to allocate for each element in the matrix [in]
Returns
pointer to allocated memory or NULL in case of failure

Definition at line 66 of file blast_psi_priv.c.

References _PSIDeallocateMatrix(), calloc(), i, malloc(), and NULL.

Referenced by _PSIInternalPssmDataNew(), _PSIMatrixFrequencyRatiosNew(), _PSIMsaNew(), _PSIPackedMsaNew(), _PSISequenceWeightsNew(), Kappa_posSearchItemsNew(), CRedoAlignmentTestFixture::loadPssmFromFile(), PSIDiagnosticsResponseNew(), PSIMatrixNew(), PSIMsaNew(), RPSRescalePssm(), s_RPSComputeTraceback(), s_RPSFillFreqRatiosInPsiMatrix(), SBlastScoreMatrixNew(), and SPsiBlastScoreMatrixNew().
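As described above, the allocator returns ncols pointers, each to a zero-initialized block of nrows cells, and the matching deallocator frees the columns and then the pointer array, returning NULL. The sketch below mirrors that documented contract. demo_allocate_matrix and demo_deallocate_matrix are hypothetical names, and the sketch is not the toolkit's implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Frees the first ncols columns and the column array; always returns
 * NULL so callers can write m = demo_deallocate_matrix(m, n). */
static void **demo_deallocate_matrix(void **matrix, unsigned int ncols)
{
    unsigned int i;
    if (matrix == NULL) {
        return NULL;
    }
    for (i = 0; i < ncols; i++) {
        free(matrix[i]);
    }
    free(matrix);
    return NULL;
}

/* Allocates ncols columns of nrows zeroed cells of data_type_sz bytes. */
static void **demo_allocate_matrix(unsigned int ncols, unsigned int nrows,
                                   unsigned int data_type_sz)
{
    unsigned int i;
    void **matrix = malloc(sizeof(void *) * ncols);
    if (matrix == NULL) {
        return NULL;
    }
    for (i = 0; i < ncols; i++) {
        matrix[i] = calloc(nrows, data_type_sz);
        if (matrix[i] == NULL) {
            /* Release only the columns allocated so far. */
            return demo_deallocate_matrix(matrix, i);
        }
    }
    return matrix;
}
```

Each column is a separate allocation, so a caller reads cell (col, row) by casting the column to the element type, for example ((int *) matrix[col])[row].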

◆ _PSICalculateInformationContentFromFreqRatios()

double* _PSICalculateInformationContentFromFreqRatios ( double **  freq_ratios,
const double *  std_prob,
Uint4  query_length,
Uint4  alphabet_sz 
)

Calculates the information content from the residue frequencies calculated in stage 5 of the PSSM creation algorithm. Corresponds to posit.c:posFreqsToInformation.

See also
_PSIComputeFreqRatios: stage 5
Parameters
freq_ratios: matrix of frequency ratios (dimensions: query_length x alphabet_sz) (const) [in]
std_prob: standard residue probabilities [in]
query_length: length of the query [in]
alphabet_sz: length of the alphabet used by the query [in]
Returns
array of length query_length containing the information content per query position or NULL on error (e.g.: out-of-memory or NULL parameters)

Definition at line 2341 of file blast_psi_priv.c.

References calloc(), kEpsilon, log, NCBIMATH_LN2, NULL, and r().

Referenced by _PSISaveCDDiagnostics(), and _PSISaveDiagnostics().

◆ _PSICalculateInformationContentFromScoreMatrix()

double* _PSICalculateInformationContentFromScoreMatrix ( Int4 **  score_mat,
const double *  std_prob,
const Uint1 *  query,
Uint4  query_length,
Uint4  alphabet_sz,
double  lambda 
)

Calculates the information content from the scoring matrix.

Parameters
score_mat: alphabet by alphabet_sz matrix of scores (const) [in]
std_prob: standard residue probabilities [in]
query: query sequence [in]
query_length: length of the query [in]
alphabet_sz: length of the alphabet used by the query [in]
lambda: Karlin-Altschul lambda parameter, which converts scores to natural-log units [in]
Returns
array of length query_length containing the information content per query position or NULL on error (e.g.: out-of-memory or NULL parameters)

Definition at line 2299 of file blast_psi_priv.c.

References calloc(), kEpsilon, lambda(), log, NCBIMATH_LN2, NULL, query, r(), and tmp.

◆ _PSIComputeAlignmentBlocks()

int _PSIComputeAlignmentBlocks ( const _PSIMsa *  msa,
_PSIAlignedBlock *  aligned_block 
)

Main function to compute aligned blocks' properties for each position within the multiple alignment (stage 3). Corresponds to posit.c:posComputeExtents.

Parameters
msa: multiple sequence alignment data structure [in]
aligned_block: data structure describing the aligned blocks' properties for each position of the multiple sequence alignment [out]
Returns
PSIERR_BADPARAM if arguments are NULL, PSI_SUCCESS otherwise

Definition at line 1297 of file blast_psi_priv.c.

References _PSIComputeAlignedRegionLengths(), _PSIComputePositionExtents(), _PSIGetLeftExtents(), _PSIGetRightExtents(), _PSIMsa::dimensions, kQueryIndex, PSIMsaDimensions::num_seqs, PSI_SUCCESS, and PSIERR_BADPARAM.

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmWithDiagnostics().

◆ _PSIComputeFreqRatios()

int _PSIComputeFreqRatios ( const _PSIMsa *  msa,
const _PSISequenceWeights *  seq_weights,
const BlastScoreBlk *  sbp,
const _PSIAlignedBlock *  aligned_blocks,
Int4  pseudo_count,
Boolean  nsg_compatibility_mode,
_PSIInternalPssmData *  internal_pssm 
)

Main function to compute the PSSM's frequency ratios (stage 5).

Implements formula 2 in Nucleic Acids Research, 2001, Vol 29, No 14. Corresponds to posit.c:posComputePseudoFreqs.

Parameters
msa: multiple sequence alignment data structure [in]
seq_weights: data structure containing the data needed to compute the sequence weights [in]
sbp: score block structure initialized for the scoring system used with the query sequence [in]
aligned_blocks: data structure describing the aligned blocks' properties for each position of the multiple sequence alignment [in]
pseudo_count: pseudo count constant [in]
nsg_compatibility_mode: set to true to emulate the structure group's use of the PSSM engine in the cddumper application. By default should be FALSE [in]
internal_pssm: PSSM being computed [out]
Returns
PSIERR_BADPARAM if arguments are NULL, PSI_SUCCESS otherwise

Definition at line 2071 of file blast_psi_priv.c.

References _PSIMatrixFrequencyRatiosFree(), _PSIMatrixFrequencyRatiosNew(), BlastScoreBlk::alphabet_size, _PSIMsa::alphabet_size, AMINOACID_TO_NCBISTDAA, ASSERT, Blast_GetMatrixBackgroundFreq(), BLAST_SCORE_MIN, _PSIMsa::cell, SBlastScoreMatrix::data, SFreqRatios::data, _PSIMsa::dimensions, _PSIInternalPssmData::freq_ratios, i, _PSISequenceWeights::independent_observations, kEpsilon, kQueryIndex, _PSIMsaCell::letter, _PSISequenceWeights::match_weights, BlastScoreBlk::matrix, MAX_IND_OBSERVATIONS, BlastScoreBlk::name, NULL, PSEUDO_MAX, _PSIInternalPssmData::pseudocounts, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_UNKNOWN, PSIMsaDimensions::query_length, r(), s_columnSpecificPseudocounts(), s_effectiveObservations(), s_initializeExpNumObservations(), and _PSISequenceWeights::std_prob.

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmWithDiagnostics().
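The core idea of stage 5 is pseudocount mixing: the observed weighted residue frequencies f at a column are blended with pseudocount frequencies g derived from the substitution matrix, giving Q_i = (alpha * f_i + beta * g_i) / (alpha + beta). The sketch below shows that general PSI-BLAST-style form for one column. It expresses g through frequency ratios R_ij = q_ij / (P_i * P_j), so g_i = P_i * sum_j f_j * R_ij. Whether formula 2 of the cited paper adds refinements (the References list mentions s_columnSpecificPseudocounts, for example) is not shown here, and the function name is hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical sketch of pseudocount mixing for one query position.
 *   f[i]     observed (weighted) residue frequencies
 *   P[i]     background residue probabilities
 *   R[i][j]  substitution-matrix frequency ratios q_ij / (P_i * P_j)
 *   alpha    weight of the observed data; beta weight of pseudocounts */
static void demo_mix_pseudocounts(const double *f, const double *P,
                                  const double *const *R, unsigned int n,
                                  double alpha, double beta, double *Q)
{
    unsigned int i, j;
    assert(alpha + beta > 0.0);
    for (i = 0; i < n; i++) {
        /* g_i = P_i * sum_j f_j * R_ij */
        double g = 0.0;
        for (j = 0; j < n; j++) {
            g += f[j] * R[i][j];
        }
        g *= P[i];
        Q[i] = (alpha * f[i] + beta * g) / (alpha + beta);
    }
}
```

In a 2-letter example with P = {0.5, 0.5} and rows R = {1.5, 0.5} and {0.5, 1.5}, each row satisfies sum_j P_j * R_ij = 1, so the mixed Q still sums to 1. A fully conserved column f = {1, 0} with alpha = beta = 1 becomes Q = {0.875, 0.125}.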

◆ _PSIComputeFreqRatiosFromCDs()

int _PSIComputeFreqRatiosFromCDs ( const PSICdMsa *  cd_msa,
const _PSISequenceWeights *  seq_weights,
const BlastScoreBlk *  sbp,
Int4  pseudo_count,
_PSIInternalPssmData *  internal_pssm 
)

Main function to compute CD-based PSSM's frequency ratios.

Parameters
cd_msa: multiple alignment of CDs [in]
seq_weights: contains weighted residue frequencies and effective number of observations [in]
sbp: initialized score block data structure [in]
pseudo_count: pseudo count constant [in]
internal_pssm: PSSM [out]
Returns
PSI_SUCCESS on success, otherwise a PSIERR_* error code (e.g. PSIERR_BADPARAM, PSIERR_OUTOFMEM)

Definition at line 2185 of file blast_psi_priv.c.

References _PSIMatrixFrequencyRatiosFree(), _PSIMatrixFrequencyRatiosNew(), BlastScoreBlk::alphabet_size, AMINOACID_TO_NCBISTDAA, ASSERT, Blast_GetMatrixBackgroundFreq(), BLAST_SCORE_MIN, SBlastScoreMatrix::data, SFreqRatios::data, PSICdMsa::dimensions, _PSIInternalPssmData::freq_ratios, i, _PSISequenceWeights::independent_observations, kEpsilon, _PSISequenceWeights::match_weights, BlastScoreBlk::matrix, MAX, BlastScoreBlk::name, NULL, PSEUDO_MAX, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_OUTOFMEM, PSICdMsa::query, PSIMsaDimensions::query_length, r(), s_columnSpecificPseudocounts(), and _PSISequenceWeights::std_prob.

Referenced by BOOST_AUTO_TEST_CASE(), PSICreatePssmFromCDD(), and s_TestCreatePssmFromFreqs().

◆ _PSIComputeFrequenciesFromCDs()

int _PSIComputeFrequenciesFromCDs ( const PSICdMsa *  cd_msa,
BlastScoreBlk *  sbp,
const PSIBlastOptions *  options,
_PSISequenceWeights *  seq_weights 
)

Main function to calculate CD weights and combine weighted residue counts from matched CDs.

Parameters
cd_msa: multiple alignment of conserved domains data structure [in]
sbp: BLAST score block [in]
options: CDD-related options [in]
seq_weights: data structure with CD frequencies [out]

Definition at line 1651 of file blast_psi_priv.c.

References BlastScoreBlk::alphabet_size, AMINOACID_TO_NCBISTDAA, ASSERT, PSICdMsaCell::data, PSICdMsa::dimensions, fabs, _PSISequenceWeights::independent_observations, PSICdMsaCellData::iobsr, PSICdMsaCell::is_aligned, malloc(), _PSISequenceWeights::match_weights, MIN, PSICdMsa::msa, NULL, PSIMsaDimensions::num_seqs, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_OUTOFMEM, PSICdMsa::query, PSIMsaDimensions::query_length, s_PSIComputeFrequenciesFromCDsCleanup(), and PSICdMsaCellData::wfreqs.

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmFromCDD().

◆ _PSIComputeScoreProbabilities()

Blast_ScoreFreq* _PSIComputeScoreProbabilities ( const int **  pssm,
const Uint1 *  query,
Uint4  query_length,
const double *  std_probs,
const BlastScoreBlk *  sbp 
)

Compute the probabilities for each score in the PSSM.

This is only valid for protein sequences. FIXME: Should this be moved to blast_stat.[hc]? Used in kappa.c in notposfillSfp().

Parameters
pssm: PSSM for which to compute the score probabilities [in]
query: query sequence for the PSSM above in ncbistdaa encoding [in]
query_length: length of the query sequence above [in]
std_probs: array containing the standard background residue probabilities [in]
sbp: score block structure initialized for the scoring system used with the query sequence [in]
Returns
structure containing the score frequencies, or NULL in case of error

Definition at line 2647 of file blast_psi_priv.c.

References _PSISequenceLengthWithoutX(), BlastScoreBlk::alphabet_code, AMINOACID_TO_NCBISTDAA, ASSERT, Blast_GetStdAlphabet(), BLAST_SCORE_MAX, BLAST_SCORE_MIN, Blast_ScoreFreqNew(), BLASTAA_SEQ_CODE, BLASTAA_SIZE, kScore, MAX, MIN, NULL, Blast_ScoreFreq::obs_max, Blast_ScoreFreq::obs_min, query, r(), Blast_ScoreFreq::score_avg, and Blast_ScoreFreq::sprob.

Referenced by _PSIUpdateLambdaK().
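The score probabilities used for Karlin-Altschul statistics can be understood as follows: pick a query position uniformly and a residue r with background probability p_r, and record the PSSM score. The sketch below builds that distribution and its average. Judging from its References list, the real function also excludes 'X' positions (via _PSISequenceLengthWithoutX) and fills a Blast_ScoreFreq structure. Those details are omitted here, and the [position][residue] layout and the function name are assumptions.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical sketch. pssm[p][r] is the score of residue r at query
 * position p (layout assumed). sprob must hold max_score - min_score + 1
 * entries. Returns 0 on success, -1 on bad input. */
static int demo_score_probabilities(const int *const *pssm,
                                    unsigned int query_length,
                                    const double *std_probs,
                                    unsigned int alphabet_sz,
                                    int min_score, int max_score,
                                    double *sprob, double *score_avg)
{
    unsigned int p, r;
    int s;
    if (query_length == 0 || max_score < min_score) {
        return -1;
    }
    for (s = min_score; s <= max_score; s++) {
        sprob[s - min_score] = 0.0;
    }
    for (p = 0; p < query_length; p++) {
        for (r = 0; r < alphabet_sz; r++) {
            int score = pssm[p][r];
            if (score < min_score || score > max_score) {
                return -1;
            }
            /* Each position is equally likely; residues follow std_probs. */
            sprob[score - min_score] += std_probs[r] / query_length;
        }
    }
    *score_avg = 0.0;
    for (s = min_score; s <= max_score; s++) {
        *score_avg += s * sprob[s - min_score];
    }
    return 0;
}
```

PSIERR_POSITIVEAVGSCORE exists because the statistics require a negative expected score. In this toy example the average is exactly 0.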

◆ _PSIComputeSequenceWeights()

int _PSIComputeSequenceWeights ( const _PSIMsa *  msa,
const _PSIAlignedBlock *  aligned_blocks,
Boolean  nsg_compatibility_mode,
_PSISequenceWeights *  seq_weights 
)

Main function to calculate the sequence weights.

This is stage 4. It should be called with the aligned blocks computed by _PSIComputeAlignmentBlocks (stage 3). Corresponds to posit.c:posComputeSequenceWeights.

Parameters
msa: multiple sequence alignment data structure [in]
aligned_blocks: data structure describing the aligned blocks' properties for each position of the multiple sequence alignment [in]
nsg_compatibility_mode: set to true to emulate the structure group's use of PSSM engine in the cddumper application. By default should be FALSE [in]
seq_weights: data structure containing the data needed to compute the sequence weights [out]
Returns
PSIERR_BADPARAM if arguments are NULL, PSIERR_OUTOFMEM in case of memory allocation failure, PSIERR_BADSEQWEIGHTS if the sequence weights fail to add up to 1.0, PSI_SUCCESS otherwise

Definition at line 1552 of file blast_psi_priv.c.

References _PSICalculateMatchWeights(), _PSICalculateNormalizedSequenceWeights(), _PSICheckSequenceWeights(), _PSIGetAlignedSequencesForPosition(), _PSISpreadGapWeights(), ASSERT, _PSIMsa::dimensions, DynamicUint4Array_AreEqual(), DynamicUint4Array_Copy(), DynamicUint4Array_Dup(), DynamicUint4ArrayFree(), DynamicUint4ArrayNewEx(), EFFECTIVE_ALPHABET, _PSISequenceWeights::norm_seq_weights, _PSIMsa::num_matching_seqs, PSIMsaDimensions::num_seqs, SDynamicUint4Array::num_used, _PSISequenceWeights::posDistinctDistrib, _PSISequenceWeights::posNumParticipating, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_OUTOFMEM, PSIMsaDimensions::query_length, _PSISequenceWeights::row_sigma, _PSISequenceWeights::sigma, and _PSIAlignedBlock::size.

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmWithDiagnostics().
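PSI-BLAST's sequence weights build on position-based weighting. In each column, a sequence whose residue is one of d distinct residues, shared by k sequences, receives 1 / (d * k), and the totals are normalized to sum to 1 (compare PSIERR_BADSEQWEIGHTS). The sketch below applies this textbook scheme to a gapless, equal-length alignment. Judging from its References list, the real routine also restricts the computation to aligned blocks and redistributes gap weights (_PSISpreadGapWeights). Those steps are omitted here, and the function name is hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical sketch: position-based weights for nseqs sequences of
 * length len. Returns 0 on success, -1 if no weight accumulated. */
static int demo_position_based_weights(const char *const *seqs, size_t nseqs,
                                       size_t len, double *weights)
{
    size_t s, p;
    double total = 0.0;

    for (s = 0; s < nseqs; s++) {
        weights[s] = 0.0;
    }
    for (p = 0; p < len; p++) {
        size_t counts[256] = {0};
        size_t distinct = 0;
        /* Count how many sequences carry each residue in this column. */
        for (s = 0; s < nseqs; s++) {
            unsigned char c = (unsigned char) seqs[s][p];
            if (counts[c]++ == 0) {
                distinct++;
            }
        }
        /* Rare residues in diverse columns earn more weight. */
        for (s = 0; s < nseqs; s++) {
            unsigned char c = (unsigned char) seqs[s][p];
            weights[s] += 1.0 / ((double) distinct * (double) counts[c]);
        }
    }
    for (s = 0; s < nseqs; s++) {
        total += weights[s];
    }
    if (total <= 0.0) {
        return -1;
    }
    for (s = 0; s < nseqs; s++) {
        weights[s] /= total;   /* normalize so the weights sum to 1 */
    }
    return 0;
}
```

For the alignment {"AA", "AA", "CC"}, the two identical sequences share credit and receive 0.25 each, while the distinct one receives 0.5.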

◆ _PSIConvertFreqRatiosToPSSM()

int _PSIConvertFreqRatiosToPSSM ( _PSIInternalPssmData *  internal_pssm,
const Uint1 *  query,
const BlastScoreBlk *  sbp,
const double *  std_probs 
)

Converts the PSSM's frequency ratios obtained in the previous stage to a PSSM of scores (stage 6).

Parameters
internal_pssm: PSSM being computed [in|out]
query: query sequence in ncbistdaa encoding. The length of this sequence is read from internal_pssm->ncols [in]
sbp: score block structure initialized for the scoring system used with the query sequence [in]
std_probs: array containing the standard residue probabilities [in]
Returns
PSIERR_BADPARAM if arguments are NULL, PSI_SUCCESS otherwise

Definition at line 2392 of file blast_psi_priv.c.

References _PSIMatrixFrequencyRatiosFree(), _PSIMatrixFrequencyRatiosNew(), BlastScoreBlk::alphabet_size, AMINOACID_TO_NCBISTDAA, SFreqRatios::bit_scale_factor, BLAST_Nint(), BLAST_SCORE_MIN, SBlastScoreMatrix::data, SFreqRatios::data, FALSE, _PSIInternalPssmData::freq_ratios, i, int, BlastScoreBlk::kbp_ideal, kEpsilon, kPSIScaleFactor, Blast_KarlinBlk::Lambda, log, BlastScoreBlk::matrix, BlastScoreBlk::name, NCBIMATH_LN2, _PSIInternalPssmData::ncols, NULL, PSI_SUCCESS, PSIERR_BADPARAM, _PSIInternalPssmData::pssm, query, _PSIInternalPssmData::scaled_pssm, tmp, and TRUE.

Referenced by _PSICreateAndScalePssmFromFrequencyRatios(), BOOST_AUTO_TEST_CASE(), s_ScalePosMatrix(), and s_TestCreatePssmFromFreqs().

◆ _PSICopyMatrix_double()

void _PSICopyMatrix_double ( double **  dest,
double **  src,
unsigned int  ncols,
unsigned int  nrows 
)

Copies src matrix into dest matrix, both of which must be double matrices with dimensions ncols by nrows.

Parameters
dest: Destination matrix [out]
src: Source matrix [in]
ncols: Number of columns to copy [in]
nrows: Number of rows to copy [in]

Definition at line 124 of file blast_psi_priv.c.

Referenced by PSICreatePssmFromFrequencyRatios(), and s_ScalePosMatrix().

◆ _PSICopyMatrix_int()

void _PSICopyMatrix_int ( int **  dest,
int **  src,
unsigned int  ncols,
unsigned int  nrows 
)

Copies src matrix into dest matrix, both of which must be int matrices with dimensions ncols by nrows.

Parameters
dest: Destination matrix [out]
src: Source matrix [in]
ncols: Number of columns to copy [in]
nrows: Number of rows to copy [in]

Definition at line 123 of file blast_psi_priv.c.

Referenced by _IMPALAScaleMatrix(), s_PSISavePssm(), s_ScalePosMatrix(), and CRedoAlignmentTestFixture::setupPositionBasedBlastScoreBlk().

◆ _PSIDeallocateMatrix()

void** _PSIDeallocateMatrix ( void **  matrix,
unsigned int  ncols 
)

Generic 2 dimensional matrix deallocator.

Deallocates the memory allocated by _PSIAllocateMatrix.

Parameters
matrix: matrix to deallocate [in]
ncols: number of columns in the matrix [in]
Returns
NULL

Definition at line 88 of file blast_psi_priv.c.

References i, NULL, and sfree.

Referenced by _PSIAllocateMatrix(), _PSIInternalPssmDataFree(), _PSIMatrixFrequencyRatiosFree(), _PSIMsaFree(), _PSIPackedMsaFree(), _PSISequenceWeightsFree(), Kappa_posSearchItemsFree(), CRedoAlignmentTestFixture::loadPssmFromFile(), PSIDiagnosticsResponseFree(), PSIMatrixFree(), PSIMsaFree(), s_RPSComputeTraceback(), SBlastScoreMatrixFree(), CRedoAlignmentTestFixture::setupPositionBasedBlastScoreBlk(), and SPsiBlastScoreMatrixFree().

◆ _PSIInternalPssmDataFree()

_PSIInternalPssmData* _PSIInternalPssmDataFree ( _PSIInternalPssmData *  pssm)

Deallocates the _PSIInternalPssmData structure.

◆ _PSIInternalPssmDataNew()

_PSIInternalPssmData* _PSIInternalPssmDataNew ( Uint4  query_length,
Uint4  alphabet_size 
)

Allocates a new _PSIInternalPssmData structure.

Parameters
query_length number of columns for the PSSM [in]
alphabet_size number of rows for the PSSM [in]
Returns
newly allocated structure or NULL in case of memory allocation failure

Definition at line 425 of file blast_psi_priv.c.

References _PSIAllocateMatrix(), _PSIInternalPssmDataFree(), calloc(), _PSIInternalPssmData::freq_ratios, _PSIInternalPssmData::ncols, _PSIInternalPssmData::nrows, NULL, _PSIInternalPssmData::pseudocounts, _PSIInternalPssmData::pssm, and _PSIInternalPssmData::scaled_pssm.

Referenced by BOOST_AUTO_TEST_CASE(), PSICreatePssmFromCDD(), PSICreatePssmFromFrequencyRatios(), PSICreatePssmWithDiagnostics(), s_ScalePosMatrix(), and s_TestCreatePssmFromFreqs().

◆ _PSIMsaFree()

_PSIMsa* _PSIMsaFree(_PSIMsa* msa)

Deallocates the _PSIMsa data structure.

Parameters
msa multiple sequence alignment data structure to deallocate [in]
Returns
NULL

Definition at line 389 of file blast_psi_priv.c.

References _PSIDeallocateMatrix(), _PSIMsa::cell, _PSIMsa::dimensions, NULL, _PSIMsa::num_matching_seqs, PSIMsaDimensions::num_seqs, _PSIMsa::query, PSIMsaDimensions::query_length, _PSIMsa::residue_counts, and sfree.

Referenced by _PSIMsaNew(), Deleter< _PSIMsa >::Delete(), and s_PSICreatePssmCleanUp().

◆ _PSIMsaNew()

_PSIMsa* _PSIMsaNew(const _PSIPackedMsa* packed_msa, Uint4 alphabet_size)

Allocates and initializes the internal version of the PSIMsa structure (makes a deep copy) for internal use by the PSSM engine.

Parameters
packed_msa compact multiple sequence alignment data structure [in]
alphabet_size number of elements in the alphabet that makes up the aligned characters in the multiple sequence alignment [in]
Returns
newly allocated structure or NULL in case of memory allocation failure

Definition at line 308 of file blast_psi_priv.c.

References _PSIAllocateMatrix(), _PSIMsaFree(), _PSIPackedMsaGetNumberOfAlignedSeqs(), _PSIUpdatePositionCounts(), _PSIMsa::alphabet_size, ASSERT, calloc(), _PSIMsa::cell, _PSIPackedMsa::data, _PSIPackedMsa::dimensions, _PSIMsa::dimensions, _PSIMsaCell::extents, _PSIPackedMsaCell::is_aligned, _PSIMsaCell::is_aligned, IS_residue, kQueryIndex, SSeqRange::left, _PSIPackedMsaCell::letter, _PSIMsaCell::letter, malloc(), NULL, _PSIMsa::num_matching_seqs, PSIMsaDimensions::num_seqs, _PSIMsa::query, PSIMsaDimensions::query_length, _PSIMsa::residue_counts, SSeqRange::right, and _PSIPackedMsa::use_sequence.

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmWithDiagnostics().

◆ _PSIPackedMsaFree()

_PSIPackedMsa* _PSIPackedMsaFree(_PSIPackedMsa* msa)

Deallocates the _PSIPackedMsa data structure.

Parameters
msa compact multiple sequence alignment data structure to deallocate [in]
Returns
NULL

Definition at line 183 of file blast_psi_priv.c.

References _PSIDeallocateMatrix(), _PSIPackedMsa::data, _PSIPackedMsa::dimensions, NULL, PSIMsaDimensions::num_seqs, sfree, and _PSIPackedMsa::use_sequence.

Referenced by _PSIPackedMsaNew(), Deleter< _PSIPackedMsa >::Delete(), PSICreatePssmWithDiagnostics(), and s_PSICreatePssmCleanUp().

◆ _PSIPackedMsaGetNumberOfAlignedSeqs()

unsigned int _PSIPackedMsaGetNumberOfAlignedSeqs(const _PSIPackedMsa* msa)

Retrieve the number of aligned sequences in the compact multiple sequence alignment.

Parameters
msa compact multiple sequence alignment data structure to examine [in]

Definition at line 209 of file blast_psi_priv.c.

References _PSIPackedMsa::dimensions, i, PSIMsaDimensions::num_seqs, and _PSIPackedMsa::use_sequence.

Referenced by _PSIMsaNew().

◆ _PSIPackedMsaNew()

_PSIPackedMsa* _PSIPackedMsaNew(const PSIMsa* msa)

Allocates and initializes the compact version of the PSIMsa structure (makes a deep copy) for internal use by the PSSM engine.

Parameters
msa multiple sequence alignment data structure provided by the user [in]
Returns
newly allocated structure or NULL in case of memory allocation failure

Definition at line 129 of file blast_psi_priv.c.

References _PSIAllocateMatrix(), _PSIPackedMsaFree(), ASSERT, BLASTAA_SIZE, Boolean, calloc(), _PSIPackedMsa::data, _PSIPackedMsa::dimensions, _PSIPackedMsaCell::is_aligned, _PSIPackedMsaCell::letter, malloc(), NULL, TRUE, and _PSIPackedMsa::use_sequence.

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmWithDiagnostics().

◆ _PSIPurgeAlignedRegion()

int _PSIPurgeAlignedRegion(_PSIPackedMsa* msa,
                           unsigned int seq_index,
                           unsigned int start,
                           unsigned int stop)

Marks the (start, stop] region corresponding to sequence seq_index in alignment so that it is not further considered for PSSM calculation.

Note that the query sequence cannot be purged.

Parameters
msa multiple sequence alignment data [in|out]
seq_index index of the sequence of interest in the alignment [in]
start start of the region to remove [in]
stop stop of the region to remove [in]
Returns
PSIERR_BADPARAM if no alignment is given, or if seq_index or stop are invalid, PSI_SUCCESS otherwise

Definition at line 2781 of file blast_psi_priv.c.

References _PSIPackedMsa::data, _PSIPackedMsa::dimensions, FALSE, i, _PSIPackedMsaCell::is_aligned, _PSIPackedMsaCell::letter, NULL, PSIMsaDimensions::num_seqs, PSI_SUCCESS, PSIERR_BADPARAM, PSIMsaDimensions::query_length, and s_PSIDiscardIfUnused().

Referenced by _handleNeitherAligned().

◆ _PSIPurgeBiasedSegments()

int _PSIPurgeBiasedSegments(_PSIPackedMsa* msa)

Main function for keeping only the sequences selected for PSSM construction (stage 2).

After this function returns, the multiple sequence alignment data is not modified further.

See also
implementation of PSICreatePssmWithDiagnostics
Parameters
msa multiple sequence alignment data structure [in|out]
Returns
PSIERR_BADPARAM if alignment is NULL; PSI_SUCCESS otherwise

Definition at line 948 of file blast_psi_priv.c.

References PSI_SUCCESS, PSIERR_BADPARAM, s_PSIPurgeNearIdenticalAlignments(), and s_PSIPurgeSelfHits().

Referenced by BOOST_AUTO_TEST_CASE(), and PSICreatePssmWithDiagnostics().

◆ _PSISaveCDDiagnostics()

int _PSISaveCDDiagnostics(const PSICdMsa* msa,
                          const _PSISequenceWeights* seq_weights,
                          const _PSIInternalPssmData* internal_pssm,
                          PSIDiagnosticsResponse* diagnostics)

Collects diagnostic information from the process of creating the CDD-based PSSM.

Parameters
msa multiple alignment of CDs data structure [in]
seq_weights sequence weights data structure [in]
internal_pssm structure containing the PSSM's frequency ratios [in]
diagnostics output parameter [out]
Returns
PSI_SUCCESS on success, PSIERR_OUTOFMEM if memory allocation fails or PSIERR_BADPARAM if any of its arguments is NULL

Definition at line 2912 of file blast_psi_priv.c.

References _PSICalculateInformationContentFromFreqRatios(), PSIDiagnosticsResponse::alphabet_size, ASSERT, PSICdMsa::dimensions, _PSIInternalPssmData::freq_ratios, PSIDiagnosticsResponse::frequency_ratios, PSIDiagnosticsResponse::independent_observations, _PSISequenceWeights::independent_observations, info, PSIDiagnosticsResponse::information_content, _PSISequenceWeights::match_weights, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_OUTOFMEM, PSIMsaDimensions::query_length, PSIDiagnosticsResponse::query_length, r(), sfree, _PSISequenceWeights::std_prob, and PSIDiagnosticsResponse::weighted_residue_freqs.

Referenced by PSICreatePssmFromCDD().

◆ _PSISaveDiagnostics()

int _PSISaveDiagnostics(const _PSIMsa* msa,
                        const _PSIAlignedBlock* aligned_block,
                        const _PSISequenceWeights* seq_weights,
                        const _PSIInternalPssmData* internal_pssm,
                        PSIDiagnosticsResponse* diagnostics)

Collects diagnostic information from the process of creating the PSSM.

Parameters
msa multiple sequence alignment data structure [in]
aligned_block aligned regions' extents [in]
seq_weights sequence weights data structure [in]
internal_pssm structure containing the PSSM's frequency ratios [in]
diagnostics output parameter [out]
Returns
PSI_SUCCESS on success, PSIERR_OUTOFMEM if memory allocation fails or PSIERR_BADPARAM if any of its arguments is NULL

Definition at line 2809 of file blast_psi_priv.c.

References _PSICalculateInformationContentFromFreqRatios(), PSIDiagnosticsResponse::alphabet_size, AMINOACID_TO_NCBISTDAA, ASSERT, _PSIMsa::cell, _PSIMsa::dimensions, _PSIInternalPssmData::freq_ratios, PSIDiagnosticsResponse::frequency_ratios, PSIDiagnosticsResponse::gapless_column_weights, _PSISequenceWeights::gapless_column_weights, PSIDiagnosticsResponse::independent_observations, _PSISequenceWeights::independent_observations, info, PSIDiagnosticsResponse::information_content, PSIDiagnosticsResponse::interval_sizes, _PSIMsaCell::letter, _PSISequenceWeights::match_weights, PSIDiagnosticsResponse::num_matching_seqs, _PSIMsa::num_matching_seqs, _PSIInternalPssmData::pseudocounts, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_OUTOFMEM, PSIMsaDimensions::query_length, PSIDiagnosticsResponse::query_length, r(), _PSIMsa::residue_counts, PSIDiagnosticsResponse::residue_freqs, sfree, PSIDiagnosticsResponse::sigma, _PSISequenceWeights::sigma, _PSIAlignedBlock::size, _PSISequenceWeights::std_prob, and PSIDiagnosticsResponse::weighted_residue_freqs.

Referenced by PSICreatePssmWithDiagnostics().

◆ _PSIScaleMatrix()

int _PSIScaleMatrix(const Uint1* query,
                    const double* std_probs,
                    _PSIInternalPssmData* internal_pssm,
                    BlastScoreBlk* sbp)

Scales the PSSM (stage 7).

Parameters
query query sequence in ncbistdaa encoding; its length is read from internal_pssm->ncols [in]
std_probs array containing the standard background residue probabilities [in]
internal_pssm PSSM being computed [in|out]
sbp score block structure initialized for the scoring system used with the query sequence [in|out]
Returns
PSIERR_BADPARAM if arguments are NULL, PSIERR_POSITIVEAVGSCORE if the average score of the generated PSSM is positive, PSI_SUCCESS otherwise

Definition at line 2481 of file blast_psi_priv.c.

References _PSIUpdateLambdaK(), ASSERT, BLAST_Nint(), BLAST_SCORE_MIN, FALSE, i, int, BlastScoreBlk::kbp_ideal, BlastScoreBlk::kbp_psi, kPositScalingNumIterations, kPositScalingPercent, kPSIScaleFactor, Blast_KarlinBlk::Lambda, _PSIInternalPssmData::ncols, _PSIInternalPssmData::nrows, NULL, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_POSITIVEAVGSCORE, _PSIInternalPssmData::pssm, query, _PSIInternalPssmData::scaled_pssm, and TRUE.

Referenced by _PSICreateAndScalePssmFromFrequencyRatios(), and BOOST_AUTO_TEST_CASE().

◆ _PSISequenceLengthWithoutX()

Uint4 _PSISequenceLengthWithoutX(const Uint1* seq, Uint4 length)

Calculates the length of the sequence without including any 'X' residues.

Used in kappa.c.

Parameters
seq sequence to examine [in]
length length of the sequence above [in]
Returns
number of non-X residues in the sequence

Definition at line 2629 of file blast_psi_priv.c.

References AMINOACID_TO_NCBISTDAA, ASSERT, and i.

Referenced by _PSIComputeScoreProbabilities().

◆ _PSISequenceWeightsFree()

_PSISequenceWeights* _PSISequenceWeightsFree(_PSISequenceWeights* seq_weights)

◆ _PSISequenceWeightsNew()

_PSISequenceWeights* _PSISequenceWeightsNew(const PSIMsaDimensions* dims, const BlastScoreBlk* sbp)

◆ _PSIStructureGroupCustomization()

void _PSIStructureGroupCustomization(_PSIMsa* msa)

Enables the NCBI Structure Group customization, which discards the query sequence. In this setting the query is not the result of a PSI-BLAST iteration but an artificial consensus sequence of the multiple sequence alignment constructed by that group.

This should be called after _PSIPurgeBiasedSegments.

Definition at line 800 of file blast_psi_priv.c.

References _PSIUpdatePositionCounts(), _PSIMsa::cell, _PSIMsa::dimensions, FALSE, i, _PSIMsaCell::is_aligned, kQueryIndex, _PSIMsaCell::letter, and PSIMsaDimensions::query_length.

Referenced by PSICreatePssmWithDiagnostics().

◆ _PSIUpdateLambdaK()

void _PSIUpdateLambdaK(const int** pssm,
                       const Uint1* query,
                       Uint4 query_length,
                       const double* std_probs,
                       BlastScoreBlk* sbp)

Updates the Karlin-Altschul parameters based on the query sequence and PSSM's score frequencies.

Port of blastool.c's updateLambdaK.

Parameters
pssm PSSM [in]
query query sequence in ncbistdaa encoding [in]
query_length length of the query sequence above [in]
std_probs array containing the standard background residue probabilities [in]
sbp score block structure in which the calculated lambda and K are returned [in|out]

Definition at line 2732 of file blast_psi_priv.c.

References _PSIComputeScoreProbabilities(), ASSERT, Blast_KarlinBlkUngappedCalc(), Blast_ScoreFreqFree(), Blast_KarlinBlk::K, BlastScoreBlk::kbp_gap_psi, BlastScoreBlk::kbp_gap_std, BlastScoreBlk::kbp_ideal, BlastScoreBlk::kbp_psi, log, Blast_KarlinBlk::logK, and query.

Referenced by _PSIScaleMatrix(), and impalaScaleMatrix().

◆ _PSIUpdatePositionCounts()

void _PSIUpdatePositionCounts(_PSIMsa* msa)

Counts the number of sequences matching the query per query position (columns of the multiple alignment) as well as the number of residues present in each position of the query.

Should be called after biased sequences have been purged from the multiple alignment data.

Parameters
msa multiple sequence alignment structure [in|out]

Definition at line 991 of file blast_psi_priv.c.

References _PSIMsa::alphabet_size, ASSERT, _PSIMsa::cell, _PSIMsa::dimensions, _PSIMsaCell::is_aligned, _PSIMsaCell::letter, _PSIMsa::num_matching_seqs, PSIMsaDimensions::num_seqs, PSIMsaDimensions::query_length, and _PSIMsa::residue_counts.

Referenced by _PSIMsaNew(), and _PSIStructureGroupCustomization().
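Below is a simplified sketch of the counting described for _PSIUpdatePositionCounts. The Cell type and the flat arrays are hypothetical stand-ins for _PSIMsaCell, _PSIMsa::num_matching_seqs and _PSIMsa::residue_counts. The sketch makes two further assumptions. First, the query occupies row 0 and is counted along with the num_seqs aligned rows. Second, the counters are reset at the start, so the function can be re-run after _PSIStructureGroupCustomization (which, per the Referenced by list, calls it) changes the alignment.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical simplified cell: residue code plus alignment flag. */
typedef struct {
    unsigned char letter;
    int is_aligned;
} Cell;

/* cell[s][p]: row s (0 = query, 1..num_seqs = aligned sequences) at
 * query position p. residue_counts[p] has alphabet_size slots. */
static void update_position_counts(Cell** cell, unsigned int num_seqs,
                                   unsigned int query_length,
                                   unsigned int alphabet_size,
                                   unsigned int* num_matching_seqs,
                                   unsigned int** residue_counts)
{
    unsigned int s, p;

    /* Reset so the counts stay correct if called again after the
     * alignment has been modified. */
    memset(num_matching_seqs, 0, query_length * sizeof(unsigned int));
    for (p = 0; p < query_length; p++)
        memset(residue_counts[p], 0, alphabet_size * sizeof(unsigned int));

    for (s = 0; s <= num_seqs; s++) {        /* query row included */
        for (p = 0; p < query_length; p++) {
            if (cell[s][p].is_aligned) {
                unsigned char r = cell[s][p].letter;
                assert(r < alphabet_size);
                num_matching_seqs[p]++;
                residue_counts[p][r]++;
            }
        }
    }
}
```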

◆ _PSIValidateCdMSA()

int _PSIValidateCdMSA(const PSICdMsa* cd_msa, Uint4 alphabet_size)

Validates the multiple alignment of conserved domains (CDs) structure.

Parameters
cd_msa multiple alignment of CDs [in]
alphabet_size alphabet size [in]
Returns
One of the errors defined above if validation fails or a bad parameter is passed in, else PSI_SUCCESS

Definition at line 862 of file blast_psi_priv.c.

References AMINOACID_TO_NCBISTDAA, PSICdMsaCell::data, PSICdMsa::dimensions, fabs, PSICdMsaCellData::iobsr, PSICdMsaCell::is_aligned, kEpsilon, PSICdMsa::msa, PSIMsaDimensions::num_seqs, PSI_SUCCESS, PSIERR_BADPARAM, PSIERR_BADPROFILE, PSIERR_GAPINQUERY, PSICdMsa::query, PSIMsaDimensions::query_length, and PSICdMsaCellData::wfreqs.

Referenced by PSICreatePssmFromCDD().

◆ _PSIValidateMSA()

int _PSIValidateMSA(const _PSIMsa* msa, Boolean ignored_unaligned_positions)

Main validation function for multiple sequence alignment structure.

Should be called after _PSIPurgeBiasedSegments.

Parameters
msa multiple sequence alignment data structure [in]
ignored_unaligned_positions determines whether the unaligned positions test should be performed [in]
Returns
One of the errors defined above if validation fails or a bad parameter is passed in, else PSI_SUCCESS

Definition at line 828 of file blast_psi_priv.c.

References PSI_SUCCESS, PSIERR_BADPARAM, s_PSIValidateAlignedColumns(), s_PSIValidateNoFlankingGaps(), s_PSIValidateNoGapsInQuery(), and s_PSIValidateParticipatingSequences().

Referenced by PSICreatePssmWithDiagnostics().
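According to its References list, _PSIValidateMSA delegates to several static checks, including s_PSIValidateNoGapsInQuery. The sketch below shows what such a check could look like in isolation; it is not the toolkit's implementation. It assumes the query is in NCBIstdaa, where the gap residue '-' has code 0. It returns local constants that have the same values as PSI_SUCCESS, PSIERR_BADPARAM and PSIERR_GAPINQUERY in this header. The function name validate_no_gaps_in_query is hypothetical.

```c
#include <stddef.h>

/* Same values as PSI_SUCCESS, PSIERR_BADPARAM and PSIERR_GAPINQUERY. */
enum { kOk = 0, kErrBadParam = -1, kErrGapInQuery = -7 };

/* Assumed NCBIstdaa code for the gap residue '-'. */
enum { kNcbistdaaGap = 0 };

/* Rejects a NULL query and any query containing a gap residue. */
static int validate_no_gaps_in_query(const unsigned char* query,
                                     unsigned int query_length)
{
    unsigned int p;
    if (query == NULL)
        return kErrBadParam;
    for (p = 0; p < query_length; p++) {
        if (query[p] == kNcbistdaaGap)
            return kErrGapInQuery;
    }
    return kOk;
}
```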

◆ _PSIValidateMSA_StructureGroup()

int _PSIValidateMSA_StructureGroup(const _PSIMsa* msa)

Structure group validation function for multiple sequence alignment structure.

Should be called after _PSIStructureGroupCustomization.

Parameters
msa multiple sequence alignment data structure [in]
Returns
One of the errors defined above if validation fails or a bad parameter is passed in, else PSI_SUCCESS

Definition at line 811 of file blast_psi_priv.c.

References PSI_SUCCESS, PSIERR_BADPARAM, and s_PSIValidateParticipatingSequences().

Referenced by PSICreatePssmWithDiagnostics().

Variable Documentation

◆ kEpsilon

extern const double kEpsilon

◆ kPositScalingNumIterations

extern const Uint4 kPositScalingNumIterations

Constant used in scaling PSSM routines: Successor to POSIT_NUM_ITERATIONS.

Definition at line 61 of file blast_psi_priv.c.

Referenced by _PSIScaleMatrix(), and impalaScaleMatrix().

◆ kPositScalingPercent

extern const double kPositScalingPercent

Constant used in scaling PSSM routines: Successor to POSIT_PERCENT.

Definition at line 60 of file blast_psi_priv.c.

Referenced by _PSIScaleMatrix(), and impalaScaleMatrix().

◆ kPSIIdentical

extern const double kPSIIdentical

Percent identity threshold for discarding identical matches.

Definition at line 56 of file blast_psi_priv.c.

Referenced by s_PSIPurgeSelfHits().

◆ kPSINearIdentical

extern const double kPSINearIdentical

Percent identity threshold for discarding near-identical matches.

Definition at line 55 of file blast_psi_priv.c.

Referenced by s_PSIPurgeNearIdenticalAlignments(), and CPssmInputTestData::SetupNearIdenticalHits().

◆ kPSIScaleFactor

extern const int kPSIScaleFactor

Successor to POSIT_SCALE_FACTOR.

Definition at line 59 of file blast_psi_priv.c.

Referenced by _PSIConvertFreqRatiosToPSSM(), _PSIScaleMatrix(), and impalaScaleMatrix().

◆ kQueryIndex

extern const unsigned int kQueryIndex
Modified on Fri Sep 20 14:58:27 2024 by modify_doxy.py rev. 669887