NCBI C++ ToolKit
pattern.h File Reference

Functions for finding pattern matches in sequence (PHI-BLAST).

#include <algo/blast/core/ncbi_std.h>
#include <algo/blast/core/blast_export.h>
#include <algo/blast/core/blast_def.h>
#include <algo/blast/core/blast_query_info.h>

Classes

struct  SDNAShortPatternItems
 Structure containing auxiliary items needed for a DNA search with a pattern that fits in a single word.

struct  SShortPatternItems
 Auxiliary items needed for a PHI BLAST search with a pattern that fits in a single word.

struct  SExtraLongPatternItems
 Auxiliary items needed for a PHI BLAST search with a pattern that contains pieces longer than a word.

struct  SDNALongPatternItems
 Auxiliary items needed for a DNA pattern search with a pattern containing multiple words.

struct  SLongPatternItems
 Auxiliary items needed for a PHI BLAST search with a pattern containing multiple words.

struct  SPHIPatternSearchBlk
 Structure containing all auxiliary information needed in a pattern search.
 

Macros

#define PHI_BUF_SIZE   100
 Default size for buffers.

#define PHI_ASCII_SIZE   256
 Size of ASCII alphabet.

#define PHI_BITS_PACKED_PER_WORD   30
 Number of bits packed in a word.

#define PHI_MAX_WORD_SIZE   11
 Maximal word size.

#define PHI_MAX_PATTERN_LENGTH   (PHI_BITS_PACKED_PER_WORD * PHI_MAX_WORD_SIZE)
 Threshold pattern length.

#define PHI_MAX_WORDS_IN_PATTERN   100
 Maximal number of words in pattern.

#define PHI_MAX_HIT   20000
 Maximal size of an array of pattern hits.
 

Typedefs

typedef enum EPatternProgram EPatternProgram
 Options for running the pattern search.

typedef enum EPatternType EPatternType
 Type of pattern: fits in single word, several words, or is very long.

typedef struct SDNAShortPatternItems SDNAShortPatternItems
 Structure containing auxiliary items needed for a DNA search with a pattern that fits in a single word.

typedef struct SShortPatternItems SShortPatternItems
 Auxiliary items needed for a PHI BLAST search with a pattern that fits in a single word.

typedef struct SExtraLongPatternItems SExtraLongPatternItems
 Auxiliary items needed for a PHI BLAST search with a pattern that contains pieces longer than a word.

typedef struct SDNALongPatternItems SDNALongPatternItems
 Auxiliary items needed for a DNA pattern search with a pattern containing multiple words.

typedef struct SLongPatternItems SLongPatternItems
 Auxiliary items needed for a PHI BLAST search with a pattern containing multiple words.

typedef struct SPHIPatternSearchBlk SPHIPatternSearchBlk
 Structure containing all auxiliary information needed in a pattern search.
 

Enumerations

enum  EPatternProgram { eSeed = 1 , ePattern , ePatSeed , ePatMatch }
 Options for running the pattern search.

enum  EPatternType { eOneWord = 0 , eMultiWord , eVeryLong }
 Type of pattern: fits in single word, several words, or is very long.
 

Functions

Int4 FindPatternHits (Int4 *hitArray, const Uint1 *seq, Int4 len, Boolean is_dna, const SPHIPatternSearchBlk *patternSearch)
 Find the places where the pattern matches seq; 3 different methods are used depending on the length of the pattern.

SPHIQueryInfo * SPHIQueryInfoNew (void)
 Allocates the pattern occurrences structure.

SPHIQueryInfo * SPHIQueryInfoFree (SPHIQueryInfo *pat_info)
 Frees the pattern information structure.

SPHIQueryInfo * SPHIQueryInfoCopy (const SPHIQueryInfo *pat_info)
 Copies the SPHIQueryInfo structure.

Int4 PHIGetPatternOccurrences (const SPHIPatternSearchBlk *pattern_blk, const BLAST_SequenceBlk *query, const BlastSeqLoc *location, Boolean is_dna, BlastQueryInfo *query_info)
 Finds all pattern hits in a given query and saves them in the previously allocated SPHIQueryInfo structure.
 

Detailed Description

Functions for finding pattern matches in sequence (PHI-BLAST).

Definition in file pattern.h.

Macro Definition Documentation

◆ PHI_ASCII_SIZE

#define PHI_ASCII_SIZE   256

Size of ASCII alphabet.

Definition at line 51 of file pattern.h.

◆ PHI_BITS_PACKED_PER_WORD

#define PHI_BITS_PACKED_PER_WORD   30

Number of bits packed in a word.

Definition at line 52 of file pattern.h.

◆ PHI_BUF_SIZE

#define PHI_BUF_SIZE   100

Default size for buffers.

Definition at line 50 of file pattern.h.

◆ PHI_MAX_HIT

#define PHI_MAX_HIT   20000

Maximal size of an array of pattern hits.

Definition at line 59 of file pattern.h.

◆ PHI_MAX_PATTERN_LENGTH

#define PHI_MAX_PATTERN_LENGTH   (PHI_BITS_PACKED_PER_WORD * PHI_MAX_WORD_SIZE)

Threshold pattern length.

Definition at line 55 of file pattern.h.
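
As a worked check of the arithmetic, the threshold expands to 30 * 11 = 330. The sketch below copies the two macro values from this page and verifies the product at compile time; the helper phi_threshold_length() is hypothetical and is not part of pattern.h.

```c
#include <assert.h>   /* provides static_assert in C11 */

/* Values copied from the macro definitions documented on this page. */
#define PHI_BITS_PACKED_PER_WORD 30
#define PHI_MAX_WORD_SIZE        11
#define PHI_MAX_PATTERN_LENGTH   (PHI_BITS_PACKED_PER_WORD * PHI_MAX_WORD_SIZE)

/* Compile-time check of the expansion: 30 * 11 == 330. */
static_assert(PHI_MAX_PATTERN_LENGTH == 330, "threshold should be 30 * 11");

/* Hypothetical helper (not in the toolkit): returns the threshold value. */
int phi_threshold_length(void)
{
    return PHI_MAX_PATTERN_LENGTH;
}
```

The parentheses in the macro body matter: without them, an expression such as `2 * PHI_MAX_PATTERN_LENGTH` would still be correct here, but a division or subtraction next to the macro could bind to only one factor.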

◆ PHI_MAX_WORD_SIZE

#define PHI_MAX_WORD_SIZE   11

Maximal word size.

Definition at line 53 of file pattern.h.

◆ PHI_MAX_WORDS_IN_PATTERN

#define PHI_MAX_WORDS_IN_PATTERN   100

Maximal number of words in pattern.

Definition at line 56 of file pattern.h.

Typedef Documentation

◆ EPatternProgram

Options for running the pattern search.

◆ EPatternType

typedef enum EPatternType EPatternType

Type of pattern: fits in single word, several words, or is very long.

◆ SDNALongPatternItems

Auxiliary items needed for a DNA pattern search with pattern containing multiple words.

◆ SDNAShortPatternItems

Structure containing auxiliary items needed for a DNA search with a pattern that fits in a single word.

◆ SExtraLongPatternItems

Auxiliary items needed for a PHI BLAST search with pattern that contains pieces longer than a word.

◆ SLongPatternItems

Auxiliary items needed for a PHI BLAST search with pattern containing multiple words.

◆ SPHIPatternSearchBlk

Structure containing all auxiliary information needed in a pattern search.

◆ SShortPatternItems

Auxiliary items needed for a PHI BLAST search with a pattern that fits in a single word.

Enumeration Type Documentation

◆ EPatternProgram

Options for running the pattern search.

Enumerator
eSeed 

Use only those query occurrences that are specified in the input pattern file.

ePattern 

Only find pattern occurrences in database, but do not perform alignments.

ePatSeed 

Search a BLAST database using pattern occurrences as seeds.

ePatMatch 

Only find pattern occurrences in query, but do not search the database.

Definition at line 62 of file pattern.h.
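
Because only eSeed has an explicit value, the remaining enumerators are numbered consecutively by the C rules for enums (ePattern = 2, ePatSeed = 3, ePatMatch = 4). A minimal sketch, with the enum redeclared locally so that it compiles without the toolkit headers; pattern_program_name() is a hypothetical helper for illustration only.

```c
#include <stddef.h>

/* Local copy of the enum as listed on this page. */
typedef enum EPatternProgram {
    eSeed = 1,
    ePattern,    /* 2 */
    ePatSeed,    /* 3 */
    ePatMatch    /* 4 */
} EPatternProgram;

/* Hypothetical helper (not in the toolkit): name of an enumerator,
 * or NULL for a value outside the documented range. */
const char *pattern_program_name(EPatternProgram program)
{
    switch (program) {
    case eSeed:     return "eSeed";
    case ePattern:  return "ePattern";
    case ePatSeed:  return "ePatSeed";
    case ePatMatch: return "ePatMatch";
    }
    return NULL;
}
```

Note that 0 is not a valid EPatternProgram value, so a zero-initialized variable of this type does not name any of the four modes.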

◆ EPatternType

Type of pattern: fits in single word, several words, or is very long.

Enumerator
eOneWord 

Does pattern consist of a single word?

eMultiWord 

Does pattern consist of multiple words?

eVeryLong 

Is pattern too long for a simple multi-word processing?

Definition at line 73 of file pattern.h.

Function Documentation

◆ FindPatternHits()

Int4 FindPatternHits ( Int4 *hitArray,
const Uint1 *seq,
Int4  len,
Boolean  is_dna,
const SPHIPatternSearchBlk *patternSearch 
)

Find the places where the pattern matches seq; 3 different methods are used depending on the length of the pattern.

Parameters
hitArray  Stores the results as pairs of positions in consecutive entries [out]
seq  Sequence [in]
len  Length of the sequence [in]
is_dna  Indicates whether seq is made of DNA or protein letters [in]
patternSearch  Pattern information [in]
Returns
Twice the number of hits (length of hitArray filled in)

Definition at line 468 of file pattern.c.

References eMultiWord, eOneWord, SPHIPatternSearchBlk::flagPatternLength, len, s_FindHitsLong(), s_FindHitsShortHead(), and s_FindHitsVeryLong().

Referenced by PHIBlastScanSubject(), PHIGetPatternOccurrences(), and CMultiAligner::x_FindPatternHits().
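
Since the return value is twice the number of hits, a caller would typically step through hitArray two entries at a time. The following is a minimal sketch of that bookkeeping only; it does not call the toolkit. Int4 is a local stand-in typedef, both helpers are hypothetical, and this page does not state which entry of a pair is the start and which is the end, so the sketch simply calls them first and second.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t Int4;   /* stand-in for the toolkit's Int4 typedef */

/* Hypothetical helper: convert the "twice the number of hits" return
 * value into a hit count. */
Int4 hit_count_from_return(Int4 filled)
{
    assert(filled >= 0 && filled % 2 == 0);   /* pairs: always even */
    return filled / 2;
}

/* Hypothetical helper: copy the k-th pair (0-based) out of hitArray.
 * Which entry is the start and which is the end is not specified here. */
void get_hit_pair(const Int4 *hitArray, Int4 k, Int4 *first, Int4 *second)
{
    *first  = hitArray[2 * k];
    *second = hitArray[2 * k + 1];
}
```

A caller allocating hitArray itself would presumably size it with PHI_MAX_HIT in mind, since that macro is documented as the maximal size of an array of pattern hits.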

◆ PHIGetPatternOccurrences()

Int4 PHIGetPatternOccurrences ( const SPHIPatternSearchBlk *pattern_blk,
const BLAST_SequenceBlk *query,
const BlastSeqLoc *location,
Boolean  is_dna,
BlastQueryInfo *query_info 
)

Finds all pattern hits in a given query and saves them in the previously allocated SPHIQueryInfo structure.

Parameters
pattern_blk  Structure containing the pattern information. [in]
query  Query sequence(s) [in]
location  Segments of the query sequence in which to look for the pattern [in]
is_dna  Is this a nucleotide sequence? [in]
query_info  Used to store pattern occurrences and get length of query (for error checking) [out]
Returns
A negative number indicates an unknown error; INT4_MAX indicates that the pattern (illegally) covered the entire query; any other non-negative number is the number of pattern occurrences found.

Definition at line 553 of file pattern.c.

References ASSERT, BlastQueryInfoGetQueryLength(), calloc(), eBlastTypePhiBlastn, eBlastTypePhiBlastp, FindPatternHits(), i, INT4_MAX, SSeqRange::left, location, BlastSeqLoc::next, SPHIQueryInfo::num_patterns, BlastQueryInfo::pattern_info, query, SSeqRange::right, s_PHIBlastAddPatternHit(), sfree, and BlastSeqLoc::ssr.

Referenced by Blast_SetPHIPatternInfo().
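
The three-way return convention can be decoded as in the sketch below. It is an illustration under stated assumptions, not toolkit code: Int4 is a local stand-in, INT4_MAX is assumed to equal the largest 32-bit signed value, and the EOccurrenceStatus enum and classify_occurrences() helper are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t Int4;          /* stand-in for the toolkit's Int4 typedef */
#ifndef INT4_MAX
#define INT4_MAX INT32_MAX     /* assumption: largest value of Int4 */
#endif

/* Hypothetical status codes for the three documented outcomes. */
typedef enum {
    eOccUnknownError,          /* negative return value */
    eOccPatternCoversQuery,    /* return value equals INT4_MAX */
    eOccCount                  /* any other non-negative return value */
} EOccurrenceStatus;

/* Hypothetical helper: classify a PHIGetPatternOccurrences-style return
 * code. On eOccCount, *num_found (if non-NULL) receives the count;
 * otherwise it is left untouched. */
EOccurrenceStatus classify_occurrences(Int4 rc, Int4 *num_found)
{
    if (rc < 0)
        return eOccUnknownError;
    if (rc == INT4_MAX)
        return eOccPatternCoversQuery;
    if (num_found != NULL)
        *num_found = rc;
    return eOccCount;
}
```

Checking for INT4_MAX before treating the value as a count is the important step: otherwise the sentinel would be mistaken for an enormous number of occurrences.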

◆ SPHIQueryInfoCopy()

SPHIQueryInfo* SPHIQueryInfoCopy ( const SPHIQueryInfo *pat_info)

Copies the SPHIQueryInfo structure.

Parameters
pat_info  Structure to copy [in]
Returns
New structure.

Definition at line 507 of file pattern.c.

References BlastMemDup(), NULL, SPHIQueryInfo::num_patterns, SPHIQueryInfo::occurrences, and SPHIQueryInfo::pattern.

Referenced by BlastQueryInfoDup(), and CSearchResults::CSearchResults().

◆ SPHIQueryInfoFree()

SPHIQueryInfo* SPHIQueryInfoFree ( SPHIQueryInfo *pat_info)

Frees the pattern information structure.

Parameters
pat_info  Structure to free. [in]
Returns
NULL.

Definition at line 496 of file pattern.c.

References NULL, SPHIQueryInfo::occurrences, SPHIQueryInfo::pattern, and sfree.

Referenced by BlastQueryInfoFree(), BOOST_AUTO_TEST_CASE(), and CSearchResults::~CSearchResults().
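
A free function that returns NULL lets the caller release the structure and clear the pointer in one statement. The sketch below illustrates that idiom with a hypothetical stand-in type, SDemoInfo, and hypothetical functions SDemoInfoNew() and SDemoInfoFree(); it does not reproduce the layout or implementation of SPHIQueryInfo.

```c
#include <stdlib.h>

/* Hypothetical stand-in structure; NOT the real SPHIQueryInfo layout. */
typedef struct SDemoInfo {
    int  num_items;
    int *items;      /* owned, heap-allocated array (may be NULL) */
} SDemoInfo;

/* Allocate a zero-initialized structure; returns NULL on failure. */
SDemoInfo *SDemoInfoNew(void)
{
    return (SDemoInfo *)calloc(1, sizeof(SDemoInfo));
}

/* Free the structure and its owned members. Always returns NULL so the
 * caller can write: info = SDemoInfoFree(info); */
SDemoInfo *SDemoInfoFree(SDemoInfo *info)
{
    if (info != NULL) {
        free(info->items);
        free(info);
    }
    return NULL;
}
```

Written as `info = SDemoInfoFree(info);`, the call leaves no dangling pointer behind, and a second call on the same variable is then harmless.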

◆ SPHIQueryInfoNew()

SPHIQueryInfo* SPHIQueryInfoNew ( void  )

Allocates the pattern occurrences structure.

Definition at line 478 of file pattern.c.

References SPHIQueryInfo::allocated_size, calloc(), NULL, and SPHIQueryInfo::occurrences.

Referenced by Blast_SetPHIPatternInfo(), and CPhiblastTestFixture::x_SetupPatternInfo().

Modified on Mon May 20 05:03:52 2024 by modify_doxy.py rev. 669887