NCBI C++ ToolKit
CMultipatternSearch Class Reference


CMultipatternSearch.

#include <util/multipattern_search.hpp>

Public Types

enum  EFlags {
  fNoCase = 1 << 0 , fBeginString = 1 << 1 , fEndString = 1 << 2 , fWholeString = fBeginString | fEndString ,
  fBeginWord = 1 << 3 , fEndWord = 1 << 4 , fWholeWord = fBeginWord | fEndWord
}
 Search flags (for non-RegEx patterns only!)
 
enum  EOnFind { eStopSearch , eContinueSearch }
 When the pattern is found, the search can be stopped or continued.
 

Public Member Functions

 CMultipatternSearch ()
 
 ~CMultipatternSearch ()
 
 DECLARE_SAFE_FLAGS_TYPE (EFlags, TFlags)
 
void GenerateDotGraph (ostream &out) const
 Generate graphical representation of the FSM in DOT format.
 
void GenerateArrayMapData (ostream &out) const
 Generate C++ array/map data.
 
void GenerateSourceCode (ostream &out) const
 Generate C code for the FSM search.
 
void AddPattern (const char *pattern, TFlags flags=0)
 
void AddPattern (const string &pattern, TFlags flags=0)
 
void AddPatterns (const vector< string > &patterns)
 
void AddPatterns (const vector< pair< string, TFlags >> &patterns)
 

Static Public Member Functions

static string QuoteString (const string &str)
 Quote special characters so that a string can be inserted into a regular expression.
 
Public Types (search callbacks)

typedef std::function< void(size_t)> VoidCall1

typedef std::function< void(size_t, size_t)> VoidCall2

typedef std::function< bool(size_t)> BoolCall1

typedef std::function< bool(size_t, size_t)> BoolCall2


Public Member Functions (search)

void Search (const char *input, VoidCall1 found_callback) const

void Search (const string &input, VoidCall1 found_callback) const

void Search (const char *input, VoidCall2 found_callback) const

void Search (const string &input, VoidCall2 found_callback) const

void Search (const char *input, BoolCall1 found_callback) const

void Search (const string &input, BoolCall1 found_callback) const

void Search (const char *input, BoolCall2 found_callback) const

void Search (const string &input, BoolCall2 found_callback) const


Private Attributes

unique_ptr< CRegExFSA > m_FSM


Static Public Member Functions (search with a precompiled FSM)

static void Search (const char *input, const FSM::CCompiledFSM &fsm, VoidCall1 found_callback)
 
static void Search (const string &input, const FSM::CCompiledFSM &fsm, VoidCall1 found_callback)
 
static void Search (const char *input, const FSM::CCompiledFSM &fsm, VoidCall2 found_callback)
 
static void Search (const string &input, const FSM::CCompiledFSM &fsm, VoidCall2 found_callback)
 
static void Search (const char *input, const FSM::CCompiledFSM &fsm, BoolCall1 found_callback)
 
static void Search (const string &input, const FSM::CCompiledFSM &fsm, BoolCall1 found_callback)
 
static void Search (const char *input, const FSM::CCompiledFSM &fsm, BoolCall2 found_callback)
 
static void Search (const string &input, const FSM::CCompiledFSM &fsm, BoolCall2 found_callback)
 

Detailed Description

CMultipatternSearch.

Simultaneous search of multiple string or RegEx patterns in the input string.

CMultipatternSearch builds a Finite State Machine (FSM) from a list of search strings or regular expression patterns. It can then search for all patterns simultaneously, requiring only a single traversal of the input string. Use this class to improve search performance when the number of search patterns is large (10 or more). If the patterns are known in advance, the FSM can be exported as C code and compiled for further performance improvement.
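To illustrate the "one FSM, one pass" idea described above, here is a minimal, self-contained sketch of a textbook Aho-Corasick automaton for literal patterns. It is not the toolkit's CRegExFSA: it supports no RegEx patterns and no EFlags conditions, and the class name AhoCorasickSketch is hypothetical. Each match is reported as (pattern number in insertion order, position one past the last matched character); that position convention belongs to this sketch, not necessarily to CMultipatternSearch::Search().

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Textbook Aho-Corasick automaton for literal patterns only (no RegEx, no flags).
// Hypothetical illustration of a multi-pattern FSM; NOT the toolkit's CRegExFSA.
class AhoCorasickSketch {
public:
    explicit AhoCorasickSketch(const std::vector<std::string>& patterns)
        : m_Nodes(1)  // state 0 is the root
    {
        for (std::size_t i = 0; i < patterns.size(); ++i) {
            std::size_t s = 0;
            for (unsigned char c : patterns[i]) {
                auto it = m_Nodes[s].next.find(c);
                if (it == m_Nodes[s].next.end()) {
                    m_Nodes.emplace_back();  // may reallocate, so re-index m_Nodes[s]
                    it = m_Nodes[s].next.emplace(c, m_Nodes.size() - 1).first;
                }
                s = it->second;
            }
            m_Nodes[s].out.push_back(i);  // pattern i ends in state s
        }
        BuildFailureLinks();
    }

    // Single traversal of the input; calls found(pattern, end) for every match,
    // where end is one past the last matched character.
    void Search(const std::string& text,
                const std::function<void(std::size_t, std::size_t)>& found) const {
        std::size_t s = 0;
        for (std::size_t pos = 0; pos < text.size(); ++pos) {
            unsigned char c = static_cast<unsigned char>(text[pos]);
            while (s != 0 && !m_Nodes[s].next.count(c)) s = m_Nodes[s].fail;
            auto it = m_Nodes[s].next.find(c);
            s = (it != m_Nodes[s].next.end()) ? it->second : 0;
            for (std::size_t p : m_Nodes[s].out) found(p, pos + 1);
        }
    }

private:
    struct Node {
        std::map<unsigned char, std::size_t> next;  // goto transitions
        std::size_t fail = 0;                       // failure link
        std::vector<std::size_t> out;               // patterns reported here
    };

    // Breadth-first pass: failure link = longest proper suffix that is also a
    // trie state; each state inherits the outputs of its failure state.
    void BuildFailureLinks() {
        std::queue<std::size_t> q;
        for (const auto& [c, child] : m_Nodes[0].next) q.push(child);
        while (!q.empty()) {
            std::size_t s = q.front();
            q.pop();
            for (const auto& [c, child] : m_Nodes[s].next) {
                std::size_t f = m_Nodes[s].fail;
                while (f != 0 && !m_Nodes[f].next.count(c)) f = m_Nodes[f].fail;
                auto it = m_Nodes[f].next.find(c);
                m_Nodes[child].fail = (it != m_Nodes[f].next.end()) ? it->second : 0;
                const auto& inherited = m_Nodes[m_Nodes[child].fail].out;
                m_Nodes[child].out.insert(m_Nodes[child].out.end(),
                                          inherited.begin(), inherited.end());
                q.push(child);
            }
        }
    }

    std::vector<Node> m_Nodes;
};
```

For example, with the patterns "he", "she", "his", "hers" and the input "ushers", the single scan reports "she" and "he" ending at position 4 and "hers" ending at position 6.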

Definition at line 63 of file multipattern_search.hpp.

Member Typedef Documentation

◆ BoolCall1

typedef std::function<bool(size_t)> CMultipatternSearch::BoolCall1

Definition at line 146 of file multipattern_search.hpp.

◆ BoolCall2

typedef std::function<bool(size_t, size_t)> CMultipatternSearch::BoolCall2

Definition at line 147 of file multipattern_search.hpp.

◆ VoidCall1

typedef std::function<void(size_t)> CMultipatternSearch::VoidCall1

Run the FSM search on the input string.

Parameters
input: Input string.
found_callback: Function or function-like object to call when a pattern is found. It can accept one parameter (the pattern number) or two (the pattern number and the position), and return void or EOnFind. If it returns eStopSearch, the search terminates. If it returns eContinueSearch or void, the search continues.
See also
CFoundCallback
Search(str, [](size_t pattern) { cout << "Found " << pattern << "\n"; });
Search(str, [](size_t pattern) -> CMultipatternSearch::EOnFind { cout << "Found " << pattern << "\n"; return CMultipatternSearch::eContinueSearch; });
Search(str, [](size_t pattern, size_t position) { cout << "Found " << pattern << " " << position << "\n"; });
Search(str, [](size_t pattern, size_t position) -> CMultipatternSearch::EOnFind { cout << "Found " << pattern << " " << position << "\n"; return CMultipatternSearch::eContinueSearch; });

Definition at line 144 of file multipattern_search.hpp.

◆ VoidCall2

typedef std::function<void(size_t, size_t)> CMultipatternSearch::VoidCall2

Definition at line 145 of file multipattern_search.hpp.

Member Enumeration Documentation

◆ EFlags

Search flags (for non-RegEx patterns only!)

Enumerator
fNoCase 
fBeginString 
fEndString 
fWholeString 
fBeginWord 
fEndWord 
fWholeWord 

Definition at line 70 of file multipattern_search.hpp.
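The enumerators above are single bits, and the composite values are bitwise ORs of them. The sketch below copies the values from the EFlags declaration into a local mirror enum (EFlagsMirror and HasAll are hypothetical names) to show how the flags combine; it checks bit arithmetic only and does not model what each flag does during a search, nor how the TFlags type accepts combined values.

```cpp
// Local mirror of the EFlags values from the declaration above; in real code,
// use CMultipatternSearch::fNoCase etc. with the TFlags type instead.
enum EFlagsMirror : unsigned {
    fNoCase      = 1 << 0,                     // 1
    fBeginString = 1 << 1,                     // 2
    fEndString   = 1 << 2,                     // 4
    fWholeString = fBeginString | fEndString,  // 6
    fBeginWord   = 1 << 3,                     // 8
    fEndWord     = 1 << 4,                     // 16
    fWholeWord   = fBeginWord | fEndWord       // 24
};

// True if every bit of `wanted` is set in `flags`.
constexpr bool HasAll(unsigned flags, unsigned wanted) {
    return (flags & wanted) == wanted;
}

// Compile-time checks of the composite values.
static_assert(fWholeString == 6, "fBeginString | fEndString");
static_assert(fWholeWord == 24, "fBeginWord | fEndWord");
```

Because fWholeWord is itself two bits, a combination such as fNoCase | fWholeWord (value 25) contains fBeginWord and fEndWord individually, but not fWholeString.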

◆ EOnFind

When the pattern is found, the search can be stopped or continued.

Enumerator
eStopSearch 
eContinueSearch 

Definition at line 120 of file multipattern_search.hpp.
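EOnFind declares its enumerators without initializers, so eStopSearch is 0 and eContinueSearch is 1, and an unscoped enum converts implicitly to bool. That is why a lambda returning EOnFind can be stored in a BoolCall1 or BoolCall2. The sketch below mirrors the enum and the BoolCall2 typedef locally and uses a hypothetical driver, DeliverHits, that replays precomputed hits and stops as soon as the callback returns false. It assumes, as the enumerator values suggest, that a false (eStopSearch) result is what ends the search; it does not reproduce the toolkit's own search loop.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Local mirror of CMultipatternSearch::EOnFind (eStopSearch == 0, eContinueSearch == 1).
enum EOnFind { eStopSearch, eContinueSearch };

// Same shape as CMultipatternSearch::BoolCall2.
using BoolCall2 = std::function<bool(std::size_t, std::size_t)>;

// Hypothetical driver: delivers (pattern, position) hits in order and stops
// when the callback returns false. Returns how many hits were delivered.
std::size_t DeliverHits(const std::vector<std::pair<std::size_t, std::size_t>>& hits,
                        const BoolCall2& found_callback) {
    std::size_t delivered = 0;
    for (const auto& [pattern, position] : hits) {
        ++delivered;
        if (!found_callback(pattern, position)) break;  // eStopSearch converts to false
    }
    return delivered;
}
```

A callback written as `[](size_t pattern, size_t) -> EOnFind { return pattern == 2 ? eStopSearch : eContinueSearch; }` therefore ends the replay at the first hit of pattern 2.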

Constructor & Destructor Documentation

◆ CMultipatternSearch()

CMultipatternSearch::CMultipatternSearch ( )

Definition at line 40 of file multipattern_search.cpp.

◆ ~CMultipatternSearch()

CMultipatternSearch::~CMultipatternSearch ( )

Definition at line 41 of file multipattern_search.cpp.

Member Function Documentation

◆ AddPattern() [1/2]

void CMultipatternSearch::AddPattern ( const char *  pattern,
TFlags  flags = 0 
)

Add a search pattern to the FSM.

Parameters
pattern: A search pattern to add to the FSM. If it begins with a slash (/), it is treated as a RegEx.
flags: Additional search conditions (see EFlags). If pattern is a RegEx, the flags are ignored.

Definition at line 43 of file multipattern_search.cpp.

References f(), and m_FSM.
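The rule above (a leading slash marks a RegEx, and flags are then ignored) can be captured in a few lines. The sketch below is a hypothetical helper, ClassifyPattern, that records how a single AddPattern() call would be interpreted under that rule; it does not parse or validate the RegEx itself, and the exact RegEx syntax expected after the slash is not covered here.

```cpp
#include <string>

// Hypothetical record of how one AddPattern() call is interpreted.
struct PatternSpec {
    std::string text;
    bool is_regex;
    unsigned flags;  // always 0 for RegEx patterns: their flags are ignored
};

// Applies the documented rule: a leading '/' marks a RegEx and drops the flags;
// anything else is a literal string that keeps its flags.
PatternSpec ClassifyPattern(const std::string& pattern, unsigned flags = 0) {
    const bool is_regex = !pattern.empty() && pattern[0] == '/';
    return PatternSpec{pattern, is_regex, is_regex ? 0u : flags};
}
```

So a call like AddPattern("plasmid", fNoCase) keeps its flag, while any pattern starting with "/" is compiled as a RegEx regardless of the flags passed with it.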

◆ AddPattern() [2/2]

void CMultipatternSearch::AddPattern ( const string &  pattern,
TFlags  flags = 0 
)
inline

Definition at line 92 of file multipattern_search.hpp.

References AddPattern(), and flags.

Referenced by AddPattern().

◆ AddPatterns() [1/2]

void CMultipatternSearch::AddPatterns ( const vector< pair< string, TFlags >> &  patterns)

Definition at line 54 of file multipattern_search.cpp.

References m_FSM, and patterns.

◆ AddPatterns() [2/2]

void CMultipatternSearch::AddPatterns ( const vector< string > &  patterns)

Definition at line 45 of file multipattern_search.cpp.

References m_FSM, and patterns.

◆ DECLARE_SAFE_FLAGS_TYPE()

CMultipatternSearch::DECLARE_SAFE_FLAGS_TYPE ( EFlags  ,
TFlags   
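)

Declares TFlags, the type-safe flag set built from EFlags values, which AddPattern() and AddPatterns() accept.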

◆ GenerateArrayMapData()

void CMultipatternSearch::GenerateArrayMapData ( ostream &  out) const

Generate C++ array/map data.

Parameters
out: A stream to receive the output.

Definition at line 65 of file multipattern_search.cpp.

References m_FSM, and out().

◆ GenerateDotGraph()

void CMultipatternSearch::GenerateDotGraph ( ostream &  out) const

Generate graphical representation of the FSM in DOT format.

For more details, see http://www.graphviz.org/

Parameters
out: A stream to receive the output.

Definition at line 63 of file multipattern_search.cpp.

References m_FSM, and out().
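To show what a DOT description of a pattern automaton looks like, the sketch below writes a trie of literal patterns (goto edges only, no failure links) to an ostream. WriteTrieDot is a hypothetical helper, and the node numbering, edge labels and shapes are this sketch's own choices; the real GenerateDotGraph() output describes the toolkit's full FSM and will differ. Pattern characters are assumed to need no DOT escaping (no quotes or backslashes).

```cpp
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>  // std::ostringstream, handy for capturing the output
#include <string>
#include <vector>

// Writes a trie of literal patterns in Graphviz DOT format.
// Accepting states (where some pattern ends) are drawn as double circles.
void WriteTrieDot(const std::vector<std::string>& patterns, std::ostream& out) {
    std::vector<std::map<char, std::size_t>> edges(1);  // state 0 = root
    std::vector<std::vector<std::size_t>> emits(1);
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        std::size_t s = 0;
        for (char c : patterns[i]) {
            auto it = edges[s].find(c);
            if (it == edges[s].end()) {
                edges.emplace_back();  // may reallocate, so re-index edges[s]
                emits.emplace_back();
                it = edges[s].emplace(c, edges.size() - 1).first;
            }
            s = it->second;
        }
        emits[s].push_back(i);  // pattern i ends in state s
    }
    out << "digraph FSM {\n";
    for (std::size_t s = 0; s < edges.size(); ++s) {
        out << "  " << s << " [shape=" << (emits[s].empty() ? "circle" : "doublecircle") << "];\n";
        for (const auto& [c, t] : edges[s])
            out << "  " << s << " -> " << t << " [label=\"" << c << "\"];\n";
    }
    out << "}\n";
}
```

Saved to a file such as fsm.dot, the output can be rendered with Graphviz, for example `dot -Tsvg fsm.dot -o fsm.svg`.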

◆ GenerateSourceCode()

void CMultipatternSearch::GenerateSourceCode ( ostream &  out) const

Generate C code for the FSM search.

Parameters
out: A stream to receive the output.

Definition at line 67 of file multipattern_search.cpp.

References m_FSM, and out().

◆ QuoteString()

string CMultipatternSearch::QuoteString ( const string &  str)
static

Quote special characters so that a string can be inserted into a regular expression.

Definition at line 69 of file multipattern_search.cpp.

References out(), and str().

Referenced by CSearch_func::GetRegex().
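A minimal sketch of what such quoting does, assuming a backslash is placed before each regex metacharacter. QuoteStringSketch and ContainsViaRegex are hypothetical names, and the character set below is the ECMAScript syntax-character set used by std::regex; the toolkit's QuoteString() may escape a different set (for example, it may also handle '/').

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for QuoteString(): backslash-escapes regex metacharacters
// so the result matches the original text literally.
std::string QuoteStringSketch(const std::string& str) {
    static const std::string kSpecial = "\\^$.|?*+()[]{}";
    std::string out;
    out.reserve(str.size() * 2);
    for (char c : str) {
        if (kSpecial.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// True if `literal` occurs in `text`, found through a regex built from the
// quoted literal (shows that quoting neutralizes the metacharacters).
bool ContainsViaRegex(const std::string& text, const std::string& literal) {
    return std::regex_search(text, std::regex(QuoteStringSketch(literal)));
}
```

Without quoting, the pattern "a.b" would also match "axb"; after quoting it matches only the literal text "a.b".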

◆ Search() [1/16]

void CMultipatternSearch::Search ( const char *  input,
BoolCall1  found_callback 
) const

Definition at line 192 of file multipattern_search.cpp.

References input(), m_FSM, and xMultiPatternSearch().

◆ Search() [2/16]

void CMultipatternSearch::Search ( const char *  input,
BoolCall2  found_callback 
) const

Definition at line 199 of file multipattern_search.cpp.

References input(), m_FSM, and xMultiPatternSearch().

◆ Search() [3/16]

static void CMultipatternSearch::Search ( const char *  input,
const FSM::CCompiledFSM &  fsm,
BoolCall1  found_callback 
)
static

◆ Search() [4/16]

static void CMultipatternSearch::Search ( const char *  input,
const FSM::CCompiledFSM &  fsm,
BoolCall2  found_callback 
)
static

◆ Search() [5/16]

static void CMultipatternSearch::Search ( const char *  input,
const FSM::CCompiledFSM &  fsm,
VoidCall1  found_callback 
)
static

Run the FSM search on the input string using the structure prebuilt by multipattern -A.

Parameters
input: Input string.
states: generated by multipattern -A
emit: generated by multipattern -A
hits: generated by multipattern -A
found_callback: Function or function-like object to call when a pattern is found. It can accept one parameter (the pattern number) or two (the pattern number and the position), and return void or EOnFind. If it returns eStopSearch, the search terminates. If it returns eContinueSearch or void, the search continues.
See also
CFoundCallback
Search(str, [](size_t pattern) { cout << "Found " << pattern << "\n"; });
Search(str, [](size_t pattern) -> CMultipatternSearch::EOnFind { cout << "Found " << pattern << "\n"; return CMultipatternSearch::eContinueSearch; });
Search(str, [](size_t pattern, size_t position) { cout << "Found " << pattern << " " << position << "\n"; });
Search(str, [](size_t pattern, size_t position) -> CMultipatternSearch::EOnFind { cout << "Found " << pattern << " " << position << "\n"; return CMultipatternSearch::eContinueSearch; });

Example:

static void Screen(const char* str, bool *result)
{
    #define _FSM_EMIT static bool emit[]
    #define _FSM_HITS static map<size_t, vector<size_t>> hits
    #define _FSM_STATES static size_t states[]
    #include "GENERATED_FILE.inc"
    #undef _FSM_EMIT
    #undef _FSM_HITS
    #undef _FSM_STATES
    CMultipatternSearch::Search(str, states, emit, hits, [result](size_t n) { result[n] = true; });
}
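The example compiles generated states, emit and hits data directly into the program. The sketch below shows, under stated assumptions, how a one-pass, table-driven search over data of those types can work. It assumes that states is a flat transition table with 256 entries per state (indexed by the input byte), that emit[s] tells whether state s reports matches, and that hits maps a state to the pattern numbers it reports. These layout details, and the names CompiledTableSketch, SearchTable and MakeTableForAB, are hypothetical; the actual multipattern -A output and FSM::CCompiledFSM may be organized differently.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Hypothetical compiled-FSM layout, loosely modelled on the generated data above.
struct CompiledTableSketch {
    std::vector<std::size_t> states;                       // states[s * 256 + byte] = next state
    std::vector<bool> emit;                                // emit[s]: state s reports matches
    std::map<std::size_t, std::vector<std::size_t>> hits;  // state -> pattern numbers
};

// One pass over a NUL-terminated string, starting in state 0.
void SearchTable(const char* input, const CompiledTableSketch& fsm,
                 const std::function<void(std::size_t)>& found_callback) {
    std::size_t state = 0;
    for (const char* p = input; *p; ++p) {
        state = fsm.states[state * 256 + static_cast<unsigned char>(*p)];
        if (fsm.emit[state])
            for (std::size_t n : fsm.hits.at(state)) found_callback(n);
    }
}

// Hand-built table for the single pattern "ab": state 0 = start, 1 = "a", 2 = "ab".
CompiledTableSketch MakeTableForAB() {
    CompiledTableSketch t;
    t.states.assign(3 * 256, 0);  // by default, fall back to the start state
    for (std::size_t s = 0; s < 3; ++s)
        t.states[s * 256 + 'a'] = 1;  // an 'a' always begins a new candidate match
    t.states[1 * 256 + 'b'] = 2;      // "a" followed by 'b' completes "ab"
    t.emit = {false, false, true};
    t.hits[2] = {0};                  // state 2 reports pattern 0
    return t;
}
```

All of the work per input byte is one table lookup plus a check of emit, which is what makes a precompiled table attractive when the patterns are fixed in advance.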

◆ Search() [6/16]

static void CMultipatternSearch::Search ( const char *  input,
const FSM::CCompiledFSM &  fsm,
VoidCall2  found_callback 
)
static

◆ Search() [7/16]

void CMultipatternSearch::Search ( const char *  input,
VoidCall1  found_callback 
) const

◆ Search() [8/16]

void CMultipatternSearch::Search ( const char *  input,
VoidCall2  found_callback 
) const

Definition at line 185 of file multipattern_search.cpp.

References input(), m_FSM, and xMultiPatternSearch().

◆ Search() [9/16]

void CMultipatternSearch::Search ( const string &  input,
BoolCall1  found_callback 
) const
inline

Definition at line 154 of file multipattern_search.hpp.

References input(), and Search().

Referenced by Search().

◆ Search() [10/16]

void CMultipatternSearch::Search ( const string &  input,
BoolCall2  found_callback 
) const
inline

Definition at line 156 of file multipattern_search.hpp.

References input(), and Search().

Referenced by Search().

◆ Search() [11/16]

static void CMultipatternSearch::Search ( const string &  input,
const FSM::CCompiledFSM &  fsm,
BoolCall1  found_callback 
)
inlinestatic

Definition at line 203 of file multipattern_search.hpp.

References input(), and Search().

Referenced by Search().

◆ Search() [12/16]

static void CMultipatternSearch::Search ( const string &  input,
const FSM::CCompiledFSM &  fsm,
BoolCall2  found_callback 
)
inlinestatic

Definition at line 205 of file multipattern_search.hpp.

References input(), and Search().

Referenced by Search().

◆ Search() [13/16]

static void CMultipatternSearch::Search ( const string &  input,
const FSM::CCompiledFSM &  fsm,
VoidCall1  found_callback 
)
inlinestatic

Definition at line 199 of file multipattern_search.hpp.

References input(), and Search().

Referenced by Search().

◆ Search() [14/16]

static void CMultipatternSearch::Search ( const string &  input,
const FSM::CCompiledFSM &  fsm,
VoidCall2  found_callback 
)
inlinestatic

Definition at line 201 of file multipattern_search.hpp.

References input(), and Search().

Referenced by Search().

◆ Search() [15/16]

void CMultipatternSearch::Search ( const string &  input,
VoidCall1  found_callback 
) const
inline

Definition at line 150 of file multipattern_search.hpp.

References input(), and Search().

Referenced by Search().

◆ Search() [16/16]

void CMultipatternSearch::Search ( const string &  input,
VoidCall2  found_callback 
) const
inline

Definition at line 152 of file multipattern_search.hpp.

References input(), and Search().

Referenced by Search().

Member Data Documentation

◆ m_FSM

unique_ptr<CRegExFSA> CMultipatternSearch::m_FSM
private

Finite State Machine that does all the work.

Definition at line 210 of file multipattern_search.hpp.

Referenced by AddPattern(), AddPatterns(), GenerateArrayMapData(), GenerateDotGraph(), GenerateSourceCode(), and Search().


The documentation for this class was generated from the following files:
Modified on Wed Sep 04 15:03:19 2024 by modify_doxy.py rev. 669887