NCBI C++ ToolKit
#include <util/multipattern_search.hpp>
Public Types
enum EFlags { fNoCase = 1 << 0, fBeginString = 1 << 1, fEndString = 1 << 2, fWholeString = fBeginString | fEndString, fBeginWord = 1 << 3, fEndWord = 1 << 4, fWholeWord = fBeginWord | fEndWord }
Search flags (for non-RegEx patterns only!)
enum EOnFind { eStopSearch, eContinueSearch }
When the pattern is found, the search can be stopped or continued.
Public Member Functions
CMultipatternSearch ()
~CMultipatternSearch ()
DECLARE_SAFE_FLAGS_TYPE (EFlags, TFlags)
void GenerateDotGraph (ostream &out) const
Generate graphical representation of the FSM in DOT format.
void GenerateArrayMapData (ostream &out) const
Generate C++ array/map data.
void GenerateSourceCode (ostream &out) const
Generate C code for the FSM search.
void AddPattern (const char *pattern, TFlags flags=0)
void AddPattern (const string &pattern, TFlags flags=0)
void AddPatterns (const vector< string > &patterns)
void AddPatterns (const vector< pair< string, TFlags >> &patterns)
Static Public Member Functions
static string QuoteString (const string &str)
Quote special characters so that a string can be inserted into a regular expression.
typedef std::function< void(size_t)> VoidCall1
typedef std::function< void(size_t, size_t)> VoidCall2
typedef std::function< bool(size_t)> BoolCall1
typedef std::function< bool(size_t, size_t)> BoolCall2
void Search (const char *input, VoidCall1 found_callback) const
void Search (const string &input, VoidCall1 found_callback) const
void Search (const char *input, VoidCall2 found_callback) const
void Search (const string &input, VoidCall2 found_callback) const
void Search (const char *input, BoolCall1 found_callback) const
void Search (const string &input, BoolCall1 found_callback) const
void Search (const char *input, BoolCall2 found_callback) const
void Search (const string &input, BoolCall2 found_callback) const
unique_ptr< CRegExFSA > m_FSM
static void Search (const char *input, const FSM::CCompiledFSM &fsm, VoidCall1 found_callback)
static void Search (const string &input, const FSM::CCompiledFSM &fsm, VoidCall1 found_callback)
static void Search (const char *input, const FSM::CCompiledFSM &fsm, VoidCall2 found_callback)
static void Search (const string &input, const FSM::CCompiledFSM &fsm, VoidCall2 found_callback)
static void Search (const char *input, const FSM::CCompiledFSM &fsm, BoolCall1 found_callback)
static void Search (const string &input, const FSM::CCompiledFSM &fsm, BoolCall1 found_callback)
static void Search (const char *input, const FSM::CCompiledFSM &fsm, BoolCall2 found_callback)
static void Search (const string &input, const FSM::CCompiledFSM &fsm, BoolCall2 found_callback)
Simultaneous search of multiple string or RegEx patterns in the input string.
CMultipatternSearch builds a Finite State Machine (FSM) from a list of search strings or regular expression patterns. It can then search for all patterns simultaneously, requiring only a single traversal of the input string. Use this class to improve search performance when the number of search patterns is large (10 or more). If the patterns are known in advance, the FSM can be exported as C code and compiled for further performance improvement.
Definition at line 63 of file multipattern_search.hpp.
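The single-pass idea above can be illustrated with the classic Aho-Corasick automaton. This is a minimal sketch for literal patterns only, written against the C++17 standard library; it is not the toolkit's CRegExFSA, which also compiles RegEx patterns and whose internals are not described here. The class name AhoCorasick and its Search signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Literal-only Aho-Corasick automaton: a trie of the patterns plus failure
// links, so every pattern is matched during one left-to-right pass.
class AhoCorasick {
public:
    explicit AhoCorasick(const std::vector<std::string>& patterns) {
        nodes_.emplace_back();  // state 0 is the root (empty prefix)
        for (std::size_t i = 0; i < patterns.size(); ++i) {
            std::size_t s = 0;
            for (unsigned char c : patterns[i]) {
                auto it = nodes_[s].next.find(c);
                if (it != nodes_[s].next.end()) {
                    s = it->second;
                } else {
                    const std::size_t child = nodes_.size();
                    nodes_[s].next[c] = child;
                    nodes_.emplace_back();
                    s = child;
                }
            }
            nodes_[s].out.push_back(i);  // pattern i ends in state s
        }
        BuildFailureLinks();
    }

    // Calls report(pattern_index, end) for every occurrence, where end is the
    // offset one past the last character of the match.
    template <typename Report>
    void Search(const std::string& text, Report report) const {
        std::size_t s = 0;
        for (std::size_t pos = 0; pos < text.size(); ++pos) {
            s = Step(s, static_cast<unsigned char>(text[pos]));
            for (std::size_t p : nodes_[s].out) report(p, pos + 1);
        }
    }

private:
    struct Node {
        std::map<unsigned char, std::size_t> next;  // trie edges
        std::size_t fail = 0;                       // longest proper suffix state
        std::vector<std::size_t> out;               // patterns recognized here
    };

    // Follow failure links until an edge on c exists (or we are back at the root).
    std::size_t Step(std::size_t s, unsigned char c) const {
        for (;;) {
            auto it = nodes_[s].next.find(c);
            if (it != nodes_[s].next.end()) return it->second;
            if (s == 0) return 0;
            s = nodes_[s].fail;
        }
    }

    // Breadth-first, so every shallower state's link is ready when needed.
    void BuildFailureLinks() {
        std::queue<std::size_t> q;
        for (const auto& edge : nodes_[0].next) q.push(edge.second);  // depth-1 states fail to the root
        while (!q.empty()) {
            const std::size_t s = q.front();
            q.pop();
            for (const auto& edge : nodes_[s].next) {
                const std::size_t child = edge.second;
                nodes_[child].fail = Step(nodes_[s].fail, edge.first);
                const std::vector<std::size_t>& inherited = nodes_[nodes_[child].fail].out;
                nodes_[child].out.insert(nodes_[child].out.end(), inherited.begin(), inherited.end());
                q.push(child);
            }
        }
    }

    std::vector<Node> nodes_;
};
```

The scan reads each input character once (failure-link hops are amortized over the scan), so adding patterns enlarges the automaton rather than adding passes over the input.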
typedef std::function<bool(size_t)> CMultipatternSearch::BoolCall1 |
Definition at line 146 of file multipattern_search.hpp.
typedef std::function<bool(size_t, size_t)> CMultipatternSearch::BoolCall2 |
Definition at line 147 of file multipattern_search.hpp.
typedef std::function<void(size_t)> CMultipatternSearch::VoidCall1 |
Run the FSM search on the input string.
input | Input string. |
found_callback | Function or function-like object to call when a pattern is found. It can accept one or two parameters and return void or EOnFind. If it returns eStopSearch, the search terminates. If it returns eContinueSearch or void, the search continues. |
Definition at line 144 of file multipattern_search.hpp.
typedef std::function<void(size_t, size_t)> CMultipatternSearch::VoidCall2 |
Definition at line 145 of file multipattern_search.hpp.
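The callback contract described for VoidCall1 (one or two parameters, returning void or EOnFind) can be sketched as a small adapter layer. This is a hypothetical stand-alone illustration: NaiveSearch is a brute-force stand-in for the FSM, a single template replaces the toolkit's separate overloads, and the argument order of the two-parameter form (pattern index, then position) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>
#include <vector>

// Mirror of the documented enum: eStopSearch is 0, so it converts to false.
enum EOnFind { eStopSearch, eContinueSearch };

// Hypothetical core loop: brute-force scan (not an FSM) that reports
// (pattern index, start position) and stops as soon as report returns false.
inline void NaiveSearch(const std::string& input,
                        const std::vector<std::string>& patterns,
                        const std::function<bool(std::size_t, std::size_t)>& report) {
    for (std::size_t pos = 0; pos < input.size(); ++pos) {
        for (std::size_t p = 0; p < patterns.size(); ++p) {
            const std::string& pat = patterns[p];
            if (!pat.empty() && input.compare(pos, pat.size(), pat) == 0 && !report(p, pos)) {
                return;
            }
        }
    }
}

// Adapts the four callback shapes (one or two size_t arguments, returning
// void or EOnFind) to the single two-argument, bool-returning form.
template <typename F>
void Search(const std::string& input, const std::vector<std::string>& patterns, F found_callback) {
    NaiveSearch(input, patterns, [&found_callback](std::size_t p, std::size_t pos) -> bool {
        if constexpr (std::is_invocable_v<F&, std::size_t, std::size_t>) {
            if constexpr (std::is_void_v<std::invoke_result_t<F&, std::size_t, std::size_t>>) {
                found_callback(p, pos);
                return true;  // void callbacks never stop the search
            } else {
                return found_callback(p, pos) != eStopSearch;
            }
        } else {
            if constexpr (std::is_void_v<std::invoke_result_t<F&, std::size_t>>) {
                found_callback(p);
                return true;
            } else {
                return found_callback(p) != eStopSearch;
            }
        }
    });
}
```

Usage: `Search(text, patterns, [&](size_t n) { hits.push_back(n); })` collects every match, while a two-parameter callback that returns eStopSearch ends the scan at the first match it rejects.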
Search flags (for non-RegEx patterns only!)
Enumerator | Value
---|---
fNoCase | 1
fBeginString | 2
fEndString | 4
fWholeString | 6 (fBeginString and fEndString combined)
fBeginWord | 8
fEndWord | 16
fWholeWord | 24 (fBeginWord and fEndWord combined)
Definition at line 70 of file multipattern_search.hpp.
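The composite flags are plain bitwise ORs of single-bit values, so they can be tested with `&`. The sketch below copies the enumerator values and adds a hypothetical AcceptMatch helper showing one plausible reading of the positional flags for a literal match. The definition of a word character (alphanumeric or '_') and the helper itself are assumptions; fNoCase is left out because it affects character comparison rather than match position.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Values copied from the EFlags declaration.
enum EFlags : unsigned {
    fNoCase      = 1 << 0,
    fBeginString = 1 << 1,
    fEndString   = 1 << 2,
    fWholeString = fBeginString | fEndString,
    fBeginWord   = 1 << 3,
    fEndWord     = 1 << 4,
    fWholeWord   = fBeginWord | fEndWord
};
using TFlags = unsigned;  // stand-in for the toolkit's safe-flags TFlags type

// Assumption: a word character is alphanumeric or '_'.
inline bool IsWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical check of the positional flags for a literal match
// occupying input[pos, pos + len).
inline bool AcceptMatch(const std::string& input, std::size_t pos, std::size_t len, TFlags flags) {
    const std::size_t end = pos + len;
    if ((flags & fBeginString) && pos != 0) return false;
    if ((flags & fEndString) && end != input.size()) return false;
    if ((flags & fBeginWord) && pos > 0 && IsWordChar(input[pos - 1])) return false;
    if ((flags & fEndWord) && end < input.size() && IsWordChar(input[end])) return false;
    return true;
}
```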
When the pattern is found, the search can be stopped or continued.
Enumerator | Value
---|---
eStopSearch | 0
eContinueSearch | 1
Definition at line 120 of file multipattern_search.hpp.
CMultipatternSearch::CMultipatternSearch ( )
Definition at line 40 of file multipattern_search.cpp.
CMultipatternSearch::~CMultipatternSearch ( )
Definition at line 41 of file multipattern_search.cpp.
void CMultipatternSearch::AddPattern ( const char * pattern, TFlags flags = 0 )
Add a search pattern to the FSM.
pattern | A search pattern to add to the FSM. If it begins with a slash (/), it is treated as a RegEx. |
flags | Additional search conditions. If the pattern is a RegEx, the flags are ignored. |
Definition at line 43 of file multipattern_search.cpp.
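The AddPattern rule (a leading slash marks a RegEx, whose flags are then ignored) can be expressed as a small classifier. PatternInfo, ClassifyPattern and ClassifyAll are hypothetical; the sketch only inspects the first character and does not parse the rest of the RegEx, and it assumes that the pair-based AddPatterns overload applies the same rule to each entry.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using TFlags = unsigned;  // stand-in for the toolkit's TFlags

struct PatternInfo {
    bool is_regex;     // true if the pattern starts with '/'
    std::string text;  // pattern exactly as given
    TFlags flags;      // effective flags: 0 for RegEx, since they are ignored
};

// Hypothetical helper mirroring the documented AddPattern rule.
inline PatternInfo ClassifyPattern(const std::string& pattern, TFlags flags = 0) {
    const bool is_regex = !pattern.empty() && pattern.front() == '/';
    return PatternInfo{is_regex, pattern, is_regex ? 0u : flags};
}

// Assumed behavior of the (pattern, flags) list form: classify each entry.
inline std::vector<PatternInfo> ClassifyAll(const std::vector<std::pair<std::string, TFlags>>& patterns) {
    std::vector<PatternInfo> result;
    for (const auto& [text, flags] : patterns) result.push_back(ClassifyPattern(text, flags));
    return result;
}
```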
Definition at line 92 of file multipattern_search.hpp.
References AddPattern(), and flags.
Referenced by AddPattern().
Definition at line 54 of file multipattern_search.cpp.
Definition at line 45 of file multipattern_search.cpp.
CMultipatternSearch::DECLARE_SAFE_FLAGS_TYPE ( EFlags, TFlags )
void CMultipatternSearch::GenerateArrayMapData ( ostream & out ) const
Generate C++ array/map data.
out | A stream to receive the output. |
Definition at line 65 of file multipattern_search.cpp.
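As an illustration of what array/map data could look like, the sketch below writes tables in the shape consumed by the Screen example in the compiled-FSM Search documentation: a `states` array, an `emit` array and a `hits` map, declared through the _FSM_STATES, _FSM_EMIT and _FSM_HITS macros. The FlatFSM layout (256 transitions per state) and WriteArrayMapData are assumptions, not the toolkit's actual output format.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical compiled-FSM layout (assumption):
//   states[s * 256 + byte] -> next state
//   emit[s]                -> true if state s reports matches
//   hits[s]                -> indices of the patterns matched in state s
struct FlatFSM {
    std::vector<std::size_t> states;
    std::vector<bool> emit;
    std::map<std::size_t, std::vector<std::size_t>> hits;
};

// Writes initializers meant to follow macro declarators such as
// "#define _FSM_STATES static size_t states[]".
inline void WriteArrayMapData(const FlatFSM& fsm, std::ostream& out) {
    out << "_FSM_STATES = {";
    for (std::size_t i = 0; i < fsm.states.size(); ++i) out << (i ? ", " : " ") << fsm.states[i];
    out << " };\n_FSM_EMIT = {";
    for (std::size_t i = 0; i < fsm.emit.size(); ++i) out << (i ? ", " : " ") << (fsm.emit[i] ? "true" : "false");
    out << " };\n_FSM_HITS = {";
    bool first = true;
    for (const auto& [state, patterns] : fsm.hits) {
        out << (first ? " " : ", ") << "{ " << state << ", {";
        for (std::size_t j = 0; j < patterns.size(); ++j) out << (j ? ", " : " ") << patterns[j];
        out << " } }";
        first = false;
    }
    out << " };\n";
}
```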
void CMultipatternSearch::GenerateDotGraph ( ostream & out ) const
Generate graphical representation of the FSM in DOT format.
For more details, see http://www.graphviz.org/
out | A stream to receive the output. |
Definition at line 63 of file multipattern_search.cpp.
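A DOT graph is a list of nodes and labeled edges. The hypothetical WriteDot below shows the general shape of such output for a hand-written transition list; the Edge type, the node naming and the styling (double circles for accepting states) are assumptions and need not match what GenerateDotGraph emits.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

struct Edge {
    std::size_t from;
    char label;
    std::size_t to;
};

// Writes a left-to-right DOT digraph; accepting states are drawn as double circles.
inline void WriteDot(const std::vector<Edge>& edges, const std::set<std::size_t>& accepting, std::ostream& out) {
    out << "digraph FSM {\n";
    out << "  rankdir=LR;\n";
    for (std::size_t s : accepting) out << "  " << s << " [shape=doublecircle];\n";
    for (const Edge& e : edges) {
        out << "  " << e.from << " -> " << e.to << " [label=\"";
        if (e.label == '"' || e.label == '\\') out << '\\';  // escape DOT string specials
        out << e.label << "\"];\n";
    }
    out << "}\n";
}
```

The result can be rendered with Graphviz, for example `dot -Tpng fsm.dot -o fsm.png`.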
void CMultipatternSearch::GenerateSourceCode ( ostream & out ) const
Generate C code for the FSM search.
out | A stream to receive the output. |
Definition at line 67 of file multipattern_search.cpp.
Quote special characters so that a string can be inserted into a regular expression.
Definition at line 69 of file multipattern_search.cpp.
Referenced by CSearch_func::GetRegex().
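A minimal sketch of regex quoting, assuming the special characters are the ECMAScript SyntaxCharacter set (^ $ \ . * + ? ( ) [ ] { } | /). QuoteForRegex is a hypothetical stand-in, and the toolkit's QuoteString may escape a different set. Each special character is prefixed with a backslash so that the result matches the original text literally.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical quoting helper: backslash-escapes ECMAScript syntax characters.
inline std::string QuoteForRegex(const std::string& str) {
    static const std::string kSpecial = "^$\\.*+?()[]{}|/";
    std::string out;
    out.reserve(str.size() * 2);
    for (char c : str) {
        if (kSpecial.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}
```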
Definition at line 192 of file multipattern_search.cpp.
References input(), m_FSM, and xMultiPatternSearch().
Definition at line 199 of file multipattern_search.cpp.
References input(), m_FSM, and xMultiPatternSearch().
static
Run the FSM search on the input string using the structure prebuilt by multipattern -A.
input | Input string. |
states | generated by multipattern -A |
emit | generated by multipattern -A |
hits | generated by multipattern -A |
found_callback | Function or function-like object to call when the pattern is found. It can accept one or two parameters and return void or EOnFind. If it returns eStopSearch, the search terminates. If it returns eContinueSearch or void, the search continues. |
Example:
    static void Screen(const char* str, bool* result)
    {
        // Macros used by the generated file to declare its tables
        #define _FSM_EMIT static bool emit[]
        #define _FSM_HITS static map<size_t, vector<size_t>> hits
        #define _FSM_STATES static size_t states[]
        #include "GENERATED_FILE.inc"
        #undef _FSM_EMIT
        #undef _FSM_HITS
        #undef _FSM_STATES
        // Set result[n] for every pattern n found in str
        CMultipatternSearch::Search(str, states, emit, hits, [result](size_t n) { result[n] = true; });
    }
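The Screen example boils down to a table-driven loop: one array lookup per input byte, then a check of the current state's hits. The sketch below shows such a loop under an assumed table layout (256 entries per state in `states`, state 0 as the start, `emit` and `hits` indexed by state). SearchCompiled is hypothetical and is not the toolkit's Search, whose prebuilt structures may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Assumed layout:
//   states[s * 256 + byte] : next state after reading byte in state s
//   emit[s]                : true if state s reports at least one pattern
//   hits[s]                : indices of the patterns reported in state s
inline void SearchCompiled(const char* input,
                           const std::size_t* states,
                           const bool* emit,
                           const std::map<std::size_t, std::vector<std::size_t>>& hits,
                           const std::function<void(std::size_t)>& found_callback) {
    std::size_t s = 0;  // assumed start state
    for (const char* p = input; *p != '\0'; ++p) {
        s = states[s * 256 + static_cast<unsigned char>(*p)];
        if (!emit[s]) continue;  // cheap check before the map lookup
        auto it = hits.find(s);
        if (it == hits.end()) continue;
        for (std::size_t n : it->second) found_callback(n);
    }
}
```

As in the Screen example, the callback can simply set `result[n] = true` for each pattern index it receives.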
static
Definition at line 178 of file multipattern_search.cpp.
References input(), m_FSM, and xMultiPatternSearch().
Referenced by FindFlatfileText(), CSuspect_rule_set::Screen(), and CMatchString::x_PopWeasel().
Definition at line 185 of file multipattern_search.cpp.
References input(), m_FSM, and xMultiPatternSearch().
Definition at line 154 of file multipattern_search.hpp.
References input(), and Search().
Referenced by Search().
Definition at line 156 of file multipattern_search.hpp.
References input(), and Search().
Referenced by Search().
inline static
Definition at line 203 of file multipattern_search.hpp.
References input(), and Search().
Referenced by Search().
inline static
Definition at line 205 of file multipattern_search.hpp.
References input(), and Search().
Referenced by Search().
inline static
Definition at line 199 of file multipattern_search.hpp.
References input(), and Search().
Referenced by Search().
inline static
Definition at line 201 of file multipattern_search.hpp.
References input(), and Search().
Referenced by Search().
Definition at line 150 of file multipattern_search.hpp.
References input(), and Search().
Referenced by Search().
Definition at line 152 of file multipattern_search.hpp.
References input(), and Search().
Referenced by Search().
private
Finite State Machine that does all the work.
Definition at line 210 of file multipattern_search.hpp.
Referenced by AddPattern(), AddPatterns(), GenerateArrayMapData(), GenerateDotGraph(), GenerateSourceCode(), and Search().