NCBI C++ ToolKit
CWinMaskCountsGenerator encapsulates the n-mer frequency counts generation functionality of winmasker.
#include <algo/winmask/win_mask_gen_counts.hpp>
Classes | |
class | GenCountsException |
Exceptions that CWinMaskCountsGenerator may throw. | |
Private Member Functions | |
void | process (Uint4 prefix, Uint1 prefix_size, const vector< string > &input, bool do_output) |
Uint8 | fastalen (const string &fname) const |
Private Attributes | |
string | input |
CRef< CSeqMaskerOstat > | ustat |
Uint8 | max_mem |
Uint4 | unit_size |
Uint8 | genome_size |
Uint4 | min_count |
Uint4 | max_count |
Uint4 | t_high |
bool | has_min_count |
bool | no_extra_pass |
bool | check_duplicates |
bool | use_list |
Uint4 | total_ecodes |
vector< Uint4 > | score_counts |
double | th [4] |
const CWinMaskUtil::CIdSet * | ids |
const CWinMaskUtil::CIdSet * | exclude_ids |
string | infmt |
This class encapsulates the n-mer frequency counts generation functionality of winmasker.
Definition at line 61 of file win_mask_gen_counts.hpp.
CWinMaskCountsGenerator::CWinMaskCountsGenerator ( const string & input,
    const string & output,
    const string & infmt,
    const string & sformat,
    const string & th,
    Uint4 mem_avail,
    Uint1 unit_size,
    Uint8 genome_size,
    Uint4 min_count,
    Uint4 max_count,
    bool check_duplicates,
    bool use_list,
    const CWinMaskUtil::CIdSet * ids,
    const CWinMaskUtil::CIdSet * exclude_ids,
    bool use_ba,
    string const & metadata,
    double min_pct = -1.0,
    double extend_pct = -1.0,
    double thres_pct = -1.0,
    double max_pct = -1.0
)
Constructor.
Creates an instance based on configuration parameters.
input | input file name, or the name of a file containing a list of input files (one per line), depending on the value of the use_list parameter |
output | name of the output file (empty means standard output) |
infmt | input format |
sformat | counts format |
th | string describing 4 percentage values (comma separated) used to compute winmask score thresholds |
mem_avail | memory (in megabytes) available to the function |
unit_size | n-mer size (value of n) |
min_count | do not consider n-mers with counts less than the value of this parameter |
max_count | maximum n-mer count to consider in winmask thresholds computations |
check_duplicates | true if input checking for duplicates is requested; false otherwise |
use_list | true if input file contains the list of fasta file names; false if input is the name of the fasta file itself |
ids | set of ids to consider |
exclude_ids | set of ids to ignore |
use_ba | use bit array optimization for optimized binary unit counts format |
metadata | the metadata string |
min_pct | min score as percentage of counts |
extend_pct | interval extension score as percentage of counts |
thres_pct | masking threshold score as percentage of counts |
max_pct | max score as percentage of counts |
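The th parameter is described as a string of 4 comma-separated percentage values. A minimal sketch of how such a string could be parsed is shown below. ParseThresholdPercentages is a hypothetical helper, not part of the toolkit (the toolkit source references NStr::Split for this kind of splitting), and the 0 to 100 range check is an assumption made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a toolkit function): parses "a,b,c,d" into four
// percentage values. Throws std::invalid_argument on malformed input.
std::array<double, 4> ParseThresholdPercentages(const std::string& th)
{
    std::array<double, 4> result{};
    std::istringstream in(th);
    std::string field;
    std::size_t n = 0;

    while (std::getline(in, field, ',')) {
        if (n == result.size()) {
            throw std::invalid_argument("more than 4 values in: " + th);
        }
        std::size_t used = 0;
        // std::stod throws std::invalid_argument on non-numeric text.
        const double value = std::stod(field, &used);
        // Assumption: each value is a percentage in [0, 100] with no
        // trailing characters after the number.
        if (used != field.size() || value < 0.0 || value > 100.0) {
            throw std::invalid_argument("bad percentage: " + field);
        }
        result[n++] = value;
    }
    if (n != result.size()) {
        throw std::invalid_argument("expected 4 values in: " + th);
    }
    return result;
}
```

Calling ParseThresholdPercentages("90,99,99.5,99.8") yields the four values in order; a string with fewer than four fields is rejected.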
Definition at line 171 of file win_mask_gen_counts.cpp.
CWinMaskCountsGenerator::CWinMaskCountsGenerator ( const string & input,
    CNcbiOstream & os,
    const string & infmt,
    const string & sformat,
    const string & th,
    Uint4 mem_avail,
    Uint1 unit_size,
    Uint8 genome_size,
    Uint4 min_count,
    Uint4 max_count,
    bool check_duplicates,
    bool use_list,
    const CWinMaskUtil::CIdSet * ids,
    const CWinMaskUtil::CIdSet * exclude_ids,
    bool use_ba,
    string const & metadata,
    double min_pct = -1.0,
    double extend_pct = -1.0,
    double thres_pct = -1.0,
    double max_pct = -1.0
)
Constructor.
Creates an instance based on configuration parameters.
input | input file name, or the name of a file containing a list of input files (one per line), depending on the value of the use_list parameter |
os | the output stream |
infmt | input format |
sformat | counts format |
th | string describing 4 percentage values (comma separated) used to compute winmask score thresholds |
mem_avail | memory (in megabytes) available to the function |
unit_size | n-mer size (value of n) |
min_count | do not consider n-mers with counts less than the value of this parameter |
max_count | maximum n-mer count to consider in winmask thresholds computations |
check_duplicates | true if input checking for duplicates is requested; false otherwise |
use_list | true if input file contains the list of fasta file names; false if input is the name of the fasta file itself |
ids | set of ids to consider |
exclude_ids | set of ids to ignore |
use_ba | use bit array optimization for optimized binary unit counts format |
metadata | the metadata string |
min_pct | min score as percentage of counts |
extend_pct | interval extension score as percentage of counts |
thres_pct | masking threshold score as percentage of counts |
max_pct | max score as percentage of counts |
Definition at line 124 of file win_mask_gen_counts.cpp.
CWinMaskCountsGenerator::~CWinMaskCountsGenerator ( )
Object destructor.
Definition at line 223 of file win_mask_gen_counts.cpp.
Uint8 CWinMaskCountsGenerator::fastalen ( const string & fname ) const | private |
Definition at line 104 of file win_mask_gen_counts.cpp.
References CWinMaskUtil::consider(), exclude_ids, CBioseq_Handle::GetBioseqLength(), ids, infmt, and result.
Referenced by operator()().
void CWinMaskCountsGenerator::operator() ( void )
This function does the actual n-mer counting.
Determines the prefix length based on the available memory and calls process for each prefix to compute partial counts.
Definition at line 226 of file win_mask_gen_counts.cpp.
References _TRACE, check_duplicates, CheckDuplicates(), exclude_ids, fastalen(), CSeqMaskerOstat::finalize(), genome_size, has_min_count, i, ids, infmt, input, LOG_POST, max_count, max_mem, min_count, NCBI_ASSERT, NCBI_THROW, no_extra_pass, offset, process(), score_counts, CSeqMaskerOstat::setComment(), CSeqMaskerOstat::SetCount(), CSeqMaskerOstat::SetMaxCount(), CSeqMaskerOstat::setParam(), CSeqMaskerOstat::setUnitSize(), NStr::Split(), t_high, th, total_ecodes, unit_size, use_list, and ustat.
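The description of operator() says the prefix length is chosen from the available memory and that process is then called once per prefix. One way this could work is sketched below. The sizing model is an assumption, not taken from the toolkit source: each n-mer is encoded with 2 bits per base, and one pass holds a table of 4-byte counters for every possible suffix. ChoosePrefixSize and NumPasses are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: returns the smallest prefix length (in bases) for
// which a table of 4-byte counters, one per possible suffix of length
// unit_size - prefix_size, fits into max_mem_bytes.
std::uint8_t ChoosePrefixSize(std::uint8_t unit_size, std::uint64_t max_mem_bytes)
{
    assert(unit_size >= 1 && unit_size <= 16);
    for (std::uint8_t prefix_size = 0; prefix_size <= unit_size; ++prefix_size) {
        const unsigned suffix_bits = 2u * (unit_size - prefix_size);
        const std::uint64_t table_bytes =
            (std::uint64_t{1} << suffix_bits) * sizeof(std::uint32_t);
        if (table_bytes <= max_mem_bytes) {
            return prefix_size;
        }
    }
    throw std::runtime_error("not enough memory for even a single counter");
}

// Number of passes over the input implied by a prefix length: one pass
// per possible prefix, i.e. 4^prefix_size.
std::uint64_t NumPasses(std::uint8_t prefix_size)
{
    return std::uint64_t{1} << (2u * prefix_size);
}
```

Under this model, a longer prefix trades memory for time: each extra prefix base divides the table size by four and multiplies the number of passes by four. Note that mem_avail is documented in megabytes, so a caller would convert it to bytes first.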
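void CWinMaskCountsGenerator::process ( Uint4 prefix, Uint1 prefix_size, const vector< string > & input, bool do_output ) | private |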
Definition at line 409 of file win_mask_gen_counts.cpp.
References _TRACE, ambig(), CWinMaskUtil::consider(), count, data, CBioseq_Handle::eCoding_Iupac, exclude_ids, CObjectManager::GetInstance(), i, ids, infmt, letter(), max_count, min_count, om, reverse_complement(), score_counts, CSeqMaskerOstat::setUnitCount(), t_high, total_ecodes, unit_size, and ustat.
Referenced by operator()().
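The references of process() (ambig(), letter(), reverse_complement(), unit_size) suggest a pass that slides over each sequence, skips ambiguous bases, and counts n-mers belonging to the current prefix. The sketch below shows one such pass over a plain string. It is an illustration under stated assumptions, not the toolkit's implementation: the 2-bit encoding (A=0, C=1, G=2, T=3), the choice of the smaller of an n-mer and its reverse complement as the counted unit, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Returns the 2-bit code of an unambiguous base, or -1 for anything else.
int BaseCode(char c)
{
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default:            return -1;
    }
}

// Reverse complement of a 2-bit encoded unit of unit_size bases.
std::uint64_t ReverseComplement(std::uint64_t unit, unsigned unit_size)
{
    std::uint64_t rc = 0;
    for (unsigned i = 0; i < unit_size; ++i) {
        rc = (rc << 2) | (3u - (unit & 3u));  // complement of code x is 3 - x
        unit >>= 2;
    }
    return rc;
}

// Counts units of seq whose first prefix_size bases encode to prefix.
// counts is indexed by the remaining suffix and must have
// 4^(unit_size - prefix_size) entries.
void CountUnitsWithPrefix(const std::string& seq, unsigned unit_size,
                          std::uint64_t prefix, unsigned prefix_size,
                          std::vector<std::uint32_t>& counts)
{
    assert(unit_size >= 1 && unit_size <= 16 && prefix_size <= unit_size);
    const unsigned suffix_bits = 2 * (unit_size - prefix_size);
    const std::uint64_t unit_mask = (std::uint64_t{1} << (2 * unit_size)) - 1;
    const std::uint64_t suffix_mask = (std::uint64_t{1} << suffix_bits) - 1;
    assert(counts.size() == suffix_mask + 1);

    std::uint64_t unit = 0;
    unsigned valid = 0;  // consecutive unambiguous bases seen, capped at unit_size
    for (char c : seq) {
        const int code = BaseCode(c);
        if (code < 0) {  // ambiguous base: restart the window
            valid = 0;
            unit = 0;
            continue;
        }
        unit = ((unit << 2) | static_cast<std::uint64_t>(code)) & unit_mask;
        if (valid < unit_size) ++valid;
        if (valid < unit_size) continue;

        const std::uint64_t rc = ReverseComplement(unit, unit_size);
        const std::uint64_t canonical = unit < rc ? unit : rc;
        if ((canonical >> suffix_bits) == prefix) {
            ++counts[canonical & suffix_mask];
        }
    }
}
```

For the sequence "ACGT" with unit_size 2, the windows AC and GT are reverse complements of each other and are counted as the same unit, while CG is its own reverse complement.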
bool CWinMaskCountsGenerator::check_duplicates | private |
Definition at line 248 of file win_mask_gen_counts.hpp.
Referenced by operator()().
const CWinMaskUtil::CIdSet * CWinMaskCountsGenerator::exclude_ids | private |
Definition at line 256 of file win_mask_gen_counts.hpp.
Referenced by fastalen(), operator()(), and process().
Uint8 CWinMaskCountsGenerator::genome_size | private |
Definition at line 242 of file win_mask_gen_counts.hpp.
Referenced by operator()().
bool CWinMaskCountsGenerator::has_min_count | private |
Definition at line 246 of file win_mask_gen_counts.hpp.
Referenced by operator()().
const CWinMaskUtil::CIdSet * CWinMaskCountsGenerator::ids | private |
Definition at line 255 of file win_mask_gen_counts.hpp.
Referenced by fastalen(), operator()(), and process().
string CWinMaskCountsGenerator::infmt | private |
Definition at line 258 of file win_mask_gen_counts.hpp.
Referenced by fastalen(), operator()(), and process().
string CWinMaskCountsGenerator::input | private |
Definition at line 238 of file win_mask_gen_counts.hpp.
Referenced by operator()().
Uint4 CWinMaskCountsGenerator::max_count | private |
Definition at line 244 of file win_mask_gen_counts.hpp.
Referenced by operator()(), and process().
Uint8 CWinMaskCountsGenerator::max_mem | private |
Definition at line 240 of file win_mask_gen_counts.hpp.
Referenced by operator()().
Uint4 CWinMaskCountsGenerator::min_count | private |
Definition at line 243 of file win_mask_gen_counts.hpp.
Referenced by operator()(), and process().
bool CWinMaskCountsGenerator::no_extra_pass | private |
Definition at line 247 of file win_mask_gen_counts.hpp.
Referenced by operator()().
vector< Uint4 > CWinMaskCountsGenerator::score_counts | private |
Definition at line 252 of file win_mask_gen_counts.hpp.
Referenced by operator()(), and process().
Uint4 CWinMaskCountsGenerator::t_high | private |
Definition at line 245 of file win_mask_gen_counts.hpp.
Referenced by operator()(), and process().
double CWinMaskCountsGenerator::th [4] | private |
Definition at line 253 of file win_mask_gen_counts.hpp.
Referenced by CWinMaskCountsGenerator(), and operator()().
Uint4 CWinMaskCountsGenerator::total_ecodes | private |
Definition at line 251 of file win_mask_gen_counts.hpp.
Referenced by operator()(), and process().
Uint4 CWinMaskCountsGenerator::unit_size | private |
Definition at line 241 of file win_mask_gen_counts.hpp.
Referenced by operator()(), and process().
bool CWinMaskCountsGenerator::use_list | private |
Definition at line 249 of file win_mask_gen_counts.hpp.
Referenced by operator()().
CRef< CSeqMaskerOstat > CWinMaskCountsGenerator::ustat | private |
Definition at line 239 of file win_mask_gen_counts.hpp.
Referenced by operator()(), and process().