NCBI C++ ToolKit
CWinMaskCountsGenerator Class Reference

This class encapsulates the n-mer frequency counts generation functionality of winmasker.

#include <algo/winmask/win_mask_gen_counts.hpp>

Classes

class  GenCountsException
 Exceptions that CWinMaskCountsGenerator may throw.
 

Public Member Functions

 CWinMaskCountsGenerator (const string &input, const string &output, const string &infmt, const string &sformat, const string &th, Uint4 mem_avail, Uint1 unit_size, Uint8 genome_size, Uint4 min_count, Uint4 max_count, bool check_duplicates, bool use_list, const CWinMaskUtil::CIdSet *ids, const CWinMaskUtil::CIdSet *exclude_ids, bool use_ba, string const &metadata, double min_pct=-1.0, double extend_pct=-1.0, double thres_pct=-1.0, double max_pct=-1.0)
 Constructor.
 
 CWinMaskCountsGenerator (const string &input, CNcbiOstream &os, const string &infmt, const string &sformat, const string &th, Uint4 mem_avail, Uint1 unit_size, Uint8 genome_size, Uint4 min_count, Uint4 max_count, bool check_duplicates, bool use_list, const CWinMaskUtil::CIdSet *ids, const CWinMaskUtil::CIdSet *exclude_ids, bool use_ba, string const &metadata, double min_pct=-1.0, double extend_pct=-1.0, double thres_pct=-1.0, double max_pct=-1.0)
 Constructor.
 
 ~CWinMaskCountsGenerator ()
 Object destructor.
 
void operator() ()
 This function does the actual n-mer counting.
 

Private Member Functions

void process (Uint4 prefix, Uint1 prefix_size, const vector< string > &input, bool do_output)
 
Uint8 fastalen (const string &fname) const
 

Private Attributes

string input
 
CRef< CSeqMaskerOstat > ustat
 
Uint8 max_mem
 
Uint4 unit_size
 
Uint8 genome_size
 
Uint4 min_count
 
Uint4 max_count
 
Uint4 t_high
 
bool has_min_count
 
bool no_extra_pass
 
bool check_duplicates
 
bool use_list
 
Uint4 total_ecodes
 
vector< Uint4 > score_counts
 
double th [4]
 
const CWinMaskUtil::CIdSet * ids
 
const CWinMaskUtil::CIdSet * exclude_ids
 
string infmt
 

Detailed Description

This class encapsulates the n-mer frequency counts generation functionality of winmasker.

Definition at line 61 of file win_mask_gen_counts.hpp.
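
As a rough illustration of what n-mer frequency counting involves, the following standalone sketch counts every n-mer in a single sequence string. It does not use the toolkit and is not the class's implementation. The names base_code and count_nmers, the A/C/G/T to 0/1/2/3 encoding, and the limit of n <= 16 (so a packed n-mer fits in 32 bits) are all assumptions made for the example.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Map a base to a 2-bit code; returns -1 for anything else (for example N).
// The A/C/G/T -> 0/1/2/3 assignment is an assumption of this sketch.
int base_code(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default:            return -1;
    }
}

// Count every n-mer of length n (1 <= n <= 16) in seq.
// Keys are 2-bit-packed n-mers (first base in the most significant bits),
// values are occurrence counts. An unsupported n yields an empty map.
std::map<std::uint32_t, std::uint32_t> count_nmers(const std::string& seq,
                                                   unsigned n) {
    std::map<std::uint32_t, std::uint32_t> counts;
    if (n == 0 || n > 16) return counts;
    const std::uint64_t mask = (std::uint64_t{1} << (2 * n)) - 1;
    std::uint64_t unit = 0;
    unsigned valid = 0;  // consecutive valid bases ending at the current one
    for (char c : seq) {
        const int code = base_code(c);
        if (code < 0) {  // ambiguous base: restart the window
            valid = 0;
            unit = 0;
            continue;
        }
        unit = ((unit << 2) | static_cast<std::uint64_t>(code)) & mask;
        if (++valid >= n) ++counts[static_cast<std::uint32_t>(unit)];
    }
    return counts;
}
```

For example, count_nmers("ACGTACGT", 4) yields four distinct 4-mers, with ACGT (packed value 27 under the assumed encoding) counted twice.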

Constructor & Destructor Documentation

◆ CWinMaskCountsGenerator() [1/2]

CWinMaskCountsGenerator::CWinMaskCountsGenerator ( const string & input,
const string & output,
const string & infmt,
const string & sformat,
const string & th,
Uint4  mem_avail,
Uint1  unit_size,
Uint8  genome_size,
Uint4  min_count,
Uint4  max_count,
bool  check_duplicates,
bool  use_list,
const CWinMaskUtil::CIdSet * ids,
const CWinMaskUtil::CIdSet * exclude_ids,
bool  use_ba,
string const & metadata,
double  min_pct = -1.0,
double  extend_pct = -1.0,
double  thres_pct = -1.0,
double  max_pct = -1.0 
)

Constructor.

Creates an instance based on configuration parameters.

Parameters
input             input file name, or the name of a file containing a list of input files (one per line), depending on the value of the use_list parameter
output            name of the output file (empty means standard output)
infmt             input format
sformat           counts format
th                string describing 4 percentage values (comma separated) used to compute winmask score thresholds
mem_avail         memory (in megabytes) available to the function
unit_size         n-mer size (value of n)
min_count         do not consider n-mers with counts less than the value of this parameter
max_count         maximum n-mer count to consider in winmask thresholds computations
check_duplicates  true if input checking for duplicates is requested; false otherwise
use_list          true if the input file contains the list of FASTA file names; false if input is the name of the FASTA file itself
ids               set of ids to consider
exclude_ids       set of ids to ignore
use_ba            use bit array optimization for optimized binary unit counts format
metadata          the metadata string
min_pct           min score as percentage of counts
extend_pct        interval extension score as percentage of counts
thres_pct         masking threshold score as percentage of counts
max_pct           max score as percentage of counts

Definition at line 171 of file win_mask_gen_counts.cpp.

References count, and th.
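
The th parameter is described as a comma-separated string of 4 percentage values. A minimal sketch of how such a string could be split into four numbers is shown below. The function name parse_thresholds, the strict rejection of malformed fields, and the use of std::invalid_argument are assumptions of this example; the class may parse and validate the string differently and reports its own errors through GenCountsException.

```cpp
#include <array>
#include <cstddef>
#include <exception>
#include <sstream>
#include <stdexcept>
#include <string>

// Parse "p1,p2,p3,p4" into four doubles.
// Throws std::invalid_argument if there are not exactly 4 fields
// or if a field is not entirely a number.
std::array<double, 4> parse_thresholds(const std::string& th) {
    std::array<double, 4> result{};
    std::istringstream in(th);
    std::string field;
    std::size_t i = 0;
    while (std::getline(in, field, ',')) {
        if (i == result.size()) {
            throw std::invalid_argument("more than 4 values in: " + th);
        }
        std::size_t used = 0;
        double value = 0.0;
        try {
            value = std::stod(field, &used);
        } catch (const std::exception&) {
            throw std::invalid_argument("not a number: " + field);
        }
        if (used != field.size()) {
            throw std::invalid_argument("trailing characters in: " + field);
        }
        result[i++] = value;
    }
    if (i != result.size()) {
        throw std::invalid_argument("expected 4 values in: " + th);
    }
    return result;
}
```

With an arbitrary example string such as "90,99,99.5,99.8", the call returns the four values in order; a string with only three fields is rejected.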

◆ CWinMaskCountsGenerator() [2/2]

CWinMaskCountsGenerator::CWinMaskCountsGenerator ( const string & input,
CNcbiOstream & os,
const string & infmt,
const string & sformat,
const string & th,
Uint4  mem_avail,
Uint1  unit_size,
Uint8  genome_size,
Uint4  min_count,
Uint4  max_count,
bool  check_duplicates,
bool  use_list,
const CWinMaskUtil::CIdSet * ids,
const CWinMaskUtil::CIdSet * exclude_ids,
bool  use_ba,
string const & metadata,
double  min_pct = -1.0,
double  extend_pct = -1.0,
double  thres_pct = -1.0,
double  max_pct = -1.0 
)

Constructor.

Creates an instance based on configuration parameters.

Parameters
input             input file name, or the name of a file containing a list of input files (one per line), depending on the value of the use_list parameter
os                the output stream
infmt             input format
sformat           counts format
th                string describing 4 percentage values (comma separated) used to compute winmask score thresholds
mem_avail         memory (in megabytes) available to the function
unit_size         n-mer size (value of n)
min_count         do not consider n-mers with counts less than the value of this parameter
max_count         maximum n-mer count to consider in winmask thresholds computations
check_duplicates  true if input checking for duplicates is requested; false otherwise
use_list          true if the input file contains the list of FASTA file names; false if input is the name of the FASTA file itself
ids               set of ids to consider
exclude_ids       set of ids to ignore
use_ba            use bit array optimization for optimized binary unit counts format
metadata          the metadata string
min_pct           min score as percentage of counts
extend_pct        interval extension score as percentage of counts
thres_pct         masking threshold score as percentage of counts
max_pct           max score as percentage of counts

Definition at line 124 of file win_mask_gen_counts.cpp.

References count, and th.

◆ ~CWinMaskCountsGenerator()

CWinMaskCountsGenerator::~CWinMaskCountsGenerator ( )

Object destructor.

Definition at line 223 of file win_mask_gen_counts.cpp.

Member Function Documentation

◆ fastalen()

Uint8 CWinMaskCountsGenerator::fastalen ( const string & fname) const
private

◆ operator()()

void CWinMaskCountsGenerator::operator() ( void  )

◆ process()

void CWinMaskCountsGenerator::process ( Uint4  prefix,
Uint1  prefix_size,
const vector< string > &  input,
bool  do_output 
)
private

Member Data Documentation

◆ check_duplicates

bool CWinMaskCountsGenerator::check_duplicates
private

Definition at line 248 of file win_mask_gen_counts.hpp.

Referenced by operator()().

◆ exclude_ids

const CWinMaskUtil::CIdSet* CWinMaskCountsGenerator::exclude_ids
private

Definition at line 256 of file win_mask_gen_counts.hpp.

Referenced by fastalen(), operator()(), and process().

◆ genome_size

Uint8 CWinMaskCountsGenerator::genome_size
private

Definition at line 242 of file win_mask_gen_counts.hpp.

Referenced by operator()().

◆ has_min_count

bool CWinMaskCountsGenerator::has_min_count
private

Definition at line 246 of file win_mask_gen_counts.hpp.

Referenced by operator()().

◆ ids

const CWinMaskUtil::CIdSet* CWinMaskCountsGenerator::ids
private

Definition at line 255 of file win_mask_gen_counts.hpp.

Referenced by fastalen(), operator()(), and process().

◆ infmt

string CWinMaskCountsGenerator::infmt
private

Definition at line 258 of file win_mask_gen_counts.hpp.

Referenced by fastalen(), operator()(), and process().

◆ input

string CWinMaskCountsGenerator::input
private

Definition at line 238 of file win_mask_gen_counts.hpp.

Referenced by operator()().

◆ max_count

Uint4 CWinMaskCountsGenerator::max_count
private

Definition at line 244 of file win_mask_gen_counts.hpp.

Referenced by operator()(), and process().

◆ max_mem

Uint8 CWinMaskCountsGenerator::max_mem
private

Definition at line 240 of file win_mask_gen_counts.hpp.

Referenced by operator()().
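
The constructor takes mem_avail as a Uint4 in megabytes, while max_mem is a Uint8. A natural guess, and only a guess, is that max_mem holds the same limit in bytes. The sketch below shows such a conversion; megabytes_to_bytes is a hypothetical helper, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```cpp
#include <cstdint>

// Convert a size in megabytes (32-bit) to bytes (64-bit).
// Widening before multiplying avoids 32-bit overflow for sizes of
// 4096 MB and above. 1 MB is taken as 1024 * 1024 bytes (assumed).
std::uint64_t megabytes_to_bytes(std::uint32_t megabytes) {
    return static_cast<std::uint64_t>(megabytes) * 1024u * 1024u;
}
```

The cast matters: multiplying in 32-bit arithmetic would wrap around at 4096 MB, which is why a 64-bit byte count is the safer representation.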

◆ min_count

Uint4 CWinMaskCountsGenerator::min_count
private

Definition at line 243 of file win_mask_gen_counts.hpp.

Referenced by operator()(), and process().

◆ no_extra_pass

bool CWinMaskCountsGenerator::no_extra_pass
private

Definition at line 247 of file win_mask_gen_counts.hpp.

Referenced by operator()().

◆ score_counts

vector< Uint4 > CWinMaskCountsGenerator::score_counts
private

Definition at line 252 of file win_mask_gen_counts.hpp.

Referenced by operator()(), and process().

◆ t_high

Uint4 CWinMaskCountsGenerator::t_high
private

Definition at line 245 of file win_mask_gen_counts.hpp.

Referenced by operator()(), and process().

◆ th

double CWinMaskCountsGenerator::th[4]
private

Definition at line 253 of file win_mask_gen_counts.hpp.

Referenced by CWinMaskCountsGenerator(), and operator()().
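
The constructor documentation says the four percentages are used to compute score thresholds, but not how. One common way to turn a percentage into a threshold is a percentile lookup over a histogram of counts, sketched below. This is an illustration under stated assumptions, not the class's algorithm: count_at_percentile is a hypothetical function, and the histogram layout (hist[i] is the number of distinct n-mers seen exactly i + 1 times) is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// hist[i] is the number of distinct n-mers observed exactly i + 1 times
// (assumed layout). Returns the smallest count c such that at least pct
// percent of the n-mers have a count <= c. Returns 0 for an empty or
// all-zero histogram.
std::uint32_t count_at_percentile(const std::vector<std::uint32_t>& hist,
                                  double pct) {
    double total = 0.0;
    for (std::uint32_t h : hist) total += h;
    if (total == 0.0) return 0;
    double cumulative = 0.0;
    for (std::size_t i = 0; i < hist.size(); ++i) {
        cumulative += hist[i];
        // Compare cumulative / total >= pct / 100 without dividing.
        if (cumulative * 100.0 >= pct * total) {
            return static_cast<std::uint32_t>(i + 1);
        }
    }
    return static_cast<std::uint32_t>(hist.size());  // pct above 100
}
```

For a histogram of {5, 3, 2} (five n-mers seen once, three seen twice, two seen three times), the 50th percentile is count 1, the 80th is count 2, and the 90th is count 3.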

◆ total_ecodes

Uint4 CWinMaskCountsGenerator::total_ecodes
private

Definition at line 251 of file win_mask_gen_counts.hpp.

Referenced by operator()(), and process().

◆ unit_size

Uint4 CWinMaskCountsGenerator::unit_size
private

Definition at line 241 of file win_mask_gen_counts.hpp.

Referenced by operator()(), and process().

◆ use_list

bool CWinMaskCountsGenerator::use_list
private

Definition at line 249 of file win_mask_gen_counts.hpp.

Referenced by operator()().

◆ ustat

CRef< CSeqMaskerOstat > CWinMaskCountsGenerator::ustat
private

Definition at line 239 of file win_mask_gen_counts.hpp.

Referenced by operator()(), and process().


The documentation for this class was generated from the following files:
win_mask_gen_counts.hpp
win_mask_gen_counts.cpp
Modified on Fri Sep 20 14:58:04 2024 by modify_doxy.py rev. 669887