NCBI C++ ToolKit
CBlastInputSourceConfig Class Reference

Class that centralizes the configuration data for sequences to be converted.

#include <algo/blast/blastinput/blast_input.hpp>

Public Member Functions

 CBlastInputSourceConfig (const SDataLoaderConfig &dlconfig, objects::ENa_strand strand=objects::eNa_strand_other, bool lowercase=false, bool believe_defline=false, TSeqRange range=TSeqRange(), bool retrieve_seq_data=true, int local_id_counter=1, unsigned int seqlen_thresh2guess=numeric_limits< unsigned int >::max(), bool skip_seq_check=false)
 Constructor.

 ~CBlastInputSourceConfig ()
 Destructor.

void SetStrand (objects::ENa_strand strand)
 Set the strand to a specified value.

objects::ENa_strand GetStrand () const
 Retrieve the current strand value.

void SetLowercaseMask (bool mask)
 Turn lowercase masking on/off.

bool GetLowercaseMask () const
 Retrieve lowercase mask status.

void SetBelieveDeflines (bool believe)
 Turn parsing of sequence IDs on/off.

bool GetBelieveDeflines () const
 Retrieve current sequence ID parsing status.

bool GetSkipSeqCheck () const
 Retrieve status of sequence alphabet validation.

void SetSkipSeqCheck (bool skip)
 Turn validation of sequence on/off.

void SetRange (const TSeqRange &r)
 Set range for all sequences.

TSeqRange & SetRange (void)
 Set range for all sequences.

TSeqRange GetRange () const
 Get range for all sequences.

SDataLoaderConfig & SetDataLoaderConfig ()
 Retrieve the data loader configuration object for manipulation.

const SDataLoaderConfig & GetDataLoaderConfig ()
 Retrieve the data loader configuration object for read-only access.

bool IsProteinInput () const
 Determine if this object is for configuring reading protein sequences.

bool RetrieveSeqData () const
 True if the sequence data must be fetched.

void SetRetrieveSeqData (bool value)
 Turn on or off the retrieval of sequence data.

int GetLocalIdCounterInitValue () const
 Retrieve the local id counter initial value.

void SetLocalIdCounterInitValue (int val)
 Set the local id counter initial value.

const string & GetLocalIdPrefix () const
 Retrieve the custom prefix string used for generating local ids.

void SetLocalIdPrefix (const string &prefix)
 Set the custom prefix string used for generating local ids.

void SetQueryLocalIdMode ()
 Append query-specific prefix codes to all generated local ids.

void SetSubjectLocalIdMode ()
 Append subject-specific prefix codes to all generated local ids.

unsigned int GetSeqLenThreshold2Guess () const
 Retrieve the sequence length threshold to guess the molecule type.

void SetSeqLenThreshold2Guess (unsigned int val)
 Set the sequence length threshold to guess the molecule type.

bool GetConvertGapsToNs (void) const
 Retrieve the value of the gaps-to-Ns conversion option.

void SetConvertGapsToNs (bool val)
 Turn on/off converting gaps to Ns in read FASTA sequences.
 

Static Public Attributes

static const unsigned int kSeqLenThreshold2Guess = 25
 This value and the seqlen_thresh2guess argument to this class's constructor are related as follows: if the default parameter value is used, no sequence type guessing occurs; instead, the sequence type specified in CBlastInputSourceConfig::SDataLoader::m_IsLoadingProteins is assumed correct.
 

Private Attributes

objects::ENa_strand m_Strand
 Strand to assign to sequences.

bool m_LowerCaseMask
 Whether to save lowercase mask locs.

bool m_BelieveDeflines
 Whether to parse sequence IDs.

bool m_SkipSeqCheck
 Whether to skip validation of sequence data -RMH-.

TSeqRange m_Range
 Sequence range.

SDataLoaderConfig m_DLConfig
 Configuration object for data loaders, used by CBlastInputReader.

bool m_RetrieveSeqData
 Whether sequence data must be fetched; configuration for CBlastInputReader.

int m_LocalIdCounter
 Initialization parameter to CSeqidGenerator.

unsigned int m_SeqLenThreshold2Guess
 The sequence length threshold to guess molecule type.

string m_LocalIdPrefix
 Custom prefix string passed to CSeqidGenerator.

bool m_GapsToNs
 Convert gaps to Ns in FASTA sequences.
 

Detailed Description

Class that centralizes the configuration data for sequences to be converted.

Definition at line 48 of file blast_input.hpp.

Constructor & Destructor Documentation

◆ CBlastInputSourceConfig()

CBlastInputSourceConfig::CBlastInputSourceConfig ( const SDataLoaderConfig & dlconfig,
objects::ENa_strand  strand = objects::eNa_strand_other,
bool  lowercase = false,
bool  believe_defline = false,
TSeqRange  range = TSeqRange(),
bool  retrieve_seq_data = true,
int  local_id_counter = 1,
unsigned int  seqlen_thresh2guess = numeric_limits<unsigned int>::max(),
bool  skip_seq_check = false 
)

Constructor.

Parameters
dlconfig: Configuration object for the data loaders used in CBlastScopeSource [in]
strand: All SeqLoc types will have this strand assigned; if set to 'other', the strand will be set to 'unknown' for protein sequences and 'both' for nucleotide sequences [in]
lowercase: If true, lowercase mask locations are generated for all input sequences [in]
believe_defline: If true, all sequence IDs are parsed; otherwise, each sequence receives a local ID set to a monotonically increasing count value [in]
range: Range restriction for all sequences (the default means no restriction). To specify only one coordinate (start or stop), use the SetRange() method; the missing coordinate is set to its default value (0 for the starting coordinate, the sequence length for the ending coordinate) [in]
retrieve_seq_data: Whether this library should fetch the sequence data when GIs or accessions are provided in the input [in]
local_id_counter: Initial value of the counter used by CSeqidGenerator to create local identifiers for the sequences read [in]
seqlen_thresh2guess: Sequence length threshold for molecule type guessing (see kSeqLenThreshold2Guess) [in]
skip_seq_check: If true, the sequence validation step is skipped when using CFastaReader -RMH- [in]

Definition at line 48 of file blast_input.cpp.

References eNa_strand_both, eNa_strand_other, and eNa_strand_unknown.
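The strand parameter documentation describes a small resolution rule for the value 'other'. The sketch below restates that rule in isolation. It uses a hypothetical local enum `Strand` that mirrors only the relevant objects::ENa_strand enumerators, and a hypothetical function `ResolveStrand`; it assumes that explicit strand values pass through unchanged and that the caller already knows whether the input is protein (in the real class this presumably comes from the data loader configuration).

```cpp
// Hypothetical stand-in for the relevant objects::ENa_strand enumerators;
// the real enum is defined in the NCBI object headers and has more values.
enum class Strand { kUnknown, kPlus, kMinus, kBoth, kOther };

// Sketch of the documented rule: 'other' resolves to 'unknown' for protein
// input and to 'both' for nucleotide input. Any explicit value is assumed
// to be kept as-is.
Strand ResolveStrand(Strand requested, bool is_protein)
{
    if (requested != Strand::kOther) {
        return requested;
    }
    return is_protein ? Strand::kUnknown : Strand::kBoth;
}
```

Keeping the resolution in one place means every SeqLoc built from the configuration receives the same strand, which matches the "all SeqLoc types" wording above.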

◆ ~CBlastInputSourceConfig()

CBlastInputSourceConfig::~CBlastInputSourceConfig ( )
inline

Destructor.

Definition at line 111 of file blast_input.hpp.

Member Function Documentation

◆ GetBelieveDeflines()

bool CBlastInputSourceConfig::GetBelieveDeflines ( ) const
inline

Retrieve current sequence ID parsing status.

Returns
true if sequence IDs are parsed, false otherwise

Definition at line 140 of file blast_input.hpp.

Referenced by BOOST_AUTO_TEST_CASE(), and CBlastFastaInputSource::x_InitInputReader().

◆ GetConvertGapsToNs()

bool CBlastInputSourceConfig::GetConvertGapsToNs ( void  ) const
inline

Retrieve the value of the gaps-to-Ns conversion option.

Definition at line 203 of file blast_input.hpp.

◆ GetDataLoaderConfig()

const SDataLoaderConfig& CBlastInputSourceConfig::GetDataLoaderConfig ( )
inline

Retrieve the data loader configuration object for read-only access.

Definition at line 168 of file blast_input.hpp.

Referenced by CBlastFastaInputSource::x_InitInputReader().

◆ GetLocalIdCounterInitValue()

int CBlastInputSourceConfig::GetLocalIdCounterInitValue ( ) const
inline

Retrieve the local id counter initial value.

Definition at line 180 of file blast_input.hpp.

Referenced by CBlastFastaInputSource::x_InitInputReader().

◆ GetLocalIdPrefix()

const string& CBlastInputSourceConfig::GetLocalIdPrefix ( ) const
inline

Retrieve the custom prefix string used for generating local ids.

Definition at line 185 of file blast_input.hpp.

Referenced by CBlastFastaInputSource::x_InitInputReader().

◆ GetLowercaseMask()

bool CBlastInputSourceConfig::GetLowercaseMask ( ) const
inline

Retrieve lowercase mask status.

Returns
true if lowercase masking is enabled, false otherwise

Definition at line 130 of file blast_input.hpp.

Referenced by BOOST_AUTO_TEST_CASE(), CBlastFastaInputSource::GetNextSequence(), CBlastFastaInputSource::GetNextSSeqLoc(), and CBlastFastaInputSource::x_FastaToSeqLoc().

◆ GetRange()

TSeqRange CBlastInputSourceConfig::GetRange ( ) const
inline

Get range for all sequences.

Returns
range specified for all sequences

Definition at line 163 of file blast_input.hpp.

Referenced by CBlastFastaInputSource::x_FastaToSeqLoc().

◆ GetSeqLenThreshold2Guess()

unsigned int CBlastInputSourceConfig::GetSeqLenThreshold2Guess ( ) const
inline

Retrieve the sequence length threshold to guess the molecule type.

Definition at line 194 of file blast_input.hpp.

Referenced by CBlastFastaInputSource::x_InitInputReader().

◆ GetSkipSeqCheck()

bool CBlastInputSourceConfig::GetSkipSeqCheck ( ) const
inline

Retrieve status of sequence alphabet validation.

Returns
true if sequence validation is skipped, false otherwise -RMH-

Definition at line 146 of file blast_input.hpp.

Referenced by CBlastFastaInputSource::x_InitInputReader().

◆ GetStrand()

objects::ENa_strand CBlastInputSourceConfig::GetStrand ( void  ) const
inline

Retrieve the current strand value.

Returns
the strand

Definition at line 120 of file blast_input.hpp.

Referenced by BOOST_AUTO_TEST_CASE(), and CBlastFastaInputSource::x_FastaToSeqLoc().

◆ IsProteinInput()

bool CBlastInputSourceConfig::IsProteinInput ( ) const
inline

Determine if this object is for configuring reading protein sequences.

Definition at line 171 of file blast_input.hpp.

◆ RetrieveSeqData()

bool CBlastInputSourceConfig::RetrieveSeqData ( ) const
inline

True if the sequence data must be fetched.

Definition at line 174 of file blast_input.hpp.

Referenced by CBlastFastaInputSource::x_InitInputReader().

◆ SetBelieveDeflines()

void CBlastInputSourceConfig::SetBelieveDeflines ( bool  believe)
inline

Turn parsing of sequence IDs on/off.

Parameters
believe: true to turn parsing of sequence IDs on, false to turn it off [in]

Definition at line 135 of file blast_input.hpp.

Referenced by BOOST_AUTO_TEST_CASE(), and ReadSequencesToBlast().
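To make the on/off switch concrete, the following sketch shows the decision it controls for a single FASTA defline. `ExtractDeflineId` is a hypothetical helper, not a toolkit function: it simply takes the first whitespace-delimited token after '>' as the ID, whereas the real CFastaReader parses deflines into CSeq_id objects with far richer rules. When deflines are not believed, it returns std::nullopt to signal that a generated local ID should be used instead.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: returns the ID token of a FASTA defline when
// deflines are believed, or std::nullopt when the caller should fall back
// to a generated local ID.
std::optional<std::string> ExtractDeflineId(const std::string& defline,
                                            bool believe_defline)
{
    if (!believe_defline || defline.empty() || defline[0] != '>') {
        return std::nullopt;
    }
    // The ID is taken to be the first whitespace-delimited token after '>'.
    const std::string::size_type end = defline.find_first_of(" \t", 1);
    const std::string id = defline.substr(
        1, end == std::string::npos ? std::string::npos : end - 1);
    if (id.empty()) {
        return std::nullopt;
    }
    return id;
}
```

For example, `ExtractDeflineId(">seq1 some description", true)` yields "seq1", while the same call with `false` yields no ID.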

◆ SetConvertGapsToNs()

void CBlastInputSourceConfig::SetConvertGapsToNs ( bool  val)
inline

Turn on/off converting gaps to Ns in read FASTA sequences.

Definition at line 208 of file blast_input.hpp.

Referenced by ReadSequencesToBlast().
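A minimal sketch of what the gaps-to-Ns option means for the residues of a nucleotide sequence. It assumes gaps appear in the FASTA text as '-' characters and uses a hypothetical function `ConvertGapsToNs`; the real reader may represent gaps internally as gap segments rather than characters, so this only illustrates the effect on the resulting sequence.

```cpp
#include <algorithm>
#include <string>

// Hypothetical illustration of the gaps-to-Ns option: every '-' gap
// character in a nucleotide sequence is replaced by the ambiguity code 'N',
// so the sequence length is unchanged.
std::string ConvertGapsToNs(std::string seq)
{
    std::replace(seq.begin(), seq.end(), '-', 'N');
    return seq;
}
```

Replacing rather than deleting gap characters preserves coordinates, so positions reported against the converted sequence still line up with the original input.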

◆ SetDataLoaderConfig()

SDataLoaderConfig& CBlastInputSourceConfig::SetDataLoaderConfig ( )
inline

Retrieve the data loader configuration object for manipulation.

Definition at line 166 of file blast_input.hpp.

Referenced by BOOST_AUTO_TEST_CASE().

◆ SetLocalIdCounterInitValue()

void CBlastInputSourceConfig::SetLocalIdCounterInitValue ( int  val)
inline

Set the local id counter initial value.

Definition at line 182 of file blast_input.hpp.

◆ SetLocalIdPrefix()

void CBlastInputSourceConfig::SetLocalIdPrefix ( const string & prefix)
inline

Set the custom prefix string used for generating local ids.

Definition at line 187 of file blast_input.hpp.
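The local ID prefix and the counter initial value together determine the IDs assigned to sequences whose deflines are not believed. The sketch below models that combination with a hypothetical class `LocalIdGenerator`; it is not the CSeqidGenerator API, which produces CSeq_id objects, and the exact string format shown (prefix immediately followed by the counter) is an assumption.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for CSeqidGenerator, illustrating how a prefix and
// an initial counter value combine into a sequence of unique local IDs.
// The real class produces CSeq_id objects and its format may differ.
class LocalIdGenerator {
public:
    LocalIdGenerator(int counter_init, std::string prefix)
        : m_Counter(counter_init), m_Prefix(std::move(prefix)) {}

    // Returns the next ID and advances the counter, so the numeric part
    // increases monotonically.
    std::string Next() { return m_Prefix + std::to_string(m_Counter++); }

private:
    int m_Counter;
    std::string m_Prefix;
};
```

With an initial value of 1 and the illustrative prefix "Query_", successive calls to `Next()` yield "Query_1", "Query_2", and so on.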

◆ SetLowercaseMask()

void CBlastInputSourceConfig::SetLowercaseMask ( bool  mask)
inline

Turn lowercase masking on/off.

Parameters
mask: true to turn lowercase masking on, false to turn it off [in]

Definition at line 125 of file blast_input.hpp.

Referenced by BOOST_AUTO_TEST_CASE(), and ReadSequencesToBlast().
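When lowercase masking is on, lowercase residues in the input are reported as mask locations. The sketch below shows the underlying idea: scan the sequence and collect each maximal run of lowercase letters as an interval. `LowercaseRuns` is a hypothetical function; the inclusive, 0-based [from, to] convention is an assumption chosen for illustration, and the real reader builds Seq-loc objects rather than plain pairs.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of lowercase masking: collect every maximal run of
// lowercase residues as an inclusive, 0-based [from, to] interval.
std::vector<std::pair<std::size_t, std::size_t>>
LowercaseRuns(const std::string& seq)
{
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    std::size_t i = 0;
    while (i < seq.size()) {
        if (std::islower(static_cast<unsigned char>(seq[i]))) {
            const std::size_t start = i;
            while (i < seq.size() &&
                   std::islower(static_cast<unsigned char>(seq[i]))) {
                ++i;
            }
            runs.emplace_back(start, i - 1);  // i is one past the run
        } else {
            ++i;
        }
    }
    return runs;
}
```

For the input "ACgtaTTnnA" this returns the intervals (2, 4) and (7, 8).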

◆ SetQueryLocalIdMode()

void CBlastInputSourceConfig::SetQueryLocalIdMode ( )
inline

Append query-specific prefix codes to all generated local ids.

Definition at line 189 of file blast_input.hpp.

Referenced by CIgBlastnApp::Run(), CIgBlastpApp::Run(), CVecScreenApp::Run(), CVDBBlastnApp::Run(), and CVDBTblastnApp::Run().

◆ SetRange() [1/2]

void CBlastInputSourceConfig::SetRange ( const TSeqRange & r)
inline

Set range for all sequences.

Parameters
r: range to use [in]

Definition at line 156 of file blast_input.hpp.

Referenced by BOOST_AUTO_TEST_CASE(), and ReadSequencesToBlast().

◆ SetRange() [2/2]

TSeqRange& CBlastInputSourceConfig::SetRange ( void  )
inline

Set range for all sequences.

Returns
range to modify

Definition at line 159 of file blast_input.hpp.
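The reference returned by this overload lets a caller adjust just one coordinate, and the constructor documentation says the missing coordinate then takes a default (0 for the start, the sequence length for the stop). The sketch below models only that defaulting rule with hypothetical types `PartialRange` and `ResolvedRange` and a hypothetical function `ResolveRange`; it does not model TSeqRange itself, and it does not decide whether the real stop coordinate is inclusive or exclusive.

```cpp
#include <optional>

// Hypothetical model of a range in which either end may be left unset.
struct PartialRange {
    std::optional<unsigned int> start;
    std::optional<unsigned int> stop;
};

struct ResolvedRange {
    unsigned int start;
    unsigned int stop;
};

// Fill in missing coordinates with the documented defaults: 0 for the start
// and the sequence length for the stop.
ResolvedRange ResolveRange(const PartialRange& r, unsigned int seq_length)
{
    return ResolvedRange{r.start.value_or(0u), r.stop.value_or(seq_length)};
}
```

For a 500-residue sequence, a range with only the start set to 100 resolves to 100..500, and one with only the stop set to 250 resolves to 0..250.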

◆ SetRetrieveSeqData()

void CBlastInputSourceConfig::SetRetrieveSeqData ( bool  value)
inline

Turn on or off the retrieval of sequence data.

Parameters
value: true to turn on, false to turn off [in]

Definition at line 177 of file blast_input.hpp.

Referenced by BOOST_AUTO_TEST_CASE().

◆ SetSeqLenThreshold2Guess()

void CBlastInputSourceConfig::SetSeqLenThreshold2Guess ( unsigned int  val)
inline

Set the sequence length threshold to guess the molecule type.

Definition at line 198 of file blast_input.hpp.

Referenced by BOOST_AUTO_TEST_CASE().

◆ SetSkipSeqCheck()

void CBlastInputSourceConfig::SetSkipSeqCheck ( bool  skip)
inline

Turn validation of sequence on/off.

Parameters
skip: true to skip sequence validation, false to perform it -RMH- [in]

Definition at line 152 of file blast_input.hpp.
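As an illustration of what "sequence alphabet validation" refers to, the sketch below checks that every residue of a nucleotide sequence is an IUPAC nucleotide code, and bypasses the check entirely when the skip flag is set. `IsValidNucleotideSequence` is hypothetical; the actual validation is performed by CFastaReader and its accepted alphabet and error handling are not reproduced here.

```cpp
#include <cctype>
#include <string>

// Hypothetical alphabet check: every residue must be an IUPAC nucleotide
// code (case-insensitive). When skip_seq_check is true the check is
// bypassed and the sequence is accepted as-is.
bool IsValidNucleotideSequence(const std::string& seq, bool skip_seq_check)
{
    if (skip_seq_check) {
        return true;
    }
    static const std::string kIupac = "ACGTURYSWKMBDHVN";
    for (char c : seq) {
        const char upper = static_cast<char>(
            std::toupper(static_cast<unsigned char>(c)));
        if (kIupac.find(upper) == std::string::npos) {
            return false;
        }
    }
    return true;
}
```

Skipping the check trades safety for speed and permissiveness: malformed input is no longer rejected at read time.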

◆ SetStrand()

void CBlastInputSourceConfig::SetStrand ( objects::ENa_strand  strand)
inline

Set the strand to a specified value.

Parameters
strand: the strand value [in]

Definition at line 116 of file blast_input.hpp.

Referenced by BOOST_AUTO_TEST_CASE().

◆ SetSubjectLocalIdMode()

void CBlastInputSourceConfig::SetSubjectLocalIdMode ( )
inline

Append subject-specific prefix codes to all generated local ids.

Definition at line 191 of file blast_input.hpp.

Referenced by ReadSequencesToBlast().

Member Data Documentation

◆ kSeqLenThreshold2Guess

const unsigned int CBlastInputSourceConfig::kSeqLenThreshold2Guess = 25
static

This value and the seqlen_thresh2guess argument to this class's constructor are related as follows: if the default parameter value is used, no sequence type guessing occurs; instead, the sequence type specified in CBlastInputSourceConfig::SDataLoader::m_IsLoadingProteins is assumed correct.

If a different value is specified, sequences shorter than that length are treated as described above; all other sequences have their sequence type guessed (and are subject to validation between what is guessed by CFastaReader and what is expected by CBlastInputSource).

By design, the default setting should be suitable for the command-line BLAST search binaries; the BLAST web pages, however, use kSeqLenThreshold2Guess to validate sequences longer than that length and to accept sequences shorter than that length.

See also
Implementation in CCustomizedFastaReader
TestSmallDubiousSequences unit test

Definition at line 71 of file blast_input.hpp.
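The rule described for kSeqLenThreshold2Guess can be condensed into a single predicate. The sketch below uses a hypothetical function `ShouldGuessMoleculeType`; the boundary handling (a sequence exactly at the threshold is guessed) is inferred from the wording "shorter than that length" and should be checked against CCustomizedFastaReader before relying on it.

```cpp
#include <limits>

// Hypothetical model of the rule described for kSeqLenThreshold2Guess.
// Returns true if the molecule type of a sequence of the given length would
// be guessed (and validated against the expected type), false if the
// expected type from the data loader configuration is simply assumed.
bool ShouldGuessMoleculeType(unsigned int seq_length,
                             unsigned int seqlen_thresh2guess)
{
    // The constructor's default threshold means "never guess".
    if (seqlen_thresh2guess == std::numeric_limits<unsigned int>::max()) {
        return false;
    }
    // Shorter sequences are accepted as the expected type; all others are
    // guessed.
    return seq_length >= seqlen_thresh2guess;
}
```

With the web-page setting of 25, a 10-residue query is accepted as the expected type, while a 300-residue query has its type guessed and validated.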

◆ m_BelieveDeflines

bool CBlastInputSourceConfig::m_BelieveDeflines
private

Whether to parse sequence IDs.

Definition at line 218 of file blast_input.hpp.

◆ m_DLConfig

SDataLoaderConfig CBlastInputSourceConfig::m_DLConfig
private

Configuration object for data loaders, used by CBlastInputReader.

Definition at line 224 of file blast_input.hpp.

◆ m_GapsToNs

bool CBlastInputSourceConfig::m_GapsToNs
private

Convert gaps to Ns in FASTA sequences.

Definition at line 234 of file blast_input.hpp.

◆ m_LocalIdCounter

int CBlastInputSourceConfig::m_LocalIdCounter
private

Initialization parameter to CSeqidGenerator.

Definition at line 228 of file blast_input.hpp.

◆ m_LocalIdPrefix

string CBlastInputSourceConfig::m_LocalIdPrefix
private

Custom prefix string passed to CSeqidGenerator.

Definition at line 232 of file blast_input.hpp.

◆ m_LowerCaseMask

bool CBlastInputSourceConfig::m_LowerCaseMask
private

Whether to save lowercase mask locs.

Definition at line 216 of file blast_input.hpp.

◆ m_Range

TSeqRange CBlastInputSourceConfig::m_Range
private

Sequence range.

Definition at line 222 of file blast_input.hpp.

◆ m_RetrieveSeqData

bool CBlastInputSourceConfig::m_RetrieveSeqData
private

Whether sequence data must be fetched; configuration for CBlastInputReader.

Definition at line 226 of file blast_input.hpp.

◆ m_SeqLenThreshold2Guess

unsigned int CBlastInputSourceConfig::m_SeqLenThreshold2Guess
private

The sequence length threshold to guess molecule type.

Definition at line 230 of file blast_input.hpp.

◆ m_SkipSeqCheck

bool CBlastInputSourceConfig::m_SkipSeqCheck
private

Whether to skip validation of sequence data -RMH-.

Definition at line 220 of file blast_input.hpp.

◆ m_Strand

objects::ENa_strand CBlastInputSourceConfig::m_Strand
private

Strand to assign to sequences.

Definition at line 214 of file blast_input.hpp.


The documentation for this class was generated from the following files: blast_input.hpp, blast_input.cpp
Modified on Fri May 24 14:50:16 2024 by modify_doxy.py rev. 669887