NCBI C++ ToolKit
blast_input_aux.hpp File Reference

Auxiliary classes/functions for BLAST input library.

#include <algo/blast/api/sseqloc.hpp>
#include <corelib/ncbiargs.hpp>
#include <objects/seqset/Bioseq_set.hpp>



Classes

class  CAutoOutputFileReset
 Auxiliary class to store the name of an output file, which is reset every time its GetStream method is invoked.
 
class  CArgAllowMaximumFileNameLength
 Class to constrain the length of the file name passed to a given CArgDescriptions key.
 
class  CArgAllowValuesGreaterThanOrEqual
 Class to constrain the values of an argument to those greater than or equal to the value specified in the constructor.
 
class  CArgAllowValuesLessThanOrEqual
 Class to constrain the values of an argument to those less than or equal to the value specified in the constructor.
 
class  CArgAllowValuesBetween
 Class to constrain the values of an argument to those between the values specified in the constructor.
 
class  CArgAllowIntegerSet
 
class  CArgAllowStringSet
 

Macros

#define DEFINE_CARGALLOW_SET_CLASS(ClassName, DataType, String2DataTypeFn)
 Macro to create a subclass of CArgAllow that allows the specification of sets of data.
 

Functions

TSeqRange ParseSequenceRange (const string &range_str, const char *error_prefix=NULL)
 Parse and extract a sequence range from the argument provided to this function.
 
TSeqRange ParseSequenceRangeOpenEnd (const string &range_str, const char *error_prefix=NULL)
 Parse and extract a sequence range from the argument provided to this function.
 
int GetQueryBatchSize (EProgram program, bool is_ungapped=false, bool remote=false, bool use_default=true, string task="", bool mt_mode=false)
 Retrieve the appropriate batch size for the specified task.
 
CRef< objects::CScope > ReadSequencesToBlast (CNcbiIstream &in, bool read_proteins, const TSeqRange &range, bool parse_deflines, bool use_lcase_masking, CRef< CBlastQueryVector > &sequences, bool gaps_to_Ns=false)
 Read sequence input for BLAST.
 
string CalculateFormattingParams (TSeqPos max_target_seqs, TSeqPos *num_descriptions, TSeqPos *num_alignments, TSeqPos *num_overview=NULL)
 Calculates the formatting parameters based on the maximum number of target sequences selected (a.k.a. hitlist size).
 
bool HasRawSequenceData (const objects::CBioseq &bioseq)
 Returns true if the Bioseq passed as argument has the full, raw sequence data in its Seq-inst field.
 
void CheckForEmptySequences (const TSeqLocVector &sequences, string &warnings)
 Inspect the sequences parameter for empty sequences.
 
void CheckForEmptySequences (CRef< CBlastQueryVector > sequences, string &warnings)
 Inspect the sequences parameter for empty sequences.
 
void CheckForEmptySequences (CRef< objects::CBioseq_set > sequences, string &warnings)
 Inspect the sequences parameter for empty sequences.
 

Detailed Description

Auxiliary classes/functions for BLAST input library.

Definition in file blast_input_aux.hpp.

Macro Definition Documentation

◆ DEFINE_CARGALLOW_SET_CLASS

#define DEFINE_CARGALLOW_SET_CLASS(ClassName, DataType, String2DataTypeFn)

Macro to create a subclass of CArgAllow that allows the specification of sets of data.

Parameters
ClassName: name of the class to be created [in]
DataType: data type of the allowed arguments [in]
String2DataTypeFn: conversion function from a string to DataType [in]

Definition at line 201 of file blast_input_aux.hpp.
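The macro stamps out one set-constraint class per combination of data type and conversion function. The sketch below mimics that pattern in standalone C++ under stated assumptions: DEFINE_ALLOW_SET_CLASS, ToInt, ToString and the Verify method are hypothetical names chosen for illustration, not the toolkit's definitions, and the generated classes do not derive from CArgAllow.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical conversion helpers playing the role of String2DataTypeFn.
// ToInt rejects trailing characters such as "3x".
inline int ToInt(const std::string& s) {
    std::size_t pos = 0;
    int value = std::stoi(s, &pos);
    if (pos != s.size()) throw std::invalid_argument("trailing characters");
    return value;
}
inline std::string ToString(const std::string& s) { return s; }

// Generates a class that accepts a value only if it converts to a member
// of the set passed to the constructor.
#define DEFINE_ALLOW_SET_CLASS(ClassName, DataType, String2DataTypeFn)  \
    class ClassName {                                                    \
    public:                                                              \
        explicit ClassName(const std::set<DataType>& values)             \
            : m_Values(values) {}                                        \
        bool Verify(const std::string& value) const {                    \
            try {                                                        \
                return m_Values.count(String2DataTypeFn(value)) > 0;     \
            } catch (const std::exception&) {                            \
                return false; /* conversion failed */                    \
            }                                                            \
        }                                                                \
    private:                                                             \
        std::set<DataType> m_Values;                                     \
    }

DEFINE_ALLOW_SET_CLASS(AllowIntegerSet, int, ToInt);
DEFINE_ALLOW_SET_CLASS(AllowStringSet, std::string, ToString);
```

Each macro invocation yields a distinct class (here AllowIntegerSet and AllowStringSet) sharing the same validation logic. A class template would be the more common choice in modern C++; the macro form is used here only because it mirrors the approach this header documents.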

Function Documentation

◆ CalculateFormattingParams()

string CalculateFormattingParams ( TSeqPos  max_target_seqs,
TSeqPos *  num_descriptions,
TSeqPos *  num_alignments,
TSeqPos *  num_overview = NULL
)

Calculates the formatting parameters based on the maximum number of target sequences selected (a.k.a. hitlist size).

Parameters
max_target_seqs: the hitlist size [in]
num_descriptions: the number of one-line descriptions to show [out]
num_alignments: the number of alignments to show [out]
num_overview: the number of sequences to show in the overview image displayed in the BLAST report on the web [out]
Returns
string containing warnings (if any)

Definition at line 250 of file blast_input_aux.cpp.

References _ASSERT, NStr::IntToString(), kDfltArgMaxTargetSequences, and min().

◆ CheckForEmptySequences() [1/3]

void CheckForEmptySequences ( const TSeqLocVector &  sequences,
string &  warnings
)

Inspect the sequences parameter for empty sequences.

Returns a non-empty string in the warnings parameter if there are empty sequence(s) in its first parameter.

Parameters
sequences: sequence set to inspect [in]
warnings: populated if empty sequence(s) are found among non-empty sequences [in|out]
Exceptions
CInputException: if the input consists of a single sequence and that sequence is empty

Definition at line 368 of file blast_input_aux.cpp.

References GetLength(), i, ITERATE, NCBI_THROW, and query.

Referenced by BOOST_AUTO_TEST_CASE().

◆ CheckForEmptySequences() [2/3]

void CheckForEmptySequences ( CRef< CBlastQueryVector >  sequences,
string &  warnings
)

Inspect the sequences parameter for empty sequences.

Returns a non-empty string in the warnings parameter if there are empty sequence(s) in its first parameter.

Parameters
sequences: sequence set to inspect [in]
warnings: populated if empty sequence(s) are found among non-empty sequences [in|out]
Exceptions
CInputException: if the input consists of a single sequence and that sequence is empty

Definition at line 332 of file blast_input_aux.cpp.

References CBlastQueryVector::Empty(), CRef< C, Locker >::Empty(), i, ITERATE, NCBI_THROW, and query.

◆ CheckForEmptySequences() [3/3]

void CheckForEmptySequences ( CRef< objects::CBioseq_set >  sequences,
string &  warnings
)

Inspect the sequences parameter for empty sequences.

Returns a non-empty string in the warnings parameter if there are empty sequence(s) in its first parameter.

Parameters
sequences: sequence set to inspect [in]
warnings: populated if empty sequence(s) are found among non-empty sequences [in|out]
Exceptions
CInputException: if the input consists of a single sequence and that sequence is empty
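The three overloads differ only in the container they inspect. Their documented contract (an error when the input is a single empty sequence, a warning when empty sequences are mixed with non-empty ones) can be sketched in standalone C++ as follows. The Query alias, the function name, std::runtime_error standing in for CInputException, and the message wording are all assumptions. The documentation does not say what happens when several sequences are all empty; this sketch simply warns about them.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical query record: (sequence id, sequence length).
using Query = std::pair<std::string, std::size_t>;

// Sketch of the documented contract for CheckForEmptySequences().
void CheckForEmptySequencesSketch(const std::vector<Query>& queries,
                                  std::string& warnings) {
    // A lone empty sequence leaves nothing to search: treat it as an error.
    if (queries.size() == 1 && queries.front().second == 0) {
        // std::runtime_error stands in for CInputException here.
        throw std::runtime_error("Query contains no sequence data");
    }
    std::vector<std::string> empty_ids;
    for (const Query& q : queries) {
        if (q.second == 0) {
            empty_ids.push_back(q.first);
        }
    }
    if (empty_ids.empty()) {
        return;  // nothing to report; warnings is left unchanged
    }
    // Empty sequences mixed with others only produce a warning,
    // appended because the parameter is documented as [in|out].
    warnings += "The following sequences had no data: ";
    for (std::size_t i = 0; i < empty_ids.size(); ++i) {
        warnings += (i > 0 ? ", " : "") + empty_ids[i];
    }
}
```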

◆ GetQueryBatchSize()

int GetQueryBatchSize ( EProgram  program,
bool  is_ungapped = false,
bool  remote = false,
bool  use_default = true,
string  task = "",
bool  mt_mode = false 
)

Retrieve the appropriate batch size for the specified task.

Parameters
program: BLAST task [in]
is_ungapped: true if an ungapped BLAST search is requested [in]
remote: true if a remote BLAST search is requested [in]
use_default: true if a default value should be returned [in]
task: task name (e.g., blastx-fast); set automatically if empty [in]
mt_mode: true to thread by queries, false to thread by database [in]

Definition at line 70 of file blast_input_aux.cpp.

References _TRACE, eBlastn, eBlastp, eBlastx, eDiscMegablast, eMapper, eMegablast, EProgramToTaskName(), eTblastn, eTblastx, and NStr::StringToInt().

Referenced by BOOST_AUTO_TEST_CASE(), GetMTByQueriesBatchSize(), CBlastnAppArgs::GetQueryBatchSize(), CBlastnNodeArgs::GetQueryBatchSize(), CBlastpAppArgs::GetQueryBatchSize(), CBlastpNodeArgs::GetQueryBatchSize(), CBlastxAppArgs::GetQueryBatchSize(), CBlastxNodeArgs::GetQueryBatchSize(), CIgBlastnAppArgs::GetQueryBatchSize(), CIgBlastpAppArgs::GetQueryBatchSize(), CMagicBlastAppArgs::GetQueryBatchSize(), CPsiBlastAppArgs::GetQueryBatchSize(), CRMBlastnAppArgs::GetQueryBatchSize(), CRPSBlastAppArgs::GetQueryBatchSize(), CRPSBlastNodeArgs::GetQueryBatchSize(), CRPSTBlastnAppArgs::GetQueryBatchSize(), CRPSTBlastnNodeArgs::GetQueryBatchSize(), CTblastnAppArgs::GetQueryBatchSize(), CTblastnNodeArgs::GetQueryBatchSize(), CTblastxAppArgs::GetQueryBatchSize(), CBlastnVdbAppArgs::GetQueryBatchSize(), CTblastnVdbAppArgs::GetQueryBatchSize(), CDeltaBlastAppArgs::GetQueryBatchSize(), and CBlastInputDemoApplication::Run().

◆ HasRawSequenceData()

bool HasRawSequenceData ( const objects::CBioseq &  bioseq)

Returns true if the Bioseq passed as argument has the full, raw sequence data in its Seq-inst field.

Parameters
bioseq: Bioseq to examine [in]

Definition at line 302 of file blast_input_aux.cpp.

References CDelta_seq_Base::e_Loc, CSeq_inst_Base::eRepr_delta, CSeq_inst_Base::eRepr_virtual, CBlastBioseqMaker::IsEmptyBioseq(), and ITERATE.

Referenced by CBlastFastaInputSource::x_FastaToSeqLoc(), CBlastFormatterApp::x_QueryBioseqToSSeqLoc(), and CBlastFormatterVdbApp::x_QueryBioseqToSSeqLoc().
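The References list above hints at how the check is made: virtual and delta representations are special-cased, and delta segments of type e_Loc are examined. The sketch below is an inference from those symbols only, written against a hypothetical, simplified model (Repr, DeltaSegment, SeqInstModel) rather than the real CBioseq and CSeq_inst classes. It does not model CBlastBioseqMaker::IsEmptyBioseq() or any other checks the real function may perform.

```cpp
#include <vector>

// Hypothetical, simplified model of the parts of a Seq-inst that matter here.
enum class Repr { Raw, Delta, Virtual };
enum class DeltaSegment { Literal, Loc };  // Loc refers to another sequence

struct SeqInstModel {
    Repr repr;
    std::vector<DeltaSegment> delta;  // only meaningful when repr == Delta
};

// Inferred sketch: residues are held locally unless the representation is
// virtual, or a delta sequence points elsewhere through a Loc segment.
bool HasRawSequenceDataSketch(const SeqInstModel& inst) {
    if (inst.repr == Repr::Virtual) {
        return false;
    }
    if (inst.repr == Repr::Delta) {
        for (DeltaSegment seg : inst.delta) {
            if (seg == DeltaSegment::Loc) {
                return false;
            }
        }
    }
    return true;
}
```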

◆ ParseSequenceRange()

TSeqRange ParseSequenceRange ( const string &  range_str,
const char *  error_prefix = NULL 
)

Parse and extract a sequence range from the argument provided to this function.

The format is N-M, where N and M are positive integers given as 1-based offsets and N < M.

Parameters
range_str: string to extract the range from [in]
error_prefix: error message prefix to encode in the exception thrown on error (if NULL, a default message is used) [in]
Returns
a properly constructed range, in 0-based offsets, if parsing succeeds
Exceptions
CStringException or CBlastException (with error code eInvalidArgument): if parsing fails or the range is invalid (i.e., empty, negative, or N > M, in 0-based offsets)

Definition at line 146 of file blast_input_aux.cpp.

References NCBI_THROW, CRange_Base::SetFrom(), CRange_Base::SetTo(), NStr::Split(), and NStr::StringToInt().

Referenced by BOOST_AUTO_TEST_CASE(), CBlastDatabaseArgs::ExtractAlgorithmOptions(), and CQueryOptionsArgs::ExtractAlgorithmOptions().
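The parsing rules above can be illustrated with a standalone sketch that converts "N-M" from 1-based to 0-based offsets. SeqRange, ParsePositive and ParseRangeSketch are hypothetical names; SeqRange is a simplified stand-in for TSeqRange holding a closed interval, std::invalid_argument stands in for the toolkit exceptions, and the sketch rejects N = M, following the N < M requirement stated above.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Simplified stand-in for TSeqRange: a closed interval [from, to].
struct SeqRange {
    unsigned from;
    unsigned to;
};

// Parses a strictly positive decimal integer; rejects signs, spaces and
// trailing characters.
inline unsigned ParsePositive(const std::string& s) {
    if (s.empty() || s.find_first_not_of("0123456789") != std::string::npos) {
        throw std::invalid_argument("not a positive integer: '" + s + "'");
    }
    unsigned long long v = std::stoull(s);  // may throw std::out_of_range
    if (v == 0) {
        throw std::invalid_argument("offsets are 1-based; 0 is not allowed");
    }
    if (v > std::numeric_limits<unsigned>::max()) {
        throw std::out_of_range("offset too large: " + s);
    }
    return static_cast<unsigned>(v);
}

// Sketch: "N-M" in 1-based offsets -> closed range in 0-based offsets.
SeqRange ParseRangeSketch(const std::string& range_str) {
    const std::size_t dash = range_str.find('-');
    if (dash == std::string::npos ||
        range_str.find('-', dash + 1) != std::string::npos) {
        throw std::invalid_argument("expected N-M, got '" + range_str + "'");
    }
    const unsigned n = ParsePositive(range_str.substr(0, dash));
    const unsigned m = ParsePositive(range_str.substr(dash + 1));
    if (n >= m) {  // the documentation requires N < M
        throw std::invalid_argument("range start must be less than stop");
    }
    return SeqRange{n - 1, m - 1};  // shift both ends to 0-based offsets
}
```

For example, "10-20" yields from = 9 and to = 19, while "20-10" and "0-5" are rejected.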

◆ ParseSequenceRangeOpenEnd()

TSeqRange ParseSequenceRangeOpenEnd ( const string &  range_str,
const char *  error_prefix = NULL 
)

Parse and extract a sequence range from the argument provided to this function.

The format is N-M, where N and M are positive integers given as 1-based offsets and N < M. The open-ended form N- and the single-position form N-N are also supported.

Parameters
range_str: string to extract the range from [in]
error_prefix: error message prefix to encode in the exception thrown on error (if NULL, a default message is used) [in]
Returns
a properly constructed range, in 0-based offsets, if parsing succeeds
Exceptions
CStringException or CBlastException (with error code eInvalidArgument): if parsing fails or the range is invalid (i.e., empty, negative, or N > M, in 0-based offsets)

Definition at line 182 of file blast_input_aux.cpp.

References NCBI_THROW, CRange_Base::SetFrom(), CRange_Base::SetTo(), NStr::Split(), and NStr::StringToInt().

Referenced by CBlastDBCmdApp::x_InitSearchRequest(), and CBlastDBCmdApp::x_ModifyConfigForBatchEntry().
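A standalone sketch of the open-ended variant follows (C++17). How the toolkit represents an open end inside TSeqRange is not stated here, so this sketch assumes an empty std::optional for the stop position. OpenSeqRange, ParsePositive and ParseOpenRangeSketch are hypothetical names, and std::invalid_argument stands in for the toolkit exceptions.

```cpp
#include <cstddef>
#include <limits>
#include <optional>
#include <stdexcept>
#include <string>

// Simplified stand-in for TSeqRange; an empty `to` marks an open end.
struct OpenSeqRange {
    unsigned from;
    std::optional<unsigned> to;
};

// Parses a strictly positive decimal integer; rejects signs and other text.
inline unsigned ParsePositive(const std::string& s) {
    if (s.empty() || s.find_first_not_of("0123456789") != std::string::npos) {
        throw std::invalid_argument("not a positive integer: '" + s + "'");
    }
    unsigned long long v = std::stoull(s);  // may throw std::out_of_range
    if (v == 0) {
        throw std::invalid_argument("offsets are 1-based; 0 is not allowed");
    }
    if (v > std::numeric_limits<unsigned>::max()) {
        throw std::out_of_range("offset too large: " + s);
    }
    return static_cast<unsigned>(v);
}

// Sketch: accepts "N-M" with N < M, the single position "N-N", and "N-".
OpenSeqRange ParseOpenRangeSketch(const std::string& range_str) {
    const std::size_t dash = range_str.find('-');
    if (dash == std::string::npos ||
        range_str.find('-', dash + 1) != std::string::npos) {
        throw std::invalid_argument("expected N-M or N-, got '" + range_str + "'");
    }
    const unsigned n = ParsePositive(range_str.substr(0, dash));
    const std::string stop = range_str.substr(dash + 1);
    if (stop.empty()) {
        return OpenSeqRange{n - 1, std::nullopt};  // open end: "N-"
    }
    const unsigned m = ParsePositive(stop);
    if (n > m) {
        throw std::invalid_argument("range start is greater than stop");
    }
    return OpenSeqRange{n - 1, m - 1};  // 0-based, closed interval
}
```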

◆ ReadSequencesToBlast()

CRef<objects::CScope> ReadSequencesToBlast ( CNcbiIstream &  in,
bool  read_proteins,
const TSeqRange &  range,
bool  parse_deflines,
bool  use_lcase_masking,
CRef< CBlastQueryVector > &  sequences,
bool  gaps_to_Ns = false 
)

Read sequence input for BLAST.

Parameters
in: input stream from which to read [in]
read_proteins: true to expect protein input, false to expect nucleotide input [in]
range: range restriction to apply to the sequences read [in]
parse_deflines: true if the subject deflines should be parsed [in]
use_lcase_masking: true if lowercase characters in the subject sequences should be interpreted as masked regions [in]
sequences: output will be placed here [in|out]
gaps_to_Ns: convert all gaps in the sequences to Ns (nucleotide sequences only) [in]
Returns
CScope object which contains all the sequences read

Definition at line 222 of file blast_input_aux.cpp.

References CObjectManager::GetInstance(), in(), input(), SDataLoaderConfig::OptimizeForWholeLargeSequenceRetrieval(), compile_time_bits::range(), CBlastInputSourceConfig::SetBelieveDeflines(), CBlastInputSourceConfig::SetConvertGapsToNs(), CBlastInputSourceConfig::SetLowercaseMask(), CBlastInputSourceConfig::SetRange(), and CBlastInputSourceConfig::SetSubjectLocalIdMode().

Referenced by CBlastDatabaseArgs::ExtractAlgorithmOptions(), and CIgBlastArgs::ExtractAlgorithmOptions().
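Two of the flags above change how residues are interpreted. The standalone sketch below only illustrates their meaning on a plain string: LowercaseRegions turns each run of lowercase residues into a 0-based, inclusive masked interval (the idea behind use_lcase_masking), and GapsToNs replaces gap characters with N (the idea behind gaps_to_Ns). Both function names are hypothetical, the sketch assumes gaps are written as '-', and it is not how ReadSequencesToBlast() itself reads FASTA input.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// 0-based, inclusive interval of masked residues.
using MaskedRegion = std::pair<std::size_t, std::size_t>;

// Each maximal run of lowercase residues becomes one masked region.
std::vector<MaskedRegion> LowercaseRegions(const std::string& seq) {
    std::vector<MaskedRegion> regions;
    std::size_t i = 0;
    while (i < seq.size()) {
        if (std::islower(static_cast<unsigned char>(seq[i]))) {
            const std::size_t start = i;
            while (i < seq.size() &&
                   std::islower(static_cast<unsigned char>(seq[i]))) {
                ++i;
            }
            regions.emplace_back(start, i - 1);
        } else {
            ++i;
        }
    }
    return regions;
}

// Assumption: gaps are written as '-'; each one becomes an 'N'.
std::string GapsToNs(std::string seq) {
    for (char& c : seq) {
        if (c == '-') {
            c = 'N';
        }
    }
    return seq;
}
```

For example, "ACGTacgtACnnG" yields the masked regions (4, 7) and (10, 11).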

Modified on Tue May 21 11:02:02 2024 by modify_doxy.py rev. 669887