NCBI C++ ToolKit
Auxiliary classes/functions for BLAST input library.
#include <algo/blast/api/sseqloc.hpp>
#include <corelib/ncbiargs.hpp>
#include <objects/seqset/Bioseq_set.hpp>
Classes
class | CAutoOutputFileReset |
Auxiliary class to store the name of an output file, which is reset every time its GetStream method is invoked.
class | CArgAllowMaximumFileNameLength |
Class to constrain the length of the file name passed to a given CArgDescriptions key.
class | CArgAllowValuesGreaterThanOrEqual |
Class to constrain the values of an argument to those greater than or equal to the value specified in the constructor.
class | CArgAllowValuesLessThanOrEqual |
Class to constrain the values of an argument to those less than or equal to the value specified in the constructor.
class | CArgAllowValuesBetween |
Class to constrain the values of an argument to those in between the values specified in the constructor.
class | CArgAllowIntegerSet |
class | CArgAllowStringSet |
Macros
#define | DEFINE_CARGALLOW_SET_CLASS(ClassName, DataType, String2DataTypeFn) |
Macro to create a subclass of CArgAllow that allows the specification of sets of data.
Functions
TSeqRange | ParseSequenceRange (const string &range_str, const char *error_prefix=NULL) |
Parse and extract a sequence range from argument provided to this function.
TSeqRange | ParseSequenceRangeOpenEnd (const string &range_str, const char *error_prefix=NULL) |
Parse and extract a sequence range from argument provided to this function.
int | GetQueryBatchSize (EProgram program, bool is_ungapped=false, bool remote=false, bool use_default=true, string task="", bool mt_mode=false) |
Retrieve the appropriate batch size for the specified task.
CRef< objects::CScope > | ReadSequencesToBlast (CNcbiIstream &in, bool read_proteins, const TSeqRange &range, bool parse_deflines, bool use_lcase_masking, CRef< CBlastQueryVector > &sequences, bool gaps_to_Ns=false) |
Read sequence input for BLAST.
string | CalculateFormattingParams (TSeqPos max_target_seqs, TSeqPos *num_descriptions, TSeqPos *num_alignments, TSeqPos *num_overview=NULL) |
Calculates the formatting parameters based on the maximum number of target sequences selected (a.k.a. hitlist size).
bool | HasRawSequenceData (const objects::CBioseq &bioseq) |
Returns true if the Bioseq passed as argument has the full, raw sequence data in its Seq-inst field.
void | CheckForEmptySequences (const TSeqLocVector &sequences, string &warnings) |
Inspect the sequences parameter for empty sequences.
void | CheckForEmptySequences (CRef< CBlastQueryVector > sequences, string &warnings) |
Inspect the sequences parameter for empty sequences.
void | CheckForEmptySequences (CRef< objects::CBioseq_set > sequences, string &warnings) |
Inspect the sequences parameter for empty sequences.
Auxiliary classes/functions for BLAST input library.
Definition in file blast_input_aux.hpp.
#define DEFINE_CARGALLOW_SET_CLASS(ClassName, DataType, String2DataTypeFn)
Macro to create a subclass of CArgAllow that allows the specification of sets of data.
ClassName | Name of the class to be created [in] |
DataType | data type of the allowed arguments [in] |
String2DataTypeFn | Conversion function from a string to DataType [in] |
Definition at line 201 of file blast_input_aux.hpp.
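The general technique of generating a "set of allowed values" class from a macro can be illustrated with a standalone sketch. This is not the toolkit's macro body: the macro name, the generated class names, and the conversion helpers below are all hypothetical, and the generated class deliberately does not derive from CArgAllow so that the example compiles without the toolkit.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the toolkit macro: generates a class that holds a
// set of allowed values and checks a string argument against it.
// Assumption: unlike the real macro, no CArgAllow base class is involved.
#define DEFINE_ALLOWED_SET_CLASS_SKETCH(ClassName, DataType, String2DataTypeFn) \
    class ClassName {                                                           \
    public:                                                                     \
        explicit ClassName(const std::set<DataType>& values)                    \
            : m_AllowedValues(values) {}                                        \
        /* Convert the string, then test membership in the allowed set. */      \
        bool Verify(const std::string& value) const {                           \
            return m_AllowedValues.count(String2DataTypeFn(value)) > 0;         \
        }                                                                       \
    private:                                                                    \
        std::set<DataType> m_AllowedValues;                                     \
    }

// Hypothetical conversion helpers playing the role of String2DataTypeFn.
inline int SketchStringToInt(const std::string& s) { return std::stoi(s); }
inline std::string SketchIdentity(const std::string& s) { return s; }

DEFINE_ALLOWED_SET_CLASS_SKETCH(CSketchAllowIntegerSet, int, SketchStringToInt);
DEFINE_ALLOWED_SET_CLASS_SKETCH(CSketchAllowStringSet, std::string, SketchIdentity);

// Small usage example: values inside the set verify, others do not.
inline void SketchSetClassDemo()
{
    const CSketchAllowIntegerSet ints(std::set<int>{1, 2, 3});
    assert(ints.Verify("2"));
    assert(!ints.Verify("7"));
}
```

The three macro parameters mirror the documented ones: the class name, the element type, and a function converting the argument string to that type.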
string CalculateFormattingParams(TSeqPos max_target_seqs, TSeqPos *num_descriptions, TSeqPos *num_alignments, TSeqPos *num_overview = NULL)
Calculates the formatting parameters based on the maximum number of target sequences selected (a.k.a. hitlist size).
max_target_seqs | the hitlist size [in] |
num_descriptions | the number of one-line descriptions to show [out] |
num_alignments | the number of alignments to show [out] |
num_overview | the number of sequences to show in the overview image displayed in the BLAST report on the web [out] |
Definition at line 250 of file blast_input_aux.cpp.
References _ASSERT, NStr::IntToString(), kDfltArgMaxTargetSequences, and min().
void CheckForEmptySequences(const TSeqLocVector &sequences, string &warnings)
Inspect the sequences parameter for empty sequences.
Returns a non-empty string in the warnings parameter if there are empty sequence(s) in its first parameter.
sequences | sequence set to inspect [in] |
warnings | populated if empty sequence(s) are found among non-empty sequences [in|out] |
CInputException | if there is only 1 empty sequence |
Definition at line 368 of file blast_input_aux.cpp.
References GetLength(), i, ITERATE, NCBI_THROW, and query.
Referenced by BOOST_AUTO_TEST_CASE().
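The warn-or-throw contract described above can be sketched without the toolkit. Everything here is hypothetical: queries are modelled as (identifier, length) pairs rather than a TSeqLocVector, std::invalid_argument stands in for CInputException, and the warning text is invented. The sketch also assumes that the exception applies whenever no query carries any sequence data, which generalises the documented "only 1 empty sequence" case.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a query: an identifier plus its sequence length.
using SketchQuery = std::pair<std::string, std::size_t>;

// Fills `warnings` when some, but not all, queries are empty.
// Throws std::invalid_argument (standing in for CInputException) when no
// query carries sequence data; that generalisation is an assumption.
inline void SketchCheckForEmptySequences(const std::vector<SketchQuery>& queries,
                                         std::string& warnings)
{
    std::vector<std::string> empty_ids;
    for (const auto& q : queries) {
        if (q.second == 0) {
            empty_ids.push_back(q.first);
        }
    }
    if (empty_ids.size() == queries.size()) {
        throw std::invalid_argument("Query contains no sequence data");
    }
    if (!empty_ids.empty()) {
        warnings.assign("Sequences without data:");
        for (const auto& id : empty_ids) {
            warnings += ' ';
            warnings += id;
        }
    }
}

// Small usage example: one empty query among non-empty ones yields a warning.
inline void SketchEmptyCheckDemo()
{
    std::string warnings;
    SketchCheckForEmptySequences({SketchQuery("q1", 10), SketchQuery("q2", 0)}, warnings);
    assert(warnings == "Sequences without data: q2");
}
```

Reporting through an in|out string keeps the mixed case non-fatal, so a caller can still run the search on the non-empty queries and print the warning afterwards.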
void CheckForEmptySequences(CRef< CBlastQueryVector > sequences, string &warnings)
Inspect the sequences parameter for empty sequences.
Returns a non-empty string in the warnings parameter if there are empty sequence(s) in its first parameter.
sequences | sequence set to inspect [in] |
warnings | populated if empty sequence(s) are found among non-empty sequences [in|out] |
CInputException | if there is only 1 empty sequence |
Definition at line 332 of file blast_input_aux.cpp.
References CBlastQueryVector::Empty(), CRef< C, Locker >::Empty(), i, ITERATE, NCBI_THROW, and query.
void CheckForEmptySequences(CRef< objects::CBioseq_set > sequences, string &warnings)
Inspect the sequences parameter for empty sequences.
Returns a non-empty string in the warnings parameter if there are empty sequence(s) in its first parameter.
sequences | sequence set to inspect [in] |
warnings | populated if empty sequence(s) are found among non-empty sequences [in|out] |
CInputException | if there is only 1 empty sequence |
int GetQueryBatchSize(EProgram program, bool is_ungapped = false, bool remote = false, bool use_default = true, string task = "", bool mt_mode = false)
Retrieve the appropriate batch size for the specified task.
program | BLAST task [in] |
is_ungapped | true if ungapped BLAST search is requested [in] |
remote | true if remote BLAST search is requested [in] |
use_default | true if a default value should be returned [in] |
task | task (e.g., blastx-fast). Auto-set if empty [in] |
mt_mode | thread by queries (true) or by database (false) [in] |
Definition at line 70 of file blast_input_aux.cpp.
References _TRACE, eBlastn, eBlastp, eBlastx, eDiscMegablast, eMapper, eMegablast, EProgramToTaskName(), eTblastn, eTblastx, and NStr::StringToInt().
Referenced by BOOST_AUTO_TEST_CASE(), GetMTByQueriesBatchSize(), CBlastnAppArgs::GetQueryBatchSize(), CBlastnNodeArgs::GetQueryBatchSize(), CBlastpAppArgs::GetQueryBatchSize(), CBlastpNodeArgs::GetQueryBatchSize(), CBlastxAppArgs::GetQueryBatchSize(), CBlastxNodeArgs::GetQueryBatchSize(), CIgBlastnAppArgs::GetQueryBatchSize(), CIgBlastpAppArgs::GetQueryBatchSize(), CMagicBlastAppArgs::GetQueryBatchSize(), CPsiBlastAppArgs::GetQueryBatchSize(), CRMBlastnAppArgs::GetQueryBatchSize(), CRPSBlastAppArgs::GetQueryBatchSize(), CRPSBlastNodeArgs::GetQueryBatchSize(), CRPSTBlastnAppArgs::GetQueryBatchSize(), CRPSTBlastnNodeArgs::GetQueryBatchSize(), CTblastnAppArgs::GetQueryBatchSize(), CTblastnNodeArgs::GetQueryBatchSize(), CTblastxAppArgs::GetQueryBatchSize(), CBlastnVdbAppArgs::GetQueryBatchSize(), CTblastnVdbAppArgs::GetQueryBatchSize(), CDeltaBlastAppArgs::GetQueryBatchSize(), and CBlastInputDemoApplication::Run().
bool HasRawSequenceData(const objects::CBioseq &bioseq)
Returns true if the Bioseq passed as argument has the full, raw sequence data in its Seq-inst field.
bioseq | Bioseq to examine [in] |
Definition at line 302 of file blast_input_aux.cpp.
References CDelta_seq_Base::e_Loc, CSeq_inst_Base::eRepr_delta, CSeq_inst_Base::eRepr_virtual, CBlastBioseqMaker::IsEmptyBioseq(), and ITERATE.
Referenced by CBlastFastaInputSource::x_FastaToSeqLoc(), CBlastFormatterApp::x_QueryBioseqToSSeqLoc(), and CBlastFormatterVdbApp::x_QueryBioseqToSSeqLoc().
TSeqRange ParseSequenceRange(const string &range_str, const char *error_prefix = NULL)
Parse and extract a sequence range from argument provided to this function.
The format is N-M, where N and M are positive integers in 1-based offsets and N < M.
range_str | string to extract the range from [in] |
error_prefix | error message prefix which will be encoded in the exception thrown in case of error (if NULL a default message will be used) [in] |
CStringException | or CBlastException with error code eInvalidArgument if parsing fails or the range is invalid (i.e.: empty, negative, N>M, in 0-based offsets) |
Definition at line 146 of file blast_input_aux.cpp.
References NCBI_THROW, CRange_Base::SetFrom(), CRange_Base::SetTo(), NStr::Split(), and NStr::StringToInt().
Referenced by BOOST_AUTO_TEST_CASE(), CBlastDatabaseArgs::ExtractAlgorithmOptions(), and CQueryOptionsArgs::ExtractAlgorithmOptions().
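The N-M format and its rejection rules can be illustrated with a standalone parser. This is a minimal sketch, not the toolkit's implementation: the type and function names are hypothetical, std::invalid_argument stands in for the toolkit exceptions, and the parsed offsets are kept 1-based exactly as typed (how the real function stores them in a TSeqRange is not covered here).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type: both ends are kept exactly as typed (1-based).
struct SketchRange {
    unsigned long from;
    unsigned long to;
};

// Parses a string that must consist of digits only and be at least 1.
// Empty input, signs, and whitespace are rejected.
inline unsigned long SketchParsePositive(const std::string& text)
{
    if (text.empty() ||
        text.find_first_not_of("0123456789") != std::string::npos) {
        throw std::invalid_argument("not a positive integer: '" + text + "'");
    }
    const unsigned long value = std::stoul(text);
    if (value == 0) {
        throw std::invalid_argument("offsets are 1-based, got 0");
    }
    return value;
}

// Parses "N-M" and requires N < M, as the description above states.
inline SketchRange SketchParseSequenceRange(const std::string& range_str)
{
    const std::size_t dash = range_str.find('-');
    if (dash == std::string::npos) {
        throw std::invalid_argument("missing '-' in range");
    }
    const unsigned long from = SketchParsePositive(range_str.substr(0, dash));
    const unsigned long to = SketchParsePositive(range_str.substr(dash + 1));
    if (from >= to) {
        throw std::invalid_argument("range start must be smaller than range end");
    }
    return SketchRange{from, to};
}

// Small usage example.
inline void SketchRangeDemo()
{
    const SketchRange r = SketchParseSequenceRange("5-10");
    assert(r.from == 5 && r.to == 10);
}
```

Because the split happens at the first dash, a leading minus sign such as "-3-9" leaves an empty start field and is rejected along with empty input, "0-5", "7-7", and "10-5".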
TSeqRange ParseSequenceRangeOpenEnd(const string &range_str, const char *error_prefix = NULL)
Parse and extract a sequence range from argument provided to this function.
The format is N-M, where N and M are positive integers in 1-based offsets and N < M. Open end range N- and single range N-N formats are supported.
range_str | string to extract the range from [in] |
error_prefix | error message prefix which will be encoded in the exception thrown in case of error (if NULL a default message will be used) [in] |
CStringException | or CBlastException with error code eInvalidArgument if parsing fails or the range is invalid (i.e.: empty, negative, N>M, in 0-based offsets) |
Definition at line 182 of file blast_input_aux.cpp.
References NCBI_THROW, CRange_Base::SetFrom(), CRange_Base::SetTo(), NStr::Split(), and NStr::StringToInt().
Referenced by CBlastDBCmdApp::x_InitSearchRequest(), and CBlastDBCmdApp::x_ModifyConfigForBatchEntry().
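The additional N- and N-N forms can be sketched in the same standalone style. Again, this is an illustration under stated assumptions rather than the toolkit's code: all names are hypothetical, std::invalid_argument replaces the toolkit exceptions, and the open end is represented by a boolean flag, which is a choice made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type for a range whose end may be left open.
struct SketchOpenRange {
    unsigned long from;
    unsigned long to;   // meaningful only when open_end is false
    bool open_end;      // true for the "N-" form
};

// Parses a digits-only, 1-based offset; rejects empty input, signs, and 0.
inline unsigned long SketchParseOffset(const std::string& text)
{
    if (text.empty() ||
        text.find_first_not_of("0123456789") != std::string::npos) {
        throw std::invalid_argument("not a positive integer: '" + text + "'");
    }
    const unsigned long value = std::stoul(text);
    if (value == 0) {
        throw std::invalid_argument("offsets are 1-based, got 0");
    }
    return value;
}

// Accepts "N-M" with N <= M (so "N-N" is a single position) and "N-".
inline SketchOpenRange SketchParseSequenceRangeOpenEnd(const std::string& range_str)
{
    const std::size_t dash = range_str.find('-');
    if (dash == std::string::npos) {
        throw std::invalid_argument("missing '-' in range");
    }
    const unsigned long from = SketchParseOffset(range_str.substr(0, dash));
    const std::string tail = range_str.substr(dash + 1);
    if (tail.empty()) {
        return SketchOpenRange{from, 0UL, true};   // "N-": no upper bound
    }
    const unsigned long to = SketchParseOffset(tail);
    if (from > to) {
        throw std::invalid_argument("range start cannot exceed range end");
    }
    return SketchOpenRange{from, to, false};
}

// Small usage example covering the two extra forms.
inline void SketchOpenEndDemo()
{
    assert(SketchParseSequenceRangeOpenEnd("5-").open_end);
    assert(SketchParseSequenceRangeOpenEnd("7-7").to == 7);
}
```

The only differences from a strict N-M parser are the empty-tail branch and the relaxed comparison, which admits N equal to M.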
CRef<objects::CScope> ReadSequencesToBlast(CNcbiIstream &in, bool read_proteins, const TSeqRange &range, bool parse_deflines, bool use_lcase_masking, CRef< CBlastQueryVector > &sequences, bool gaps_to_Ns = false)
Read sequence input for BLAST.
in | input stream from which to read [in] |
read_proteins | expect proteins or nucleotides as input [in] |
range | range restriction to apply to sequences read [in] |
parse_deflines | true if the subject deflines should be parsed [in] |
use_lcase_masking | true if the subject lowercase sequence characters should be interpreted as masked regions [in] |
sequences | output will be placed here [in|out] |
gaps_to_Ns | convert all gaps in the sequences to Ns (only for nucleotide sequences) [in] |
Definition at line 222 of file blast_input_aux.cpp.
References CObjectManager::GetInstance(), in(), input(), SDataLoaderConfig::OptimizeForWholeLargeSequenceRetrieval(), compile_time_bits::range(), CBlastInputSourceConfig::SetBelieveDeflines(), CBlastInputSourceConfig::SetConvertGapsToNs(), CBlastInputSourceConfig::SetLowercaseMask(), CBlastInputSourceConfig::SetRange(), and CBlastInputSourceConfig::SetSubjectLocalIdMode().
Referenced by CBlastDatabaseArgs::ExtractAlgorithmOptions(), and CIgBlastArgs::ExtractAlgorithmOptions().