NCBI C++ ToolKit
blast_input_aux.cpp File Reference

Auxiliary functions for BLAST input library.

#include <ncbi_pch.hpp>
#include <algo/blast/blastinput/blast_input_aux.hpp>
#include <algo/blast/api/blast_exception.hpp>
#include <serial/iterator.hpp>
#include <algo/blast/blastinput/blast_fasta_input.hpp>
#include <algo/blast/blastinput/psiblast_args.hpp>
#include <objects/seq/Delta_seq.hpp>
#include <objects/seq/Delta_ext.hpp>
#include <objects/seq/Seq_ext.hpp>
#include <objmgr/util/seq_loc_util.hpp>

Functions

int GetQueryBatchSize (EProgram program, bool is_ungapped, bool is_remote, bool use_default, string task_name, bool mt_mode)
 Retrieve the appropriate batch size for the specified task.

TSeqRange ParseSequenceRange (const string &range_str, const char *error_prefix)
 Parse and extract a sequence range from argument provided to this function.

TSeqRange ParseSequenceRangeOpenEnd (const string &range_str, const char *error_prefix)
 Parse and extract a sequence range from argument provided to this function.

CRef< CScope > ReadSequencesToBlast (CNcbiIstream &in, bool read_proteins, const TSeqRange &range, bool parse_deflines, bool use_lcase_masking, CRef< CBlastQueryVector > &sequences, bool gaps_to_Ns)
 Read sequence input for BLAST.

string CalculateFormattingParams (TSeqPos max_target_seqs, TSeqPos *num_descriptions, TSeqPos *num_alignments, TSeqPos *num_overview)
 Calculates the formatting parameters based on the maximum number of target sequences selected (a.k.a. hitlist size).

bool HasRawSequenceData (const objects::CBioseq &bioseq)
 Returns true if the Bioseq passed as argument has the full, raw sequence data in its Seq-inst field.

void CheckForEmptySequences (CRef< CBlastQueryVector > sequences, string &warnings)
 Inspect the sequences parameter for empty sequences.

void CheckForEmptySequences (const TSeqLocVector &sequences, string &warnings)
 Inspect the sequences parameter for empty sequences.

void CheckForEmptySequences (CRef< CBioseq_set > sequences, string &warnings)
 

Detailed Description

Auxiliary functions for BLAST input library.

Definition in file blast_input_aux.cpp.

Function Documentation

◆ CalculateFormattingParams()

string CalculateFormattingParams ( TSeqPos  max_target_seqs,
TSeqPos *  num_descriptions,
TSeqPos *  num_alignments,
TSeqPos *  num_overview = NULL 
)

Calculates the formatting parameters based on the maximum number of target sequences selected (a.k.a. hitlist size).

Parameters
max_target_seqs  the hitlist size [in]
num_descriptions  the number of one-line descriptions to show [out]
num_alignments  the number of alignments to show [out]
num_overview  the number of sequences to show in the overview image displayed in the BLAST report on the web [out]
Returns
string containing warnings (if any)

Definition at line 250 of file blast_input_aux.cpp.

References _ASSERT, NStr::IntToString(), kDfltArgMaxTargetSequences, and min().

◆ CheckForEmptySequences() [1/3]

void CheckForEmptySequences ( const TSeqLocVector &  sequences,
string &  warnings 
)

Inspect the sequences parameter for empty sequences.

Returns a non-empty string in the warnings parameter if there are empty sequence(s) in its first parameter.

Parameters
sequences  sequence set to inspect [in]
warnings  populated if empty sequence(s) are found among non-empty sequences [in|out]
Exceptions
CInputException  if there is only 1 empty sequence

Definition at line 368 of file blast_input_aux.cpp.

References GetLength(), i, ITERATE, NCBI_THROW, and query.

Referenced by BOOST_AUTO_TEST_CASE().
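
The warn-or-throw contract described above can be sketched without the toolkit. The following is a minimal standalone illustration, not the toolkit's implementation. QueryListSketch and CheckForEmptySketch are hypothetical names, each sequence is modelled as an (identifier, length) pair, std::runtime_error stands in for CInputException, and the warning wording is invented. The sketch assumes that the exception case is "no sequence has any data" and that warnings are appended to the existing string.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a query collection: (identifier, length) pairs.
typedef std::vector<std::pair<std::string, std::size_t> > QueryListSketch;

// Appends a warning naming the empty sequences when they occur among
// non-empty ones; throws when no sequence carries any data at all.
void CheckForEmptySketch(const QueryListSketch& sequences, std::string& warnings)
{
    std::string empty_ids;
    bool all_empty = true;
    for (std::size_t i = 0; i < sequences.size(); ++i) {
        if (sequences[i].second == 0) {
            if (!empty_ids.empty()) {
                empty_ids += ", ";
            }
            empty_ids += sequences[i].first;
        } else {
            all_empty = false;
        }
    }
    if (all_empty) {
        // Assumption: std::runtime_error replaces CInputException here.
        throw std::runtime_error("Query contains no sequence data");
    }
    if (!empty_ids.empty()) {
        // Assumption: the message text is illustrative only.
        warnings += "Sequences with no data: " + empty_ids;
    }
}
```

A caller would pass an initially empty string and test it after the call: a non-empty string means the search can proceed but some inputs were skipped.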

◆ CheckForEmptySequences() [2/3]

void CheckForEmptySequences ( CRef< CBioseq_set >  sequences,
string &  warnings 
)

Definition at line 404 of file blast_input_aux.cpp.

References ConstBegin(), eDetectLoops, CRef< C, Locker >::Empty(), i, and NCBI_THROW.

◆ CheckForEmptySequences() [3/3]

void CheckForEmptySequences ( CRef< CBlastQueryVector >  sequences,
string &  warnings 
)

Inspect the sequences parameter for empty sequences.

Returns a non-empty string in the warnings parameter if there are empty sequence(s) in its first parameter.

Parameters
sequences  sequence set to inspect [in]
warnings  populated if empty sequence(s) are found among non-empty sequences [in|out]
Exceptions
CInputException  if there is only 1 empty sequence

Definition at line 332 of file blast_input_aux.cpp.

References CBlastQueryVector::Empty(), CRef< C, Locker >::Empty(), i, ITERATE, NCBI_THROW, and query.

◆ GetQueryBatchSize()

int GetQueryBatchSize ( EProgram  program,
bool  is_ungapped = false,
bool  remote = false,
bool  use_default = true,
string  task = "",
bool  mt_mode = false 
)

Retrieve the appropriate batch size for the specified task.

Parameters
program  BLAST task [in]
is_ungapped  true if ungapped BLAST search is requested [in]
remote  true if remote BLAST search is requested [in]
use_default  true if a default value should be returned [in]
task  task (e.g., blastx-fast). Auto-set if empty [in]
mt_mode  thread by queries (true) or by database (false) [in]

Definition at line 70 of file blast_input_aux.cpp.

References _TRACE, eBlastn, eBlastp, eBlastx, eDiscMegablast, eMapper, eMegablast, EProgramToTaskName(), eTblastn, eTblastx, and NStr::StringToInt().

Referenced by BOOST_AUTO_TEST_CASE(), GetMTByQueriesBatchSize(), CBlastnAppArgs::GetQueryBatchSize(), CBlastnNodeArgs::GetQueryBatchSize(), CBlastpAppArgs::GetQueryBatchSize(), CBlastpNodeArgs::GetQueryBatchSize(), CBlastxAppArgs::GetQueryBatchSize(), CBlastxNodeArgs::GetQueryBatchSize(), CIgBlastnAppArgs::GetQueryBatchSize(), CIgBlastpAppArgs::GetQueryBatchSize(), CMagicBlastAppArgs::GetQueryBatchSize(), CPsiBlastAppArgs::GetQueryBatchSize(), CRMBlastnAppArgs::GetQueryBatchSize(), CRPSBlastAppArgs::GetQueryBatchSize(), CRPSBlastNodeArgs::GetQueryBatchSize(), CRPSTBlastnAppArgs::GetQueryBatchSize(), CRPSTBlastnNodeArgs::GetQueryBatchSize(), CTblastnAppArgs::GetQueryBatchSize(), CTblastnNodeArgs::GetQueryBatchSize(), CTblastxAppArgs::GetQueryBatchSize(), CBlastnVdbAppArgs::GetQueryBatchSize(), CTblastnVdbAppArgs::GetQueryBatchSize(), CDeltaBlastAppArgs::GetQueryBatchSize(), and CBlastInputDemoApplication::Run().

◆ HasRawSequenceData()

bool HasRawSequenceData ( const objects::CBioseq &  bioseq)

Returns true if the Bioseq passed as argument has the full, raw sequence data in its Seq-inst field.

Parameters
bioseq  Bioseq to examine [in]

Definition at line 302 of file blast_input_aux.cpp.

References CDelta_seq_Base::e_Loc, CSeq_inst_Base::eRepr_delta, CSeq_inst_Base::eRepr_virtual, CBlastBioseqMaker::IsEmptyBioseq(), and ITERATE.

Referenced by CBlastFastaInputSource::x_FastaToSeqLoc(), CBlastFormatterApp::x_QueryBioseqToSSeqLoc(), and CBlastFormatterVdbApp::x_QueryBioseqToSSeqLoc().

◆ ParseSequenceRange()

TSeqRange ParseSequenceRange ( const string &  range_str,
const char *  error_prefix = NULL 
)

Parse and extract a sequence range from argument provided to this function.

The format is N-M, where N and M are positive integers in 1-based offsets and N < M.

Parameters
range_str  string to extract the range from [in]
error_prefix  error message prefix which will be encoded in the exception thrown in case of error (if NULL a default message will be used) [in]
Returns
properly constructed range, in 0-based offsets, if parsing succeeded.
Exceptions
CStringException or CBlastException with error code eInvalidArgument, if parsing fails or the range is invalid (e.g., empty, negative, N>M, or given in 0-based offsets)

Definition at line 146 of file blast_input_aux.cpp.

References NCBI_THROW, CRange_Base::SetFrom(), CRange_Base::SetTo(), NStr::Split(), and NStr::StringToInt().

Referenced by BOOST_AUTO_TEST_CASE(), CBlastDatabaseArgs::ExtractAlgorithmOptions(), and CQueryOptionsArgs::ExtractAlgorithmOptions().
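
The 1-based to 0-based conversion described above can be illustrated with a small standalone sketch. It is not the toolkit's code: RangeSketch, ParseDigits and ParseRangeSketch are hypothetical names, std::invalid_argument stands in for the documented exceptions, and the error messages are invented. The sketch rejects only N > M, as the exception list states. Whether the toolkit also rejects N = M is not settled by the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for TSeqRange: a closed interval in 0-based offsets.
struct RangeSketch {
    unsigned long from;
    unsigned long to;
};

// Converts a short string of decimal digits to a number; rejects anything else.
// (Illustrative helper, not a toolkit function.)
static unsigned long ParseDigits(const std::string& text, const std::string& prefix)
{
    if (text.empty() || text.size() > 9 ||
        text.find_first_not_of("0123456789") != std::string::npos) {
        throw std::invalid_argument(prefix + ": '" + text + "' is not a valid offset");
    }
    return std::stoul(text);
}

// Parses "N-M" (1-based, inclusive) and returns the same interval 0-based.
RangeSketch ParseRangeSketch(const std::string& range_str,
                             const char* error_prefix = nullptr)
{
    // Mirror the documented behaviour: use a default message when no prefix is given.
    const std::string prefix =
        error_prefix ? error_prefix : "Failed to parse sequence range";
    const std::size_t dash = range_str.find('-');
    if (dash == std::string::npos) {
        throw std::invalid_argument(prefix + ": expected the form N-M");
    }
    const unsigned long from = ParseDigits(range_str.substr(0, dash), prefix);
    const unsigned long to = ParseDigits(range_str.substr(dash + 1), prefix);
    if (from == 0 || to == 0) {
        throw std::invalid_argument(prefix + ": offsets are 1-based, so 0 is invalid");
    }
    if (from > to) {
        throw std::invalid_argument(prefix + ": start cannot be larger than stop");
    }
    return RangeSketch{from - 1, to - 1};
}
```

With this sketch, ParseRangeSketch("5-10") yields from = 4 and to = 9, while "10-5", "0-5" and "abc" all raise std::invalid_argument.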

◆ ParseSequenceRangeOpenEnd()

TSeqRange ParseSequenceRangeOpenEnd ( const string &  range_str,
const char *  error_prefix = NULL 
)

Parse and extract a sequence range from argument provided to this function.

The format is N-M, where N and M are positive integers in 1-based offsets and N < M. Open end range N- and single range N-N formats are supported.

Parameters
range_str  string to extract the range from [in]
error_prefix  error message prefix which will be encoded in the exception thrown in case of error (if NULL a default message will be used) [in]
Returns
properly constructed range, in 0-based offsets, if parsing succeeded.
Exceptions
CStringException or CBlastException with error code eInvalidArgument, if parsing fails or the range is invalid (e.g., empty, negative, N>M, or given in 0-based offsets)

Definition at line 182 of file blast_input_aux.cpp.

References NCBI_THROW, CRange_Base::SetFrom(), CRange_Base::SetTo(), NStr::Split(), and NStr::StringToInt().

Referenced by CBlastDBCmdApp::x_InitSearchRequest(), and CBlastDBCmdApp::x_ModifyConfigForBatchEntry().
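
The open-end (N-) and single-position (N-N) forms can be sketched in the same standalone style. This is an illustration under stated assumptions rather than the toolkit's implementation. OpenRangeSketch, kOpenEndSketch and ParseOpenEndRangeSketch are hypothetical names, the "no upper bound" sentinel is a choice made for this sketch only, and std::invalid_argument stands in for the documented exceptions.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Sentinel meaning "no upper bound"; an assumption of this sketch only.
const unsigned long kOpenEndSketch = std::numeric_limits<unsigned long>::max();

// Hypothetical stand-in for TSeqRange with an optional open end.
struct OpenRangeSketch {
    unsigned long from;  // 0-based start
    unsigned long to;    // 0-based stop, or kOpenEndSketch for the "N-" form
};

// True for a short, non-empty string of decimal digits.
static bool IsDigits(const std::string& text)
{
    return !text.empty() && text.size() <= 9 &&
           text.find_first_not_of("0123456789") == std::string::npos;
}

// Accepts "N-M", "N-N" and "N-" (all 1-based) and returns 0-based offsets.
OpenRangeSketch ParseOpenEndRangeSketch(const std::string& range_str)
{
    const std::size_t dash = range_str.find('-');
    if (dash == std::string::npos) {
        throw std::invalid_argument("expected the form N-M or N-");
    }
    const std::string from_str = range_str.substr(0, dash);
    const std::string to_str = range_str.substr(dash + 1);
    if (!IsDigits(from_str) || (!to_str.empty() && !IsDigits(to_str))) {
        throw std::invalid_argument("range elements must be positive integers");
    }
    const unsigned long from = std::stoul(from_str);
    if (from == 0) {
        throw std::invalid_argument("offsets are 1-based, so 0 is invalid");
    }
    if (to_str.empty()) {
        // "N-": everything from N to the end of the sequence.
        return OpenRangeSketch{from - 1, kOpenEndSketch};
    }
    const unsigned long to = std::stoul(to_str);
    if (to < from) {
        // Also covers to == 0, because from is at least 1 here.
        throw std::invalid_argument("start cannot be larger than stop");
    }
    return OpenRangeSketch{from - 1, to - 1};
}
```

A caller that receives kOpenEndSketch in the to field would clamp it to the actual sequence length once that length is known.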

◆ ReadSequencesToBlast()

CRef<CScope> ReadSequencesToBlast ( CNcbiIstream &  in,
bool  read_proteins,
const TSeqRange &  range,
bool  parse_deflines,
bool  use_lcase_masking,
CRef< CBlastQueryVector > &  sequences,
bool  gaps_to_Ns = false 
)

Read sequence input for BLAST.

Parameters
in  input stream from which to read [in]
read_proteins  true if the input is expected to be protein, false if nucleotide [in]
range  range restriction to apply to sequences read [in]
parse_deflines  true if the subject deflines should be parsed [in]
use_lcase_masking  true if the subject lowercase sequence characters should be interpreted as masked regions [in]
sequences  output will be placed here [in|out]
gaps_to_Ns  convert all gaps in the sequences to Ns (only for nucleotide sequences) [in]
Returns
CScope object which contains all the sequences read

Definition at line 222 of file blast_input_aux.cpp.

References CObjectManager::GetInstance(), in(), input(), SDataLoaderConfig::OptimizeForWholeLargeSequenceRetrieval(), compile_time_bits::range(), CBlastInputSourceConfig::SetBelieveDeflines(), CBlastInputSourceConfig::SetConvertGapsToNs(), CBlastInputSourceConfig::SetLowercaseMask(), CBlastInputSourceConfig::SetRange(), and CBlastInputSourceConfig::SetSubjectLocalIdMode().

Referenced by CBlastDatabaseArgs::ExtractAlgorithmOptions(), and CIgBlastArgs::ExtractAlgorithmOptions().

Modified on Tue Apr 23 07:37:57 2024 by modify_doxy.py rev. 669887