NCBI C++ ToolKit
COrf Class Reference


This class provides functions for finding all the ORFs of a specified minimum length in a DNA sequence.

#include <algo/sequence/orf.hpp>


Public Types

typedef vector< CRef< objects::CSeq_loc > > TLocVec
 

Static Public Member Functions

static void FindOrfs (const string &seq, TLocVec &results, unsigned int min_length_bp=3, int genetic_code=1, const vector< string > &allowable_starts=vector< string >(), bool longest_orfs=true, size_t max_seq_gap=k_default_max_seq_gap)
 Find ORFs in both orientations.
 
static void FindOrfs (const vector< char > &seq, TLocVec &results, unsigned int min_length_bp=3, int genetic_code=1, const vector< string > &allowable_starts=vector< string >(), bool longest_orfs=true, size_t max_seq_gap=k_default_max_seq_gap)
 
static void FindOrfs (const objects::CSeqVector &seq, TLocVec &results, unsigned int min_length_bp=3, int genetic_code=1, const vector< string > &allowable_starts=vector< string >(), bool longest_orfs=true, size_t max_seq_gap=k_default_max_seq_gap)
 
static vector< string > GetStartCodons (int genetic_code, bool include_atg, bool include_alt)
 Create vector of allowable_starts by genetic code.
 
static void FindStrongKozakUOrfs (const objects::CSeqVector &seq, TSeqPos cds_start, TLocVec &overlap_results, TLocVec &non_overlap_results, unsigned int min_length_bp=3, unsigned int non_overlap_min_length_bp=105, int genetic_code=1, size_t max_seq_gap=k_default_max_seq_gap)
 Specifically find ORFs with a strong Kozak signal that are upstream of cds_start.
 
static CRef< objects::CSeq_annot > MakeCDSAnnot (const TLocVec &orfs, int genetic_code=1, objects::CSeq_id *id=NULL)
 This version returns an annot full of CDS features.
 
template<typename TSeqType >
static double GetKozakIdentity (const TSeqType &iupacna_seq, size_t seq_len, size_t start_codon_pos)
 Calculate identity to Kozak PWM for vertebrates.
 

Static Public Attributes

static const size_t k_default_max_seq_gap = 30
 

Detailed Description

This class provides functions for finding all the ORFs of a specified minimum length in a DNA sequence.

Definition at line 52 of file orf.hpp.

Member Typedef Documentation

◆ TLocVec

typedef vector< CRef<objects::CSeq_loc> > COrf::TLocVec

Definition at line 55 of file orf.hpp.

Member Function Documentation

◆ FindOrfs() [1/3]

static void COrf::FindOrfs ( const objects::CSeqVector &  seq,
TLocVec &  results,
unsigned int  min_length_bp = 3,
int  genetic_code = 1,
const vector< string > &  allowable_starts = vector< string >(),
bool  longest_orfs = true,
size_t  max_seq_gap = k_default_max_seq_gap 
)
static

◆ FindOrfs() [2/3]

void COrf::FindOrfs ( const string &  seq,
TLocVec &  results,
unsigned int  min_length_bp = 3,
int  genetic_code = 1,
const vector< string > &  allowable_starts = vector<string>(),
bool  longest_orfs = true,
size_t  max_seq_gap = k_default_max_seq_gap 
)
static

Find ORFs in both orientations.

Circularity is ignored. Results are reported as Seq-locs (without seq-id set). Partial ORFs are trimmed to frame. seq must be in IUPAC encoding. allowable_starts may contain "STOP" to request stop-to-stop ORFs. The combinations of allowable_starts and longest_orfs behave as follows:

  • allowable_starts empty (the default) or just "STOP", longest_orfs = any - finds stop-to-stop ORFs; only ORFs near edges are marked partial
  • allowable_starts not empty, no "STOP", longest_orfs = false - find all proper or partial (if no start) ORFs
  • allowable_starts not empty, no "STOP", longest_orfs = true - find longest proper or partial (if no start) ORFs
  • allowable_starts not empty, plus "STOP", longest_orfs = false - find all proper or partial ORFs and stop-to-stop ORFs
  • allowable_starts not empty, plus "STOP", longest_orfs = true - find partial ORFs or stop-to-stop ORFs; only those without proper start are marked partial

No more than max_seq_gap consecutive N-or-gap bases are allowed in an ORF. Longer gaps split the sequence, so no ORF spans the gap, and partial ORFs may be reported before and after it. ORFs shorter than min_length_bp are not reported.
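
The stop-to-stop case in the first bullet can be illustrated with a small standalone sketch. It does not use the toolkit and is not the s_FindOrfs() implementation: OrfSpan, IsStop, ReverseComplement, ScanStrand, and FindStopToStopOrfs are hypothetical names, the stop codons are hard-coded for genetic code 1, and the sketch ignores start codons, partial flags, and max_seq_gap. Input is assumed to be uppercase.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for this sketch; COrf itself reports CSeq_loc objects.
struct OrfSpan {
    std::size_t from;   // 0-based, inclusive, on the scanned strand
    std::size_t to;     // 0-based, inclusive, on the scanned strand
    bool minus_strand;  // true if found on the reverse complement
};

// Stop codons of the standard genetic code (genetic_code = 1).
static bool IsStop(const std::string& s, std::size_t i) {
    return s.compare(i, 3, "TAA") == 0 || s.compare(i, 3, "TAG") == 0 ||
           s.compare(i, 3, "TGA") == 0;
}

static std::string ReverseComplement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'T': c = 'A'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            default: break;  // leave N and other IUPAC codes unchanged
        }
    }
    return rc;
}

// Collect stop-free stretches in the three reading frames of one strand.
static void ScanStrand(const std::string& s, bool minus, std::size_t min_len,
                       std::vector<OrfSpan>& out) {
    for (std::size_t frame = 0; frame < 3; ++frame) {
        std::size_t open = frame;  // first base of the current open stretch
        std::size_t i = frame;
        for (; i + 3 <= s.size(); i += 3) {
            if (IsStop(s, i)) {
                if (i > open && i - open >= min_len) {
                    out.push_back({open, i - 1, minus});
                }
                open = i + 3;  // next stretch starts after the stop codon
            }
        }
        // Stretch running to the sequence edge, trimmed to whole codons.
        if (i > open && i - open >= min_len) {
            out.push_back({open, i - 1, minus});
        }
    }
}

// Scan both orientations; minus-strand coordinates refer to the
// reverse-complemented string, not to the original sequence.
std::vector<OrfSpan> FindStopToStopOrfs(const std::string& seq,
                                        std::size_t min_len) {
    std::vector<OrfSpan> out;
    ScanStrand(seq, false, min_len, out);
    ScanStrand(ReverseComplement(seq), true, min_len, out);
    return out;
}
```

Calling FindStopToStopOrfs("ATGAAATAGCCC", 6) in this sketch yields one span per frame on each strand; raising min_len to 12 leaves only the stop-free frame on the reverse complement.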

Definition at line 336 of file orf.cpp.

References s_FindOrfs().

Referenced by BOOST_AUTO_TEST_CASE(), FindStrongKozakUOrfs(), CSplign::GetCds(), CTestTranscript_Orfs::RunTest(), and COrfSearchJob::x_DoSearch().

◆ FindOrfs() [3/3]

void COrf::FindOrfs ( const vector< char > &  seq,
TLocVec &  results,
unsigned int  min_length_bp = 3,
int  genetic_code = 1,
const vector< string > &  allowable_starts = vector<string>(),
bool  longest_orfs = true,
size_t  max_seq_gap = k_default_max_seq_gap 
)
static

Definition at line 351 of file orf.cpp.

References s_FindOrfs().

◆ FindStrongKozakUOrfs()

void COrf::FindStrongKozakUOrfs ( const objects::CSeqVector &  seq,
TSeqPos  cds_start,
TLocVec &  overlap_results,
TLocVec &  non_overlap_results,
unsigned int  min_length_bp = 3,
unsigned int  non_overlap_min_length_bp = 105,
int  genetic_code = 1,
size_t  max_seq_gap = k_default_max_seq_gap 
)
static

Specifically find ORFs with a strong Kozak signal that are upstream of cds_start.

Separately report uORFs overlapping the CDS start and uORFs of sufficient length that do not overlap the CDS start.
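
The split into two result vectors can be pictured with the following standalone sketch. It is an illustration under assumptions, not the toolkit code: Interval, UOrfPartition, and PartitionUOrfs are hypothetical names, intervals are plus-strand and 0-based, the Kozak filter is omitted, and it is assumed that min_length_bp applies to overlapping uORFs while non_overlap_min_length_bp applies to the others.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical plus-strand interval; COrf itself works with CSeq_loc objects.
struct Interval {
    std::size_t from;  // 0-based, inclusive
    std::size_t to;    // 0-based, inclusive
};

struct UOrfPartition {
    std::vector<Interval> overlap;      // uORFs that run across cds_start
    std::vector<Interval> non_overlap;  // uORFs that end before cds_start
};

// Partition candidate ORFs relative to cds_start.
// Assumption: each class is filtered by its own minimum length.
UOrfPartition PartitionUOrfs(const std::vector<Interval>& orfs,
                             std::size_t cds_start,
                             std::size_t min_length_bp,
                             std::size_t non_overlap_min_length_bp) {
    UOrfPartition result;
    for (const Interval& orf : orfs) {
        if (orf.from >= cds_start) {
            continue;  // starts at or after the CDS start: not upstream
        }
        const std::size_t length = orf.to - orf.from + 1;
        if (orf.to >= cds_start) {
            if (length >= min_length_bp) {
                result.overlap.push_back(orf);
            }
        } else if (length >= non_overlap_min_length_bp) {
            result.non_overlap.push_back(orf);
        }
    }
    return result;
}
```

With cds_start = 200 and the default thresholds 3 and 105, an interval 0..119 would land in non_overlap, 190..230 in overlap, and 10..30 would be dropped as too short.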

Definition at line 383 of file orf.cpp.

References eExtreme_Biological, eNa_strand_minus, eUnknown, FindOrfs(), CSeqVector::GetSeqData(), ITERATE, NCBI_THROW, and CSeqVector::size().

Referenced by TestStrongKozakUorfs().

◆ GetKozakIdentity()

template<typename TSeqType >
static double COrf::GetKozakIdentity ( const TSeqType &  iupacna_seq,
size_t  seq_len,
size_t  start_codon_pos 
)
inlinestatic

Calculate identity to Kozak PWM for vertebrates.

Definition at line 147 of file orf.hpp.

References i, and nuc.

◆ GetStartCodons()

vector< string > COrf::GetStartCodons ( int  genetic_code,
bool  include_atg,
bool  include_alt 
)
static

Create vector of allowable_starts by genetic code.

The vector holds the codons satisfying CTrans_table::IsATGStart() (when include_atg is set) and/or CTrans_table::IsAltStart() (when include_alt is set).
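
As a rough picture of what such a vector looks like, here is a standalone sketch limited to genetic code 1. It does not call CTrans_table; StartCodonsForStandardCode is a hypothetical helper, and the hard-coded codons are the initiation codons listed for NCBI translation table 1 (ATG, plus CTG and TTG as alternatives). The order returned by the real function is not specified here.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for genetic code 1 only; the real function
// consults the translation table for the requested genetic code.
std::vector<std::string> StartCodonsForStandardCode(bool include_atg,
                                                    bool include_alt) {
    std::vector<std::string> starts;
    if (include_atg) {
        starts.push_back("ATG");  // canonical start
    }
    if (include_alt) {
        starts.push_back("CTG");  // alternative starts in table 1
        starts.push_back("TTG");
    }
    return starts;
}
```

A vector built this way has the shape expected by the allowable_starts parameter of FindOrfs(); appending "STOP" to it would additionally request stop-to-stop ORFs, as described for that function.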

Definition at line 300 of file orf.cpp.

References codons.

◆ MakeCDSAnnot()

CRef< CSeq_annot > COrf::MakeCDSAnnot ( const TLocVec &  orfs,
int  genetic_code = 1,
objects::CSeq_id *  id = NULL 
)
static

This version returns an annot full of CDS features.

Build an annot full of CDS features from CSeq_locs. Optionally takes a CSeq_id (declared as objects::CSeq_id * in the signature) for use in the feature table; otherwise ids are left unset.

Definition at line 438 of file orf.cpp.

References CSeq_feat_Base::eExp_ev_not_experimental, CCdregion_Base::eFrame_one, ITERATE, CSeq_annot_Base::SetData(), CSeq_feat_Base::SetData(), CSeq_feat_Base::SetExp_ev(), CSeq_feat_Base::SetLocation(), and CSeq_feat_Base::SetTitle().

Referenced by BOOST_AUTO_TEST_CASE().

Member Data Documentation

◆ k_default_max_seq_gap

const size_t COrf::k_default_max_seq_gap = 30
static

Definition at line 57 of file orf.hpp.


The documentation for this class was generated from the following files:
  • orf.hpp
  • orf.cpp
Modified on Sat May 25 14:18:43 2024 by modify_doxy.py rev. 669887