NCBI C++ ToolKit
CSparseKmerCounts Class Reference


Kmer counts for alignment free sequence similarity computation implemented as a sparse vector. More...

#include <algo/cobalt/kmercounts.hpp>

Classes

struct  SVectorElement
 Element of the sparse vector. More...
 

Public Types

typedef Uint1 TCount
 
typedef vector< SVectorElement >::const_iterator TNonZeroCounts_CI
 

Public Member Functions

 CSparseKmerCounts (void)
 Create empty counts vector. More...
 
 CSparseKmerCounts (const objects::CSeq_loc &seq, objects::CScope &scope)
 Create k-mer counts vector from SSeqLoc with default k-mer length and alphabet size. More...
 
void Reset (const objects::CSeq_loc &seq, objects::CScope &scope)
 Reset the counts vector. More...
 
unsigned int GetSeqLength (void) const
 Get sequence length. More...
 
unsigned int GetNumCounts (void) const
 Get number of all k-mers found in the sequence. More...
 
TNonZeroCounts_CI BeginNonZero (void) const
 Get non-zero counts iterator. More...
 
TNonZeroCounts_CI EndNonZero (void) const
 Get non-zero counts iterator. More...
 
CNcbiOstream & Print (CNcbiOstream &ostr) const
 Print counts. More...
 

Static Public Member Functions

static unsigned int GetKmerLength (void)
 Get default kmer length. More...
 
static unsigned int GetAlphabetSize (void)
 Get default alphabet size. More...
 
static void SetKmerLength (unsigned len)
 Set default k-mer length. More...
 
static void SetAlphabetSize (unsigned size)
 Set default alphabet size. More...
 
static vector< Uint1 > & SetTransTable (void)
 Set default compressed alphabet letter translation table. More...
 
static void SetUseCompressed (bool use_comp)
 Set default option for using compressed alphabet. More...
 
static double FractionCommonKmersDist (const CSparseKmerCounts &vect1, const CSparseKmerCounts &vect2)
 
static double FractionCommonKmersGlobalDist (const CSparseKmerCounts &v1, const CSparseKmerCounts &v2)
 
static unsigned int CountCommonKmers (const CSparseKmerCounts &v1, const CSparseKmerCounts &v2, bool repetitions=true)
 Compute number of common k-mers between two count vectors. More...
 
static void PreCount (void)
 Perform preparations before k-mer counting common to all sequences. More...
 
static void PostCount (void)
 Perform post-kmer counting tasks. More...
 

Protected Attributes

vector< SVectorElement > m_Counts
 
unsigned int m_SeqLength
 
unsigned int m_NumCounts
 

Static Protected Attributes

static unsigned int sm_KmerLength = 4
 
static unsigned int sm_AlphabetSize = kAlphabetSize
 
static vector< Uint1 > sm_TransTable
 
static bool sm_UseCompressed = false
 
static TCount * sm_Buffer = NULL
 
static bool sm_ForceSmallerMem = false
 
static const unsigned int kLengthBitsThreshold = 32
 

Static Private Member Functions

static TCount * ReserveCountsMem (unsigned int num_bits)
 
static Uint4 GetAALetter (Uint1 letter)
 
static bool InitPosBits (const objects::CSeqVector &sv, Uint4 &pos, unsigned int &index, Uint4 num_bits, Uint4 kmer_len)
 Initializes element index as bit vector for first k letters, skipping Xaa. More...
 

Detailed Description

Kmer counts for alignment free sequence similarity computation implemented as a sparse vector.

Definition at line 60 of file kmercounts.hpp.
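
To make the sparse-vector idea concrete, here is a minimal standalone sketch. It is not the toolkit's implementation: it works on a plain std::string instead of a CSeq_loc, and the names SparseElement and BuildSparseCounts, the 'A'-based letter codes, and the saturation at 255 are assumptions made for illustration only. It stores one (k-mer index, count) pair per k-mer that actually occurs, ordered by index.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an element of the sparse vector:
// the index of a k-mer and how many times it occurs.
struct SparseElement {
    std::uint32_t position;  // k-mer encoded as a base-`alphabet` number
    std::uint8_t  count;     // occurrences, saturating at 255
};

// Count all k-mers of `seq` over the letters 'A'..'A'+alphabet-1 and
// return only the non-zero counts, ordered by k-mer index.
std::vector<SparseElement> BuildSparseCounts(const std::string& seq,
                                             unsigned k,
                                             unsigned alphabet)
{
    assert(k > 0 && alphabet > 0);
    std::map<std::uint32_t, unsigned> counts;  // keys are kept sorted
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        std::uint32_t index = 0;
        bool valid = true;
        for (unsigned j = 0; j < k; ++j) {
            int code = seq[i + j] - 'A';
            if (code < 0 || code >= static_cast<int>(alphabet)) {
                valid = false;  // skip k-mers with letters outside the alphabet
                break;
            }
            index = index * alphabet + static_cast<std::uint32_t>(code);
        }
        if (valid) {
            ++counts[index];
        }
    }
    std::vector<SparseElement> result;
    for (const auto& kv : counts) {
        unsigned c = kv.second > 255 ? 255 : kv.second;
        result.push_back({kv.first, static_cast<std::uint8_t>(c)});
    }
    return result;
}
```

For the string "ABAB" with k = 2 and a four-letter alphabet, this yields two elements: index 1 ("AB") with count 2 and index 4 ("BA") with count 1. Only k-mers that occur take up space, which is the point of the sparse representation.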

Member Typedef Documentation

◆ TCount

typedef Uint1 CSparseKmerCounts::TCount

Definition at line 63 of file kmercounts.hpp.

◆ TNonZeroCounts_CI

typedef vector<SVectorElement>::const_iterator CSparseKmerCounts::TNonZeroCounts_CI

Definition at line 80 of file kmercounts.hpp.

Constructor & Destructor Documentation

◆ CSparseKmerCounts() [1/2]

CSparseKmerCounts::CSparseKmerCounts ( void  )
inline

Create empty counts vector.

Definition at line 86 of file kmercounts.hpp.

◆ CSparseKmerCounts() [2/2]

CSparseKmerCounts::CSparseKmerCounts ( const objects::CSeq_loc &  seq,
objects::CScope &  scope 
)

Create k-mer counts vector from SSeqLoc with default k-mer length and alphabet size.

Parameters
seq    The sequence to be represented as k-mer counts [in]
scope  Scope

Definition at line 69 of file kmercounts.cpp.

References Reset().

Member Function Documentation

◆ BeginNonZero()

TNonZeroCounts_CI CSparseKmerCounts::BeginNonZero ( void  ) const
inline

Get non-zero counts iterator.

Returns
Non-zero counts iterator pointing to the beginning

Definition at line 126 of file kmercounts.hpp.

Referenced by Print().

◆ CountCommonKmers()

unsigned int CSparseKmerCounts::CountCommonKmers ( const CSparseKmerCounts & v1,
const CSparseKmerCounts & v2,
bool  repetitions = true 
)
static

Compute number of common k-mers between two count vectors.

Parameters
v1           K-mer counts vector [in]
v2           K-mer counts vector [in]
repetitions  Should multiple copies of the same k-mer be counted
Returns
Number of k-mers that are present in both counts vectors

Definition at line 409 of file kmercounts.cpp.

References m_Counts, and result.

Referenced by BOOST_AUTO_TEST_CASE(), FractionCommonKmersDist(), and FractionCommonKmersGlobalDist().
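
One way to count common k-mers between two sparse vectors is a linear merge over their elements. The sketch below is an illustration under assumptions, not the code in kmercounts.cpp: KmerCount and CountCommon are hypothetical names, both inputs are assumed to be sorted by k-mer index, and the interpretation of the repetitions flag (smaller of the two counts when true, one per shared k-mer when false) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sparse-vector element: k-mer index and its count.
struct KmerCount {
    std::uint32_t position;
    std::uint8_t  count;
};

// Count k-mers present in both vectors. Both inputs must be sorted by
// position. With repetitions, each shared k-mer contributes the smaller
// of its two counts; without, it contributes one.
unsigned CountCommon(const std::vector<KmerCount>& v1,
                     const std::vector<KmerCount>& v2,
                     bool repetitions = true)
{
    auto by_pos = [](const KmerCount& a, const KmerCount& b) {
        return a.position < b.position;
    };
    assert(std::is_sorted(v1.begin(), v1.end(), by_pos));
    assert(std::is_sorted(v2.begin(), v2.end(), by_pos));

    unsigned result = 0;
    auto it1 = v1.begin();
    auto it2 = v2.begin();
    while (it1 != v1.end() && it2 != v2.end()) {
        if (it1->position < it2->position) {
            ++it1;  // k-mer only in v1
        } else if (it2->position < it1->position) {
            ++it2;  // k-mer only in v2
        } else {
            result += repetitions
                ? std::min<unsigned>(it1->count, it2->count)
                : 1u;
            ++it1;
            ++it2;
        }
    }
    return result;
}
```

Because each vector is traversed once, the cost is proportional to the number of non-zero counts rather than to the size of the full k-mer space.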

◆ EndNonZero()

TNonZeroCounts_CI CSparseKmerCounts::EndNonZero ( void  ) const
inline

Get non-zero counts iterator.

Returns
Non-zero counts iterator pointing to the end

Definition at line 131 of file kmercounts.hpp.

Referenced by Print().

◆ FractionCommonKmersDist()

double CSparseKmerCounts::FractionCommonKmersDist ( const CSparseKmerCounts & vect1,
const CSparseKmerCounts & vect2
)
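static

Computes 1 - F(vect1, vect2), where

F(x, y) = \sum_{t} \min\{n_x(t), n_y(t)\} / (\min\{L_x, L_y\} - k + 1)

and t is a k-mer, n_x(t) is the number of occurrences of k-mer t in x, L_x is the length of x excluding Xaa, and k is the k-mer length. F(x, y) is described in RC Edgar, BMC Bioinformatics 5:113, 2004.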

Definition at line 369 of file kmercounts.cpp.

References _ASSERT, CountCommonKmers(), GetKmerLength(), GetNumCounts(), and v2.

◆ FractionCommonKmersGlobalDist()

double CSparseKmerCounts::FractionCommonKmersGlobalDist ( const CSparseKmerCounts & v1,
const CSparseKmerCounts & v2
)
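static

Computes 1 - F(v1, v2), where

F(x, y) = \sum_{t} \min\{n_x(t), n_y(t)\} / (\max\{L_x, L_y\} - k + 1)

and t is a k-mer, n_x(t) is the number of occurrences of k-mer t in x, L_x is the length of x excluding Xaa, and k is the k-mer length. F(x, y) is a modified version of the measure presented in RC Edgar, BMC Bioinformatics 5:113, 2004.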

Definition at line 389 of file kmercounts.cpp.

References _ASSERT, CountCommonKmers(), GetKmerLength(), GetNumCounts(), and v2.
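
The two distance functions differ only in how the number of common k-mers is normalized. The sketch below illustrates that difference with plain numbers. It assumes, based on the function names and the references to GetNumCounts(), that the local variant divides by the k-mer count of the shorter sequence and the global variant by that of the longer one; LocalDist and GlobalDist are hypothetical helper names, not toolkit functions.

```cpp
#include <algorithm>
#include <cassert>

// 1 - F, normalized by the smaller number of counted k-mers
// (assumed behaviour of the local variant).
double LocalDist(unsigned common, unsigned num_counts1, unsigned num_counts2)
{
    assert(num_counts1 > 0 && num_counts2 > 0);
    unsigned smaller = std::min(num_counts1, num_counts2);
    return 1.0 - static_cast<double>(common) / static_cast<double>(smaller);
}

// 1 - F, normalized by the larger number of counted k-mers
// (assumed behaviour of the global variant).
double GlobalDist(unsigned common, unsigned num_counts1, unsigned num_counts2)
{
    assert(num_counts1 > 0 && num_counts2 > 0);
    unsigned larger = std::max(num_counts1, num_counts2);
    return 1.0 - static_cast<double>(common) / static_cast<double>(larger);
}
```

With 4 common k-mers and sequences contributing 8 and 16 k-mers, the local distance is 1 - 4/8 = 0.5 and the global distance is 1 - 4/16 = 0.75. The global variant therefore penalizes a large difference in sequence length, while the local variant does not.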

◆ GetAALetter()

static Uint4 CSparseKmerCounts::GetAALetter ( Uint1  letter)
inline static private

Definition at line 215 of file kmercounts.hpp.

References _ASSERT, int, and letter().

Referenced by InitPosBits(), and Reset().

◆ GetAlphabetSize()

static unsigned int CSparseKmerCounts::GetAlphabetSize ( void  )
inline static

Get default alphabet size.

Returns
Default alphabet size

Definition at line 121 of file kmercounts.hpp.

◆ GetKmerLength()

static unsigned int CSparseKmerCounts::GetKmerLength ( void  )
inline static

Get default kmer length.

Returns
Default k-mer length

Definition at line 115 of file kmercounts.hpp.

References sm_KmerLength.

Referenced by BOOST_AUTO_TEST_CASE(), FractionCommonKmersDist(), and FractionCommonKmersGlobalDist().

◆ GetNumCounts()

unsigned int CSparseKmerCounts::GetNumCounts ( void  ) const
inline

Get number of all k-mers found in the sequence.

Returns
Number of all k-mers

Definition at line 110 of file kmercounts.hpp.

Referenced by BOOST_AUTO_TEST_CASE(), FractionCommonKmersDist(), and FractionCommonKmersGlobalDist().

◆ GetSeqLength()

unsigned int CSparseKmerCounts::GetSeqLength ( void  ) const
inline

Get sequence length.

Returns
Sequence length

Definition at line 105 of file kmercounts.hpp.

◆ InitPosBits()

bool CSparseKmerCounts::InitPosBits ( const objects::CSeqVector &  sv,
Uint4 & pos,
unsigned int & index,
Uint4  num_bits,
Uint4  kmer_len 
)
static private

Initializes element index as bit vector for first k letters, skipping Xaa.

Parameters
sv        Sequence [in]
pos       Element index in sparse vector [out]
index     Index of the letter in the sequence where k-mer counting starts. At exit, index points to the next letter after the first k-mer [in|out]
num_bits  Number of bits in pos per letter [in]
kmer_len  K-mer length [in]
Returns
True if pos was initialized, false otherwise (if no k-mer without X was found)

Definition at line 82 of file kmercounts.cpp.

References GetAALetter(), i, and kXaa.

Referenced by Reset().
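
The bit-vector initialization described above can be sketched on a plain string. This is an illustration under assumptions rather than the code in kmercounts.cpp: InitPosBitsSketch is a hypothetical name, the sequence is a std::string with 'X' standing in for Xaa, and the letter codes ('A' -> 1, 'B' -> 2, ...) are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Pack the first k-mer without 'X' found at or after `index` into `pos`,
// using `num_bits` bits per letter. On success `index` is moved to the
// letter following that k-mer. Letter codes ('A' -> 1, 'B' -> 2, ...)
// are an assumption made for this sketch and must fit in `num_bits` bits.
bool InitPosBitsSketch(const std::string& seq, std::uint32_t& pos,
                       unsigned& index, unsigned num_bits, unsigned kmer_len)
{
    assert(kmer_len > 0 && num_bits * kmer_len <= 32);
    pos = 0;
    unsigned i = 0;
    while (index + kmer_len <= seq.size() && i < kmer_len) {
        char c = seq[index + i];
        if (c == 'X') {
            index += i + 1;  // restart just past the unknown residue
            pos = 0;
            i = 0;
        } else {
            std::uint32_t code = static_cast<std::uint32_t>(c - 'A' + 1);
            // The first letter of the k-mer lands in the highest bits.
            pos |= code << (num_bits * (kmer_len - i - 1));
            ++i;
        }
    }
    if (i < kmer_len) {
        return false;  // ran out of sequence before a full k-mer was found
    }
    index += i;
    return true;
}
```

For the sequence "AXBCD" with 2-letter k-mers and 5 bits per letter, the k-mer "AX" is rejected, "BC" is packed as (2 << 5) | 3 = 67, and index ends at 4, the position of 'D'.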

◆ PostCount()

void CSparseKmerCounts::PostCount ( void  )
static

Perform post-kmer counting tasks.

Free buffer.

Definition at line 471 of file kmercounts.cpp.

References NULL, sm_Buffer, and sm_ForceSmallerMem.

◆ PreCount()

void CSparseKmerCounts::PreCount ( void  )
static

Perform preparations before k-mer counting common to all sequences.

Allocate buffer for storing temporary counts

Definition at line 457 of file kmercounts.cpp.

References mask, ReserveCountsMem(), sm_AlphabetSize, and sm_Buffer.

◆ Print()

CNcbiOstream & CSparseKmerCounts::Print ( CNcbiOstream & ostr) const

Print counts.

Parameters
ostr  Output stream [in|out]
Returns
Output stream

Definition at line 481 of file kmercounts.cpp.

References BeginNonZero(), EndNonZero(), int, and NcbiEndl.

◆ ReserveCountsMem()

CSparseKmerCounts::TCount * CSparseKmerCounts::ReserveCountsMem ( unsigned int  num_bits)
static private

Definition at line 121 of file kmercounts.cpp.

References kLengthBitsThreshold, NCBI_THROW, NULL, sm_AlphabetSize, sm_ForceSmallerMem, and sm_KmerLength.

Referenced by PreCount(), and Reset().

◆ Reset()

void CSparseKmerCounts::Reset ( const objects::CSeq_loc &  seq,
objects::CScope &  scope 
)

Reset the counts vector.

◆ SetAlphabetSize()

static void CSparseKmerCounts::SetAlphabetSize ( unsigned  size)
inline static

Set default alphabet size.

Parameters
size  Default alphabet size [in]

Definition at line 148 of file kmercounts.hpp.

References ncbi::grid::netcache::search::fields::size.

Referenced by BOOST_AUTO_TEST_CASE().

◆ SetKmerLength()

static void CSparseKmerCounts::SetKmerLength ( unsigned  len)
inline static

Set default k-mer length.

Parameters
len  Default k-mer length [in]

Definition at line 142 of file kmercounts.hpp.

References len.

Referenced by BOOST_AUTO_TEST_CASE().

◆ SetTransTable()

static vector<Uint1>& CSparseKmerCounts::SetTransTable ( void  )
inline static

Set default compressed alphabet letter translation table.

Returns
Reference to translation table [in|out]

Definition at line 154 of file kmercounts.hpp.

Referenced by BOOST_AUTO_TEST_CASE().

◆ SetUseCompressed()

static void CSparseKmerCounts::SetUseCompressed ( bool  use_comp)
inline static

Set default option for using compressed alphabet.

Parameters
use_comp  Whether the compressed alphabet will be used [in]

Definition at line 159 of file kmercounts.hpp.

Referenced by BOOST_AUTO_TEST_CASE().

Member Data Documentation

◆ kLengthBitsThreshold

const unsigned int CSparseKmerCounts::kLengthBitsThreshold = 32
static protected

Definition at line 247 of file kmercounts.hpp.

Referenced by ReserveCountsMem(), and Reset().

◆ m_Counts

vector<SVectorElement> CSparseKmerCounts::m_Counts
protected

Definition at line 238 of file kmercounts.hpp.

Referenced by CountCommonKmers(), and Reset().

◆ m_NumCounts

unsigned int CSparseKmerCounts::m_NumCounts
protected

Definition at line 240 of file kmercounts.hpp.

Referenced by Reset().

◆ m_SeqLength

unsigned int CSparseKmerCounts::m_SeqLength
protected

Definition at line 239 of file kmercounts.hpp.

Referenced by Reset().

◆ sm_AlphabetSize

unsigned int CSparseKmerCounts::sm_AlphabetSize = kAlphabetSize
static protected

Definition at line 242 of file kmercounts.hpp.

Referenced by PreCount(), ReserveCountsMem(), and Reset().

◆ sm_Buffer

CSparseKmerCounts::TCount * CSparseKmerCounts::sm_Buffer = NULL
static protected

Definition at line 245 of file kmercounts.hpp.

Referenced by PostCount(), PreCount(), and Reset().

◆ sm_ForceSmallerMem

bool CSparseKmerCounts::sm_ForceSmallerMem = false
static protected

Definition at line 246 of file kmercounts.hpp.

Referenced by PostCount(), ReserveCountsMem(), and Reset().

◆ sm_KmerLength

unsigned int CSparseKmerCounts::sm_KmerLength = 4
static protected

Definition at line 241 of file kmercounts.hpp.

Referenced by GetKmerLength(), ReserveCountsMem(), and Reset().

◆ sm_TransTable

vector< Uint1 > CSparseKmerCounts::sm_TransTable
static protected

Definition at line 243 of file kmercounts.hpp.

Referenced by Reset().

◆ sm_UseCompressed

bool CSparseKmerCounts::sm_UseCompressed = false
static protected

Definition at line 244 of file kmercounts.hpp.

Referenced by Reset().


The documentation for this class was generated from the following files: kmercounts.hpp, kmercounts.cpp
Modified on Fri Mar 01 10:06:42 2024 by modify_doxy.py rev. 669887