NCBI C++ ToolKit
CClusterer Class Reference

Interface for the CClusterer class, used for clustering any type of data based on a distance matrix.

#include <algo/cobalt/clusterer.hpp>

Classes

class  CSingleCluster
 Single cluster.

Public Types

enum  EDistMethod { eCompleteLinkage = 0 , eAverageLinkage }
 Method for computing distance between clusters.

enum  EClustMethod { eClique = 0 , eDist }
 Method for clustering from links.

typedef CNcbiMatrix< double > TDistMatrix

typedef CSingleCluster TSingleCluster

typedef vector< TSingleCluster > TClusters


Public Member Functions

 CClusterer (void)
 Create empty clusterer.

 CClusterer (const TDistMatrix &dmat)
 Create clusterer.

 CClusterer (shared_ptr< TDistMatrix > &dmat)
 Create clusterer.

 CClusterer (CRef< CLinks > links)
 Create clusterer.

 ~CClusterer ()
 Destructor.

void SetDistMatrix (const TDistMatrix &dmat)
 Set new distance matrix.

void SetDistMatrix (shared_ptr< TDistMatrix > &dmat)
 Set new distance matrix without copying.

void SetLinks (CRef< CLinks > links)
 Set distance links.

const TDistMatrix & GetDistMatrix (void) const
 Get distance matrix.

void SetMaxClusterDiameter (double diam)
 Set maximum diameter for single cluster.

double GetMaxClusterDiameter (void) const
 Get maximum diameter for single cluster.

void SetClustMethod (EClustMethod method)
 Set clustering method for links.

EClustMethod GetClustMethod (void) const
 Get clustering method for links.

void SetMakeTrees (bool trees)
 Set make cluster tree/dendrogram option.

void SetReportSingletons (bool b)
 Set reporting of single element clusters.

bool GetReportSingletons (void) const
 Get reporting mode for single element clusters.

void ComputeClusters (double max_diam, EDistMethod dist_method=eCompleteLinkage, bool do_trees=true, double infinity=-1.0)
 Compute clusters.

void ComputeClustersFromLinks (void)
 Compute clusters using graph of distances between elements.

const TSingleCluster & GetSingleCluster (size_t index) const
 Get list of elements of a specified cluster.

const TClusters & GetClusters (void) const
 Get clusters.

TClusters & SetClusters (void)
 Set clusters.

int GetClusterId (int elem) const
 Find id of cluster to which given element belongs.

void GetTrees (vector< TPhyTreeNode * > &trees) const
 Get list of trees for clusters.

void ReleaseTrees (vector< TPhyTreeNode * > &trees)
 Get list of trees for clusters and release ownership to caller.

vector< TPhyTreeNode * > & GetTrees (void)
 Get list of trees for clusters.

const TPhyTreeNode * GetTree (int index=0) const
 Get tree for specific cluster.

TPhyTreeNode * ReleaseTree (int index=0)
 Get cluster tree and release ownership to caller.

void SetPrototypes (void)
 Set prototypes for all clusters as center elements.

void GetClusterDistMatrix (int index, TDistMatrix &mat) const
 Get distance matrix for elements of a selected cluster.

void PurgeDistMatrix (void)
 Delete distance matrix.

void Reset (void)
 Clear clusters and distance matrix.

void Run (void)
 Cluster elements.


Protected Member Functions

 CClusterer (const CClusterer &)
 Forbid copy constructor.

CClusterer & operator= (const CClusterer &)
 Forbid assignment operator.

void x_Init (void)
 Initialize parameters.

void x_JoinElements (const CLinks::SLink &link)
 Join two elements and form a cluster.

void x_JoinClustElem (int cluster_id, int elem, double dist)
 Add element to a cluster.

void x_JoinClusters (int cluster1_id, int cluster2_id, double dist)
 Join two clusters.

void x_CreateCluster (int elem)
 Create one-element cluster.

bool x_CanAddElem (int cluster_id, int elem, double &dist) const
 Check whether element can be added to the cluster.

bool x_CanJoinClusters (int cluster1_id, int cluster2_id, double &dist) const
 Check whether two clusters can be joined.


Protected Attributes

shared_ptr< TDistMatrix > m_DistMatrix

TClusters m_Clusters

vector< TPhyTreeNode * > m_Trees

double m_MaxDiameter

EClustMethod m_LinkMethod

CRef< CLinks > m_Links

vector< int > m_ClusterId

list< int > m_UnusedEntries
 
bool m_MakeTrees
 
bool m_ReportSingletons
 

Detailed Description

Interface for the CClusterer class, used for clustering any type of data based on a distance matrix.

The class operates on indices in the distance matrix.

Definition at line 52 of file clusterer.hpp.

Member Typedef Documentation

◆ TClusters

Definition at line 160 of file clusterer.hpp.

◆ TDistMatrix

Definition at line 56 of file clusterer.hpp.

◆ TSingleCluster

Definition at line 159 of file clusterer.hpp.

Member Enumeration Documentation

◆ EClustMethod

Method for clustering from links.

Enumerator
eClique 

Clusters can be joined if there is a link between all pairs of their elements.

eDist 

Clusters can be joined if there is a link between at least one pair of elements.

Definition at line 66 of file clusterer.hpp.
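The difference between the two link-based rules can be sketched in standalone C++. This is only an illustration of the conditions described above, not the toolkit's code: `LinkSet`, `IsLinked`, `CanJoinClique`, and `CanJoinAnyLink` are hypothetical names, and a plain set of index pairs stands in for `CLinks`. Any distance threshold the real methods may also apply is not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a link graph: an undirected set of element pairs,
// each stored with the smaller index first.
using LinkSet = std::set<std::pair<int, int>>;

// True if elements a and b are linked, regardless of argument order.
bool IsLinked(const LinkSet& links, int a, int b)
{
    return links.count(std::make_pair(std::min(a, b), std::max(a, b))) > 0;
}

// eClique-style rule: every pair (a, b) with a in c1 and b in c2 must be linked.
bool CanJoinClique(const LinkSet& links,
                   const std::vector<int>& c1, const std::vector<int>& c2)
{
    assert(!c1.empty() && !c2.empty());
    for (int a : c1) {
        for (int b : c2) {
            if (!IsLinked(links, a, b)) {
                return false;
            }
        }
    }
    return true;
}

// eDist-style rule: a single linked pair across the two clusters is enough.
bool CanJoinAnyLink(const LinkSet& links,
                    const std::vector<int>& c1, const std::vector<int>& c2)
{
    for (int a : c1) {
        for (int b : c2) {
            if (IsLinked(links, a, b)) {
                return true;
            }
        }
    }
    return false;
}
```

With links 0-1 and 1-2 only, clusters {0, 1} and {2} satisfy the any-link rule but not the clique rule, because the pair 0-2 is missing.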

◆ EDistMethod

Method for computing distance between clusters.

Enumerator
eCompleteLinkage 

Maximum distance between elements.

eAverageLinkage 

Average distance between elements.

Definition at line 60 of file clusterer.hpp.
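As a rough illustration of the two distance definitions, the sketch below computes both for a pair of clusters given as lists of matrix indices. It is standalone C++ and does not use the toolkit: `Matrix`, `CompleteLinkage`, and `AverageLinkage` are hypothetical names, and the average is taken here over all cross pairs, which may differ from how the toolkit updates averages internally.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical dense symmetric distance matrix, indexed by element.
using Matrix = std::vector<std::vector<double>>;

// Complete linkage: the maximum distance over all cross-cluster pairs.
double CompleteLinkage(const Matrix& d,
                       const std::vector<int>& c1, const std::vector<int>& c2)
{
    assert(!c1.empty() && !c2.empty());
    double result = 0.0;
    for (int a : c1) {
        for (int b : c2) {
            result = std::max(result, d[a][b]);
        }
    }
    return result;
}

// Average linkage: the mean distance over all cross-cluster pairs.
double AverageLinkage(const Matrix& d,
                      const std::vector<int>& c1, const std::vector<int>& c2)
{
    assert(!c1.empty() && !c2.empty());
    double sum = 0.0;
    for (int a : c1) {
        for (int b : c2) {
            sum += d[a][b];
        }
    }
    return sum / static_cast<double>(c1.size() * c2.size());
}
```

For clusters {0, 1} and {2} with d(0,2) = 4 and d(1,2) = 2, complete linkage gives 4 and average linkage gives 3.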

Constructor & Destructor Documentation

◆ CClusterer() [1/5]

CClusterer::CClusterer ( void  )

Create empty clusterer.

Definition at line 51 of file clusterer.cpp.

References x_Init().

◆ CClusterer() [2/5]

CClusterer::CClusterer ( const TDistMatrix &  dmat)

Create clusterer.

Parameters
dmat  Distance matrix

Definition at line 56 of file clusterer.cpp.

References m_DistMatrix, s_CheckDistMatrix(), and x_Init().

◆ CClusterer() [3/5]

CClusterer::CClusterer ( shared_ptr< TDistMatrix > &  dmat)

Create clusterer.

Parameters
dmat  Pointer to distance matrix

◆ CClusterer() [4/5]

CClusterer::CClusterer ( CRef< CLinks >  links)

Create clusterer.

Parameters
links  Graph of distances between elements

Definition at line 70 of file clusterer.cpp.

References x_Init().

◆ ~CClusterer()

CClusterer::~CClusterer ( )

Destructor.

Definition at line 84 of file clusterer.cpp.

References m_Trees, and s_PurgeTrees().

◆ CClusterer() [5/5]

CClusterer::CClusterer ( const CClusterer & )
protected

Forbid copy constructor.

Member Function Documentation

◆ ComputeClusters()

void CClusterer::ComputeClusters ( double  max_diam,
CClusterer::EDistMethod  dist_method = eCompleteLinkage,
bool  do_trees = true,
double  infinity = -1.0 
)

Compute clusters.

Computes complete-linkage, distance-based clustering with a constrained maximum pairwise distance between cluster elements. A cluster dendrogram can be computed for each such cluster independently.

Parameters
max_diam  Maximum distance between two elements in a cluster [in]
dist_method  Method for computing distance between clusters [in]
do_trees  If true, cluster dendrogram will be computed for each cluster [in]
infinity  Distance above which two single elements cannot be joined in a cluster. They are added to existing clusters. [in]

Definition at line 272 of file clusterer.cpp.

References _ASSERT, CClusterer::CSingleCluster::AddElement(), CTreeNode< TValue, TKeyGetterP >::AddNode(), eAverageLinkage, eCompleteLinkage, ctll::empty(), CTreeNode< TValue, TKeyGetterP >::GetValue(), i, infinity, int, ITERATE, m_Clusters, m_DistMatrix, m_Trees, NCBI_THROW, NON_CONST_ITERATE, NULL, s_CreateTreeLeaf(), s_FindDist(), s_FindDistAsMean(), s_FindMean(), s_PurgeTrees(), ncbi::grid::netcache::search::fields::size, and swap().

Referenced by BOOST_AUTO_TEST_CASE(), Run(), and CMultiAligner::x_ComputeTree().
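The general idea of complete-linkage clustering under a maximum-diameter constraint can be sketched as a naive agglomerative loop. This is a simplified standalone illustration, not the algorithm in clusterer.cpp: `Matrix`, `Cluster`, `MaxCrossDist`, and `ClusterWithMaxDiameter` are hypothetical names, and the sketch ignores the `infinity` parameter, average linkage, and dendrogram construction.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical dense matrix
using Cluster = std::vector<int>;                 // element indices

// Complete-linkage distance: largest distance between a member of a and a member of b.
static double MaxCrossDist(const Matrix& d, const Cluster& a, const Cluster& b)
{
    double result = 0.0;
    for (int x : a) {
        for (int y : b) {
            if (d[x][y] > result) {
                result = d[x][y];
            }
        }
    }
    return result;
}

// Repeatedly merge the closest pair of clusters until the closest pair
// would produce a cluster whose diameter exceeds max_diam.
std::vector<Cluster> ClusterWithMaxDiameter(const Matrix& d, double max_diam)
{
    assert(max_diam >= 0.0);
    std::vector<Cluster> clusters;
    for (std::size_t i = 0; i < d.size(); ++i) {
        clusters.push_back(Cluster(1, static_cast<int>(i)));  // start from singletons
    }
    while (clusters.size() > 1) {
        double best = std::numeric_limits<double>::max();
        std::size_t bi = 0, bj = 0;
        for (std::size_t i = 0; i < clusters.size(); ++i) {
            for (std::size_t j = i + 1; j < clusters.size(); ++j) {
                const double dist = MaxCrossDist(d, clusters[i], clusters[j]);
                if (dist < best) {
                    best = dist;
                    bi = i;
                    bj = j;
                }
            }
        }
        if (best > max_diam) {
            break;  // no remaining merge respects the diameter limit
        }
        clusters[bi].insert(clusters[bi].end(),
                            clusters[bj].begin(), clusters[bj].end());
        clusters.erase(clusters.begin() + static_cast<std::ptrdiff_t>(bj));
    }
    return clusters;
}
```

Because the merge criterion is the maximum cross-pair distance, every cluster produced this way has all pairwise distances at or below `max_diam`. The loop is cubic in the number of elements and is meant only to make the constraint concrete.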

◆ ComputeClustersFromLinks()

void CClusterer::ComputeClustersFromLinks ( void  )

◆ GetClusterDistMatrix()

void CClusterer::GetClusterDistMatrix ( int  index,
TDistMatrix &  mat 
) const

Get distance matrix for elements of a selected cluster.

Parameters
index  Cluster index [in]
mat  Distance matrix for cluster elements [out]

Definition at line 1123 of file clusterer.cpp.

References i, m_Clusters, m_DistMatrix, NCBI_THROW, CNcbiMatrix< T >::Resize(), and CClusterer::CSingleCluster::size().

Referenced by BOOST_AUTO_TEST_CASE(), and CMultiAligner::x_ComputeClusterTrees().
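Conceptually, the per-cluster matrix is the sub-matrix of the full distance matrix restricted to the cluster's elements. A minimal standalone sketch, assuming a dense matrix and a cluster given as a list of indices (`Matrix` and `ClusterSubMatrix` are hypothetical names, and the result is returned by value rather than through an output parameter):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical dense matrix

// Build the distance matrix restricted to the given cluster elements.
// Row/column k of the result corresponds to element elems[k].
Matrix ClusterSubMatrix(const Matrix& d, const std::vector<int>& elems)
{
    Matrix sub(elems.size(), std::vector<double>(elems.size(), 0.0));
    for (std::size_t i = 0; i < elems.size(); ++i) {
        for (std::size_t j = 0; j < elems.size(); ++j) {
            assert(elems[i] >= 0 && static_cast<std::size_t>(elems[i]) < d.size());
            sub[i][j] = d[elems[i]][elems[j]];
        }
    }
    return sub;
}
```

For a three-element matrix and the cluster {0, 2}, the result is a 2 x 2 matrix whose off-diagonal entries equal d(0,2).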

◆ GetClusterId()

int CClusterer::GetClusterId ( int  elem) const

Find id of cluster to which given element belongs.

Parameters
elem  Element [in]
Returns
Cluster numerical id

Definition at line 1053 of file clusterer.cpp.

References m_ClusterId, and NCBI_THROW.

Referenced by CClustererApplication::x_RunBinary(), and CClustererApplication::x_RunSparse().

◆ GetClusters()

const TClusters& CClusterer::GetClusters ( void  ) const
inline

◆ GetClustMethod()

EClustMethod CClusterer::GetClustMethod ( void  ) const
inline

Get clustering method for links.

Returns
Clustering method for links

Definition at line 225 of file clusterer.hpp.

◆ GetDistMatrix()

const CClusterer::TDistMatrix & CClusterer::GetDistMatrix ( void  ) const

◆ GetMaxClusterDiameter()

double CClusterer::GetMaxClusterDiameter ( void  ) const
inline

Get maximum diameter for single cluster.

Returns
Maximum cluster diameter

Definition at line 215 of file clusterer.hpp.

◆ GetReportSingletons()

bool CClusterer::GetReportSingletons ( void  ) const
inline

Get reporting mode for single element clusters.

Returns
If true, single element clusters are reported

Definition at line 240 of file clusterer.hpp.

◆ GetSingleCluster()

const CClusterer::TSingleCluster & CClusterer::GetSingleCluster ( size_t  index) const

Get list of elements of a specified cluster.

Parameters
index  Cluster index
Returns
List of element indices that belong to the cluster

Definition at line 130 of file clusterer.cpp.

References m_Clusters, and NCBI_THROW.

Referenced by BOOST_AUTO_TEST_CASE().

◆ GetTree()

const TPhyTreeNode * CClusterer::GetTree ( int  index = 0) const

Get tree for specific cluster.

Parameters
index  Cluster index [in]
Returns
Cluster tree

Definition at line 1090 of file clusterer.cpp.

References m_Trees, and NCBI_THROW.

◆ GetTrees() [1/2]

void CClusterer::GetTrees ( vector< TPhyTreeNode * > &  trees) const

Get list of trees for clusters.

Parameters
trees  List of trees [out]

Definition at line 1064 of file clusterer.cpp.

References ITERATE, and m_Trees.

Referenced by s_TestClustersAndTrees().

◆ GetTrees() [2/2]

vector<TPhyTreeNode*>& CClusterer::GetTrees ( void  )
inline

Get list of trees for clusters.

Returns
List of trees

Definition at line 297 of file clusterer.hpp.

◆ operator=()

CClusterer& CClusterer::operator= ( const CClusterer & )
protected

Forbid assignment operator.

◆ PurgeDistMatrix()

void CClusterer::PurgeDistMatrix ( void  )
inline

Delete distance matrix.

Definition at line 323 of file clusterer.hpp.

Referenced by BOOST_AUTO_TEST_CASE(), Reset(), and CMultiAligner::x_FindQueryClusters().

◆ ReleaseTree()

TPhyTreeNode * CClusterer::ReleaseTree ( int  index = 0)

Get cluster tree and release ownership to caller.

Parameters
index  Cluster index [in]
Returns
Cluster tree

Definition at line 1101 of file clusterer.cpp.

References m_Trees, NCBI_THROW, NULL, and result.

Referenced by CMultiAligner::x_ComputeTree().
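The difference between getting a tree and releasing it is one of ownership. The sketch below shows the usual raw-pointer pattern for this: the holder hands back the pointer and clears its own slot, so its destructor no longer deletes that tree. It is a standalone illustration with hypothetical names (`Node`, `TreeHolder`, `Add`, `Get`, `Release`); the actual behavior of `ReleaseTree()` should be checked in clusterer.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal tree node.
struct Node {
    int id;
};

// Hypothetical owner of a list of heap-allocated trees.
class TreeHolder {
public:
    TreeHolder() = default;
    TreeHolder(const TreeHolder&) = delete;             // owning raw pointers: no copies
    TreeHolder& operator=(const TreeHolder&) = delete;
    ~TreeHolder()
    {
        for (Node* t : m_Trees) {
            delete t;  // deleting a null slot is a no-op
        }
    }

    // Take ownership of a heap-allocated tree.
    void Add(Node* t)
    {
        assert(t != nullptr);
        m_Trees.push_back(t);
    }

    // Non-owning access: the holder still deletes the tree later.
    const Node* Get(std::size_t index) const { return m_Trees.at(index); }

    // Transfer ownership: the caller must delete the returned tree.
    Node* Release(std::size_t index)
    {
        Node* result = m_Trees.at(index);
        m_Trees[index] = nullptr;
        return result;
    }

private:
    std::vector<Node*> m_Trees;
};
```

After `Release(0)`, `Get(0)` returns a null pointer and the caller is responsible for deleting the tree it received.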

◆ ReleaseTrees()

void CClusterer::ReleaseTrees ( vector< TPhyTreeNode * > &  trees)

Get list of trees for clusters and release ownership to caller.

Parameters
trees  List of trees [out]

Definition at line 1073 of file clusterer.cpp.

References ITERATE, and m_Trees.

Referenced by CMultiAligner::x_ComputeClusterTrees().

◆ Reset()

void CClusterer::Reset ( void  )

Clear clusters and distance matrix.

Definition at line 1148 of file clusterer.cpp.

References m_Clusters, m_Links, m_Trees, PurgeDistMatrix(), CRef< C, Locker >::Reset(), and s_PurgeTrees().

Referenced by CMultiAligner::x_FindQueryClusters().

◆ Run()

void CClusterer::Run ( void  )

Cluster elements.

The clustering method is selected based on whether distance matrix or distance links are set.

Definition at line 1157 of file clusterer.cpp.

References ComputeClusters(), ComputeClustersFromLinks(), CRef< C, Locker >::Empty(), m_DistMatrix, m_Links, m_MaxDiameter, and NCBI_THROW.

Referenced by BOOST_AUTO_TEST_CASE(), CMultiAligner::x_FindQueryClusters(), CClustererApplication::x_RunBinary(), and CClustererApplication::x_RunSparse().
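The selection described above can be pictured as a small dispatch on which input is present. The sketch is standalone and hypothetical: `Inputs`, `Method`, and `SelectMethod` are illustrative names, shared pointers stand in for `m_DistMatrix` and `m_Links`, and checking the distance matrix first when both are set is an assumption, not something this page states.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical dense matrix
using Links = std::vector<std::pair<int, int>>;   // hypothetical link list

enum class Method { FromMatrix, FromLinks };

// Hypothetical inputs: either pointer may be null.
struct Inputs {
    std::shared_ptr<Matrix> dist_matrix;
    std::shared_ptr<Links> links;
};

// Pick the clustering path from whichever input is set.
// Assumption: the matrix takes precedence when both are present.
Method SelectMethod(const Inputs& in)
{
    if (in.dist_matrix) {
        return Method::FromMatrix;
    }
    if (in.links) {
        return Method::FromLinks;
    }
    throw std::runtime_error("neither distance matrix nor links are set");
}
```

With neither input set the function throws, which mirrors the fact that `Run()` references an exception macro.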

◆ SetClusters()

TClusters& CClusterer::SetClusters ( void  )
inline

Set clusters.

Returns
Clusters

Definition at line 277 of file clusterer.hpp.

Referenced by BOOST_AUTO_TEST_CASE(), and CMultiAligner::x_FindQueryClusters().

◆ SetClustMethod()

void CClusterer::SetClustMethod ( EClustMethod  method)
inline

Set clustering method for links.

Parameters
method  Clustering method

Definition at line 220 of file clusterer.hpp.

Referenced by CMultiAligner::x_FindQueryClusters(), CClustererApplication::x_RunBinary(), and CClustererApplication::x_RunSparse().

◆ SetDistMatrix() [1/2]

void CClusterer::SetDistMatrix ( const TDistMatrix &  dmat)

◆ SetDistMatrix() [2/2]

void CClusterer::SetDistMatrix ( shared_ptr< TDistMatrix > &  dmat)

Set new distance matrix without copying.

Parameters
dmat  Distance matrix

Definition at line 106 of file clusterer.cpp.

References m_DistMatrix, and s_CheckDistMatrix().

◆ SetLinks()

void CClusterer::SetLinks ( CRef< CLinks >  links)

Set distance links.

Parameters
links  Distance links

Definition at line 113 of file clusterer.cpp.

References m_Links.

Referenced by BOOST_AUTO_TEST_CASE(), CMultiAligner::x_FindQueryClusters(), CClustererApplication::x_RunBinary(), and CClustererApplication::x_RunSparse().

◆ SetMakeTrees()

void CClusterer::SetMakeTrees ( bool  trees)
inline

Set make cluster tree/dendrogram option.

Parameters
trees  If true, cluster trees will be computed [in]

Definition at line 230 of file clusterer.hpp.

Referenced by BOOST_AUTO_TEST_CASE(), CMultiAligner::x_FindQueryClusters(), CClustererApplication::x_RunBinary(), and CClustererApplication::x_RunSparse().

◆ SetMaxClusterDiameter()

void CClusterer::SetMaxClusterDiameter ( double  diam)
inline

Set maximum diameter for single cluster.

Parameters
diam  Maximum cluster diameter

Definition at line 210 of file clusterer.hpp.

◆ SetPrototypes()

void CClusterer::SetPrototypes ( void  )

Set prototypes for all clusters as center elements.

Definition at line 1115 of file clusterer.cpp.

References m_Clusters, m_DistMatrix, and NON_CONST_ITERATE.
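One common way to pick a "center element" is the medoid: the member with the smallest total distance to the other members of its cluster. The sketch below uses that definition as an assumption, since this page does not say how the center is chosen. `Matrix` and `FindCenterElement` are hypothetical names in standalone C++.

```cpp
#include <cassert>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical dense matrix

// Return the cluster member with the smallest sum of distances to all
// members of the same cluster (the medoid). Ties keep the earliest member.
int FindCenterElement(const Matrix& d, const std::vector<int>& elems)
{
    assert(!elems.empty());
    int best = elems.front();
    double best_sum = std::numeric_limits<double>::max();
    for (int a : elems) {
        double sum = 0.0;
        for (int b : elems) {
            sum += d[a][b];
        }
        if (sum < best_sum) {
            best_sum = sum;
            best = a;
        }
    }
    return best;
}
```

For three points at positions 0, 1, and 5 on a line, the distance sums are 6, 5, and 9, so element 1 is selected.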

◆ SetReportSingletons()

void CClusterer::SetReportSingletons ( bool  b)
inline

Set reporting of single element clusters.

Parameters
b  If true, single element clusters will be reported [in]

Definition at line 235 of file clusterer.hpp.

References b.

◆ x_CanAddElem()

bool CClusterer::x_CanAddElem ( int  cluster_id,
int  elem,
double &  dist 
) const
protected

Check whether element can be added to the cluster.

The function assumes that there is a link between the element and at least one element of the cluster.

Parameters
cluster_id  Cluster id [in]
elem  Element [in]
dist  Average distance between the element and cluster elements [out]

Definition at line 922 of file clusterer.cpp.

References eDist, CLinks::IsLink(), ITERATE, m_Clusters, m_LinkMethod, m_Links, and m_MakeTrees.

Referenced by ComputeClustersFromLinks().

◆ x_CanJoinClusters()

bool CClusterer::x_CanJoinClusters ( int  cluster1_id,
int  cluster2_id,
double &  dist 
) const
protected

Check whether two clusters can be joined.

The function assumes that there is a link between at least one pair of elements from the two clusters.

Parameters
cluster1_id  Id of the first cluster [in]
cluster2_id  Id of the second cluster [in]
dist  Average distance between all pairs of elements (x, y), such that x belongs to cluster1 and y to cluster2 [out]

Definition at line 1024 of file clusterer.cpp.

References eDist, CLinks::IsLink(), ITERATE, m_Clusters, m_LinkMethod, m_Links, and m_MakeTrees.

Referenced by ComputeClustersFromLinks().

◆ x_CreateCluster()

void CClusterer::x_CreateCluster ( int  elem)
protected

Create one-element cluster.

Definition at line 844 of file clusterer.cpp.

References _ASSERT, CClusterer::CSingleCluster::AddElement(), int, m_ClusterId, m_Clusters, m_MakeTrees, m_UnusedEntries, and s_CreateTreeLeaf().

Referenced by ComputeClustersFromLinks().

◆ x_Init()

void CClusterer::x_Init ( void  )
protected

Initialize parameters.

Definition at line 89 of file clusterer.cpp.

References eClique, m_LinkMethod, m_MakeTrees, m_MaxDiameter, and m_ReportSingletons.

Referenced by CClusterer().

◆ x_JoinClustElem()

void CClusterer::x_JoinClustElem ( int  cluster_id,
int  elem,
double  dist 
)
protected

◆ x_JoinClusters()

void CClusterer::x_JoinClusters ( int  cluster1_id,
int  cluster2_id,
double  dist 
)
protected

◆ x_JoinElements()

void CClusterer::x_JoinElements ( const CLinks::SLink &  link)
protected

Member Data Documentation

◆ m_ClusterId

vector<int> CClusterer::m_ClusterId
protected

◆ m_Clusters

TClusters CClusterer::m_Clusters
protected

◆ m_DistMatrix

shared_ptr<TDistMatrix> CClusterer::m_DistMatrix
protected

◆ m_LinkMethod

EClustMethod CClusterer::m_LinkMethod
protected

Definition at line 382 of file clusterer.hpp.

Referenced by x_CanAddElem(), x_CanJoinClusters(), and x_Init().

◆ m_Links

CRef<CLinks> CClusterer::m_Links
protected

◆ m_MakeTrees

bool CClusterer::m_MakeTrees
protected

◆ m_MaxDiameter

double CClusterer::m_MaxDiameter
protected

Definition at line 381 of file clusterer.hpp.

Referenced by Run(), and x_Init().

◆ m_ReportSingletons

bool CClusterer::m_ReportSingletons
protected

Definition at line 389 of file clusterer.hpp.

Referenced by ComputeClustersFromLinks(), and x_Init().

◆ m_Trees

vector<TPhyTreeNode*> CClusterer::m_Trees
protected

◆ m_UnusedEntries

list<int> CClusterer::m_UnusedEntries
protected

Definition at line 386 of file clusterer.hpp.

Referenced by x_CreateCluster(), x_JoinClusters(), and x_JoinElements().


The documentation for this class was generated from the following files:
clusterer.hpp
clusterer.cpp
Modified on Thu Feb 22 17:11:44 2024 by modify_doxy.py rev. 669887