NCBI C++ ToolKit
Interface for CClusterer class used for clustering any type of data based on distance matrix. More...
#include <algo/cobalt/clusterer.hpp>
Classes | |
class | CSingleCluster |
Single cluster. More... | |
Public Types | |
enum | EDistMethod { eCompleteLinkage = 0 , eAverageLinkage } |
Method for computing distance between clusters. More... | |
enum | EClustMethod { eClique = 0 , eDist } |
Method for clustering from links. More... | |
typedef CNcbiMatrix< double > | TDistMatrix |
typedef CSingleCluster | TSingleCluster |
typedef vector< TSingleCluster > | TClusters |
Public Member Functions | |
CClusterer (void) | |
Create empty clusterer. More... | |
CClusterer (const TDistMatrix &dmat) | |
Create clusterer. More... | |
CClusterer (shared_ptr< TDistMatrix > &dmat) | |
Create clusterer. More... | |
CClusterer (CRef< CLinks > links) | |
Create clusterer. More... | |
~CClusterer () | |
Destructor. More... | |
void | SetDistMatrix (const TDistMatrix &dmat) |
Set new distance matrix. More... | |
void | SetDistMatrix (shared_ptr< TDistMatrix > &dmat) |
Set new distance matrix without copying. More... | |
void | SetLinks (CRef< CLinks > links) |
Set distance links. More... | |
const TDistMatrix & | GetDistMatrix (void) const |
Get distance matrix. More... | |
void | SetMaxClusterDiameter (double diam) |
Set maximum diameter for single cluster. More... | |
double | GetMaxClusterDiameter (void) const |
Get maximum diameter for single cluster. More... | |
void | SetClustMethod (EClustMethod method) |
Set clustering method for links. More... | |
EClustMethod | GetClustMethod (void) const |
Get clustering method for links. More... | |
void | SetMakeTrees (bool trees) |
Set make cluster tree/dendrogram option. More... | |
void | SetReportSingletons (bool b) |
Set reporting of single element clusters. More... | |
bool | GetReportSingletons (void) const |
Get reporting mode for single element clusters. More... | |
void | ComputeClusters (double max_diam, EDistMethod dist_method=eCompleteLinkage, bool do_trees=true, double infinity=-1.0) |
Compute clusters. More... | |
void | ComputeClustersFromLinks (void) |
Compute clusters using graph of distances between elements. More... | |
const TSingleCluster & | GetSingleCluster (size_t index) const |
Get list of elements of a specified cluster. More... | |
const TClusters & | GetClusters (void) const |
Get clusters. More... | |
TClusters & | SetClusters (void) |
Set clusters. More... | |
int | GetClusterId (int elem) const |
Find id of cluster to which given element belongs. More... | |
void | GetTrees (vector< TPhyTreeNode * > &trees) const |
Get list of trees for clusters. More... | |
void | ReleaseTrees (vector< TPhyTreeNode * > &trees) |
Get list of trees for clusters and release ownership to caller. More... | |
vector< TPhyTreeNode * > & | GetTrees (void) |
Get list of trees for clusters. More... | |
const TPhyTreeNode * | GetTree (int index=0) const |
Get tree for specific cluster. More... | |
TPhyTreeNode * | ReleaseTree (int index=0) |
Get cluster tree and release ownership to caller. More... | |
void | SetPrototypes (void) |
Set prototypes for all clusters as center elements. More... | |
void | GetClusterDistMatrix (int index, TDistMatrix &mat) const |
Get distance matrix for elements of a selected cluster. More... | |
void | PurgeDistMatrix (void) |
Delete distance matrix. More... | |
void | Reset (void) |
Clear clusters and distance matrix. More... | |
void | Run (void) |
Cluster elements. More... | |
Protected Member Functions | |
CClusterer (const CClusterer &) | |
Forbid copy constructor. More... | |
CClusterer & | operator= (const CClusterer &) |
Forbid assignment operator. More... | |
void | x_Init (void) |
Initialize parameters. More... | |
void | x_JoinElements (const CLinks::SLink &link) |
Join two elements and form a cluster. More... | |
void | x_JoinClustElem (int cluster_id, int elem, double dist) |
Add element to a cluster. More... | |
void | x_JoinClusters (int cluster1_id, int cluster2_id, double dist) |
Join two clusters. More... | |
void | x_CreateCluster (int elem) |
Create one-element cluster. More... | |
bool | x_CanAddElem (int cluster_id, int elem, double &dist) const |
Check whether element can be added to the cluster. More... | |
bool | x_CanJoinClusters (int cluster1_id, int cluster2_id, double &dist) const |
Check whether two clusters can be joined. More... | |
Protected Attributes | |
shared_ptr< TDistMatrix > | m_DistMatrix |
TClusters | m_Clusters |
vector< TPhyTreeNode * > | m_Trees |
double | m_MaxDiameter |
EClustMethod | m_LinkMethod |
CRef< CLinks > | m_Links |
vector< int > | m_ClusterId |
list< int > | m_UnusedEntries |
bool | m_MakeTrees |
bool | m_ReportSingletons |
Interface for CClusterer class used for clustering any type of data based on distance matrix.
The class operates on indices in the distance matrix.
Definition at line 52 of file clusterer.hpp.
typedef vector<TSingleCluster> CClusterer::TClusters |
Definition at line 160 of file clusterer.hpp.
typedef CNcbiMatrix<double> CClusterer::TDistMatrix |
Definition at line 56 of file clusterer.hpp.
typedef CSingleCluster CClusterer::TSingleCluster |
Definition at line 159 of file clusterer.hpp.
Method for clustering from links.
Enumerator | |
---|---|
eClique | Clusters can be joined if there is a link between all pairs of their elements. |
eDist | Clusters can be joined if there is a link between at least one pair of elements. |
Definition at line 66 of file clusterer.hpp.
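The difference between the two enumerators can be illustrated with a small standalone sketch. It does not use the toolkit: `LinkSet`, `JoinRule`, `HasLink`, and `CanJoin` are hypothetical names introduced here, and the code only mirrors the wording of the enumerator descriptions above (all pairs linked versus at least one pair linked), not the library's actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a link graph: undirected pairs stored as (min, max).
using LinkSet = std::set<std::pair<int, int>>;

// Hypothetical analogue of the two clustering methods described above.
enum class JoinRule { Clique, Dist };

// True if a link exists between elements a and b, in either order.
bool HasLink(const LinkSet& links, int a, int b) {
    return links.count(std::make_pair(std::min(a, b), std::max(a, b))) > 0;
}

// Decide whether two clusters may be merged under the given rule.
bool CanJoin(const LinkSet& links,
             const std::vector<int>& c1,
             const std::vector<int>& c2,
             JoinRule rule) {
    assert(!c1.empty() && !c2.empty());
    bool any_linked = false;  // at least one cross-cluster pair is linked
    bool all_linked = true;   // every cross-cluster pair is linked
    for (int x : c1) {
        for (int y : c2) {
            if (HasLink(links, x, y)) {
                any_linked = true;
            } else {
                all_linked = false;
            }
        }
    }
    return rule == JoinRule::Clique ? all_linked : any_linked;
}
```

Under the clique rule every merged cluster stays fully connected, so clusters tend to be smaller and tighter; the looser rule lets a single link chain clusters together.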
Method for computing distance between clusters.
Enumerator | |
---|---|
eCompleteLinkage | Maximum distance between elements. |
eAverageLinkage | Average distance between elements. |
Definition at line 60 of file clusterer.hpp.
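As a rough illustration of the two distance methods, the sketch below computes the distance between two clusters either as the maximum or as the mean of all cross-cluster element distances. It is standalone and makes assumptions: `Matrix`, `Linkage`, and `ClusterDistance` are hypothetical names, a nested `std::vector` stands in for `TDistMatrix`, and the library may compute average linkage incrementally rather than over all pairs as shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a symmetric distance matrix.
using Matrix = std::vector<std::vector<double>>;

// Hypothetical analogue of the two distance methods described above.
enum class Linkage { Complete, Average };

// Distance between clusters c1 and c2, given element-to-element distances d.
double ClusterDistance(const Matrix& d,
                       const std::vector<int>& c1,
                       const std::vector<int>& c2,
                       Linkage method) {
    assert(!c1.empty() && !c2.empty());
    double max_dist = 0.0;
    double sum = 0.0;
    for (int x : c1) {
        for (int y : c2) {
            max_dist = std::max(max_dist, d[x][y]);
            sum += d[x][y];
        }
    }
    if (method == Linkage::Complete) {
        return max_dist;  // largest cross-cluster distance
    }
    // Mean over all cross-cluster pairs.
    return sum / static_cast<double>(c1.size() * c2.size());
}
```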
CClusterer::CClusterer | ( | void | ) |
CClusterer::CClusterer | ( | const TDistMatrix & | dmat | ) |
Create clusterer.
dmat | Distance matrix |
Definition at line 56 of file clusterer.cpp.
References m_DistMatrix, s_CheckDistMatrix(), and x_Init().
CClusterer::CClusterer | ( | shared_ptr< TDistMatrix > & | dmat | ) |
Create clusterer.
dmat | Pointer to distance matrix |
CClusterer::CClusterer | ( | CRef< CLinks > | links | ) |
Create clusterer.
links | Graph of distances between elements |
Definition at line 70 of file clusterer.cpp.
References x_Init().
CClusterer::~CClusterer | ( | ) |
|
protected |
Forbid copy constructor.
void CClusterer::ComputeClusters | ( | double | max_diam, |
CClusterer::EDistMethod | dist_method = eCompleteLinkage , |
||
bool | do_trees = true , |
||
double | infinity = -1.0 |
||
) |
Compute clusters.
Computes complete linkage distance-based clustering with a constrained maximum pairwise distance between cluster elements. A cluster dendrogram can be computed independently for each such cluster.
max_diam | Maximum distance between two elements in a cluster [in] |
dist_method | Method for computing distance between clusters [in] |
do_trees | If true, cluster dendrogram will be computed for each cluster [in] |
infinity | Distance above which two single elements cannot be joined in a cluster. They are added to existing clusters. [in] |
Definition at line 272 of file clusterer.cpp.
References _ASSERT, CClusterer::CSingleCluster::AddElement(), CTreeNode< TValue, TKeyGetterP >::AddNode(), eAverageLinkage, eCompleteLinkage, ctll::empty(), CTreeNode< TValue, TKeyGetterP >::GetValue(), i, infinity, int, ITERATE, m_Clusters, m_DistMatrix, m_Trees, NCBI_THROW, NON_CONST_ITERATE, NULL, s_CreateTreeLeaf(), s_FindDist(), s_FindDistAsMean(), s_FindMean(), s_PurgeTrees(), ncbi::grid::netcache::search::fields::size, and swap().
Referenced by BOOST_AUTO_TEST_CASE(), Run(), and CMultiAligner::x_ComputeTree().
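The idea of complete linkage clustering under a maximum diameter can be sketched as a naive agglomerative loop. This is a minimal standalone illustration, not the toolkit's algorithm: `Matrix`, `Cluster`, `CompleteLinkage`, and `ClusterWithMaxDiameter` are hypothetical names, the `infinity` parameter and dendrogram construction are not modeled, and no attempt is made at efficiency.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical distance matrix
using Cluster = std::vector<int>;                 // indices into the matrix

// Largest pairwise distance between an element of a and an element of b.
double CompleteLinkage(const Matrix& d, const Cluster& a, const Cluster& b) {
    double worst = 0.0;
    for (int x : a) {
        for (int y : b) {
            worst = std::max(worst, d[x][y]);
        }
    }
    return worst;
}

// Merge the closest pair of clusters until the closest pair would exceed
// max_diam. Every resulting cluster has pairwise distances <= max_diam.
std::vector<Cluster> ClusterWithMaxDiameter(const Matrix& d, double max_diam) {
    assert(max_diam >= 0.0);
    std::vector<Cluster> clusters;
    for (std::size_t i = 0; i < d.size(); ++i) {
        clusters.push_back(Cluster{static_cast<int>(i)});  // start as singletons
    }
    while (clusters.size() > 1) {
        std::size_t best_i = 0;
        std::size_t best_j = 0;
        double best = -1.0;  // negative means "no candidate yet"
        for (std::size_t i = 0; i < clusters.size(); ++i) {
            for (std::size_t j = i + 1; j < clusters.size(); ++j) {
                double dist = CompleteLinkage(d, clusters[i], clusters[j]);
                if (best < 0.0 || dist < best) {
                    best = dist;
                    best_i = i;
                    best_j = j;
                }
            }
        }
        if (best > max_diam) {
            break;  // any further merge would violate the diameter limit
        }
        clusters[best_i].insert(clusters[best_i].end(),
                                clusters[best_j].begin(),
                                clusters[best_j].end());
        clusters.erase(clusters.begin() + static_cast<std::ptrdiff_t>(best_j));
    }
    return clusters;
}
```

Because the merge criterion is the maximum cross-cluster distance, the value at which two clusters merge is also the diameter of the merged cluster, which is why the stopping test alone enforces the constraint.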
void CClusterer::ComputeClustersFromLinks | ( | void | ) |
Compute clusters using graph of distances between elements.
Definition at line 621 of file clusterer.cpp.
References _ASSERT, CLinks::begin(), CRef< C, Locker >::Empty(), CLinks::end(), CLinks::GetNumElements(), i, int, CLinks::IsSorted(), ITERATE, m_ClusterId, m_Clusters, m_Links, m_MakeTrees, m_ReportSingletons, CClusterer::CSingleCluster::m_Tree, m_Trees, NCBI_THROW, CClusterer::CSingleCluster::size(), ncbi::grid::netcache::search::fields::size, CLinks::Sort(), x_CanAddElem(), x_CanJoinClusters(), x_CreateCluster(), x_JoinClustElem(), x_JoinClusters(), and x_JoinElements().
Referenced by Run().
void CClusterer::GetClusterDistMatrix | ( | int | index, |
TDistMatrix & | mat | ||
) | const |
Get distance matrix for elements of a selected cluster.
index | Cluster index [in] |
mat | Distance matrix for cluster elements [out] |
Definition at line 1123 of file clusterer.cpp.
References i, m_Clusters, m_DistMatrix, NCBI_THROW, CNcbiMatrix< T >::Resize(), and CClusterer::CSingleCluster::size().
Referenced by BOOST_AUTO_TEST_CASE(), and CMultiAligner::x_ComputeClusterTrees().
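Extracting the distance matrix for one cluster amounts to selecting the rows and columns of the cluster's elements. The following standalone sketch shows that selection under stated assumptions: `Matrix` and `ClusterSubMatrix` are hypothetical names, and a nested `std::vector` replaces `TDistMatrix`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical distance matrix

// Build the distance matrix restricted to the given elements.
// Entry (i, j) of the result is the distance between elems[i] and elems[j].
Matrix ClusterSubMatrix(const Matrix& d, const std::vector<int>& elems) {
    Matrix sub(elems.size(), std::vector<double>(elems.size(), 0.0));
    for (std::size_t i = 0; i < elems.size(); ++i) {
        assert(elems[i] >= 0 &&
               static_cast<std::size_t>(elems[i]) < d.size());
        for (std::size_t j = 0; j < elems.size(); ++j) {
            sub[i][j] = d[elems[i]][elems[j]];
        }
    }
    return sub;
}
```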
int CClusterer::GetClusterId | ( | int | elem | ) | const |
Find id of cluster to which given element belongs.
elem | Element [in] |
Definition at line 1053 of file clusterer.cpp.
References m_ClusterId, and NCBI_THROW.
Referenced by CClustererApplication::x_RunBinary(), and CClustererApplication::x_RunSparse().
Get clusters.
Definition at line 272 of file clusterer.hpp.
Referenced by BOOST_AUTO_TEST_CASE(), s_TestClustersAndTrees(), CMultiAligner::x_AlignInClusters(), CMultiAligner::x_AlignProgressive(), CMultiAligner::x_AttachClusterTrees(), CMultiAligner::x_BuildFullTree(), CMultiAligner::x_ComputeClusterTrees(), CMultiAligner::x_ComputeTree(), CMultiAligner::x_CreateBlastQueries(), CMultiAligner::x_CreatePatternQueries(), CMultiAligner::x_FindQueryClusters(), CMultiAligner::x_MakeClusterResidueFrequencies(), CMultiAligner::x_MultiAlignClusters(), and CMultiAligner::x_Run().
|
inline |
Get clustering method for links.
Definition at line 225 of file clusterer.hpp.
const CClusterer::TDistMatrix & CClusterer::GetDistMatrix | ( | void | ) | const |
Get distance matrix.
Definition at line 118 of file clusterer.cpp.
References m_DistMatrix, and NCBI_THROW.
Referenced by BOOST_AUTO_TEST_CASE(), CMultiAligner::x_AlignClusterQueries(), CMultiAligner::x_ComputeClusterTrees(), CMultiAligner::x_FindInClusterConstraints(), and CMultiAligner::x_FindQueryClusters().
|
inline |
Get maximum diameter for single cluster.
Definition at line 215 of file clusterer.hpp.
|
inline |
Get reporting mode for single element clusters.
Definition at line 240 of file clusterer.hpp.
const CClusterer::TSingleCluster & CClusterer::GetSingleCluster | ( | size_t | index | ) | const |
Get list of elements of a specified cluster.
index | Cluster index |
Definition at line 130 of file clusterer.cpp.
References m_Clusters, and NCBI_THROW.
Referenced by BOOST_AUTO_TEST_CASE().
const TPhyTreeNode * CClusterer::GetTree | ( | int | index = 0 | ) | const |
Get tree for specific cluster.
index | Cluster index [in] |
Definition at line 1090 of file clusterer.cpp.
References m_Trees, and NCBI_THROW.
void CClusterer::GetTrees | ( | vector< TPhyTreeNode * > & | trees | ) | const |
Get list of trees for clusters.
trees | List of trees [out] |
Definition at line 1064 of file clusterer.cpp.
References ITERATE, and m_Trees.
Referenced by s_TestClustersAndTrees().
|
inline |
|
protected |
Forbid assignment operator.
|
inline |
Delete distance matrix.
Definition at line 323 of file clusterer.hpp.
Referenced by BOOST_AUTO_TEST_CASE(), Reset(), and CMultiAligner::x_FindQueryClusters().
TPhyTreeNode * CClusterer::ReleaseTree | ( | int | index = 0 | ) |
Get cluster tree and release ownership to caller.
index | Cluster index [in] |
Definition at line 1101 of file clusterer.cpp.
References m_Trees, NCBI_THROW, NULL, and result.
Referenced by CMultiAligner::x_ComputeTree().
void CClusterer::ReleaseTrees | ( | vector< TPhyTreeNode * > & | trees | ) |
Get list of trees for clusters and release ownership to caller.
trees | List of trees [out] |
Definition at line 1073 of file clusterer.cpp.
References ITERATE, and m_Trees.
Referenced by CMultiAligner::x_ComputeClusterTrees().
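Releasing ownership means the caller becomes responsible for deleting the returned raw pointers. The standalone sketch below illustrates that hand-over pattern with hypothetical types (`Node`, `TreeOwner`, `Adopt`); it is not toolkit code and makes no claim about how `TPhyTreeNode` objects are managed internally.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node { int id; };  // hypothetical stand-in for a tree node

// Hypothetical owner of raw tree pointers.
class TreeOwner {
public:
    TreeOwner() = default;
    TreeOwner(const TreeOwner&) = delete;             // avoid double deletion
    TreeOwner& operator=(const TreeOwner&) = delete;
    ~TreeOwner() {
        for (Node* n : trees_) {
            delete n;  // only trees that were never released are deleted here
        }
    }
    void Add(int id) { trees_.push_back(new Node{id}); }
    std::size_t Size() const { return trees_.size(); }

    // Hand all pointers to the caller and forget them.
    void ReleaseAll(std::vector<Node*>& out) {
        out.insert(out.end(), trees_.begin(), trees_.end());
        trees_.clear();
    }

private:
    std::vector<Node*> trees_;
};

// Caller-side helper: wrap released pointers so they are freed automatically.
std::vector<std::unique_ptr<Node>> Adopt(std::vector<Node*>& raw) {
    std::vector<std::unique_ptr<Node>> owned;
    for (Node* n : raw) {
        owned.emplace_back(n);
    }
    raw.clear();
    return owned;
}
```

Wrapping released pointers in `std::unique_ptr` right away is one way to avoid leaks on the caller's side.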
void CClusterer::Reset | ( | void | ) |
Clear clusters and distance matrix.
Definition at line 1148 of file clusterer.cpp.
References m_Clusters, m_Links, m_Trees, PurgeDistMatrix(), CRef< C, Locker >::Reset(), and s_PurgeTrees().
Referenced by CMultiAligner::x_FindQueryClusters().
void CClusterer::Run | ( | void | ) |
Cluster elements.
The clustering method is selected based on whether distance matrix or distance links are set.
Definition at line 1157 of file clusterer.cpp.
References ComputeClusters(), ComputeClustersFromLinks(), CRef< C, Locker >::Empty(), m_DistMatrix, m_Links, m_MaxDiameter, and NCBI_THROW.
Referenced by BOOST_AUTO_TEST_CASE(), CMultiAligner::x_FindQueryClusters(), CClustererApplication::x_RunBinary(), and CClustererApplication::x_RunSparse().
|
inline |
Set clusters.
Definition at line 277 of file clusterer.hpp.
Referenced by BOOST_AUTO_TEST_CASE(), and CMultiAligner::x_FindQueryClusters().
|
inline |
Set clustering method for links.
method | Clustering method |
Definition at line 220 of file clusterer.hpp.
Referenced by CMultiAligner::x_FindQueryClusters(), CClustererApplication::x_RunBinary(), and CClustererApplication::x_RunSparse().
void CClusterer::SetDistMatrix | ( | const TDistMatrix & | dmat | ) |
Set new distance matrix.
dmat | Distance matrix |
Definition at line 97 of file clusterer.cpp.
References CNcbiMatrix< T >::begin(), copy(), CNcbiMatrix< T >::end(), CNcbiMatrix< T >::GetCols(), CNcbiMatrix< T >::GetRows(), m_DistMatrix, and s_CheckDistMatrix().
Referenced by BOOST_AUTO_TEST_CASE(), and CMultiAligner::x_FindQueryClusters().
void CClusterer::SetDistMatrix | ( | shared_ptr< TDistMatrix > & | dmat | ) |
Set new distance matrix without copying.
dmat | Distance matrix |
Definition at line 106 of file clusterer.cpp.
References m_DistMatrix, and s_CheckDistMatrix().
void CClusterer::SetLinks | ( | CRef< CLinks > | links | ) |
Set distance links.
links | Distance links |
Definition at line 113 of file clusterer.cpp.
References m_Links.
Referenced by BOOST_AUTO_TEST_CASE(), CMultiAligner::x_FindQueryClusters(), CClustererApplication::x_RunBinary(), and CClustererApplication::x_RunSparse().
|
inline |
Set make cluster tree/dendrogram option.
trees | If true, cluster trees will be computed [in] |
Definition at line 230 of file clusterer.hpp.
Referenced by BOOST_AUTO_TEST_CASE(), CMultiAligner::x_FindQueryClusters(), CClustererApplication::x_RunBinary(), and CClustererApplication::x_RunSparse().
|
inline |
Set maximum diameter for single cluster.
diam | Maximum cluster diameter |
Definition at line 210 of file clusterer.hpp.
void CClusterer::SetPrototypes | ( | void | ) |
Set prototypes for all clusters as center elements.
Definition at line 1115 of file clusterer.cpp.
References m_Clusters, m_DistMatrix, and NON_CONST_ITERATE.
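One common way to pick a "center element" is to choose the cluster member with the smallest total distance to all other members. The sketch below shows that choice as an assumption only: the page does not state how the center is defined, and `Matrix` and `FindPrototype` are hypothetical names, not toolkit functions.

```cpp
#include <cassert>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical distance matrix

// Return the element of elems with the smallest sum of distances to the
// other elements. Ties are resolved in favor of the earliest element.
int FindPrototype(const Matrix& d, const std::vector<int>& elems) {
    assert(!elems.empty());
    int best = elems.front();
    double best_sum = std::numeric_limits<double>::infinity();
    for (int x : elems) {
        double sum = 0.0;
        for (int y : elems) {
            sum += d[x][y];
        }
        if (sum < best_sum) {
            best_sum = sum;
            best = x;
        }
    }
    return best;
}
```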
|
inline |
Set reporting of single element clusters.
b | If true, single element clusters will be reported [in] |
Definition at line 235 of file clusterer.hpp.
References b.
Check whether element can be added to the cluster.
The function assumes that there is a link between the element and at least one element of the cluster.
cluster_id | Cluster id [in] |
elem | Element [in] |
dist | Average distance between the element and cluster elements [out] |
Definition at line 922 of file clusterer.cpp.
References eDist, CLinks::IsLink(), ITERATE, m_Clusters, m_LinkMethod, m_Links, and m_MakeTrees.
Referenced by ComputeClustersFromLinks().
|
protected |
Check whether two clusters can be joined.
The function assumes that there is a link between at least one pair of elements from the two clusters.
cluster1_id | Id of the first cluster [in] |
cluster2_id | Id of the second cluster [in] |
dist | Average distance between all pairs of elements (x, y), such that x belongs to cluster1 and y to cluster2 [out] |
Definition at line 1024 of file clusterer.cpp.
References eDist, CLinks::IsLink(), ITERATE, m_Clusters, m_LinkMethod, m_Links, and m_MakeTrees.
Referenced by ComputeClustersFromLinks().
|
protected |
Create one-element cluster.
Definition at line 844 of file clusterer.cpp.
References _ASSERT, CClusterer::CSingleCluster::AddElement(), int, m_ClusterId, m_Clusters, m_MakeTrees, m_UnusedEntries, and s_CreateTreeLeaf().
Referenced by ComputeClustersFromLinks().
|
protected |
Initialize parameters.
Definition at line 89 of file clusterer.cpp.
References eClique, m_LinkMethod, m_MakeTrees, m_MaxDiameter, and m_ReportSingletons.
Referenced by CClusterer().
Add element to a cluster.
Definition at line 878 of file clusterer.cpp.
References _ASSERT, CTreeNode< TValue, TKeyGetterP >::AddNode(), CTreeNode< TValue, TKeyGetterP >::GetValue(), ITERATE, m_ClusterId, m_Clusters, m_MakeTrees, NON_CONST_ITERATE, and s_CreateTreeLeaf().
Referenced by ComputeClustersFromLinks().
Join two clusters.
Definition at line 948 of file clusterer.cpp.
References _ASSERT, CClusterer::CSingleCluster::AddElement(), CTreeNode< TValue, TKeyGetterP >::AddNode(), CClusterer::CSingleCluster::clear(), CTreeNode< TValue, TKeyGetterP >::GetValue(), ITERATE, m_ClusterId, m_Clusters, CClusterer::CSingleCluster::m_DistToRoot, m_MakeTrees, CClusterer::CSingleCluster::m_Tree, m_UnusedEntries, NON_CONST_ITERATE, NULL, and CClusterer::CSingleCluster::size().
Referenced by ComputeClustersFromLinks().
|
protected |
Join two elements and form a cluster.
Definition at line 789 of file clusterer.cpp.
References _ASSERT, CClusterer::CSingleCluster::AddElement(), CTreeNode< TValue, TKeyGetterP >::AddNode(), CLinks::SLink::first, CTreeNode< TValue, TKeyGetterP >::GetValue(), int, m_ClusterId, m_Clusters, m_MakeTrees, m_UnusedEntries, s_CreateTreeLeaf(), CLinks::SLink::second, CClusterer::CSingleCluster::SetMaxDistance(), ncbi::grid::netcache::search::fields::size, and CLinks::SLink::weight.
Referenced by ComputeClustersFromLinks().
|
protected |
Definition at line 385 of file clusterer.hpp.
Referenced by ComputeClustersFromLinks(), GetClusterId(), x_CreateCluster(), x_JoinClustElem(), x_JoinClusters(), and x_JoinElements().
|
protected |
Definition at line 379 of file clusterer.hpp.
Referenced by ComputeClusters(), ComputeClustersFromLinks(), GetClusterDistMatrix(), GetSingleCluster(), Reset(), SetPrototypes(), x_CanAddElem(), x_CanJoinClusters(), x_CreateCluster(), x_JoinClustElem(), x_JoinClusters(), and x_JoinElements().
|
protected |
Definition at line 378 of file clusterer.hpp.
Referenced by CClusterer(), ComputeClusters(), GetClusterDistMatrix(), GetDistMatrix(), Run(), SetDistMatrix(), and SetPrototypes().
|
protected |
Definition at line 382 of file clusterer.hpp.
Referenced by x_CanAddElem(), x_CanJoinClusters(), and x_Init().
Definition at line 384 of file clusterer.hpp.
Referenced by ComputeClustersFromLinks(), Reset(), Run(), SetLinks(), x_CanAddElem(), and x_CanJoinClusters().
|
protected |
Definition at line 388 of file clusterer.hpp.
Referenced by ComputeClustersFromLinks(), x_CanAddElem(), x_CanJoinClusters(), x_CreateCluster(), x_Init(), x_JoinClustElem(), x_JoinClusters(), and x_JoinElements().
|
protected |
Definition at line 381 of file clusterer.hpp.
|
protected |
Definition at line 389 of file clusterer.hpp.
Referenced by ComputeClustersFromLinks(), and x_Init().
|
protected |
Definition at line 380 of file clusterer.hpp.
Referenced by ComputeClusters(), ComputeClustersFromLinks(), GetTree(), GetTrees(), ReleaseTree(), ReleaseTrees(), Reset(), and ~CClusterer().
|
protected |
Definition at line 386 of file clusterer.hpp.
Referenced by x_CreateCluster(), x_JoinClusters(), and x_JoinElements().