The consensus clustering framework was proposed by Monti et al. 2003, and model selection by the cophenetic coefficient was used by Brunet et al. 2004 for NMF within that framework. Although it does not provide a "binary" decision on the overall clustering result (e.g. correct vs. false), the cophenetic coefficient indicates which models may be better than others. In this case, k4 and k12 appear to give reasonable overall clustering results. k12 was chosen for further downstream analysis because its clusters more closely resemble our current concepts of cell class identities than the more comprehensive k4 clusters. However, it will be important to understand on what basis sNMF aggregated the samples into these four clusters.
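The cophenetic coefficient is computed on a consensus matrix built from repeated factorization runs. The following is a minimal sketch, assuming that each sNMF run has already been reduced to one hard cluster label per sample. The function name `consensus_matrix` and the toy labels are hypothetical. Entry (i, j) is the fraction of runs in which samples i and j fall into the same cluster.

```python
def consensus_matrix(runs):
    """Average connectivity matrix over repeated clustering runs.

    Hypothetical helper: runs[r][i] is the hard cluster label of sample i in run r
    (assumed to come from e.g. assigning each sample to its dominant factor).
    Returns an n x n list of lists whose entry (i, j) is the fraction of runs
    in which samples i and j share a cluster.
    """
    n = len(runs[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                # connectivity matrix of this run: 1 if i and j co-cluster
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    return [[c / len(runs) for c in row] for row in counts]


if __name__ == "__main__":
    toy_runs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
    for row in consensus_matrix(toy_runs):
        print([round(v, 2) for v in row])
```

Values close to 0 or 1 indicate pairs that are consistently separated or consistently grouped across runs. Intermediate values indicate unstable assignments.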
More technically, for our specific case the cophenetic correlation coefficient can be defined as follows:
The cophenetic correlation coefficient is the Pearson correlation coefficient between pairwise distances of a set of objects and their cophenetic distances, which are derived from a hierarchical clustering. The cophenetic distance of two objects is defined as the intergroup dissimilarity at which the two observations are first combined into a single cluster[1,2].
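Written out as a formula, with x_ij the original distance between objects i and j, t_ij their cophenetic distance, and the bars denoting means over all n(n-1)/2 pairs i < j, the coefficient is the ordinary Pearson correlation. In the consensus clustering setting, the original distance is typically taken as x_ij = 1 - C_ij, where C is the consensus matrix. This choice is stated here as an assumption.

$$
c = \frac{\sum_{i<j} \left(x_{ij} - \bar{x}\right)\left(t_{ij} - \bar{t}\right)}
{\sqrt{\sum_{i<j} \left(x_{ij} - \bar{x}\right)^{2} \; \sum_{i<j} \left(t_{ij} - \bar{t}\right)^{2}}}
$$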
A high cophenetic correlation coefficient indicates that the clustering dendrogram reflects the original distances well. In our setting, this implies that segregating the data into k groups is well supported by the co-occurrence data of the consensus clustering.
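The full computation can be sketched in plain Python. This sketch assumes that the distances are 1 - C_ij for a consensus matrix C, and that the dendrogram is built with average linkage (UPGMA), a common choice for NMF consensus clustering. It is not necessarily the exact procedure used for this analysis. All function names are hypothetical. For real data, a library routine such as SciPy's `scipy.cluster.hierarchy.cophenet` would be the more practical choice.

```python
import math
from itertools import combinations


def cophenetic_distances(dist):
    """Naive average-linkage clustering; returns the cophenetic distance matrix."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member objects
    coph = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(clusters) > 1:
        # find the cluster pair with the smallest mean inter-member distance
        best = None
        for a, b in combinations(clusters, 2):
            ma, mb = clusters[a], clusters[b]
            d = sum(dist[i][j] for i in ma for j in mb) / (len(ma) * len(mb))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # every pair split between a and b is first joined at height d
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = d
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return coph


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cophenetic_correlation(consensus):
    """Correlate the distances 1 - C_ij with their cophenetic distances over pairs i < j."""
    n = len(consensus)
    dist = [[1.0 - c for c in row] for row in consensus]
    coph = cophenetic_distances(dist)
    pairs = list(combinations(range(n), 2))
    return pearson([dist[i][j] for i, j in pairs], [coph[i][j] for i, j in pairs])


if __name__ == "__main__":
    toy = [[1.0, 0.9, 0.2], [0.9, 1.0, 0.4], [0.2, 0.4, 1.0]]
    print(round(cophenetic_correlation(toy), 3))  # 0.961
```

A perfectly block-structured consensus matrix (all entries 0 or 1) yields a coefficient of 1. Comparing the value across candidate k, and looking for k at which it stays high before dropping, is one way to read the coefficient as a guide rather than a binary decision.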
[1] Farris, J. S. On the Cophenetic Correlation Coefficient. Systematic Zoology, 1969, 18, 279-285.
[2] R Development Core Team. R: A Language and Environment for Statistical Computing. 2007. Help files.