I've been out of the loop for stats for a while, but is there a viable approach ...

lukego · on Oct 6, 2024

Is the question fundamentally: what's the relative likelihood of each number of clusters?

If so then estimating the marginal likelihood of each one and comparing them seems pretty reasonable?

(I mean in the sense of Jaynes chapter 20.)
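To spell out what comparing marginal likelihoods would mean here (a sketch in generic notation, not copied from Jaynes): write $D$ for the data and $\theta_K$ for the parameters of a model with $K$ clusters. Both symbols are assumptions introduced for illustration.

```latex
p(D \mid K) = \int p(D \mid \theta_K, K)\, p(\theta_K \mid K)\, d\theta_K ,
\qquad
\frac{p(K_1 \mid D)}{p(K_2 \mid D)}
  = \frac{p(D \mid K_1)}{p(D \mid K_2)} \cdot \frac{p(K_1)}{p(K_2)} .
```

With equal prior weight on $K_1$ and $K_2$, the posterior odds reduce to the ratio of marginal likelihoods. Because each marginal likelihood averages the fit over the prior on $\theta_K$ rather than maximizing it, a larger $K$ is not automatically favored.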

disgruntledphd2 · on Oct 6, 2024

Unsupervised learning is hard, and the pick-K problem is probably the hardest part.

For PCA or factor analysis, there are lots of ways to pick the number of components, but without some way of determining ground truth it's difficult to know if you've done a good job.

CrazyStat · on Oct 6, 2024

There are Bayesian nonparametric methods that do this by putting a Dirichlet process prior on the parameters of the mixture components. Both the prior specification and the computation (MCMC) are tricky, though.
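One way to see why the prior specification matters: a Dirichlet process prior implies a distribution over partitions known as the Chinese restaurant process, and its concentration parameter strongly shapes how many clusters you expect before seeing any data. The following is a minimal stdlib-only sketch that simulates only that prior. It does no posterior inference or MCMC, and the function names `crp_partition` and `prior_on_k` are made up for illustration.

```python
import random


def crp_partition(n, alpha, rng):
    """Draw cluster sizes for n points from a Chinese restaurant process.

    Each new point joins an existing cluster with probability proportional
    to that cluster's size, or opens a new cluster with probability
    proportional to alpha (the concentration parameter, assumed > 0).
    """
    sizes = []  # sizes[k] = number of points currently in cluster k
    for _ in range(n):
        weights = sizes + [alpha]  # last slot means "open a new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
    return sizes


def prior_on_k(n, alpha, draws=2000, seed=0):
    """Monte Carlo estimate of the prior distribution over cluster count."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(draws):
        k = len(crp_partition(n, alpha, rng))
        counts[k] = counts.get(k, 0) + 1
    return {k: c / draws for k, c in sorted(counts.items())}


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 5.0):
        dist = prior_on_k(n=100, alpha=alpha)
        mean_k = sum(k * p for k, p in dist.items())
        print(f"alpha={alpha}: prior mean number of clusters ~ {mean_k:.1f}")
```

Running it for a few values of `alpha` shows the expected number of clusters growing with `alpha`, so the choice of `alpha` (or of a hyperprior on it) is itself a soft statement about K.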