A user asked via e-mail about the seemingly conflicting metric results shown in the example on the README page, so I'll record my answer here for future reference. In that example, Dunn and ASW suggest using 2 or fewer clusters, whereas BIC and WRSS indicate that more clusters are better.
Replies: 1 comment
The Dunn index and ASW come from the traditional field of clustering, and are used to find distinct clusters (i.e., clusters that are well separated in terms of their mean/cluster trajectory). They make the most sense with non-probabilistic cluster methods (e.g., KmL, AHC). For longitudinal data, the Dunn index and ASW are computed by comparing the distances of trajectories to the cluster trajectories. If very distinct clusters are present in your data, this will be visible from these metrics. They are worth inspecting because they tell you how distinct the clusters in a solution are, but in case of high within-subject or within-cluster variability they would falsely guide you towards a minimal number of clusters.

Typically, we use longitudinal cluster analysis as a pragmatic tool for decomposing the population heterogeneity into more manageable (preferably homogeneous) groups. For that purpose, metrics that describe how well the model fits the data are more useful (e.g., BIC, AIC, BLRT, WRSS). BIC measures how well a model explains the data, penalized for model complexity.

With methods such as KmL and GBTM, which have no notion of within-cluster variability, the BIC can keep decreasing as more clusters are introduced, but with diminishing returns. With such methods, we therefore seek a balance between information gained and model complexity (number of clusters). This is also called the elbow method.
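To make the elbow idea a bit more concrete, here is a minimal sketch in Python. It is not code from the package: the function name `pick_elbow`, the 10% threshold, and the BIC values are all hypothetical, chosen only for illustration. The rule it applies is one of several possible ways to formalize "diminishing returns": stop at the first number of clusters where adding one more lowers the BIC by less than a fraction of the very first improvement.

```python
def pick_elbow(bic_by_k, threshold=0.1):
    """Return the smallest number of clusters after which adding one more
    cluster lowers the BIC by less than `threshold` times the first gain.

    bic_by_k: dict mapping number of clusters -> BIC (lower is better).
    The rule and the default threshold are illustrative assumptions.
    """
    ks = sorted(bic_by_k)
    if len(ks) < 2:
        raise ValueError("need BIC values for at least two cluster counts")

    # Decrease in BIC when moving from each cluster count to the next one.
    gains = [bic_by_k[a] - bic_by_k[b] for a, b in zip(ks, ks[1:])]

    first_gain = gains[0]
    if first_gain <= 0:
        # Going beyond the smallest solution does not improve the BIC at all.
        return ks[0]

    # zip stops at the shorter list, so k is always the "from" cluster count.
    for k, gain in zip(ks, gains):
        if gain < threshold * first_gain:
            return k

    # No clear elbow within the evaluated range.
    return ks[-1]


# Hypothetical BIC values for 1 to 6 clusters (made up, not from the README).
example_bic = {1: 1000.0, 2: 700.0, 3: 560.0, 4: 540.0, 5: 535.0, 6: 533.0}
print(pick_elbow(example_bic))
```

In practice the elbow is usually judged by eye from a plot of the metric against the number of clusters. A fixed threshold like the one above is just a convenience, and the result should be checked against the interpretability of the resulting clusters.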