I just realized that I can meaningfully compare many HMMs if I sort the states. If I put in initial emission distribution means so that for a univariate case the first state is the one with the lowest mean whereas the last one is the one with the highest mean, then I can ensure that I get the same state labeling for all trained models. This naturally extends to the multivariate case.

I also found choosing initial values is pretty important. I found a suggestion somewhere which basically suggests partitioning the data to $$N_{states}$$ equal groups, and assigning their estimated parameters (e.g. mean and standard deviation for a normal distribution) as initial values to emission distributions of states.

If our data is in an array $$x$$, the R code for this is:

library(caret)sample_groups <- simplify2array(createFolds(train\$x, k=nstates))sample_groups <- apply(sample_groups, 2, function(a) {x[a]})
Currently unrated