I just realized that I can meaningfully compare many HMMs if I sort the states. If I put in initial emission distribution means so that for a univariate case the first state is the one with the lowest mean whereas the last one is the one with the highest mean, then I can ensure that I get the same state labeling for all trained models. This naturally extends to the multivariate case.

I also found choosing initial values is pretty important. I found a suggestion somewhere which basically suggests partitioning the data to \(N_{states}\) equal groups, and assigning their estimated parameters (e.g. mean and standard deviation for a normal distribution) as initial values to emission distributions of states.

If our data is in an array \(x\), the R code for this is:

library(caret)Share on Twitter Share on Facebook

sample_groups <- simplify2array(createFolds(train$x, k=nstates))

sample_groups <- apply(sample_groups, 2, function(a) {x[a]})

## Comments

There are currently no comments

## New Comment