Learning metrics for sequential data
The design of similarity measures for sequences is generally addressed from a model-based perspective. Specifically, hidden Markov models (HMMs) are usually employed as core components within a semiparametric generative model of a given set of sequences. The HMMs somehow capture the dyamincs underlying the sequential data in a probabilistic model. Our contributions in this line are two metrics for sequences.
In (Garcia-Garcia et al., 2009) we propose a similarity based on the KL divergence between generative models for the sequence. To compute this similarity, one has to train an HMM with each training sequence and come up with a Likelihood matrix L. The element L(i,j) is the likelihood of the Sequence j being generated by the model learnt with the Sequence i. Each normalized (to sum 1) column j of L can be reinterpreted as the probability of sequence j being generated by each of the models learned with all the training sequences. The similarity between two sequences is computed as the KL divergence between their corresponding columns in this normalized Likelihood matrix.
In (Garcia-Garcia et al. 2011) we propose the State-Space Dynamics (SSD) metric. This second similarity is focused on the transition probabilities induced by each sequence. The metric is computed in the following steps:
- Get a complete HMM with all the training sequences. This HMM should have a number of hidden states capable of representing all the emission states of all the classes of sequences included in the dataset.
- Obtain a transition matrix for every sequence in the dataset. This is achieved by running a single iteration of the M step of the HMM training (the update of the transition probabilities) with each sequence.
- Each row of the transition matrix of each sequence can be regarded as a discrete probability distribution. The similarity among two sequences is computed as the average of the divergences between the paired rows of the two transition matrices.
These two similarities have been succesfully evaluated in several clustering tasks, achieving significantly better results than other state-of-the-art semiparametric similarities
References