and an ordering , where
means that event
occurs in position k. In practice, we only observe a snapshot of the event sequence for each subject, taken at an unknown stage k. If a subject is at stage k in the sequence
the events
have occurred and events
have yet to occur. This induces a partition of the event set, or partial ranking,
, where the vertical bar indicates that the first set of events precedes the second. The occurrence of event
in subject j is informed by biomarker measurement
. The generative model of the biomarker data is




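As a minimal illustration of how a stage induces a partial ranking, the sketch below splits an ordering at stage k into the events that have occurred and those that have not. The function name `partial_ranking` and the integer event labels are assumptions made for this example only.

```python
def partial_ranking(ordering, k):
    """Split an event ordering at stage k into (occurred, not yet occurred)."""
    if not 0 <= k <= len(ordering):
        raise ValueError("stage k must lie between 0 and the number of events")
    occurred = set(ordering[:k])   # first k events in the sequence
    pending = set(ordering[k:])    # remaining events, order among them unobserved
    return occurred, pending


# Hypothetical ordering of four events; the subject is at stage 2.
ordering = [2, 0, 3, 1]
occurred, pending = partial_ranking(ordering, 2)
print(sorted(occurred), "|", sorted(pending))
```

Sets are used on both sides of the bar because a snapshot carries no information about the order of events within each group.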
2.2 The Generalised Mallows Event-Based Model
We formulate the generalised Mallows event-based model by using a generalised Mallows model to parameterise the variance in a central event sequence
through the spread parameter
. Each subject then has their own latent ordering
, which is assumed to be a sample from a generalised Mallows model. The generative model of the biomarker data in the event-based model is therefore preceded by

Here GM denotes a generalised Mallows distribution with

$$GM(\pi , \boldsymbol{\theta }) = \frac{1}{\psi (\boldsymbol{\theta })} \exp \left[ -d_{\boldsymbol{\theta }} (\pi , \sigma ) \right] ,$$

where $d_{\boldsymbol{\theta }} (\pi , \sigma )$ is the generalised Kendall's tau distance [8], which penalises the number of pairwise disagreements between sequences. The prior over the generalised Mallows distribution parameters is a conjugate prior of the form [12]

$$P (\pi , \boldsymbol{\theta } | \nu , \boldsymbol{r}) \propto \exp \left( -\nu \sum _{j} [\theta _{j} r_{j} + \ln \psi _{n-j} ( \theta _{j} ) ] \right) .$$
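The generalised Mallows density can be sketched in a few lines. The example below assumes the commonly used decomposition of the generalised Kendall's tau distance into per-position inversion counts weighted by the spread parameters, with a normalising constant that factorises over positions; the function names and the zero-based indexing are choices made for this illustration and are not taken from the source.

```python
import itertools
import math


def inversion_counts(pi, sigma):
    """For the j-th item of sigma, count items later in sigma that pi ranks before it."""
    pos = {item: rank for rank, item in enumerate(pi)}
    n = len(sigma)
    return [
        sum(pos[sigma[l]] < pos[sigma[j]] for l in range(j + 1, n))
        for j in range(n - 1)
    ]


def generalised_kendall_tau(pi, sigma, theta):
    """Weighted count of pairwise disagreements between pi and sigma."""
    return sum(t * v for t, v in zip(theta, inversion_counts(pi, sigma)))


def log_psi(theta):
    """Log normaliser, assuming the j-th count ranges over 0..n-j (j = 1..n-1)."""
    n = len(theta) + 1
    return sum(
        math.log(sum(math.exp(-t * r) for r in range(n - j + 1)))
        for j, t in enumerate(theta, start=1)
    )


def gm_log_prob(pi, sigma, theta):
    """Log density of ordering pi around central ordering sigma."""
    return -generalised_kendall_tau(pi, sigma, theta) - log_psi(theta)


sigma = [0, 1, 2]      # hypothetical central ordering
theta = [0.5, 1.2]     # hypothetical spread parameters
total = sum(
    math.exp(gm_log_prob(list(p), sigma, theta))
    for p in itertools.permutations(sigma)
)
print(total)  # total probability over all orderings of three events
```

With every spread parameter set to one, the distance reduces to the ordinary Kendall's tau distance, which is a convenient sanity check.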
2.3 Dirichlet Process Mixtures of Generalised Mallows Event-Based Models
Dirichlet process mixtures of generalised Mallows models assume that each subject has their own central ordering
and spread parameters
, which are sampled from a discrete distribution G that is drawn from a Dirichlet process [9]. A Dirichlet process mixture is a generative clustering model in which the number of clusters is a random variable, so the number of clusters is inferred from the data rather than fixed in advance, with its prior controlled by the concentration parameter
. The generative model of the biomarker data in the event-based model is now preceded by the process
where
is a Dirichlet process [9]. Each data point
can be characterised by an association with a cluster label
and each cluster c with a set of generalised Mallows parameters
and
.
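To show how the concentration parameter influences the number of clusters, the sketch below draws cluster labels from a Chinese restaurant process, a standard sequential representation of the cluster assignments implied by a Dirichlet process. The source does not name this construction, so its use here, along with the function name and the values of `alpha`, is an assumption for illustration.

```python
import random


def sample_crp_labels(n_subjects, alpha, rng):
    """Draw cluster labels for n_subjects from a Chinese restaurant process."""
    labels, counts = [], []
    for _ in range(n_subjects):
        # An existing cluster c has weight n_c; a new cluster has weight alpha.
        weights = counts + [alpha]
        c = rng.choices(range(len(weights)), weights=weights)[0]
        if c == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[c] += 1        # join an existing cluster
        labels.append(c)
    return labels


rng = random.Random(0)
for alpha in (0.1, 1.0, 10.0):    # hypothetical concentration values
    labels = sample_crp_labels(100, alpha, rng)
    print(alpha, len(set(labels)))
```

Larger values of `alpha` tend to produce more clusters, although any single draw is random.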











3 Inference
3.1 The Event-Based Model
Inference in the event-based model can be performed by taking Markov chain Monte Carlo (MCMC) samples of
where

$$P(X | \sigma ) = \prod _{j=1}^{J} \left[ \sum _{k=0}^{K} P(k) \left( \prod _{i=1}^{k} p(x_{\sigma (i) , j} | e_{\sigma (i)}) \prod _{i=k+1}^{N} p(x_{\sigma (i) , j} | \lnot e_{\sigma (i)} ) \right) \right] . \tag{1}$$
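Equation (1) can be evaluated in log space to avoid numerical underflow. The sketch below is one possible implementation under stated assumptions: the per-event log-likelihoods are supplied as nested lists indexed by event and subject, the stage prior P(k) defaults to uniform, the upper limit K of the sum is taken to equal the number of events N, and all log-likelihoods are finite. The function names are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def ebm_log_likelihood(log_p_event, log_p_no_event, sigma, log_prior_k=None):
    """log P(X | sigma) following Eq. (1).

    log_p_event[i][j]    = log p(x_ij | e_i)
    log_p_no_event[i][j] = log p(x_ij | not e_i)
    """
    n_events = len(sigma)
    n_subjects = len(log_p_event[0])
    if log_prior_k is None:
        # Uniform prior over the N + 1 stages (an assumption).
        log_prior_k = [-math.log(n_events + 1)] * (n_events + 1)
    total = 0.0
    for j in range(n_subjects):
        # Stage 0: no events have occurred for subject j.
        stage = sum(log_p_no_event[e][j] for e in sigma)
        terms = [log_prior_k[0] + stage]
        for k in range(1, n_events + 1):
            # Moving to stage k flips event sigma(k) to "occurred".
            e = sigma[k - 1]
            stage += log_p_event[e][j] - log_p_no_event[e][j]
            terms.append(log_prior_k[k] + stage)
        total += log_sum_exp(terms)   # marginalise over the unknown stage
    return total


# Hypothetical likelihoods for two events and one subject.
log_p_event = [[math.log(0.8)], [math.log(0.3)]]
log_p_no_event = [[math.log(0.2)], [math.log(0.7)]]
print(ebm_log_likelihood(log_p_event, log_p_no_event, sigma=[1, 0]))
```

Updating the running sum one event at a time keeps the cost per subject linear in the number of events, rather than recomputing both products for every stage.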
3.2 The Generalised Mallows Event-Based Model
We use Gibbs sampling to infer the parameters of the generalised Mallows event-based model. The sampler has two stages. First, we generate a set of sample event sequences
. We sample from an augmented model [10] by alternating between sampling a subject’s ordering
and disease stage
, which are used to deterministically reconstruct their partial ranking
. The Gibbs sampling updates are therefore
Second, we sample the model parameters given the set of sample orderings
using the updates










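One step of the first stage, sampling a subject's disease stage given their current ordering, can be sketched as follows. The conditional used here is read off from the summand of Eq. (1), with a uniform stage prior and finite log-likelihoods assumed; it is an illustrative sketch under those assumptions rather than the sampler's exact update, and the function names are hypothetical.

```python
import math
import random


def stage_posterior(log_p_event_j, log_p_no_event_j, ordering, log_prior_k=None):
    """Posterior over stages k = 0..N for one subject, given their ordering."""
    n = len(ordering)
    if log_prior_k is None:
        log_prior_k = [-math.log(n + 1)] * (n + 1)   # uniform prior (assumption)
    # Stage 0: every event is scored under "not occurred".
    stage = sum(log_p_no_event_j[e] for e in ordering)
    log_w = [log_prior_k[0] + stage]
    for k in range(1, n + 1):
        e = ordering[k - 1]
        stage += log_p_event_j[e] - log_p_no_event_j[e]
        log_w.append(log_prior_k[k] + stage)
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    z = sum(w)
    return [v / z for v in w]


def sample_stage(log_p_event_j, log_p_no_event_j, ordering, rng):
    """Draw a stage from its conditional distribution."""
    probs = stage_posterior(log_p_event_j, log_p_no_event_j, ordering)
    return rng.choices(range(len(probs)), weights=probs)[0]


# Hypothetical per-event likelihoods for a single subject.
log_p_event_j = [math.log(0.8), math.log(0.3)]
log_p_no_event_j = [math.log(0.2), math.log(0.7)]
k = sample_stage(log_p_event_j, log_p_no_event_j, [1, 0], random.Random(0))
print(k)
```

The sampled stage, together with the subject's ordering, determines the partial ranking by splitting the ordering after the first k events.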
3.3 Dirichlet Process Mixtures of Generalised Mallows Event-Based Models
We formulate another Gibbs sampler to infer the parameters of Dirichlet process mixtures of generalised Mallows event-based models. We generate a set of candidate sample orderings
, disease stages
, and partial rankings
, which are conditioned on the parameters for each cluster via the updates
From these samples we sample the cluster assignment
of each subject conditioned on the cluster assignments of the other subjects
, where
is the set of cluster assignments for all subjects except subject j, the subject’s sample ordering for each cluster
, disease stage
and their biomarker data
. We then update the generalised Mallows model parameters for each cluster,
and
, from the set of subject orderings assigned to each cluster,
. The resulting updates are















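The cluster assignment step can be illustrated with the standard conditional for Dirichlet process mixtures, in which an existing cluster is weighted by its current size and a new cluster by the concentration parameter. The sketch below assumes the per-cluster log-likelihood of the subject's data has already been computed; the argument names are hypothetical, and how the likelihood of a new cluster is obtained is left open.

```python
import math
import random


def cluster_assignment_probs(counts_without_j, log_lik_per_cluster, log_lik_new, alpha):
    """Conditional probabilities of each existing cluster and of a new cluster."""
    # Existing cluster c: weight n_{-j,c} times the subject's likelihood under c.
    log_w = [
        math.log(n) + ll
        for n, ll in zip(counts_without_j, log_lik_per_cluster)
    ]
    # New cluster: weight alpha times the likelihood under a new cluster.
    log_w.append(math.log(alpha) + log_lik_new)
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    z = sum(w)
    return [v / z for v in w]


# Hypothetical state: two clusters of sizes 3 and 1 once subject j is removed.
probs = cluster_assignment_probs(
    counts_without_j=[3, 1],
    log_lik_per_cluster=[math.log(0.5), math.log(0.5)],
    log_lik_new=math.log(0.5),
    alpha=1.0,
)
c_j = random.Random(0).choices(range(len(probs)), weights=probs)[0]
print(probs, c_j)
```

The last index corresponds to opening a new cluster; clusters left empty after removing subject j are assumed to have been dropped before this step.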

