Orderings of Events in Disease Progression

and an ordering $$\sigma = ( \sigma (1) , \dots , \sigma (N) )$$, where $$\sigma (k) = i$$ means that event $$e_{i}$$ occurs in position k. In practice we only observe a snapshot of the event sequence for each subject, taken at an unknown stage k. If a subject is at stage k in the sequence $$\sigma $$, the events $$e_{\sigma (1)}, \dots , e_{\sigma (k)}$$ have occurred and the events $$e_{\sigma (k+1)}, \dots , e_{\sigma (N)}$$ have yet to occur. This induces a partition of the event set, or partial ranking, $$\gamma _{k} = e_{\sigma (1)}, \dots , e_{\sigma (k) } | e_{\sigma (k+1)} , \dots , e_{\sigma (N) }$$, where the vertical bar indicates that the first set of events precedes the second. The occurrence of event $$e_{i}$$ in subject j is informed by the biomarker measurement $$x_{ij}$$. The generative model of the biomarker data is

$$\begin{aligned} k_{j} \sim P(k) , \end{aligned}$$
$$\begin{aligned} x_{\sigma (i) , j} \sim p(x_{\sigma (i) , j} | e_{\sigma (i)}) \text { if } i \le k_{j} , \end{aligned}$$
$$\begin{aligned} x_{\sigma (i) , j} \sim p(x_{\sigma (i) , j} | \lnot e_{\sigma (i)}) \text { otherwise}. \end{aligned}$$
$$p(x | e)$$ and $$p(x | \lnot e)$$ are the probability density functions of observing biomarker measurement x given that event e has or has not occurred, respectively. P(k) is a prior on the disease stage k.
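The generative model above can be simulated in a few lines. The following is a minimal sketch, not the authors' implementation: the Gaussian densities, their means and standard deviation, and the function name `sample_subject` are illustrative assumptions, and events are indexed from 0.

```python
import random

def sample_subject(sigma, stage_prior, rng, mu_event=1.0, mu_no_event=0.0, sd=0.5):
    """Draw one subject's stage k and biomarker values under the event-based model.

    sigma: list of event indices (0-based); sigma[i] is the event in position i + 1.
    stage_prior: list of N + 1 weights, P(k) for k = 0..N.
    The Gaussian densities are an assumption made for illustration only.
    """
    n_events = len(sigma)
    # k ~ P(k)
    k = rng.choices(range(n_events + 1), weights=stage_prior)[0]
    x = [0.0] * n_events
    for position, event in enumerate(sigma):
        occurred = position < k  # the first k events in the sequence have occurred
        mean = mu_event if occurred else mu_no_event
        x[event] = rng.gauss(mean, sd)  # x ~ p(x | e) or p(x | not e)
    return k, x

rng = random.Random(0)
k, x = sample_subject([2, 0, 3, 1], [0.2] * 5, rng)
```

Here `x[event]` holds the measurement for event `event`, so the returned vector is indexed by event rather than by position in the sequence.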

2.2 The Generalised Mallows Event-Based Model

We formulate the generalised Mallows event-based model by using a generalised Mallows model to parameterise the variance in a central event sequence $$\pi $$ through the spread parameter $$\varvec{\theta } = (\theta _{1}, \dots , \theta _{N-1})$$. Each subject then has their own latent ordering $$\sigma _{j}$$, which is assumed to be a sample from a generalised Mallows model. The generative model of the biomarker data in the event-based model is therefore preceded by
$$\begin{aligned} \pi , \varvec{\theta } \sim P (\pi , \varvec{\theta } | \nu , \varvec{r}), \end{aligned}$$
$$\begin{aligned} \sigma _{j} \sim GM(\pi , \varvec{\theta } ). \end{aligned}$$
$$GM(\pi , \varvec{\theta }) = \frac{1}{\psi (\varvec{\theta })} \exp \left[ -d_{\varvec{\theta }} (\pi , \sigma ) \right] $$ is a generalised Mallows distribution with normalisation constant $$\psi (\varvec{\theta }) = \prod _{j=1}^{N-1} \psi _{N-j} (\theta _{j} ) = \prod _{j=1}^{N-1} \frac{1-e^{-(N-j+1) \theta _{j} }}{ 1-e^{-\theta _{j}} }$$. $$d_{\varvec{\theta }} (\pi , \sigma )$$ is the generalised Kendall's tau distance [8], which penalises the number of pairwise disagreements between sequences. $$P (\pi , \varvec{\theta } | \nu , \varvec{r})$$ is a conjugate prior over the generalised Mallows distribution parameters of the form $$P (\pi , \varvec{\theta } | \nu , \varvec{r}) \propto \exp \left( -\nu \sum _{j} [\theta _{j} r_{j} + \ln \psi _{N-j} ( \theta _{j} ) ] \right) $$ [12].
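As a sanity check on the normalisation constant, the sketch below evaluates the generalised Mallows probability of a sequence and confirms that it sums to one over all permutations. It assumes the standard decomposition of the generalised Kendall's tau distance into weighted inversion counts, $$d_{\varvec{\theta }} (\pi , \sigma ) = \sum _{j} \theta _{j} V_{j} (\pi , \sigma )$$, which is not spelled out in the text above. The function names are hypothetical, and every $$\theta _{j}$$ must be strictly positive.

```python
import math
from itertools import permutations

def inversion_counts(pi, sigma):
    """V_j: for the j-th event of pi, count events later in pi that precede it in sigma."""
    pos = {event: p for p, event in enumerate(sigma)}
    n = len(pi)
    return [sum(pos[pi[l]] < pos[pi[j]] for l in range(j + 1, n)) for j in range(n - 1)]

def gm_distance(pi, sigma, theta):
    """Generalised Kendall's tau distance, assumed to be sum_j theta_j * V_j."""
    return sum(t * v for t, v in zip(theta, inversion_counts(pi, sigma)))

def gm_normaliser(theta):
    """psi(theta) as the product of the per-position terms given in the text."""
    n = len(theta) + 1
    psi = 1.0
    for j, t in enumerate(theta, start=1):
        psi *= (1 - math.exp(-(n - j + 1) * t)) / (1 - math.exp(-t))
    return psi

def gm_probability(pi, sigma, theta):
    """Probability of sigma under GM(pi, theta)."""
    return math.exp(-gm_distance(pi, sigma, theta)) / gm_normaliser(theta)

pi = [0, 1, 2, 3]
theta = [0.5, 1.0, 2.0]
total = sum(gm_probability(pi, list(s), theta) for s in permutations(pi))
```

Larger values of $$\theta _{j}$$ concentrate probability on sequences that agree with the central ordering at position j, which is why the text calls $$\varvec{\theta }$$ a spread parameter.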

2.3 Dirichlet Process Mixtures of Generalised Mallows Event-Based Models

Dirichlet process mixtures of generalised Mallows models assume that each subject has their own central ordering $$\pi _{j}$$ and spread parameters $$\varvec{\theta }_{j}$$, which are sampled from a discrete distribution G that is drawn from a Dirichlet process [9]. A Dirichlet process mixture is a generative clustering model where the number of clusters is a random variable, meaning that the number of clusters is detected automatically depending on the concentration parameter $$\alpha $$. The generative model of the biomarker data in the event-based model is now preceded by the process
$$\begin{aligned} G \sim DP( \alpha ,P(\pi , \varvec{\theta } | \nu , \varvec{r}) ), \end{aligned}$$
$$\begin{aligned} \pi _{j}, \varvec{\theta }_{j} \sim G , \end{aligned}$$
$$\begin{aligned} \sigma _{j} \sim GM(\pi _{j}, \varvec{\theta }_{j} ) , \end{aligned}$$
where $$DP( \alpha ,P (\pi , \varvec{\theta } | \nu , \varvec{r}) )$$ is a Dirichlet process [9]. Each subject j can be characterised by an association with a cluster label $$c_{j} \in \{ 1 , \dots , C \}$$, and each cluster c by a set of generalised Mallows parameters $$\pi _{c}$$ and $$\varvec{\theta }_{c}$$.
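To illustrate how the concentration parameter $$\alpha $$ governs the number of clusters, the sketch below draws cluster labels from the Chinese restaurant process, the standard sequential representation of the partition implied by a Dirichlet process. It covers only the prior over labels, not the full model or the sampler described in this paper, and the function name is hypothetical.

```python
import random

def crp_assignments(n_subjects, alpha, rng):
    """Draw cluster labels c_1..c_J from a Chinese restaurant process prior."""
    labels = []
    counts = []  # counts[c] = number of subjects currently in cluster c
    for _ in range(n_subjects):
        # Existing cluster c is chosen with weight counts[c], a new one with weight alpha.
        weights = counts + [alpha]
        c = rng.choices(range(len(weights)), weights=weights)[0]
        if c == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[c] += 1
        labels.append(c)
    return labels

rng = random.Random(0)
labels = crp_assignments(50, alpha=1.0, rng=rng)
n_clusters = len(set(labels))
```

Larger values of `alpha` make new clusters more likely, so the expected number of clusters grows with the concentration parameter.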

3 Inference

3.1 The Event-Based Model

Inference in the event-based model can be performed by taking Markov chain Monte Carlo (MCMC) samples from the posterior $$P(\sigma |X) = \frac{P(X|\sigma )P(\sigma )}{P(X)} $$, where
$$\begin{aligned} P(X | \sigma ) = \prod _{j=1}^{J} \left[ \sum _{k=0}^{N} P(k) \left( \prod _{i=1}^{k} p(x_{\sigma (i) , j} | e_{\sigma (i)}) \prod _{i=k+1}^{N} p(x_{\sigma (i) , j} | \lnot e_{\sigma (i)} ) \right) \right] \!. \end{aligned}$$
(1)
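Equation (1) can be evaluated directly by marginalising over the stage for each subject. The sketch below assumes that the stage ranges over k = 0, ..., N and that the densities are supplied as functions of an event index and a value. The Gaussian densities and all function names are illustrative assumptions.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Normal density, used here only as a stand-in for p(x | e) and p(x | not e)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def subject_likelihood(x, sigma, stage_prior, p_event, p_no_event):
    """Bracketed term of Eq. (1): sum over stages k of P(k) times the product of densities."""
    total = 0.0
    for k in range(len(sigma) + 1):
        term = stage_prior[k]
        for position, event in enumerate(sigma):
            # Events in the first k positions have occurred; the rest have not.
            density = p_event if position < k else p_no_event
            term *= density(event, x[event])
        total += term
    return total

def log_likelihood(X, sigma, stage_prior, p_event, p_no_event):
    """log P(X | sigma): sum of log subject likelihoods over subjects."""
    return sum(
        math.log(subject_likelihood(x, sigma, stage_prior, p_event, p_no_event))
        for x in X
    )

p_event = lambda event, v: gaussian_pdf(v, 1.0, 0.3)
p_no_event = lambda event, v: gaussian_pdf(v, 0.0, 0.3)
X = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0]]
prior = [0.25] * 4
ll_forward = log_likelihood(X, [0, 1, 2], prior, p_event, p_no_event)
ll_reverse = log_likelihood(X, [2, 1, 0], prior, p_event, p_no_event)
```

On this toy data the ordering `[0, 1, 2]` scores higher than its reverse, since every subject's measurements match some stage of that sequence. For many subjects or events, working with log densities throughout would be more numerically stable than the plain products used here.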

3.2 The Generalised Mallows Event-Based Model

We use Gibbs sampling to infer the parameters of the generalised Mallows event-based model. This consists of two stages. First, we generate a set of sample event sequences $$\sigma _{1:J}$$. We sample from an augmented model [10], alternating between sampling a subject’s ordering $$\sigma _{j}$$ and their disease stage $$k_{j}$$, which together deterministically reconstruct their partial ranking $$\gamma _{j}$$. The Gibbs sampling updates are therefore
$$\begin{aligned} \sigma ^{(j)} \sim P(\sigma | \varvec{\gamma } = \gamma _{j} , \pi , \varvec{\theta }) , \end{aligned}$$
$$\begin{aligned} k^{(j)} \sim P(k | \varvec{\sigma } = \sigma _{j} , X_{j} ) . \end{aligned}$$
Second, sampling the model parameters given the set of sample orderings $$\sigma _{1:J}$$ using the updates
$$\begin{aligned} \pi \sim P(\pi | \varvec{\theta }, \nu , \varvec{r} , \sigma _{1:J} ), \end{aligned}$$
$$\begin{aligned} \theta _{k} \sim P(\theta _{k} | \pi , \nu , \varvec{r} , \sigma _{1:J} ) .\end{aligned}$$
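Of these updates, the stage update is the simplest to write out. Under the generative model, Bayes' rule gives $$P(k | \sigma , X_{j}) \propto P(k) \prod _{i=1}^{k} p(x_{\sigma (i) , j} | e_{\sigma (i)}) \prod _{i=k+1}^{N} p(x_{\sigma (i) , j} | \lnot e_{\sigma (i)})$$. The sketch below normalises these weights and draws a stage from them. The function names and the density interface are assumptions, and the updates for $$\sigma _{j}$$, $$\pi $$ and $$\varvec{\theta }$$ are not shown.

```python
import random

def stage_posterior(x, sigma, stage_prior, p_event, p_no_event):
    """Return P(k | sigma, x) for k = 0..N as a normalised list."""
    weights = []
    for k in range(len(sigma) + 1):
        w = stage_prior[k]
        for position, event in enumerate(sigma):
            # Events in the first k positions have occurred; the rest have not.
            density = p_event if position < k else p_no_event
            w *= density(event, x[event])
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def sample_stage(x, sigma, stage_prior, p_event, p_no_event, rng):
    """One Gibbs draw of a subject's stage given their current ordering."""
    posterior = stage_posterior(x, sigma, stage_prior, p_event, p_no_event)
    return rng.choices(range(len(posterior)), weights=posterior)[0]

# Toy densities that ignore the data, chosen so the result is easy to check by hand.
p_event = lambda event, v: 2.0
p_no_event = lambda event, v: 1.0
posterior = stage_posterior([0.0, 0.0], [0, 1], [0.2, 0.3, 0.5], p_event, p_no_event)
k = sample_stage([0.0, 0.0], [0, 1], [0.2, 0.3, 0.5], p_event, p_no_event, random.Random(0))
```

With these toy densities the unnormalised weights are 0.2, 0.6 and 2.0, so the last stage receives the largest posterior mass.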

3.3 Dirichlet Process Mixtures of Generalised Mallows Event-Based Models

We formulate another Gibbs sampler to infer the parameters of Dirichlet process mixtures of generalised Mallows event-based models. We generate a set of candidate sample orderings $$\sigma _{1:J,1:C}$$, disease stages $$k_{1:J,1:C}$$, and partial rankings $$\gamma _{1:J,1:C}$$, which are conditioned on the parameters for each cluster via the updates
$$\begin{aligned} \sigma ^{(j,c)} \sim P(\sigma | \varvec{\gamma } = \gamma _{jc} , \pi _{c}, \varvec{\theta }_{c}) , \end{aligned}$$
$$\begin{aligned} k^{(j,c)} \sim P(k | \varvec{\sigma } = \sigma _{jc} , X_{j} ) . \end{aligned}$$
From these samples we sample the cluster assignment $$c_{j}$$ of each subject, conditioned on the cluster assignments of all other subjects $$c_{-j}$$, the subject’s sample ordering for each cluster $$\sigma _{j,1:C}$$, their disease stage for each cluster $$k_{j,1:C}$$, and their biomarker data $$X_{j}$$. We then update the generalised Mallows model parameters for each cluster, $$\pi _{c}$$ and $$\varvec{\theta }_{c}$$, from the set of subject orderings assigned to that cluster, $$\varvec{\sigma }_{c}$$. So we have the updates
Sep 16, 2016 | Posted in GENERAL RADIOLOGY
