Clustering and Component Analysis of Correspondenceless Point Sets: Application to Cardiac Statistical Modeling

Fig. 1.

The graphical representation of the proposed model; shaded and hollow circles represent observed and latent variables, respectively, arrows imply the dependencies and plates embrace numbered incidences of events.

2 Methods

2.1 Probabilistic Generative Model

Our observation consists of K point sets, denoted as $\mathcal {X}_k=\{\mathrm {\mathbf {x}}_{kn}\}^{N_k}_{n=1}$ , $1\le k \le K$ , where $\mathrm {\mathbf {x}}_{kn}$ is a D dimensional feature vector corresponding to the nth landmark in the kth point set. The model can be explained as two interacting layers of mixture models. In the first (lower-dimension) layer, $\mathcal {X}_k$ is assumed to be a collection of D-dimensional samples from a GMM with M Gaussian components. Meanwhile, by concatenating the means of the GMM (with a consistent order), a vector representation for $\mathcal {X}_k$ can be derived in $M\cdot D$ dimension. Clustering and linear component analysis for $\mathcal {X}_k$ takes place in this space.

More specifically, we consider a mixture of J probabilistic principal component analyzers (MPPCA). A PPCA is essentially an $M\cdot D$ -dimensional Gaussian specified by a mean vector, $\bar{\varvec{\mu }}_{j}\in \mathcal {R}^{MD}$ , $1\le j \le J$ , and a covariance matrix having a subspace component in the form of $\mathrm {\mathbf {W}}_{j}\mathrm {\mathbf {W}}_{j}^T$ [13]. Here, ${\mathrm {\mathbf {W}}_{j}}$ is a ${MD\times L}$ dimensional matrix, whose column l, i.e. $\mathrm {\mathbf {W}}^{(l)}_{j}$ , represents one mode of variation for the cluster j. Let $\mathrm {\mathbf {v}}_{k}$ be an L dimensional vector of loading coefficients corresponding to $\mathcal {X}_k$ and let us define: $\varvec{\mu }_{jk}=\mathrm {\mathbf {W}}_{j}\mathrm {\mathbf {v}}_{k}+\bar{\varvec{\mu }}_{j}$ . These vectors can be thought of as variables that bridge the two layers of our model: In the higher dimension, $\varvec{\mu }_{jk}$ is a re-sampled representation of $\mathcal {X}_k$ in the space spanned by principal components of the jth cluster; meanwhile, if we partition $\varvec{\mu }_{jk}$ into a series of M subsequent vectors, and denote each as $\varvec{\mu }^{(m)}_{jk}$ , we obtain the means of D dimensional Gaussians of the corresponding GMM.

Let $\mathcal {Z}_k=\{\mathrm {\mathbf {z}}_{kn}\}^{N_k}_{n=1}$ be a set of $$N_k$$

, 1-of-M coded latent membership vectors for the points in $\mathcal {X}_k$ . Each $\mathrm {\mathbf {z}}_{kn}\in \{0,1\}^M$ is a vector of zeros, whose mth component equals one ( $z_{knm}=1$ ) indicates that $\mathrm {\mathbf {x}}_{kn}$ is a sample from the D dimensional Gaussian m. The precision (inverse of the variance) of Gaussians is globally denoted by $\beta \mathbf {I}_{D\times D}$ . Similarly, let $\mathrm {\mathbf {t}}_{k}\in \{0,1\}^J$ be a latent, 1-of-J coded vector whose component j being one ( $t_{kj}=1$ ) indicates the membership of the $\mathcal {X}_k$ to cluster j. The conditional pdf of $\mathrm {\mathbf {x}}_{kn}$ is then given by:

$\begin{aligned} p(\mathrm {\mathbf {x}}_{kn}|\mathrm {\mathbf {z}}_{kn},\mathrm {\mathbf {t}}_{k},\beta ,\mathrm {\mathbb {W}},\mathrm {\mathbf {v}}_{k})=\prod ^J_{j=1}\prod ^M_{m=1}\Big (\mathcal {N}(\mathrm {\mathbf {x}}_{kn}|\varvec{\mu }^{(m)}_{jk},\beta ^{-1} \varvec{I}_{D\times D})^{z_{knm}}\Big )^{t_{kj}} \end{aligned}$

(1)

where $\mathbb {W}=\{\mathrm {\mathbf {W}}_{j}\}^J_{j=1}$ is the set of principal component matrices. To facilitate our derivations, we introduce the following prior distributions over $\mathrm {\mathbf {W}}_{j}$ , $\mathrm {\mathbf {v}}_{k}$ , and $\beta$ , which are conjugate to the normal distribution in Eq. (1):

$\begin{aligned} \!\!p(\mathrm {\mathbf {W}}_{j})\!\!=\!\!\prod ^{L}_{l=1} \mathcal {N}(\mathrm {\mathbf {W}}^{(l)}_{j}|\varvec{0},\alpha ^{-1}_{jl} \varvec{I}),\quad p(\mathrm {\mathbf {v}}_{k})\!\!=\!\!\mathcal {N}(\mathrm {\mathbf {v}}_{k}|\varvec{0},\varvec{I}),\quad p(\beta )\!\!=\!\!\varGamma (\beta |a_0,b_0) \end{aligned}$

(2)

The hyper-parameters of the Gamma distribution in the last line are set to $a_0=10^{-3}$ and $$b_0=1$$

to have a flat prior over $\beta$ . Next, we respectively denote the mixture weights of GMMs and MPPCA by $\varvec{\pi }^{\mathrm {\mathbf {z}}}$ and $\varvec{\pi }^{\mathrm {\mathbf {t}}}$ vectors, each having a Dirichlet distribution as priors: $p(\varvec{\pi }^{\mathrm {\mathbf {z}}}) = \text {Dir}(\varvec{\pi }^{\mathrm {\mathbf {z}}}|\lambda ^{\mathrm {\mathbf {z}}}_{0})$ , $p(\varvec{\pi }^{\mathrm {\mathbf {t}}}) = \text {Dir}(\varvec{\pi }^{\mathrm {\mathbf {t}}}|\lambda ^{\mathrm {\mathbf {t}}}_{0})$ . where we set $\lambda ^{\mathrm {\mathbf {z}}}_{0}=\lambda ^{\mathrm {\mathbf {t}}}_{0}=10^{-3}$ . The conditional distributions of membership vectors of $\mathrm {\mathbf {z}}_{kn}$ (for points) and $\mathrm {\mathbf {t}}_{k}$ (for point sets) given mixing weights are specified by two multi-nomial distributions: $p(\mathrm {\mathbf {z}}_{kn}|\varvec{\pi }^{\mathrm {\mathbf {z}}}) = \prod ^{M}_{m=1}{(\pi ^{\mathrm {\mathbf {z}}}_{m})}^{z_{knm}}$ , and $p(\mathrm {\mathbf {t}}_{k}|\varvec{\pi }^{\mathrm {\mathbf {t}}}) = \prod ^{J}_{j=1}{(\pi ^{\mathrm {\mathbf {t}}}_{j})}^{t_{kj}}$ , where $0\le \pi ^{\mathrm {\mathbf {z}}}_{m}$ , $0\le \pi ^{\mathrm {\mathbf {t}}}_{j}$ are the components m, j of $\varvec{\pi }^{\mathrm {\mathbf {z}}}$ , $\varvec{\pi }^{\mathrm {\mathbf {t}}}$ , respectively. We now construct the joint pdf of the sets of all random variables, by assuming (conditional) independence and multiplying the pdfs where needed. Let $\mathbb {X}=\{\mathcal {X}_k\}^K_{k=1}$ , $\mathbb {Z}=\{\mathcal {Z}_k\}^K_{k=1}$ , $\mathbb {V}=\{\mathrm {\mathbf {v}}_{k}\}^K_{k=1}$ , and $\mathbb {T}=\{\mathrm {\mathbf {t}}_{k}\}^K_{k=1}$ , then the distributions of these variables can be written as:

$\begin{aligned}&p(\mathrm {\mathbb {W}})\!\!=\!\!\prod _j\! p(\mathrm {\mathbf {W}}_{j}|\varvec{\alpha }_j),\quad p(\mathbb {Z}|\varvec{\pi }^{\mathrm {\mathbf {z}}})\!\!=\!\!\prod _k\!p(\mathcal {Z}_k|\varvec{\pi }^{\mathrm {\mathbf {z}}})\!\!=\!\!\prod _k\!\prod _n\! p(\mathrm {\mathbf {z}}_{kn}),\,\, p(\mathbb {T}|\varvec{\pi }^{\mathrm {\mathbf {t}}})\!\!=\!\!\prod _k\! p(\mathrm {\mathbf {t}}_{k}|\varvec{\pi }^{\mathrm {\mathbf {t}}}) \nonumber \\&p(\mathbb {V})=\prod _k p(\mathrm {\mathbf {V}}_{k}),\quad p(\mathbb {X}|\mathbb {Z},\mathbb {T},\mathrm {\mathbb {W}},\mathbb {V},\beta )=\prod _k p(\mathcal {X}_k|\mathcal {Z}_k,\mathrm {\mathbf {t}}_{k},\beta ,\mathbb {\mathrm {\mathbb {W}}},\mathrm {\mathbf {v}}_{k}), \end{aligned}$

(3)

$\begin{aligned}&p(\mathcal {X}_k|\mathcal {Z}_k,\mathrm {\mathbf {t}}_{k},\beta ,\mathbb {\mathrm {\mathbb {W}}},\mathrm {\mathbf {v}}_{k})=\prod ^{N_k}_{n=1} p(\mathrm {\mathbf {x}}_{kn}|\mathrm {\mathbf {z}}_{kn},\mathrm {\mathbf {t}}_{k},\beta ,\mathrm {\mathbb {W}},\mathrm {\mathbf {v}}_{k}) \end{aligned}$

(4)

Having defined the required distributions through Eqs. (1)-(3), the distribution of the complete observation is given as

$\begin{aligned} \!\!\!\!\!\!p(\mathbb {X}\!,\!\mathbb {Z},\!\mathbb {T},\!\!\mathrm {\mathbb {W}},\!\mathbb {V},\!\varvec{\pi }^{\mathrm {\mathbf {t}}}\!,\varvec{\pi }^{\mathrm {\mathbf {z}}}\!,\beta \!)= & {} p(\mathbb {X}|\mathbb {Z},\!\mathbb {T},\!\mathrm {\mathbb {W}},\!\mathbb {V}\!,\!\beta )p(\mathbb {Z}|\varvec{\pi }^{\mathrm {\mathbf {z}}}\!)p(\mathbb {T}|\varvec{\pi }^{\mathrm {\mathbf {t}}}\!)p(\!\varvec{\pi }^{\mathrm {\mathbf {z}}}\!)p(\!\varvec{\pi }^{\mathrm {\mathbf {t}}}\!)p(\!\mathrm {\mathbb {W}}\!)p(\!\mathbb {V}\!)p(\!\beta ) \end{aligned}$

(5)

Figure 1 is a graphical representation for the generative model considered in this paper. Given observations (colored dark gray) as D dimensional points, our problem is to estimate the posterior distributions of all the latent random variables (hollow circles) and hyper-parameters, which include the discrete cluster and the continuous variables (e.g. means and modes of variations).

2.2 Approximated Inference

If we denote the set of latent variables as $\mathbf{{\theta }}=\{\mathbb {Z},\mathbb {T},\mathrm {\mathbb {W}},\mathbb {V},\varvec{\pi }^{\mathrm {\mathbf {t}}},\varvec{\pi }^{\mathrm {\mathbf {z}}},\beta \}$ , direct inference of $p(\varvec{\theta }|\mathbb {X})$ (as our objective) is analytically intractable thus an approximated distribution, $q(\varvec{\theta })$ , is sought. Owing to the dimensionality of the data, we prefer Variational Bayes (VB) over sampling based methods. The VB principle for obtaining $q(\varvec{\theta })$ is explained briefly. The model evidence, i.e. $p(\mathbb {X})$ ¹, can be decomposed as $p(\mathbb {X})=\mathcal {L}+\text {KL}( p(\varvec{\theta }|\mathbb {X})||q(\varvec{\theta }))$ , where $0\le \text {KL}(\cdot ||\cdot )$ denotes the Kullback-Leilber divergence, and

$\begin{aligned} \mathcal {L}=\int q(\varvec{\theta })\ln {\frac{p(\mathbb {X},\varvec{\theta })}{q(\varvec{\theta })}}d\varvec{\theta } \le p(\mathbb {X}) \end{aligned}$

(6)

is a lower bound on $p(\mathbb {X})$ . To obtain $q(\varvec{\theta })$ , the $\text {KL}$ divergence between the true and the approximated posterior should be minimized. However, this is not feasible because the true posterior is not accessible to us. Thus, $q(\varvec{\theta })$ can be computed by maximizing $\mathcal {L}$ . We approximate the true posterior as a factorized form, i.e., $q(\varvec{\theta })=\prod _i q(\theta _i)$ , where $\theta _i$ refers to any of our latent variables. This factorization leads to the following tractable result: let $\varepsilon$ be the variable of interest in $\varvec{\theta }$ , and $\xi =\varvec{\theta }-\varepsilon$ , then the variational posterior of $\varepsilon$ is given by $\ln {q(\varepsilon )}=\langle \ln {p(\mathbb {X},\varvec{\theta })}\rangle _\xi +\text {const}$ , where $p(\mathbb {X},\varvec{\theta })$ is given in Eq. (5), $\langle \cdot \rangle _\xi$ denotes the expectation w.r.t. to the product of $q(\cdot )$ of all variable in $\xi$ .

2.3 Update of Posteriors and Hyper-Parameters

In this section, we provide equations to update the variational posteriors. Thanks to conjugacy of priors to likelihoods, these derivations are done by inspecting expectations of logarithms and matching posteriors to their corresponding likelihood template forms. Detailed proof of our derivations is skipped for brevity. Starting from $\mathbb {Z}$ variables we have $q(\mathbb {Z})=\prod _{k}q(\mathcal {Z}_k)=\prod _{k,m,n} (r_{knm})^{z_{knm}}$ Under this equation, we have $\langle z_{knm}\rangle =r_{knm}$ , where the right hand side can be computed using the following relationships:

$\begin{aligned} \!\!\!\!r_{knm}=\frac{\rho _{knm}}{\sum _{m'}\rho _{knm'}},\, \ln {\rho _{knm}} = -\frac{\langle \beta \rangle }{2}\sum _{j}\langle t_{kj}\rangle \langle |\mathrm {\mathbf {x}}_{kn}-\varvec{\mu }^{(m)}_{jk}|^2\rangle \!+\!\langle \ln {\pi ^{\mathrm {\mathbf {z}}}_{m}}\rangle \end{aligned}$