Fig. 1.
(a) Real-world fluorescence microscopy images (top and bottom) depicting the presence of protein and cell nuclei, respectively. (b) Proposed probabilistic graphical model for colocalization estimation. Filled circles denote the observed image data $z := (z^1, z^2)$. The label images $L := (L^1, L^2)$ are hidden random variables. We estimate all parameters from the data. The nature and strength of colocalization is modeled by the parameter $\beta$. The parameters $\alpha$ model the spatial smoothness of the labels in $L^1$ and $L^2$; in this paper, for simplicity, we use a single $\alpha$. The means $\mu$ and standard deviations $\sigma$ model the image intensities of the objects and backgrounds in $z^1$ and $z^2$. (c) MAP label images $\hat{l}^1, \hat{l}^2$, indicating object presence ($+1$) / absence ($-1$), computed after the optimal parameters are found by VBEM. Note: this paper focuses on estimating colocalization using VBEM, which is crucial to several clinical and scientific studies; the MAP label images are unused by VBEM and are shown only to provide additional insight into the proposed approach.
Let $L := (L^1, L^2)$ denote a random field modeling the pixel labels of the two objects. Specifically, let $L^1$ be a binary label image with $I$ pixels and possible label values $l^1_i \in \{+1, -1\}$, where $l^1_i = +1$ indicates the object's presence at pixel $i$ in $z^1$; similarly, $l^1_i = -1$ indicates the object's absence at pixel $i$ in $z^1$. We follow similar notation and semantics for $L^2$. We assume that the spatial distributions of the objects are (i) smooth within each image and (ii) codependent between images. To model these properties, each pixel has intra-image and inter-image neighbors.
We introduce a neighborhood system $\mathcal{N}$, where $\mathcal{N}^{\text{intra}}_i$ and $\mathcal{N}^{\text{inter}}_i$ denote the sets of neighbors of pixel $i$ in the same image (the one to which pixel $i$ belongs) and in the other image, respectively. To obtain isotropic neighborhoods, we employ Gaussian-weighted masks $w$, whose $ij$-th element $w_{ij} \propto \exp(-\|x_i - x_j\|_2^2 / (2\sigma^2))$, where $x_i$ and $x_j$ are the physical coordinates of pixels $i$ and $j$, respectively. We use two such masks, i.e., $w^{\text{intra}}$ (for intra-image smoothness) and $w^{\text{inter}}$ (for inter-image colocalization), with underlying scale parameters $\sigma^{\text{intra}}$ and $\sigma^{\text{inter}}$, respectively. We set $\sigma^{\text{intra}}$ to restrict the intra-image neighborhood to 2 neighbors along each dimension. We choose $\sigma^{\text{inter}}$ so that the inter-image neighborhood is large enough to model the longest-range direct spatial interactions between the biological entities; this choice is application dependent and informed by prior knowledge, but the results are fairly robust to it. In this paper, we fix $\sigma^{\text{inter}}$ to a constant value in pixels.
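The following sketch (ours, not the authors' code; the scale values and the truncation radius are illustrative, since the exact values are fixed elsewhere in the paper) builds such Gaussian-weighted masks:

```python
import numpy as np

def gaussian_mask(sigma, radius, exclude_center):
    """Build a (2*radius+1) x (2*radius+1) mask whose ij-th weight is
    exp(-||x_i - x_j||^2 / (2 sigma^2)): the center plays the role of
    pixel i; every other entry weights a neighbor j by its distance."""
    offsets = np.arange(-radius, radius + 1)
    dx, dy = np.meshgrid(offsets, offsets, indexing="ij")
    w = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))
    if exclude_center:
        w[radius, radius] = 0.0  # pixel i is not its own intra-image neighbor
    return w

# Intra-image mask: radius 2 gives 2 neighbors along each dimension.
w_intra = gaussian_mask(sigma=1.0, radius=2, exclude_center=True)
# Inter-image mask: the co-located pixel in the other image is a
# legitimate neighbor, so the center weight stays in.
w_inter = gaussian_mask(sigma=1.0, radius=2, exclude_center=False)
```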
We model the prior probability mass function (PMF) of the label image pair $L := (L^1, L^2)$ as
$$P(L = l) \;=\; \frac{1}{\eta(\alpha, \beta)} \exp\Big( \alpha \sum_{i=1}^{I} \sum_{j \in \mathcal{N}^{\text{intra}}_i} w^{\text{intra}}_{ij} \big( l^1_i l^1_j + l^2_i l^2_j \big) \;+\; \beta \sum_{i=1}^{I} \sum_{j \in \mathcal{N}^{\text{inter}}_i} w^{\text{inter}}_{ij} \, l^1_i l^2_j \Big), \tag{1}$$
where the normalization constant (partition function) $\eta(\alpha, \beta)$ is a function of the smoothness parameter $\alpha$ and the colocalization parameter $\beta$. The motivation for this model is as follows. The terms involving $\alpha$ derive from a standard Ising model of smoothness on the label images. The term involving $\beta$ is novel and carefully designed as follows. First, $\beta = 0$ decouples the two label images such that $P(l)$ can be written as $P(l^1) P(l^2)$; this indicates the absence of colocalization. Second, because the labels are designed to be $\pm 1$, a positive $\beta$ forces the interacting labels in the two images (i.e., labels in each other's neighborhoods) towards being equal (i.e., both $+1$ or both $-1$) to produce a high label-image probability. This implies that the neighboring labels in the two images likely both indicate the presence of the objects or both indicate their absence. Third, similarly, a negative $\beta$ forces the interacting labels in the two images to be negatives of each other, i.e., the presence of one object in one image indicates the absence of the other object in the other image.
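As a concrete check of these three properties, a minimal sketch (ours; it reuses the masks from the earlier snippet) evaluates the unnormalized log-prior of Eq. (1):

```python
import numpy as np
from scipy.ndimage import convolve

def log_prior_unnorm(l1, l2, alpha, beta, w_intra, w_inter):
    """Unnormalized log P(l) of Eq. (1); l1, l2 are +1/-1 label images."""
    l1 = l1.astype(float)
    l2 = l2.astype(float)
    # alpha-weighted Ising smoothness: sum_i sum_{j in N_intra(i)} w_ij l_i l_j
    smooth = ((l1 * convolve(l1, w_intra)).sum()
              + (l2 * convolve(l2, w_intra)).sum())
    # beta-weighted coupling: sum_i sum_{j in N_inter(i)} w_ij l1_i l2_j
    coloc = (l1 * convolve(l2, w_inter)).sum()
    # Equals log P(l) up to the constant -log eta(alpha, beta).
    return alpha * smooth + beta * coloc
```

With $\beta > 0$, aligned label pairs increase the coupling term and hence the probability; with $\beta < 0$, anti-aligned pairs do.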
Let $Z := (Z^1, Z^2)$ denote a random field modeling the intensities of the two objects and the backgrounds. We assume that the intensities of each object are Gaussian distributed as a result of natural intensity variation, partial-volume effects, minor shading artifacts, etc. We assume that the data are corrupted by i.i.d. additive Gaussian noise. Let the intensities of the object and the background in the observed image $z^m$, where $m \in \{1, 2\}$, be Gaussian distributed with parameters $(\mu^m_{\text{ob}}, \sigma^m_{\text{ob}})$ and $(\mu^m_{\text{bg}}, \sigma^m_{\text{bg}})$, respectively. Then, the likelihood of observing the data $z$, given the labels $l$, is
$$P(Z = z \mid L = l) \;=\; \prod_{m=1}^{2} \prod_{i=1}^{I} G\big( z^m_i \,;\, \mu^m_{k(l^m_i)}, \sigma^m_{k(l^m_i)} \big), \tag{2}$$
where $G(\cdot\,; \mu, \sigma)$ denotes the Gaussian density and $k(\cdot)$ maps the label value to the class, i.e., $k(l^m_i) = \text{ob}$ if $l^m_i = +1$ and $k(l^m_i) = \text{bg}$ if $l^m_i = -1$, for both $m = 1$ and $m = 2$.
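A minimal sketch (ours) of the per-image log-likelihood implied by Eq. (2); the total log-likelihood is the sum of this quantity over $m = 1, 2$:

```python
import numpy as np

def log_likelihood(z, l, mu_ob, sigma_ob, mu_bg, sigma_bg):
    """log P(z | l) for one image: l_i = +1 selects the object Gaussian
    and l_i = -1 the background Gaussian (the mapping k in the text)."""
    mu = np.where(l == +1, mu_ob, mu_bg)
    sigma = np.where(l == +1, sigma_ob, sigma_bg)
    # Sum of per-pixel Gaussian log-densities
    return np.sum(-0.5 * ((z - mu) / sigma) ** 2
                  - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))
```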
Let the set of parameters be $\theta := \theta_{\text{prior}} \cup \theta_{\text{lhood}}$, where the prior parameters are $\theta_{\text{prior}} := \{\alpha, \beta\}$ and the likelihood parameters are $\theta_{\text{lhood}} := \{\mu^m_{\text{ob}}, \sigma^m_{\text{ob}}, \mu^m_{\text{bg}}, \sigma^m_{\text{bg}}\}_{m=1}^{2}$. We propose to formulate colocalization estimation as the following maximum-a-posteriori (MAP) Bayesian estimation problem over all parameters $\theta$:
$$\hat{\theta} \;:=\; \arg\max_{\theta} \, P(\theta \mid z). \tag{3}$$
A novelty in our approach is that we estimate $\beta$ (as well as $\alpha$) automatically from the data. We observe that the optimization of the parameters $\alpha$ and $\beta$ is non-trivial because of their involvement in the partition function $\eta(\alpha, \beta)$, which is intractable to evaluate. Furthermore, we optimize all parameters underlying the model directly from the data.
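To make the intractability concrete (our illustration, following the definitions above), the partition function sums the exponentiated energy in Eq. (1) over every joint label configuration:
$$\eta(\alpha, \beta) \;=\; \sum_{l^1 \in \{\pm 1\}^I} \sum_{l^2 \in \{\pm 1\}^I} \exp\Big( \alpha \sum_{i=1}^{I} \sum_{j \in \mathcal{N}^{\text{intra}}_i} w^{\text{intra}}_{ij} \big( l^1_i l^1_j + l^2_i l^2_j \big) + \beta \sum_{i=1}^{I} \sum_{j \in \mathcal{N}^{\text{inter}}_i} w^{\text{inter}}_{ij} \, l^1_i l^2_j \Big),$$
a sum over $2^{2I}$ configurations; even for a pair of $256 \times 256$ images ($I = 65{,}536$), this is $2^{131{,}072}$ terms, so neither $\eta$ nor its gradients with respect to $\alpha$ and $\beta$ can be evaluated by enumeration.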
3.2 Expectation Maximization for Parameter Estimation
We treat the labels $L$ as hidden variables and solve the parameter-estimation problem using expectation maximization (EM). EM is an iterative optimization algorithm, with each iteration comprising an E step and an M step. Suppose that after iteration $n$ the parameter estimate is $\theta^n$. Then, in iteration $n + 1$, the updated parameter estimate $\theta^{n+1}$ is obtained as follows.
The E step involves constructing the function
$$Q(\theta; \theta^n) \;:=\; E_{P(L \mid z, \theta^n)} \big[ \log P(z, L \mid \theta) \big], \tag{4}$$
which is intractable to evaluate analytically. Furthermore, a Monte-Carlo simulation-based approximation of the expectation leads to challenges in reliable and efficient sampling; e.g., while Gibbs sampling of the label fields is computationally expensive, determining an appropriate burn-in period and detecting convergence of the Gibbs sampler pose serious theoretical challenges. Thus, we opt for variational inference and choose to approximate the posterior PMF $P(L \mid z, \theta^n)$ by a factorizable analytical model
$$P(L \mid z, \theta^n) \;\approx\; F(L) \;:=\; \prod_{i=1}^{I} F^1_i(L^1_i) \, F^2_i(L^2_i), \tag{5}$$
where we optimize the per-pixel factors, i.e., the PMFs $F^1_i(\cdot)$ and $F^2_i(\cdot)$, to best fit the posterior. This factorized approximation of the posterior makes the function $Q(\theta; \theta^n)$ more tractable. We describe this strategy in the next section.
The M step subsequently maximizes the function $Q(\theta; \theta^n)$ over the parameters $\theta$. The E and M steps are repeated until convergence; we observe that a few iterations suffice.
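A high-level sketch (ours; `init_factors`, `mean_field_sweep`, and `m_step` are hypothetical helper names, not the authors' code) of how the two steps interleave:

```python
def vbem(z1, z2, theta, n_em_iters=10, n_mf_sweeps=10):
    """Outer VBEM loop over EM iterations n = 1, 2, ..."""
    for _ in range(n_em_iters):
        # E step: fit the factorized posterior F(L) of Eq. (5) by
        # iterating mean-field updates (Sec. 3.3)
        f1, f2 = init_factors(z1, z2, theta)
        for _ in range(n_mf_sweeps):
            f1, f2 = mean_field_sweep(z1, z2, f1, f2, theta)
        # M step: maximize Q(theta; theta^n) over theta; the Gaussian
        # means/deviations typically have closed forms, while alpha and
        # beta need numerical treatment because of the partition function
        theta = m_step(z1, z2, f1, f2, theta)
    return theta, f1, f2
```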
3.3 E Step: Variational Inference of the Factorized Posterior
Within each EM iteration, we use variational inference to optimize the factors $F^1_i(\cdot)$ and $F^2_i(\cdot)$ underlying $F(L)$. To simplify notation, we omit $\theta^n$ where it is obvious. We rewrite
$$\log P(z \mid \theta) \;=\; \mathcal{L}(F, \theta) + \mathrm{KL}\big( F(L) \,\big\|\, P(L \mid z, \theta) \big), \;\;\text{where}\;\; \mathcal{L}(F, \theta) \;:=\; E_{F(L)} \big[ \log P(z, L \mid \theta) - \log F(L) \big] \tag{6}$$
and $\mathrm{KL}(A \,\|\, B)$ denotes the Kullback-Leibler (KL) divergence between distributions $A$ and $B$. Our goal is to find the maximal lower bound of the data log-probability $\log P(z \mid \theta)$ under the factorization constraint on $F(L)$. We do this by maximizing $\mathcal{L}(F, \theta)$, because $\mathrm{KL}(\cdot \,\|\, \cdot) \geq 0$ guarantees that $\mathcal{L}(F, \theta)$ is a lower bound. We perform this functional optimization by iteratively optimizing each of the factor functions; this is possible because the factors are designed to be independent of each other. We now incorporate the factorized form of $F(L)$ and, without loss of generality, separate the terms involving $F^1_i$ from the other terms. This yields
$$\mathcal{L}(F, \theta) \;=\; E_{F^1_i(L^1_i)} \Big[ E_{\prod_{(j, m) \neq (i, 1)} F^m_j(L^m_j)} \big[ \log P(z, L \mid \theta) \big] \Big] \;-\; E_{F^1_i(L^1_i)} \big[ \log F^1_i(L^1_i) \big] \;+\; \text{const.} \tag{7}$$
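For this particular model, the stationary-point condition of Eq. (7) reduces, in our reading (a sketch under the notation above, not the authors' exact update), to Ising-style mean-field equations on the expected labels $m_i := E_{F_i}[L_i] \in [-1, 1]$; here `loglik_odds` denotes the per-pixel $\log G(z_i; \mu_{\text{ob}}, \sigma_{\text{ob}}) - \log G(z_i; \mu_{\text{bg}}, \sigma_{\text{bg}})$:

```python
import numpy as np
from scipy.ndimage import convolve

def mean_field_sweep(loglik_odds1, loglik_odds2, m1, m2,
                     alpha, beta, w_intra, w_inter):
    """One sweep of mean-field updates for the factors of Eq. (5)."""
    # Coefficient of l_i^1 in E_F[log P(z, L | theta)]: the Ising sum in
    # Eq. (1) counts each intra-image pair twice, hence the factor 2.
    field1 = 2.0 * alpha * convolve(m1, w_intra) + beta * convolve(m2, w_inter)
    log_odds1 = loglik_odds1 + 2.0 * field1   # log F_i(+1) - log F_i(-1)
    m1 = np.tanh(0.5 * log_odds1)             # E[L_i] = tanh(log-odds / 2)
    # Symmetric update for the second label image
    field2 = 2.0 * alpha * convolve(m2, w_intra) + beta * convolve(m1, w_inter)
    log_odds2 = loglik_odds2 + 2.0 * field2
    m2 = np.tanh(0.5 * log_odds2)
    return m1, m2
```

Each factor is then recovered as $F^1_i(+1) = (1 + m^1_i)/2$ and $F^1_i(-1) = (1 - m^1_i)/2$, and similarly for $F^2_i$.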