Optimal Transport Averaging of Neuroimaging Data

Simply put, we consider normalized histograms on a grid of locations, and assume that the distance between any two locations is provided. We then extend Wasserstein distances to non-negative vectors of bounded mass.


Notations. Let d be the cardinality of $$\varOmega $$. Arbitrarily relabeling the elements of $$\varOmega $$ as $$\{1,\cdots , d\}$$, we represent the set of non-negative measures on $$\varOmega $$ by the non-negative orthant $$\mathbb {R}^{d}_+$$. Let $$\mathbf {1}_d$$ be the d-dimensional vector of ones. For any vector $$u\in \mathbb {R}^d$$ we write $$|u|_1$$ for the $$l_1$$ norm of u, $$\sum _{i=1}^d |u_i|$$. When $$u\in \mathbb {R}^{d}_+$$ we also call $$|u|_1$$ the (total) mass of vector u. Let $$M=[m_{ij}]\in \mathbb {R}^{d\times d}_+$$ be the matrix that describes the metric between all locations in $$\varOmega $$, namely $$m_{ij}=D(i,j), i,j\le d$$.
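To make these notations concrete, the following minimal sketch builds such a ground metric matrix M for a hypothetical 4x4 regular grid with Euclidean ground distance; both the grid and the choice of distance are illustrative assumptions, not fixed by the text.

```python
import numpy as np

# Hypothetical example: Omega is a 4x4 regular 2D grid, so d = 16; any other
# finite set of locations with a known metric D could be used instead.
grid = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
d = len(grid)  # cardinality of Omega

# M[i, j] = D(i, j): here, the Euclidean distance between grid locations.
M = np.linalg.norm(grid[:, None, :] - grid[None, :, :], axis=-1)
assert M.shape == (d, d) and np.allclose(M, M.T) and np.all(np.diag(M) == 0)
```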

Wasserstein Distance for Normalized Histograms. Consider two vectors $$a,b\in \mathbb {R}^{d}_+$$ such that $$|a|_1=|b|_1$$. Both can be interpreted as histograms on $$\varOmega $$ of the same mass. A non-trivial example of such normalized data in medical imaging is the discretized ODF used for diffusion imaging data [10]. For $$p\ge 1$$, the p-Wasserstein distance $$W_p(a,b)$$ between a and b is the $$p^\text {th}$$ root of the optimum of a linear program, known as a transportation problem [4, §7.2]. A transportation problem is a network flow problem on a bipartite graph with cost $$M^p$$ (the pairwise distance matrix M raised element-wise to the power p) and feasible set of flows $$U(a,b)$$ (known as the transportation polytope of a and b), where $$U(a,b)$$ is the set of $$d\times d$$ non-negative matrices whose row and column marginals are equal to a and b respectively:


$$\begin{aligned} U(a,b) \overset{\text {def}}{=}\{ T\in \mathbb {R}_+^{d\times d}\; |\; T\mathbf {1}_d = a,\, T^T\mathbf {1}_d = b \}. \end{aligned}$$

(1)
Given the constraints induced by a and b, one naturally has that $$U(a,b)$$ is empty when $$|a|_1\ne |b|_1$$ and non-empty when $$|a|_1=|b|_1$$ (in which case one can easily check that the matrix $$ab^T/|a|_1$$ belongs to that set). The p-Wasserstein distance $$W_p(a,b)$$ raised to the power p (written $$W^p_p(a,b)$$ below) is equal to the optimum of a parametric Optimal Transport ($$\mathbf {OT}$$) problem on $$d^2$$ variables,


$$\begin{aligned} W_p^p(a,b) = \mathbf {OT}(a,b,M^p)\overset{\text {def}}{=}\min _{T\in U(a,b)}\langle T , M^p\,\rangle , \end{aligned}$$

(2)
parameterized by the marginals a, b and the matrix $$M^p$$.
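As an illustration of Eq. (2), the following minimal sketch solves the transportation problem with a generic linear programming solver; the helper name, the small example data, and the use of SciPy are illustrative choices of ours, and the approach used later in the paper relies instead on the entropic smoothing of [6] discussed in Sect. 3.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_p(a, b, M, p=1):
    """Return W_p(a, b)^p, the optimum of the transportation LP of Eq. (2).

    Minimal sketch with a generic LP solver; dedicated network-flow or
    entropic solvers scale much better than this d^2-variable LP.
    """
    d = len(a)
    cost = (M ** p).ravel()                       # cost of the flattened coupling T
    A_rows = np.kron(np.eye(d), np.ones((1, d)))  # encodes T 1_d = a
    A_cols = np.kron(np.ones((1, d)), np.eye(d))  # encodes T^T 1_d = b
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([a, b])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# The independent coupling a b^T / |a|_1 is feasible whenever |a|_1 = |b|_1.
a, b = np.array([0.2, 0.3, 0.5]), np.array([0.4, 0.4, 0.2])
T0 = np.outer(a, b) / a.sum()
assert np.allclose(T0.sum(axis=1), a) and np.allclose(T0.sum(axis=0), b)

M_small = np.abs(np.subtract.outer(np.arange(3.0), np.arange(3.0)))  # 1D grid metric
print(wasserstein_p(a, b, M_small, p=1))  # W_1(a, b)
```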

Optimal Transport for Unnormalized Measures. If the total masses of a and b differ, namely $$|a|_1\ne |b|_1$$, the definition provided above is not useful because $$U(a,b)=\emptyset $$. Several extensions of the OT problem have been proposed in that setting; we recall them here for the sake of completeness. In the computer vision literature, [23] proposed to handle that case by: (i) relaxing the equality constraints of $$U(a,b)$$ to inequality constraints $$T\mathbf {1}_d \le a,\, T^T\mathbf {1}_d \le b$$ in Eq. (1); (ii) adding an equality constraint on the total mass of the solution, $$\mathbf {1}_d^TT\mathbf {1}_d=\min (|a|_1,|b|_1)$$; (iii) dividing the minimum of $$\langle T , M\,\rangle $$ under constraints (i,ii) by $$\min (|a|_1,|b|_1)$$. This modification does not, however, result in a metric. [20] later proposed a variant of this approach, called EMD-hat, that incorporates constraints (i,ii) but (iii’) adds to the optimal cost $$\langle T^\star , M\,\rangle $$ a constant times $$\min (|a|_1,|b|_1)$$. When that constant is large enough, [20] claim that EMD-hat is a metric. We also note that [2] proposed a quadratic penalty on the difference of masses and made use of a dynamic formulation of the transportation problem.
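A minimal sketch of our reading of steps (i)-(iii), again with a generic LP solver; the helper name and the constraint encoding are illustrative assumptions, not code from [23].

```python
import numpy as np
from scipy.optimize import linprog

def emd_relaxed(a, b, M):
    """Relaxed transport cost in the spirit of [23] for |a|_1 != |b|_1.

    (i)   marginal constraints relaxed to T 1_d <= a and T^T 1_d <= b,
    (ii)  total transported mass fixed to min(|a|_1, |b|_1),
    (iii) optimal cost divided by that mass.
    """
    d = len(a)
    m = min(a.sum(), b.sum())
    A_rows = np.kron(np.eye(d), np.ones((1, d)))   # T 1_d   <= a
    A_cols = np.kron(np.ones((1, d)), np.eye(d))   # T^T 1_d <= b
    A_ub, b_ub = np.vstack([A_rows, A_cols]), np.concatenate([a, b])
    A_eq, b_eq = np.ones((1, d * d)), [m]          # 1^T T 1 = min(|a|_1, |b|_1)
    res = linprog(M.ravel(), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun / m
```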

Kantorovich Norms for Signed Measures. We propose to build on early contributions by Kantorovich to define a generalization of the optimal transport distance to unnormalized measures, making optimal transport applicable to a wider class of problems, such as the averaging of functional imaging data. [18] proposed such a generalization as an intermediate result towards a more general definition, the Kantorovich norm for signed measures on a compact metric space, which was itself extended to separable metric spaces by [15]. We summarize this idea here by simplifying it to the case of interest in this paper, where $$\varOmega $$ is a finite (of size d) probability space, in which case signed measures are equivalent to vectors in $$\mathbb {R}^d$$. [18] first propose a norm for vectors z in the orthogonal complement of $$\mathbf {1}_d$$ (vectors z such that $$z^T\mathbf {1}_d=0$$), defined as the 1-Wasserstein distance between the positive and negative parts of z, $$||z||_K= W_1(z_+,z_-)$$. A penalty vector $$\varDelta \in \mathbb {R}^{d}_+$$ is then introduced to define the norm $$||x||_{K}$$ of any vector x as the minimal value of $$||z||_K + \varDelta ^T|z-x|$$ when z ranges over all vectors with zero sum, and $$|z-x|$$ denotes the element-wise absolute value of the difference of vectors z and x. For this to define a true norm on $$\mathbb {R}^d$$, $$\varDelta $$ must be such that $$\varDelta _i\ge \max _j m_{ij}$$ and $$|\varDelta _i-\varDelta _j|\le m_{ij}$$. The distance between two arbitrary non-negative vectors a, b of different mass is then defined as $$||a-b||_K$$. As highlighted by [26, p.108], and writing $$e_i$$ for the $$i^{\text {th}}$$ vector of the canonical basis of $$\mathbb {R}^d$$, this norm is the maximal norm on $$\mathbb {R}^d$$ such that for any $$i,j \le d$$, $$||e_i-e_j||_K=m_{ij}$$, namely the maximal norm on the space of signed measures on $$\varOmega $$ such that the norm between two Dirac measures coincides with $$\varOmega $$’s metric between these points.
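For a vector z with zero total mass, $$||z||_K$$ is simply a 1-Wasserstein distance between its positive and negative parts; a minimal sketch, reusing the wasserstein_p helper sketched above (itself an illustrative assumption, not the paper's solver):

```python
import numpy as np

def kantorovich_norm_zero_sum(z, M):
    """||z||_K = W_1(z_+, z_-) for z such that z^T 1_d = 0.

    Relies on the wasserstein_p sketch above; with p = 1 the LP optimum
    is the 1-Wasserstein distance itself.
    """
    assert abs(z.sum()) < 1e-12, "z must have zero total mass"
    z_pos, z_neg = np.maximum(z, 0.0), np.maximum(-z, 0.0)
    return wasserstein_p(z_pos, z_neg, M, p=1)
```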

Kantorovich Distances for Unnormalized Nonnegative Measures. Reference [14] noticed that Kantorovich’s distance between unnormalized measures can be cast as a regular optimal transport problem. Indeed, one simply needs to: (i) add a virtual point $$\omega $$ to the set $$\varOmega =\{1,\cdots ,d\}$$ whose distance $$D(i,\omega )=D(\omega ,i)$$ to any element i in $$\varOmega $$ is set to $$\varDelta _i\,$$; (ii) use that point $$\omega $$ as a buffer when comparing two measures of different mass. The appeal of Kantorovich’s formulation in the context of this work is that it boils down to a classic optimal transport problem, which can be approximated efficiently using the smoothing approach of [6], as discussed in Sect. 3. To simplify our analysis in the next section, we only consider non-negative vectors (histograms) $$a\in \mathbb {R}^{d}_+$$ whose total mass is upper bounded by a known positive constant. This assumption simplifies the definition of our distance below, since it removes the need to treat separately the cases $$|a|_1 \ge |b|_1$$ and $$|a|_1 < |b|_1$$ when comparing $$a,b\in \mathbb {R}^{d}_+$$. Note also that this assumption always holds when dealing with finite collections of data, and that, after a simple rescaling of all vectors by that constant, it amounts without loss of generality to considering vectors a in $$\mathbb {R}^{d}_+$$ such that $$|a|_1\le 1$$. We define next the Kantorovich metric on $$S_d$$, where $$S_d=\{u\in \mathbb {R}^{d}_+, |u|_1 \le 1\}$$.
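A minimal sketch of the virtual-point construction (i)-(ii) described above; the helper name and the padding to unit mass (which assumes $$|a|_1,|b|_1\le 1$$ as in this paragraph) are ours, and the formal definition given in the next section may normalize things differently.

```python
import numpy as np

def kantorovich_unnormalized(a, b, M, Delta, p=1):
    """Compare two unnormalized histograms via the virtual-point construction.

    A buffer point omega is appended at distance Delta_i from location i,
    both histograms are padded so that their total mass equals 1, and a
    standard OT problem is solved (wasserstein_p is the sketch given above).
    Assumes |a|_1 <= 1 and |b|_1 <= 1.
    """
    d = len(a)
    M_hat = np.zeros((d + 1, d + 1))
    M_hat[:d, :d] = M
    M_hat[:d, d] = Delta                  # D(i, omega) = Delta_i
    M_hat[d, :d] = Delta                  # D(omega, i) = Delta_i
    a_hat = np.append(a, 1.0 - a.sum())   # mass routed to the buffer point
    b_hat = np.append(b, 1.0 - b.sum())
    return wasserstein_p(a_hat, b_hat, M_hat, p=p)
```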
