with the tensor D being the unknown variable and b a value that depends on the acquisition settings. The estimation of the tensors over the volume domain Ω can be done through direct inference (provided that at least 6 acquisitions are available), which is equivalent to minimizing:
3.1 Measuring Similarities from diffusion weighted images
We aim at simultaneously estimating and smoothing the tensor field; therefore, the weights w(x, y) in $E_{smooth}$ should be precomputed from the raw data. The most straightforward estimation of the distances is the algebraic distance between the values $\log(S_k/S_0)$ at two neighboring voxels, for any direction k
One can easily show that such an expression does not reflect similarity between tensors according to the norm $\|\cdot\|_F$. In fact, this leads to
such that (each new vector of the new basis is a linear combination of the vectors of the initial basis). This procedure allows us to obtain an approximation of $\|D(x) - D(y)\|_F$ directly from the raw data $S_k$ and $S_0$, as follows
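As a rough illustration, the Python sketch below (NumPy assumed) builds such a signal-based surrogate of the tensor distance and turns it into a Gaussian similarity weight; the change of gradient basis and the paper's exact weighting function are not reproduced here, so the function names, the Gaussian form of w(x, y) and the parameter sigma are illustrative assumptions.

```python
import numpy as np

def log_normalized_signal(S_k, S_0, b):
    """Per-direction value log(S_k/S_0)/(-b), which equals g_k^T D g_k under
    the Stejskal-Tanner model (eps guards against vanishing signals)."""
    eps = 1e-8
    return np.log((S_k + eps) / (S_0 + eps)) / (-b)

def approx_tensor_distance(S_x, S0_x, S_y, S0_y, b):
    """Surrogate for ||D(x) - D(y)||_F built from the algebraic distance
    between the log-normalized signals of two neighboring voxels; the change
    of gradient basis is omitted, so this is only a crude approximation."""
    q_x = log_normalized_signal(S_x, S0_x, b)   # shape (K,) over the K directions
    q_y = log_normalized_signal(S_y, S0_y, b)
    return np.sqrt(np.sum((q_x - q_y) ** 2))

def similarity_weight(distance, sigma=1.0):
    """Gaussian similarity weight w(x, y) computed from the precomputed distance."""
    return np.exp(-distance ** 2 / (2.0 * sigma ** 2))
```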
3.2 Semi-Definite Positive Gradient Descent
One can now seek the lowest potential of the cost function towards recovering the optimal solution on the tensor space. The present framework consists of a convex energy with a unique minimum, which can be reached using a projected gradient descent on the space of semi-definite positive matrices. The projection from $S_3$ onto $S_3^+$ is well defined and has an explicit expression. Indeed, projecting a matrix M amounts to replacing the negative eigenvalues in its spectral decomposition by 0 [8, 12]. Note that we minimize over the set of semi-definite positive matrices because it is topologically closed, as opposed to the set of definite positive matrices. In the current setting, the problem is well posed and the projected gradient descent algorithm converges for a suitable choice of the time step dt. Using a weighting factor λ between the data attachment term and the regularization energy, the gradient descent can be expressed as the following equation
where
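A minimal sketch of the projection onto $S_3^+$ used by this scheme, assuming NumPy, is given below; it symmetrizes the input and replaces the negative eigenvalues of its spectral decomposition by 0, which is the explicit expression mentioned above.

```python
import numpy as np

def project_to_psd(M):
    """Project a symmetric 3x3 matrix onto the semi-definite positive cone
    by replacing the negative eigenvalues of its spectral decomposition by 0."""
    M = 0.5 * (M + M.T)                      # enforce symmetry before decomposing
    eigvals, eigvecs = np.linalg.eigh(M)
    eigvals = np.clip(eigvals, 0.0, None)    # negative eigenvalues -> 0
    return (eigvecs * eigvals) @ eigvecs.T   # V diag(w) V^T
```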
Let us define the norm $\|\cdot\|_{TF}$ over the whole tensor field D as $\|D\|_{TF} = \int_\Omega \|D(x)\|_F \, dx$. Considering two tensor fields $D_1$ and $D_2$, we show in the following that the gradient of our energy functional is L-Lipschitz. The constant L allows us to automatically choose a time step that ensures the convergence of the algorithm.
where the constant depends on the number of considered neighbors. The gradient of the objective function is therefore L-Lipschitz, and choosing the time step dt according to L makes the projected gradient descent convergent [3].
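The overall iteration could then be organized as in the following sketch, which reuses project_to_psd from the previous snippet; grad_E, the Lipschitz constant L and the step dt = 1/L (a standard safe choice, not necessarily the paper's exact rule) are all assumptions supplied by the caller.

```python
import numpy as np

def projected_gradient_descent(D0, grad_E, L, n_iter=200):
    """Minimize the energy over the tensor field by gradient steps followed by
    a projection of every 3x3 tensor onto the semi-definite positive cone.
    D0 has shape (..., 3, 3); grad_E(D) returns an array of the same shape."""
    D = D0.copy()
    dt = 1.0 / L                              # safe step for an L-Lipschitz gradient (assumption)
    for _ in range(n_iter):
        D = D - dt * grad_E(D)
        for idx in np.ndindex(D.shape[:-2]):  # project voxel by voxel
            D[idx] = project_to_psd(D[idx])
    return D
```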
We can give an interpretation of our regularization energy in terms of diffusion-weighted image smoothing. It can be easily verified that for each direction k
We can see that minimizing $E_{smooth}$ has a direct implication on the normalized diffusion weighted images $S_k/S_0$. Reconstructing each tensor as a linear combination of the tensors in its neighborhood amounts to reconstructing the normalized signals as a weighted geometric mean of the neighboring signals, where the weights are computed not only from a single volume $S_k$ but also from the volumes obtained with the other magnetic gradient directions.
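This correspondence can be checked numerically with the small sketch below (NumPy assumed): when a tensor is a weighted combination of its neighbors, the Stejskal-Tanner relation $S_k/S_0 = \exp(-b\, g_k^T D g_k)$ turns that combination into a weighted geometric mean of the neighbors' normalized signals; the toy tensors, b-value and gradient direction are arbitrary.

```python
import numpy as np

def weighted_geometric_mean(signals, weights):
    """Weighted geometric mean prod_j s_j^{w_j}, computed in the log domain."""
    signals = np.asarray(signals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.exp(np.sum(weights * np.log(signals)))

# Toy check: if D(x) = sum_j w_j D(y_j), then under S_k/S_0 = exp(-b g^T D g)
# we get S_k(x)/S_0(x) = prod_j (S_k(y_j)/S_0(y_j))^{w_j}.
rng = np.random.default_rng(0)
b, g = 1000.0, np.array([1.0, 0.0, 0.0])
neighbors = [np.diag(rng.uniform(1e-4, 3e-3, size=3)) for _ in range(3)]
w = np.array([0.5, 0.3, 0.2])
normalized_signals = [np.exp(-b * g @ D @ g) for D in neighbors]
D_combined = sum(wi * Di for wi, Di in zip(w, neighbors))
print(np.isclose(weighted_geometric_mean(normalized_signals, w),
                 np.exp(-b * g @ D_combined @ g)))    # True
```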
4 SVM classification and kernels on tensors
4.1 Two-class Support Vector Machines
We briefly review the principles of two-class SVMs [26]. Given N points $x_i$ with known class labels $y_i$ (either +1 or −1), SVM training consists in finding the optimal separating hyperplane, described by the equation $w^T x + b = 0$, that has the maximum distance to the training examples. This amounts to solving a dual convex quadratic optimization problem, and each data point x is classified using the SVM output function $f(x) = \sum_{i=1}^{N} \alpha_i y_i \, x^T x_i + b$. The algorithm is extended to achieve non-linear separation by using a kernel function K(x, y) (symmetric, positive definite) in place of the standard inner product.
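As an illustration, the sketch below (assuming NumPy and scikit-learn, which the paper does not mention) trains a two-class SVM from a precomputed Gram matrix, so that a kernel defined directly on tensors can replace the standard inner product; the Frobenius-based stand-in kernel and the random toy tensors are placeholders, not the kernel or the data used in the paper.

```python
import numpy as np
from sklearn.svm import SVC

def frobenius_rbf_kernel(D1, D2, t=1.0):
    """Stand-in kernel on tensors (Gaussian in the Frobenius distance); the
    information diffusion kernel of Section 4.2 would be plugged in here."""
    return np.exp(-np.linalg.norm(D1 - D2, ord="fro") ** 2 / (4.0 * t))

def gram_matrix(tensors_a, tensors_b, kernel=frobenius_rbf_kernel):
    return np.array([[kernel(Da, Db) for Db in tensors_b] for Da in tensors_a])

def random_spd(rng, scale):
    """Random SPD 3x3 matrix used only as toy data."""
    A = rng.normal(scale=scale, size=(3, 3))
    return A @ A.T + 1e-3 * np.eye(3)

rng = np.random.default_rng(1)
train = [random_spd(rng, 0.5) for _ in range(20)] + [random_spd(rng, 2.0) for _ in range(20)]
labels = np.array([+1] * 20 + [-1] * 20)

clf = SVC(kernel="precomputed").fit(gram_matrix(train, train), labels)
test = [random_spd(rng, 0.5), random_spd(rng, 2.0)]
print(clf.predict(gram_matrix(test, train)))    # likely [ 1 -1 ] on this toy data
```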
4.2 Information diffusion kernel
In order to define a kernel on the set of definite positive matrices, we can propagate class and structure information using its geometry as a Riemannian manifold [22]. Intuitively, the construction of this kernel can be seen as diffusing the labels of the training set over the whole set of definite positive matrices. By analogy with heat diffusion in a Euclidean space, where the solution is given by the convolution of the initial condition with a Gaussian kernel, heat diffusion on a Riemannian manifold is driven by a kernel function $K_t$ given by the following asymptotic series expansion [17]:
where d corresponds to the geodesic distance induced by the Riemannian metric, $a_n$ are the coefficients of the series expansion, and t is the diffusion time, which is a parameter of the kernel. We use a first-order approximation in t of the previous expression, which yields
In our case, d has an explicit expression, $d(D_1, D_2) = \sqrt{\sum_{i} \log^2 \lambda_i}$, where $\lambda_i$ are the generalized eigenvalues of $D_1$ and $D_2$ [22].
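A minimal sketch of this kernel, assuming NumPy and SciPy, is given below; the generalized eigenvalues are obtained with scipy.linalg.eigh, and the leading factor $(4\pi t)^{-\dim/2}$ (with manifold dimension 6 for 3x3 tensors) is borrowed from the Euclidean heat kernel as an assumption, since only the Gaussian term $\exp(-d^2/(4t))$ matters up to scaling for classification.

```python
import numpy as np
from scipy.linalg import eigh

def geodesic_distance(D1, D2):
    """Geodesic distance between two SPD tensors: sqrt(sum_i log^2 lambda_i),
    where lambda_i are the generalized eigenvalues of the pair (D1, D2) [22]."""
    lam = eigh(D1, D2, eigvals_only=True)    # solves D1 v = lambda D2 v
    return np.sqrt(np.sum(np.log(lam) ** 2))

def information_diffusion_kernel(D1, D2, t=1.0, dim=6):
    """First-order (in t) approximation of the heat kernel on the manifold of
    definite positive matrices: a Gaussian in the geodesic distance, with the
    Euclidean leading factor (4*pi*t)^(-dim/2) used as the normalization."""
    d2 = geodesic_distance(D1, D2) ** 2
    return (4.0 * np.pi * t) ** (-dim / 2.0) * np.exp(-d2 / (4.0 * t))
```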