Morphological and Appearance Features for Predicting Physical Disability from MR Images in Multiple Sclerosis Patients

We are given $$n$$ MRI scans $$I = \{I_1,\dots ,I_{n}\},$$ where each 3D MRI scan $$I_i$$ has a corresponding real-valued clinical score $$y_i \in Y$$ and a corresponding spinal cord segmentation $$S_i \in S.$$ The dimensions of $$I_i$$ and $$S_i$$ are the same. Each voxel in $$S_i$$ has a value between 0 and 1, where 0 represents the background and 1 represents the spinal cord. Voxels in $$S_i$$ that lie on the boundary of the spinal cord are assigned a fuzzy value between 0 and 1 that represents the estimated fraction of the voxel that contains spinal cord (i.e., partial volume) [15].


Our objective is to create a model $$M$$, using the images $$I$$ and segmentations $$S,$$ capable of predicting the patients’ clinical scores $$Y$$ from novel MR images. We extract a set of features $$X$$ from $$I$$ and $$S$$ that are transformed by model $$M$$ into values $$\hat{Y},$$ such that these predicted values $$\hat{Y}=M(X)$$ estimate the corresponding clinical scores $$Y.$$

One approach is to set $$M$$ as a simple linear regression model with the spinal cord volume as the single explanatory variable $$X.$$ This is similar to the existing literature where a Pearson’s correlation coefficient is computed to measure the linear dependency between the spinal cord volume and clinical score. However, as mentioned in the introduction, this linear dependency using spinal cord volume does not always reveal a strong clinical relationship. We improve on this by deriving new morphological and MRI-based appearance features $$X$$ and examining ways to combine them in more descriptive models $$M.$$



2.2 Candidate Features


We describe simple candidate morphological and appearance features $$X$$ that are potentially sensitive to spinal cord changes. This is not meant to be a comprehensive set of features, but is sufficient to explore the potential of going beyond measuring cord size to predict disability. We first define the commonly used spinal cord volume, which is computed by summing all voxels, including the partial volumes $$S_i(j)\in [0,1],$$ in the segmentation, $$vol = \sum ^J_{j=1} S_i(j),$$ where $$J$$ is the total number of voxels in $$S_i.$$ While spinal cord volume captures a global measure of spinal cord atrophy, we are also interested in features that vary at least partly independently from area or volume, and that are sensitive to spinal cord changes at a local scale.
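As a minimal sketch of the volume definition above, the partial-volume sum can be written with NumPy. The function name `cord_volume` and the toy array are illustrative assumptions, and the result is in voxel units (multiplying by the physical voxel volume would be needed to obtain a volume in cubic millimetres).

```python
import numpy as np

def cord_volume(seg):
    """Sum the partial-volume values S_i(j) over all J voxels of a fuzzy segmentation."""
    seg = np.asarray(seg, dtype=float)
    return float(seg.sum())

# Toy segmentation (hypothetical data): two full cord voxels, two boundary voxels.
seg = np.zeros((2, 3, 3))
seg[0, 1, 1] = 1.0
seg[1, 1, 1] = 1.0
seg[0, 1, 2] = 0.25   # boundary voxel, 25% cord
seg[1, 1, 0] = 0.5    # boundary voxel, 50% cord
vol = cord_volume(seg)
```

Because boundary voxels contribute only their fractional value, the estimate is less sensitive to the binarization threshold than a simple count of cord voxels would be.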



Fig. 1
Illustrations of the proposed features. a The distances (dashed line) from the center-of-mass (center box) to the boundary voxels (circles) make up $$per_k$$. b The distances to the nearest boundary point from the voxels inside the cord give $$dist_k$$ (brighter implies farther). c An ellipse is fit to the cord. d The normalized intensities of the cord are considered in $$int_k$$

Our first proposed feature is designed to be more sensitive to local changes in the spinal cord’s boundary. On each 2D axial slice of the segmentation $$S_i,$$ we consider voxels with a partial volume greater than 0.5 to be spinal cord, which yields a 2D binary image from which we extract the cord’s boundary voxels. For the $$k$$th axial slice, we compute the Euclidean distance between the center-of-mass $$c_k$$ of the cord’s cross section and each of the boundary (perimeter) voxels, $$per_k = ( d(c_k,b^1_{k}), \dots , d (c_k,b_{k}^{m(k)}) ),$$ where $$b_k^i$$ represents the $$i$$th boundary voxel on the $$k$$th slice, and $$d(c,b)$$ computes the Euclidean distance between the two coordinates (Fig. 1a). The number of boundary voxels $$m(k)$$ can change from slice to slice. We then take the minimum distance from the center-of-mass to the boundary voxels in each 2D slice and average it over the $$K$$ 2D slices,


$$\begin{aligned} per_\mathrm {min} = \frac{1}{K} \sum _{k=1}^K \mathrm {min}(per_k) . \end{aligned}$$

(1)
In a similar way, to compute additional features we replace the “min” function from (1) with the mean ($$per_{\mathrm {mean}}$$), standard deviation ($$per_{\mathrm {std}}$$), and the max ($$per_{\mathrm {max}}$$) functions.
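The four $$per$$ features can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: axis 0 is taken to index axial slices, a boundary voxel is taken to be a cord voxel with at least one 4-connected background neighbour (the text does not specify the boundary extraction), distances are in voxel units, and slices without cord are skipped. The names `boundary_mask` and `per_features` are hypothetical.

```python
import numpy as np

def boundary_mask(binary):
    """Cord pixels that touch background through a 4-connected neighbour (assumed definition)."""
    p = np.pad(binary, 1, constant_values=False)
    # A pixel is interior when its up, down, left and right neighbours are all cord.
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return binary & ~interior

def per_features(seg, threshold=0.5):
    """Per-slice min/mean/std/max of centre-to-boundary distances, averaged over slices."""
    stats = []
    for sl in np.asarray(seg, dtype=float):       # axis 0 indexes axial slices (assumption)
        binary = sl > threshold
        if not binary.any():
            continue                              # slices without cord do not count towards K
        c_k = np.argwhere(binary).mean(axis=0)    # centre of mass of the binary cross section
        b_k = np.argwhere(boundary_mask(binary))  # coordinates of the m(k) boundary pixels
        per_k = np.linalg.norm(b_k - c_k, axis=1)
        stats.append([per_k.min(), per_k.mean(), per_k.std(), per_k.max()])
    names = ["per_min", "per_mean", "per_std", "per_max"]
    return dict(zip(names, np.mean(stats, axis=0)))

# Toy example: a 3x3 square cross section repeated on two slices.
seg = np.zeros((2, 5, 5))
seg[:, 1:4, 1:4] = 1.0
per = per_features(seg)
```

For the square cross section, the eight boundary pixels lie at distance 1 (edge midpoints) or $$\sqrt{2}$$ (corners) from the center, so `per_min` is 1 and `per_max` is $$\sqrt{2}$$.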

We define a related measure that focuses on local changes in 3D by calculating a 3D distance transform from the surface of the segmented spinal cord, restricted to the interior region of the cord. To compute the distance transform, we calculate the Euclidean distance between each voxel inside the spinal cord and the nearest boundary voxel in 3D. To further differentiate this feature from the $$per$$ features, we consider voxels that contain any partial volume to be spinal cord, which changes the boundary voxels. The distance transform for slice $$k$$ with $$q(k)$$ voxels inside the cord is represented as $$dist_k = (t^1_k, \dots , t^{q(k)}_k),$$ where $$t^i_k$$ is the distance from the $$i$$th voxel inside the cord on the $$k$$th slice to the nearest 3D boundary coordinate (Fig. 1b). The number of voxels inside the cord, $$q(k),$$ can change from slice to slice. In a similar fashion to (1), we replace $$per_k$$ with $$dist_k$$ and the “min” function with the mean ($$dist_{\mathrm {mean}}$$), max ($$dist_{\mathrm {max}}$$), standard deviation ($$dist_{\mathrm {std}}$$), and the max divided by the mean distance ($$dist^{\mathrm {max}}_{\mathrm {mean}}$$), each averaged over the $$K$$ 2D slices. For clarity we formally define,


$$\begin{aligned} dist^{\mathrm {max}}_{\mathrm {mean}} = \frac{1}{K}\sum _{k=1}^K \frac{\mathrm {max}(dist_k)}{\mathrm {mean}(dist_k)}, \end{aligned}$$

(2)
which averages, over the $$K$$ slices, the ratio of the maximum distance to the mean distance.
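A brute-force sketch of the $$dist$$ features follows. It assumes that the distance of a cord voxel is measured to the nearest background voxel, as in a standard Euclidean distance transform, that axis 0 indexes slices, and that voxels are isotropic. The pairwise computation is only practical for tiny volumes; in practice a routine such as `scipy.ndimage.distance_transform_edt` would be the usual choice. The function names are hypothetical.

```python
import numpy as np

def distance_transform_3d(cord):
    """Euclidean distance from every cord voxel to the nearest background voxel (brute force)."""
    inside = np.argwhere(cord)
    outside = np.argwhere(~cord)
    # Pairwise distances, shape (n_inside, n_outside); memory-hungry, toy volumes only.
    d = np.linalg.norm(inside[:, None, :] - outside[None, :, :], axis=2)
    dt = np.zeros(cord.shape)
    dt[tuple(inside.T)] = d.min(axis=1)
    return dt

def dist_features(seg):
    """Slice-averaged mean, max, std and max/mean of the 3D distance transform."""
    cord = np.asarray(seg, dtype=float) > 0       # any partial volume counts as cord
    dt = distance_transform_3d(cord)
    stats = []
    for k in range(cord.shape[0]):                # axis 0 indexes axial slices (assumption)
        dist_k = dt[k][cord[k]]
        if dist_k.size == 0:
            continue                              # slices without cord do not count towards K
        stats.append([dist_k.mean(), dist_k.max(), dist_k.std(),
                      dist_k.max() / dist_k.mean()])
    names = ["dist_mean", "dist_max", "dist_std", "dist_max_mean"]
    return dict(zip(names, np.mean(stats, axis=0)))

# Toy example: a 3x3x3 cube of cord surrounded by background.
seg = np.zeros((5, 5, 5))
seg[1:4, 1:4, 1:4] = 1.0
dist = dist_features(seg)
```

In the cube, only the central voxel is two voxels away from background, so the middle slice has a max/mean ratio of 1.8 while the two outer slices have a ratio of 1, giving an average of 3.8/3.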

To compute features that are more robust to local noise, such as small segmentation errors, we fit an ellipse (Fig. 1c) to each 2D cross-sectional slice of the segmented spinal cord and compute the eccentricity ($$ecc$$), minor axis ($$ax_{\mathrm {min}}$$), and major axis ($$ax_{\mathrm {maj}}$$), averaged over the length of the cord.
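The text does not state how the ellipse is fitted. One common option, used here purely as an assumption, is the ellipse with the same second-order moments as the binary cross section: its axis lengths are four times the square roots of the eigenvalues of the coordinate covariance matrix. The 0.5 threshold and the name `ellipse_features` are also assumptions.

```python
import numpy as np

def ellipse_features(seg, threshold=0.5):
    """Slice-averaged eccentricity, minor axis and major axis of a moment-matched ellipse."""
    stats = []
    for sl in np.asarray(seg, dtype=float):       # axis 0 indexes axial slices (assumption)
        coords = np.argwhere(sl > threshold).astype(float)
        if len(coords) < 3:
            continue                              # too few pixels to define an ellipse
        cov = np.cov(coords, rowvar=False, bias=True)
        lam_min, lam_max = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # ascending order
        if lam_max == 0.0:
            continue
        ecc = np.sqrt(1.0 - lam_min / lam_max)
        stats.append([ecc, 4.0 * np.sqrt(lam_min), 4.0 * np.sqrt(lam_max)])
    names = ["ecc", "ax_min", "ax_maj"]
    return dict(zip(names, np.mean(stats, axis=0)))

# Toy example: a 3x5 rectangular cross section on a single slice.
seg = np.zeros((1, 5, 7))
seg[0, 1:4, 1:6] = 1.0
ell = ellipse_features(seg)
```

Because the moments pool all pixels of the cross section, a single mislabelled boundary pixel changes these features far less than it changes, for example, the minimum center-to-boundary distance.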

All the features proposed so far are dependent on the geometrical characteristics of the cord, but we also include features based on the intensities found within the MRI. As the intensity values can vary widely in different MRI scans, we normalize a scan’s intensities by its overall 3D scan intensities to produce z-scores. We extract the z-scores of those voxels that are labelled as spinal cord (partial volume $$> 0.5$$) and take the mean ($$int_{\mathrm {mean}}$$) and standard deviation ($$int_{\mathrm {std}}$$) of the spinal cord intensity values averaged over the $$K$$ 2D slices (Fig. 1d).
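The intensity features can be sketched as below, assuming the z-score uses the mean and standard deviation of the whole 3D scan and that axis 0 indexes slices. The function name `intensity_features` and the random toy data are illustrative only.

```python
import numpy as np

def intensity_features(image, seg, threshold=0.5):
    """Slice-averaged mean and standard deviation of z-scored cord intensities."""
    image = np.asarray(image, dtype=float)
    seg = np.asarray(seg, dtype=float)
    z = (image - image.mean()) / image.std()      # normalise by whole-scan statistics
    means, stds = [], []
    for z_k, s_k in zip(z, seg):                  # axis 0 indexes axial slices (assumption)
        int_k = z_k[s_k > threshold]
        if int_k.size == 0:
            continue                              # slices without cord do not count towards K
        means.append(int_k.mean())
        stds.append(int_k.std())
    return float(np.mean(means)), float(np.mean(stds))

# Toy scan (hypothetical data) and a cube-shaped cord mask.
rng = np.random.default_rng(0)
image = rng.normal(100.0, 20.0, size=(4, 6, 6))
seg = np.zeros((4, 6, 6))
seg[:, 2:4, 2:4] = 1.0
int_mean, int_std = intensity_features(image, seg)
# The same scan with a different gain and offset yields the same features.
int_mean2, int_std2 = intensity_features(3.0 * image + 10.0, seg)
```

The second call illustrates why the normalization is used: a positive linear rescaling of the scan intensities leaves both features unchanged.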


2.3 Regression Models


Linear regression employs a linear function to model the relationship between the explanatory variables (e.g., spinal cord volume) and a response variable (clinical score). The model consists of the coefficients $$\beta $$ of the explanatory variables and an error term $$\varepsilon .$$ The coefficients can be estimated from the data by a least-squares fit, which minimizes the sum of squared differences between the observed response and the fitted values. A model with only a single explanatory variable $$x_1$$ is known as simple linear regression, and is one of the simplest models to analyze. Given a dataset with $$n$$ observations, this produces a straight line, $$y_i = \beta _1 x_{i1} + \varepsilon _i, i=1,\dots ,n.$$ Multiple linear regression builds on this by using $$r$$ explanatory variables in the model, $$y_i = \beta _1 x_{i1} + \dots + \beta _r x_{ir} + \varepsilon _i.$$
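A least-squares fit of the multiple linear regression model can be sketched with `numpy.linalg.lstsq`. The model is fitted exactly as written above, without an intercept; appending a column of ones to the design matrix would add one, which is an optional extension rather than something the text specifies. The synthetic data and the name `fit_linear` are assumptions for illustration.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares estimate of beta in y = X @ beta + eps."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic, noise-free data with r = 3 explanatory variables and n = 20 observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
beta_true = np.array([0.5, -1.0, 2.0])
y = X @ beta_true
beta_hat = fit_linear(X, y)
y_hat = X @ beta_hat          # fitted values
```

With noise-free data the estimated coefficients match the generating ones up to floating-point error; with real clinical scores the residual $$\varepsilon _i$$ would be non-zero.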

While these models assume a linearity of the underlying relations, we also explore a more flexible, non-linear, non-parametric model, known as a regression forest. A regression forest significantly differs from the previously described models as it is completely learned from the data and makes no assumptions about the underlying distributions [2].
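To make the idea concrete, the following is a heavily simplified regression forest: each tree is grown on a bootstrap sample, considers a random subset of features at each split, and stores the mean response in its leaves, and the forest averages the tree predictions. It is a teaching sketch under these assumptions, not the configuration used in this work or the full algorithm of [2]; all names and hyperparameters are hypothetical.

```python
import numpy as np

def build_tree(X, y, rng, depth=0, max_depth=3, min_leaf=2):
    """Recursively grow a regression tree; a leaf is the mean response of its samples."""
    n, r = X.shape
    if depth >= max_depth or n < 2 * min_leaf or np.all(y == y[0]):
        return float(y.mean())
    best = None
    for j in rng.choice(r, size=max(1, r // 3), replace=False):  # random feature subset
        for t in np.unique(X[:, j])[:-1]:                        # candidate thresholds
            left = X[:, j] <= t
            n_left = left.sum()
            if n_left < min_leaf or n - n_left < min_leaf:
                continue
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, t, left)
    if best is None:
        return float(y.mean())
    _, j, t, left = best
    return (j, t,
            build_tree(X[left], y[left], rng, depth + 1, max_depth, min_leaf),
            build_tree(X[~left], y[~left], rng, depth + 1, max_depth, min_leaf))

def predict_tree(tree, x):
    """Follow the splits until a leaf value is reached."""
    while isinstance(tree, tuple):
        j, t, lower, upper = tree
        tree = lower if x[j] <= t else upper
    return tree

def fit_forest(X, y, n_trees=25, seed=0, **tree_args):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))   # bootstrap sample
        forest.append(build_tree(X[idx], y[idx], rng, **tree_args))
    return forest

def predict_forest(forest, X):
    return np.array([np.mean([predict_tree(t, x) for t in forest]) for x in X])

# A step function: a relationship that a straight line cannot represent.
X = np.linspace(0.0, 1.0, 40).reshape(-1, 1)
y = (X[:, 0] > 0.5).astype(float)
forest = fit_forest(X, y)
pred = predict_forest(forest, np.array([[0.1], [0.9]]))
```

The step-function example shows the added flexibility: the forest recovers the two plateaus, whereas a linear model would be restricted to a single slope.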


2.4 Training and Testing the Models


The models in Sect. 2.3 are described in order of increasing complexity. With this added complexity, we increase the potential to accurately model the underlying function, but also increase the difficulty in intuitively understanding the model and increase the likelihood of over-fitting the model to the training data. To reduce the possibility of over-fitting, we divide our data into training and testing sets. Given the relatively small size of our dataset, we use leave-one-out cross-validation, in which a single sample is held out for testing while the model is trained on all remaining samples. This is repeated for all samples to give us an indication of the robustness and generalizability of our regression model and chosen features.
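Leave-one-out cross-validation can be sketched as follows for a least-squares linear model. The design matrix here includes a column of ones for an intercept, and the mean absolute error is used as the summary; both are assumptions made for the example, as are the synthetic data and the name `loo_predictions`.

```python
import numpy as np

def loo_predictions(X, y):
    """Predict each sample from a linear model trained on all other samples."""
    n = len(y)
    y_hat = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i                 # hold out sample i
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        y_hat[i] = X[i] @ beta                    # test on the held-out sample
    return y_hat

# Synthetic, noise-free data: one feature plus an intercept column (assumption).
rng = np.random.default_rng(1)
x = rng.uniform(size=12)
X = np.column_stack([x, np.ones_like(x)])
y = 2.0 * x - 0.5
y_hat = loo_predictions(X, y)
mae = float(np.mean(np.abs(y_hat - y)))
```

Every prediction comes from a model that never saw the corresponding sample, so the resulting error reflects performance on unseen data rather than the quality of the fit to the training set.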


2.5 Clinical Scores


As discussed in the introduction, the EDSS and the MSFC scores, which we aim to predict from $$X,$$ are commonly used to quantify clinical disability. We choose to focus on the MSFC score rather than the EDSS score because the MSFC captures disability to which the EDSS score is relatively insensitive, such as arm/hand function. In addition, the EDSS scores tend to exhibit a poor distribution due to the non-linearity of the scale, with many patients clustered between 4.5 and 6.5 (Fig. 2a).

The MSFC score tests for: upper extremity function, determined by a 9-hole peg test (9-HPT); walking speed, measured by a timed 25-foot walk (T25W); and cognitive function, evaluated by a paced auditory serial addition test (PASAT). These three tests are shown to vary relatively independently, be sensitive to changes over time, and capture aspects of MS that are not captured in the EDSS score [3


Mar 17, 2016 | Posted by in COMPUTERIZED TOMOGRAPHY | Comments Off on Morphological and Appearance Features for Predicting Physical Disability from MR Images in Multiple Sclerosis Patients
