IVIM in the Body: A General Overview

Matthew R. Orton,^a Neil P. Jerome,^b Mihaela Rata,^a and Dow-Mu Koh^c

^aCancer Research UK Cancer Imaging Centre, Division of Radiotherapy & Imaging, The Institute of Cancer Research, London, SM2 5NG, UK

^bDepartment of Circulation and Medical Imaging, Norwegian University of Science and Technology (NTNU), Trondheim, Norway; Clinic of Radiology and Nuclear Medicine, St. Olavs Hospital, Trondheim, Norway

^cDepartment of Radiology, Royal Marsden Hospital, Sutton, Surrey, UK

dowmukoh@icr.ac.uk

6.1 Introduction

Diffusion-weighted imaging (DWI) performed using multiple b values is attractive as a technique to derive functional information from tissues on the basis of the principles of intravoxel incoherent motion (IVIM). Although the technique has been applied for some time in the brain, it is only in the past decade or so that it has seen wider investigation and application in the body. In this chapter, we review the technical considerations for performing IVIM DWI in the body, discuss measurement repeatability for IVIM-derived parameters, and survey the potential clinical applications of the technique.

6.2 Technical Considerations

6.2.1 Selection of b Values

Modern scanners have had the capability of delivering practical IVIM acquisitions for around 20 years [1], but despite this, the consensus on the choice of b values for different organs, pathologies, and applications is still being refined. The persistent difficulty of reliably estimating the pseudodiffusion coefficient seems to underlie this lack of consensus and is due to an incomplete understanding of the IVIM process in biological tissues.

Some researchers argue that since estimates of the pseudodiffusion coefficient are not sufficiently accurate for routine use, the acquisition should not even attempt to measure it, which greatly simplifies the selection of b values [2–5]. To obtain estimates of the pseudodiffusion volume fraction and tissue diffusion coefficient (and the amplitude scaling), a minimum of just three b values is required, and these would normally include (i) b = 0 s/mm², (ii) an intermediate b value that is just large enough to ensure the signal attenuation of the pseudodiffusion compartment is complete (e.g., b ≈ 200–400 s/mm² [6]), and (iii) a b value large enough to capture the tissue attenuation while being above the rectified noise floor (e.g., b > 500 s/mm²). When estimates of the pseudodiffusion coefficient are also required, at least one more b value is required, and this will typically be less than 100 s/mm², depending on the tissue of interest. Having selected three or four b values, the number of averages can then be chosen to make a direct trade-off between acquisition time and parameter accuracy, although it may be advantageous to use a larger number of averages at higher b values where the signal-to-noise ratio (SNR) is lower. This minimal approach is of utility where the range of IVIM parameters for the tissues or pathologies of interest has already been established, for example, to differentiate between two (or more) diagnoses using IVIM, which may be further confirmed by validation studies using other techniques.
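The three-b-value approach described above can be sketched as follows. This is a minimal illustration rather than a recommended implementation: it assumes the usual biexponential form of the IVIM signal, S(b) = S0[f exp(−bD*) + (1 − f) exp(−bD)], and assumes the pseudodiffusion compartment has fully decayed at the two nonzero b values. The function names, the b values (0, 200, and 800 s/mm²), and the tissue parameters are hypothetical choices made only for this example.

```python
import numpy as np

def ivim_signal(b, s0, f, d_star, d):
    """Biexponential IVIM signal: S0 * (f*exp(-b*D*) + (1-f)*exp(-b*D))."""
    b = np.asarray(b, dtype=float)
    return s0 * (f * np.exp(-b * d_star) + (1.0 - f) * np.exp(-b * d))

def three_point_estimate(b, s):
    """Estimate S0, f and D from signals at b = (0, intermediate, high).

    Assumes the pseudodiffusion signal is negligible at the two nonzero
    b values, so those two points follow a single exponential in D.
    """
    b_zero, b_mid, b_high = b
    s_zero, s_mid, s_high = s
    # Tissue diffusion coefficient from the log-signal slope between the two nonzero b values.
    d = np.log(s_mid / s_high) / (b_high - b_mid)
    # Tissue-only signal extrapolated back to the lowest b value.
    s_tissue_zero = s_mid * np.exp((b_mid - b_zero) * d)
    # The shortfall relative to the measured b = 0 signal is attributed to pseudodiffusion.
    f = 1.0 - s_tissue_zero / s_zero
    return s_zero, f, d

# Hypothetical noise-free example (b in s/mm^2, D and D* in mm^2/s).
b_values = np.array([0.0, 200.0, 800.0])
signals = ivim_signal(b_values, s0=1000.0, f=0.2, d_star=50e-3, d=1.0e-3)
s0_est, f_est, d_est = three_point_estimate(b_values, signals)
print(f"S0 = {s0_est:.1f}, f = {f_est:.4f}, D = {d_est:.3e} mm^2/s")
```

With noisy data the same calculation would be applied to averaged signals, which is where the choice of the number of averages at each b value enters.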

Using a minimal number of b values is less applicable in exploratory studies where the IVIM parameter ranges are not known a priori and also where the presence or magnitude of any confounding factors is unknown. With a minimal number of b values, the estimation accuracy will deteriorate rapidly as the IVIM parameters move away from the region for which the b values are optimal. As the number of b values increases, the range of IVIM parameter values over which acceptable estimation accuracy will be obtained is increased, which is clearly desirable in exploratory studies. Six or more b values that depend on the organ of interest are typically appropriate [7–10], with some studies employing up to 32 b values [11]. The work of Lemke et al. [12] used Monte Carlo simulations of three prototype tissues to propose an ordered list of 16 b values that can be truncated to give close-to-optimal b-value protocols. The truncation point determines the trade-off between acquisition time and parameter estimation accuracy, and the authors recommend that 10 or more b values be used in practice. In another study looking at the choice of low b values for measuring IVIM in liver, it was concluded that at least two b values below 50 s/mm² are required for a reliable estimation of the pseudodiffusion coefficient [13].
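The sensitivity of a minimal protocol to the underlying parameter values can be illustrated with a noise-free sketch. It assumes a biexponential IVIM signal and a hypothetical three-b-value protocol (0, 200, and 800 s/mm²) analyzed by extrapolating the two nonzero b values back to b = 0. The true values of f and D and the range of D* are illustrative assumptions, not values taken from the cited studies.

```python
import numpy as np

B_VALUES = np.array([0.0, 200.0, 800.0])  # s/mm^2, hypothetical minimal protocol
F_TRUE, D_TRUE = 0.2, 1.0e-3               # illustrative tissue values (D in mm^2/s)

def three_point_f(d_star):
    """Pseudodiffusion fraction recovered by the three-point analysis for a given D*."""
    s = F_TRUE * np.exp(-B_VALUES * d_star) + (1.0 - F_TRUE) * np.exp(-B_VALUES * D_TRUE)
    # D from the two nonzero b values, assuming pseudodiffusion has fully decayed there.
    d_est = np.log(s[1] / s[2]) / (B_VALUES[2] - B_VALUES[1])
    # Extrapolate the tissue signal to b = 0 and attribute the shortfall to pseudodiffusion.
    return 1.0 - s[1] * np.exp(B_VALUES[1] * d_est) / s[0]

for d_star in (100e-3, 50e-3, 20e-3, 10e-3, 5e-3):
    print(f"D* = {d_star:.3f} mm^2/s -> estimated f = {three_point_f(d_star):.3f} (true {F_TRUE})")
```

In this sketch the estimate of f is close to the true value when D* is large, because the pseudodiffusion signal has vanished by the intermediate b value. It is increasingly underestimated as D* falls and that assumption breaks down. Additional b values allow such departures to be detected and modeled.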

Another advantage of using more than the minimum number of b values is that voxel-wise information about noise levels is preserved in the image data, and this can be further improved by retaining any repeat measurements as separate images rather than returning averaged images. This is particularly important for techniques that require or benefit from the parameter uncertainty in each voxel to be measurable, such as hierarchical Bayesian modeling [14] and Markov random field modeling [15]. In both these cases the overall parameter inference is governed by a loss function that is the weighted sum of a data consistency term and a regularization term, where the two weights relate to the measurement noise standard deviation and some statistical property of the regularization (determined by the choice of regularization). Obtaining a sufficiently large number of b values and/or retaining repeat measurements as separate images implies that the measurement noise (and therefore the data consistency weight) can be inferred from the data, which is required for the hierarchical Bayesian modeling approach to work. In this case, the regularization weight will be the covariance of the IVIM parameters across the ROI, and this can also be inferred from the data. Inferring the data consistency weight from the data is not essential for the Markov random field modeling, but if it is possible, the regularization weight is then the expected difference in the IVIM parameter values between neighboring voxels, and this property is useful in selecting an appropriate regularization weight.
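The structure of such a loss function can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the formulation used in [14] or [15]: it assumes Gaussian measurement noise with standard deviation `sigma_noise`, a quadratic penalty on parameter differences between neighboring voxels scaled by an expected difference `tau_neighbor`, and a one-dimensional chain of voxels. All names are hypothetical.

```python
import numpy as np

def ivim_model(b, s0, f, d_star, d):
    """Biexponential IVIM signal for one voxel."""
    return s0 * (f * np.exp(-b * d_star) + (1.0 - f) * np.exp(-b * d))

def regularized_loss(params, signals, b, sigma_noise, tau_neighbor):
    """Weighted sum of a data consistency term and a regularization term.

    params:       (n_voxels, 4) array of [S0, f, D*, D] along a 1D chain of voxels
    signals:      (n_voxels, n_b) array of measured signals
    sigma_noise:  measurement noise standard deviation (sets the data weight)
    tau_neighbor: length-4 array of expected neighbor differences per parameter
    """
    predicted = np.stack([ivim_model(b, *p) for p in params])
    # Data consistency: squared residuals scaled by the noise variance.
    data_term = np.sum((signals - predicted) ** 2) / (2.0 * sigma_noise ** 2)
    # Regularization: squared differences between adjacent voxels, scaled per parameter.
    scaled_diff = np.diff(params, axis=0) / tau_neighbor
    reg_term = 0.5 * np.sum(scaled_diff ** 2)
    return data_term + reg_term

# Hypothetical three-voxel example with noise-free signals.
b = np.array([0.0, 50.0, 200.0, 800.0])
tau = np.array([100.0, 0.05, 0.01, 1e-4])
uniform = np.tile([1000.0, 0.2, 0.05, 0.001], (3, 1))
varied = uniform.copy()
varied[1, 1] = 0.3  # middle voxel has a different pseudodiffusion fraction
signals_uniform = np.stack([ivim_model(b, *p) for p in uniform])
signals_varied = np.stack([ivim_model(b, *p) for p in varied])
loss_uniform = regularized_loss(uniform, signals_uniform, b, sigma_noise=10.0, tau_neighbor=tau)
loss_varied = regularized_loss(varied, signals_varied, b, sigma_noise=10.0, tau_neighbor=tau)
```

When the noise standard deviation can be estimated from the data, the relative weight of the two terms is determined by `tau_neighbor` alone, which has a direct interpretation in the units of each parameter.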

The ability to flexibly specify b values on most magnetic resonance imaging (MRI) platforms has improved over the years, which is an important step toward wider clinical use of such techniques. Specifying low b values was historically a significant limitation on many platforms, where the lowest b value available was often insufficiently low to probe the IVIM effect and the b value selection was quantized to a very limited number of b values below 100 s/mm². It should be borne in mind that the presence of imaging gradients implies that the effective b value is never actually zero and for the very low b values that are used in more recent studies it may be necessary to correct for this effect in the future [16].

Most diffusion-weighted MR sequences employ trapezoidal diffusion-sensitizing gradient lobes placed on either side of a 180° refocusing pulse. Although their net effect is summarized via the Stejskal–Tanner equation as the b value, there is increasing evidence that the duration and spacing of the gradient lobes have a separate effect on the attenuation observed in IVIM systems [17]. While many studies do not aim to examine the effect of the diffusion timing, in order to meaningfully interpret differences between published studies it is nevertheless crucial that the gradient timing parameters be also disclosed in published works [18].
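As a worked illustration of why the timing parameters matter, the sketch below evaluates the Stejskal–Tanner expression b = γ²G²δ²(Δ − δ/3) for two different timing combinations that yield the same b value. It assumes ideal rectangular gradient lobes, so the ramp times of trapezoidal lobes are neglected. The function names, timings, and target b value are hypothetical.

```python
import math

GAMMA = 2.6752218744e8  # proton gyromagnetic ratio in rad s^-1 T^-1

def b_value_rectangular(g, delta, big_delta):
    """b value in s/mm^2 for ideal rectangular lobes.

    g: gradient amplitude (T/m); delta: lobe duration (s);
    big_delta: separation of the lobe onsets (s).
    """
    b_si = (GAMMA * g * delta) ** 2 * (big_delta - delta / 3.0)  # s/m^2
    return b_si * 1e-6  # convert s/m^2 to s/mm^2

def amplitude_for_b(b_target, delta, big_delta):
    """Gradient amplitude (T/m) needed to reach b_target (s/mm^2) with the given timing."""
    b_si = b_target * 1e6
    return math.sqrt(b_si / (big_delta - delta / 3.0)) / (GAMMA * delta)

# Two hypothetical timing schemes that both deliver b = 400 s/mm^2.
for delta, big_delta in ((0.010, 0.030), (0.025, 0.060)):
    g = amplitude_for_b(400.0, delta, big_delta)
    b = b_value_rectangular(g, delta, big_delta)
    print(f"delta = {delta * 1e3:.0f} ms, Delta = {big_delta * 1e3:.0f} ms, "
          f"G = {g * 1e3:.1f} mT/m, b = {b:.1f} s/mm^2")
```

Both schemes report the same b value, yet they probe different diffusion times, which is why the b value alone does not fully describe the measurement.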

More advanced diffusion gradient profiles are also feasible, and their application to IVIM measurement is dealt with in more detail in other chapters in this book. Bipolar gradients can be used instead of the more standard monopolar gradients, and these have reduced eddy current artifacts, but at the expense of SNR due to the need for longer echo times. In organs such as the liver that have short T₂ values, the benefits have been shown to outweigh the SNR loss for measuring diffuse liver disease with IVIM [19]. Further studies on liver and pancreas comparing IVIM parameters obtained with monopolar, bipolar, and flow-compensated acquisitions have shed light on the nature of the IVIM phenomenon in these organs, including the velocity and length scales that can be probed with these measurements [17].

6.2.2 Motion Control

MRI outside the brain is challenging due to motion arising from breathing, cardiac activity, and peristalsis, and these confounders have a range of effects that are particular to DWI. In the context of IVIM imaging, these effects can be especially deleterious because the perfusion-related signal can be hard to detect—it may be small in amplitude or be present for a small range of low b values or both.

In the context of IVIM (but also for the apparent diffusion coefficient [ADC] measurement), images with multiple contrasts must be acquired, which leads to the possibility of image misregistration. This is particularly problematic for IVIM measurements where a large number of b values are required and acquisition duration extends to several minutes. Another important issue is that bulk tissue motion during the application of paired diffusion gradient lobes causes unrecoverable dephasing of the signal. This leads to a signal loss that is difficult to distinguish from attenuation arising from the tissue microstructure and is more pronounced with larger diffusion probing gradients. Although the IVIM effect is inferred from low-b-value images that are less sensitive to this effect, the necessity of also acquiring images with high b values implies any bias will be propagated into the IVIM parameter estimates when signal dropouts occur at high b values. Since it is not feasible to correct images affected in this way, one must either detect and remove such images prior to IVIM model fitting, as is done with diffusion tensor imaging (DTI) [20, 21], or incorporate the signal dropout process into the noise model, as was achieved by Murphy et al. [22] using a beta log-normal error model. Most MR platforms are now available with DWI sequences that use triggered acquisition techniques to initiate repetitive image capture segments, with the aim of acquiring data when the relevant anatomy of the subject is static and in the same position. In principle such techniques will minimize not only image misregistration but also signal dropouts due to bulk motion. However, the use of triggering means that not all of the scanning time is used in collecting data and, in the context of IVIM, the proportionate increase in scan times can be prohibitive. In addition, the scan time is not known a priori and can vary widely between subjects, which would be particularly problematic in a clinical context. Free-breathing acquisitions have the advantage that the scan time is fully utilized, but further postprocessing may be necessary to handle motion and signal dropouts. These competing issues mean that the choice between free-breathing and respiratory-triggered techniques is not straightforward and the context of any proposed study (exploratory/clinical, volunteers/patients, etc.) introduces extra factors that must be considered when making such a choice.

There is a useful body of recent literature comparing the performance of free-breathing and respiratory-triggered IVIM acquisitions [19,23,24] and also exploring the use of cardiac triggering to limit the localized effect of cardiac motion on the left lobe of the liver [25]. While these publications do not collectively reach a clear consensus, they are a useful resource for designing new studies. Developments in simultaneous slice (SS) acquisition techniques that give speedup factors of around 3 show great promise for IVIM measurement [26]. A recent comparison of free-breathing and respiratory triggered methods with SS for ADC measurement found no advantage with the use of triggering [26, 27], although a similar comparison has yet to be performed for IVIM measurements. It is also worth noting that existing studies comparing motion control techniques have used either subjective image quality assessments or quantitative metrics (including reproducibility) based on summary statistics such as the mean or median on which to base their comparisons. However, there is increasing interest in the potential of higher-order univariate statistics (standard deviation, skewness, kurtosis, etc.) and spatial statistics (measures of image texture) for radiomic analysis of images [28–30]. Such analyses will have more stringent requirements for IVIM measurement techniques in relation to motion control, compared to analysis using the mean or median, and so evaluations of motion control techniques may need to be repeated where radiomic analyses are proposed.

Whether a free-breathing or triggered acquisition is used, in practice there will always be some degree of motion between individual image frames and this can only be properly accounted for with image registration techniques. Having said that, when multiple averages are acquired a much simpler technique that nevertheless has good performance is to combine the images at a given b value using the median (or trimmed mean) over the repeated measurements instead of the mean [31]. While this method does not explicitly account for motion or signal dropouts, it is nevertheless robust to both effects and can be used with free-breathing or respiratory triggered acquisitions.
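The robustness of the median and trimmed mean to an occasional signal dropout can be seen in a small sketch. The function name and the single-voxel values are hypothetical, and the trimmed mean shown here simply discards a fixed number of the lowest and highest repeats at each voxel, which is one of several possible definitions.

```python
import numpy as np

def combine_repeats(repeats, method="median", trim=1):
    """Combine repeated images at one b value; repeats has shape (n_repeats, ...)."""
    repeats = np.asarray(repeats, dtype=float)
    if method == "mean":
        return repeats.mean(axis=0)
    if method == "median":
        return np.median(repeats, axis=0)
    if method == "trimmed":
        # Drop the `trim` lowest and highest values per voxel, then average the rest.
        ordered = np.sort(repeats, axis=0)
        return ordered[trim:repeats.shape[0] - trim].mean(axis=0)
    raise ValueError(f"unknown method: {method}")

# Hypothetical single-voxel example: five repeats, the last affected by signal dropout.
voxel = np.array([100.0, 102.0, 98.0, 101.0, 40.0])
mean_value = combine_repeats(voxel, "mean")
median_value = combine_repeats(voxel, "median")
trimmed_value = combine_repeats(voxel, "trimmed")
print(mean_value, median_value, trimmed_value)
```

The mean is pulled toward the corrupted repeat, whereas the median and trimmed mean remain close to the uncorrupted signal level. The same function applies unchanged to full image stacks because the combination is performed along the first axis.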

Directly accounting for subject motion with image registration is challenging for IVIM measurements for a number of reasons. DWI uses 2D acquisitions, where the in-plane voxel dimensions are typically smaller than the slice thickness. Handling through-plane motion is therefore problematic; for this reason many studies in the abdomen have used coronal imaging so that the dominant motion direction due to breathing (superior-inferior) is in the imaging plane. This is particularly beneficial with free-breathing techniques because the amount of motion is sufficiently large for substantial improvements to be obtained with simple 2D image registration. When regional analysis is to be used it may be sufficient to apply local affine transformations, but nonrigid transformations may be necessary where whole organ or global analysis is used. Since real subject motion is a 3D process one may expect that nonrigid 3D registration techniques would show the most promise, but this makes the assumption that the IVIM measurement yields coherent 3D data volumes. Figure 6.1 shows an example of a number of image volumes acquired coronally under free breathing. Figure 6.1a is a case where the subject motion happened to be very small while all 16 slices were acquired, whereas Figs. 6.1b and 6.1c (at two different b values) show examples where there is significant motion during the acquisition. These images reveal the interleaved scheme used by many DWI sequences, where odd-indexed slices followed by even-indexed slices are measured sequentially in order to minimize crosstalk between adjacent slices due to imperfect slice profiles. These figures clearly demonstrate why simple 3D registration of DW images is highly challenging under free breathing, and even when noninterleaved slice acquisition schemes are used, the same effect is still present but may not be as apparent as in the figure. Registration techniques for interleaved acquisition schemes have been developed [32] but have yet to be applied to IVIM measurements.
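The consequence of interleaving for a 16-slice volume can be made concrete with a short sketch. It assumes each slice is excited once per volume in the order described above (odd-indexed slices first, then even-indexed slices), with equal time slots per slice; the function name is hypothetical.

```python
def interleaved_order(n_slices):
    """Acquisition order with odd-indexed slices first, then even-indexed (1-based)."""
    return list(range(1, n_slices + 1, 2)) + list(range(2, n_slices + 1, 2))

order = interleaved_order(16)
# Time slot at which each slice is acquired within the volume.
slot = {slice_index: position for position, slice_index in enumerate(order)}
# Separation, in time slots, between each pair of spatially adjacent slices.
gaps = [abs(slot[s + 1] - slot[s]) for s in range(1, 16)]
print(order)
print(gaps)
```

Under these assumptions, spatially adjacent slices are acquired roughly half a volume acquisition apart, so any respiratory displacement during that interval appears as a discontinuity between neighboring slices rather than as a smooth deformation of the volume.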

Figure 6.1 An example showing a number of image volumes acquired coronally under free breathing. Panel (a) is a case where the subject motion happened to be very small while all 16 slices were acquired, whereas panels (b) and (c) (at two different b values) show examples where there is significant motion during the acquisition.

The difference in image contrast between the lowest and highest b values, and in addition the low SNR for high b values, is another challenge when registering IVIM data. Techniques involving the sequential acquisition of high- and low-b-value images have been proposed [33], where deformations are determined from the low-b-value (high SNR) images, which are then applied to the adjacent high-b-value images on the assumption that there is negligible motion between the adjacent frames. Combining IVIM signal modeling with image registration is also feasible [34], and this relies on using the IVIM model itself to generate target images for the registration so that they should automatically have the appropriate contrast for the given b value.

6.2.3 Measurement Repeatability

The utility of quantitative biomarkers in clinical decision making, such as those derived from DWI, is tied not only to their sensitivity and specificity to underlying physiology and microstructure but also to the stability of the measurement itself. The ability to meaningfully measure or detect changes in the pseudodiffusion and tissue diffusion components of the IVIM signal therefore relies not only on the properties of the system being observed but also on the performance of the MR scanner hardware and acquisition strategies being used. Significant advancements in technology since the first description of the IVIM experiment have facilitated the clinical use of the IVIM acquisition extracranially, demonstrating the phenomenon clearly for healthy and diseased abdominal organs [35, 36] as well as neoplasms [37, 38].

DWI is a challenging MR modality; it is notable, though not unique, among MR techniques in that contrast is introduced by actively destroying signal, specifically through diffusion-sensitizing gradients. When quantitative diffusion measurements are required, one must therefore be mindful of the available SNR and how it may influence signal modeling. Well-documented challenges of DWI are the dependence of minimum echo times on the maximum b value, distortion in echo-planar imaging (EPI) readouts from susceptibility boundaries, and artifacts arising from both cardiac and respiratory motion for extracranial targets.

These challenges notwithstanding, the IVIM model (and corresponding acquisition strategy) has received wide attention and application in the literature. Articles that employ a repeatability assessment for IVIM are scarcer, however, and it is necessary for the radiologist to balance the promise of the IVIM acquisition with the additional confounding factors that accompany it. In particular, the mathematical fitting of a biexponential model is not trivial, and the specific choices of b values and fitting algorithm that are fertile areas for discussion and disagreement [8,13,39–43] are less critical for the single-exponential ADC model. Such technical considerations and their implications have been discussed in the previous section. Here we consider experimental methods for estimating the errors that affect real-world IVIM measurements, in particular, the use of repeatability studies, as recommended by the Quantitative Imaging Biomarkers Alliance [44].

There are alternative ways of quantifying repeatability, and without a standardized approach it is not always possible to directly compare across studies. Many studies report the repeat-measures coefficient of variation (CoV, expressed as a percentage), which quantifies the error expected in the parameter value estimate [45]. Another commonly reported measure is the limit of agreement (LoA), described by Bland and Altman, derived from the variance of the difference of repeated measures, such that 95% of observed discrepancies between measures are expected to be within those limits from the average value [46]. In both cases, smaller numbers represent improved repeatability, which implies greater stability and thus usability.
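Both measures can be computed from paired test-retest data, as in the sketch below. Because definitions vary between studies, the conventions used here are assumptions made for illustration: the CoV is taken as the within-subject standard deviation (from paired differences) divided by the overall mean, and the LoA are taken as the mean difference ± 1.96 standard deviations of the differences. The function name and data values are hypothetical.

```python
import numpy as np

def repeatability_summary(first, second):
    """Return (CoV in percent, (lower LoA, upper LoA)) for paired repeat measurements."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    diff = first - second
    # Within-subject SD for two repeats: sqrt(sum(d^2) / (2n)).
    within_sd = np.sqrt(np.sum(diff ** 2) / (2.0 * diff.size))
    cov_percent = 100.0 * within_sd / np.mean((first + second) / 2.0)
    # Bland-Altman 95% limits of agreement around the mean difference.
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return cov_percent, (bias - half_width, bias + half_width)

# Hypothetical test-retest values of one parameter in four subjects (arbitrary units).
scan_1 = [10.0, 12.0, 14.0, 16.0]
scan_2 = [11.0, 11.0, 15.0, 15.0]
cov, (loa_low, loa_high) = repeatability_summary(scan_1, scan_2)
print(f"CoV = {cov:.2f}%, LoA = [{loa_low:.2f}, {loa_high:.2f}]")
```

When comparing published values, it is worth checking which definition each study used, since alternative conventions for the CoV in particular can give noticeably different numbers from the same data.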

Table 6.1 summarizes some recent studies (2008–2018) that have assessed repeatability of IVIM measurements by performing the same DWI measurement twice on the same object, with a delay between 1 h and 7 days [8,25,37,38,47–56]. While it is more common to perform repeatability studies on volunteers, a number of articles report repeatability for pathological as well as normal tissue from patients, which provides a more clinically relevant assessment. Many of these studies also report the repeatability of the ADC derived from the same data, giving a comparison to a simpler, well-established biomarker that is known to be robust [57].

Table 6.1 Results from literature discussing extracranial IVIM repeatability, including studies reporting coefficients of variation (CoVs) and Bland–Altman 95% limits of agreement (LoA)

Across the studies analyzed, comparison of the within-study repeatability measures shows that the IVIM tissue diffusion coefficient D commonly has a higher CoV than the corresponding ADC, although it is likely that this arises from the reduced data used in the calculation. In general D shows good repeatability, with the CoV of D above 20% in only a few studies (Fig. 6.2a). In contrast, the CoVs observed for the pseudodiffusion parameters f and D* are consistently higher, with the former commonly exceeding 20% and the latter having a substantially larger range (Fig. 6.2b). The same observation is reflected in the LoA data. While it is conceivable that these measures could differ between pathologic conditions, or change under intervention, to this degree, such a degree of variation intrinsic to a measurement is at least an indication of the need for cautious interpretation. It is notable that the studies reporting the lowest variation in repeated measurement are intracranial, where the pseudodiffusion component contributes less to the overall signal than in the body (Fig. 6.3).

Figure 6.2 Comparisons from studies reporting coefficients of variation (CoVs) for ADC and IVIM parameters showing that (a) ADC and IVIM D are similar in showing acceptable CoV whereas (b) the pseudodiffusion parameters f and D* show a substantially higher CoV.

Figure 6.3 Results from studies that report Bland–Altman limits of agreement (LoA, 95%) for (a) diffusion parameters ADC and IVIM D and (b) pseudodiffusion parameters f and D* provided by IVIM repeatability studies. Note the extended axes in plot (b). Studies reporting only the CoV are shown as blank entries.

Sources of variation in derived IVIM parameters, aside from noise associated with the scanner hardware (e.g., electrical noise and field inhomogeneity) and the scan procedure (e.g., coil and patient positioning), arise from the diffusion sequence itself. It should be remembered that, exactly like the ADC, IVIM parameters are empirical and that the assumptions of the model are not necessarily or consistently valid, for example, the incoherence of capillary direction, the lack of influence from larger vessels that may contribute a pulsatile or coherent flow component, and a lack of spin exchange between the two compartments. Lemke et al. demonstrated that a difference in the T₂ between the two compartments considered in the IVIM model leads to a bias in the estimation of f related to the choice of echo time (TE) [58]. A useful extension of the model to include the estimation of T₂ values by Jerome et al. mitigates this effect at the expense of some additional acquisitions [59]. This work also demonstrates that the bias in f in healthy liver can be over 100%. Further, it has been shown that the diffusion time (i.e., the spacing of the diffusion-weighting gradients) also influences estimates of the pseudodiffusion fraction f [60].
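The origin of the TE-dependent bias in f can be illustrated with a simple two-compartment sketch. It assumes each compartment's signal is weighted by exp(−TE/T₂) and that a standard IVIM fit returns the resulting signal fraction rather than the volume fraction. The T₂ values, TE values, and true f used here are hypothetical numbers chosen only to show the trend, and the function name is likewise an assumption.

```python
import math

def apparent_f(f_true, te, t2_blood, t2_tissue):
    """Signal fraction of the pseudodiffusion compartment at echo time te (same units as T2)."""
    blood = f_true * math.exp(-te / t2_blood)
    tissue = (1.0 - f_true) * math.exp(-te / t2_tissue)
    return blood / (blood + tissue)

# Hypothetical values: true f = 0.1, T2 of blood 290 ms, T2 of tissue 40 ms.
F_TRUE, T2_BLOOD, T2_TISSUE = 0.1, 290.0, 40.0
for te in (0.0, 40.0, 70.0, 100.0):
    f_app = apparent_f(F_TRUE, te, T2_BLOOD, T2_TISSUE)
    bias_percent = 100.0 * (f_app - F_TRUE) / F_TRUE
    print(f"TE = {te:5.1f} ms -> apparent f = {f_app:.3f} (bias {bias_percent:+.0f}%)")
```

Under these assumptions the apparent f grows with TE whenever the pseudodiffusion compartment has the longer T₂, because the tissue signal decays faster. This gives one reason why values of f obtained at different echo times are not directly comparable.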

Reproducibility, by contrast, is more difficult to infer from these studies, since using different MR hardware may necessarily incorporate differences in protocol. In this sense, the simplicity of the ADC model, both in practical application as well as sensitivity, will remain a challenge for the widespread acceptance of IVIM [61]. While the development of a robust IVIM phantom [62] may assist in the practicalities of conducting reproducibility measurements, the standardization of the IVIM acquisition, including but not limited to repetition time, echo time, diffusion b values, and diffusion times, remains a significant target and a critical step in maximizing the utility of IVIM-derived parameters.

6.3 Overview of the Clinical Applications of IVIM in the Body

The notion of IVIM is highly attractive as a clinical tool for disease evaluation in the body. As IVIM measurements can provide potential information that reflects capillary perfusion, tubular flow, and glandular secretion, these measurements have been applied to detect disease, characterize lesions, evaluate organ function, and assess treatment response of diseases.

One of the perceived advantages of IVIM measurement is that the technique does not require the administration of an exogenous contrast medium to obtain information that reflects tissue perfusion. This is attractive given the recent concerns of gadolinium contrast deposition in the brain. Hence, the avoidance of or reduction in contrast use, particularly in the pediatric population or where repeated contrast dosing is anticipated, would be welcomed. The potential disadvantage of IVIM measurement is the longer MR examination time (required to acquire images of a sufficiently high SNR, images of multiple b values, or the use of techniques to overcome respiratory and/or cardiac motion where appropriate) compared with a DWI examination to derive the ADC using two or three b values. Furthermore, as previously discussed, the perfusion-sensitive measurements of perfusion fraction and pseudodiffusion coefficient are less robust with poorer measurement repeatability and reproducibility compared to the true diffusion coefficient or the ADC [63].

In the following section, we present some current clinical and research applications of body IVIM measurements for disease assessment. However, for an in-depth discussion of specific areas, the reader should refer to the individual chapters.

6.3.1 Disease Detection

The use of high-b-value DWI and ADC maps has improved the detection of cellular disease across the body [64]. In some studies, the use of IVIM DWI has further improved the detection of prostate [65, 66] and pancreatic [67, 68] cancers (Fig. 6.4) in patients. In nasopharyngeal cancers, one study showed that IVIM parameters, but not conventional ADC measurements, were useful for discriminating disease [69]. The technique has also been applied to detect diffuse disease, such as liver cirrhosis [70–73], but the IVIM parameters appear to lack the sensitivity for early disease detection [74].