24.1 Introduction
Computer-aided diagnosis (CAD) in medical imaging is generally defined as a diagnosis made by a radiologist who uses the output from a computerized analysis of medical images as “another opinion” in detecting abnormalities, assessing extent of disease, characterizing lesions, and making diagnostic decisions. The major goal of CAD is to help the radiologist improve diagnostic accuracy, efficiency, and consistency in medical image interpretation tasks where perception plays a substantial role. In this chapter, we discuss CAD from an image perception perspective with emphasis on impacts that CAD and CAD output can have on image perception and how CAD can be best utilized.
The benefit of a medical imaging exam depends on both the physical quality of the medical images and the ability of the radiologist interpreting them. Studies indicate that radiologists do not detect all abnormalities on images that are visible on retrospective review, and they do not always correctly characterize abnormalities that are found (Renfrew et al., 1992). Causes of detection and interpretation errors in the clinical interpretation of medical images include limitations in the human eye–brain visual system, reader fatigue and distraction, the presence of overlapping structures that camouflage disease in images, and the vast number of normal cases seen in screening programs, to name a few.
With the goal of reducing image perception errors and improving image interpretation accuracy, CAD has seen tremendous growth over the past 20 years. CAD has evolved rapidly from the early days of time-consuming film digitization and computations on a limited number of cases to its current uses in a broadening range of medical imaging applications and clinical workstations. Basic CAD research involves a variety of activities – collecting relevant normal and abnormal images of clinical patients and the associated pathological exam results serving as ground truth for diagnosis; developing computer algorithms appropriate to the medical interpretation task; validating the algorithms using appropriate cases (distribution and sizes) and study designs to measure performance and robustness; evaluating radiologists’ performance in the relevant diagnostic task without and with the use of the computer aid; and then ultimately assessing performance with a clinical trial. At present, CAD research includes analysis of images of a number of disease types – breast cancer, lung cancer, colon cancer, interstitial disease, osteoporosis, osteolysis, vascular plaque, aneurysms, and others – and from various modalities, including analogue and digital X-ray radiography, ultrasonography, computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), and others.
In this chapter, we first present a broad overview of CAD techniques and methodology. Then we discuss factors that affect radiologists’ performance in image perception and interpretation, and how CAD can impact radiologists’ perception and performance. Finally, we discuss how CAD can be best utilized from an image perception perspective.
24.2 CAD: An Overview of Techniques
CAD can broadly be categorized into two types – computer-aided detection (CADe) and computer-aided diagnosis (CADx). The word “diagnosis” is used in a broad sense in “CAD” – including both disease detection and diagnostic decision making – and is used in a narrow sense in “CADx,” referring specifically to the diagnostic characterization and clinical decision making of a lesion already detected. Nonetheless, this categorization is useful from both a clinical point of view and a CAD developer’s point of view: “detection” and “diagnosis” are two clinical tasks; the design and development of CADe and CADx systems are different, although some image analysis and pattern recognition techniques are commonly used in both types of systems.
In this section, we give a brief introduction to the computerized techniques for the two types of CAD systems, i.e., we aim to shed some light on the “computer part” or the commonly called “CAD black box” and defer our discussions on the “computer aid” part or the “human–computer interactions” to the next section. While computerized techniques are of interest to CAD developers, they also are important to CAD users: if the user knows what the computer is calculating, he/she will be better able to understand the output, especially if the computer indicates a region as suspicious that appears to the user to be normal.
24.2.1 CADe as an Aid for Detection of Disease
CADe implies the use of an output from a computerized image analysis as an aid in the localization of an image region that is suspected of being abnormal. CADe is basically a detection or localization task, i.e., the computer analyzes medical images to find and output locations of suspect regions to “alert” the radiologist, leaving the final localization of lesions and patient management to the radiologist (Figure 24.1a). CADe systems are most useful in imaging examinations in which most cases are normal, such as in screening programs, examples of which include screening mammography, low-dose thoracic CT for smokers, and colon cancer screening.
Figure 24.1 Schematic diagrams illustrating (a) the use and (b) the components of a computer-aided detection (CADe) system.
Figure 24.2a shows the first prototype CADe system, which was developed for screening mammography at the University of Chicago. The system took as input a screen/film mammogram, which was subsequently digitized and automatically passed to the computer for analysis. The output annotation from the system would indicate suspect locations on a thermal paper printout or monitor, as demonstrated in Figure 24.2b for a mammogram with a mass lesion. While the output may be quite simple, complex mathematical calculations are performed within the CADe “black box.”
Figure 24.2 (a) The first prototype computer-aided detection (CADe) system, which was developed for screening mammography at the University of Chicago, in which output annotation from the system would indicate suspect locations on a thermal paper printout or monitor, as demonstrated in (b) for a mammogram with a mass lesion.
Computerized techniques for CADe include the detection of abnormalities that may be signs of potential cancer. This involves several steps of image analyses and computations. Figure 24.1b schematically illustrates the components of a CADe system. Once image data are input to the computer, by either film digitization or directly from an image acquisition device, various image processing and analysis steps are performed. Depending upon the quality of the input images, the initial step may involve image recovery techniques for preprocessing, such as noise reduction and/or artifact correction. Following the preprocessing, image segmentation techniques are applied to delineate the organ of interest (e.g., the breast in mammography CADe) from the image background and other body parts in the image, allowing for subsequent analyses and computations to focus only on the organ of interest. The “initial detection” step involves using lesion enhancement techniques to enhance the signal of the lesion relative to the camouflaging normal anatomy background, and this step results in a number of candidate lesions.
Next, numerical features are derived from the image data of candidate lesions, and these features are mathematical descriptors of characteristic features that potentially help distinguish whether a candidate lesion is truly a lesion or normal tissue (Giger et al., 2000). Feature extraction involves computer vision techniques, clinical observations, and/or other prior domain knowledge. These numerical features are “interpreted” by the computer using pattern recognition techniques such as discriminant analysis, rule-based methods, and/or artificial neural networks to merge the extracted features into a decision on the likelihood that the region being analyzed is a potential lesion (Sonka and Fitzpatrick, 2000). Such a classifier is trained, using a data set of images from normal subjects as well as patients with histologically verified lesions, to distinguish between actual lesions and normal regions. In these later stages, the processing methods are mainly to reduce the number of false-positive (FP) candidates, i.e., to improve specificity.
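The false-positive reduction stage can be illustrated with a minimal rule-based sketch. Everything specific here is an assumption made for illustration: the `Candidate` fields, the equal feature weights, and the size and score cut-offs are hypothetical and are not taken from any particular CADe system. A trained classifier, as described above, would replace the hand-set rule.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    x: int               # column of the candidate center (pixels)
    y: int               # row of the candidate center (pixels)
    area_mm2: float      # hypothetical size feature
    contrast: float      # hypothetical lesion-to-background contrast, scaled to [0, 1]
    circularity: float   # hypothetical shape feature in [0, 1]

def likelihood_score(c):
    """Merge two features into one score with hand-set, illustrative weights."""
    return 0.5 * c.contrast + 0.5 * c.circularity

def reduce_false_positives(candidates, min_area=10.0, max_area=400.0, min_score=0.5):
    """Keep only candidates that pass a size rule and a score rule."""
    kept = []
    for c in candidates:
        if not (min_area <= c.area_mm2 <= max_area):
            continue  # implausible lesion size
        if likelihood_score(c) < min_score:
            continue  # merged feature score too low
        kept.append(c)
    return kept

candidates = [
    Candidate(120, 340, area_mm2=55.0, contrast=0.8, circularity=0.7),  # passes both rules
    Candidate(400, 90, area_mm2=3.0, contrast=0.9, circularity=0.9),    # rejected: too small
    Candidate(250, 610, area_mm2=80.0, contrast=0.2, circularity=0.3),  # rejected: low score
]
marks = reduce_false_positives(candidates)
print([(m.x, m.y) for m in marks])  # locations that would be shown to the radiologist
```

Raising `min_score` removes more false marks at the cost of possibly discarding true lesions, which is the sensitivity/specificity trade-off that the later stages of a CADe system must balance.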
24.2.2 CADx as an Aid for Differential Diagnosis
Once a lesion is detected, the radiologist must extract its characteristic features and then merge these into a decision on patient management, e.g., biopsy or no biopsy. CADx is a classification or differential diagnosis task, i.e., the computer analyzes a suspicious region or lesion to calculate and output mathematical descriptors (tumor features) and/or estimated probability of malignancy that are potentially helpful to classify a suspicious lesion as benign or malignant, also leaving the final diagnosis and patient management to the physician (Figure 24.3a). Thus, the role of a CADx system is to characterize an already-found lesion or other abnormality in terms of its morphological or functional features, and to potentially estimate its probability of malignancy or other disease. Such a system would aid a radiologist in the differential diagnosis of malignancy (or disease state) and recommendation on patient management.
Figure 24.3 Schematic diagrams illustrating (a) the use and (b) the components of a computer-aided diagnosis (CADx) system.
Computerized techniques for CADx include the classification of detected lesions as subgroups (e.g., benign and malignant) that warrant different clinical actions (e.g., biopsy or no biopsy, different therapeutic interventions). This involves several steps of image analyses and computations. Figure 24.3b schematically illustrates the components of a CADx system. Unlike CADe, which takes the whole image as input, CADx takes as input either a radiologist-detected or a computer-detected lesion. Note that, for such a system, manual indication of the lesion location may be required, since with CADx, this is a classification task, not a detection task. Typical input may include a seed point or a region of interest indicating the location of the lesion. Preprocessing of the image data may be performed to enhance the contrast of the lesion relative to the camouflaging normal anatomy background.
Next, the lesion is segmented from the surrounding background (Chen et al., 2006a; Huo et al., 1995; Kupinski and Giger, 1998; Yuan et al., 2007). This is generally regarded as a critical step in computerized lesion characterization because analysis of a poorly segmented lesion would result in an erroneous characterization of the lesion features and an incorrect assessment of disease severity (e.g., malignancy). Following lesion segmentation, numerical features are derived from the segmented lesions and these features are mathematical descriptors of characteristic features that potentially help classify a lesion into different subgroups (e.g., benign or malignant). As in CADe, feature extraction in CADx is often motivated or guided by clinical observations and/or other prior domain knowledge, which are represented by mathematical descriptors derived with computer vision techniques.
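To make the seed-point input and the segmentation step concrete, the following is a minimal sketch that uses simple intensity-based region growing. This is a generic stand-in chosen for brevity, not the segmentation methods of the works cited above; the toy image, the tolerance, and the two features (`area_px`, `contrast`) are assumptions for illustration only.

```python
from collections import deque

def region_grow(image, seed, tolerance):
    """Return the set of (row, col) pixels 4-connected to the seed whose
    intensity is within `tolerance` of the seed intensity."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

def lesion_features(image, region):
    """Two illustrative descriptors of the segmented region."""
    area = len(region)
    mean_inside = sum(image[r][c] for r, c in region) / area
    outside = [image[r][c]
               for r in range(len(image))
               for c in range(len(image[0]))
               if (r, c) not in region]
    mean_outside = sum(outside) / len(outside)  # assumes some background remains
    return {"area_px": area, "contrast": mean_inside - mean_outside}

# Toy 5x5 "image": a bright lesion on a uniform background.
image = [
    [10, 10, 10, 10, 10],
    [10, 50, 52, 10, 10],
    [10, 51, 55, 53, 10],
    [10, 10, 54, 10, 10],
    [10, 10, 10, 10, 10],
]
region = region_grow(image, seed=(2, 2), tolerance=10)  # seed as a radiologist might indicate
features = lesion_features(image, region)
print(features)
```

If the tolerance is set too loosely, the region leaks into the background and both features change, which illustrates why a poor segmentation propagates into an erroneous characterization.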
Finally, these numerical features are “interpreted” by the computer using pattern recognition techniques such as discriminant analysis, rule-based methods, and/or artificial neural networks to combine the extracted features into a decision on the likelihood that the lesion being analyzed is, for example, benign or malignant.
Unlike CADe, the output of which is typically a marker (e.g., arrow, circle) indicating the location of a suspicious lesion, the output from a CADx system can contain diverse information. For example, it can be presented in terms of a numerical estimate of the probability of malignancy, a retrieval of similar images from an online database, a graphical presentation that relates the lesion in question to the computer analyses of cases in a known database with certain cancer prevalence, and/or other useful information from the computer analyses that is potentially helpful to the radiologist for diagnostic decision making. An example of such an interface appears in Figure 24.4, which displays the computer outputs for both mammography and sonography CADx.
Figure 24.4 Human/computer interface for a computer-aided diagnosis (CADx) system in which the output is presented in terms of numerical values related to the likelihood of malignancy, through a display of similar images of known diagnoses, and/or with a graphical representation of the unknown lesion relative to all lesions in an online database atlas (Giger et al., 2000, 2003; Horsch et al., 2006). The display uses color coding to indicate whether the similar images are malignant or benign.
24.2.3 Assessment of the Standalone Performance of CAD Systems
Assessment of the ultimate benefit of CAD requires evaluation of the CAD system with observer studies, i.e., studies that involve a comparison of the detection/diagnosis performance of radiologists without the computer aid and with the aid in a clinical realm or a setting that mimics the clinical arena. This is a major topic of the next section in which the impact of CAD on image perception is presented.
However, to demonstrate the effectiveness of a CAD system, it is also essential to appropriately assess the standalone performance of the computer algorithms, i.e., the detection performance or diagnosis performance of the computer without a physician. Standalone performance assessment is useful in comparing the current system with an existing one upon which the current system claims to improve. Additionally, standalone performance assessment is needed to demonstrate an adequate level of performance prior to recruiting radiologists for observer studies, which are generally more time consuming and expensive. General principles and considerations in the assessment of standalone computer performance are presented here; these aspects also apply to the evaluation of radiologist performance without and with the computer aid.
There are three basic elements in performance assessment of CAD systems: (1) gold standard or truth; (2) performance figure of merit; (3) study designs to appropriately collect and utilize a data set to estimate the performance and its uncertainty.
The gold standard or truth in CADe is the true location of abnormalities, which must be determined by well-established processes such as clinical follow-up exams and/or biopsy to determine the presence and location of a lesion, or a consensus panel of experts for determining the extent of the lesion. The truth in CADx is typically obtained from the histopathological analysis of biopsied tissue samples or evaluations from a panel of radiology experts. This truth information is usually summarized as binary clinical endpoints, e.g., normal or abnormal, benign or malignant.
Performance of a CADe system can be given in terms of sensitivity and FP detections per image. Sensitivity is calculated as the number of lesions detected divided by the total number of real lesions in the data set. The number of FP detections is the average number of FPs per image over the data set. A free-response receiver operating characteristic (FROC) curve shows the sensitivity as a function of mean FPs/image. Such a curve can be obtained for a data set by varying one of the algorithm’s parameters.
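The definitions above can be sketched in a few lines. The function name `froc_points` and the toy detections are hypothetical, and the sketch assumes that each true lesion is hit by at most one computer detection, so that counting true-positive marks is the same as counting detected lesions.

```python
def froc_points(detections, n_lesions, n_images, thresholds):
    """Compute FROC operating points.

    detections: list of (score, hits_true_lesion) pairs pooled over the data set.
    Returns one (mean FPs per image, sensitivity) pair per threshold.
    """
    points = []
    for t in thresholds:
        tp = sum(1 for score, hit in detections if score >= t and hit)
        fp = sum(1 for score, hit in detections if score >= t and not hit)
        points.append((fp / n_images, tp / n_lesions))
    return points

# Toy data set: 4 images containing 3 true lesions in total.
detections = [
    (0.9, True), (0.8, False), (0.7, True),
    (0.4, False), (0.3, True), (0.2, False),
]
points = froc_points(detections, n_lesions=3, n_images=4,
                     thresholds=[0.75, 0.5, 0.1])
for fps_per_image, sensitivity in points:
    print(f"{fps_per_image:.2f} FPs/image, sensitivity {sensitivity:.2f}")
```

Lowering the threshold moves the operating point up and to the right along the curve: more lesions are marked, but so are more normal regions.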
Various performance indices exist for use in the evaluation of computerized methods, such as receiver operating characteristic (ROC) (Metz, 1978, 1986, 2000; Wagner et al., 2007) and FROC analyses (Bunch et al., 1977; Chakraborty, 2000; Samuelson and Petrick, 2006). Performance of a CADx system can be given in terms of sensitivity and specificity or ROC curve which, by convention, plots sensitivity versus (1 – specificity) at varying cut-off thresholds. The area under the ROC curve (AUC) is a standard summary performance figure of merit (Wagner et al., 2007). A partial area index from an ROC curve is useful when evaluating a system that requires a high level of sensitivity, such as in the task of determining the likelihood of malignancy of lesions seen on mammograms (Jiang et al., 1996).
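As a minimal illustration of these figures of merit, the sketch below builds an empirical ROC curve by sweeping the cut-off threshold and computes the AUC in two equivalent ways: by the trapezoidal rule, and as the fraction of malignant/benign pairs that the computer orders correctly (ties counted as one half). The scores are made up, and the empirical (nonparametric) estimate is used here for simplicity; it is not the fitted binormal model used in much of the ROC literature.

```python
def roc_points(scores_pos, scores_neg):
    """Empirical ROC: (1 - specificity, sensitivity) at each distinct cut-off."""
    thresholds = sorted(set(scores_pos) | set(scores_neg), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpf = sum(s >= t for s in scores_pos) / len(scores_pos)  # sensitivity
        fpf = sum(s >= t for s in scores_neg) / len(scores_neg)  # 1 - specificity
        points.append((fpf, tpf))
    return points

def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def auc_pairwise(scores_pos, scores_neg):
    """Fraction of (abnormal, normal) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

malignant = [0.9, 0.8, 0.6, 0.4]  # hypothetical computer outputs for malignant lesions
benign = [0.7, 0.5, 0.3, 0.2]     # hypothetical computer outputs for benign lesions

curve = roc_points(malignant, benign)
print(auc_trapezoid(curve), auc_pairwise(malignant, benign))
```

The pairwise form makes the interpretation of the AUC explicit: it estimates the probability that a randomly chosen malignant lesion receives a higher score than a randomly chosen benign one.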
In the assessment of CAD systems, it is critically important to apply appropriate study designs to collect and utilize a data set to estimate the performance and its uncertainty. It is essential to collect a sample data set that is representative of the general population to which the CAD system is designed to apply. Study designs for evaluation of clinical benefits of CADe are discussed in detail in Chapter 25. Here we emphasize some general considerations in evaluation of standalone performance of CAD algorithms with typically a finite data set of images.
One consideration involves the proper use of a finite data set for the assessment task. There are multiple tasks that need to be assessed in developing a CAD system, e.g., feature selection, classifier model selection, training, and testing the model. Ideally, each task needs to be assessed by an independent data set. When only a limited data set is available, however, one may use resampling techniques such as cross-validation or bootstrapping. It is critically important to be clear about which task is being assessed when utilizing a data set and be cautious of the bias of the results introduced by reusing a data set to assess multiple tasks. For example, it is well known that training and testing a classifier with the same data set would introduce large optimistic bias to the performance results. Also, if a data set is initially used for feature selection, and then partitioned to a training set and a testing set (or if cross-validation is used to train/test the model) to assess the performance, the results would also be optimistically biased (Sahiner et al., 2000). Another less noticeable example is when a data set is used in the examination of multiple models and/or parameter tunings; then the performance of the best model, as assessed with the very same data set, is subject to model selection bias. In other words, if a data set has been used for the model selection task, another data set is needed to assess the performance of the selected model.
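The feature-selection pitfall described above can be avoided by repeating the selection inside every cross-validation fold, so that held-out cases never influence which features are chosen. The following is a minimal sketch under stated assumptions: the data are made up, the "classifier" is a single-feature threshold, and the helper names (`select_feature`, `train_threshold`, `cross_validate`) are hypothetical.

```python
def class_means(rows, j):
    """Mean of feature j for the abnormal (1) and normal (0) cases."""
    pos = [x[j] for x, y in rows if y == 1]
    neg = [x[j] for x, y in rows if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)

def select_feature(train_rows, n_features):
    """Pick the feature with the largest class-mean separation (training rows only)."""
    def separation(j):
        mean_pos, mean_neg = class_means(train_rows, j)
        return abs(mean_pos - mean_neg)
    return max(range(n_features), key=separation)

def train_threshold(train_rows, j):
    """Place the cut-off midway between the class means of feature j."""
    mean_pos, mean_neg = class_means(train_rows, j)
    return (mean_pos + mean_neg) / 2.0, (1 if mean_pos >= mean_neg else -1)

def cross_validate(rows, k):
    """k-fold cross-validated accuracy with selection and training inside each fold."""
    n_features = len(rows[0][0])
    correct = 0
    for fold in range(k):
        test = rows[fold::k]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        j = select_feature(train, n_features)   # the test fold is never seen here
        cut, sign = train_threshold(train, j)
        for x, y in test:
            predicted = 1 if sign * (x[j] - cut) > 0 else 0
            correct += int(predicted == y)
    return correct / len(rows)

# Toy cases: (feature vector, truth). Feature 0 is informative, feature 1 is noise.
rows = [
    ((0.90, 0.1), 1), ((0.20, 0.8), 0), ((0.80, 0.5), 1), ((0.10, 0.4), 0),
    ((0.70, 0.9), 1), ((0.30, 0.2), 0), ((0.85, 0.6), 1), ((0.25, 0.3), 0),
]
accuracy = cross_validate(rows, k=4)
print(accuracy)
```

Calling `select_feature(rows, ...)` once on the full data set before the loop would reproduce the optimistic bias discussed above, because the held-out cases would already have shaped the model.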
Another consideration is related to the uncertainty estimation problem. Appropriate assessment of the variability of the standalone performance of CAD is important as the performance is usually estimated from a data set of limited size. Variability estimation is particularly useful in, for example, calculating confidence intervals on the estimated performance, model selection, and model comparison. In the conventional approach, the variability of the estimated performance is ascribed only to the random choice of testing cases. One argument for this approach is the claim that the developers have frozen their training algorithm – so that it is a fixed effect, contributing no randomness to the analysis. However, a classifier is rarely frozen until a sufficiently large training sample is employed in the training. This means that the random choice of a training sample also contributes to the variability of the estimated performance and hence the training variability has to be assessed.
Assessment of training variability can demonstrate the stability of a CAD algorithm – a more stable algorithm would be less sensitive to the varying training sample, i.e., would have smaller training variability. Methods are emerging in the literature for the assessment of classifiers in terms of uncertainty due to both finite size of trainers and finite size of testers, e.g., a bootstrap-based approach when only one data set is available (Yousef et al., 2005) or a U-statistics-based approach when two data sets are available for training and testing the classifier (Yousef et al., 2006).
Additional random effects are also present in these evaluation situations. For example, for some CAD applications, the truth status is determined by an expert panel for which the variability should also be accounted. In CADe, additional random effects include the number and location of multiple CAD cues and/or multiple lesions in images. There will be further uncertainty associated with specifying a test cut-off value corresponding to a desired level of performance, e.g., sensitivity or true-positive rate, and estimating the corresponding number of FP cues per image (Bornefalk and Hermansson, 2005).
24.3 CAD and Image Perception
We have so far discussed the development and assessment of computerized techniques of a CAD system. Since the computer output only serves as “another opinion” to the radiologist and since the final diagnosis is made by the radiologist, the human observer (i.e., the radiologist) is inevitably an integral part of a CAD system. CAD is motivated by limitations of human observers in interpreting medical images and is designed in the hope of overcoming these limitations. Thus, as mentioned earlier, assessing the ultimate benefit of CAD requires evaluation of the CAD system with observer studies, i.e., studies that compare the image interpretation performance of the radiologist without the computer aid and with the aid. Many types of CAD, especially those related to detection, are designed to improve the detection/diagnosis accuracy of disease in medical image interpretation where perception plays a significant role, and hence one of the major goals of CAD is to enhance the perception of abnormalities. Although the exact mechanism of how CAD output may impact human perception of medical images is not completely understood, a large variety of studies, in laboratory settings or in clinical trials, have been conducted to examine the effect of CAD on image perception and interpretation in terms of the overall diagnostic performance, and are reviewed in this section.
24.3.1 Limitations of Human Observers and the Promise of Computer Analyses
Misses of lesions on radiographic images may be because of the presence of noise, i.e., quantum mottle or overlapping normal structures, or because of the radiologist’s inadequate search patterns or lapses in perception (Kundel, 1975). It has been reported that at least half of the errors made in clinical image interpretation practice are perceptual (Bird et al., 1992; Renfrew et al., 1992). Factors affecting radiologists’ performance levels may be summarized as the following:
1. Physical quality of the image such as poor spatial resolution, noise, low contrast, and image artifacts. This represents one of the most fundamental limitations of the medical image itself.
2. Attributes of the disease state such as lesion size, lesion contrast, and complexity of the anatomical background (“structure noise”).
3. Amount of data: medical imaging is evolving from traditional two-dimensional (2D) imaging to 3D and 4D (functional imaging) with multiple views. The amount of information presented to the radiologist can be overwhelming.
4. Interpretation conditions such as distractions, fatigue, and/or interpretation time limitations (workload), which occur in screening programs or when a huge amount of data is available for review.
5. Ability of the radiologist: radiologists may have different levels of prior training and knowledge.
Overcoming the image quality limitations may eventually rely on the improvement of imaging hardware and optimal acquisition protocols. However, with advances in computer vision, artificial intelligence, and computer technology, along with the availability of large databases of cases, CAD could potentially impact all aspects of the image interpretation performance. As a second reading tool, CAD could potentially reduce human perceptual search errors and interpretation errors, and improve diagnostic accuracy. As an automatic technique, CAD has potential to reduce the variability between and within observers, and thus improve the consistency of the image interpretation. With advanced artificial intelligence and computer graphics techniques, CAD yields quantitative measures such as morphology and physiological parameters and also improved visualizations, which could potentially reduce image interpretation time and improve efficiency.
Lung cancer ranks as the leading cause of cancer deaths in the USA, and early detection is essential for effective therapeutic intervention (Flehinger et al., 1992; Shah et al., 2003). Because of its low cost, simplicity, and low radiation dose, chest radiography is used for the clinical assessment of many thoracic diseases, including the detection of lung nodules as indicators of primary lung cancer or metastases. Radiologists may miss nodules on radiographic chest images due to the presence of overlying ribs, bronchi, blood vessels, and other anatomic lung structures. A CADe system for the detection of lung nodules aims to direct a radiologist’s attention to potential nodule sites.
Due to the elimination of overlapping structures in thoracic CT, CT is more sensitive than chest radiography in the detection of lung cancer at an early stage (Henschke et al., 2001). However, even with the reduction of overlapping structures, nodule detection is still limited due to the presence of blood vessels. The advent of low-dose CT has led to lung cancer screening programs (Henschke et al., 2001; Sone et al., 2001; Swensen et al., 2003). Also, the high number of CT slices in a single case, which need to be reviewed by a radiologist, makes lesion detection a burdensome task that is complicated by fatigue and distractions. A CADe system for the detection of nodules on CT images is expected to help radiologists focus their attention on regions suspected of being cancerous, and avoid missing nodules. This seems especially advantageous in low-dose CT lung cancer screening procedures, in which most cases will be normal – similar to the large number of normal mammograms requiring review in a breast cancer screening program.
Figure 24.5a, for example, reproduces a chest radiograph with a subtle nodule camouflaged by a rib in the upper right lung, making it difficult to detect. While one of the benefits of thoracic CT is the removal of overlapping tissues in the image, structure noise can still cause problems in detection, as in Figure 24.5b, in which a nodule is located near a neighboring vessel. If a computer could indicate the location of the suspect lesion, the radiologist’s attention would be drawn to the region, potentially avoiding the miss.
Figure 24.5 (a) Chest radiograph with a subtle nodule (arrow) camouflaged by a rib in the upper right lung, making it difficult to detect. (b) Thoracic computed tomography image, with a nodule (circled) located near a neighboring vessel, illustrating that, even with the removal of overlapping tissues in the image, structure noise can still cause problems in detection.
Another cause for misses in detection tasks is when the radiologist must review a large amount of image data. Such a tedious task can occur in a screening program where most cases are normal or when searching a large amount of image data such as in thoracic CT (which can reach hundreds of slices). Detecting abnormalities in screening cases, in which most are normal, has been compared to the task of finding the character “Waldo” (also known as “Wally”) in a Where’s Waldo? book in which he appears on only five of 1000 pages! The vast amount of data for human interpretation in a cancer screening program makes lesion detection a burdensome task, impacts radiologists’ workloads, and causes oversight errors.
For example, Figure 24.6 shows mammograms from a screening program. They had been input to the University of Chicago prototype CADe system for analysis of performance in the prospective computer analysis of cancers missed on screening mammography (Nishikawa et al., 2001). After the automatic analysis, the computer correctly detected the lesion in the right mediolateral oblique (MLO) view, but it also placed one false mark in the left MLO view and two false marks in the left craniocaudal (CC) view. Note that, in this study, it was found that the lesion had been clinically missed, and also missed by the three readers in the study (Nishikawa et al., 2001).
Figure 24.6 Mammograms from a screening program that had been input to the University of Chicago prototype computer-aided detection (CADe) system for analysis of performance in the prospective computer analysis of cancers missed on screening mammography. After the automatic analysis, the computer correctly detected the lesion in the right mediolateral oblique (MLO) view, but it also placed one false mark in the left MLO view and two false marks in the left craniocaudal (CC) view. Note that, in this study, it was found that the lesion had been clinically missed, and also missed by the three readers (Nishikawa et al., 2001).
Medical imaging in radiology practice is evolving to include multimodalities, multidimensions, and multiviews. For example, a clinical breast imaging reading room, which had traditionally included hard-copy screen/film mammograms, nowadays is equipped with state-of-the-art liquid crystal display monitors displaying full-field digital mammograms, computer workstations showing 2D and 3D breast ultrasound images, and scanner workstations displaying 4D dynamic MRI. The radiologists are overwhelmed with a huge amount of information, and computer aids definitely have great potential to extract, combine, and display the relevant information.
An example in breast MRI shows that a computer may help extract useful diagnostic information from a large amount of data. A typical dynamic breast MRI exam consists of 3D images acquired before and repeatedly after the injection of a contrast agent, with each voxel possessing an MR signal intensity–time curve. One of the commonly used features for diagnosing a breast lesion as benign or malignant is the shape of the kinetic curve (American College of Radiology, 2003; Kuhl et al., 1999): a persistent type of curve typically is associated with a benign diagnosis, a washout type of curve is an indicator of malignancy, and a plateau type of curve is intermediate. Due to uptake inhomogeneity, however, averaging over the entire lesion to obtain a kinetic curve is suboptimal. In practice, radiologists search over the lesion to find the most enhancing region (“hot” area) within the lesion and manually draw a region of interest to generate the characteristic kinetic curve. This is a time-consuming task and suffers from significant inter- and intraobserver variability, and thus a computer may play a role in identifying the characteristic kinetic curve from a given lesion automatically and efficiently.
For example, Figure 24.7a shows a slice containing a lesion highlighted in Figure 24.7b (note that there are 60 slices for a 3D image at a particular time point and this just shows one slice at one time point). Figure 24.7c shows the signal–time curves of randomly selected voxels in the lesion. It can be easily seen that all three types of curves are found in one lesion. Figures 24.7d and e (solid line) show the most-enhancing region (color-coded) and the corresponding characteristic kinetic curve, respectively, as identified by a computerized approach (Chen et al., 2006b). The computer-identified washout curve leads to a correct diagnosis of malignancy (for comparison, the curve averaging over the entire lesion is also shown in Figure 24.7e (dashed line)). This research has shown that computerized approaches are promising in extracting relevant and manageable information from a large amount of image data.
Figure 24.7 (a) Slice containing a lesion highlighted in (b). (c) Signal–time curves of randomly selected voxels in the lesion. (d) and (e) (solid line) show the most-enhancing region (color-coded) and the corresponding characteristic kinetic curve, respectively, as identified by a computerized approach (Chen et al., 2006b). The computer-identified washout curve leads to a correct diagnosis of malignancy (for comparison, the curve averaging over the entire lesion is also shown in (e) (dashed line)).
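The effect of averaging over an inhomogeneous lesion can be illustrated with a small sketch. It is a simplified heuristic, not the computerized approach of Chen et al. (2006b): the voxel curves are made up, the "most-enhancing" voxels are simply the top fraction ranked by early relative enhancement, and the 10% tolerance used to separate persistent, plateau, and washout curves is an assumed value.

```python
def mean_curve(curves):
    """Average several signal-time curves time point by time point."""
    return [sum(values) / len(values) for values in zip(*curves)]

def early_enhancement(curve):
    """Relative signal increase from pre-contrast (index 0) to the first
    post-contrast time point (index 1); assumes a positive pre-contrast signal."""
    return (curve[1] - curve[0]) / curve[0]

def most_enhancing_curve(curves, top_fraction=0.25):
    """Average the curves of the voxels with the strongest early enhancement."""
    ranked = sorted(curves, key=early_enhancement, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return mean_curve(ranked[:n_top])

def curve_type(curve, tolerance=0.10):
    """Label the delayed phase relative to the first post-contrast point."""
    early, late = curve[1], curve[-1]
    change = (late - early) / early
    if change > tolerance:
        return "persistent"
    if change < -tolerance:
        return "washout"
    return "plateau"

# Hypothetical voxel curves: signal at pre-contrast and three post-contrast times.
voxel_curves = [
    [100, 220, 200, 170],  # strong uptake followed by washout
    [100, 150, 170, 190],  # persistent
    [100, 160, 165, 160],  # plateau
    [100, 140, 160, 185],  # persistent
]
print(curve_type(mean_curve(voxel_curves)))            # whole-lesion average
print(curve_type(most_enhancing_curve(voxel_curves)))  # most-enhancing region
```

In this toy lesion the whole-lesion average is labeled a plateau, while the most-enhancing voxels show washout, which mirrors the contrast between the dashed and solid curves in Figure 24.7e.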
CAD can also play a critical role in visualization. Various investigators are developing CADe systems for CT colonography. It is important to note that these systems require both an effective visual interface for the 3D CT colonography images (McFarland et al., 2001; Royster et al., 1997) and reliable CADe output (Nappi and Yoshida, 2003; Summers et al., 2001; Suzuki et al., 2006; Yoshida et al., 2002) to aid in the detection of colonic polyps. With advanced computer graphics techniques, the interface provides a fly-through of the colon in virtual colonoscopy, and the computer is additionally tasked with labeling regions suspicious for colonic polyps in these CT-based images (Summers et al., 2005; Yoshida and Dachman, 2004). In the interpretation process, the combination of an efficient interface and accurate CADe output has the potential to improve both interpretation accuracy and interpretation time. Figure 24.8 shows the interface of a colon CADe system developed at the University of Chicago.
Figure 24.8 The interface of the prototype computer-aided diagnosis (CAD) system developed at the University of Chicago to assist radiologists in finding polyps on computed tomography colonography. The upper left, lower left, and lower right images show the multiplanar reformatted views of the colon. The upper right image shows a three-dimensional endoluminal view of the colon. The CAD system finds the highly suspicious polyps and colors them green.
24.3.2 Observer Studies: An Overview of Methodology
An observer study (or “reader study”) in CAD of medical images is a study that compares observer performance in medical image perception, interpretation, and/or diagnosis without the use of CAD against performance with CAD, either in a real clinical setting (“clinical study”) or in a simulated clinical setting (“laboratory study”). A well-designed observer study provides direct evidence of the impact of CAD on image perception and interpretation.
Modern experience in medical imaging has indicated that at least two sources of variability – reader variability and patient case variability – need to be accounted for when comparing two imaging modalities/conditions so that any conclusion on the impact of a new modality (e.g., CAD) can be generalized to a population of readers and a population of patients. The experimental paradigm within this context has come to be referred to as the multiple-reader, multiple-case (MRMC) ROC paradigm. In this study design for assessing CAD, a sample of readers reads a sample of patient cases under both conditions: without the computer aid and with the computer aid. The most powerful study design – for a given number of cases with verified truth-status – is the fully crossed design, in which the same readers read the same cases in both situations. A reading consists of a radiologist providing a level of suspicion or rating of the probability of the abnormal condition of interest for each patient case.
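The data layout of a fully crossed study can be sketched as follows: every reader rates every case under both conditions, and an area under the ROC curve (AUC) is computed per reader and per condition. The sketch uses the Mann-Whitney (empirical) AUC estimate and made-up ratings. It only produces point estimates; it does not account for reader and case variability, which is what a dedicated MRMC analysis is for.

```python
def empirical_auc(ratings, truth):
    """Mann-Whitney estimate of the area under the ROC curve.

    ratings: one suspicion rating per case; truth: 1 = abnormal, 0 = normal.
    """
    pos = [r for r, t in zip(ratings, truth) if t == 1]
    neg = [r for r, t in zip(ratings, truth) if t == 0]
    # Count abnormal/normal pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical fully crossed data: the same six cases, read by the same
# two readers, without and with the computer aid.
truth = [1, 1, 1, 0, 0, 0]
readings = {
    "reader A": {"unaided": [80, 40, 60, 50, 30, 20],
                 "aided":   [85, 65, 70, 50, 30, 20]},
    "reader B": {"unaided": [70, 30, 50, 60, 40, 10],
                 "aided":   [75, 55, 60, 65, 35, 10]},
}

differences = []
for reader, by_condition in readings.items():
    auc_unaided = empirical_auc(by_condition["unaided"], truth)
    auc_aided = empirical_auc(by_condition["aided"], truth)
    differences.append(auc_aided - auc_unaided)
    print(f"{reader}: unaided AUC {auc_unaided:.3f}, aided AUC {auc_aided:.3f}")

mean_difference = sum(differences) / len(differences)
print(f"mean AUC change across readers: {mean_difference:.3f}")
```

Because the same readers and cases appear in both conditions, the per-reader differences are paired, which is the source of the statistical power attributed to the fully crossed design.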
Methods for designing and analyzing MRMC experiments include the Dorfman, Berbaum, and Metz (DBM) method (Dorfman et al., 1992), the Beiden, Wagner, and Campbell (BWC) method (Beiden et al., 2000), and the Barrett, Clarkson, and Kupinski (BCK) method (Barrett et al., 2005; Clarkson et al., 2006; Kupinski et al., 2006) with the associated “one-shot” estimate of MRMC variance of AUC (Gallas, 2006). Alternative study designs are possible, and sometimes necessary, because it may be impractical for the same readers to read the same cases. Recent progress includes methods for a “doctor–patient” design, in which every doctor reads his/her own patients, and for arbitrary designs (Gallas and Brown, 2008; Gallas et al., 2007). For an introduction to multireader ROC analysis, the reader should consult other chapters in this book and the review by Wagner et al. (2007).
As mentioned earlier in this section, an observer study in CAD can be either a laboratory study or a clinical study. The advantages of a laboratory study are well-controlled experimental conditions and protocols, and lower cost. However, a laboratory study can be less realistic than a real prospective clinical study in terms of both reading environments and case distributions (Gur et al., 2008).
In general, there are two types of clinical studies of mammography CADe systems: (1) “sequential” studies in which the interpretation of each clinical case is initially performed without seeing any CADe output, followed by a viewing of the computer output, and a subsequent reinterpretation of the case with the computer aid; and (2) “separate or historical” assessment of cases in which the CADe system is not used within a radiology practice for some time period, and then the computer output is used clinically during a subsequent time period. In both types of clinical studies, the performance of the radiologists interpreting without the CADe output is compared to the performance of the radiologists interpreting with the CADe output. Note that, in the second type of study, radiologists may be interpreting different mammograms from different patients.
24.3.3 Impact of CAD: Evidence From Observer Studies
24.3.3.1 Impact of CADe
Since the first CADe prototype, multiple CADe systems have been approved by the Food and Drug Administration (FDA), with the first approval being of an R2 Technology CADe system (now Hologic) in 1998. These various systems (R2/Hologic; ISSI; Kodak) have been applied to numerous screen-film mammography and full-field digital mammography units. Websites of the systems describe performance, physical space requirements, workflow aspects, and options. In standalone performance, some of these systems tend to achieve a high detection sensitivity for clustered microcalcifications and a lower detection sensitivity for masses.
To date, multiple clinical studies/surveys have been performed and are listed in Table 24.1 (Birdwell et al., 2005; Cupples et al., 2005; Dean and Iivento, 2006; Feig et al., 2004; Fenton et al., 2007; Freer and Ulissey, 2001; Gromet, 2008; Gur et al., 2004; Helvie et al., 2004; Khoo et al., 2005; Morton et al., 2006), which also gives the percentage change in sensitivity and call-back rate for each study. Of interest is the ratio of percentage change in sensitivity (from “without” to “with use of the computer aid”) to the percentage change in call-back rate. If this ratio is equal to or greater than 1, then one might say that the use of the computer output was beneficial to the interpretation process. It is extremely important to be careful when reading such clinical studies to understand the statistical testing methods and to recognize limitations.
Study | No. of cases read unaided | No. of cases read aided | % change in cancers detected | % change in recall rate
---|---|---|---|---
Sequential | | | |
Freer and Ulissey (2001) | 12,860 | 12,860 | 19.5 | 18.5
Helvie et al. (2004) | 2,389 | 2,389 | 10 | 9.9
Birdwell et al. (2005) | 8,692 | 8,692 | 7.4 | 8
Khoo et al. (2005) | 6,111 | 6,111 | 1.3 | 5.8
Dean and Iivento (2006) | 9,520 | 9,520 | 13.3 | 26
Morton et al. (2006) | 21,349 | 21,349 | 7.6 | 10.8
Separate | | | |
Gur/Feig et al. (2004) (high volume radiologists) | 44,629 | 37,500 | –3.3 | –4.9
Gur/Feig et al. (2004) (low volume radiologists) | 11,803 | 21,639 | 19.7 | 14.1
Cupples et al. (2005) | 7,872 | 19,402 | 16.1 | 8.1
Fenton et al. (2007) (survey study) | 313,259 | 31,186 | 4.5 | 31
Gromet (2008) | 231,221 | 231,221 | 1.9 | 3.9
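The ratio described above can be worked through directly from the percentage changes in Table 24.1. The sketch below is only arithmetic on the tabulated values; the decision to leave the ratio undefined when the recall rate did not increase is an assumption added here, since a quotient of two negative changes does not carry the same meaning.

```python
# (study, % change in cancers detected, % change in recall rate), from Table 24.1
studies = [
    ("Freer and Ulissey (2001)", 19.5, 18.5),
    ("Helvie et al. (2004)", 10.0, 9.9),
    ("Birdwell et al. (2005)", 7.4, 8.0),
    ("Khoo et al. (2005)", 1.3, 5.8),
    ("Dean and Iivento (2006)", 13.3, 26.0),
    ("Morton et al. (2006)", 7.6, 10.8),
    ("Gur/Feig et al. (2004), high volume", -3.3, -4.9),
    ("Gur/Feig et al. (2004), low volume", 19.7, 14.1),
    ("Cupples et al. (2005)", 16.1, 8.1),
    ("Fenton et al. (2007)", 4.5, 31.0),
    ("Gromet (2008)", 1.9, 3.9),
]

def benefit_ratio(pct_change_detected, pct_change_recall):
    """Ratio of % change in detection to % change in recall rate.

    Returns None when the recall rate did not increase (assumed convention):
    the simple "ratio >= 1" rule is then not meaningful.
    """
    if pct_change_recall <= 0:
        return None
    return pct_change_detected / pct_change_recall

beneficial = []
for name, detected, recall in studies:
    ratio = benefit_ratio(detected, recall)
    if ratio is None:
        print(f"{name}: ratio not interpreted (recall rate did not increase)")
        continue
    print(f"{name}: ratio = {ratio:.2f}")
    if ratio >= 1:
        beneficial.append(name)

print("ratio >= 1:", beneficial)
```

The ratio is a rough screening heuristic only; as noted above, it says nothing about statistical significance or about differences in study design.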
The first commercial CADe system (Riverain Medical, OH) for nodule detection on digitized chest radiographs received FDA approval in 2001 with a detection sensitivity of 65.0% with 5.3 FP marks per image (Freedman et al., 2001). Radiologist performance was shown to improve even at this moderate sensitivity and high FP rate.
Colon cancer is another leading cause of cancer deaths. The early detection of precursor colonic polyps and their subsequent removal can drastically improve survival. CT colonography (virtual colonoscopy) is being investigated as a screening alternative to conventional colonoscopy for the early detection of such colonic polyps (Hara et al., 1997). Radiologist interpretation of a CT colonography exam can be quite time consuming, and, because of the potentially large number of axial CT images (greater than 500 slices), can result in oversight errors.
A number of CADe algorithms have been developed for CT colonography, and the impact of these aids on radiologist performance is currently being evaluated in the clinical setting. Petrick et al. (2008) conducted an observer performance study to evaluate the effect of CADe as a second reader on radiologists’ diagnostic performance in interpreting CT colonographic examinations. With a modest number of patients (60 CT exams) and a modest number of observers (four board-certified radiologists), the authors demonstrated that use of CADe led to a significant increase in sensitivity for detecting polyps in the 6-mm or larger (sensitivity increased 15%, p<0.01) and 6–9-mm (sensitivity increased 16%, p<0.02) groups at the expense of a similar significant reduction in specificity. Larger, prospective clinical trials are needed to further demonstrate the effect of CADe on CT colonography.
24.3.3.2 Impact of CADx
Through various laboratory observer studies, CADx has been shown to aid radiologists in the task of distinguishing between malignant and benign breast lesions (Chan et al., 1999; Horsch et al., 2006; Huo et al., 2002; Jiang et al., 1999). Use of a computer diagnostic aid has the potential to increase sensitivity, specificity, or both in the work-up of breast lesions. Investigators have demonstrated that radiologists showed an increase in both sensitivity and specificity in the characterization of clustered microcalcifications and in the associated recommendation for biopsy (Jiang et al., 1999). In addition, it was shown that improvement in performance can be obtained both by expert mammographers and by community-based radiologists who used CADx information, with the increase greater for the nonexperts (Huo et al., 2002).
Also, use of computer output is expected to reduce the variability among radiologists’ interpretations (Jiang et al., 2001). Various CADx observer studies have been performed for individual breast imaging modalities as well as for multimodality CADx workstations. These have demonstrated that the use of computer-estimated probabilities of malignancy led to a statistically significant improvement in radiologists’ performances in the task of interpreting single-modality breast images and multimodality breast images (Giger et al., 2003; Horsch et al., 2006).
When evaluating a potential cancer case (or other disease state), radiologists consider all available information, including the entire case of images (single or multimodality) and not just individual images. Therefore, CADx systems are being developed to analyze multiple views within a single modality as well as across multimodality images. Thus, the computer–human interface of a CADx system needs to communicate both the multiple images and the CADx output to the radiologist. Figure 24.4 shows such a CADx interface. Here, the output of the CADx system can be presented in terms of numerical values related to the likelihood of malignancy, through a display of similar images of known diagnoses, or with a graphical representation of the unknown lesion relative to all lesions in an online database atlas (Giger et al., 2000, 2003; Horsch et al., 2006). This interface displays similar images and uses color coding to indicate whether the similar images are malignant or benign. In addition, with the graphical option, the probability of malignancy of the unknown case can be shown relative to the probability distributions of all the malignant and benign cases in the database or relative to the distributions of a specific feature (i.e., lesion characteristic).
Radiologists learn to interpret cases during their radiology residencies through the review of hundreds of cases, and thus the access to online cases of known pathology during a radiologist’s daily practice may be helpful for continuous learning. A search of an online image atlas can be based on individual features, on likelihoods of malignancy, or on psychophysical measures of similarity. The system in Figure 24.4 searches either via the computer-determined estimate of the probability of malignancy or by way of computer-extracted lesion features (characteristics) (Giger et al., 2003; Horsch et al., 2006). Others have combined the computer-extracted lesion characteristics with subjective similarity measures obtained from observers reviewing pairs of images (Muramatsu et al., 2005) or from observers giving subjective perceived ratings of lesion features (Zheng et al., 2007).
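A feature-based search of such an atlas can be sketched as a nearest-neighbor lookup. This is not the implementation of the system in Figure 24.4; the atlas entries, the three normalized feature values, and the use of Euclidean distance as the similarity measure are all assumptions made for illustration.

```python
import math

# Hypothetical atlas: each case has a known diagnosis and a vector of
# computer-extracted lesion features scaled to the range 0..1.
atlas = [
    {"id": "case-01", "truth": "malignant", "features": (0.9, 0.2, 0.8)},
    {"id": "case-02", "truth": "malignant", "features": (0.8, 0.3, 0.7)},
    {"id": "case-03", "truth": "benign",    "features": (0.2, 0.9, 0.3)},
    {"id": "case-04", "truth": "benign",    "features": (0.1, 0.8, 0.2)},
    {"id": "case-05", "truth": "malignant", "features": (0.7, 0.4, 0.9)},
]

def most_similar(query_features, atlas, k=3):
    """Return the k atlas cases closest to the query in feature space."""
    return sorted(
        atlas,
        key=lambda case: math.dist(query_features, case["features"]),
    )[:k]

unknown_lesion = (0.88, 0.22, 0.78)
for case in most_similar(unknown_lesion, atlas):
    # An interface could color-code each retrieved image by its known diagnosis.
    print(case["id"], case["truth"])
```

Searching by likelihood of malignancy would follow the same pattern with a one-dimensional key, and a psychophysical similarity measure would replace the distance function.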
24.3.4 Summary of the Impact of CAD
We have discussed the limitations of human observers in the perception and interpretation of medical images, and the promise of computerized methods for overcoming these limitations. Observer studies have provided evidence of the contribution of CAD to the diagnostic performance of human observers. The positive impact of CAD is typically that the computer’s true positive findings include abnormalities initially missed by the human observer, and hence appropriate use of CAD increases sensitivity. However, this positive impact often comes at the expense of more FP findings, e.g., an increased recall rate in screening mammography. Whether the positive impact can outweigh the negative impact depends on both the CAD system and the appropriate use of the system by the human observer. How to best utilize CAD is the topic of the next section.
24.4 How to Best Utilize CAD?
CAD in medical imaging is designed to help radiologists interpret medical images better, improve diagnostic accuracy, and, ultimately, benefit patient care. The most advantageous utilization of CAD in medical image interpretation relies on both the detection/diagnosis ability (i.e., standalone performance) of the computer system and proper use of the CAD output by the clinicians. Studies in mammography CADe have shown how the CAD performance level can affect radiologists’ performance in detecting subtle masses and microcalcification clusters. Results reported by Zheng et al. (2001, 2004) suggest that highly performing CAD schemes have the potential to significantly improve the detection performance of radiologists, whereas poorly performing schemes have little or even a negative effect on radiologists’ performance in identifying abnormalities depicted on mammograms.
CAD can be used in two reading paradigms (or modes): the so-called “second reader” paradigm and the “concurrent” paradigm. In the “second reader” paradigm, radiologists view the CAD output after they have made their initial interpretation without computer aid. This is the classic paradigm for which most CAD systems are currently designed, evaluated, and labeled. In this paradigm, CAD acts as a checker for abnormalities missed by an unassisted reader. One may argue that such a paradigm increases interpretation time compared with a single reading without computer aid, since it requires assessing the computer output in addition to a regular single reading.
A potentially more time-efficient paradigm is emerging as “concurrent reading,” which displays the CAD output at the start of image perception and interpretation. Although it is an intuitively attractive proposition, concurrent application of CAD may reduce observer vigilance and therefore reduce sensitivity. For example, Zheng et al. (2004) reported that, in CADe of mammographic masses, viewing CADe cues during the initial display consistently resulted in fewer abnormalities being identified in noncued regions.
In an investigation of the optimal CAD reader paradigm for CT colonography, Taylor et al. (2007) reported that CAD is more time-efficient in its “concurrent” mode than in its “second reader” mode, with similar sensitivity for polyps 6 mm or larger. However, the “second reader” paradigm maximized sensitivity, particularly for small lesions.
The relative pros and cons of different reading paradigms may need further investigation in real clinical settings, especially given the potentially different uses of CADe and CADx. However, it is crucially important to use CAD in the mode for which it is designed and evaluated.
Proper use of CADe output in the classic “second reader” paradigm may require that any initial human-detected region/lesion remains as a detection even if the CADe system does not indicate it. Such a requirement ensures that the radiologists’ detection sensitivity either remains constant or improves with computer assistance. On the other hand, such a requirement may not be necessary if the performance level of a CADe system is adequately high or the system is designed to be used in the “concurrent” mode.
Sufficient training of the radiologist is important for the best utilization of CAD since, as with any new modality, there will be a learning curve. Such training would include the use of the CAD interface so that the radiologist can extract and interpret the CAD output information appropriately. Familiarizing the radiologist with the typical correct and incorrect findings of the CAD system may help the radiologist better incorporate the CAD output into his/her final diagnostic decisions.
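The "never discard an initial detection" rule for the second-reader paradigm amounts to a set union, which is why sensitivity cannot fall. The sketch below uses hypothetical lesion and mark identifiers; the function names are illustrative only.

```python
def second_reader_decision(reader_marks, cade_marks, accepted_prompts):
    """Combine an unaided read with CADe output under the second-reader rule.

    Every mark from the initial unaided read is kept, even if CADe did not
    flag it; CADe prompts are added only when the reader accepts them.
    """
    added = set(cade_marks) & set(accepted_prompts)
    return set(reader_marks) | added

def sensitivity(marks, true_lesions):
    """Fraction of true lesions that were marked."""
    return len(set(marks) & set(true_lesions)) / len(true_lesions)

# Hypothetical case: L1..L4 are true lesions, X7 and X9 are false positives.
true_lesions = {"L1", "L2", "L3", "L4"}
reader_marks = {"L1", "L2", "X7"}   # unaided read
cade_marks = {"L2", "L3", "X9"}     # CADe does not flag L1
accepted_prompts = {"L3", "X9"}     # prompts the reader chose to act on

final_marks = second_reader_decision(reader_marks, cade_marks, accepted_prompts)

print("unaided sensitivity:", sensitivity(reader_marks, true_lesions))
print("aided sensitivity:  ", sensitivity(final_marks, true_lesions))
print("false positives:    ", sorted(final_marks - true_lesions))
```

In this made-up case L1 is retained although CADe missed it, L3 is gained from the CADe prompt, and the accepted false prompt X9 illustrates how the same rule can raise the FP count.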
Also, radiologists may better use CAD, and with more confidence, if intermediate CAD output is shown to help radiologists understand the image analyses in the “black box.” For example, Figure 24.9 shows the output from a commercial CADe system in which the radiologist can touch the lesion on the screen and the system will show the computer segmentation. Different algorithms may yield different segmentations (i.e., delineations) of lesions, as demonstrated in Figure 24.10, which shows a breast mass in mammography as outlined (“perceived”) by a radiologist and three different computer algorithms (region-growing method, radial gradient index-based method, and snake-based method, respectively; Huo et al., 1995; Kupinski and Giger, 1998; Yuan et al., 2007).
Figure 24.9 Output from a commercial computer-aided detection (CADe) system in which the user can touch the lesion on the screen and the system will show the computer segmentation.
Figure 24.10 Image of a breast mass in mammography as outlined (“perceived”) by a radiologist and three different computer algorithms (region-growing method, radial-gradient index (RGI)-based method, and snake-based method, respectively; Huo et al., 1995; Kupinski and Giger, 1998; Yuan et al., 2007).
If the segmentation is “poor” from the radiologist’s perspective, then the radiologist can assume that the subsequent computer analysis of the “lesion” will be erroneous, and thus he/she should proceed with caution in using the computer output.
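To see why different algorithms, or even different settings of one algorithm, can delineate the same lesion differently, consider a bare-bones region-growing sketch. It is a generic textbook form, not the method of Huo et al. (1995); the tiny image, the seed location, and the `tolerance` parameter are assumptions made for illustration.

```python
from collections import deque

def region_grow(image, seed, tolerance):
    """Grow a region from seed over 4-connected pixels of similar intensity.

    A pixel joins the region if its value differs from the seed value by
    no more than tolerance. Returns a set of (row, col) coordinates.
    """
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

# Hypothetical 5 x 5 patch: a bright core with a fainter fringe.
patch = [
    [10, 10, 10, 10, 10],
    [10, 50, 55, 30, 10],
    [10, 52, 60, 35, 10],
    [10, 10, 32, 10, 10],
    [10, 10, 10, 10, 10],
]

tight = region_grow(patch, seed=(2, 2), tolerance=10)
loose = region_grow(patch, seed=(2, 2), tolerance=30)
print(len(tight), "pixels with tolerance 10")
print(len(loose), "pixels with tolerance 30")
```

The looser setting absorbs the fainter fringe into the lesion, so any features computed from the outline (size, margin, shape) would differ between the two segmentations.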
24.5 Summary
The purpose of this chapter has been to present a broad overview of CAD in medical imaging from an image perception perspective. As a tool to assist humans in overcoming perceptual errors and other human limitations in the interpretation of medical images, CAD has seen tremendous growth in technical development, assessment methodology, and clinical applications. Laboratory and clinical observer studies have shown evidence that CAD can help detect cancers missed by human perception. Nonetheless, the increased sensitivity often comes at the cost of decreased specificity. Thus, research is needed for further development of computerized techniques in CAD systems, further development of human–computer interactions in image perception tasks, and further assessment of the acceptable levels for sensitivity and specificity. While we have focused on the overall impact of CAD on image perception, it should be noted that image perception research can have fundamental impacts on CAD: since CAD is designed to catch human perceptual errors, an overall understanding of the perceptual process may fundamentally impact the way CAD is designed and used (Krupinski et al., 1998).