Computer-Aided Detection of Lung Cancer



Fig. 2.1
Flowchart for a generic CADe scheme for detection of lesions in medical images





2.2.2 Enhancement of Lesions in CADe


Among the steps in CADe schemes, thresholding-based methods such as multiple thresholding (Xu et al. 1997; Aoyama et al. 2002; Bae et al. 2002; Giger et al. 1988) are often used for detection of lesion candidates in CT. With such methods, the specificity can generally be low, because normal structures of gray levels similar to those of lesions could be detected erroneously as lesions. To obtain a high specificity as well as sensitivity, some researchers employ a filter for enhancement of lesions before the lesion candidate detection step. Such a filter aims at enhancement of lesions and sometimes the suppression of noise. The filter enhances objects similar to a model employed in the filter. For example, a blob enhancement filter based on the Hessian matrix enhances sphere-like objects (Frangi et al. 1999). A difference image technique employs a filter designed for the enhancement of nodules and the suppression of noise in chest radiographs (Xu et al. 1997).

Actual lesions, however, are not simple enough to be modeled accurately by a simple equation in many cases. For example, a lung nodule is generally modeled as a solid sphere, but there are nodules of various shapes and with internal inhomogeneities such as spiculated opacity and ground-glass opacity. Thus, conventional filters often fail to enhance actual lesions. Moreover, such filters enhance any objects similar to a model employed in the filter. For example, a blob enhancement filter enhances not only spherical solid nodules but also any spherical parts of objects in the lungs such as vessel crossing, vessel branching, and a part of a vessel, which leads to a low specificity. Therefore, methods which can enhance actual lesions accurately (as opposed to enhancing a simple model) are demanded for improvement of the sensitivity and specificity of the lesion candidate detection and thus of the entire CAD scheme. To improve the performance of CADe schemes, investigators sometimes employ the step of enhancement of lesions after the step of the segmentation of the organ(s) of interest. This step aims to improve the sensitivity for detection of lesion candidates in the subsequent step. It often helps improve the specificity as well.


2.2.3 False-Positive Reduction


A machine learning technique (Suzuki 2013) is generally used in the step of classification of lesion candidates. The machine learning technique is trained with sets of input features and correct class labels. This class of machine learning is referred to as feature-based machine learning or simply as a classifier. The task of the machine learning here is to determine “optimal” boundaries for separating classes in the multidimensional feature space which is formed by the input features (Duda et al. 2001). Feature-based machine-learning algorithms include linear discriminant analysis (LDA) (Fukunaga 1990), quadratic discriminant analysis (QDA) (Fukunaga 1990), multilayer perceptron (one of the most popular artificial neural network (ANN) models) (Rumelhart et al. 1986), support vector machines (SVMs) (Vapnik 1995), and random forests. The structure of an ANN may be designed by using an automated design method such as sensitivity analysis (Suzuki et al. 2001; Suzuki 2004).
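As a minimal illustration of how a feature-based classifier places a boundary in feature space, the sketch below fits a two-class linear discriminant to synthetic two-dimensional features. The feature values, the function names `fit_lda` and `predict_lda`, and the equal-prior midpoint threshold are assumptions made for this example; they are not taken from any of the cited CADe schemes.

```python
import numpy as np

def fit_lda(X, y):
    """Fit a two-class Fisher linear discriminant.

    X: (n_samples, n_features) feature matrix; y: labels in {0, 1}.
    Returns weight vector w and bias b of the boundary w.x + b = 0.
    """
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    # Assumption: equal priors, so the boundary passes through the
    # midpoint between the two class means.
    b = -0.5 * w @ (m0 + m1)
    return w, b

def predict_lda(X, w, b):
    """Return 1 (lesion) where the discriminant is positive, else 0."""
    return (X @ w + b > 0).astype(int)

rng = np.random.default_rng(0)
# Synthetic feature vectors (hypothetical, e.g. two shape or gray-level features).
non_lesions = rng.normal([0.0, 0.0], 0.5, size=(100, 2))
lesions = rng.normal([3.0, 3.0], 0.5, size=(100, 2))
X = np.vstack([non_lesions, lesions])
y = np.array([0] * 100 + [1] * 100)

w, b = fit_lda(X, y)
accuracy = (predict_lda(X, w, b) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

In a real scheme the features would be computed from segmented lesion candidates, and the accuracy would be estimated on cases not used for training.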

Investigators often employ an additional step of the reduction of FPs at the end in a CADe scheme. The FP reduction step aims to improve the specificity of the CADe scheme. Reduction of FPs is very important, because a large number of FPs could adversely affect the clinical application of CADe. A large number of FPs are likely to confound the radiologist’s task of image interpretation and thus lower his/her efficiency. In addition, radiologists may lose their confidence in CADe as a useful tool.

Recently, as available computational power has increased dramatically, pixel-/patch-based machine learning (Suzuki 2012a) has emerged in medical image processing and analysis. It uses pixel values in images directly as input information, instead of features calculated from segmented regions; thus, neither feature calculation nor segmentation is required. Pixel-/patch-based machine learning has been used in the classification of the detected lesion candidates in CADe and CADx schemes (Suzuki et al. 2003a, 2005a, b, 2006b, 2008b, 2010a; Arimura et al. 2004).



2.3 Supervised “Lesion Enhancement” MTANN Filter


We believe that enhancing actual lesions requires some form of “learning from examples”; thus, machine learning plays an essential role in this task. To enhance actual lesions accurately, we developed a supervised filter based on a machine-learning technique called a massive training artificial neural network (MTANN) (Suzuki et al. 2003a) and used it in a CADe scheme for the detection of lung nodules in CT. By extension of “neural filters” (Suzuki et al. 2002a, b) and “neural edge enhancers” (Suzuki et al. 2003c, 2004b), which are ANN-based (Rumelhart et al. 1986) supervised nonlinear image-processing techniques, MTANNs (Suzuki et al. 2003a) have been developed for accommodating the task of distinguishing a specific opacity from other opacities in medical images. MTANNs have been applied for the reduction of false positives (FPs) in CADe of lung nodules in low-dose CT (Arimura et al. 2004; Suzuki et al. 2003a) and chest radiography (Suzuki et al. 2005b), for distinction between benign and malignant lung nodules in CT (Suzuki et al. 2005a), for enhancement of lung nodules in CT (Suzuki 2009), for suppression of ribs in chest radiographs (Suzuki et al. 2004a, 2006a; Chen et al. 2016; Chen and Suzuki 2013, 2014), and for reduction of FPs in computerized detection of polyps in CT colonography (Suzuki et al. 2006b, 2008b, 2010a, b).


2.3.1 Architecture of an MTANN Filter


The architecture of an MTANN supervised filter is shown in Fig. 2.2. An MTANN filter consists of a supervised regression model such as a linear-output ANN regression model (Suzuki et al. 2003c), which is a regression-type ANN capable of operating on pixel data directly. The MTANN filter is trained with input CT images and the corresponding “teaching” images that contain a map for the “likelihood of being lesions.” The pixel values of the input images are linearly scaled such that –1000 Hounsfield units (HUs) correspond to 0 and 1000 HUs correspond to 1. The input to the MTANN filter consists of pixel values in a subregion/volume (image patch), $V_S$, extracted from an input image. The output of the MTANN filter is a continuous scalar value, which is associated with the center pixel in the subregion/volume (image patch) and is represented by



$$ O\left( x, y, z\right)= NN\left({\overrightarrow{I}}_{x, y, z}\right), $$

(2.1)
where



$$ {\overrightarrow{I}}_{x, y, z}=\left\{ I\left( x- i, y- j, z- k\right)| i, j, k\in {V}_S\right\} $$

(2.2)
is the input vector to the MTANN; x, y, and z are the coordinate indices; NN(·) is the output of a supervised regression model (e.g., a linear-output ANN regression model); i, j, and k are the coordinate indices in $V_S$; and I(x, y, z) is the normalized voxel value of the input isotropic volume. The linear-output ANN regression employs a linear function, $f_L(u) = a \cdot u + 0.5$, instead of a sigmoid function, $f_S(u) = 1/\{1 + \exp(-u)\}$, as the activation function of the output layer unit because the characteristics and performance of an ANN are improved significantly with a linear function when applied to the continuous mapping of values in image processing (Suzuki et al. 2003c). Note that the activation function in the hidden layers is still a sigmoid function. For processing of the entire image, the scanning of an input CT image with the MTANN is performed pixel by pixel, as illustrated in Fig. 2.3b.
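The scanning operation of Eqs. (2.1) and (2.2) can be sketched for a single 2D section as follows. This is only an illustration under stated assumptions: the weights are random placeholders rather than trained values, the border handling (edge replication) is not specified in the text, the patch is read with positive offsets (equivalent to Eq. (2.2) for a symmetric patch up to the ordering of the inputs), and all function names are hypothetical. The 9 × 9 input and 20 hidden units follow the values reported for the experiments in this chapter.

```python
import numpy as np

def normalize_hu(hu):
    """Linearly map -1000 HU to 0 and 1000 HU to 1, as described in the text."""
    return (np.asarray(hu, dtype=float) + 1000.0) / 2000.0

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def mtann_output(patch_vec, W_h, b_h, w_o, b_o, a=0.01):
    """One forward pass: sigmoid hidden layer, linear output f_L(u) = a*u + 0.5."""
    hidden = sigmoid(W_h @ patch_vec + b_h)
    u = w_o @ hidden + b_o
    return a * u + 0.5

def scan_image(image, W_h, b_h, w_o, b_o, size=9, a=0.01):
    """Apply the network pixel by pixel; each output belongs to the patch center."""
    r = size // 2
    # Assumption: replicate edge pixels so that border pixels also get a full patch.
    padded = np.pad(image, r, mode="edge")
    out = np.empty(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            patch = padded[y:y + size, x:x + size].ravel()
            out[y, x] = mtann_output(patch, W_h, b_h, w_o, b_o, a)
    return out

rng = np.random.default_rng(0)
n_in, n_hidden = 9 * 9, 20
# Untrained random weights, used here only so that the sketch runs.
W_h = rng.normal(0.0, 0.1, (n_hidden, n_in))
b_h = np.zeros(n_hidden)
w_o = rng.normal(0.0, 0.1, n_hidden)
b_o = 0.0

ct_slice_hu = rng.integers(-1000, 1000, size=(16, 16))  # synthetic stand-in for a CT section
enhanced = scan_image(normalize_hu(ct_slice_hu), W_h, b_h, w_o, b_o)
print(enhanced.shape)
```

The explicit double loop mirrors the pixel-by-pixel description; a practical implementation would extract all patches at once and evaluate them in a single matrix product.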



Fig. 2.2
Architecture of an MTANN consisting of a linear-output ANN regression model with multiple layers, with subregion/volume input and single-pixel output




Fig. 2.3
Training and application of an MTANN filter for enhancement of lesions. (a) Training of an MTANN filter. (b) Application of the trained MTANN filter to a new CT image


2.3.2 Training of an MTANN Filter


For the enhancement of lesions and suppression of non-lesions in CT images, the teaching image T(x,y,z) contains a map of the “likelihood of being lesions,” as illustrated in Fig. 2.3a. To create the teaching image, we first segment lesions manually to obtain a binary image in which lesion pixels are 1 and non-lesion pixels are 0. Then, Gaussian smoothing is applied to the binary image to smooth the edges of the segmented lesions, because the likelihood of being a lesion should decrease gradually as the distance from the boundary of the lesion increases.
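The construction of a teaching image from a manual segmentation can be sketched as below. The Gaussian width (`sigma=2.0` pixels), the truncation of the kernel at three standard deviations, and the synthetic circular "lesion" are assumptions for illustration; the text does not state the smoothing parameters that were used.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1D Gaussian kernel truncated at about 3 sigma and normalized to sum to 1."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def make_teaching_image(mask, sigma=2.0):
    """Smooth a binary lesion mask (1 = lesion, 0 = non-lesion) into a likelihood map."""
    k = gaussian_kernel(sigma)
    smooth = mask.astype(float)
    # Separable Gaussian smoothing: filter along rows, then along columns.
    smooth = np.apply_along_axis(np.convolve, 1, smooth, k, mode="same")
    smooth = np.apply_along_axis(np.convolve, 0, smooth, k, mode="same")
    return smooth

# Synthetic manual segmentation: a disk of radius 6 pixels centered at (16, 16).
yy, xx = np.mgrid[:32, :32]
mask = ((yy - 16) ** 2 + (xx - 16) ** 2 <= 36).astype(np.uint8)

teaching = make_teaching_image(mask)
# The likelihood falls off with distance from the lesion.
print(teaching[16, 16], teaching[16, 24], teaching[16, 30])
```

The resulting map is close to 1 inside the lesion, decays smoothly across its boundary, and is 0 far from it.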

The MTANN filter involves training with a large number of pairs of subregions/volumes (image patches) and pixels/voxels. For enrichment of the training samples, a training image, $V_T$, extracted from the input CT image is divided pixel by pixel into a large number of subregions/volumes (image patches). Note that close subregions/volumes overlap each other. Single pixels are extracted from the corresponding teaching image as teaching values. The MTANN filter is massively trained by use of each of a large number of input subregions/volumes (image patches) together with each of the corresponding teaching single pixels/voxels, hence the term “massive training ANN.” The error to be minimized by training of the MTANN filter is given by



$$ E=\frac{1}{P}\sum_c \sum_{\left( x, y, z\right)\in {V}_T}{\left\{{T}_c\left( x, y, z\right)-{O}_c\left( x, y, z\right)\right\}}^2, $$

(2.3)
where c is a training case number and P is the total number of training voxels in $V_T$. The MTANN filter is trained by a linear-output backpropagation (BP) algorithm (Suzuki et al. 1995, 2003c), in which the generalized delta rule (Rumelhart et al. 1986) is applied to the linear-output ANN architecture (Suzuki et al. 2003c). This algorithm was derived for the linear-output ANN model by using the same method as that used for deriving the original BP algorithm (Rumelhart et al. 1986); see Suzuki et al. (1995, 2003c) for the details and the properties of the linear-output BP algorithm. After training, the MTANN filter is expected to output the highest value when a lesion is located at the center of the subregion of the MTANN filter, a lower value as the distance from the subregion center increases, and zero when the input subregion contains a non-lesion.
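A toy version of this training procedure is sketched below: the error of Eq. (2.3) is evaluated over all training voxels (with the cases stacked into one array), and the weights are updated by plain batch gradient descent through a sigmoid hidden layer and a linear output unit. The synthetic 3 × 3 patches, the teaching values, the network size, the learning rate, and the slope `a = 1.0` are assumptions chosen to keep the example small and well conditioned; they are not the settings of the cited linear-output BP algorithm.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(X, W, b, v, c, a):
    """Sigmoid hidden layer followed by the linear output f_L(u) = a*u + 0.5."""
    H = sigmoid(X @ W.T + b)
    return H, a * (H @ v + c) + 0.5

def mtann_error(T, O):
    """Eq. (2.3): squared differences summed over all training voxels, divided by P."""
    return np.sum((T - O) ** 2) / T.size

def train_step(X, T, W, b, v, c, a, lr):
    """One batch gradient-descent update of all weights."""
    P = T.size
    H, O = forward(X, W, b, v, c, a)
    g_out = 2.0 * a * (O - T) / P               # dE/du for each training voxel
    g_hid = np.outer(g_out, v) * H * (1.0 - H)  # propagated back through the sigmoid
    W = W - lr * (g_hid.T @ X)
    b = b - lr * g_hid.sum(axis=0)
    v = v - lr * (H.T @ g_out)
    c = c - lr * g_out.sum()
    return W, b, v, c

rng = np.random.default_rng(0)
X = rng.random((200, 9))      # 200 synthetic 3x3 patches, flattened
T = X.mean(axis=1)            # synthetic teaching values in [0, 1]
W = rng.normal(0.0, 0.1, (5, 9))
b = np.zeros(5)
v = rng.normal(0.0, 0.1, 5)
c = 0.0
a, lr = 1.0, 0.1              # assumed values for this toy problem

history = []
for _ in range(500):
    history.append(mtann_error(T, forward(X, W, b, v, c, a)[1]))
    W, b, v, c = train_step(X, T, W, b, v, c, a, lr)
history.append(mtann_error(T, forward(X, W, b, v, c, a)[1]))
print(f"error before: {history[0]:.5f}, after: {history[-1]:.5f}")
```

Each row of `X` plays the role of one input subregion and each entry of `T` the corresponding teaching pixel, which is the pairing that gives "massive training" its name.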

In the computer vision field, a technology called deep learning (Lecun et al. 2015; Mnih et al. 2015) or deep convolutional neural networks (Krizhevsky et al. 2012) has attracted enthusiastic attention from the research community and industry. The deep convolutional neural networks (Krizhevsky et al. 2012) were able to classify objects in images 20 % more correctly than did other existing classifiers that had been studied in the field over the past three decades. The MTANN approach is similar to deep convolutional neural networks in that both use image patches as input, but there are differences: (1) the output of the MTANN is an image, whereas that of deep learning is a class label (e.g., cancer or non-cancer); (2) the MTANN can do image processing and pattern enhancement, but deep learning cannot; (3) the MTANN requires a very small number of training samples, but deep learning requires on the order of a million samples; and (4) the MTANN has a simpler architecture and training procedure and is thus easy to train.


2.3.3 Experiments



2.3.3.1 Database of Lung Nodules in CT


To test the performance of the MTANN filter, we applied it to our CT database consisting of 69 lung cancers in 69 patients (Li et al. 2002). The scans used for this study were acquired with a low-dose protocol of 120 kVp, 25 mA or 50 mA, 10 mm collimation, and 10 mm reconstruction interval at a helical pitch of two. The reconstructed CT images were 512 × 512 pixels in size with a section thickness of 10 mm. The 69 CT scans consisted of 2052 sections (slices). All cancers were confirmed either by biopsy or surgically. The locations of the cancers were determined by an expert chest radiologist.


2.3.3.2 Enhancement of Nodules in the Lungs in CT


To limit the processing area to the lungs, we segmented the lung regions in a CT image by the use of thresholding based on Otsu’s threshold value determination (Otsu 1979). Then, we applied a “rolling-ball” technique (Hanson 1992), which is a mathematical morphology operator, along the outlines of the extracted lung regions to include a nodule attached to the pleura in the segmented lung regions (Armato et al. 2001).
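Otsu's threshold selection, used here for the initial lung segmentation, chooses the gray level that maximizes the between-class variance of the histogram. The sketch below is a generic implementation on synthetic values; the bin count, the synthetic intensity distributions, and the function name are assumptions, and the subsequent rolling-ball step is not shown.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes the between-class variance (Otsu 1979)."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist.astype(float) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)             # probability of the lower class up to each bin
    w1 = 1.0 - w0                 # probability of the upper class
    mu = np.cumsum(p * centers)   # cumulative first moment
    mu_total = mu[-1]
    valid = (w0 > 0) & (w1 > 0)
    sigma_b = np.zeros_like(w0)
    sigma_b[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    k = int(np.argmax(sigma_b))
    return edges[k + 1]           # upper edge of the last bin in the lower class

rng = np.random.default_rng(0)
# Synthetic pixel values in HU: an air-like lung population and a soft-tissue population.
lung_like = rng.normal(-850.0, 50.0, 5000)
tissue_like = rng.normal(0.0, 50.0, 5000)
pixels = np.concatenate([lung_like, tissue_like])

threshold = otsu_threshold(pixels)
lung_mask = pixels < threshold    # darker (air-filled) pixels are lung candidates
print(f"threshold: {threshold:.0f} HU, fraction below: {lung_mask.mean():.2f}")
```

On a real section the pixels below the threshold also include the air outside the body, so an additional step is needed to keep only the lungs.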

To enhance lung nodules in CT images, we trained an MTANN filter with 13 lung nodules from a training database, which was different from the testing database, and with the corresponding “teaching” images that contained maps for the “likelihood of being nodules,” as illustrated in Fig. 2.3a. To obtain the training regions, $V_T$, we applied a mathematical morphology opening operator to the lung nodules that were segmented manually (i.e., binary regions) such that the training regions sufficiently covered nodules and surrounding normal structures. The number of hidden units was selected to be 20 by use of a method for designing the structure of an ANN (Suzuki et al. 2001; Suzuki 2004). The method is a sensitivity-based pruning method, i.e., the training error was calculated with each unit removed in turn, and the unit whose removal yielded the smallest training error was removed. Removing the redundant hidden units and retraining for recovering the potential loss due to the removal were performed alternately, resulting in a reduced structure where redundant units were removed. The size of the input subregion, $R_S$, was 9 by 9 pixels, which was determined experimentally in our previous studies, i.e., the highest performance was obtained with this size (Arimura et al. 2004; Suzuki and Doi 2005; Suzuki et al. 2003a); thus, the number of input units in the MTANN filter is 81. The slope of the linear function, a, was 0.01. With the parameters above, the MTANN filter was trained for 1,000,000 iterations. To test the performance, we applied the trained MTANN filter to the entire lungs. We applied thresholding to the output images of the trained MTANN filter to detect nodule candidates. We compared the results of nodule candidate detection with and without the MTANN filter.
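One selection step of sensitivity-based pruning can be sketched as follows: each hidden unit is removed in turn, the training error is recomputed, and the unit whose removal gives the smallest error is chosen for removal. The retraining that alternates with removal is omitted here. The tiny network, its weights, and the function names are hypothetical and serve only to make the selection rule concrete.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def training_error(X, T, W, b, v, c, a, active):
    """Mean squared training error with only the active hidden units contributing."""
    H = sigmoid(X @ W.T + b) * active   # removed units output nothing
    O = a * (H @ v + c) + 0.5
    return np.mean((T - O) ** 2)

def least_important_unit(X, T, W, b, v, c, a, active):
    """Try removing each active hidden unit; return the one whose removal hurts least."""
    best_unit, best_err = None, np.inf
    for j in np.flatnonzero(active):
        trial = active.copy()
        trial[j] = False
        err = training_error(X, T, W, b, v, c, a, trial)
        if err < best_err:
            best_unit, best_err = int(j), err
    return best_unit, best_err

rng = np.random.default_rng(0)
X = rng.random((100, 9))
W = rng.normal(0.0, 1.0, (4, 9))
b = np.zeros(4)
v = np.array([1.0, -2.0, 0.0, 1.5])   # unit 2 has zero output weight, so it is redundant
c, a = 0.0, 0.01
active = np.ones(4, dtype=bool)

# Teaching values that the full network reproduces exactly (for illustration only).
T = a * (sigmoid(X @ W.T + b) @ v + c) + 0.5

unit, err = least_important_unit(X, T, W, b, v, c, a, active)
print(f"remove hidden unit {unit} (training error after removal: {err:.2e})")
```

In the full procedure this step would be followed by retraining, and the cycle would repeat until further removal degrades the training error unacceptably.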


2.3.3.3 A CAD Scheme Incorporating the MTANN Lesion Enhancement


A previously reported CAD scheme (Arimura et al. 2004) for detection of lung nodules in thoracic CT is shown in Fig. 2.4a. The CAD scheme employs a standard approach which consists of lung segmentation, a difference image technique for enhancing nodules (Xu et al. 1997), multiple thresholding for detection of nodule candidates, segmentation of the detected nodule candidates, feature analysis of the segmented nodule candidates, a rule-based scheme for reduction of FPs, and classification based on linear discriminant analysis (LDA) for the final FP reduction. The difference image technique uses two different filters: a matched filter is used for enhancing nodule-like objects in CT images, and a ring-average filter is used for suppressing nodule-like objects. We incorporated the MTANN lesion enhancement filter in our CAD scheme to improve the overall performance. A schematic diagram of our MTANN-based CAD scheme is shown in Fig. 2.4b. In the MTANN-based CAD scheme, nodule candidates are detected (localized) by the MTANN lesion enhancement filter followed by thresholding. The detected nodule candidates generally include true positives and mostly FPs.



Fig. 2.4
Comparison of a standard CAD scheme with an MTANN-based CAD scheme. (a) Schematic diagram of a standard CAD scheme. (b) Schematic diagram of an MTANN-based CAD scheme


2.3.4 Results



2.3.4.1 Enhancement of Nodules in the Lungs on CT Images


We applied the trained MTANN filter to original CT images. The results of enhancement of nodules in CT images by the trained MTANN filter (Suzuki et al. 2008a) are shown in Fig. 2.5. The MTANN filter enhances nodules and suppresses most of the normal structures in CT images. Although some medium-sized vessels remain in the output image, the nodule with spiculation is enhanced well. We applied thresholding with a single threshold value (65 % of the maximum gray scale) to the output images of the trained MTANN filter. We compared the MTANN nodule enhancement filter with a sphere enhancement filter (Li et al. 2003) based on the Hessian matrix (Frangi et al. 1999), as shown in Fig. 2.6. There are fewer nodule candidates in the MTANN-based images, whereas there are many nodule candidates in the binary images obtained by using the sphere enhancement filter. The MTANN filter followed by thresholding identified 97 % (67/69) of cancers with 6.7 FPs per section, which is a substantial improvement over the performance (96 % sensitivity with 19.3 FPs/section) of our previously reported CAD scheme without MTANNs.
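Candidate detection by single-level thresholding can be sketched as follows. Two points are assumptions of this example rather than details given in the text: "65 % of the maximum gray scale" is interpreted as 65 % of the maximum value in the output image, and connected pixels are grouped with 4-connectivity. The small synthetic output image and the function name are likewise hypothetical.

```python
import numpy as np
from collections import deque

def detect_candidates(output, fraction=0.65):
    """Threshold at a fraction of the maximum output and return candidate centroids."""
    binary = output >= fraction * output.max()
    seen = np.zeros(binary.shape, dtype=bool)
    centroids = []
    for y, x in zip(*np.nonzero(binary)):
        if seen[y, x]:
            continue
        # Breadth-first flood fill collects one 4-connected component.
        queue, pixels = deque([(y, x)]), []
        seen[y, x] = True
        while queue:
            cy, cx = queue.popleft()
            pixels.append((cy, cx))
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if (0 <= ny < binary.shape[0] and 0 <= nx < binary.shape[1]
                        and binary[ny, nx] and not seen[ny, nx]):
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        centroids.append(tuple(float(m) for m in np.mean(pixels, axis=0)))
    return centroids

# Synthetic filter output: two strongly enhanced blobs and one weak residual structure.
output = np.zeros((20, 20))
output[4:7, 4:7] = 1.0       # strongly enhanced blob
output[12:15, 12:15] = 0.8   # second blob above the threshold
output[4:6, 14:16] = 0.3     # weak residual, below the threshold

candidates = detect_candidates(output)
print(candidates)
```

Each centroid would then be compared with the radiologist-determined cancer locations to count true positives and FPs.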



Fig. 2.5
Lesion enhancement by using a supervised MTANN lesion enhancement filter. (a) Original axial CT slice with a lung nodule (indicated by an arrow). (b) Output image of the trained MTANN nodule enhancement filter




Fig. 2.6
Comparison of nodule enhancement by the conventional sphere enhancement filter based on the Hessian matrix and our supervised MTANN “nodule” enhancement filter


2.4 False-Positive Reduction with MTANNs


Reduction of FPs is very important, because a large number of FPs could adversely affect the clinical application of CADe. A large number of FPs are likely to confound the radiologist’s task of image interpretation and thus lower his/her efficiency. In addition, radiologists may lose their confidence in CADe as a useful tool. Suzuki et al. developed an FP reduction technique based on MTANNs (Suzuki et al. 2003a) for reduction of FPs in a CADe scheme for lung nodules in CT. The MTANNs were trained to enhance lung nodules and suppress various types of FPs (i.e., non-nodules) such as lung vessels.


2.4.1 Database of Low-Dose CT Images


The database used in this study consisted of 101 noninfused, low-dose thoracic helical CT (LDCT) scans acquired from 71 different patients who participated voluntarily in a lung cancer screening program between 1996 and 1999 in Nagano, Japan. The CT examinations were performed on a mobile CT scanner (CT-W950SR; Hitachi Medical, Tokyo, Japan). The scans used for this study were acquired with a low-dose protocol of 120 kVp, 25 mA (54 scans) or 50 mA (47 scans), 10 mm collimation, and 10 mm reconstruction interval at a helical pitch of two. The pixel size was 0.586 mm for 83 scans and 0.684 mm for 18 scans. Each reconstructed CT section (slice) had an image matrix size of 512 × 512 pixels. We used 38 of 101 LDCT scans which were acquired from 31 patients as a training set for our CAD scheme. The 38 scans consisted of 1057 sections and contained 50 nodules, including 38 “missed” nodules that represented biopsy-confirmed lung cancers and were not reported or misreported during the initial clinical interpretation. The remaining 12 nodules in the scans were classified as “confirmed benign” (n = 8), “suspected benign” (n = 3), or “suspected malignant” (n = 1). The confirmed benign nodules were determined by biopsy or by follow-up over a period of 2 years. The suspected benign nodules were determined by follow-up of less than 2 years. The suspected malignant nodule was determined on the basis of results of follow-up diagnostic CT studies; no biopsy results were available. We used 63 of 101 LDCT scans which were acquired from 63 patients as a test set. The 63 scans consisted of 1765 sections and contained 71 nodules, including 66 primary cancers that were determined by biopsy and five confirmed benign nodules that were determined by biopsy or by follow-up over a period of 2 years. The scans included 23 scans from the same 23 patients as those in the training set, which were acquired at a different time (the interval was about 1 year or 2 years). Thus, the training set consisted of 38 LDCT scans including 50 nodules, and the test set consisted of 63 LDCT scans including 71 confirmed nodules.

The nodule size was determined by an experienced chest radiologist and ranged from 4 to 27 mm. The mean diameter of the 50 nodules in the training set was 12.7 ± 6.1 mm, and that of the 71 nodules in the test set was 13.5 ± 4.7 mm. In the training set, 38 % of nodules were attached to the pleura, 22 % of nodules were attached to vessels, and 10 % of nodules were in the hilum. In the test set, 30 % of nodules were attached to the pleura, 34 % of nodules were attached to vessels, and 7 % of nodules were in the hilum. Three radiologists classified the nodules in the training set into three categories: pure ground-glass opacity (pure GGO; 40 % of nodules), mixed GGO (28 %), and solid nodule (32 %). The nodules in the test set were classified as pure GGO (24 %), mixed GGO (30 %), and solid nodule (46 %).


2.4.2 Scheme for Lung Nodule Detection in Low-Dose CT


Technical details of our current scheme have been published previously (Armato et al. 1999, 2001). With our current CAD scheme, the multiple gray-level thresholding technique initially identified 20,743 nodule candidates in 1057 sections of LDCT images in the training set. Forty-five of 50 nodules were correctly detected. Then a rule-based classifier followed by a series of two linear discriminant classifiers was applied for removal of some false positives, thus yielding a detection of 40 (80.0 %) of 50 nodules (from 22 patients) together with 1078 (1.02 per section) false positives. The sizes of the 10 false-negative nodules ranged from 5 mm to 25 mm, and the mean diameter was 13.2 ± 6.1 mm. In this study, we used all 50 nodules, the locations of which were identified by the radiologist, and all 1078 false positives generated by our CAD scheme in the training set, for investigating the characteristics of the MTANN and training the MTANN. The use of radiologist-extracted true nodules with computer-generated false positives was intended to anticipate future improvements in the nodule detection sensitivity of our CAD scheme. When a nodule was present in more than one section, the section in which the nodule appeared largest was used. When we applied our current CAD scheme to the test set, a sensitivity of 81.7 % (58 of 71 nodules) with 0.98 false positives per section (1726/1765) was achieved. We used the 58 true positives (nodules from 54 patients) and 1726 false positives (non-nodules) for testing the MTANN in a validation test.
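The performance figures quoted above follow directly from the reported counts. The short sketch below recomputes them; the dictionary layout and function names are chosen for this example only.

```python
def sensitivity(detected, total):
    """Fraction of true nodules that were detected."""
    return detected / total

def fps_per_section(false_positives, sections):
    """Average number of false positives per CT section."""
    return false_positives / sections

# Counts as reported in the text for the scheme before MTANN-based FP reduction.
results = {
    "training": {"nodules": 50, "detected": 40, "false_positives": 1078, "sections": 1057},
    "test": {"nodules": 71, "detected": 58, "false_positives": 1726, "sections": 1765},
}

summary = {}
for name, r in results.items():
    sens = sensitivity(r["detected"], r["nodules"])
    fps = fps_per_section(r["false_positives"], r["sections"])
    summary[name] = (round(100 * sens, 1), round(fps, 2))
    print(f"{name}: sensitivity {100 * sens:.1f} %, {fps:.2f} FPs per section")
```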


2.4.3 MTANN for FP Reduction



2.4.3.1 Architecture


The architecture and training method of the MTANN for FP reduction are shown in Fig. 2.7. When the task is the distinction between nodules and non-nodules, the output would be interpreted as the “likelihood of being a nodule.” In order to distinguish between nodules and various types of non-nodules, we extended the capability of the single MTANN and developed a multiple MTANN (multi-MTANN). The architecture of a mixture of expert MTANNs (multi-MTANN) is shown in Fig. 2.8. The multi-MTANN consists of multiple MTANNs that are arranged in parallel. Each MTANN is trained by using a different type of non-nodule, but with the same nodules. Each MTANN acts as an expert for the distinction between nodules and a specific type of non-nodule, e.g., MTANN No. 1 is trained to distinguish nodules from false positives caused by medium-sized vessels; MTANN No. 2 is trained to distinguish nodules from soft-tissue-opacity false positives caused by the diaphragm; and so on. A scoring method is applied to the output of each MTANN, and then thresholding of the score from each MTANN is performed for distinction between nodules and the specific type of non-nodule. The outputs of the MTANNs are then integrated by the integration ANN or the logical AND operation. If each MTANN can eliminate the specific type of non-nodule with which the MTANN is trained, then the multi-MTANN will be able to reduce a larger number of false positives than does a single MTANN.
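The logical AND integration can be sketched as follows: a candidate is kept as a nodule only if the score from every expert MTANN reaches that expert's threshold, so a single expert suffices to reject a false positive. The scores, thresholds, and candidate names below are hypothetical, and the scoring method that converts each MTANN output image into a score is not shown.

```python
def multi_mtann_and(scores, thresholds):
    """Keep a candidate only if every expert scores it at or above its own threshold."""
    return all(s >= t for s, t in zip(scores, thresholds))

# Hypothetical thresholds for two experts: No. 1 (vessels) and No. 2 (diaphragm).
thresholds = [0.5, 0.4]

# Hypothetical scores from the two experts for three candidates.
candidates = {
    "nodule": [0.9, 0.8],              # high for both experts, so it is kept
    "vessel": [0.1, 0.7],              # rejected by expert No. 1
    "diaphragm_opacity": [0.6, 0.2],   # rejected by expert No. 2
}

kept = {name: multi_mtann_and(s, thresholds) for name, s in candidates.items()}
print(kept)
```

Because each expert only needs to separate nodules from one type of non-nodule, its threshold can be set conservatively so that few nodules are lost while its target false positives are removed.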



Fig. 2.7
Architecture and training of an MTANN for classification of candidates into a nodule or a non-nodule. A teaching image for a nodule contains a Gaussian distribution at the center of the image, whereas that for a non-nodule contains zero (i.e., it is completely dark)




Fig. 2.8
Architecture of a mixture of expert MTANNs consisting of multiple MTANNs combined by the integration ANN


2.4.3.2 Training of MTANN


For the enhancement of nodules and suppression of non-nodules in CT images, the teaching volume contains a 3D distribution of values that represent the “likelihood of being a nodule.” We used a 3D Gaussian distribution with standard deviation $\sigma_T$, the peak of which is located at the center of the nodule, as a teaching volume for a nodule and a volume that contains all zeros for a non-nodule, represented by
Jul 3, 2017 | Posted by in GENERAL RADIOLOGY | Comments Off on Computer-Aided Detection of Lung Cancer
