This article gives a brief overview of the development of artificial intelligence (AI) in clinical breast imaging. For multiple decades, AI methods have been developed and translated for breast imaging tasks such as detection, diagnosis, and assessing response to therapy. As imaging modalities have emerged to support breast cancer screening programs and diagnostic examinations, including full-field digital mammography, breast tomosynthesis, ultrasound, and MRI, AI techniques have paralleled these efforts with more complex algorithms, faster computers, and larger data sets. AI methods include human-engineered radiomics algorithms and deep learning methods. Examples of these AI-supported clinical tasks are given, along with commentary on the future.
Key points
- AI in breast imaging is expected to impact both interpretation efficacy and workflow efficiency in radiology as it is applied to imaging examinations routinely obtained in clinical practice.
- AI algorithms, including human-engineered radiomics algorithms and deep learning methods, have been under development for multiple decades.
- AI can have a role in improving breast cancer risk assessment, detection, diagnosis, prognosis, assessment of response to treatment, and prediction of recurrence.
- Currently, the main use of AI algorithms is in decision support, where computers augment human decision-making as opposed to replacing radiologists.
Introduction
Breast cancer is the most commonly diagnosed cancer and the second leading cause of cancer death among women in the United States, with over 281,000 estimated new cases and 43,000 estimated deaths in 2021. Owing to its high prevalence, advancing clinical practice and basic research to predict risk, detect and diagnose the disease, and predict response to therapy has high potential impact. Over many decades, medical imaging modalities have been developed and used in routine clinical practice for these purposes in several capacities, including detection through screening programs, staging when a cancer is found, and planning and monitoring treatment. Screening with mammography is associated with a 20% to 40% reduction in breast cancer deaths. However, screening with mammography alone may be insufficient for women at high risk of breast cancer. For example, cancers can be missed at mammography in women with dense breasts because of the camouflaging effect of dense tissue. The need for more effective assessment strategies has led to the emergence of newer imaging techniques for supplemental screening and for diagnostic, prognostic, and treatment purposes, including full-field digital mammography (FFDM), multiparametric magnetic resonance imaging (MRI), digital breast tomosynthesis (DBT), and automated breast ultrasound (ABUS).
While imaging technologies have expanding roles in breast cancer care and have provided radiologists with multimodality diagnostic tools for various clinical scenarios, they have also increased the need for interpretation expertise and reading time. The desire to improve the efficacy and efficiency of clinical care continues to drive innovations, including artificial intelligence (AI). AI offers the opportunity to optimize and streamline the clinical workflow and to aid in many of the decision-making tasks of image interpretation. AI's capacity to recognize complex patterns in images, even those not noticeable or detectable by human experts, makes image interpretation a more quantitative and objective process. AI also excels at processing the sheer volume of information in multimodal data, giving it the potential to integrate not only multiple radiographic imaging modalities but also genomics, pathology, and electronic health records to perform comprehensive analyses and predictions.
AI-assisted systems, such as those for computer-aided detection (CADe), diagnosis (CADx), and triage (CADt), have been under development and in clinical deployment for decades, and their progress has accelerated in recent years with advances in computing power and modern algorithms. These AI methods extract and analyze large volumes of quantitative information from image data, assisting radiologists in image interpretation as a concurrent, secondary, or autonomous reader at various steps of the clinical workflow. It is worth noting that while AI systems hold promise in breast cancer image analysis, they also bring challenges and should be developed and used with due caution.
It is important to note that AI development involves two major aspects: (1) development of the AI algorithm and (2) evaluation of how it will eventually be used in practice. Currently, most AI systems are being developed and cleared by the US Food and Drug Administration (FDA) to augment the interpretation of the medical image, as opposed to autonomous use. These computer-aided modes of implementation include a second reader, a concurrent reader, means to triage cases for reading prioritization, and methods to rule out cases that might not require a human read (a partially autonomous use). In the evaluation of such methods, humans need to be involved, as in dedicated reader studies, to demonstrate effectiveness and safety. Table 1 provides a list of AI algorithms cleared by the FDA for various use cases in breast imaging.
Table 1. AI algorithms cleared by the FDA for use cases in breast imaging

| Product | Company | Modality | Use Case | Date Cleared |
|---|---|---|---|---|
| ClearView cCAD | ClearView Diagnostics Inc | US | Diagnosis | 12/28/16 |
| QuantX | Quantitative Insights, Inc | MRI | Diagnosis | 7/19/17 |
| Insight BD | Siemens Healthineers | FFDM, DBT | Breast density | 2/6/18 |
| DM-Density | Densitas, Inc | FFDM | Breast density | 2/23/18 |
| PowerLook Density Assessment Software | ICAD Inc | DBT | Breast density | 4/5/18 |
| DenSeeMammo | Statlife | FFDM | Breast density | 6/26/18 |
| Volpara Imaging Software | Volpara Health Technologies Limited | FFDM, DBT | Breast density | 9/21/18 |
| cmTriage | CureMetrix, Inc | FFDM | Triage | 3/8/19 |
| Koios DS | Koios Medical, Inc | US | Diagnosis | 7/3/19 |
| ProFound AI Software V2.1 | ICAD Inc | DBT | Detection, Diagnosis | 10/4/19 |
| densitas densityai | Densitas, Inc | FFDM, DBT | Breast density | 2/19/20 |
| Transpara | ScreenPoint Medical B.V. | FFDM, DBT | Detection | 3/5/20 |
| MammoScreen | Therapixel | FFDM | Detection | 3/25/20 |
| HealthMammo | Zebra Medical Vision Ltd | FFDM | Triage | 7/16/20 |
| WRDensity | Whiterabbit.ai Inc | FFDM, DBT | Breast density | 10/30/20 |
| Genius AI Detection | Hologic, Inc | DBT | Detection, Diagnosis | 11/18/20 |
| Visage Breast Density | Visage Imaging GmbH | FFDM, DBT | Breast density | 1/29/21 |
This article focuses on the research and development of AI systems for clinical breast cancer image analysis, covering the role of AI in the clinical tasks of risk assessment, detection, diagnosis, prognosis, treatment response monitoring, and prediction of recurrence risk. In addition to presenting applications by task, the article begins with an introduction to human-engineered radiomics and deep learning techniques and concludes with a discussion of current challenges in the field and future directions.
Human-engineered analytics and deep learning techniques
AI algorithms often use either human-engineered, analytical methods or deep learning methods in the development of machine intelligence tasks. Human-engineered features are mathematical descriptors, or model-driven analytics, developed to characterize lesions or tissue in medical images. These features quantify visually discernible characteristics, such as size, shape, texture, and morphology, collectively describing the phenotypes of the anatomy imaged. They can be automatically extracted from images of lesions by computer algorithms that encode analytical expressions, and machine learning models, such as linear discriminant analysis and support vector machines, can be trained on the extracted features to produce predictions for clinical questions. The extraction of human-engineered features often requires a prior segmentation of the lesion from the parenchymal background. Such feature extraction has been conducted on mammography, tomosynthesis, ultrasound, and MRI. For example, Fig. 1 presents a CADx pipeline that automatically segments breast lesions and extracts six categories of human-engineered radiomic features from dynamic contrast-enhanced (DCE) MRI on a workstation. Note that the extraction and interpretation of features depend on the imaging modality and the clinical task at hand.
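As an illustration of such a pipeline, the minimal sketch below extracts a few simplified size, shape, and first-order texture features from a segmented lesion and trains a support vector machine on them. The feature definitions and function names are illustrative stand-ins, not the validated descriptors of any particular CADx system, and segmentation is assumed to have been performed upstream.

```python
# Minimal sketch of a human-engineered radiomics pipeline (illustrative only).
import numpy as np
from scipy.stats import skew
from skimage.measure import label, regionprops
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def lesion_features(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Simple size, shape, and first-order texture features for one lesion."""
    props = regionprops(label(mask.astype(int)))[0]  # assumes a single lesion
    area = props.area                                         # size
    circularity = 4 * np.pi * area / props.perimeter ** 2     # shape regularity
    pixels = image[mask > 0]
    return np.array([
        area,
        circularity,
        pixels.mean(),   # average intensity within the lesion
        pixels.std(),    # first-order texture: intensity spread
        skew(pixels),    # first-order texture: histogram asymmetry
    ])

def train_cadx(X: np.ndarray, y: np.ndarray):
    """Train an SVM on feature rows X with benign(0)/malignant(1) labels y."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(X, y)
    return clf
```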
In addition, AI systems that use deep learning algorithms have been increasingly developed for health care applications in recent years. Deep learning is a subfield of machine learning that has seen a dramatic resurgence, largely driven by increases in computational power and the availability of large data sets. Some of the greatest successes of deep learning have been in computer vision, which has considerably accelerated AI applications in medical imaging. Numerous types of models, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), autoencoders, generative adversarial networks, and reinforcement learning, have been developed for medical imaging applications, where they automatically learn features that optimally represent the data for a given task during the training process. Medical images, nevertheless, pose a set of unique challenges to deep-learning-based computer vision methods, and breast imaging is no exception. For one, the high dimensionality and large size of medical images allow them to contain a wealth of clinically useful information but make them ill-suited to naive applications of standard CNNs developed for image recognition and object detection in natural images. Furthermore, medical imaging data sets are usually relatively small and can have incomplete or noisy labels. The lack of interpretability is another hurdle in adopting deep-learning-based AI systems for clinical use. Transfer learning, fusion and aggregation methods, multiple-instance learning, and explainable AI methods continue to be developed to address these challenges in applying deep learning to medical image analysis.
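As one concrete example of the mitigation strategies just named, the sketch below shows transfer learning in its simplest form: a ResNet-18 pretrained on natural images is repurposed for a hypothetical benign-versus-malignant classification task by replacing its final layer, so that only a small head must be trained on the limited medical data. Data loading and the training loop are omitted.

```python
# Minimal transfer-learning sketch for a two-class task (illustrative only).
import torch
import torch.nn as nn
from torchvision import models

def build_transfer_model(freeze_backbone: bool = True) -> nn.Module:
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    if freeze_backbone:
        # Keep the ImageNet-learned features fixed; only the new head trains.
        for param in model.parameters():
            param.requires_grad = False
    # Replace the 1000-class ImageNet head with a 2-class output
    # (benign vs. malignant).
    model.fc = nn.Linear(model.fc.in_features, 2)
    return model

model = build_transfer_model()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
```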
Human-engineered radiomics and deep learning methods for breast image analysis each have advantages and disadvantages with respect to computational efficiency, the amount of data required, preprocessing, interpretability, and prediction accuracy. They should be chosen based on the specific task and can potentially be integrated to maximize the benefits of each. The following sections cover both approaches for each clinical task where applicable.
Artificial intelligence in breast cancer risk assessment and prevention
In AI-assisted breast image analysis for the task of breast cancer risk assessment, computer vision techniques have been developed to extract, from normal tissue, quantitative biomarkers related to cancer risk factors. To improve upon the current one-size-fits-all screening programs, computerized risk assessment can potentially help estimate a woman's lifetime risk of breast cancer and thus support recommendations for risk-stratified screening protocols and/or preventative therapies to reduce overall risk. Risk models consider risk factors including demographics, personal history, family history, hormonal status, and hormonal therapy, as well as image-based characteristics such as breast density and parenchymal pattern; a sketch of such a model is given below.
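As a minimal, purely illustrative sketch of such a risk model, the example below combines a few conventional risk factors with an image-derived percent-density measure in a logistic regression. The variables, values, and labels are invented for illustration and carry no clinical meaning.

```python
# Toy risk model: conventional risk factors plus an image-derived density
# measure, combined by logistic regression (illustrative data only).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [age, family_history (0/1), hormone_therapy (0/1), percent_density]
X = np.array([
    [45, 0, 0, 12.0],
    [52, 1, 0, 38.5],
    [60, 0, 1, 22.3],
    [48, 1, 1, 51.0],
])
y = np.array([0, 1, 0, 1])  # cancer developed within follow-up window (toy)

risk_model = LogisticRegression().fit(X, y)
print(risk_model.predict_proba([[55, 1, 0, 40.0]])[:, 1])  # estimated risk
```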
Breast density and parenchymal patterns have been shown to be strong indicators in breast cancer risk estimation. Breast density refers to the amount of fibroglandular tissue relative to the amount of fatty tissue. These tissue types are distinguishable on FFDM because fibroglandular tissue attenuates the x-ray beam much more than fatty tissue does. Breast density has been assessed by radiologists using the four-category Breast Imaging Reporting and Data System (BI-RADS) density ratings proposed by the American College of Radiology. Computerized methods for assessing breast density include calculating the skewness of the gray-level histograms of FFDMs, as well as estimating volumetric density from the 2D projections on FFDMs. Automated assessment of breast density on mammograms is now routinely performed in breast cancer screening using FDA-cleared clinical systems.
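The histogram-skewness measure mentioned above can be written in a few lines. The sketch below assumes a precomputed breast mask; note that the sign and interpretation of the skewness depend on the image's intensity representation (for processing vs. for presentation).

```python
# Histogram-skewness density measure on an FFDM (illustrative sketch).
import numpy as np
from scipy.stats import skew

def density_skewness(ffdm: np.ndarray, breast_mask: np.ndarray) -> float:
    """Skewness of the gray-level histogram within the breast region."""
    return float(skew(ffdm[breast_mask > 0].ravel()))
```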
On a mammogram, the parenchymal pattern indicates the spatial distribution of dense tissue. To quantitatively evaluate the parenchymal pattern of the breast, various texture-based approaches have been investigated to characterize the spatial distribution of gray levels in FFDMs. Such radiomic texture analyses have been conducted using data sets from high-risk groups (eg, BRCA1/BRCA2 gene mutation carriers and women with contralateral cancer) and data sets from low- or average-risk groups (eg, routine screening populations). Results have shown that women at high risk of breast cancer tend to have dense breasts with parenchymal patterns that are coarse and low in contrast. Texture analysis on DBT images has also been conducted for risk assessment, with early results showing that texture features correlated with breast density better on DBT than on FFDM.
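A minimal sketch of such texture analysis is given below, using gray-level co-occurrence matrix (GLCM) features to summarize the parenchymal pattern within a region of interest. The particular features and parameters (quantization levels, distances, angles) are illustrative choices, not those of any cited study.

```python
# GLCM texture features for a parenchymal region of interest (illustrative).
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def parenchymal_texture(roi: np.ndarray, levels: int = 64) -> dict:
    """Compute GLCM contrast and correlation for a parenchymal ROI."""
    # Quantize the ROI to a small number of gray levels for a tractable GLCM.
    q = np.digitize(roi, np.linspace(roi.min(), roi.max(), levels)) - 1
    glcm = graycomatrix(q.astype(np.uint8), distances=[1],
                        angles=[0, np.pi / 2], levels=levels, normed=True)
    return {
        "contrast": graycoprops(glcm, "contrast").mean(),       # local variation
        "correlation": graycoprops(glcm, "correlation").mean(), # linear structure
    }
```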
Beyond FFDM and DBT, investigators have also assessed background parenchymal enhancement (BPE) on DCE-MRI. It has been shown that quantitative measures of BPE are associated with the presence of breast cancer and that relative changes in BPE percentages are predictive of breast cancer development after risk-reducing salpingo-oophorectomy. A more recent study showed that BPE is associated with an increased risk of breast cancer and that this risk is independent of breast density.
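One of several quantitative BPE definitions used in the literature is the mean percent signal enhancement of the fibroglandular tissue. The sketch below implements that definition, assuming registered pre- and post-contrast volumes and a fibroglandular-tissue mask computed elsewhere.

```python
# Mean percent enhancement as a quantitative BPE summary (illustrative).
import numpy as np

def mean_percent_enhancement(pre: np.ndarray, post: np.ndarray,
                             fgt_mask: np.ndarray, eps: float = 1e-6) -> float:
    """Mean percent enhancement of fibroglandular tissue after contrast."""
    pre_v, post_v = pre[fgt_mask > 0], post[fgt_mask > 0]
    return float(np.mean((post_v - pre_v) / (pre_v + eps)) * 100.0)
```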
Various deep learning methods for breast cancer risk assessment have been reported. One of these methods has shown strong agreement with BI-RADS density assessments by radiologists. Another deep learning approach has demonstrated performance superior to methods based on human-engineered features in assessing breast density on FFDMs, as deep learning algorithms can potentially extract information contained in FFDMs beyond the features defined by human-engineered analytical expressions. Moreover, studies have compared and merged radiomic texture analysis and deep learning approaches to characterize parenchymal patterns on FFDMs, showing that the combination yields improved results in predicting the risk of breast cancer (Fig. 2). Beyond FFDM analyses, a deep learning method based on U-Net has been developed to segment fibroglandular tissue on MRI for calculating breast density.
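To make the U-Net idea concrete, the sketch below shows a deliberately tiny two-level 2D U-Net of the kind that could be trained to segment fibroglandular tissue, followed by the density calculation as a ratio of segmented pixels. A clinically used model would be deeper, likely 3D, and trained on expert annotations; this is an architectural illustration only.

```python
# Tiny two-level U-Net for fibroglandular-tissue segmentation (illustrative).
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = conv_block(1, 16)
        self.down = nn.MaxPool2d(2)
        self.mid = conv_block(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = conv_block(32, 16)      # 32 = 16 skip + 16 upsampled channels
        self.head = nn.Conv2d(16, 1, 1)    # per-pixel FGT probability logit

    def forward(self, x):
        e = self.enc(x)                    # encoder features (skip connection)
        m = self.mid(self.down(e))         # bottleneck
        d = self.dec(torch.cat([e, self.up(m)], dim=1))
        return self.head(d)

def percent_density(fgt_mask: torch.Tensor, breast_mask: torch.Tensor) -> float:
    """Density as the fraction of breast pixels labeled FGT (boolean masks)."""
    return float((fgt_mask & breast_mask).sum() / breast_mask.sum() * 100)
```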
Artificial intelligence in breast cancer screening and detection
Detection of abnormalities in breast imaging is a common task for radiologists when reading screening images. The detection task refers to localizing a lesion, such as a mass, clustered microcalcifications, or an architectural distortion, within the breast. One challenge in detecting abnormalities is that dense tissue can mask an underlying lesion on a mammogram, resulting in missed cancers during breast cancer screening. In addition, radiologists' ability to detect lesions is limited by inaccurate assessment of subtle or complex patterns, suboptimal image quality, and fatigue. Therefore, although screening programs have contributed to a reduction in breast cancer–related mortality, the process tends to be costly, time-consuming, and error-prone. As a result, CADe methods have been in development for decades to serve as a reader alongside the radiologist in the task of finding suspicious lesions within images.
In the 1980s, CADe for clustered microcalcifications in digitized mammography was developed using a difference-image technique, in which a signal-suppressed image is subtracted from a signal-enhanced image to remove the structured background. Human-engineered features were extracted based on an understanding of how the signal presents on mammograms. With the introduction of FFDM, various radiomics methods have evolved and progressed over the years. In 1994, a shift-invariant artificial neural network was used for the computerized detection of clustered microcalcifications in breast cancer screening, in what was the first journal publication on the use of a CNN in medical image analysis (Fig. 3).
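A minimal sketch of the difference-image idea is shown below: smoothing with a small kernel preserves the tiny, high-contrast microcalcifications (signal-enhanced image), smoothing with a large kernel removes them (signal-suppressed image), and their difference cancels much of the structured background. The Gaussian filters, kernel sizes, and threshold here are illustrative substitutes for the original system's filters and parameters.

```python
# Difference-image sketch for microcalcification enhancement (illustrative).
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_image(mammogram: np.ndarray) -> np.ndarray:
    enhanced = gaussian_filter(mammogram, sigma=0.5)    # keeps calcifications
    suppressed = gaussian_filter(mammogram, sigma=4.0)  # background estimate
    return enhanced - suppressed   # structured background largely cancels

def candidate_calcifications(mammogram: np.ndarray, k: float = 4.0) -> np.ndarray:
    """Flag bright residual pixels as candidates via a simple global threshold."""
    diff = difference_image(mammogram)
    return diff > diff.mean() + k * diff.std()
```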
The ImageChecker M1000 system (version 1.2; R2 Technology, Los Altos, CA) was approved by the FDA in 1998, marking the first clinical translation of mammographic CADe. The system was approved for use as a second reader: the radiologist would first perform their own interpretation of the mammogram and only afterward view the CADe output. A potential lesion indicated by the radiologist but not by the computer output would not be eliminated, ensuring that sensitivity would not be reduced by the use of CADe. Clinical adoption increased as CADe systems continued to improve. By 2008, CADe systems were used in 70% of screening mammography studies in hospital-based facilities and 81% of those in private offices, and their use stabilized at over 90% of digital mammography screening facilities in the US from 2008 to 2016.
With the adoption of DBT in screening programs, the development of CADe methods for DBT images accelerated, first as a second reader and more recently as a concurrent reader. A multireader, multicase study evaluated a deep learning system developed to detect suspicious soft-tissue and calcified lesions in DBT images and found that concurrent use of the AI improved cancer detection accuracy and efficiency, as shown by an increased area under the receiver operating characteristic curve (AUC), sensitivity, and specificity, as well as a reduced recall rate and reading time.
The recommendation that high-risk patients receive additional screening with ABUS and/or MRI has motivated the development of CADe for these imaging modalities. In a reader study, an FDA-approved AI system for detecting lesions on 3D ABUS images was shown, when used as a concurrent reader, to reduce interpretation time while maintaining diagnostic accuracy. Another study developed a CNN-based method able to detect breast lesions on the early-phase images of DCE-MRI examinations, suggesting its potential use in screening programs with abbreviated MRI protocols.
Investigators continue working toward the ultimate goal of using AI as an autonomous reader in breast cancer screening and have delivered promising results. A recent study demonstrated that a deep learning system yielded a higher AUC than the average of six human readers and was noninferior to radiologists' double-reading consensus opinion; the authors also showed through simulation that the system could obviate double reading in 88% of UK screening cases while maintaining accuracy similar to the standard protocol. Another recent study proposed a deep learning model whose AUC was greater than the average AUC of 14 human readers, approximately halving the error, and found that combining radiologists' assessments with the model's predictions improved average specificity by 6.3% over human readers alone (Fig. 4); a sketch of this kind of combination appears below. It is worth noting that, like their predecessors, these new CADe systems require additional studies, especially prospective ones, to gauge their real-world performance, robustness, and generalizability before being introduced into the clinical workflow.
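The reader-plus-model combination reported in that study can be illustrated with a simple unweighted average of per-case scores, evaluated by AUC. The sketch below uses invented scores and labels purely for illustration; real evaluations would use far larger case sets and reader-specific operating points.

```python
# Combining reader and model scores, compared by AUC (illustrative data).
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1])               # biopsy-confirmed labels
reader = np.array([0.2, 0.4, 0.7, 0.5, 0.3, 0.9])   # reader suspicion scores
model = np.array([0.1, 0.3, 0.8, 0.7, 0.4, 0.8])    # model probabilities

combined = (reader + model) / 2                     # simple unweighted average
for name, scores in [("reader", reader), ("model", model), ("combined", combined)]:
    print(f"{name:9s} AUC = {roc_auc_score(y_true, scores):.2f}")
```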