Clinical Artificial Intelligence Applications

We present an overview of current clinical musculoskeletal imaging applications for artificial intelligence, as well as potential future applications and techniques.

Key points

• Deep learning for MSK imaging can be difficult because of the subtlety of pathology, the intricate biomechanical interplay of structures in diseases requiring clinical intuition to fully diagnose, and the overall breadth of the specialty.
• There are several built-in advantages to deep learning for MSK imaging. These include ease of coregistration and localization due to the inherent rigidity of osseous structures and the ease of obtaining high-quality images owing to the widespread use of surface coils and overall patient comfort when scanning extremities.
• Most deep learning models for musculoskeletal radiology tackle diagnosing and classifying injuries of the osseous structures.
• Many time-consuming tasks in MSK imaging, such as measuring limb length, bone age assessment, and quantifying muscle volumes, are accelerated by deep learning.
• As in other subspecialties, deep learning can aid in nonimaging interpretation aspects of MSK imaging including accelerating study acquisition, allowing for modality-to-modality conversion, and improving the quality of images with regard to signal-to-noise ratio.

Introduction

Unique Challenges of Musculoskeletal Radiology Artificial Intelligence

Musculoskeletal (MSK) radiology is a subspecialty of radiology that involves the interpretation of advanced medical images for the diagnosis of disorders of the osseous structures, joints, and their associated soft tissues, along with the performance of minimally invasive image-guided procedures to treat those disorders. Although many applications of artificial intelligence (AI) target disorders of other organ systems, MSK disorders present several unique challenges that have limited the rapid development of AI solutions. These difficulties involve not only the broad range of subtle diseases inherent to bone radiology but also the complex interplay of biomechanics that requires learned intuition.

The first obstacle relates to one of the largest current limitations of AI as it pertains to replacing human radiologists: each trained algorithm can consider only one or a very limited number of diagnoses at any one time. Unfortunately, MSK radiology involves a large number of differential diagnoses, spanning a large proportion of the total known diagnoses in the entire specialty. One popular online radiology encyclopedia (Radiopaedia) boasts over 55,000 radiology articles and cases, of which over 15,000 (28%) are categorized under MSK imaging. It is apparent that entirely replacing the duties of an MSK radiologist with the current technology would require a prohibitive number of different algorithms.

Another obstacle involves the often subtle abnormalities in several significant MSK diseases or injuries, a problem that is compounded by the relatively large number of other structures that are unrelated to these subtle abnormalities. For example, when considering an elbow MR imaging study in a professional pitcher, a partial tear of the pitcher's single most important ligament, the ulnar collateral ligament, could lead to debilitating elbow instability and downstream degenerative changes that could prematurely end a career if not addressed. Detection of this partial tear may be limited to detecting less than a handful of abnormal voxels adjacent to the sublime tubercle of the ulna on one view out of the tens of millions of voxels that comprise the entire study ( Fig. 1 ). This is a problem because contemporary deep learning techniques are based on statistical modeling. One intuitive way to view this is to think of subtle abnormalities as having an exceedingly small signal-to-noise ratio (SNR), where the “signal” is the abnormality of interest and the “noise” is any other pixel or voxel. Additionally, this is a problem because most pipelines involve downsampling of original resolution volumes and images, thereby further reducing the amount of “signal” present in these subtle abnormalities.
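The dilution effect described above can be made concrete with a small sketch. The volume size, voxel count, and intensity values below are hypothetical and chosen only for illustration, and average pooling is assumed as the downsampling step (real pipelines may use other interpolation schemes).

```python
def abnormal_fraction(n_abnormal, volume_shape):
    """Fraction of voxels in a volume that belong to the abnormality."""
    total = 1
    for dim in volume_shape:
        total *= dim
    return n_abnormal / total


def average_pool_1d(row, factor):
    """Downsample a row of intensities by averaging non-overlapping windows."""
    return [
        sum(row[i:i + factor]) / factor
        for i in range(0, len(row) - factor + 1, factor)
    ]


# Hypothetical single sequence: 5 abnormal voxels in a 512 x 512 x 40 volume.
fraction = abnormal_fraction(5, (512, 512, 40))

# One bright abnormal voxel (1.0) on a dark background (0.0).
row = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
pooled = average_pool_1d(row, 4)  # the peak drops from 1.0 to 0.25
```

In this toy case the abnormality occupies fewer than one in a million voxels, and a fourfold downsampling reduces its peak intensity to a quarter of the original value.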

A final challenge for the development of AI for MSK imaging tasks is that rendering diagnoses in MSK imaging often requires an understanding and integration of the underlying biomechanics of an injury with the imaging findings. The bones, joints, ligaments, muscles, and tendons of the skeletal system are a complex interdependent collective of levers, hinges, and wedges whose function depends on the normal function of proximal and adjacent parts of the system. As such, when one part of the system is damaged, there is often a cascade of interrelated injuries resulting from the subsequent malfunctioning of that specific link in the chain. Human radiologists learn the intuition necessary to use these principles when searching out injuries of the MSK system, which often significantly augments their ability to find corresponding diagnoses. The current state of deep learning technology possesses no ability to form an intuition and cannot extrapolate findings based on prior domain knowledge. One example sometimes encountered is that of a tertius (posterior malleolus) fracture. When an expert radiologist identifies an apparently isolated fracture of the posterior malleolus of the ankle, they understand that the biomechanical stresses that produce such an injury, namely supination-external rotation forces, almost invariably cause another fracture elsewhere in the lower extremity. This triggers an automatic search for other, more subtle fractures, such as a distal fibular fracture or medial malleolus fracture, and a request for further imaging, such as a stress view of the mortise to evaluate for potentially associated deltoid ligament injury. A computer cannot easily be taught such intuition, and accordingly, AI’s ability to function in the same capacity as a radiologist is limited.

Benefits of Working with Musculoskeletal Radiology Artificial Intelligence

Although we have outlined several hurdles that impede the rapid development of AI toward fully replacing or automating the tasks of an MSK radiologist, several features of bone and joint imaging present tremendous opportunities for automating the radiologist's workflow and tasks. These involve the general rigidity of bone and joint structures, the ease of obtaining MSK studies for the patient and radiologist, and the ability to use surface coils focused on the structures of interest.

Many deep learning problems require the coregistration of different imaging acquisitions. Often when radiologists desire to consolidate the information inside multiple sequences of the same or different patients, such as an MR imaging study obtained with T2 and T1 weighted sequences or a computed tomography (CT) scan containing a separate PET acquisition, they run into issues when the images have separate coordinate systems or spatial resolutions. This issue can be addressed using image coregistration. Image coregistration is a broad subfield of study in computer vision that involves spatially aligning the features that are common to multiple acquisitions so that those features occupy the same positions in every image of every sequence. Coregistration is typically one of the initial preprocessing steps in computer vision with medical imaging, as it obviates the need to keep track of each voxel's differing spatial coordinates when we aspire to use the information from multiple acquisitions simultaneously. Broadly speaking, coregistration of rigid structures such as the cranial vault is far less demanding in terms of computation and technical expertise than coregistration of elastic structures that deform and move during acquisitions, such as the heart or any structure subject to respiratory motion artifact in the abdomen or chest. With few exceptions, MSK imaging involves body parts that suffer minimal to no respiratory motion, and when they do, they undergo motion in rigid, locally congruent ways. In practice, numerous MSK acquisitions suffer so little motion or movement between sequences that coregistration is not always needed. The patella, for example, remains at the same spatial coordinate in the MR imaging scanner during the patient's entire scan, regardless of whether we are acquiring the T1, T2, or proton density sequence, with the rare exception of dynamic or kinematic CT or MR imaging.
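To illustrate why rigid coregistration is comparatively simple, the following is a minimal sketch of a two-dimensional rigid alignment estimated in closed form from paired landmarks. It assumes that corresponding landmark points are already available in both acquisitions; the function names and coordinates are hypothetical, and clinical pipelines typically rely on intensity-based registration in dedicated toolkits rather than on this simplified approach.

```python
import math


def estimate_rigid_2d(moving, fixed):
    """Least-squares rotation angle and translation mapping moving onto fixed."""
    n = len(moving)
    mcx = sum(p[0] for p in moving) / n
    mcy = sum(p[1] for p in moving) / n
    fcx = sum(q[0] for q in fixed) / n
    fcy = sum(q[1] for q in fixed) / n
    num = 0.0
    den = 0.0
    for (px, py), (qx, qy) in zip(moving, fixed):
        px, py, qx, qy = px - mcx, py - mcy, qx - fcx, qy - fcy
        num += px * qy - py * qx  # cross term drives the sine of the angle
        den += px * qx + py * qy  # dot term drives the cosine of the angle
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    # Translation aligns the rotated moving centroid with the fixed centroid.
    tx = fcx - (c * mcx - s * mcy)
    ty = fcy - (s * mcx + c * mcy)
    return theta, (tx, ty)


def apply_rigid_2d(points, theta, translation):
    """Rotate points by theta (radians) and then translate them."""
    c, s = math.cos(theta), math.sin(theta)
    tx, ty = translation
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


# Hypothetical landmarks, rotated by 10 degrees and shifted by (3, -2).
moving = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)]
fixed = apply_rigid_2d(moving, math.radians(10.0), (3.0, -2.0))
theta, translation = estimate_rigid_2d(moving, fixed)
```

A rigid transform in two dimensions has only three free parameters (one angle and two shifts), whereas an elastic registration must estimate a displacement for every region of the image, which is why deformable organs are so much harder to align.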

Imaging of the extremities typically involves placing patients in relatively comfortable positions when compared to those required for imaging of internal organs or neurologic structures. A large percentage of extremity imaging is performed in comfortable extremity scanners that eliminate the issue of claustrophobia ( Fig. 2 ). This significantly reduces artifacts related to patient discomfort, particularly motion artifacts. This not only assists with image preprocessing as discussed but is also beneficial for regularization during AI model training, as the reduction in artifacts present in MSK studies may reduce potential confounding factors in the imaging that can reduce a given algorithm’s ability to generalize to new data.

Increasingly, MR imaging is obtained using arrays of small surface coils placed at or near the body part of interest. These have the advantage of dramatically increasing the SNR of the regions of interest while dramatically decreasing the signal contribution from structures outside the region of interest in a type of “vignette” effect. When training classical image recognition neural networks on input images with a relatively small area of important signal, it often takes many steps of training for the neural network to “learn” to ignore structures outside the critical region of interest. For example, suppose we would like to train a neural network to segment the pancreas on a typical imaging slice through the upper abdomen. This slice will contain only a relatively small number of pixels that correspond to the pancreas and a large number of pixels that correspond to unimportant “noise” such as the liver, spleen, kidney, or vertebral body. It may take several hundred thousand training “steps” composed of forward propagation and backpropagation before the neural network learns to ignore the spurious background pixels corresponding to other abdominal organs. Furthermore, this learned suppression of irrelevant structures may be imperfect. Performing imaging with surface coils in MSK MR imaging allows us to narrow in on a much smaller and more focused region of anatomy inside the human body while decreasing the signal from surrounding areas without heavy reliance on prelocalization of data.

General tasks

AI models can help perform basic tasks that an MSK radiologist is responsible for in the interpretation of imaging studies. At their simplest, these tasks can be broken down into three categories: (1) detection of a disease or image finding, (2) quantification of the severity or degree of a disease or relevant image finding, and finally (3) characterization of these findings or diseases. These tasks have the potential to augment radiologists by reducing the cognitive efforts required to perform them and allowing radiologists to focus their attention on more sophisticated mental tasks, such as synthesis of image findings to generate a differential diagnosis.

Detection of Disease or Condition

Perhaps the most fundamental of tasks in the interpretation of an imaging study is simply detecting the presence of disease or injury, which AI and deep learning are well primed to perform. Toward this task, there are two main approaches. The first is to train an image classifier that analyzes an entire image and outputs a single diagnosis based on what the classifier was trained to identify. For example, a binary classifier for the detection of a hip dislocation would analyze a single hip radiograph and predict whether or not a dislocation is present ( Fig. 3 , top row). An important limitation of this approach is the lack of a localizer to help guide human interpretation or verification of the findings. Computational techniques, such as class activation mapping, have been developed toward this goal to create “heatmaps” showing the areas of the image most emphasized by the classifier in making its decision, although such methods have shown variable utility for the localization of diseases in medical imaging. The second main approach is to train an object-detection model that analyzes an entire image and outputs bounding boxes around the parts of the image that the model predicts contain the disease or condition of interest. For example, an object-detection model for hip dislocation would output bounding boxes around a dislocated hip with the label “dislocation” ( Fig. 3 , bottom row). Such models have the benefit of localizing the disease of interest to aid a radiologist, as well as inherent explainability in their predictions of whether or not the disease or condition of interest is present. The primary limitation of such a model is the need for images with annotation at the structure level, requiring manual placement of bounding boxes on the disease or condition of interest, a time-consuming task.
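As a rough illustration of how a class activation map is formed, the sketch below computes a heatmap as a weighted sum of feature maps. It assumes a classifier whose final convolutional feature maps feed a global average pooling layer followed by a linear layer, so that each feature map has one weight per class; the feature maps and weights are hypothetical toy values rather than the output of a trained model.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of the final convolutional feature maps for one class."""
    rows = len(feature_maps[0])
    cols = len(feature_maps[0][0])
    cam = [[0.0] * cols for _ in range(rows)]
    for fmap, weight in zip(feature_maps, class_weights):
        for r in range(rows):
            for c in range(cols):
                cam[r][c] += weight * fmap[r][c]
    return cam


def normalize(cam):
    """Rescale a map to [0, 1] so it can be overlaid as a heatmap."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


# Two hypothetical 2 x 2 feature maps and their weights for the
# "dislocation" class.
feature_maps = [
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 2.0], [1.0, 0.0]],
]
class_weights = [1.0, 0.5]
heatmap = normalize(class_activation_map(feature_maps, class_weights))
```

The resulting map is coarse, because it has the resolution of the feature maps, and is usually upsampled to the size of the input image before being overlaid on it.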

Regardless of the approach taken, deep learning models have tremendous potential to automate the detection of diseases and conditions across all age groups and settings. In the pediatric population, deep learning classifiers have been developed to identify elbow effusions, supracondylar humerus fractures, and other acute elbow fractures on radiographs with areas under the receiver operating characteristic curve (AUC) greater than 0.97. In the adult population, deep learning models have similarly been developed to identify fractures of both the upper and lower extremities on radiographs using both classifiers and object detectors, with some studies showing performance of AI models exceeding that of human experts. Beyond trauma, deep learning models have been used extensively for the detection of degenerative joint diseases, including osteoarthritis in both the hip and knee on radiographs, and internal derangement, including meniscus tears and anterior cruciate ligament tears on knee MR imaging and rotator cuff tears on shoulder MR imaging. Although the initial use cases for deep learning in MSK disease or condition identification have largely focused on trauma, degenerative joint disease, and internal derangement, any relevant finding that a radiologist is tasked with identifying could potentially be automated using AI and deep learning.
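For readers unfamiliar with the AUC metric quoted above, the following sketch computes it from its probabilistic definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are hypothetical and do not come from any of the studies mentioned.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical fracture labels (1 = fracture) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of 4 positive-negative pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of positive and negative cases.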

Quantification

Once a disease or condition is identified, one of the next steps for the MSK radiologist is to quantify the severity or degree of disease burden. As with disease or condition identification, a few approaches exist. First, a classifier can be trained to categorize an image into one of two or more severity categories. The benefit of this approach is that labels are only necessary on an image-level basis. Second, a deep learning regression model ( Fig. 4 , top row) can be trained to directly output a quantitative score or measurement of disease burden or severity. This approach can be difficult to perform because it requires quantitative measurements of a disease or condition, such as angulation of a specific osseous deformity, as training labels. Third, a segmentation model can be trained to directly outline a disease or condition of interest, which can then, in turn, be used to calculate various quantities, including the area or volume of a lesion and the angle of a particular anatomic relationship, such as fracture angulation ( Fig. 4 , bottom row).
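The third approach can be sketched as follows, assuming that a segmentation model has already produced a binary mask and that the long axes of two fracture fragments have been derived from it. The mask, pixel spacing, and axis vectors are hypothetical, and the function names are chosen only for this illustration.

```python
import math


def mask_area_mm2(mask, pixel_spacing_mm):
    """Area of a binary mask, given (row, column) pixel spacing in mm."""
    row_mm, col_mm = pixel_spacing_mm
    n_pixels = sum(1 for row in mask for value in row if value)
    return n_pixels * row_mm * col_mm


def angle_between_deg(axis_a, axis_b):
    """Angle in degrees between two 2D direction vectors."""
    dot = axis_a[0] * axis_b[0] + axis_a[1] * axis_b[1]
    norm = math.hypot(*axis_a) * math.hypot(*axis_b)
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))


# Hypothetical lesion mask with 3 foreground pixels at 0.5 mm spacing.
mask = [
    [0, 1, 1],
    [0, 1, 0],
]
area = mask_area_mm2(mask, (0.5, 0.5))

# Hypothetical long axes of the proximal and distal fracture fragments.
angulation = angle_between_deg((0.0, 1.0), (1.0, 1.0))
```

Volume follows the same pattern in three dimensions, with the voxel count multiplied by the product of the three voxel spacings.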

Deep learning classifiers have been trained to quantify severity or subtype of disease, ranging from the grading of rotator cuff tears on shoulder MR imaging and anterior cruciate ligament (ACL) tears on knee MR imaging to the specific subtype of calcaneal fractures on CT. Deep learning regression models have been used in MSK imaging toward quantification tasks such as bone mineral density prediction from CT and measurement of alpha angle on hip radiographs. Perhaps the most widely known use of deep learning regression models for quantification of an MSK “condition” is bone age prediction, which was the topic of the first Radiological Society of North America AI Challenge in 2017. The top-performing model in that challenge achieved accurate bone age predictions with a mean absolute difference of 4.2 months compared to radiologist ground truth, a result that has been improved even further using ensembles of the top submissions to that competition. Deep learning segmentation models have been used extensively to directly quantify the degree or severity of cartilage disease or defects on MR imaging of the knee, the volume of the anterior cruciate ligament in both native and surgically reconstructed ligaments, and body composition (eg, cross-sectional areas of fat vs muscle on CT). Segmentation models have also been used to identify anatomic landmarks, from which certain measurements of skeletal anatomy have been performed, such as spinal deformity on radiographs.
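The mean absolute difference metric and the idea of ensembling can be illustrated with a short sketch. Simple averaging of model outputs is assumed here as the ensembling strategy, and all bone ages below are invented values in months, constructed so that the two models' errors partly cancel; they are not results from the challenge.

```python
def mean_absolute_difference(predicted, reference):
    """Average absolute error between predictions and reference values."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def ensemble_mean(*model_predictions):
    """Average the predictions of several models, case by case."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]


# Hypothetical bone ages in months (illustrative values only).
reference = [120, 60, 96]
model_a = [126, 54, 100]
model_b = [116, 64, 94]

mad_a = mean_absolute_difference(model_a, reference)
mad_b = mean_absolute_difference(model_b, reference)
mad_ensemble = mean_absolute_difference(ensemble_mean(model_a, model_b), reference)
```

Averaging helps in this example because the two models err in opposite directions on every case; when models make correlated errors, the benefit of ensembling is smaller.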

Characterization

An additional basic task for the MSK radiologist, once a relevant disease or condition of interest has been identified, is to further characterize that disease. The specific characteristics to be described will depend on the relevant treatment paradigms. For example, characterization of degenerative joint disease will require description and breakdown of what features of disease are present (Are there osteophytes, joint space narrowing, or subchondral cysts?), while characterization of a bone lesion or tumor will require description of whether the lesion is aggressive or malignant versus not aggressive or benign, both of which are characterization tasks that have been performed using deep learning. The approaches toward characterization of disease are similar to those used for identification and quantification of disease and can theoretically be used to characterize disease to any level of granularity provided that there are enough labeled data with high-quality annotations.

Change detection and prediction

Beyond the basic tasks described in the preceding sections, the MSK radiologist is responsible for comparing the current image to prior images to evaluate for change in disease or condition. Such a task is more complex than simply detecting, quantifying, or characterizing disease because it requires processing of two pieces of data from two separate time points. Nevertheless, comparing images for change is crucial when evaluating treatment responses. Advanced deep learning techniques have been developed that expand on the approaches described in the preceding section to be able to not only quantify or grade the severity of disease but also to evaluate for change. A recent study by Li and colleagues used a special type of convolutional neural network called a “Siamese” neural network that can compare two knee radiographs to each other and output a continuous measure of change in osteoarthritis severity scoring ( Fig. 5 ), thereby allowing for detection of change of disease over time. Although this is a proof of concept that has been applied to this single use case in MSK imaging, this approach could be applied to other clinical scenarios the MSK radiologist may face, such as evaluating change in fracture alignment or healing over time. An alternative approach also evaluated by Li and colleagues is outputting quantitative severity scores for osteoarthritis and directly comparing those scores.
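The core idea of a Siamese comparison can be sketched in a few lines: the same encoder, with the same weights, is applied to both images, and the distance between the two resulting embeddings serves as a continuous change score. The sketch below is a toy under stated assumptions. A single linear projection stands in for the convolutional encoder, Euclidean distance is assumed as the change measure (a common choice for Siamese networks, but not confirmed by the text above), and the images and weights are hypothetical.

```python
import math


def embed(image, weights):
    """Toy shared encoder: each embedding value is a weighted sum of pixels."""
    pixels = [value for row in image for value in row]
    return [sum(w * p for w, p in zip(w_row, pixels)) for w_row in weights]


def change_score(image_a, image_b, weights):
    """Euclidean distance between embeddings from the same shared encoder."""
    return math.dist(embed(image_a, weights), embed(image_b, weights))


# Hypothetical shared weights mapping a 2 x 2 image to a 2-value embedding.
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
]
baseline = [[0.2, 0.4], [0.1, 0.3]]
follow_up = [[0.5, 0.4], [0.5, 0.3]]

no_change = change_score(baseline, baseline, weights)
change = change_score(baseline, follow_up, weights)
```

Because both images pass through identical weights, two identical studies always receive a score of zero, and the score does not depend on the order in which the studies are supplied.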