Abstract
Objective
Early and accurate prediction of axillary lymph node metastasis (ALNM) is crucial in determining appropriate treatment strategies for patients with early-stage breast cancer. The aim of this study was to evaluate the efficacy of radiomic features extracted from ultrasound (US) images combined with machine learning (ML) methods in predicting ALNM to improve diagnostic accuracy and patient prognosis.
Methods
In this retrospective study, data from 282 patients with early-stage breast cancer at two centers were analyzed. We considered clinicopathological characteristics, conventional US features, contrast-enhanced ultrasound (CEUS) characteristics, and radiomics features. Radiomics features were extracted from US images, and least absolute shrinkage and selection operator (LASSO) regression was used to select 12 key features for computing a radiomics score (Rad-score). A nomogram was developed based on these features, alongside five ML models: Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Extreme Gradient Boosting (XGBoost). Model performance was evaluated using the area under the curve (AUC), accuracy (ACC), sensitivity (SEN), specificity (SPE), negative predictive value (NPV), and positive predictive value (PPV).
Results
Both the nomogram and ML models, including the Rad-score combined with histologic type, significantly predicted ALNM. Among all models, the XGBoost model showed the best performance with an AUC of 0.810 and an accuracy of 84.1% in the external test set, surpassing the nomogram and other ML models. SHapley Additive exPlanations (SHAP) analysis further provided insights into the influence of individual radiomics features on ALNM prediction.
Conclusions
While the nomogram provides a useful traditional statistical approach, integrating radiomics features with ML, particularly the XGBoost model enhanced by SHAP interpretability, offers superior predictive accuracy for ALNM in early-stage breast cancer patients.
Introduction
Early-stage breast cancer, typically classified as stage I or II, is characterized by a tumor that remains confined to the breast tissue with limited spread to nearby lymph nodes, particularly the axillary lymph nodes (ALNs) [ ]. Clinically, it may present as a palpable mass, although many cases remain asymptomatic. Radiologically, early-stage breast cancer is most commonly detected through mammography. However, US is often employed as a complementary tool, particularly in patients with dense breast tissue, as it provides enhanced lesion characterization and facilitates the evaluation of ALNs [ ].
The axillary status plays a pivotal role in both staging and treatment planning for early-stage breast cancer. ALNM is typically evaluated through a combination of clinical examination, US, and, in some cases, sentinel lymph node biopsy (SLNB) or axillary lymph node dissection (ALND) [ , ]. With the increased use of breast screening, early-stage diagnoses have become more frequent, and axillary node-negative patients now account for 70%–85% of cases [ ]. ALND can have significant side effects, while SLNB remains the gold standard for assessing clinically node-negative (cN0) breast cancer. However, SLNB carries a false-negative rate ranging from 4.6% to 16.7% [ , ].
Breast cancer is particularly prone to ALNM, yet clinical diagnosis of ALNM remains challenging. Preoperative axillary US is a first-line method for evaluating axillary nodes, but it is associated with a high false-negative rate [ ]. Several preoperative US features of breast lesions, such as tumor size and shape, have been linked to ALNM [ , ]. CEUS offers improved visualization of tumor vascularity [ ] and microcirculation, and it has also been correlated with ALNM [ , ]. However, despite these advantages, the predictive accuracy of CEUS remains limited.
Radiomics extracts quantitative imaging features from modalities such as US and computed tomography (CT), which assists in disease diagnosis and prognosis. These features may appear subtle to the naked eye, but when quantified, they can provide valuable insights into the intrinsic characteristics of the lesion [ , ]. Predictive models play a vital role in clinical decision-making, especially in the treatment of breast cancer. Traditional nomograms, based on statistical methods, are valued for their interpretability [ ], while ML models excel in handling complex data and capturing nonlinear relationships [ ]. Comparing these two approaches is crucial to identify their strengths and guide clinicians in selecting the most suitable tool for personalized treatment strategies.
Based on the above considerations, the aim of this study was to investigate the relationship between preoperative US, CEUS and radiomics in predicting ALNM in cN0 breast cancer patients and to compare the predictive ability of traditional nomograms with that of multiple ML techniques. Additionally, data from two medical centers were used to enhance the models’ generalizability and reduce potential biases from single-center studies, ensuring broader clinical applicability.
Materials and methods
Patients
This retrospective study was approved by the Institutional Ethics Committees of Zhejiang Cancer Hospital (IRB-2022-738) and Xijing Hospital (KY20192119-F-1), serving as Center 1 and Center 2, respectively. Informed consent was waived, as the study involved the use of anonymized data. A total of 282 consecutive female patients with early-stage breast cancer were included in the analysis, with 200 cases from Center 1 (collected between January 2017 and December 2018) and 82 cases from Center 2 (collected between January 2021 and March 2022). Both centers are tertiary institutions.
Inclusion criteria:
1) Patients diagnosed with cN0 stage breast cancer.
2) Preoperative US and CEUS performed prior to surgery.
3) ALN status assessed through either ALND or SLNB, based on pathological data.
4) Normal ALNs identified via preoperative axillary US, with no suspicious features (oval shape, smooth cortex < 3 mm, clearly visible lymph node hilum, and central vascularization) [ ].
5) Clear US images with identifiable lesions and complete clinical data.
6) Patients who underwent surgery for breast cancer with definitive postoperative pathology results.
Exclusion criteria:
1) History of prior breast or axillary surgery.
2) Biopsy or resection of breast cancer performed before the US examination.
3) Receipt of preoperative neoadjuvant chemotherapy or radiotherapy.
4) Presence of non-mass-type lesions.
5) Presence of multifocal lesions.
6) Diagnosis of breast lymphoma or other specific malignant breast pathologies.
Patients from Center 1 were randomly divided into training and validation sets in a 7:3 ratio, resulting in 140 cases for the training set and 60 cases for the validation set. The 82 patients from Center 2 were used as an external test cohort. In accordance with the recommendations of the ACOSOG Z0011 trial [ ], ALND was not performed after SLNB if metastases were confined to one or two SLNs. However, ALND was conducted if metastases were present in three or more SLNs or if no SLN was detected. All resected ALNs were histologically examined and classified as metastatic or nonmetastatic. Collected patient data included demographic information (age), clinical characteristics (tumor laterality and orientation), and pathological details (histological type, grade, ER status, PR status, HER2 status, molecular subtype, and Ki-67 status). Molecular subtypes were categorized as triple-negative, HER2-positive, and hormone receptor-positive/HER2-negative. The overall workflow of this study is shown in Figure 1.
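The 7:3 random split of the Center 1 cohort can be illustrated with a minimal sketch. The patient identifiers, the seed, and the function name `split_cohort` are hypothetical, and the source does not state whether the split was stratified by ALN status, so a plain shuffle is assumed here.

```python
import random

def split_cohort(patient_ids, train_fraction=0.7, seed=42):
    """Shuffle IDs reproducibly and cut them into training and validation lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)      # fixed seed (an assumption) keeps the split repeatable
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# Hypothetical identifiers standing in for the 200 Center 1 patients
center1_ids = [f"C1-{i:03d}" for i in range(1, 201)]
train_ids, val_ids = split_cohort(center1_ids)
print(len(train_ids), len(val_ids))
```

With 200 patients and a training fraction of 0.7, the two lists contain 140 and 60 patients, matching the set sizes reported above. The external test cohort from Center 2 is kept entirely separate and never enters this split.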

Image acquisition and evaluation
Preoperative US examinations were performed on several systems, including the Toshiba Aplio 400, LOGIQ E9, and Philips iU22. CEUS examinations were conducted exclusively with the LOGIQ E9. Conventional US used a high-frequency linear array probe (model L12-5, 5–12 MHz), while CEUS used the L9-3 probe (3–9 MHz). The contrast agent, SonoVue (Bracco S.p.A, Milan, Italy), was prepared by mixing the agent with 5 mL of sterile saline. US parameters included a mechanical index of ≤ 0.13, a single focus positioned at the bottom of the image, manual stabilization of the probe, and no additional pressure applied. After manual injection of 2.4 mL of SonoVue, the selected imaging plane was kept unchanged, and 2 minutes of real-time imaging were recorded. All static and dynamic images were saved on the US system and later exported as DICOM (Digital Imaging and Communications in Medicine) files.
The features of the lesions were reviewed based on the following criteria: (1) Maximum tumor diameter on US (mm). (2) Tumor size classification (T1: T ≤ 20 mm, T2: 20 mm < T ≤ 50 mm). (3) Shape (round or oval regular shapes vs. irregular shapes). (4) Margin (circumscribed vs. uncircumscribed). (5) Internal homogeneity (homogeneous vs. inhomogeneous). (6) Aspect ratio (≤ 1 vs. > 1). (7) Calcification (absent, macrocalcification, or microcalcification). (8) Posterior echo attenuation (present vs. absent). (9) Color Doppler flow imaging (CDFI) grade (0–1 or 2–3), based on Adler’s method [ ].
The qualitative CEUS features of the lesions included: (1) Enhancement degree (hypoenhancement, isoenhancement, or hyperenhancement). (2) Margin after enhancement (circumscribed vs. uncircumscribed). (3) Internal homogeneity (homogeneous vs. heterogeneous). (4) Presence of echo-free areas (present vs. absent). (5) Presence of thick or twisted penetrating vessels (present vs. absent). (6) Size after enhancement (larger vs. similar to baseline size). All US and CEUS data were independently evaluated by two radiologists with 6 and 5 years of experience, respectively, in breast tumor diagnosis using US. Both radiologists were fully blinded to the patients’ demographic and clinical information, including age, medical history, prior imaging results, and the study’s hypothesis or model details. Disagreements between the two radiologists were initially resolved through discussion. If consensus could not be reached, the chief of the US department at each center acted as the final arbiter.
Depiction of region of interest (ROI) and extraction of radiomics features
ROIs were delineated on the US images in 3D Slicer (version 5.0.3), as shown in Figure 2, with two experienced radiologists independently performing the segmentation. To ensure the reliability and reproducibility of the segmentation, inter-reader consistency was quantified with the intraclass correlation coefficient (ICC). An ICC greater than 0.80 was taken to indicate excellent agreement, suggesting minimal variability in ROI delineation.

Where the ICC indicated lower agreement, consensus was reached through discussion, or a third senior radiologist was consulted to resolve the discrepancy. A total of 837 radiomics features in 6 classes were extracted with the SlicerRadiomics plug-in: (1) 162 first-order parameters; (2) 216 gray-level co-occurrence matrix (GLCM) parameters; (3) 126 gray-level dependence matrix (GLDM) parameters; (4) 144 gray-level run-length matrix (GLRLM) parameters; (5) 144 gray-level size zone matrix (GLSZM) parameters; (6) 45 neighborhood gray-tone difference matrix (NGTDM) parameters.
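The inter-reader agreement check described above can be sketched as follows. The source does not state which ICC form was used, so this example assumes ICC(2,1) (two-way random effects, absolute agreement, single rater); the function name `icc_2_1` and the feature values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per lesion, each row holding one value per reader.
    """
    n = len(ratings)            # number of lesions
    k = len(ratings[0])         # number of readers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical values of one radiomics feature measured from two readers' ROIs
feature_by_reader = [[4.1, 4.3], [2.0, 2.2], [5.5, 5.1], [3.3, 3.4], [6.0, 6.2]]
icc = icc_2_1(feature_by_reader)
print(f"ICC = {icc:.3f}; exceeds 0.80 threshold: {icc > 0.80}")
```

In practice the coefficient would be computed per feature (or per segmentation overlap measure) and compared against the 0.80 threshold stated above.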
Feature screening and Rad-score calculation
A two-step feature selection process was employed to minimize the risk of overfitting in the predictive model. First, Z-score and mean normalization methods were applied to standardize the radiomics features across patients. An initial selection of important variables was then performed using a random forest regression model, which narrowed the feature set down to 50 key features. Next, LASSO regression with 10-fold cross-validation was applied to the training data to further refine the selection to the features most predictive of ALN status. Ultimately, 12 radiomics features with nonzero LASSO coefficients were retained (Figure 3a).
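To illustrate why LASSO leaves only some coefficients nonzero, the following is a minimal sketch of Z-score standardization followed by coordinate-descent LASSO. It is a least-squares simplification, not the authors' pipeline: the random forest pre-screening and the 10-fold cross-validated choice of the penalty are omitted, the penalty `lam` is fixed by hand, and all function names and data are hypothetical.

```python
from statistics import mean, pstdev

def zscore_columns(X):
    """Standardize each feature column to mean 0 and SD 1 (constant columns become 0)."""
    stats = [(mean(col), pstdev(col)) for col in zip(*X)]
    return [[(x - mu) / sd if sd > 0 else 0.0 for x, (mu, sd) in zip(row, stats)]
            for row in X]

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 on centered X."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    y_mean = sum(y) / n                      # intercept for centered features
    resid = [yi - y_mean for yi in y]        # residual at beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Covariance of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta

# Hypothetical toy data: feature 0 drives the outcome, feature 1 is noise
X_raw = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
y = [-2.0, -2.0, 2.0, 2.0]
coefs = lasso_cd(zscore_columns(X_raw), y, lam=0.5)
selected = [j for j, b in enumerate(coefs) if b != 0.0]
print(coefs, selected)
```

The soft-thresholding step is what sets weak coefficients exactly to zero, so the features with nonzero coefficients form the selected subset. In a real analysis the standardization parameters would be estimated on the training set and reused for the validation and test sets.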

The Rad-score was then calculated using the following formula:
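\[
\begin{aligned}
\text{Rad-score} ={}& 0.304 \times \text{ZoneEntropy} + 0.019 \times \text{Skewness} + 0.006 \times \text{GrayLevelVariance} \\
&+ 0.002 \times \text{Autocorrelation} + 0.099 \times \text{RunEntropy} - 0.003 \times \text{HighGrayLevelRunEmphasis} \\
&- 0.002 \times \text{Autocorrelation} + 0.043 \times \text{GrayLevelVariance} - 0.042 \times \text{RunEntropy} \\
&+ 0.091 \times \text{DependenceVariance} + 0.001 \times \text{Complexity} - 0.003 \times \text{Complexity}
\end{aligned}
\]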
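As a hedged illustration, the Rad-score is a weighted sum of the 12 selected feature values, which can be sketched as below. Several feature names repeat in the formula as printed; they presumably denote distinct features (for example, from different feature classes or filtered images) whose distinguishing prefixes are not shown, so this sketch pairs coefficients with values by position rather than by name. The constant `RAD_SCORE_TERMS` and the function `rad_score` are hypothetical names.

```python
# (feature name as printed in the formula, coefficient), in the order printed.
# Repeated names are kept as separate terms because they are separate features.
RAD_SCORE_TERMS = [
    ("ZoneEntropy", 0.304),
    ("Skewness", 0.019),
    ("GrayLevelVariance", 0.006),
    ("Autocorrelation", 0.002),
    ("RunEntropy", 0.099),
    ("HighGrayLevelRunEmphasis", -0.003),
    ("Autocorrelation", -0.002),
    ("GrayLevelVariance", 0.043),
    ("RunEntropy", -0.042),
    ("DependenceVariance", 0.091),
    ("Complexity", 0.001),
    ("Complexity", -0.003),
]

def rad_score(feature_values):
    """Weighted sum of the 12 feature values, given in the same order as the formula."""
    if len(feature_values) != len(RAD_SCORE_TERMS):
        raise ValueError(f"expected {len(RAD_SCORE_TERMS)} values, got {len(feature_values)}")
    return sum(coef * value for (_, coef), value in zip(RAD_SCORE_TERMS, feature_values))

# Hypothetical standardized feature values for one lesion
example_values = [0.8, -0.2, 1.1, 0.0, 0.5, -1.3, 0.4, 0.9, -0.6, 1.2, 0.3, -0.7]
print(round(rad_score(example_values), 4))
```

No intercept appears in the formula as printed, so none is included here.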
Construction and validation of radiomics nomogram, clinical models, and ML models
Univariate and multivariate logistic regression analyses were conducted to identify significant predictors of ALN status in patients with early-stage breast cancer. These analyses incorporated the Rad-score, clinical factors, and features derived from US and CEUS. A nomogram model was constructed by combining clinical risk factors with the Rad-score and was compared with models based solely on clinical factors or on the Rad-score alone. To further explore predictive capabilities, five ML models (LR, NB, SVM, KNN, and XGBoost) were developed using the 12 radiomics features with nonzero coefficients. The performance of the nomogram and the ML models was assessed using SEN, SPE, ACC, NPV, PPV, and AUC, calculated for the training, validation, and external test sets. Additionally, calibration curves were used to assess the agreement between observed and predicted outcomes for both the nomogram and the ML models.
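The performance metrics listed above follow directly from the confusion matrix and from the ranking of predicted scores. A minimal sketch, using hypothetical labels and probabilities and an assumed decision threshold of 0.5 (the source does not state the cutoff), is:

```python
def safe_ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")

def classification_metrics(y_true, y_pred):
    """SEN, SPE, ACC, PPV, and NPV from binary labels (1 = ALN-positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "SEN": safe_ratio(tp, tp + fn),
        "SPE": safe_ratio(tn, tn + fp),
        "ACC": safe_ratio(tp + tn, tp + tn + fp + fn),
        "PPV": safe_ratio(tp, tp + fp),
        "NPV": safe_ratio(tn, tn + fn),
    }

def auc_from_scores(y_true, scores):
    """AUC as the share of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return safe_ratio(wins, len(pos) * len(neg))

# Hypothetical labels and predicted probabilities, not study data
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
metrics["AUC"] = auc_from_scores(y_true, scores)
print({name: round(value, 3) for name, value in metrics.items()})
```

The AUC is threshold-free, whereas the other five metrics depend on the chosen cutoff, which is why both kinds are usually reported together.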
Model interpretability
To enhance the interpretability of our model, we employed the SHAP method. SHAP quantifies each feature’s contribution to predictions by calculating its average marginal impact, providing a reliable measure of feature importance. By averaging SHAP values across the training set, we were able to identify the most influential features, improving the model’s transparency and allowing for a clearer assessment of key predictors within the training cohort.
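The idea of an "average marginal impact" can be made concrete with exact Shapley values on a toy model. This sketch is not the SHAP library and not the authors' XGBoost explainer: it enumerates every feature subset, and it assumes that an absent feature is replaced by a fixed baseline value (for example, its training-set mean). The model, inputs, and function names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, with absent features set to baseline."""
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` take their observed values
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    """Hypothetical score with one interaction term between features 0 and 2."""
    return 0.3 * z[0] - 0.1 * z[1] + 0.05 * z[0] * z[2]

x = [2.0, 1.0, 4.0]            # hypothetical feature values for one lesion
baseline = [0.5, 0.5, 1.0]     # hypothetical reference values
phi = shapley_values(toy_model, x, baseline)
print([round(v, 4) for v in phi])
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the property that lets per-feature values be read as a decomposition of a single prediction. Enumeration is exponential in the number of features, so tree-based explainers use faster algorithms for models such as XGBoost.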
Statistical analysis
Continuous variables were expressed as mean ± standard deviation (SD) and compared using the independent-samples t-test, or the Mann–Whitney U test for non-normally distributed data. Categorical variables were analyzed using Fisher’s exact test or the Chi-square test. Univariate and multivariate binary logistic regression analyses were conducted, leading to the construction of a nomogram. The performance of the models was evaluated using receiver operating characteristic (ROC) curves, along with metrics such as AUC, ACC, SEN, and SPE. All analyses were performed using SPSS Statistics version 27.0, Python version 3.11.3, and R version 4.0.2. A p-value of less than 0.05 was considered statistically significant.
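For a 2 × 2 table, the Chi-square comparison of a categorical variable between groups can be sketched as below. The counts are hypothetical and not taken from the study, and the source does not say whether a continuity correction was applied, so the uncorrected Pearson statistic is assumed. With one degree of freedom, the p-value has the closed form erfc(sqrt(χ²/2)).

```python
from math import erfc, sqrt

def chi2_p_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2.0))

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, chi2_p_df1(stat)

# Hypothetical counts: rows = ALN-positive / ALN-negative, columns = feature present / absent
stat, p_value = chi2_2x2(10, 20, 30, 40)
print(f"chi2 = {stat:.3f}, p = {p_value:.3f}, significant at 0.05: {p_value < 0.05}")
```

When expected cell counts are small, Fisher’s exact test is the usual alternative, as noted above.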
Results
Fundamental clinical, pathological, and imaging characteristics of the patients
The clinicopathological characteristics of the entire cohort, comprising 282 patients with early-stage breast cancer from two centers, are summarized in Table 1. The mean age of the patients was 53.1 years, with an age range of 28 to 82 years. All patients underwent breast cancer surgery at the cN0 stage, and final pathology results revealed ALNM in 87 patients (30.9%), who formed the ALN-positive group, while 195 patients (69.1%) were classified as ALN-negative. No significant differences were observed in clinicopathological characteristics between the ALN-positive and ALN-negative groups, with the exception of histologic type, which showed a statistically significant difference (p = 0.005). Table 2 outlines the fundamental clinicopathological characteristics of patients in the training, validation, and external test sets. Tables 3 and 4 summarize the US and CEUS characteristics of breast lesions in patients from Center 1. Analysis of the data in these tables indicated no statistically significant correlation between the US or qualitative CEUS characteristics of breast lesions and ALNM (all p values > 0.05).
