An image-guided robot only becomes fully useful with integrated software that leverages image fusion. Image fusion is the process of registering and superimposing imaging data in the same coordinate space, and it can substantially aid image-guided robotic interventions. Effective percutaneous robotic procedures can utilize real-time image guidance and navigation powered by fusion technologies. By integrating information from multiple imaging modalities, fusion technologies provide insights into anatomic features and procedural targets that may not be apparent through traditional positional tracking or single-modality imaging. Robots currently available for interventions highlight different approaches to utilizing real-time fusion and procedure planning. As robotics become increasingly integrated into interventional radiology clinical practice, the continued innovation and adoption of fusion-based approaches will enable more seamless use of this technology, offering the potential for improved safety, standardization, and clinical efficacy. This review explores key techniques in image fusion and highlights the integration of fusion and robotics towards the goal of optimized and automated interventional procedures.
Introduction
Fusing real-time procedural images with preoperative cross-sectional imaging is fundamental to the adoption of interventional robotics, because diagnostic-quality image fusion provides the foundation to guide robotic procedures just as it does in human-led procedures. Fusion is the superimposition of data or imaging acquisitions in the same coordinate space to provide key information at the point of care. This is frequently done using complementary imaging modalities, ultrasound (US), computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), to leverage the strengths or overcome the shortcomings of each modality. Fusion can also be done with the same imaging modality at different time points (such as with and without contrast) or by combining metabolic and morphologic imaging to enable molecular interventions, that is, precision interventions guided by targeted agents (such as Prostate-Specific Membrane Antigen PET, DOTATATE PET, or other nuclear medicine agents). The choice of fusion approach depends on the clinical context.
Robotics are physical, information-driven tools that assist the interventionalist and lead to more standardized and automated approaches. The roles of robotics and fusion span preprocedural planning, intraprocedural serial iterative device guidance for placement, and postprocedural treatment confirmation. For example, overlaying a patient’s PET-CT with a noncontrast early intraprocedural planning CT scan of a liver mass highlights fluorodeoxyglucose (FDG)-avid regions, aiding target selection for biopsy to decipher tumor heterogeneity and provide optimized biomarkers. During procedures, fusing target images with real-time imaging feedback facilitates accurate visualization of device positioning in a repeatable iterative fashion, where each subsequent needle placement is adjusted based upon the collective aggregate of all previously placed needles; the treatment plan evolves according to procedural feedback and data as the case proceeds. Postprocedurally, fused pre- and post-ablation images (Fig. 1) enable direct comparison and quantitative assessment of therapeutic outcomes, such as minimum ablation margin, which is directly linked to local tumor control and survival.

The use of fusion further evolves interventionalists’ current practice of relying on cognitive registration, spatial triangulation, and landmarks to navigate toward targets using information memorized from preprocedural imaging. Image fusion can enhance visualization and improve accuracy when targeting lesions that are inconspicuous on ultrasound or noncontrast CT, or difficult to target by a single modality. Looking across the room at PACS on the monitor during a case to reason about 3-dimensional (3D) tomographic anatomy adds a layer of complexity that can be mitigated by fusion with robotics to standardize and expand patient care. Leveraging fusion techniques could also reduce radiation exposure by limiting the use of intraprocedural CT or fluoroscopic guidance. The ability to fuse prior contrast-enhanced CT, MRI, or PET/CT with intraprocedural US, CT, or fluoroscopy facilitates precise, serial navigation of needles, catheters, and instruments to an accurate target, with real-time tracking of each robotic device in relation to an evolving iterative treatment plan that enables sequential needle placements and updates the plan with regard to tissue at risk for undertreatment after every placement. This capability becomes critical to outcomes with larger tumors (>3 cm diameter) that rely upon a composite ablation from multiple needles. Knowing when an endpoint is reached also relies upon accurate estimates of ablation margins, which in turn must be based upon accurate target definition and superimposed planned and actual treatment volumes.
The growth of automation in interventional procedures highlights that image fusion and robotics are synergistic technologies. Fusion enables multimodal navigation for treatment planning, while robotics facilitates automation and standardization, allowing inexperienced operators to implement plans with accuracy similar to that of experienced interventionalists. This leads to more consistent and improved outcomes. The downstream impact of standardization translates into enhanced opportunities for uniform, hypothesis-driven scientific studies, as well as coordinated cooperative groups for multicenter, multioperator analyses with less of the inherent variability of human operators, who without robotics are more dependent upon hand-eye coordination.
The development and penetration of robotics into clinical practice will naturally lead to a union of robotic automation (and hardware) with fusion (and navigation software). Automated, standardized robotic treatment plans and precision interventions will rely upon interfaces and integration with treatment-planning, registration, and multimodality-fusion software, as well as post-ablation margin-verification software. Therefore, a review of fusion tools is relevant to a robotics community that is rapidly evolving to interface with and incorporate fusion guidance.
Image registration techniques
The core concept in fusion imaging is accurate image and/or device registration. Image registration is the process of bringing 2 or more images, whether from different modalities or different time points, into spatial alignment. This is distinct from image fusion, which combines the registered images into a single composite view or, alternatively, displays each modality independently in linked, continuously updated side-by-side views. There are several different methods that can be considered in registration models. Knowledge of these technical aspects helps frame the complexity of image fusion used in robotic interventional radiology.
Transformation functions registration methods
Registration can be accomplished by establishing correspondences between related images and applying rigid, affine, nonrigid, or projective transformation functions (Fig. 2). Transformation functions adjust 1 image to match the spatial configuration of another. This is needed to account for the inherent variability of image data sets acquired at different times, orientations, angles, deformations, or respiratory cycles, in order to bring them into spatial alignment.

Rigid registration aligns images without deformation of either image. The approach assumes that the anatomy between the 2 images is the same and preserves the distances and angles between any given points. This technique is suitable for aligning images where anatomy changes minimally, such as rigid bony landmarks, given that only rotation and translation (eg, panning) adjustments are possible. This transformation is limited because it does not compensate for respiration or organ deformation.
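As a concrete illustration, the sketch below performs a rigid registration with the open-source SimpleITK library; the file names, mutual-information metric, and optimizer settings are illustrative assumptions rather than a prescribed workflow.

```python
import SimpleITK as sitk

# Hypothetical inputs: a diagnostic CT registered to an intraprocedural CT.
fixed = sitk.ReadImage("procedural_ct.nii.gz", sitk.sitkFloat32)
moving = sitk.ReadImage("diagnostic_ct.nii.gz", sitk.sitkFloat32)

# Rigid 3D transform: 3 rotations + 3 translations, no deformation.
initial = sitk.CenteredTransformInitializer(
    fixed, moving, sitk.Euler3DTransform(),
    sitk.CenteredTransformInitializerFilter.GEOMETRY)

reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)  # robust across modalities
reg.SetOptimizerAsRegularStepGradientDescent(
    learningRate=1.0, minStep=1e-4, numberOfIterations=200)
reg.SetInterpolator(sitk.sitkLinear)
reg.SetInitialTransform(initial, inPlace=False)

rigid_tx = reg.Execute(fixed, moving)
# Resample the moving image into the fixed image's coordinate space for fusion.
aligned = sitk.Resample(moving, fixed, rigid_tx, sitk.sitkLinear, 0.0)
```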
Affine registration is a more flexible extension of rigid registration that allows for scaling and shearing of the image in addition to translation and rotation, accommodating changes in position and shape such as those caused by patient movement or respiration. This broader range of adjustments is particularly useful for maintaining image alignment in real-time image-guided procedures, where continuous alignment is needed to accurately guide tools to targets.
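To make the added degrees of freedom concrete, the plain-NumPy sketch below applies an affine mapping x′ = Ax + t to a point; the rotation, scale, shear, and translation values are arbitrary illustrative numbers.

```python
import numpy as np

# Affine transform: x' = A @ x + t. A rigid transform constrains A to a pure
# rotation; an affine A additionally admits scaling and shear.
theta = np.deg2rad(5.0)  # small rotation about the z-axis
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])
scale = np.diag([1.0, 1.0, 0.9])       # eg, craniocaudal compression from respiration
shear = np.array([[1.0, 0.05, 0.0],    # illustrative shear term
                  [0.0, 1.0,  0.0],
                  [0.0, 0.0,  1.0]])
A = rotation @ scale @ shear
t = np.array([2.0, -1.5, 10.0])        # translation in mm

point = np.array([50.0, 40.0, 120.0])  # a point in the moving image (mm)
mapped = A @ point + t                 # its predicted location in the fixed image
```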
Nonrigid or deformable registration is more complex, allowing for changes in the shape and size of anatomical structures by accounting for tissue movement, deformation, and differences in anatomy over time, thereby adapting to a specific patient's anatomy. This method aligns the images by applying transformations that can stretch, compress, or warp certain areas of the image to match structures or features in different scans. Nonrigid registration requires more processing time, which may not be ideal for real-time guidance. It can also move anatomy in implausible ways, introducing errors such as unrealistic scaling or displacement of structures. Accuracy therefore has to be verified when using this method, since it effectively changes the reality of one image to match another.
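One widely used deformable approach is B-spline free-form deformation, sketched below in SimpleITK; the control-point mesh size, metric, and optimizer are illustrative assumptions, and, per the caution above, the resulting warp should be verified against the anatomy.

```python
import SimpleITK as sitk

# Hypothetical inputs acquired at different respiratory phases.
fixed = sitk.ReadImage("expiration_ct.nii.gz", sitk.sitkFloat32)
moving = sitk.ReadImage("inspiration_ct.nii.gz", sitk.sitkFloat32)

# Coarse control-point grid; its spacing bounds how locally the warp can bend.
bspline = sitk.BSplineTransformInitializer(fixed, [8, 8, 8])

reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
reg.SetOptimizerAsLBFGSB(gradientConvergenceTolerance=1e-5, numberOfIterations=100)
reg.SetInterpolator(sitk.sitkLinear)
reg.SetInitialTransform(bspline, inPlace=True)

deformable_tx = reg.Execute(fixed, moving)
warped = sitk.Resample(moving, fixed, deformable_tx, sitk.sitkLinear, 0.0)
```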
Dimensionality registration methods
The dimensionality criterion in medical image registration includes spatial and time-series dimensions. For 2D-to-2D registration, simple transformations like rotations and translations are often sufficient, though scaling adjustments may also be needed. 3D-to-3D registration aligns volumetric imaging by assuming stable internal anatomy but requires careful calibration. 2D-to-3D registration aligns 3D volumes, such as CT or MRI, with a series of 2D projection images, and is often used during interventions or to monitor disease progression or treatment response.
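A minimal sketch of the 2D-to-3D idea follows, assuming a simplified parallel-ray geometry (a real system models the cone-beam geometry of the fluoroscope): a projection simulated from the 3D volume is scored against the acquired 2D frame, and the volume's pose is iterated until the two agree.

```python
import numpy as np

# Placeholder CT volume (z, y, x) and fluoroscopic frame; real data would be loaded.
volume = np.random.rand(128, 256, 256)
fluoro_frame = np.random.rand(256, 256)

# Crude digitally reconstructed radiograph: sum intensities along the ray axis.
drr = volume.sum(axis=0)

def similarity(projection, frame):
    """Normalized cross-correlation between simulated and acquired projections."""
    p = (projection - projection.mean()) / projection.std()
    f = (frame - frame.mean()) / frame.std()
    return float((p * f).mean())

# An optimizer would perturb the volume's 3D pose, regenerate the DRR,
# and keep the pose that maximizes this score.
score = similarity(drr, fluoro_frame)
```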
Intrinsic vs extrinsic registration methods
Extrinsic methods use external devices, such as fiducials, to aid in the registration process. The markers are visible in all images being aligned, providing an external reference frame. Extrinsic registration is less affected by image quality issues and facilitates precise alignment across different imaging modalities or adjustment during dynamic movements. Intrinsic registration, on the other hand, relies exclusively on internal image characteristics, such as pixel intensity patterns, anatomical features, or edges. By extracting these inherent features, intrinsic methods establish alignment based on the image data alone, avoiding external aids.
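As one concrete extrinsic example, a least-squares rigid transform can be fit directly to paired fiducial coordinates; the sketch below uses SimpleITK's landmark initializer, with purely hypothetical marker positions.

```python
import SimpleITK as sitk

# Paired fiducial coordinates (mm) identified in both image sets;
# values are illustrative placeholders.
fixed_fiducials  = [(10.0, 20.0, 30.0), (55.0, 18.0, 32.0), (32.0, 60.0, 28.0)]
moving_fiducials = [(12.0, 21.0, 29.0), (57.0, 19.0, 31.0), (34.0, 61.0, 27.0)]

# Flatten to the x1, y1, z1, x2, ... layout the initializer expects.
fixed_pts  = [c for p in fixed_fiducials for c in p]
moving_pts = [c for p in moving_fiducials for c in p]

# Least-squares rigid fit mapping moving-image fiducials onto fixed-image ones.
fiducial_tx = sitk.LandmarkBasedTransformInitializer(
    sitk.VersorRigid3DTransform(), fixed_pts, moving_pts)
```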
Deep learning registration methods
Deep learning and machine learning have been increasingly applied to image registration. Learning-based methods leverage the ability to learn optimal transformations from large training datasets, improving registration speed, accuracy, and adaptability. This typically is most useful with relatively uniform data collection, such as the creation of a 3D volume from a series of 2D ultrasound “sweeps.” The evolution of learning-based methods has been driven by their ability to handle complex transformations and diverse imaging scenarios, which is particularly helpful with nonrigid registrations, where classic deformable transformation techniques struggle. Once trained, deep-learning methods can perform registrations quickly, making them valuable for real-time registration, which is essential for robotic image-guided interventions. For a typical procedure with uniform motion (such as prostate fusion interventions), a 2D-to-3D deep-learning model may obviate electromagnetic (EM) or optical tracking hardware entirely.
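To illustrate the core mechanic of unsupervised learning-based registration (in the spirit of VoxelMorph-type methods), the PyTorch sketch below warps a moving image with a network-predicted displacement field and scores it with a similarity-plus-smoothness loss; the predicting network itself is omitted, and the loss weight is an arbitrary assumption.

```python
import torch
import torch.nn.functional as F

def warp_2d(moving, displacement):
    """moving: (N,1,H,W); displacement: (N,2,H,W), channels = (x, y) offsets
    in normalized [-1, 1] coordinates. Differentiable, so losses backpropagate
    to whatever network produced the displacement field."""
    n, _, h, w = moving.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    identity = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    grid = identity + displacement.permute(0, 2, 3, 1)  # add predicted offsets
    return F.grid_sample(moving, grid, align_corners=True)

def loss_fn(fixed, warped, displacement, lam=0.01):
    """Image similarity plus a smoothness penalty on the displacement field."""
    similarity = F.mse_loss(warped, fixed)
    smoothness = (displacement[:, :, 1:, :] - displacement[:, :, :-1, :]).pow(2).mean() \
               + (displacement[:, :, :, 1:] - displacement[:, :, :, :-1]).pow(2).mean()
    return similarity + lam * smoothness

# Toy usage: zero displacement standing in for a CNN's output.
fixed = torch.rand(1, 1, 64, 64)
moving = torch.rand(1, 1, 64, 64)
disp = torch.zeros(1, 2, 64, 64, requires_grad=True)
loss = loss_fn(fixed, warp_2d(moving, disp), disp)
loss.backward()  # gradients flow back through the warp
```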
Image fusion algorithms
Image fusion methods differ in how they combine information from various sources. Pixel-based fusion merges raw pixel values directly and is useful for images with similar resolutions, often from a single modality. Feature-based fusion extracts key features, like anatomical landmarks, and aligns them for enhanced detail and clarity. Decision-based fusion integrates relevant findings from each modality separately, such as CT for bone, MRI for soft tissue, and ultrasound for real-time guidance, allowing each modality to contribute its strengths without combining all raw data.
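For instance, pixel-based fusion of two already registered slices can be as simple as a weighted blend, as in the NumPy sketch below; the normalization scheme and the 0.6 weight are arbitrary illustrative choices.

```python
import numpy as np

def alpha_blend(ct_slice, pet_slice, alpha=0.6):
    """Blend two co-registered slices; alpha weights the CT contribution."""
    ct = (ct_slice - ct_slice.min()) / (np.ptp(ct_slice) + 1e-8)    # scale to [0, 1]
    pet = (pet_slice - pet_slice.min()) / (np.ptp(pet_slice) + 1e-8)
    return alpha * ct + (1 - alpha) * pet

# Placeholder inputs standing in for registered CT and PET slices.
fused = alpha_blend(np.random.rand(512, 512), np.random.rand(512, 512))
```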
Practical IR considerations for registration/fusion technique
The choice of registration method depends on the characteristics of the image datasets to be fused, the robotic procedure being performed, the features of the target of interest, and the strength and integrity of each modality chosen for guidance. In many applications, a simple rigid transformation is sufficient. However, in cases involving intrasubject or intersubject tissue deformation or anatomical variability, nonrigid transformation may be more appropriate.
Operators may need to select an organ, region of interest (ROI), or nearby edges (eg, the liver capsule) to prioritize for accuracy. For example, if a liver lesion is peripheral and adjacent to the liver capsule, the operator may favor accurate registration to the nearby liver capsule, even if this results in mismatches in more distant regions (eg, bone or distant sections of the liver capsule), to optimize precision near the target (Fig. 3). For abdominal procedures, specific considerations include differences in imaging phases: diagnostic CT is often acquired during full inspiration, while procedural CT is typically captured during end expiration or shallow breathing. These variations introduce deformation and edge changes that require sacrificing global registration accuracy in favor of local accuracy. In the example of the peripheral tumor adjacent to the liver capsule, rigid registration may outperform deformable registration when the priority is matching only the organ edge closest to the tumor.
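In registration toolkits, this local prioritization can be approximated by restricting the similarity metric to a mask around the target so that distant mismatch no longer penalizes the fit; the SimpleITK sketch below assumes a hypothetical ROI mask drawn over the capsule edge nearest the tumor.

```python
import SimpleITK as sitk

fixed = sitk.ReadImage("intraprocedural_ct.nii.gz", sitk.sitkFloat32)
moving = sitk.ReadImage("diagnostic_ct.nii.gz", sitk.sitkFloat32)
# Binary mask in fixed-image space covering the capsule edge near the lesion.
roi_mask = sitk.ReadImage("capsule_roi_mask.nii.gz", sitk.sitkUInt8)

reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
reg.SetMetricFixedMask(roi_mask)  # similarity evaluated only inside the ROI
reg.SetOptimizerAsRegularStepGradientDescent(
    learningRate=1.0, minStep=1e-4, numberOfIterations=200)
reg.SetInterpolator(sitk.sitkLinear)
reg.SetInitialTransform(
    sitk.CenteredTransformInitializer(
        fixed, moving, sitk.Euler3DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY),
    inPlace=False)

local_tx = reg.Execute(fixed, moving)  # rigid fit prioritizing the target region
```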
