ESTRO 2025 - Abstract Book
S2428
Physics - Autosegmentation
1003
Proffered Paper

Uncertainty quantification and calibration of population-based and patient-specific autosegmentation models for MRI-guided radiotherapy of lung cancer

Moritz Rabe 1, Ettore F. Meliadò 2, Sebastian Marschner 1,3, Claus Belka 1,3,4, Stefanie Corradini 1, Cornelis A.T. van den Berg 2, Guillaume Landry 1, Christopher Kurz 1

1 Department of Radiation Oncology, LMU University Hospital, LMU Munich, Munich, Germany.
2 Department of Radiotherapy, Division of Imaging & Oncology, University Medical Center Utrecht, Computational Imaging Group for MR diagnostics & therapy, Center for Image Sciences, Utrecht, Netherlands.
3 German Cancer Consortium (DKTK), partner site Munich, a partnership between DKFZ and LMU University Hospital Munich, Munich, Germany.
4 Bavarian Cancer Research Center, BZKF, Munich, Germany.

Purpose/Objective: Uncertainty assessment of deep learning autosegmentation (DLAS) models could guide manual corrections of autosegmented organ-at-risk (OAR) contours in adaptive radiotherapy (ART), for example by providing uncertainty maps generated with Monte Carlo Dropout (MCD). However, the uncertainties of DLAS networks are often poorly calibrated on an individual patient level, meaning that regions of high uncertainty show poor spatial concordance with regions of mis-segmentation. This makes the maps unreliable and clinically nonviable [1,2]. We assessed model uncertainties and patient-level uncertainty calibration of population-based and patient-specific DLAS networks, and we propose a patient-specific post-training uncertainty calibration method for OAR DLAS in ART.

Material/Methods: The study included 122 lung cancer patients treated with a 0.35 T MR-linac, divided into 80 training, 19 validation, and 23 test cases. Six single-label 3D U-Net-based population baseline models (BM) were trained with dropout using planning MR images and clinical contours of six OARs (aorta, esophagus, left and right lungs, heart, spinal canal).
Patient-specific models (PS) for improved fraction autosegmentation were obtained by fine-tuning the BMs on the planning MR images of each test patient. Model uncertainty was assessed using MCD with 20 variational inference samples, which were averaged to create MCD probability maps. Uncertainty calibration was assessed using reliability diagrams and the expected calibration error (ECE), a standard metric for quantifying the difference between predicted confidence and actual accuracy [3]. A novel post-training uncertainty calibration method was implemented by fitting reliability diagrams from baseline MRIs for each test patient, enabling rescaling of the MCD probabilities on fraction images for both BM (calBM) and PS (calPS). All models were evaluated on test patient fraction images for segmentation accuracy and uncertainty calibration, using the Dice Similarity Coefficient (DSC), the 95th percentile Hausdorff Distance (HD95), and ECE. With metrics averaged over all patients and OARs, models were compared using nonparametric Friedman tests and post hoc Nemenyi tests (α=0.05).

Results: Averaged over all OARs, patient-specific fine-tuning significantly improved mean DSC from 0.83 (BM) to 0.89 (PS) (p<0.001) and reduced HD95 from 13 mm (BM) to 6.5 mm (PS) (p<0.001). No significant ECE difference was found between BM and PS (p=0.35). Uncertainty calibration significantly (p<0.001) decreased ECE from 0.24 (BM) to 0.09 (calBM) and from 0.22 (PS) to 0.10 (calPS) (Figure 1), with no significant changes (p>0.05) in DSC or HD95 (Figure 2).
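
The uncertainty workflow described in the methods (averaging MCD samples, building a reliability diagram, computing ECE, and rescaling probabilities) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. All function names are hypothetical, the choice of 10 equal-width bins is an assumption, ECE is computed in a binary variant that compares the predicted foreground probability with the observed foreground fraction per bin, and the rescaling uses piecewise-linear interpolation of the reliability diagram as a stand-in, since the abstract does not state which function was fitted.

```python
import numpy as np


def mcd_probability_map(samples):
    """Average stochastic forward passes (stacked on axis 0) into one probability map."""
    return np.mean(np.asarray(samples, dtype=float), axis=0)


def reliability_diagram(prob, label, n_bins=10):
    """Per-bin mean predicted probability, observed foreground fraction, and voxel count.

    Assumption: equal-width probability bins; empty bins are dropped.
    """
    prob = np.ravel(prob).astype(float)
    label = np.ravel(label).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges yield bin indices 0 .. n_bins-1; a probability of 1.0 lands in the last bin.
    idx = np.digitize(prob, edges[1:-1])
    counts = np.bincount(idx, minlength=n_bins)
    sum_p = np.bincount(idx, weights=prob, minlength=n_bins)
    sum_y = np.bincount(idx, weights=label, minlength=n_bins)
    keep = counts > 0
    return sum_p[keep] / counts[keep], sum_y[keep] / counts[keep], counts[keep]


def expected_calibration_error(prob, label, n_bins=10):
    """Voxel-weighted mean absolute gap between predicted and observed foreground frequency."""
    mean_p, frac, counts = reliability_diagram(prob, label, n_bins)
    return float(np.sum(counts * np.abs(frac - mean_p)) / np.sum(counts))


def fit_rescaling(prob_plan, label_plan, n_bins=10):
    """Fit a probability rescaling from one patient's planning image.

    Assumption: the reliability diagram is 'fitted' by piecewise-linear
    interpolation between bin centers of mass (np.interp clamps at the ends).
    """
    mean_p, frac, _ = reliability_diagram(prob_plan, label_plan, n_bins)

    def rescale(prob):
        return np.interp(prob, mean_p, frac)

    return rescale
```

In this sketch, `fit_rescaling` would be called once per patient and OAR on the planning-image MCD probability map and its clinical contour, and the returned `rescale` function would then be applied to the MCD probability maps of that patient's fraction images before ECE is recomputed.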