ESTRO 2025 - Abstract Book
S2519
Physics - Autosegmentation
3241
Mini-Oral Quality monitoring of AI auto-segmentation performance for prostate cancer patients
Yi Rong, Libing Zhu, Quan Chen, Nathan Y Yu
Radiation Oncology, Mayo Clinic, Phoenix, USA
Purpose/Objective: Deep learning-based auto-segmentation (DLAS) has become a critical tool in modern radiation therapy, enhancing the accuracy of target and organ-at-risk (OAR) contouring. However, its clinical performance can deteriorate over time due to input data drift, such as variations in imaging protocols or patient-specific factors relative to the original training datasets. This study evaluates the performance of a retrained version of a clinically implemented DLAS model for the male pelvis and establishes a framework for monitoring segmentation quality using statistical process control (SPC)-based methods.
Material/Methods: A total of 340 prostate cancer cases were analyzed, comprising 170 cases from 2022 (pre-retrained model) and 170 cases from 2023 (post-retrained model implemented in clinic). The retrained DLAS model, based on a 3D U-Net architecture and retrained with institutional data, was assessed using the Dice Similarity Coefficient (DSC), the 95th-percentile Hausdorff Distance (HD95), and the Surface Dice Similarity Coefficient (SDSC). Outliers were detected using z-score-based SPC methods with a 2σ criterion, and performance was monitored monthly. The prospective (2023) and retrospective (2022) cohorts were compared to evaluate their suitability for establishing control limits for monitoring DLAS performance.
Results: The retrained DLAS model demonstrated improved segmentation accuracy, with mean DSC increasing from 0.84–0.94 (2022) to 0.89–0.97 (2023) and mean HD95 decreasing from 0.56–0.68 cm to 0.25–0.57 cm across five OARs. Figs. 1 and 2 show control charts for monitoring outliers and monthly degradation for the bladder. Monthly monitoring revealed that the retrained model reduced outlier rates and variability compared with the pre-retrained model. Outlier analysis attributed deviations to metal artifacts and model generalization issues.
The accuracy of the retrospective cohort was 97.02%, 96.43%, 100%, 98.21% and 98.80% for prostate, rectum, bladder, femur_head_l and femur_head_r, respectively, while the accuracy of the prospective cohort was 100%, 98.81%, 100%, 99.40% and 99.40% for the same five organs.
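The z-score-based 2σ criterion described in the methods can be illustrated with a minimal sketch. It assumes a two-sided rule and control limits taken as the baseline mean ± 2 sample standard deviations; the abstract does not specify these details. The function names and the DSC values below are hypothetical and invented for illustration only.

```python
from statistics import mean, stdev


def control_limits(baseline, k=2.0):
    """Return (center, lower, upper) as the baseline mean -/+ k sample standard deviations."""
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline cohort
    return center, center - k * sigma, center + k * sigma


def flag_outliers(values, baseline, k=2.0):
    """Flag each value whose z-score against the baseline exceeds k in magnitude.

    Assumes the baseline has nonzero spread; a constant baseline would divide by zero.
    """
    center = mean(baseline)
    sigma = stdev(baseline)
    return [abs((v - center) / sigma) > k for v in values]


# Hypothetical bladder DSC values, not data from the study.
baseline_dsc = [0.95, 0.96, 0.94, 0.95, 0.97, 0.96, 0.95, 0.94]
new_dsc = [0.95, 0.96, 0.80, 0.94]

center, lower, upper = control_limits(baseline_dsc)
flags = flag_outliers(new_dsc, baseline_dsc)
print(f"center={center:.4f}, limits=({lower:.4f}, {upper:.4f})")
print(flags)
```

In this sketch only the third new case falls outside the 2σ band. For a metric such as HD95, where only large values indicate poor quality, a one-sided rule may be more appropriate.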
Fig. 1. 2σ criterion-based individuals control charts of DSC and HD95 (a) and SDSC (b) for performance monitoring of the bladder.
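The DSC tracked in these charts is the standard overlap measure 2|A∩B| / (|A| + |B|) between an auto-segmented contour A and a reference contour B. The following is a minimal sketch, assuming each contour is represented as a set of voxel indices. That representation, the function name `dice_coefficient`, and the convention of returning 1.0 for two empty contours are choices made here for illustration and are not taken from the abstract.

```python
def dice_coefficient(auto_voxels, reference_voxels):
    """Dice Similarity Coefficient between two collections of voxel indices."""
    a = set(auto_voxels)
    b = set(reference_voxels)
    if not a and not b:
        # Assumed convention: two empty contours count as perfect agreement.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical toy contours given as (z, y, x) voxel indices.
auto = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
reference = {(0, 0, 1), (0, 1, 1), (0, 1, 2)}

dsc = dice_coefficient(auto, reference)
print(f"DSC = {dsc:.3f}")
```

Two of the three voxels in each toy contour overlap, so the sketch gives a DSC of 2·2 / (3 + 3) ≈ 0.667.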