ESTRO 2024 - Abstract Book
S334
Brachytherapy - Physics
Clinical evaluation of organs at risk deep learning auto-segmentation for cervix brachytherapy
Simon A. Keek, Anton Mans, Marlies E. Nowee, Eva E. Rijkmans, Eva C. Schaake, Rita Simões, Tomas M. Janssen
Netherlands Cancer Institute, Radiation oncology, Amsterdam, Netherlands
Purpose/Objective:
In January 2023 we clinically introduced deep learning (DL)-based auto-segmentation of organs at risk (OARs) for cervix intrauterine brachytherapy patients. At each treatment fraction, the bladder, rectum, small bowel and sigmoid are automatically segmented on the brachytherapy MRI scan according to EMBRACE II guidelines. The auto-segmentation is verified by the RTTs and edited if necessary. This work aims to evaluate auto-segmentation in daily clinical practice by studying model performance, the editing required and the time spent on segmentation.
Material/Methods:
Data from 13 cervical cancer patients (39 fractions) treated with brachytherapy after the introduction of auto-segmentation were collected. For each patient and fraction, axial T2-weighted MR images and two sets of OAR segmentations were available: the auto-segmentations (“auto”) and the manually edited auto-segmentations (“edit”), which were used clinically. In addition, the OARs for the first fraction of each patient were manually segmented retrospectively and without knowledge of the auto-segmentation (“manual”). The model performance after clinical implementation was compared with the performance obtained during the model development stage on an independent test dataset (N=30 patients, 82 fractions) [1] using the Dice coefficient and the 95th percentile Hausdorff distance (95 HD). To determine the magnitude of the edits required and to compare it with the difference between the edited auto-segmentations and the unbiased manual segmentations, pairwise comparisons were performed between the “auto”, “edit” and “manual” segmentations using the Dice coefficient and 95 HD.
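As an illustration of the two similarity metrics named above, the following is a minimal sketch and not the implementation used in this study. It assumes each segmentation is given as a collection of voxel index tuples, that the caller supplies surface (contour) points for the distance computation, and that 95 HD is taken as the larger of the two directed 95th percentile distances; other conventions exist. The function names, the `spacing` parameter and the example coordinates are hypothetical.

```python
import math


def dice(a, b):
    """Dice coefficient between two collections of voxel indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def directed_distances(src, dst, spacing):
    """Distance (in mm) from each point in src to its nearest point in dst."""
    def dist(p, q):
        return math.sqrt(sum(((pi - qi) * si) ** 2
                             for pi, qi, si in zip(p, q, spacing)))
    return [min(dist(p, q) for q in dst) for p in src]


def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric 95th percentile Hausdorff distance between two point sets.

    Assumes a and b are non-empty lists of surface points (voxel indices)
    and spacing is the voxel size in mm along each axis.
    """
    d_ab = directed_distances(a, b, spacing)
    d_ba = directed_distances(b, a, spacing)
    return max(percentile(d_ab, 95), percentile(d_ba, 95))


# Toy example with made-up coordinates (not study data).
auto = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
edit = [(0, 0, 1), (0, 1, 1), (0, 0, 2), (0, 1, 2)]
dice_value = dice(auto, edit)
hd95_value = hd95(auto, edit, spacing=(1.0, 1.0, 1.0))
```

The brute-force nearest-point search is quadratic in the number of points, which is acceptable for a sketch; a distance transform or a spatial index would be the usual choice for full-resolution contours.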
In a subgroup of patients, 18 fractions before and 19 fractions after the introduction of auto-segmentation, the time the RTTs spent segmenting the four OARs was measured and compared.
Statistical significance was tested using the Wilcoxon signed-rank test for paired comparisons and the Mann-Whitney U test for comparisons between different patient groups.
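The rank statistics behind these two tests can be sketched as follows. This is an illustrative outline under stated assumptions, not the analysis code of this study: it computes only the test statistics (tied values receive averaged ranks, and zero differences are dropped in the paired test) and leaves p-values to a statistics package such as SciPy (`scipy.stats.wilcoxon`, `scipy.stats.mannwhitneyu`). All function names and all numbers below are hypothetical.

```python
def midranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus) for paired samples; zero differences dropped."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2.0
    return u_x, len(x) * len(y) - u_x


# Made-up numbers for illustration only (not study data).
paired_a = [9, 7, 8, 5]       # e.g. a metric for one comparison, per fraction
paired_b = [6, 8, 8, 1]       # the same fractions under the other comparison
w_plus, w_minus = wilcoxon_signed_rank(paired_a, paired_b)

group_before = [30, 28, 35, 40]   # e.g. segmentation times in one group
group_after = [20, 25, 22, 27]    # times in a different group
u_before, u_after = mann_whitney_u(group_before, group_after)
```

The paired statistic fits comparisons made on the same fractions (for example “edit” vs. “auto” against “edit” vs. “manual”), whereas the two-sample statistic fits the comparison between fractions measured before and after the introduction of auto-segmentation, which come from different patients.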
Results:
The (“manual” vs. “auto”) similarity metrics were not significantly different from those measured during model development, with the exception of 95 HD for the bladder (p=0.016).
The pairwise similarity metrics over the three sets of segmentations are shown in Figure 1. The median Dice coefficient between the clinical OAR segmentations (“edit”) and the auto-segmentations (“auto”) was above 0.8 for all OARs, and the 95 HD was below 15 mm, indicating high similarity. For both the Dice coefficient and 95 HD, all clinical OAR segmentations (“edit”) were more similar (p<0.05) to the auto-segmentations (“auto”) than to the independent manual segmentations (“manual”). This indicates that a bias towards the auto-segmentation model output has been introduced in the clinical segmentations.