ESTRO 2025 - Abstract Book

S2539

Physics - Autosegmentation

Purpose/Objective: Deep learning (DL) improves gross tumor volume (GTV) auto-segmentation, but its output still requires substantial manual correction. With interactive DL (iDL), clinical expertise actively guides the auto-segmentation process, with the aim of reaching clinically acceptable segmentations from minimal input. We implemented an iDL workflow for head-and-neck GTV segmentation and validated its performance, its transferability across centers and datasets, and its usability in an observer study.
Material/Methods: For the primary tumor (GTVt), we pre-trained a 3D-UNet (baseline-CNN) on multi-modal imaging. In the iDL step, the oncologist localizes the center of the GTVt and delineates the 3 orthogonal slices through it. These slices are combined with the baseline-CNN prediction and the multi-modal imaging to update the baseline-CNN for 20 iterations with 4 augmentations. The dice+focal loss is calculated with a weight map that puts the most weight on false-positive and false-negative voxels, medium weight on true-positive and true-negative voxels, and low weight on the rest. For the nodes (GTVn), we trained a 3D-UNet on multi-modal imaging and a click-map with simulated clicks inside each node. In clinical use, the oncologist provides the clicks and the GTVn segmentation is then produced. Performance was simulated on a Danish dataset (DK; n=129 baseline-training, 51 testing) and on the HECKTOR 2022 [1] training set (n=450 baseline-training, 74 testing). Cross-center performance was evaluated on datasets from the Netherlands (NL) and the United States (US), comparing direct use of the DK-trained network against transfer learning (NL: 51 training, 24 testing; US: 45 training, 20 testing). Finally, 3 observers delineated 9 cases using iDL. The main endpoint was the relative percentage added path length (APL) between the iDL prediction and the final delineation. Usability was evaluated using the System Usability Scale (SUS).
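
The weight map used in the GTVt update step can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: that "the rest" refers to voxels outside the 3 delineated slices, that false/true positives and negatives are judged by comparing the baseline-CNN prediction with the oncologist's slices, and that the weights are 1.0, 0.5, and 0.1. The function names `orthogonal_slice_mask` and `build_weight_map` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def orthogonal_slice_mask(shape, center):
    """Mark the 3 orthogonal slices through `center` (z, y, x) as annotated."""
    mask = np.zeros(shape, dtype=bool)
    z, y, x = center
    mask[z, :, :] = True  # axial slice
    mask[:, y, :] = True  # coronal slice
    mask[:, :, x] = True  # sagittal slice
    return mask


def build_weight_map(baseline_pred, slice_labels, annotated_mask,
                     w_error=1.0, w_correct=0.5, w_rest=0.1):
    """Per-voxel loss weights (the numeric values are illustrative assumptions).

    baseline_pred  : binary baseline-CNN segmentation
    slice_labels   : oncologist delineation, meaningful only where annotated
    annotated_mask : True on the 3 delineated orthogonal slices
    """
    pred = np.asarray(baseline_pred, dtype=bool)
    label = np.asarray(slice_labels, dtype=bool)
    annotated = np.asarray(annotated_mask, dtype=bool)

    # Start with the low weight everywhere ("the rest", assumed off-slice voxels).
    weights = np.full(pred.shape, w_rest, dtype=np.float32)
    # Annotated voxels where prediction and delineation agree: TP and TN.
    weights[annotated & (pred == label)] = w_correct
    # Annotated voxels where they disagree: FP and FN.
    weights[annotated & (pred != label)] = w_error
    return weights


# Tiny synthetic example on a 4x4x4 volume.
shape = (4, 4, 4)
annotated = orthogonal_slice_mask(shape, center=(1, 1, 1))
pred = np.zeros(shape, dtype=bool)
pred[1, 1, 1] = True          # agrees with the delineation (TP)
pred[0, 0, 0] = True          # off-slice voxel, no label available
label = np.zeros(shape, dtype=bool)
label[1, 1, 1] = True
label[1, 1, 2] = True         # missed by the baseline (FN)
weights = build_weight_map(pred, label, annotated)
```

A map of this kind would multiply the voxel-wise loss terms before reduction, so that the few voxels where the oncologist contradicts the baseline prediction dominate the update.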
Results: Overall, iDL achieved high segmentation accuracy in the DK and HECKTOR datasets (Table 1). Despite a drop in cross-center GTVt baseline performance, iDL performance was similar with and without transfer learning, indicating that the tool trained in one center can be deployed in another center without losing segmentation accuracy (Table 1). The mean time to achieve clinically acceptable delineations was 9 minutes (Figure 1). The mean APL for the 3 observers was 11%, 6%, and 39% for GTVt, and 4%, 6%, and 11% for GTVn. The SUS scores were 95, 100, and 97.5.
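
A relative APL endpoint of this kind can be sketched as follows. The abstract does not give the exact definition, so this sketch assumes that APL counts, slice by slice, the contour pixels of the final delineation that are absent from the iDL contour, and that the relative value divides this count by the total contour length of the final delineation. The function names `contour_2d` and `relative_apl` are hypothetical.

```python
import numpy as np


def contour_2d(mask):
    """Boundary pixels: foreground pixels with a 4-connected background neighbour."""
    mask = np.asarray(mask, dtype=bool)
    m = np.pad(mask, 1, constant_values=False)
    # A pixel is interior if it and its 4 neighbours are all foreground.
    interior = (m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1]
                & m[1:-1, :-2] & m[1:-1, 2:])
    return mask & ~interior


def relative_apl(pred, final):
    """Relative added path length in percent (assumed definition, see lead-in).

    pred, final : 3D binary volumes indexed as (slice, row, column)
    """
    added = 0
    total = 0
    for pred_slice, final_slice in zip(pred, final):
        c_pred = contour_2d(pred_slice)
        c_final = contour_2d(final_slice)
        # Contour pixels the observer had to draw beyond the prediction.
        added += np.count_nonzero(c_final & ~c_pred)
        total += np.count_nonzero(c_final)
    return 100.0 * added / total if total else 0.0


# Synthetic single-slice example: the observer extends the contour by one column.
pred = np.zeros((1, 6, 6), dtype=bool)
pred[0, 1:5, 1:5] = True
final = np.zeros((1, 6, 6), dtype=bool)
final[0, 1:5, 1:6] = True
apl_unchanged = relative_apl(pred, pred)
apl_edited = relative_apl(pred, final)
```

Under this definition, an unedited prediction yields 0%, and the value grows with the share of the final contour that had to be drawn by hand.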
