ESTRO 2025 - Abstract Book

S2440

Physics - Autosegmentation

Purpose/Objective: A major challenge in validating auto-contouring systems is the lack of ground truth. Most studies compare auto-contours to only a single reference contour using geometric metrics (e.g. Dice), which do not predict clinical acceptability. Auto-contours should be consistent with inter-observer variation (IOV), but IOV studies are rarely performed as they are resource-intensive [1]. A proposed approach to overcome this lack of ground truth is to have expert observers identify the regions of acceptable uncertainty [2]. The aim of this study was to compare an expert-predicted range with IOV for validating a novel nnU-Net-based auto-contouring system for cervical cancer radiotherapy.

Material/Methods: Target volumes and organs at risk (OARs) were contoured on 100 CT scans from patients previously treated with external beam radiotherapy (EBRT) for cervical cancer, using a standardised contouring protocol. Five-fold cross-validation was used to train a novel five-model nnU-Net ensemble-based auto-contouring system [3]. An inter-observer contouring study was undertaken on six additional cases by six radiation oncologists. The common and encompassing volumes for all protocol-compliant slices were amalgamated to make an “inter-observer range”. Prior to reviewing compliance, two experts created consensus acceptability ranges for each structure, with an inner boundary that definitely contained the structure and an outer boundary that definitely did not [2]. For each structure, the volume of auto-contour outside the expert range and outside the IOV range was calculated. Dice was calculated for the auto-contour in relation to each observer.

Results: The nnU-Net performed well, although Dice values differed depending on which observer was used as the reference. The median Dice was 0.94-0.98 (bladder), 0.90-0.93 (CTV primary), 0.87-0.90 (CTV nodes), 0.91-0.94 (uterus) and 0.76-0.91 (anorectum). More than 96% of these auto-contours were within the IOV and expert ranges (Figure 1). The expert range was tighter than, but still contained within, the IOV range (Figure 2).
More auto-contours therefore fitted within the IOV range than within the expert range, but large IOV in some areas makes this a less discriminating comparison when validating auto-contouring systems. For the structures most challenging to contour, the Dice scores were lower, reflecting greater inter-observer variation (parametrium: 0.76-0.84; vagina: 0.54-0.70). Similarly, a greater proportion of the nnU-Net auto-contours for these structures fell outside the expert range and the IOV range.
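
The range-based comparison described in the methods can be illustrated with a minimal sketch on binary voxel masks. This is not the authors' implementation: the function names (`dice`, `observer_range`, `outside_range`) are hypothetical, the masks are assumed to be boolean arrays on a common grid, and counting both excess voxels (beyond the outer boundary) and missed voxels (inner boundary not covered) is one possible reading of "volume outside the range".

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return 2.0 * np.logical_and(a, b).sum() / denom


def observer_range(masks):
    """Build a range from several observer masks.

    inner: voxels contoured by every observer (common volume)
    outer: voxels contoured by at least one observer (encompassing volume)
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    return stack.all(axis=0), stack.any(axis=0)


def outside_range(auto, inner, outer):
    """Count auto-contour voxels that violate an (inner, outer) range.

    Returns (excess, missed):
      excess: auto-contour voxels lying beyond the outer boundary
      missed: inner-boundary voxels the auto-contour fails to cover
    Multiply by the voxel volume to convert counts to a physical volume.
    """
    auto = np.asarray(auto, dtype=bool)
    excess = np.logical_and(auto, ~outer).sum()
    missed = np.logical_and(inner, ~auto).sum()
    return int(excess), int(missed)


# Toy 1-D example with two observers (illustrative data only)
observers = [np.array([0, 1, 1, 1, 0, 0]), np.array([0, 0, 1, 1, 1, 0])]
auto = np.array([0, 0, 1, 1, 1, 1])

inner, outer = observer_range(observers)
excess, missed = outside_range(auto, inner, outer)
per_observer_dice = [dice(auto, obs) for obs in observers]
```

The same `outside_range` function applies unchanged to an expert-drawn range, with the experts' inner and outer boundaries supplied as masks in place of the observer-derived ones. Computing Dice against each observer separately, as in `per_observer_dice`, shows how the score depends on the chosen reference contour.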
