ESTRO 2023 - Abstract Book


S264

Saturday 13 May


Conclusion
Following minor editing (AI-MinEd), AI contours were dosimetrically non-inferior to manual delineations and reduced delineation time by 79%.

PD-0330 Clinical generalisability of a custom auto-contouring model for Prostate radiotherapy
Y. McQuinlan¹, T. Guerrero Urbano², D. Eaton², M. Battye¹, M. Gooding¹, M. Khan²

¹Mirada Medical, Science and Research, Oxford, United Kingdom; ²Guy's and St Thomas' NHS Foundation Trust, Radiotherapy, London, United Kingdom

Purpose or Objective
The performance of artificial intelligence (AI)-based contouring solutions depends on the quality of the training data, and evaluation is often performed on data from the development set. Within a public healthcare setting, this makes it difficult to understand how well a model generalises beyond a given population. The purpose of this study was to evaluate the generalisability of a clinic-specific AI auto-contouring model on an independent test set.

Materials and Methods
Computed tomography (CT) scans from 200 prostate patients were retrospectively collected from a National Health Service (NHS) Trust. On each CT, a single observer outlined the prostate, seminal vesicles, rectum, bladder, penile bulb and femoral heads according to consensus guidelines. The contours were peer-reviewed by a consultant oncologist specialising in prostate radiotherapy, and the contours used in the training data were compliant with consensus guidelines. The Research Autosegmentation Model (RAM) was trained on 160 of these cases and evaluated on a test set of 20 cases. Model outputs were assessed quantitatively using added path length (APL), 2D 95% Hausdorff distance (HD2D95) and 3D Dice similarity coefficient (DSC).

A commercial deep learning contouring model (DLC), trained on a different population and developed to comply with consensus guidelines, was also evaluated on the RAM test set. Both models were then assessed on a third, external dataset sourced from a United Kingdom (UK) population, with reference contours outlined to consensus guidelines.

Statistical significance was determined with the Wilcoxon signed-rank test. This paired test was chosen because the RAM and DLC outputs were obtained on a single, shared group of patients, and the question was whether the two models' paired results differ significantly from each other.
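The three comparison metrics named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the software used in the study, and all function names are hypothetical. A structure is assumed to be a set of (z, y, x) voxel indices with in-plane spacing `sy`, `sx` in mm, and contours are taken as 4-connected boundary pixels on each axial slice. The abstract does not say how HD2D95 is aggregated over slices, so here it is computed per slice (on slices contoured in both masks) and averaged. APL is approximated by counting reference contour pixels that lie farther than a hypothetical tolerance `tol_mm` from the automatic contour, each pixel standing for `sx` mm of contour, and is also returned normalised to the reference contour length.

```python
import math

# Assumed representation: a structure is a set of (z, y, x) voxel indices;
# sy, sx are in-plane pixel spacings in mm.


def _slice(mask, z):
    """Pixels (y, x) of the mask on axial slice z."""
    return {(y, x) for (zz, y, x) in mask if zz == z}


def _boundary(pixels):
    """Contour pixels: region pixels with at least one 4-connected neighbour outside."""
    return {(y, x) for (y, x) in pixels
            if any((y + dy, x + dx) not in pixels
                   for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)))}


def _nearest(p, others, sy, sx):
    """Distance in mm from pixel p to the closest pixel in `others` (brute force)."""
    return min(math.hypot((p[0] - q[0]) * sy, (p[1] - q[1]) * sx) for q in others)


def dice(ref, auto):
    """3D Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(ref) + len(auto)
    return 1.0 if total == 0 else 2 * len(ref & auto) / total


def hd95_2d(ref, auto, sy=1.0, sx=1.0):
    """Per-slice symmetric 95th-percentile Hausdorff distance, averaged over slices.

    Assumption: only slices contoured in both masks are scored, and the
    per-slice values are aggregated by their mean.
    """
    per_slice = []
    for z in sorted({v[0] for v in ref} & {v[0] for v in auto}):
        a, b = _boundary(_slice(ref, z)), _boundary(_slice(auto, z))
        d = sorted([_nearest(p, b, sy, sx) for p in a] +
                   [_nearest(q, a, sy, sx) for q in b])
        per_slice.append(d[math.ceil(0.95 * len(d)) - 1])  # nearest-rank percentile
    return sum(per_slice) / len(per_slice) if per_slice else float("nan")


def added_path_length(ref, auto, sy=1.0, sx=1.0, tol_mm=1.0):
    """Return (APL in mm, APL normalised to the reference contour length).

    Reference contour pixels farther than tol_mm from the automatic contour on
    the same slice would have to be redrawn, so they count as added path. Each
    pixel approximates sx mm of contour (assumes sy == sx).
    """
    added = total = 0
    for z in {v[0] for v in ref}:
        a, b = _boundary(_slice(ref, z)), _boundary(_slice(auto, z))
        total += len(a)
        added += sum(1 for p in a if not b or _nearest(p, b, sy, sx) > tol_mm)
    return added * sx, (added / total if total else 0.0)
```

For two 10 × 10 pixel squares on one slice, offset by one pixel at unit spacing, `dice` gives 0.9 and `hd95_2d` gives 1.0 mm; with `tol_mm=0.5`, `added_path_length` reports 18.0 mm, i.e. half of the reference contour. A production version would typically replace the brute-force nearest-point search with a distance transform.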
Results
As expected, each model performed better on the population from which it was derived. On the independent UK external dataset, the two models performed comparably. For DSC, most structures showed no statistically significant difference in performance; the exception was the prostate (p=0.05). For HD2D95, only the left and right femoral heads showed statistically significant differences (p<0.01 and p<0.05, respectively). For APL normalised to the reference contour length, all structures except the seminal vesicles and penile bulb showed statistically significant differences (p<0.05).
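The paired RAM-versus-DLC comparison behind these p-values can be illustrated with a small, dependency-free Wilcoxon signed-rank test. This is a sketch, not the study's analysis code, and the function name is hypothetical. Its inputs would be per-patient values of one metric for one structure (for example, prostate DSC) from the two models on the same patients. Zero differences are discarded, tied absolute differences receive average ranks, and the two-sided p-value is computed exactly by enumerating the sums of ranks over all sign assignments. For routine use, SciPy's `scipy.stats.wilcoxon` provides the same test.

```python
def wilcoxon_signed_rank(x, y):
    """Two-sided exact Wilcoxon signed-rank test for paired samples x and y.

    Zero differences are dropped; tied |differences| get average ranks, and the
    exact null distribution is enumerated conditionally on those ranks.
    Returns (T, p) with T = min(W+, W-).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 0.0, 1.0

    # Average ranks of |d|, stored doubled so that half-ranks stay integers.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # 2 * mean of ranks (i+1)..(j+1)
        i = j + 1

    w_plus2 = sum(r for r, di in zip(ranks2, d) if di > 0)
    t2 = min(w_plus2, sum(ranks2) - w_plus2)

    # counts[s] = number of the 2**n sign assignments whose doubled W+ equals s.
    counts = {0: 1}
    for r in ranks2:
        nxt = dict(counts)
        for s, c in counts.items():
            nxt[s + r] = nxt.get(s + r, 0) + c
        counts = nxt

    # The null distribution is symmetric, so double the lower tail.
    tail = sum(c for s, c in counts.items() if s <= t2)
    p = min(1.0, 2 * tail / 2 ** n)
    return t2 / 2, p
```

With the illustrative inputs `x = [10, 12, 15, 9, 20]` and `y = [8, 11, 11, 10, 14]` (made-up numbers, not study data), the function returns `T = 1.5` and `p = 0.1875`. The exact enumeration is chosen because the number of distinct rank sums grows only polynomially with cohort size, so it remains cheap for test sets of the size described here.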
