ESTRO 2025 - Abstract Book
S2497
Physics - Autosegmentation
2956
Digital Poster

Clinical Evaluation of 2D and 3D Deep Learning-based CT Auto-contouring in Prostate Cancer

Maram Alqarni¹,², Christopher Thomas³, Vinod Mullassery⁴, Simon Hughes⁴, Kirsty Morrison⁴, Lydia Pascal⁴, Sindu Vivekanandan⁴, Luis Ribeiro⁴, Vishal Manik⁴, Van Sim⁴, Victoria Harris⁴, Stephen Morris⁴, Gurdip Azad⁴, Ajay Aggarwal⁴, Yasmin McQuinlan⁵, Sarah Misson-Yates³, David Eaton⁶, Teresa Guerrero Urbano⁴, Andrew P. King¹

¹ School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom.
² Biomedical Engineering, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia.
³ Medical Physics, Guy’s and St Thomas’ NHS Foundation Trust, London, United Kingdom.
⁴ Department of Clinical Haematology and Oncology, Guy’s and St Thomas’ NHS Foundation Trust, London, United Kingdom.
⁵ Science, Mirada Medical Ltd, Oxford, United Kingdom.
⁶ Radiotherapy Physics, Guy’s and St Thomas’ NHS Foundation Trust, London, United Kingdom.

Purpose/Objective: Deep learning (DL) autocontouring in prostate radiotherapy (RT) reduces contouring time and interobserver variability. Because DL tools contour 3D volumetric imaging data, 3D DL architectures offer potential advantages¹, but at a higher computational cost, a longer training time and consequently a higher carbon footprint². We compared the performance and carbon footprint of 2D and 3D DL models for prostate and organ-at-risk (OAR) CT autocontouring.

Material/Methods: The prostate, bladder, rectum, right and left femoral heads (RFH, LFH) and penile bulb were contoured by an experienced dosimetrist and subsequently checked by an experienced oncologist on CT data from 200 anonymised prostate cancer patients treated locally with radical RT. Two DL autocontouring models were trained: a 2D nnU-Net and a 3D nnU-Net cascade³. For both models, the data were split into 80% training (n=160) and 20% testing (n=40).
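The 80%/20% split described above can be illustrated with a minimal sketch. It assumes a seeded random shuffle at the patient level; the abstract does not state how cases were allocated, and the function name `split_cases`, the seed and the case identifiers are hypothetical.

```python
import random


def split_cases(case_ids, test_fraction=0.2, seed=0):
    """Shuffle patient IDs and return (train_ids, test_ids).

    Assumption: a seeded random shuffle. The abstract only reports the
    resulting proportions, not the allocation method.
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(case_ids)      # copy, leaving the caller's list untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical identifiers standing in for the 200 anonymised patients.
case_ids = [f"case_{i:03d}" for i in range(200)]
train_ids, test_ids = split_cases(case_ids)
print(len(train_ids), len(test_ids))  # 160 40
```

Splitting by patient identifier rather than by image slice keeps all slices of one patient on the same side of the split, which matters when a 2D model is trained on individual slices.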
Five-fold cross-validation was performed on the training set, and an ensemble of the five resulting models was applied to the test set. Quantitative evaluation used the Dice similarity coefficient (DSC), the surface Dice similarity coefficient (sDSC) and the 95% Hausdorff distance (95% HD, in mm), computed at a surface tolerance of 2 mm. For qualitative evaluation, the DL autocontours for 10 test cases were reviewed by 5 independent observers who, fully blinded to the model type (i.e. 2D or 3D), were asked to select which required the least editing time. Significance was tested using the Mann-Whitney U test for the quantitative results and the Chi-square test for the qualitative results. Quantitative results are reported only for the 10 subjects used in the visual assessment.

Results: Quantitative metrics showed no statistically significant differences between the 2D and 3D models (p>0.05) (Figure 1). Qualitative assessment revealed comparable performance except for both femoral heads (2D better, p<0.05) (Figure 2). Training times were 5 days (2D) and 31 days (3D), both on an NVIDIA A6000 48GB GPU, with corresponding estimated carbon footprints of 15.5 kg CO₂ and 96.4 kg CO₂, respectively.
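For readers unfamiliar with the Dice similarity coefficient used in the quantitative evaluation, the following is a minimal sketch of its standard definition, DSC = 2|A ∩ B| / (|A| + |B|). It is not the authors' evaluation code: contours are represented here as sets of voxel indices, the function name `dice_coefficient` is hypothetical, and returning 1.0 for two empty contours is an assumed convention.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two contours.

    Each contour is given as an iterable of voxel indices (z, y, x).
    Assumption: two empty contours are treated as perfect agreement.
    """
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


# Toy example: 4 voxels per contour, 3 of them shared.
reference = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
predicted = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
print(dice_coefficient(reference, predicted))  # 0.75
```

DSC measures volumetric overlap only, so it is relatively insensitive to small boundary errors on large structures; this is one reason surface-based metrics such as sDSC and the 95% Hausdorff distance are usually reported alongside it.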