ESTRO 2025 - Abstract Book

Physics - Autosegmentation

Technology Calicut, Kozhikode, India. 18 Dept of Radiotherapy, MVR Cancer Centre and Research Institute, Kozhikode, India. 19 Dept of Radiotherapy, Netherlands Cancer Institute (NKI), Amsterdam, Netherlands. 20 Dept of Radiotherapy, Oslo University Hospital, Oslo, Norway. 21 Dept of Radiotherapy, Radboud University Medical Centre, Nijmegen, Netherlands. 22 Dept of Radiotherapy, Tianjin Medical University Cancer Institute and Hospital, Tianjin, China. 23 Dept of Radiotherapy, University of Pennsylvania Medical Centre, Philadelphia, USA. 24 Dept of Radiotherapy, University Hospital Zurich, Zurich, Switzerland.

Purpose/Objective: Segmentation of the primary tumour is a mandatory step in lung cancer radiotherapy. Deep learning models can efficiently aid clinicians with segmentation, but require patient data to train on. Sharing individual patients’ radiotherapy plans might raise privacy concerns. In this work, we report on a multi-national collaborative effort to develop, validate and test lung tumour segmentation models using a Federated Deep Learning approach, utilizing clinical radiotherapy DICOM CTs and RTSTRUCTs, without patient data leaving its host institution.

Material/Methods: The ARGOS (ARtificial intelligence for Gross tumOur volume Segmentation) Consortium consists of 20 institutions in 10 countries with one common legal agreement for federated learning. Clinical RTSTRUCTs were taken “as treated” (no re-editing allowed). A wide range of training-institution combinations was compared. Only primary tumours in the lung were considered. Local models were trained exclusively within each participating institution; local model weights were then shared and arithmetically averaged centrally. In each cycle, the global model (no delta gradients) was transmitted back to each institution, retrained for a fixed number of steps, then re-averaged, until a total of 100 cycles was reached, with a warm restart after 50 cycles.
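The train-locally, average-centrally cycle described above can be illustrated with a minimal sketch. It is not the consortium's implementation: the function names, the flat-list weight format, the toy `local_update` rule and the unweighted mean are all assumptions made for illustration, and the warm restart is omitted.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flat list of values


def average_weights(local_weights: List[Weights]) -> Weights:
    """Unweighted arithmetic mean of every parameter across institutions
    (assumption: the abstract does not say whether sites were weighted)."""
    if not local_weights:
        raise ValueError("need at least one local model")
    n_sites = len(local_weights)
    averaged: Weights = {}
    for name in local_weights[0]:
        per_site = [weights[name] for weights in local_weights]
        averaged[name] = [sum(values) / n_sites for values in zip(*per_site)]
    return averaged


def local_update(global_weights: Weights, site_target: float,
                 steps: int, lr: float = 0.5) -> Weights:
    """Hypothetical stand-in for local training: pull each weight toward a
    site-specific target for a fixed number of steps."""
    updated = {name: list(values) for name, values in global_weights.items()}
    for _ in range(steps):
        updated = {name: [v + lr * (site_target - v) for v in values]
                   for name, values in updated.items()}
    return updated


def run_federated(initial: Weights, site_targets: List[float],
                  cycles: int, steps_per_cycle: int) -> Weights:
    """Each cycle: send the full global model out, train locally, re-average."""
    global_weights = initial
    for _ in range(cycles):
        local_models = [local_update(global_weights, target, steps_per_cycle)
                        for target in site_targets]
        global_weights = average_weights(local_models)
    return global_weights
```

In this toy setting, two sites pulling toward 0.0 and 2.0 drive the averaged weight toward their midpoint; only weights cross the site boundary, never the data behind them.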
The deep learning architecture was a 3D self-normalizing squeeze-and-excitation convolutional neural network adapted from previous MICCAI grand challenge publications. No post-processing was applied to model results.

Results: In all, 1606 patients were used for training, 472 for (holdout) validation and 1567 for external testing. Median Dice score in holdout validation plateaued after 20 cycles (Fig 1A), but continued to improve slowly up to 100 cycles. Median Dice scores for single-country and multi-national models were 0.79 and 0.80, respectively, with wide dispersion of individual Dice scores. This compares well with the 0.82 reported in a widely cited study using curated data [1]. Geometric agreement improved across the board after single-country training, possibly due to the larger sample size of consistent images and reference delineations (Fig 1B,C). Robustness on unseen data improved after multi-national training, although median Dice did not (Fig 1C,D), presumably because the model was trained on heterogeneous image quality and varying delineation styles.
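For readers less familiar with the metric, the Dice score reported above is twice the overlap between predicted and reference segmentations divided by the sum of their sizes. The sketch below is illustrative only: representing masks as collections of voxel coordinates, the name `dice_score`, and scoring two empty masks as 1.0 are assumptions, not details from the study.

```python
from statistics import median
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) index of a segmented voxel


def dice_score(predicted: Iterable[Voxel], reference: Iterable[Voxel]) -> float:
    """Dice = 2 * |P intersect R| / (|P| + |R|) for two voxel sets."""
    pred, ref = set(predicted), set(reference)
    if not pred and not ref:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * len(pred & ref) / (len(pred) + len(ref))


def median_dice(scores: Iterable[float]) -> float:
    """Median of per-patient Dice scores, the summary statistic used above."""
    return median(scores)
```

A median summarises the typical patient but hides the spread, which is why the wide dispersion of individual scores is reported alongside it.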
