ESTRO 2024 - Abstract Book

S4525

Physics - Machine learning models and clinical applications


Standardised nomenclature of Organs-At-Risk (OARs) and Target Volumes (TVs) is crucial for learning from large real-world radiotherapy (RT) datasets. Manual standardisation requires significant time and effort and can introduce inter-observer variation [1]. Despite international guidelines [2], few automated tools are available for standardising retrospective data. Artificial intelligence (AI) and machine learning (ML) methods have recently been used in radiation oncology to improve consistency and avoid errors, for example in automatic contouring of TVs and OARs, treatment planning and quality assurance [3, 4]. However, such tools must be rigorously validated across diverse datasets. The primary objective of this study was to develop an automated system to standardise retrospective RT structure datasets, reducing manual effort and enabling the development of ML models for clinical decision-making, particularly in breast cancer research. The model was evaluated on multicentre datasets to demonstrate its potential for wide use. In this study, we developed a transformer-based ML model that fuses information from different modalities, capturing the complementary information carried by multimodal features. The visual Generative Pre-trained Transformer (visual-GPT) model [5] combines four feature types: image features from the 2D central axial CT slices containing the largest TV/OAR area; text features from the structure names; geometry features characterising the shape of each RT structure; and spatial-relationship features that provide context around each structure. The visual-GPT integrates these into a single multimodal feature vector, which is then passed to a neural network classifier that assigns each structure one of 17 standardised labels covering OARs and primary and nodal TVs. The model was developed using data from one centre (centre A) comprising 1436 breast cancer patients treated between 2014 and 2018.
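The fusion-and-classification step described above can be illustrated with a deliberately simplified sketch. It assumes each of the four feature types (image, text, geometry, spatial context) has already been encoded as a fixed-length vector; the visual-GPT encoders are not reproduced, and a plain concatenation followed by a single linear softmax layer stands in for the transformer fusion and the neural network classifier. All function names, embeddings, weights and the two example labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# STAND-IN for the visual-GPT fusion + classifier: concatenation + one linear layer.
MODALITIES = ("image", "text", "geometry", "spatial")


def l2_normalise(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else [float(v) for v in vec]


def fuse(features: Dict[str, Sequence[float]], order: Sequence[str]) -> List[float]:
    """Concatenate the normalised per-modality vectors in a fixed order."""
    fused: List[float] = []
    for name in order:
        fused.extend(l2_normalise(features[name]))
    return fused


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused: Sequence[float], weights: Sequence[Sequence[float]],
             bias: Sequence[float], labels: Sequence[str]) -> str:
    """Single linear layer + softmax; returns the most probable label."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    return labels[probs.index(max(probs))]


if __name__ == "__main__":
    features = {  # HYPOTHETICAL pre-computed embeddings, two values each
        "image": [3.0, 4.0], "text": [1.0, 0.0],
        "geometry": [0.0, 2.0], "spatial": [0.0, 0.0],
    }
    fused = fuse(features, MODALITIES)
    weights = [[1.0] * len(fused), [0.0] * len(fused)]  # HYPOTHETICAL trained weights
    print(classify(fused, weights, [0.0, 0.0], ["Heart", "Lung_L"]))  # Heart
```

Normalising each modality before concatenation is one common way to stop a large-magnitude embedding from dominating the fused vector; the abstract does not state how the visual-GPT balances the modalities.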
Material/Methods:
The nomenclature of the structures was initially inconsistent across the entire cohort, reflecting routine clinical practice. Because the accuracy of the ML classifier depends on correct reference labels, both the training and testing samples first had to be standardised. A rule-based script [6], developed after discussion with clinicians, was therefore used to map every naming variant of these structures to its standardised name, generating the labels. Overall, 80% of samples were used for training and the remaining 20% for testing; during training, 10% of the training samples were used to validate the model. After training, internal validation was performed on the withheld 90% of the test set for centre A. Univariate models and models with different combinations of features were also developed using the same training/validation/test split and compared with the final multimodal approach. The trained models were then externally validated on breast cancer patient datasets from clinical practice at centres B, C, D and E, which used varied naming conventions. The model (Figure 1) was evaluated using the weighted F1-score, in which each class's F1-score is weighted by the number of samples in that class. Because the F1-score is the harmonic mean of precision and recall, a value close to 100% indicates both high precision and high recall, i.e. very few false positives and false negatives per RT structure.
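The rule-based relabelling step can be sketched as an ordered list of regular-expression rules applied to a cleaned structure name. The rules and standardised labels below are hypothetical examples for illustration only; the clinician-agreed rule set [6] is not reproduced here. Names that match no rule return `None`, so they can be flagged for manual review rather than silently mislabelled.

```python
import re
from typing import List, Optional, Tuple

# HYPOTHETICAL rules: (pattern over the cleaned name, standardised label).
# Rules are tried in order and the first full match wins.
RULES: List[Tuple[str, str]] = [
    (r"heart|cor", "Heart"),
    (r"lung_?(l|lt|left)|(l|lt|left)_?lung", "Lung_L"),
    (r"lung_?(r|rt|right)|(r|rt|right)_?lung", "Lung_R"),
    (r"ctv_?breast_?(l|lt|left)|ctv_?(l|lt|left)_?breast", "CTV_Breast_L"),
]


def clean(name: str) -> str:
    """Lower-case, trim and collapse runs of spaces, hyphens and underscores to '_'."""
    return re.sub(r"[\s\-_]+", "_", name.strip().lower())


def standardise(name: str) -> Optional[str]:
    """Return the standardised label for a raw structure name, or None if unmapped."""
    cleaned = clean(name)
    for pattern, label in RULES:
        if re.fullmatch(pattern, cleaned):  # the whole name must match the rule
            return label
    return None  # unmapped: flag for manual review


if __name__ == "__main__":
    for raw in ["Lt Lung", "LUNG-R", " Heart ", "CTV Breast Left", "PTV_boost"]:
        print(repr(raw), "->", standardise(raw))
```

Using `re.fullmatch` rather than a substring search keeps a rule for "heart" from also capturing names such as "heart_sparing", which a clinician might want handled separately.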
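The data partitioning described above (80% training, 20% testing, with 10% of the training portion used for validation during training) can be sketched as follows. The function name and seed are hypothetical; a fixed seed is one simple way to give every compared model (univariate, feature-combination and multimodal) the same split. The abstract does not say whether samples were split per structure or per patient; splitting by patient would keep structures from the same patient out of both training and test sets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_test_split(items: Sequence[T], test_frac: float = 0.2,
                         val_frac: float = 0.1, seed: int = 0
                         ) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, hold out test_frac for testing, then take val_frac of the
    remaining training pool as a validation set used during training."""
    rng = random.Random(seed)  # fixed seed so every model sees the same split
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    test, train_pool = shuffled[:n_test], shuffled[n_test:]
    n_val = round(len(train_pool) * val_frac)
    val, train = train_pool[:n_val], train_pool[n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = train_val_test_split(range(1000))
    print(len(train), len(val), len(test))  # 720 80 200
```

With 1000 samples this yields 200 test samples and a training pool of 800, of which 80 are used for validation and 720 for fitting.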
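The weighted F1-score can be computed as below. Per-class F1 is 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall, and each class is weighted by its number of true samples (its support); this follows the same definition as scikit-learn's `f1_score(..., average="weighted")`. The function names and example labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable],
                 y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """F1 = 2TP / (2TP + FP + FN) for every label seen in y_true or y_pred."""
    f1: Dict[Hashable, float] = {}
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom else 0.0
    return f1


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Average per-class F1, weighting each class by its support in y_true."""
    support = Counter(y_true)
    f1 = per_class_f1(y_true, y_pred)
    return sum(f1[c] * n for c, n in support.items()) / len(y_true)


if __name__ == "__main__":
    y_true = ["Heart", "Heart", "Lung_L", "Lung_L", "Lung_R"]
    y_pred = ["Heart", "Lung_L", "Lung_L", "Lung_L", "Lung_R"]
    print(f"{100 * weighted_f1(y_true, y_pred):.1f}%")  # 78.7%
```

In the example, Heart, Lung_L and Lung_R score 2/3, 0.8 and 1.0 respectively, giving (2 × 2/3 + 2 × 0.8 + 1 × 1.0) / 5 ≈ 78.7%. True negatives never enter the calculation.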
