ESTRO 2025 - Abstract Book
Physics - Machine learning models and clinical applications
2711
Mini-Oral

Harmonizing multi-lingual and multi-institutional structure names using open-source large language models

Adrian Thummerer 1, Matteo Maspero 2,3, Erik van der Bijl 4, Stefanie Corradini 1, Claus Belka 1,5,6, Guillaume Landry 1, Christopher Kurz 1

1 Department of Radiation Oncology, LMU University Hospital, LMU Munich, Munich, Germany. 2 Department of Radiation Oncology, Imaging and Cancer Division, University Medical Center Utrecht, Utrecht, Netherlands. 3 Computational Imaging Group for MR Diagnostics & Therapy, Center for Image Sciences, University Medical Center Utrecht, Utrecht, Netherlands. 4 Department of Radiation Oncology, Radboud University Medical Center, Nijmegen, Netherlands. 5 German Cancer Consortium (DKTK), partner site Munich, a partnership between DKFZ and LMU University Hospital Munich, Munich, Germany. 6 Bavarian Cancer Research Center (BZKF), Munich, Germany

Purpose/Objective: Despite efforts to standardize radiotherapy delineation names, institution- and language-specific variations still pose significant challenges for inter-institutional and international collaboration, data sharing, and automated quality assurance procedures. Manual structure renaming is time-consuming, especially for the large datasets used in deep learning applications. This study investigates open-source/open-weight large language models (LLMs) and their advanced multilingual natural language processing capabilities for harmonizing structure names according to the AAPM TG263 guideline [1].

Material/Methods: Structure set data were sourced from two Dutch university medical centers and one German university medical center, encompassing 216 patients across three anatomical regions (head and neck, thorax, and abdomen), collected as part of an upcoming deep learning challenge. The collected structure sets did not follow the TG263 guideline, and structures were named in English, German, or Dutch. A local instance of the open-weight Llama 3.1 Instruct model (Meta AI, USA) with 70.6 billion parameters was used [2]. To limit GPU memory usage, the model weights were quantized with q4_0 and run on an NVIDIA RTX A6000 GPU with 48 GB of VRAM. The LLM was automatically prompted for each structure individually, with a list of the TG263 guideline names for the respective anatomical region, general non-center-specific renaming rules (e.g., to make the LLM aware of the various languages), and a few examples of the desired output format (a minimal sketch of this loop is given below). Structures not covered by the TG263 guideline (e.g., derived structures, combined or expanded OARs, and immobilization devices) were filtered out, leaving 2033 structures to be renamed and evaluated. Medical physicists reviewed the renamed structures and compared them to the TG263 guideline. Accuracy was calculated as the number of correctly renamed structures divided by the total number of renamed structures (sketched below).

Results: The overall accuracy of the LLM in renaming structures was 92.6% (1883/2033). Renaming a single structure required 1.2 s on this hardware. The lowest accuracy, 61.5%, was observed in the thorax dataset of Center C (Table 1), attributable to the small number of structures (52) and the consistently incorrect interpretation of the "External" structure, abbreviated as "Ex" in the original structure set. Table 2 provides examples of correctly and incorrectly renamed structures, highlighting challenging cases.
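The abstract does not specify the serving framework or the exact prompt wording. The following Python sketch illustrates the per-structure prompting loop described in Material/Methods, assuming a local Ollama server hosting a q4_0-quantized Llama 3.1 70B Instruct model; the endpoint, model tag, TG263 name subset, renaming rules, and few-shot examples are all illustrative assumptions, not the authors' actual setup.

```python
# Minimal sketch of the per-structure renaming loop.
# Assumes a local Ollama server; model tag, prompt, and
# TG263 subset below are hypothetical, not the authors' setup.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint
MODEL = "llama3.1:70b-instruct-q4_0"                # hypothetical model tag

# Illustrative subset of TG263 names for one anatomical region (thorax).
TG263_THORAX = ["Esophagus", "Heart", "Lung_L", "Lung_R", "SpinalCord", "External"]

PROMPT_TEMPLATE = """You are renaming radiotherapy structures to AAPM TG263.
Structure names may be in English, German, or Dutch.
Allowed TG263 names for this region: {names}
Answer with the single TG263 name only.

Examples:
Input: linker long -> Output: Lung_L
Input: Rueckenmark -> Output: SpinalCord

Input: {structure} -> Output:"""

def rename_structure(original_name: str, tg263_names: list[str]) -> str:
    """Query the local LLM once and return its proposed TG263 name."""
    prompt = PROMPT_TEMPLATE.format(names=", ".join(tg263_names),
                                    structure=original_name)
    payload = json.dumps({"model": MODEL, "prompt": prompt,
                          "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        # With stream=False, Ollama returns one JSON object whose
        # "response" field holds the full completion.
        return json.loads(resp.read())["response"].strip()

print(rename_structure("Herz", TG263_THORAX))  # expected: Heart
```

Prompting each structure individually, rather than a whole structure set at once, keeps the output format easy to parse and matches the per-structure timing of 1.2 s reported in the Results.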
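The accuracy metric itself is simple. A minimal sketch, where the per-structure correct/incorrect labels would come from the medical physicists' manual review, checked against the overall counts reported in the Results (1883/2033):

```python
# Accuracy as defined in the abstract: correctly renamed structures
# divided by all renamed structures. Labels are illustrative.
def accuracy(review_labels: list[bool]) -> float:
    return sum(review_labels) / len(review_labels)

# Reported overall result: 1883 of 2033 structures correct.
print(f"{1883 / 2033:.1%}")  # 92.6%
```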