ESTRO 2024 - Abstract Book

S1811

Clinical - Lung

1 Fondazione Policlinico Universitario Campus Bio-Medico, Via Alvaro del Portillo, 200-00128, Radiation Oncology, Rome, Italy.
2 Università Campus Bio-Medico di Roma, via Alvaro del Portillo, 21-00128, Radiation Oncology, Rome, Italy.
3 Università Campus Bio-Medico di Roma, via Alvaro del Portillo, 21-00128, Radiation Oncology, Rome, Italy.
4 Fondazione Policlinico Universitario Campus Bio-Medico, Via Alvaro del Portillo, 200-00128, Radiation Oncology, Rome, Italy.
5 Università Campus Bio-Medico di Roma, via Alvaro del Portillo, 21-00128, Unit of Computer Systems and Bioinformatics, Department of Engineering, Rome, Italy.
6 Università Campus Bio-Medico di Roma, via Alvaro del Portillo, 21-00128, Unit of Computer Systems and Bioinformatics, Department of Engineering, Rome, Italy.

Purpose/Objective:

Electronic Health Records (EHRs) have become the standard repository for patient and disease information, supporting oncology research and treatment plan design and ultimately improving patient outcomes. Named Entity Recognition (NER) recognizes and classifies biomedical entities in text by means of machine learning and deep learning approaches. Despite the extensive use of NER in the biomedical field, very little work has aimed to extract information about non-small cell lung cancer (NSCLC) patients, especially from clinical texts written in Italian. In this work, we therefore performed NER on a real-world dataset of Italian clinical reports of NSCLC patients.

Material/Methods:

We reviewed clinical reports from a dataset of 257 patients diagnosed with stage III and IV NSCLC. In total, we obtained 758 clinical reports in two main categories: oncological visits and radiotherapy visits. The reports were collected before the start of each patient’s therapy and included personal data, medical history, reason for visit, histology, imaging reports, physical examinations, preliminary diagnosis, prescriptions and advice, conclusions, and follow-up details. The population was enrolled under two different approvals of the Ethical Committee (30 October 2012, ClinicalTrials.gov Identifier NCT03583723; 16 April 2019, Identifier 16/19 OSS). Written informed consent was obtained from all patients. The proposed approach, shown in Figure 1, consisted of three steps: a) corpus generation; b) model training; c) model validation. We used the Italian biomedical checkpoint MedBITR3+ and compared it with two other state-of-the-art BERT-based models: multilingual BERT (mBERT), which was pre-trained on Wikipedia in the top 104 languages (including Italian), and UmBERTo, which was pre-trained on Commoncrawl ITA using the large Italian corpus of OSCAR (Open Superlarge Crawled ALMAnaCH coRpus) [2]. Neither of these two models was pre-trained on biomedical domain-specific text.
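As an illustration of what the corpus generation step can involve, the following is a minimal sketch that turns character-level entity annotations into token-level BIO tags, a format commonly used to train BERT-based NER models. The abstract does not describe the annotation scheme, so the BIO format, the whitespace tokenizer, the entity type names (`HISTOLOGY`, `STAGE`) and the example sentence are all assumptions made for illustration; the sentence is invented and does not come from the study dataset.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char_exclusive, entity_type)


def whitespace_tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace, keeping the character offsets of each token."""
    tokens = []
    pos = 0
    for piece in text.split():
        start = text.index(piece, pos)
        end = start + len(piece)
        tokens.append((piece, start, end))
        pos = end
    return tokens


def span_of(text: str, phrase: str, entity_type: str) -> Span:
    """Hypothetical helper: locate the first occurrence of a phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), entity_type)


def to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Label each token B-<type>, I-<type>, or O from character spans.

    A token is tagged only if it lies fully inside a span, so tokens with
    attached punctuation would need a finer tokenizer than this one.
    """
    labelled = []
    for token, start, end in whitespace_tokenize(text):
        tag = "O"
        for s, e, entity_type in spans:
            if start >= s and end <= e:
                prefix = "B-" if start == s else "I-"
                tag = prefix + entity_type
                break
        labelled.append((token, tag))
    return labelled


# Invented example sentence (not taken from the study dataset).
text = "Diagnosi di adenocarcinoma polmonare stadio IIIA"
spans = [
    span_of(text, "adenocarcinoma polmonare", "HISTOLOGY"),
    span_of(text, "stadio IIIA", "STAGE"),
]
bio = to_bio(text, spans)
for token, tag in bio:
    print(f"{token}\t{tag}")
```

In practice, BERT-style models use subword tokenizers, so the word-level tags produced here would still have to be aligned to subword tokens before model training.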

Results:

Figure 2 shows the results of mBERT, UmBERTo and MedBITR3+ for each entity type; its lower panel presents the average performance with the standard deviation (std). The highest scores are highlighted in bold.
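To make the kind of per-entity-type score and mean with standard deviation mentioned above concrete, here is a minimal sketch of one common way to evaluate NER output. The abstract states neither the metric nor what the standard deviation is taken over, so the exact-match entity-level F1, the averaging over entity types, the tag sequences and the entity type names are all assumptions for illustration, not the study's evaluation protocol or results.

```python
from statistics import mean, stdev
from typing import Dict, List, Set, Tuple


def extract_entities(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, type) spans from a BIO tag sequence.

    I- tags that do not continue an open entity of the same type are ignored.
    """
    entities = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:
            if etype is not None:
                entities.add((start, i, etype))
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return entities


def f1_per_type(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Exact-match entity-level F1 for every entity type seen in gold or pred."""
    gold_ents = extract_entities(gold)
    pred_ents = extract_entities(pred)
    scores = {}
    for etype in sorted({e[2] for e in gold_ents | pred_ents}):
        g = {e for e in gold_ents if e[2] == etype}
        p = {e for e in pred_ents if e[2] == etype}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        denom = precision + recall
        scores[etype] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy tag sequences (invented, not study data): the predicted STAGE entity
# is cut short, so it does not count as an exact match.
gold = ["O", "O", "B-HISTOLOGY", "I-HISTOLOGY", "B-STAGE", "I-STAGE"]
pred = ["O", "O", "B-HISTOLOGY", "I-HISTOLOGY", "B-STAGE", "O"]

scores = f1_per_type(gold, pred)
values = list(scores.values())
print(scores)
print(f"mean F1 = {mean(values):.3f}, std = {stdev(values):.3f}")
```

Exact matching is strict: a prediction with the right type but a wrong boundary earns no credit, which is why the truncated `STAGE` entity scores zero in this toy case.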
