ESTRO38 Congress Report

3. ESTRO-VARIAN AWARD

Distributed learning on 20 000+ lung cancer patients (E38-0799)

Timo M. Deist 3, Frank J.W.M. Dankers 1,2,3

1 Maastricht University, Maastricht, The Netherlands, 2 Radboudumc, Nijmegen, The Netherlands, 3 MAASTRO clinic, Maastricht, The Netherlands

Context of the study
Distributed learning is a machine learning and prediction modelling methodology that allows the analysis of data located at multiple institutes without individual patient data leaving any institute. The technical challenge is to formulate algorithms so that they produce the same results as if the data were centralized in one location, as in conventional data analysis. Because the researcher never sees individual patient data, and the data never leave their institute, the technology is privacy-preserving by design. For distributed learning to work effectively, a secure infrastructure and standardized data representations are indispensable.

Overview of abstract
The promise of distributed learning in radiotherapy, when it was conceived nearly a decade ago, was to enable machine learning research on large patient cohorts spread across institutes and continents. With this study, we show that distributed learning delivers on that promise and that the technology has matured enough to be applied quickly, globally, and on sizeable data volumes. In only four months, we executed 'The 20k challenge' (www.clinicaltrials.gov/ct2/show/NCT03564457) across eight international oncology institutes in China, England, Italy, The Netherlands, and Wales.

What were the three main findings of your research?
1. We established data stations containing data of more than 23 000 patients across eight oncology institutes. The stations adhere to the FAIR (Findable, Accessible, Interoperable, Reusable) principles [1] and allow privacy-preserving machine learning.
2. We trained a logistic regression model to predict two-year survival based on cancer staging and survival data of 14 810 non-small cell lung cancer (NSCLC) patients diagnosed between 1978 and 2011. The model was validated on 8 393 patients diagnosed between 2012 and 2015.
3. We showed that, with a team of researchers dedicated to a common scientific goal, international distributed machine learning studies can be executed in only a few months (four months in this case).

What impact could your research have?
With this study, we show that distributed learning technology can be deployed and a research question answered in a short time frame (<5 months), paving the way for rapid learning healthcare in radiation oncology. Outcome prediction models trained on patient data available in the distributed learning infrastructure can be updated regularly as new data become available. In this way, the models can take into account changes in treatment guidelines and technology.
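The requirement stated under 'Context of the study', that a distributed algorithm should give the same result as a centralized analysis, can be illustrated with a minimal sketch. It assumes a Newton-Raphson fit of a logistic regression in which each site shares only the gradient and Hessian of its local log-likelihood; this is one common way to distribute the model and is not necessarily the algorithm used in the study. The function names, the three simulated sites, and the two covariates are hypothetical and merely stand in for real data stations and staging variables.

```python
import numpy as np


def local_statistics(X, y, beta):
    """Runs at one site: gradient and Hessian of the local log-likelihood.

    Only these aggregates are returned; X and y never leave the site.
    """
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    gradient = X.T @ (y - p)
    hessian = -(X.T * (p * (1.0 - p))) @ X
    return gradient, hessian


def fit_newton(sites, n_features, max_iter=50, tol=1e-10):
    """Coordinator: Newton-Raphson on the summed site statistics."""
    beta = np.zeros(n_features)
    for _ in range(max_iter):
        stats = [local_statistics(X, y, beta) for X, y in sites]
        gradient = sum(g for g, _ in stats)
        hessian = sum(h for _, h in stats)
        step = np.linalg.solve(hessian, gradient)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def simulate_site(rng, n_patients, true_beta):
    """Hypothetical synthetic data: intercept plus standard-normal covariates."""
    covariates = rng.normal(size=(n_patients, len(true_beta) - 1))
    X = np.column_stack([np.ones(n_patients), covariates])
    p = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
    y = (rng.random(n_patients) < p).astype(float)
    return X, y


rng = np.random.default_rng(0)
true_beta = np.array([-0.5, 1.0, -0.8])  # assumed values, for illustration only
sites = [simulate_site(rng, n, true_beta) for n in (300, 500, 200)]

# Distributed fit: the coordinator only ever sees per-site aggregates.
beta_distributed = fit_newton(sites, n_features=3)

# Reference fit: the same routine on pooled data, treated as a single site.
X_pooled = np.vstack([X for X, _ in sites])
y_pooled = np.concatenate([y for _, y in sites])
beta_pooled = fit_newton([(X_pooled, y_pooled)], n_features=3)

print("distributed:", beta_distributed)
print("pooled:     ", beta_pooled)
print("match:", np.allclose(beta_distributed, beta_pooled))
```

Because the log-likelihood is a sum over patients, its gradient and Hessian are sums over sites, so the distributed and pooled fits agree up to floating-point rounding. In this sketch each site transmits one vector and one small matrix per iteration rather than patient-level records.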

Future work will focus on extending the distributed learning approach to the analysis of imaging (radiomics) and genomic patient data.

Is this research indicative of a bigger trend in oncology?
We see a strong trend in medical sciences and radiation oncology towards more extensive data analysis (AI), which will force institutes to collaborate systematically to meet the increased demand for data. For data-driven medical research to succeed and be reproducible, we need sustainable data repositories that follow the FAIR principles [1]. At the same time, there is a strong movement towards stricter privacy regulations to protect individuals and their personal data. Distributed learning and the methodologies presented in this study will allow the medical community to realize its data-driven ambitions while complying with privacy regulations.

References
1. Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data (2016). doi:10.1038/sdata.2016.18

