Physics - Machine learning models and clinical applications

Our evaluation shows that only 22% of all submitted papers provide enough information for their publicly available code to be useful; in all other code repositories we found major flaws that hinder reproducibility. In a poll of authors and researchers in the field, 96% reported having faced issues when trying to reproduce publicly available methods from peer-reviewed papers. The most common flaws are missing package dependencies required to run the code, missing code for training and evaluating the model proposed in the corresponding paper, missing trained model weights, missing documentation explaining the available material, and missing licensing to ensure the material can be used as the authors intended. Although the use of publicly available code is becoming increasingly popular, the evaluated repositories show no sign of improvement over the years in any of these aspects. Our findings convinced MIDL to adopt our guidelines as the official reproducibility guidelines of the conference. Similar guidelines have already been introduced at larger venues (NeurIPS, MICCAI); however, ours are specifically adjusted to the areas where previously submitted papers fall short.
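
To make these criteria concrete, the sketch below shows one way such a repository audit could be automated. This is an illustrative Python example, not the tool used in our evaluation; the expected file names (e.g. train.py, weights.pth, path/to/paper_repo) are assumptions for demonstration rather than a standard.

```python
import os

# Hypothetical checklist derived from the flaws listed above: dependency
# specification, training/evaluation code, trained weights, documentation,
# and a license. The file names are illustrative assumptions, not a standard.
CHECKS = {
    "dependencies": ["requirements.txt", "environment.yml", "pyproject.toml", "setup.py"],
    "training code": ["train.py"],
    "evaluation code": ["evaluate.py", "eval.py", "test.py"],
    "trained weights": ["weights.pth", "checkpoint.ckpt", "model.h5"],
    "documentation": ["README.md", "README.rst", "README.txt"],
    "license": ["LICENSE", "LICENSE.md", "LICENSE.txt"],
}


def audit_repository(repo_path: str) -> dict:
    """Report which reproducibility items are present anywhere in the repository."""
    found = set()
    for _root, _dirs, names in os.walk(repo_path):
        found.update(names)
    return {
        item: any(candidate in found for candidate in candidates)
        for item, candidates in CHECKS.items()
    }


if __name__ == "__main__":
    # Hypothetical path to a repository accompanying a submitted paper.
    for item, present in audit_repository("path/to/paper_repo").items():
        print(f"{item:>16}: {'present' if present else 'MISSING'}")
```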

Conclusion:

Exploring the concerns around reproducibility uncovered dismaying results. Our findings show that extensive work remains to be done. Our publicly available reproducibility guidelines (https://www.midl.io/reproducibility) help researchers write papers whose results are reproducible for their readers. The scope of this project is not exhausted: advertising our findings and performing similar evaluations at other venues should help raise awareness of the reproducibility crisis, and continuously adjusting the guidelines to the most pressing issues keeps them a generally useful online resource.

Keywords: machine learning, reproducibility
