Poster Discussion: 01: Image processing & analysis

PD-0064 Multicenter comparison of measures for quantitative evaluation of automatic contouring

E. Brunenberg 1, J. Derks van de Ven 1, M.J. Gooding 2, D. Boukerroui 2, Y. Gan 3, E. Henderson 4, G.C. Sharp 5, F. Vaassen 6, E. Vasquez Osorio 4, J. Yang 7, R. Monshouwer 1

1 Radboudumc, Radiation Oncology, Nijmegen, The Netherlands; 2 Mirada Medical Ltd, Science, Oxford, United Kingdom; 3 University of Groningen, University Medical Center Groningen, Radiation Oncology, Groningen, The Netherlands; 4 University of Manchester, Division of Cancer Studies, School of Medical Sciences, Faculty of Biology, Medicine and Health, Manchester, United Kingdom; 5 Massachusetts General Hospital, Harvard Medical School, Radiation Oncology, Boston, USA; 6 Maastricht University Medical Centre, Department of Radiation Oncology (MAASTRO), GROW - School for Oncology and Developmental Biology, Maastricht, The Netherlands; 7 The University of Texas MD Anderson Cancer Center, Radiation Physics, Houston, USA

Purpose or Objective
Automatic contouring performance can be evaluated quantitatively using geometric measures. Overlap measures such as the Dice Similarity Coefficient (DSC) are computationally straightforward and give consistent results across implementations. In contrast, the definition and implementation of distance measures such as the Hausdorff distance (HD) strongly influence the results and hinder comparison between centers [1]. To assess this, we performed a multicenter benchmark study using both synthetic and real data.

Materials and Methods
In our survey, contributors first listed the measures used in their contour evaluation pipeline, including definitions, implementation methods and, if applicable, the source. In addition, they were asked to process two datasets. The first set contained synthetic shapes (squares, spheres, octahedrons) with different sizes, positions and control point spacings between reference and test contours. The second set consisted of publicly available clinical CT data with contouring ground truth and test contours [2]. The resolution of both datasets was 0.977 mm in-plane, with 2 mm slices.

Results
The survey was completed by 7 institutes, contributing 8 different implementations for DSC, 10 for the maximum Hausdorff distance (HD100), 9 for the 95th percentile Hausdorff distance (HD95), 12 for the average distance (AD), 4 for surface DSC, and 3 for added path length (APL). Figure 1 shows how the contributions varied in their implementation choices for dimensionality and model. Because most DSC results agreed well, and because the definitions of AD, surface DSC and APL already varied widely, we focused on the HD results. As can be seen in Figure 2, the variation in results is large. Most of the HD100 outliers (Figures 2.i and 2.iii) resulted from a mesh-based implementation that used a normal vector to measure the distance. For synthetic shapes B and C, the control point spacing differed between reference and test contours; the deviating measurements around 40-50 mm (Figures 2.i and 2.ii) were due to implementations that did not interpolate between test contour points.

Conclusion
While some differences in HD definition and implementation between institutes might be expected, this study highlights the magnitude of the variation. Future work should focus on accuracy in order to develop a public benchmarking dataset, which can be used to improve agreement on the definition and implementation of contouring evaluation measures. Because HD100 is more sensitive to outliers than HD95, differences in implementation are amplified in HD100 results; it is therefore advisable to (also) use HD95. When implementing an evaluation pipeline, the definition and implementation of the measures used should be considered carefully, and the pipeline should be validated with synthetic data.
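
To make concrete the kind of implementation choices that drive this variation, the following minimal sketch (Python with NumPy/SciPy; not one of the pipelines surveyed in the abstract) computes the DSC and a percentile Hausdorff distance from binary masks. The surface definition via binary erosion, the pooled symmetric percentile, and the handling of voxel spacing are all assumptions that a given center may implement differently.

```python
# Minimal illustrative sketch of DSC and (percentile) Hausdorff distance
# from binary masks on a voxel grid. All function names and conventions
# here are assumptions for illustration, not the surveyed implementations.
import numpy as np
from scipy import ndimage


def dice(ref: np.ndarray, test: np.ndarray) -> float:
    """Dice Similarity Coefficient between two boolean masks."""
    intersection = np.logical_and(ref, test).sum()
    return 2.0 * intersection / (ref.sum() + test.sum())


def surface_voxels(mask: np.ndarray) -> np.ndarray:
    """Boundary voxels of a boolean mask (mask minus its erosion)."""
    eroded = ndimage.binary_erosion(mask)
    return np.logical_and(mask, np.logical_not(eroded))


def hausdorff(ref: np.ndarray, test: np.ndarray,
              spacing=(2.0, 0.977, 0.977), percentile=100):
    """Symmetric percentile Hausdorff distance (mm) between mask surfaces.

    Distances are taken from every surface voxel of one mask to the nearest
    surface voxel of the other, pooled in both directions, and the given
    percentile is applied. Other conventions (e.g. the maximum of the two
    directed percentiles) give different numbers for the same contour pair.
    """
    ref_surf = surface_voxels(ref)
    test_surf = surface_voxels(test)
    # Distance transform of the complement gives, for every voxel, the
    # distance to the nearest surface voxel of the other structure.
    dist_to_test = ndimage.distance_transform_edt(~test_surf, sampling=spacing)
    dist_to_ref = ndimage.distance_transform_edt(~ref_surf, sampling=spacing)
    d = np.concatenate([dist_to_test[ref_surf], dist_to_ref[test_surf]])
    return np.percentile(d, percentile)


# Toy example: two offset spheres on a grid with 0.977 mm in-plane
# resolution and 2 mm slices, mirroring the benchmark datasets.
z, y, x = np.ogrid[:40, :80, :80]
ref = ((z - 20) * 2.0) ** 2 + ((y - 40) * 0.977) ** 2 + ((x - 40) * 0.977) ** 2 <= 20 ** 2
test = ((z - 20) * 2.0) ** 2 + ((y - 40) * 0.977) ** 2 + ((x - 45) * 0.977) ** 2 <= 20 ** 2

print(f"DSC   = {dice(ref, test):.3f}")
print(f"HD100 = {hausdorff(ref, test, percentile=100):.1f} mm")
print(f"HD95  = {hausdorff(ref, test, percentile=95):.1f} mm")
```

Even in this simple sketch, swapping the voxel-surface distance transform for point-to-point distances between contour control points without interpolation, or changing how the percentile is combined across the two directions, would change the reported HD values, which is precisely the kind of discrepancy observed between the surveyed implementations.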

[1] Yang, Sharp & Gooding. Auto-Segmentation for Radiation Oncology. 2021. https://doi.org/10.1201/9780429323782

[2] TCIA Lung CT Segmentation Challenge 2017. https://wiki.cancerimagingarchive.net/display/Public/Lung+CT+Segmentation+Challenge+2017
