ESTRO 2020 Abstract Book

Conclusion
AutoConfidence can identify data outliers and low-confidence regions of DL predictions, independent of the production network, enabling automated per-patient validation of 'black box' methods. Regions requiring human intervention can be highlighted for review, increasing clinical confidence and facilitating highly efficient automated workflows (e.g. for online adaptive re-planning).

PH-0607 Investigating the potential of deep learning for quality assurance of organ-at-risk contours
H. Nijhuis1, W. Van Rooij1, V. Gregoire2, J. Overgaard3, B. Slotman1, W. Verbakel1, M. Dahele1
1Amsterdam UMC, Radiotherapy, Amsterdam, The Netherlands; 2Université Catholique de Louvain, Radiation Oncology, Brussels, Belgium; 3Aarhus University Hospital, Department of Experimental Clinical Oncology, Aarhus, Denmark

Purpose or Objective
Quality assurance (QA) of radiotherapy contours is time- and labor-intensive and subject to inter-observer variability. Automated QA could be far more efficient. We investigated whether deep learning (DL) is a feasible and reliable approach to automated organ-at-risk (OAR) contour QA.

Material and Methods
DL models were trained to generate contours for the parotid glands (PG) and submandibular glands (SMG), using 1418/1152 non-curated clinical training cases, respectively. Two metrics, the Sørensen–Dice coefficient (SDC) and the Hausdorff distance (HD), were used to assess the agreement between the DL and clinical contours. The approach was tested on 62 patients from the EORTC-1219-DAHANCA-29 clinical trial. To test the DL models' ability to detect sub-optimal contours, 3 types of systematic error (expansion, contraction and displacement) were gradually applied to all clinical contours (which for this purpose were considered the ground truth), and the effect on SDC and HD was evaluated. The agreement between the DL and clinical OAR contours was then evaluated using a threshold for SDC of the average over all clinical cases minus 1 standard deviation (SD), and a threshold for HD of the average plus 1 SD. All contours in the original data highlighted as potentially sub-optimal were visually inspected by a Radiation Oncologist and a Medical Physicist. A sample of the non-highlighted contours (equal in size to the number of highlighted contours) was inspected for false negatives.

Results
The DL models performed similarly on our in-house training cases (using cross-validation) and on the trial data: SDC for PG/SMG was 0.84/0.85 for both data sets. Increasing the magnitude of each of the 3 types of deliberate error resulted in a progressively more severe decrease in the test set's average SDC and increase in its average HD. Out of 124 clinical PG contours, 19 were highlighted as potentially sub-optimal based on SDC, HD or both. After visual inspection, 5 of these 19 (26%) clinical contours were deemed sub-optimal, versus 2 out of 19 non-highlighted contours (11%). Out of 69 clinical SMG contours, 15 were highlighted based on SDC, HD or both. After visual inspection, 7 of these 15 (47%) were deemed sub-optimal, versus 2 out of 15 non-highlighted contours (13%).
Figure 1 shows a scatter plot with the thresholds for both metrics (sub-optimal contours are marked with a red cross). For 9/14 PG and 6/8 SMG highlighted contours whose quality was deemed clinically acceptable, clear causes for the low agreement were found: OAR deformation (e.g. secondary to displacement by tumor), deviating CT slice thickness, a missing clinical contour, or a missing PG anterior lobe in the DL contour.
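The two agreement metrics and the average ± 1 SD thresholds described under Material and Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: contours are assumed here to be sets of integer voxel coordinates, HD is computed by brute force over all voxels in voxel units (rather than over surfaces in mm), the sample standard deviation is assumed, and the function names are hypothetical.

```python
import math
import statistics


def dice(a, b):
    """Sørensen-Dice coefficient of two voxel sets: 2|A∩B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # assumption: two empty contours count as perfect agreement
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty voxel sets.

    Brute force, O(|A| * |B|); intended for small illustrative masks only.
    """
    def directed(p, q):
        # Largest distance from any point of p to its nearest point of q.
        return max(min(math.dist(x, y) for y in q) for x in p)

    return max(directed(a, b), directed(b, a))


def flag_contours(sdc_values, hd_values):
    """Return indices of contours with SDC < mean - 1 SD or HD > mean + 1 SD.

    Assumption: thresholds use the sample SD over all supplied cases.
    """
    sdc_min = statistics.mean(sdc_values) - statistics.stdev(sdc_values)
    hd_max = statistics.mean(hd_values) + statistics.stdev(hd_values)
    return [
        i
        for i, (s, h) in enumerate(zip(sdc_values, hd_values))
        if s < sdc_min or h > hd_max
    ]


# Toy example: two 2x2 squares that overlap in one column.
clinical = {(0, 0), (0, 1), (1, 0), (1, 1)}
predicted = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(dice(clinical, predicted), hausdorff(clinical, predicted))

# Toy metric values for five contours (made up for illustration only).
print(flag_contours([0.90, 0.88, 0.86, 0.50, 0.87], [3.0, 4.0, 3.5, 4.0, 12.0]))
```

A contour is flagged if either metric crosses its threshold, matching the "SDC, HD or both" criterion in the Results.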

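The three systematic errors applied to the clinical contours (expansion, contraction and displacement) can likewise be sketched on voxel sets. The abstract does not state how the errors were implemented or in what increments, so the one-voxel morphological steps below, and all function names, are assumptions made purely for illustration.

```python
from itertools import product


def _neighbourhood(voxel):
    """The voxel itself plus every voxel within one step along each axis."""
    offsets = product((-1, 0, 1), repeat=len(voxel))
    return {tuple(c + d for c, d in zip(voxel, off)) for off in offsets}


def expand(mask, steps=1):
    """Grow the mask by `steps` voxels in every direction (dilation)."""
    for _ in range(steps):
        mask = {n for v in mask for n in _neighbourhood(v)}
    return mask


def contract(mask, steps=1):
    """Shrink the mask by `steps` voxels from every direction (erosion)."""
    for _ in range(steps):
        mask = {v for v in mask if _neighbourhood(v) <= mask}
    return mask


def displace(mask, shift):
    """Translate the mask by an integer voxel offset."""
    return {tuple(c + s for c, s in zip(v, shift)) for v in mask}


def dice(a, b):
    """Sørensen-Dice coefficient of two non-empty voxel sets."""
    return 2 * len(a & b) / (len(a) + len(b))


# Toy 5x5 "contour" treated as ground truth.
truth = {(x, y) for x in range(5) for y in range(5)}

# Agreement drops as the displacement grows.
for k in (1, 2, 3):
    print(k, dice(truth, displace(truth, (k, 0))))

print(dice(truth, expand(truth)), dice(truth, contract(truth)))
```

Applying each error at increasing magnitude and recomputing the metrics against the unperturbed contour gives the kind of dose-response check described in the Methods.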