ESTRO 2020 Abstract Book


PO-1586 Prediction of late xerostomia with clinical, atlas based and deep learning contours

L.V. Van Dijk 1,2, C.D. Fuller 1, C.S. Mayo 3, S.Y. Lai 4, A.S.R. Mohamed 1, K.A. Hutcheson 4

1 MD Anderson Cancer Center, Radiation Oncology, Houston, USA; 2 University Medical Center Groningen, Radiation Oncology, Groningen, The Netherlands; 3 University of Michigan, Radiation Oncology, Ann Arbor, USA; 4 MD Anderson Cancer Center, Head and Neck Surgery, Houston, USA

Purpose or Objective Prediction of radiation-induced toxicities based on big data is important for guiding treatment, for example to select patients for proton therapy or to guide dose optimization. Large datasets with adequate organ at risk (OAR) delineations are crucial for developing and validating toxicity prediction models. Unfortunately, large curated datasets are often not readily available, since delineated OARs are regularly missing or of inadequate quality. Multiple head and neck OAR auto-segmentation methods have been published in recent years, making dosimetric analysis on larger datasets feasible. Nevertheless, the effect of auto-segmentation on model development and performance remains to be investigated. Our hypothesis is that toxicity prediction performance with automated segmentation is similar to that with clinically used contours. The purpose is to test the robustness of auto-segmented parotid gland contours for predicting moderate-to-severe radiation-induced xerostomia 12 months after radiotherapy (Xer12m).

Material and Methods Clinically available, atlas-based (AB) and deep learning (DL) based auto-contours were obtained for 172 head and neck cancer patients. No atlas patients were included in this cohort; deep learning training was performed on patients from another institute. Mean dose was extracted from all contralateral parotid gland clinical and auto-contours. Patient-rated xerostomia scores at baseline and 12 months after treatment were prospectively collected as part of the MDASI follow-up program. Mean dose differences were tested with the Wilcoxon signed-rank test for paired samples. Logistic regression models with the contralateral parotid gland mean dose and baseline xerostomia score as predictors were fitted separately to the doses from the clinical, AB and DL contours. In addition, external validation of a published Xer12m model, developed in a different institute, was performed with the mean parotid dose from the different contours and the baseline xerostomia scores (Table 1).

Results Fifty-one (30%) patients developed moderate-to-severe Xer12m (MDASI>4). Mean dose values of the clinical contours (16.2 ± 6.8 Gy) differed significantly from those of the DL contours (17.3 ± 7.8 Gy; p<0.001), but not from those of the AB contours (16.4 ± 7.3 Gy; p=0.46). Coefficients of the Xer12m models fitted to the dose data from the different contours were nearly identical, with slight changes in the intercept (see Table 1). Performance of both auto-contour-based models was slightly better than that of the model based on the clinically used contours (e.g. lower BIC, higher AUC and explained variance R²). In addition, external validation of the previously published Xer12m model showed similar performance and coefficients.

Conclusion Coefficients and prediction performance of the Xer12m models with both AB and DL auto-contouring were comparable to those obtained with clinical contours, as well as to those of the previously published model from a different institute. This work shows the potential utility of auto-segmentation methods for contouring OARs in large numbers of patients when developing and validating toxicity prediction models.
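The comparison described in this abstract can be illustrated with a minimal sketch, assuming the usual two-predictor logistic form (contralateral parotid mean dose plus baseline xerostomia score). The coefficients, the toy patients and the function names below are hypothetical and chosen only for illustration; they are not the values from Table 1 or the study data. The sketch applies one fixed model to mean doses from two contour sets and compares the resulting AUCs.

```python
import math


def ntcp_xerostomia(mean_dose_gy, baseline_score, b0, b_dose, b_base):
    """Logistic model: P = 1 / (1 + exp(-(b0 + b_dose*dose + b_base*baseline)))."""
    s = b0 + b_dose * mean_dose_gy + b_base * baseline_score
    return 1.0 / (1.0 + math.exp(-s))


def auc(scores, labels):
    """AUC as the probability that a random event case scores higher than
    a random non-event case; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical coefficients (assumption, NOT taken from Table 1).
COEF = {"b0": -3.0, "b_dose": 0.08, "b_base": 0.3}

# Toy patients (assumption, not study data):
# (clinical mean dose Gy, auto-contour mean dose Gy, baseline score, event)
patients = [
    (10.0, 10.5, 0, 0),
    (14.0, 15.1, 1, 0),
    (16.0, 16.4, 2, 0),
    (18.0, 19.0, 0, 1),
    (22.0, 23.2, 3, 1),
    (25.0, 26.5, 1, 1),
]

labels = [p[3] for p in patients]
pred_clinical = [ntcp_xerostomia(p[0], p[2], **COEF) for p in patients]
pred_auto = [ntcp_xerostomia(p[1], p[2], **COEF) for p in patients]

auc_clinical = auc(pred_clinical, labels)
auc_auto = auc(pred_auto, labels)
print(f"AUC clinical contours: {auc_clinical:.3f}")
print(f"AUC auto-contours:     {auc_auto:.3f}")
```

Because AUC depends only on how the predictions rank patients, a small systematic shift in mean dose between contour sets can leave it unchanged, as it does in this toy example. Refitting the coefficients per contour set and comparing BIC or R², as the abstract reports, would require a proper regression package and is not shown here.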

Results Of the 60 patients, 27% reported xerostomia (grade>1) at 12 months. The highest median mAUC_train/mAUC_test values were 0.93/0.83, 0.90/0.79 and 0.93/0.80 for the SL, DL and WP, respectively. As shown in Figure 2, the 50 combinations of predictors that were selected for the WP gland yielded a significantly higher median mAUC_test compared to the DL (p<0.01). The selected combinations of the SL yielded a higher median mAUC_test compared to the WP (p<0.01).

Conclusion The slopes of the IMBs extracted from the DL (from fraction 1-20) have a lower predictive power compared to those extracted from the WP gland. Despite the smaller number of voxels considered when analyzing the SL, the IMBs extracted from the SL have a higher predictive power compared to those from the WP. However, this requires further validation on a larger cohort and with different selected periods. A greater understanding of the differences in predictive power between these subregions requires further information on the underlying biological processes.
