ESTRO 2024 - Abstract Book
Physics - Autosegmentation
Purpose/Objective:
Deep learning (DL) model performance for automated annotation in radiotherapy planning depends on the quality and size of the training dataset. Using existing clinical annotations for training, rather than creating dedicated high-quality contours, is attractive from a time-saving perspective. However, despite consensus guidelines, large inter- and intra-rater variabilities exist among experts, caused by variation in guidelines, work pressure and human error, the latter sometimes driven by image quality aspects such as high noise, low contrast and the presence of artifacts. Moreover, the contour quality required in the clinic for organs-at-risk located further away from the primary target may be lower than the quality required of a training example. Hence, the data typically used to train models contains noisy annotations. We aim to better understand the relation between imperfect contours in training and evaluation data, dataset size and model performance. Additionally, we investigate a simple auto-curation method when training networks for parotid gland segmentation.
Material/Methods:
1925 clinical parotid gland annotations and the corresponding 3D CT scans were used to train 3D U-Nets with 5-fold cross-validation to obtain DL contours. Three experiment series were then conducted. Experiment A estimates the impact of annotation noise on performance by simulating the effects of random and systematic errors from multiple raters. 160 cases (100/20/40 train/val/test) were randomly selected from the highest Dice similarity coefficient (DSC) quartile to minimize confounding from annotation noise. Models were trained in which increasing fractions of the training annotations were actively perturbed (unidirectional dilation or erosion), either systematically (same direction) or randomly (varying directions), and evaluated on unperturbed data. Experiment B applies and evaluates a simple auto-curation method during training on various sample distributions. Three test sets were established to estimate 1) apparent performance (N=125; from the training distribution), 2) true performance (N=50; from the training distribution but manually checked for quality) and 3) external performance (N=96; AAPM open-source MICCAI 2015 segmentation challenge [1]). The curation step evaluated the training set once one quarter of training had passed; the fraction R of patients with the lowest DSC was then removed from the training set before restarting training from scratch. Experiment C repeats this using the entire clinical cohort for training. Experiments A and B were iterated 5 and 10 times, respectively, to account for model variability arising from patient selection and learning stochasticity. All U-Nets were optimized with a DSC loss and used on-the-fly augmentation with random rotation, flipping, zooming and intensity shifts. The model with the lowest apparent loss was saved and evaluated. Bonferroni-corrected Wilcoxon signed-rank tests were used to test for statistically significant differences in DSC and mean surface distance (MSD) on the test sets. Post-hoc analyses included correlating the curation effect with properties of the sampled patient distribution in experiment B.
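A minimal sketch of the contour-perturbation scheme of experiment A, assuming binary masks and a scipy-based morphology implementation; the iteration count and function names are illustrative and not specified in the abstract:

# Sketch of experiment A: perturb a fraction of training annotations by
# unidirectional dilation or erosion, either systematically (one direction
# for all selected cases) or randomly (direction drawn per case).
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def perturb_mask(mask, direction, iterations=2):
    """Dilate or erode a binary parotid mask to mimic over-/under-contouring."""
    op = binary_dilation if direction == "dilate" else binary_erosion
    return op(mask, iterations=iterations).astype(mask.dtype)

def corrupt_training_set(masks, fraction, systematic=True, seed=0):
    """Return a copy of the training masks with `fraction` of them perturbed."""
    rng = np.random.default_rng(seed)
    n_corrupt = int(round(fraction * len(masks)))
    idx = rng.choice(len(masks), size=n_corrupt, replace=False)
    fixed_dir = rng.choice(["dilate", "erode"])  # shared direction if systematic
    corrupted = list(masks)
    for i in idx:
        direction = fixed_dir if systematic else rng.choice(["dilate", "erode"])
        corrupted[i] = perturb_mask(masks[i], direction)
    return corrupted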
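A minimal sketch of the auto-curation step of experiments B and C, assuming generic build/train/predict helpers (the helper names and epoch budget are hypothetical); only the schedule follows the abstract: score the training set after one quarter of training, drop the lowest-DSC fraction R, and restart from scratch:

# Sketch of the auto-curation schedule (experiments B and C).
import numpy as np

def dice(pred, gt, eps=1e-6):
    """Dice similarity coefficient for two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def train_with_curation(train_cases, build_model, train_fn, predict_fn,
                        total_epochs=200, removal_fraction=0.15):
    # 1) Preliminary run for one quarter of the training budget.
    model = build_model()
    train_fn(model, train_cases, epochs=total_epochs // 4)

    # 2) Score every training case with the preliminary model.
    scores = [dice(predict_fn(model, c["image"]), c["mask"]) for c in train_cases]

    # 3) Remove the fraction R of cases with the lowest DSC.
    n_keep = len(train_cases) - int(round(removal_fraction * len(train_cases)))
    keep_idx = np.argsort(scores)[::-1][:n_keep]
    curated = [train_cases[i] for i in keep_idx]

    # 4) Restart training from scratch on the curated set.
    model = build_model()
    train_fn(model, curated, epochs=total_epochs)
    return model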
Results:
Median[IQR] DSC of the entire clinical cohort was 0.867[0.081] and of the corrupted contours was 0.770[0.026]. At 15%-30% corruption, performance losses were 1.0%-0.8% for systematic and 1.1%-2.8% for random corruptions, respectively (Fig. 1). Averaged over all model iterations, before patient removal, median[IQR] true, apparent and AAPM DSCs were 0.872[0.058], 0.850[0.021] and 0.843[0.057], and increased to 0.873[0.054] (p<1e-3), 0.855[0.020] (p<1e-4) and 0.851[0.058] (p<1e-4) at R=15% (Fig. 2). No adequate (R²>0.6) correlations were found between the curation effect and the sampled distribution properties. In the clinical data, before patient removal, true, apparent and AAPM DSCs were 0.892[0.052], 0.872[0.055] and 0.858[0.047] and, at R=10%, increased to 0.896[0.054] (p=0.22; not significant (NS)), 0.879[0.062] (p=0.001) and 0.872[0.044] (p<1e-10), while MSD decreased by 0.058 (p=0.23; NS), 0.042 (p=0.013; NS) and 0.073 (p<1e-6).
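For reference, the Bonferroni-corrected Wilcoxon signed-rank comparisons reported above could be computed along the following lines; this is a sketch, and the per-patient arrays and the number of corrected comparisons are illustrative assumptions rather than the study's actual data:

# Paired per-patient comparison of a metric (e.g. DSC) before vs. after
# curation, with a Bonferroni-corrected significance threshold.
import numpy as np
from scipy.stats import wilcoxon

def compare_models(metric_before, metric_after, n_comparisons, alpha=0.05):
    """Wilcoxon signed-rank test on paired per-patient metrics."""
    stat, p = wilcoxon(metric_before, metric_after)
    significant = p < alpha / n_comparisons  # Bonferroni correction
    return p, significant

# Illustrative values only (10 patients, 6 tests corrected together).
dsc_before = np.array([0.861, 0.902, 0.845, 0.878, 0.913,
                       0.887, 0.858, 0.871, 0.894, 0.866])
dsc_after = np.array([0.872, 0.905, 0.861, 0.882, 0.915,
                      0.893, 0.870, 0.879, 0.901, 0.875])
p_value, is_significant = compare_models(dsc_before, dsc_after, n_comparisons=6)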