ESTRO 2024 - Abstract Book

Physics - Autosegmentation

Model uncertainty was estimated using the Monte Carlo (MC) dropout approach (Gal and Ghahramani, 2016; Roy et al., 2019). The U-net model contains multiple convolutional blocks that extract image features. During MC dropout inference, 4% of the neurons at the output of the convolutional blocks were randomly removed, creating a slightly modified version of the original DL model. Repeating the MC inference several times therefore generated several slightly different versions of the original model, whose segmentation predictions can differ, particularly if the model is not robust (i.e. uncertain of its prediction). This MC approach has been shown to be similar to Bayesian sampling (Damianou and Lawrence, 2013; Gal and Ghahramani, 2016; Roy et al., 2019), which makes it appropriate for uncertainty estimation. To create the uncertainty heatmap that is superimposed onto the images, we combined 15 MC inferences using ensemble averaging, then calculated the entropy E = -x log(x) for each voxel, where x is the voxel's value in the ensembled image and log is the natural logarithm. The resulting entropy maps served as the uncertainty heatmap, in which higher entropy values indicate higher model uncertainty at those voxels. Note that the entropy values range from 0 to 1/e ≈ 0.37, with the maximum reached at x = 1/e. In addition, the entropy values within the correctly and incorrectly predicted areas of the images were used to quantify the performance of the MC uncertainty estimation approach.
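The ensembling and entropy steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each MC inference yields a per-voxel foreground probability map, the function name `voxel_entropy` is hypothetical, and the 15 random arrays merely stand in for real MC dropout outputs of a U-net.

```python
import numpy as np


def voxel_entropy(prob_maps):
    """Ensemble-average MC probability maps, then return E = -x ln(x) per voxel.

    prob_maps: array-like of shape (n_inferences, ...) with values in [0, 1]
    (assumed to be per-voxel foreground probabilities).
    """
    probs = np.asarray(prob_maps, dtype=float)
    mean_prob = probs.mean(axis=0)  # ensemble averaging over the MC inferences
    # Treat 0 * ln(0) as 0 (its limit) so that empty voxels do not produce NaN.
    safe = np.where(mean_prob > 0, mean_prob, 1.0)
    return -mean_prob * np.log(safe)


rng = np.random.default_rng(0)
# Hypothetical stand-in for 15 MC dropout inferences on a small 4x4x4 volume.
mc_predictions = rng.uniform(0.0, 1.0, size=(15, 4, 4, 4))
entropy_map = voxel_entropy(mc_predictions)
```

With this definition the entropy is zero for voxels whose averaged value is 0 or 1 and peaks at 1/e when the averaged value equals 1/e, which matches the range stated above.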

Results:

The U-net model with the highest performance obtained median DSC scores of 0.77 and 0.73 on the internal and external test sets, respectively.

Figure 1 shows examples of uncertainty maps. Generally, the model had high uncertainty (high entropy) in voxels near the edges of the predicted contours. In most cases, areas with high entropy indicated potentially false positive or false negative voxels (Figure 1). In addition, Figure 2 shows that incorrectly classified voxels (false positives and false negatives) had, on average, entropy values three times higher than those of correctly classified voxels (true positives).
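The comparison between correctly and incorrectly classified voxels can be illustrated with a short sketch. The abstract does not state how the averages were computed (for example per patient or pooled over all voxels), so this version assumes a simple voxel-wise mean within each group. The function name `entropy_by_correctness` and the toy arrays are hypothetical, and both groups are assumed to be non-empty.

```python
import numpy as np


def entropy_by_correctness(entropy_map, prediction, ground_truth):
    """Return (mean entropy of true positives, mean entropy of FP and FN voxels)."""
    ent = np.asarray(entropy_map, dtype=float)
    pred = np.asarray(prediction, dtype=bool)
    truth = np.asarray(ground_truth, dtype=bool)
    true_pos = pred & truth    # predicted foreground that is truly foreground
    incorrect = pred ^ truth   # false positives or false negatives
    return ent[true_pos].mean(), ent[incorrect].mean()


# Toy, made-up values for illustration only (not data from the study).
toy_entropy = np.array([0.05, 0.10, 0.30, 0.20, 0.00])
toy_prediction = np.array([1, 1, 1, 0, 0])
toy_truth = np.array([1, 1, 0, 1, 0])

mean_correct, mean_incorrect = entropy_by_correctness(
    toy_entropy, toy_prediction, toy_truth
)
ratio = mean_incorrect / mean_correct
```

A ratio well above 1 would indicate that the entropy map tends to flag the voxels where the segmentation is wrong.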
