ESTRO 2024 - Abstract Book

S4969

Physics - Radiomics, functional and biological imaging and outcome prediction


interobserver contour variation, all of the ICC values calculated from all observers against the STAPLE contour for the dataset needed to be above the pre-determined ICC limits. Therefore, in the 6-observer dataset, if any one observer's ICC value was <0.75, the feature was determined to have poor reproducibility.
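The all-observers rule described above can be illustrated with a minimal Python sketch. It assumes one ICC value per observer (each computed against the STAPLE contour) is already available for a given feature. The function name and the example values are hypothetical, and only the 0.75 limit is taken from the text; the other pre-determined ICC limits are not stated in this excerpt.

```python
from typing import Sequence

# The only ICC limit stated in this excerpt; other limits are not given here.
POOR_LIMIT = 0.75


def is_poorly_reproducible(icc_per_observer: Sequence[float],
                           limit: float = POOR_LIMIT) -> bool:
    """Return True if any observer-vs-STAPLE ICC falls below the limit.

    Hypothetical helper: a single observer below the limit is enough
    for the feature to be flagged, mirroring the rule in the text.
    """
    if not icc_per_observer:
        raise ValueError("at least one ICC value is required")
    return min(icc_per_observer) < limit


# Hypothetical ICC values for one feature in a 6-observer dataset.
example_iccs = [0.93, 0.88, 0.91, 0.74, 0.95, 0.90]
print(is_poorly_reproducible(example_iccs))  # True: one value is below 0.75
```

Because the rule takes the minimum over observers, adding observers can only keep a feature's classification the same or make it worse.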

Results:

The bladder had the least interobserver contour variation for all observer datasets, with mean DSC values ranging from 0.96 to 0.97 and mean MASD between 0.40 and 0.58 mm, followed by the uterus (mean DSC: 0.88-0.90, MASD: 1.02-1.23 mm). The ranges of mean DSC and MASD values were 0.84-0.91 and 1.01-1.98 mm for the rectum, and 0.86-0.87 and 1.40-1.64 mm for the GTV. A comparison between the different numbers of observers involved in an interobserver contour variation study and the impact on radiomic feature reproducibility can be seen in figure 1. Overall reproducibility of the radiomic features decreased as the number of observers included in the dataset increased. In the majority of observer datasets, more texture-based features had excellent reproducibility, whereas more shape-based features had poor reproducibility for all datasets. The reproducibility of radiomic features extracted from contours of different volumes can be observed in figure 2. The bladder had the highest number of features with excellent reproducibility, followed by the uterus, GTV and rectum. Fewer intensity-based features had excellent reproducibility for all volumes except the uterus.
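For readers unfamiliar with the DSC values reported above, the following is a minimal Python sketch of the standard Dice similarity coefficient, DSC = 2|A∩B| / (|A| + |B|). It assumes each contour has been rasterised to a set of voxel indices. The function name and toy data are hypothetical, the choice of reference contour is an assumption, and this is not necessarily how the authors computed their values. MASD, which requires surface distances, is not shown.

```python
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]


def dice_similarity(contour_a: Iterable[Voxel],
                    contour_b: Iterable[Voxel]) -> float:
    """Dice similarity coefficient between two sets of voxel indices."""
    a, b = set(contour_a), set(contour_b)
    if not a and not b:
        raise ValueError("DSC is undefined when both contours are empty")
    return 2.0 * len(a & b) / (len(a) + len(b))


# Toy example: two 4-voxel contours sharing 3 voxels.
observer = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
reference = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
print(dice_similarity(observer, reference))  # 2*3 / (4+4) = 0.75
```

A DSC of 1 indicates identical contours and 0 indicates no overlap, so the bladder values of 0.96-0.97 correspond to near-complete agreement.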

Conclusion:

When the number of observer contours was increased from 3 to 4, and from 4 to 5, approximately 20% and 30% of radiomic features, respectively, were deemed to no longer have excellent reproducibility. The decrease in feature reproducibility with an increase in the number of observers in the dataset is likely due to the larger sample of contours providing a better representation of the population of potential and reasonable contours. This suggests that features with excellent reproducibility for the smaller observer datasets but not the larger datasets were only reproducible over a subset of the population of contours. As more observer contours were included, the variation in contours increased and hence the overlap between observer contours became less consistent, which impacted the
