ESTRO 2021 Abstract Book, S1397
PO-1673 Improving data collection for deep-learning auto-segmentation models

D. McSweeney¹, P.A. Bromiley², M. van Herk¹, A. Green¹, A. McWilliam¹

¹University of Manchester, Division of Cancer Sciences, Manchester, United Kingdom; ²University of Manchester, Division of Informatics, Imaging and Data Sciences, Manchester, United Kingdom

Purpose or Objective
Image segmentation is a necessary precursor to extracting many imaging biomarkers, and deep-learning approaches currently dominate the field. The accuracy of these algorithms is largely dictated by the quality of the training data. We propose an approach for assessing observer bias that facilitates quality control of training data, with the aim of improving model performance. Segmentation of skeletal muscle in CT images, used to evaluate sarcopenia, serves as an exemplar to investigate the relationship between model performance and training-data accuracy. Models trained on expert data were compared with models trained on multiple non-expert contours.

Materials and Methods
We adapt STAPLE (Simultaneous Truth and Performance Level Estimation) to combine multiple non-expert manual segmentations. STAPLE computes an optimal probabilistic combination by weighting each segmentation according to an estimate of its observer's performance. Ten radiographers (split into two groups) and one expert contoured the skeletal muscle compartment at the level of the L3 vertebra on 40 oesophago-gastric cancer PET-CT scans. Quality assurance was performed by assuming a constant sensitivity and specificity for each observer, which allows STAPLE to identify lower-performing observers. Auto-segmentation models were then trained on training sets of varying size and content: 1) expert segmentations; 2) observer segmentations; 3) observer segmentations excluding biased observers. Model accuracy was determined by measuring the Dice similarity coefficient (DSC) between model predictions and expert segmentations on 8 test images.
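To illustrate how a constant per-observer sensitivity and specificity can be estimated, the following is a minimal sketch of the standard binary STAPLE expectation-maximisation loop. It is not the authors' adapted version. Assumptions: each segmentation is a flattened binary mask (1 = muscle), the foreground prior is a single fixed value taken from the mean label, and the names `staple_binary` and `_sigmoid`, the initial value 0.9, the clamping constant and the tolerance are all hypothetical, illustrative choices.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def staple_binary(
    segmentations: Sequence[Sequence[int]],
    max_iter: int = 100,
    tol: float = 1e-6,
    init: float = 0.9,
) -> Tuple[List[float], List[float], List[float]]:
    """Binary STAPLE by expectation-maximisation (illustrative sketch).

    segmentations[j][i] is 1 if observer j labelled voxel i as foreground.
    Returns (weights, sensitivity, specificity): weights[i] is the estimated
    probability that voxel i is truly foreground; sensitivity[j] and
    specificity[j] are the constant performance parameters of observer j.
    """
    n_obs = len(segmentations)
    n_vox = len(segmentations[0])
    eps = 1e-6  # keeps probabilities away from 0 and 1 so log() is defined

    # Fixed global prior (an assumption): mean foreground fraction.
    prior = sum(sum(s) for s in segmentations) / (n_obs * n_vox)
    prior = min(max(prior, eps), 1 - eps)

    p = [init] * n_obs  # sensitivity per observer
    q = [init] * n_obs  # specificity per observer
    weights = [0.0] * n_vox

    for _ in range(max_iter):
        # E-step: posterior probability that each voxel is truly foreground.
        for i in range(n_vox):
            log_fg = math.log(prior)
            log_bg = math.log(1 - prior)
            for j in range(n_obs):
                if segmentations[j][i]:
                    log_fg += math.log(p[j])
                    log_bg += math.log(1 - q[j])
                else:
                    log_fg += math.log(1 - p[j])
                    log_bg += math.log(q[j])
            weights[i] = _sigmoid(log_fg - log_bg)

        # M-step: re-estimate each observer's sensitivity and specificity.
        w_fg = max(sum(weights), eps)
        w_bg = max(n_vox - sum(weights), eps)
        new_p, new_q = [], []
        for seg in segmentations:
            tp = sum(w for w, d in zip(weights, seg) if d)
            tn = sum(1 - w for w, d in zip(weights, seg) if not d)
            new_p.append(min(max(tp / w_fg, eps), 1 - eps))
            new_q.append(min(max(tn / w_bg, eps), 1 - eps))

        change = max(abs(a - b) for a, b in zip(p + q, new_p + new_q))
        p, q = new_p, new_q
        if change < tol:
            break

    return weights, p, q


if __name__ == "__main__":
    # Toy 1-D "image": observers 1-2 agree, observer 3 misses one voxel,
    # observer 4 also labels three background voxels as foreground.
    segs = [
        [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    ]
    _, sens, spec = staple_binary(segs)
    for j, (s, t) in enumerate(zip(sens, spec), start=1):
        print(f"observer {j}: sensitivity={s:.2f} specificity={t:.2f}")
```

In this toy case the fourth observer, who over-segments, receives a markedly lower specificity than the others, and the third, who under-segments, a lower sensitivity. In a real CT slice background voxels far outnumber muscle, so specificity would usually need to be computed over a restricted region (for example the body outline) to remain informative; that restriction is an assumption here, not something the abstract specifies.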
The DSC between the expert delineations and the STAPLE estimate from the unbiased observers was used as an approximation of the maximum achievable performance.

Results
Our approach can detect observers who consistently over- or under-segment the skeletal muscle compartment (Fig 1). For this exemplar, deep-learning models were robust to the presence of biased observers in the training set (Fig 2). No statistically significant difference in performance was found between models trained on expert data and those trained on observer data. Likewise, there was no significant difference in performance when the number of training images was reduced from 20 to 5 while the number of observer delineations per image was increased from 1 to 4, so that both configurations contain 20 delineations in total.
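The Dice similarity coefficient used above, both for scoring model predictions and for comparing the expert with the STAPLE estimate, is DSC = 2|A ∩ B| / (|A| + |B|). A minimal sketch for flattened binary masks follows; the function name `dice` and the convention that two empty masks score 1.0 are assumptions. A probabilistic STAPLE output would typically be thresholded (for example at 0.5) before this comparison, which is likewise an assumption rather than a detail given in the abstract.

```python
from typing import Sequence


def dice(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        # Convention (an assumption): two empty masks agree perfectly.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


if __name__ == "__main__":
    expert = [1, 1, 1, 1, 0, 0]
    model = [1, 1, 0, 0, 1, 0]
    print(round(dice(expert, model), 4))  # 2*2 / (4 + 3) = 4/7 -> 0.5714
```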