ESTRO 2020 Abstract book
= 0.65, IQR = 0.28) and H95th = 12.9 mm on the test dataset. On the validation dataset the average DSC was 0.81 with a median of 0.88 (IQR = 0.12), the average J was 0.71 with a median of 0.78 (IQR = 0.19), and the H95th was 6 mm. For the qualitative assessment, 56% of all participants (IQR = 11%) preferred the automatic segmentation to the expert's contour. Among the groups, results were as follows: computer scientist group score = 52% (IQR = 10%), medical doctors group score = 56% (IQR = 9%) and radiologists group score = 60% (IQR = 3%). Preliminary results for the contouring time were 129 and 20 s/patient for semi-automatic manual and fully automatic contouring, respectively.
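The two overlap metrics reported above, the Dice similarity coefficient (DSC) and the Jaccard index (J), can be illustrated with a minimal sketch. It assumes binary masks flattened to equal-length sequences of 0/1; the function name `dice_and_jaccard` and the convention of returning 1.0 for two empty masks are illustrative assumptions, not details taken from the abstract.

```python
def dice_and_jaccard(pred, ref):
    """Return (Dice, Jaccard) for two flattened binary masks of equal length.

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")

    # Count voxels labelled foreground in both masks, and in each mask.
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    size_pred = sum(1 for p in pred if p)
    size_ref = sum(1 for r in ref if r)
    union = size_pred + size_ref - intersection

    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0, 1.0

    dice = 2.0 * intersection / (size_pred + size_ref)
    jaccard = intersection / union
    return dice, jaccard


if __name__ == "__main__":
    automatic = [1, 1, 0, 0]
    expert = [1, 0, 1, 0]
    d, j = dice_and_jaccard(automatic, expert)
    print(f"DSC = {d:.3f}, J = {j:.3f}")
```

For a single pair of masks the two metrics are linked by J = DSC / (2 - DSC), so they rank individual cases identically; averages over patients need not obey the same relation.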
Conclusion
Our proposed pipeline can potentially provide a low-cost, observer-independent and reproducible method for the detection and segmentation of lung cancers on CT images, facilitating adaptive re-planning. In addition, we extended our pipeline by implementing RECIST and volumetric RECIST functionality.

OC-0346 Strategies to improve deep learning-based salivary gland segmentation
W. Van Rooij¹, M. Dahele¹, H. Nijhuis¹, B. Slotman¹, W. Verbakel¹
¹Amsterdam UMC, Radiation Oncology, Amsterdam, The Netherlands

Purpose or Objective
Automated, deep learning-based delineation (DLD) of organs-at-risk (OAR) is a promising method to reduce the time-intensiveness and inter-/intra-observer variability associated with manual delineation. However, the performance of DL for OAR segmentation is not always perfect and further improvement is likely to be possible. We systematically evaluated ways to improve the performance and reliability of DL for OAR segmentation, using both simple and more complex strategies, with the submandibular and parotid glands (SMG/PG) as the paradigm. Improving DLD performance is clinically relevant, with applications ranging from the initial contouring process to on-line adaptive radiotherapy.

Material and Methods
This work consisted of a base model (U-net trained on 90 clinically contoured images) to which variations were applied. The influence of those variations was measured with the average Sørensen-Dice coefficient (SDC_avg) and its standard deviation (SDC_std). Various experiments were designed: increasing the amount of training data with original images, with traditional data augmentation (e.g. flipping) and with domain-specific data augmentation (e.g. changing gland density). The influence of data quality was tested by comparing training/testing on a small set of clinical versus meticulously curated contours. The effect of using several custom cost functions, aimed at increasing the influence of surface errors, was explored. Patient-specific Hounsfield unit (HU) windowing was applied during inference in order to optimize image contrast. Lastly, the effect of using model ensembles rather than stand-alone models was analyzed. All beneficial strategies were then combined into a final model. In total, nearly 1400 models were trained to properly (cross-)validate the results.

Results
A positive effect was observed from increasing the training set size, applying traditional/domain-specific data augmentation, patient-specific HU windowing and from using model ensembles (figure 1; +28/15%, +9/5%, +10/6%, +1/2%, +1/3%, respectively, for SMG/PG). No effect was observed from using curated rather than clinical data, nor from the use of alternative cost functions. The strategies' effects on performance seemed to diminish when the base model performance was already 'high'. The effect of combining all beneficial strategies was an increase in SDC_avg of 4/3% and a decrease in SDC_std of 1/1% compared to a model trained with the maximum set size (figure 2), resulting in an SDC_avg of 0.89/0.88 for SMG/PG.
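Three ingredients of abstract OC-0346 (HU windowing, model ensembling, and the summary of Sørensen-Dice scores as an average and a standard deviation) can be illustrated with a minimal sketch. The abstract does not state how the patient-specific window was chosen, how ensemble members were combined, or which standard deviation estimator was used. The window parameters, the averaging of per-voxel probabilities with a 0.5 threshold, the sample standard deviation, and all function names below are therefore assumptions for illustration only.

```python
from statistics import mean, stdev


def window_hu(hu_values, center, width):
    """Clip HU values to a window and rescale them to [0, 1].

    `center` and `width` would be chosen per patient; how the authors
    chose them is not stated, so they are plain inputs here (assumption).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    lo = center - width / 2.0
    hi = center + width / 2.0
    return [(min(max(v, lo), hi) - lo) / (hi - lo) for v in hu_values]


def ensemble_mask(prob_maps, threshold=0.5):
    """Average per-voxel probabilities of several models, then threshold.

    Averaging is one common way to build an ensemble (assumed here).
    """
    n_models = len(prob_maps)
    averaged = [sum(voxel) / n_models for voxel in zip(*prob_maps)]
    return [1 if p >= threshold else 0 for p in averaged]


def dice(pred, ref):
    """Sørensen-Dice coefficient of two flattened binary masks."""
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    return 1.0 if total == 0 else 2.0 * intersection / total


def sdc_summary(scores):
    """Return (average, sample standard deviation) of per-patient scores."""
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    # Toy 3-voxel "image" and two hypothetical model outputs.
    print(window_hu([-100, 40, 200], center=40, width=80))
    mask = ensemble_mask([[0.9, 0.2, 0.6], [0.7, 0.4, 0.3]])
    print(mask, dice(mask, [1, 1, 0]))
    print(sdc_summary([0.80, 0.90]))
```

In this sketch the windowing runs only on the input intensities at inference time, matching the abstract's description of when it was applied, while the ensemble and the score summary operate on model outputs.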