ESTRO 2024 - Abstract Book
S3069
Physics - Autosegmentation
1 Liverpool & Macarthur Cancer Therapy Centres, Radiation Oncology, Sydney, Australia. 2 Ingham Institute for Applied Medical Research, Medical Physics, Sydney, Australia. 3 University of New South Wales, School of Clinical Medicine, Sydney, Australia. 4 University of Sydney, School of Physics, Institute of Medical Physics, Sydney, Australia
Purpose/Objective:
Manual delineation of breast nodal clinical target volumes (bCTVn) is time-consuming and prone to inter- and intra-observer variability [1]. The deep learning (DL) nnUNet framework has shown versatility in adapting to a range of clinical sites and imaging modalities [2]. The popularity of DL segmentation has led to its incorporation into numerous commercial systems. Reviewing these contours in clinical practice can be challenging, particularly in an online setting. This work evaluates the accuracy of bCTVn contour segmentation from a commercial DL model and from an nnUNet model against manually delineated contours. It also assesses the feasibility of using an nnUNet DL model for independent contour audit quality assurance (QA).
Material/Methods:
The auto-segmentation accuracy of six bCTVn contours, comprising axillary level 1-4 lymph nodes (CTVn_L1-4), internal mammary chain lymph nodes (CTVn_IMC) and interpectoral lymph nodes (CTVn_INTERPEC), was assessed for left-sided breast patients using four DL auto-segmentation models: three multi-label prediction models (2D, 3D with low resolution (3D_L) and 3D with high resolution (3D_H)) trained on a separate local dataset using the nnUNet network, and a pre-trained model from RayStation (RS) ver. 2023B (RaySearch Laboratories AB, Sweden), referred to as RSDL. An in-house PyDicer tool [3] was used to fetch DICOM data for 37 CT datasets with manually delineated contours for the six bCTVn (assessed by a second Radiation Oncologist (RO) as part of clinical audit). Data pre-processing and preparation, followed by nnUNet network training, are integrated within the PyDicer tool. The built-in hyperparameter optimisation of nnUNet within this framework removes the need for manual network adaptation and hyperparameter tuning. The nnUNet model was trained separately for CTVn_IMC and CTVn_L1, L3 and L4, and then for CTVn_L2 and CTVn_INTERPEC, to avoid overlapping structures. Accuracy was evaluated on 13 additional test datasets by comparing the automatic segmentation from each DL model with the manual contours, and the nnUNet segmentation with the RSDL segmentation, using the Dice Similarity Coefficient (DSC) and the 95th percentile Hausdorff Distance (HD95). Two ROs assigned qualitative scores as outlined by Abadi et al. [4]: 1 (no modification required, accurate), 2-3 (minor edit required, offers clinical benefit) and 4-5 (major edit required, offers no clinical benefit). Pearson's correlation between the clinical grading of the RSDL auto-contours in the test cohort and the DSC between the best-performing nnUNet model and RSDL was evaluated to assess utility as an independent contour audit tool.
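The two overlap metrics named above can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes each contour has already been rasterised to a set of (z, y, x) voxel indices, takes HD95 as the larger of the two directed 95th-percentile surface distances (one common convention among several), and uses only the Python standard library. The helper names `box`, `surface` and `percentile` are hypothetical.

```python
import math

# 6-connected (face) neighbour offsets, in (z, y, x) index order.
_FACE_STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def box(z0, z1, y0, y1, x0, x1):
    """Hypothetical helper: a solid box of voxel indices (upper bounds exclusive)."""
    return {(z, y, x) for z in range(z0, z1) for y in range(y0, y1) for x in range(x0, x1)}


def dice(a, b):
    """Dice Similarity Coefficient between two sets of voxel indices."""
    if not a and not b:
        return 1.0  # assumption: two empty contours count as perfect agreement
    return 2.0 * len(a & b) / (len(a) + len(b))


def surface(voxels):
    """Voxels that have at least one face neighbour outside the set."""
    return {
        (z, y, x)
        for (z, y, x) in voxels
        if any((z + dz, y + dy, x + dx) not in voxels for dz, dy, dx in _FACE_STEPS)
    }


def percentile(values, q):
    """q-th percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th percentile Hausdorff distance, in the units of `spacing`."""
    if not a or not b:
        raise ValueError("HD95 is undefined when either contour is empty")

    def to_physical(voxel):
        return tuple(i * s for i, s in zip(voxel, spacing))

    pts_a = [to_physical(v) for v in surface(a)]
    pts_b = [to_physical(v) for v in surface(b)]

    def directed(src, dst):
        # Distance from every surface point in src to its nearest point in dst.
        return [min(math.dist(p, q) for q in dst) for p in src]

    return max(percentile(directed(pts_a, pts_b), 95),
               percentile(directed(pts_b, pts_a), 95))


# Toy example: a 4x4x4 cube and the same cube shifted by one voxel in x.
manual = box(0, 4, 0, 4, 0, 4)
auto = box(0, 4, 0, 4, 1, 5)
print(dice(manual, auto))  # 0.75
print(hd95(manual, auto))  # 1.0
```

The brute-force nearest-neighbour search is quadratic in the number of surface voxels, so it suits only small toy volumes; a distance-transform approach would be the usual choice for full CT grids.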
Results:
In the independent test cohort, the mean (± standard deviation) DSC between DL automatic segmentation and manual delineation across all six bCTVn auto-segmented contours was 0.68 ± 0.15, 0.70 ± 0.12, 0.72 ± 0.12 and 0.61 ± 0.15, while the mean HD95 was 10.0 ± 7.9, 9.7 ± 7.5, 9.6 ± 7.9 and 12.6 ± 11.1 (Table 1), for the 2D, 3D_L, 3D_H and RSDL models respectively. On RO qualitative clinical grading, auto-contours from the 3D_H model were scored as clinically acceptable, with or without corrections, in ≥ 93% (clinical grading ≤ 3) and 54.8% (clinical grading ≤ 2). The RSDL model performed worst, with clinical acceptability ranging from 53% to 96% (clinical grading ≤ 3) and 35.9% (clinical grading ≤ 2). Moreover, the proportion of auto-contours graded clinically unacceptable was lowest for the 3D_H (3.2% ± 3.3%) and 3D_L (4.5% ± 3.6%) models, while the 2D (15.6% ± 7.2%) and RSDL (18.6% ± 16.2%) models had the highest proportions across all six bCTVn. Figure 1 displays the correlation between the clinical grading for RSDL and the DSC between RSDL and the best-performing nnUNet model (3D_H) for the CTVn_L4 and CTVn_INTERPEC auto-contours (P < 0.05); the other four bCTVn did not show any correlation.
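The audit idea, correlating the RO clinical grade of an RSDL contour with the DSC between that contour and the independent nnUNet contour, can be sketched as follows. This is an illustration under stated assumptions, not the analysis performed here: the grades and DSC values are made-up placeholders rather than study data, the `flag_for_review` helper and its threshold are hypothetical, and the sketch computes only the correlation coefficient (a P value would need a t-distribution, for example via `scipy.stats.pearsonr`).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def flag_for_review(inter_model_dsc, threshold):
    """Hypothetical audit rule: indices of cases whose DSC between the two
    models falls below `threshold`, suggesting the contour needs a closer look."""
    return [i for i, d in enumerate(inter_model_dsc) if d < threshold]


# Placeholder values for illustration only; NOT data from this study.
clinical_grade = [1, 2, 2, 3, 4, 5]                  # 1 = accurate ... 5 = major edit
inter_model_dsc = [0.85, 0.80, 0.74, 0.66, 0.52, 0.40]

r = pearson_r(clinical_grade, inter_model_dsc)
print(f"Pearson r = {r:.2f}")                        # negative: worse grade, lower DSC
print(flag_for_review(inter_model_dsc, threshold=0.6))
```

A strongly negative coefficient for a given structure would indicate that low agreement between the two models tends to coincide with contours the ROs graded as needing major edits, which is the property an independent audit tool relies on.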