ESTRO 2024 - Abstract Book
Clinical - CNS
Manual definition of target volumes and organs at risk (OARs) for radiotherapy planning is very time-consuming. Automatic segmentation (AS) models could make this task easier. However, current state-of-the-art deep learning-based models have robustness issues, so it is important to evaluate them comprehensively before clinical use. Ideally, such an evaluation is standardized so that it can serve as a benchmark for other AS methods.
Material/Methods:
A multifaceted clinical evaluation was performed on an in-house developed AS method for intracranial organs at risk. First, the AS results were compared geometrically with the ground truth, with similarity quantified by the Dice similarity coefficient (DSC) and the Hausdorff distance (HDD). Second, nine radiotherapy experts performed a qualitative evaluation, rating the results on a 4-point Likert scale of clinical acceptance. The evaluators also adjusted the AS results until they were clinically acceptable; the time needed for evaluation and adjustment was recorded, as was the adjusted surface fraction. Lastly, a dosimetric evaluation compared plans on the reference contours with plans on the AS contours.
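The two geometric metrics named above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say whether the HDD is the maximum or a percentile variant, nor how surfaces were extracted, so the sketch assumes the classical (maximum) Hausdorff distance computed over all voxels of each structure. The function names, the `spacing` parameter, and the toy masks are hypothetical.

```python
import math

def dice(a, b):
    """Dice similarity coefficient of two sets of voxel indices."""
    if not a and not b:
        return 1.0  # convention: two empty structures agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))

def directed_hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Largest distance (mm) from a voxel in a to its nearest voxel in b."""
    def to_mm(p):
        return tuple(c * s for c, s in zip(p, spacing))
    b_mm = [to_mm(q) for q in b]
    return max(min(math.dist(to_mm(p), q) for q in b_mm) for p in a)

def hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric (maximum) Hausdorff distance between two non-empty voxel sets."""
    return max(directed_hausdorff(a, b, spacing),
               directed_hausdorff(b, a, spacing))

# Toy example (hypothetical data): a 4 x 4 square and the same square
# shifted by one voxel along x, with an assumed 1 mm isotropic spacing.
reference = {(x, y, 0) for x in range(4) for y in range(4)}
auto_seg = {(x, y, 0) for x in range(1, 5) for y in range(4)}

dsc = dice(reference, auto_seg)
hdd = hausdorff(reference, auto_seg)
print(f"DSC = {dsc:.2f}, HDD = {hdd:.2f} mm")
```

The brute-force nearest-neighbour search is quadratic in the number of voxels; it is meant for illustration only, and clinical tools typically restrict the computation to surface points and use spatial indexing.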
Results:
The overall average DSC was 0.78 ± 0.08 and the average HDD was 3.68 ± 5.17 mm. The evaluators rated a total of 605 auto-segmented structures, of which 38.7% were rated as acceptable, 49.3% as requiring minor adjustments, 9.9% as requiring major adjustments, and 2.1% as unacceptable. The average time for evaluation and adjustment was 22 minutes and 6 seconds per case. On average, 3.52% of the contour surface was adjusted by more than 1%. There was no direct correlation between the ratings, the time spent, and the adjusted surface ratio. The dosimetric evaluation showed similar dose coverage of the target. The overall average differences in mean dose and maximum dose to all OARs were 0.30 (± 2.36) Gy and 0.23 (± 2.75) Gy, respectively.
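As a reading aid for the rating results, the short sketch below converts the reported percentages into approximate structure counts and sums the share that needed at most minor adjustments. The counts are reconstructed by rounding and are not reported in the abstract, so they should be read as estimates; the variable names are hypothetical.

```python
total_structures = 605

# Percentages as reported in the abstract.
shares = {
    "acceptable": 38.7,
    "minor adjustments": 49.3,
    "major adjustments": 9.9,
    "unacceptable": 2.1,
}

# Approximate counts, reconstructed by rounding (an estimate, not reported data).
approx_counts = {label: round(total_structures * pct / 100)
                 for label, pct in shares.items()}

# Share of structures that were usable with at most minor adjustments.
usable_share = round(shares["acceptable"] + shares["minor adjustments"], 1)

for label, count in approx_counts.items():
    print(f"{label}: about {count} structures")
print(f"acceptable or minor adjustments: {usable_share}%")
```

Under this reading, roughly 88% of the rated structures were either acceptable as delivered or needed only minor adjustments.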
Conclusion:
The multifaceted evaluation of the AS model provided a comprehensive overview of its quality. The geometric comparison provides a good benchmark against inter-rater variability and other available models. The qualitative evaluation, although prone to subjectivity, gives insight into clinical usefulness. On average, evaluation and adjustment of the AS contours was 3.15 times faster than the standard manual contouring process. The dosimetric evaluation is important for verifying the possible clinical impact. However, clear benchmarks for decision-making on dosimetry are required.
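The speed-up figure can be related to the reported review time with a small calculation. This sketch assumes that "3.15 times faster" means the manual contouring time divided by the AS evaluation-and-adjustment time equals 3.15; the abstract does not report the manual time itself, so the value below is an implied estimate only.

```python
# Reported average time for evaluating and adjusting the AS contours per case.
as_seconds = 22 * 60 + 6          # 22 min 6 s

# Reported speed-up, interpreted here as manual time / AS time (assumption).
speedup = 3.15

# Implied manual contouring time and time saved per case (estimates).
manual_seconds = round(as_seconds * speedup)
manual_min, manual_sec = divmod(manual_seconds, 60)
saved_min, saved_sec = divmod(manual_seconds - as_seconds, 60)

print(f"implied manual time: {manual_min} min {manual_sec} s")
print(f"implied time saved:  {saved_min} min {saved_sec} s")
```

On this interpretation, manual contouring would take roughly 70 minutes per case, a saving of about 47 minutes.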
Keywords: auto-segmentation, intracranial, deep learning
1085
Digital Poster
Pencil-beam scanning proton therapy for low-grade glioma patients