ESTRO 2024 - Abstract Book
Interdisciplinary - Other
B. Inter- and intra-rater reliability
Inter- and intra-rater agreement was generally poor for the CTV prostate (κ=0.14 and κ=0.34 respectively). Inter- and intra-observer variation in contouring is already known to exist for target structures, because a true ground truth cannot be identified.
The bladder had poor inter-rater reliability (κ=0.24) but fair intra-rater reliability (κ=0.55). The posterior border of the bladder directly abuts the prostate and may therefore be vulnerable to variation.
There was good inter-rater (κ=0.60) and intra-rater reliability (κ=0.65) for the anorectum, suggesting the scoring system was applied consistently for this structure. This scoring system may therefore have potential when applied to the anorectum or organs at risk not directly in contact with target structures.
Individual intra-rater reliability varied between κ=-0.04 and κ=1.00. Re-calculating inter- and intra-rater reliability statistics for scores reclassified on 2-, 3- and 4-point scales made no difference to the overall agreement.
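As an illustration of how a κ value of this kind can be computed, the following is a minimal sketch of unweighted Cohen's kappa for two raters, followed by a reclassification of the scores onto a 2-point scale. The abstract does not state which kappa variant was used (Cohen's, Fleiss' or weighted), so the choice of unweighted Cohen's kappa is an assumption. The function names, the 1 to 5 scoring scale, the example scores and the 2-point mapping are all hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of contours given the same score by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal score frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_o - p_e) / (1.0 - p_e)


def reclassify(scores, mapping):
    """Collapse scores onto a coarser scale using a score-to-category mapping."""
    return [mapping[s] for s in scores]


# Hypothetical scores on an assumed 1-5 scale (not data from the study).
rater_1 = [1, 2, 3, 4, 5, 3, 2, 4]
rater_2 = [1, 3, 3, 4, 4, 2, 2, 5]
kappa_5pt = cohen_kappa(rater_1, rater_2)

# Hypothetical 2-point reclassification: scores 1-2 versus scores 3-5.
two_point = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
kappa_2pt = cohen_kappa(reclassify(rater_1, two_point),
                        reclassify(rater_2, two_point))

print(f"5-point kappa: {kappa_5pt:.2f}")
print(f"2-point kappa: {kappa_2pt:.2f}")
```

The same function can be applied to one rater's scores from two separate sessions to estimate intra-rater agreement. With more than two raters, a multi-rater statistic such as Fleiss' kappa would be needed instead.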
Conclusion:
A structured method for assessing auto-contours is recommended when they are used clinically. This study demonstrates both inter- and intra-rater variation in the assessment of auto-contoured structures, which must be considered if qualitative assessment is used as a validation method. Despite individual education on the use of our scoring system, agreement between raters was still generally poor for the prostate and bladder. While this may reflect the scoring system, it also highlights that there is significant inter-observer variation in contouring structures and that there is no “ground-truth”. Further training and modification of scoring systems are required to ensure auto-contours can be assessed in a standardised way.
Keywords: Auto-contouring, Evaluation, Quality Assurance
References:
[1] Mackay K, Bernstein D, Glocker B, Kamnitsas K, Taylor A. A Review of the Metrics Used to Assess Auto Contouring Systems in Radiotherapy. Clin Oncol (R Coll Radiol). 2023. 10.1016/j.clon.2023.01.016.
[2] Boateng GO, Neilands TB, Frongillo EA, Melgar-Quiñonez HR, Young SL. Best Practices for Developing and Validating Scales for Health, Social, and Behavioral Research: A Primer. Front Public Health. 2018;6:149. 10.3389/fpubh.2018.00149.