ESTRO 2025 - Abstract Book
S3381
Physics - Machine learning models and clinical applications
reinforcement-learning framework to automatically generate treatment plans with the flexibility to adjust planning strategy according to clinical priorities.
Material/Methods: A priority-encoded deep reinforcement learning (PEDRL) framework was developed to interact autonomously with the clinical TPS and dynamically adjust objective constraints during inverse planning. The framework was powered by the discrete soft actor-critic (SAC) algorithm (2), integrated with a multi-layer perceptron (MLP) architecture, enabling it to encode and navigate the complex trade-offs in HN treatment planning. Modeled to emulate clinical decision-making, the PEDRL agent assessed intermediate plan states and iteratively adjusted objective constraints, enhancing sparing across multiple OARs while maintaining robust target coverage. Because parotid-sparing preferences were integrated into the agent’s state space, the agent autonomously tailored its sparing approach, choosing bilateral or unilateral sparing to optimize plan quality in alignment with clinical priorities. The agent was trained through iterative treatment plan generation on a total of 40 HN patients, without human supervision. During training, the change in plan quality after each action was quantified to shape the reward function, providing feedback to update the network parameters. The trained model was subsequently evaluated on an additional 44 patients by comparing its generated plans with those of clinical experts. Key dosimetric endpoints were reported, including brainstem/cord/mandible Dmax, parotids/oral cavity/larynx/pharynx D50, plan max dose, conformity index (CI), and homogeneity index (HI). A Wilcoxon signed-rank test was performed for statistical comparison.
Results: With clinical priorities encoded, the PEDRL-generated plans demonstrated comparable or superior performance across all dosimetric metrics for the testing cases (Table 1).
For clinical scenarios in which the sparing option is difficult to decide, PEDRL can autonomously generate both options, providing valuable reference points and supporting clinical decision-making by exploring trade-offs at no additional cost. Figure 1 shows a test patient prescribed unilateral sparing of the left parotid. In addition to generating a plan (PEDRL A) that aligns with the unilateral sparing prescription, the agent was able to spare both parotids by switching the sparing option to bilateral, without compromising target coverage (PEDRL B).
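The priority encoding and reward shaping described under Material/Methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify how sparing preferences are encoded, how plan quality is scored, or which weights are used, so the names `SparingPriority`, `encode_state`, `plan_quality`, and `reward`, the metric keys, the weights, and the dose values are all hypothetical and are not taken from the study. Target coverage terms are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SparingPriority:
    """Hypothetical parotid-sparing preference (encoding assumed, not from the abstract)."""
    spare_left_parotid: bool
    spare_right_parotid: bool


def encode_state(metrics: Dict[str, float], priority: SparingPriority,
                 metric_order: List[str]) -> List[float]:
    """Append the priority flags to the plan metrics to form one state vector."""
    state = [metrics[name] for name in metric_order]
    state += [float(priority.spare_left_parotid), float(priority.spare_right_parotid)]
    return state


def plan_quality(metrics: Dict[str, float], priority: SparingPriority,
                 weights: Dict[str, float]) -> float:
    """Assumed score: negative weighted sum of OAR doses (higher is better).
    A parotid that is not prioritized contributes nothing to the score."""
    active = dict(weights)
    if not priority.spare_left_parotid:
        active["parotid_l_d50"] = 0.0
    if not priority.spare_right_parotid:
        active["parotid_r_d50"] = 0.0
    return -sum(w * metrics[name] for name, w in active.items())


def reward(before: Dict[str, float], after: Dict[str, float],
           priority: SparingPriority, weights: Dict[str, float]) -> float:
    """Reward is the change in plan quality caused by one action."""
    return plan_quality(after, priority, weights) - plan_quality(before, priority, weights)


# Illustrative numbers only (Gy); not data from the study.
weights = {"parotid_l_d50": 1.0, "parotid_r_d50": 1.0, "cord_dmax": 0.5}
before = {"parotid_l_d50": 30.0, "parotid_r_d50": 28.0, "cord_dmax": 40.0}
after = {"parotid_l_d50": 24.0, "parotid_r_d50": 29.0, "cord_dmax": 40.0}

unilateral = SparingPriority(spare_left_parotid=True, spare_right_parotid=False)
bilateral = SparingPriority(spare_left_parotid=True, spare_right_parotid=True)

print(reward(before, after, unilateral, weights))  # 6.0
print(reward(before, after, bilateral, weights))   # 5.0
```

In this toy setup the same action earns a different reward under each preference: the 1 Gy increase in right parotid D50 is penalized only when bilateral sparing is requested. This is one plausible way a single agent could learn both strategies from a shared state representation.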