ESTRO 2024 - Abstract Book


Physics - Machine learning models and clinical applications


right femoral heads, PTV and body. Five equispaced beam angles were considered. Fluence Map Optimization (FMO) was performed using a quadratic optimization model that minimizes a weighted sum of squared deviations from the prescribed doses, with all required parameters (weights and bounds) defined automatically by a fuzzy inference system, as described in [1]. This automated FMO approach works in an iterative manner, changing the optimization model parameters until it converges to a plan that complies with the medical prescription.

Reinforcement learning (RL), namely Q-learning, was used to dynamically determine the set of rules constituting the fuzzy inference system at each step of the automated iterative FMO approach. RL consists of learning what to do (how to map situations to actions) so as to maximize a numerical reward signal; it does not require a ground truth for training. It works as an agent-based system in which the agent learns by performing an action and receiving feedback (a reward) that it takes into account in its next decision-making step. The agent does not know in advance what the optimal decision is, but it learns how to react in the different possible states of the system in which it operates. In Q-learning, the agent learns an optimal policy in a Markov decision process without any prior knowledge of the environment. The relationship between possible environment states and actions is represented by a Q-table, which is updated every time the agent, being in a given and known state, chooses an action. The action triggers a reward that is used to update the Q-table through the Bellman equation [2].

In the present work, five states are considered for each structure of interest, defined by comparing the dose metrics achieved in the current treatment plan (current iteration of the automated FMO procedure) with the desired dose defined by the medical prescription. A Q-table is built for each structure of interest. The different actions correspond to the different fuzzy rules that can be used within the fuzzy inference system and that govern the corresponding changes in the quadratic optimization model parameters. These changes can be more or less pronounced depending on the current state of the structure (on how far the structure is from what is determined by the medical prescription). As most structures of interest are correlated, meaning that improving one may worsen another, the reward function used to update each Q-table also takes into account what happened to all the other structures.

The Q-tables are built during the learning phase by considering a set of cases and applying the fuzzy inference automated FMO approach with the agent choosing the fuzzy rules at random. Early convergence is prevented by imposing random perturbations on the model parameters, increasing the exploration and diversification of the learning process. After the learning phase, the Q-tables are fixed and used directly, as lookup tables, to decide, for each state, the best action to take (the one that maximizes the reward).
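The abstract does not state the FMO model explicitly; a standard quadratic penalty formulation consistent with the description (a weighted sum of squared deviations, with per-structure weights and bounds) would be

\[
\min_{x \ge 0} \; \sum_{s} \left[ \frac{w_s^-}{|V_s|} \sum_{v \in V_s} \big(L_v - d_v(x)\big)_+^2 \;+\; \frac{w_s^+}{|V_s|} \sum_{v \in V_s} \big(d_v(x) - U_v\big)_+^2 \right], \qquad d(x) = Dx,
\]

where \(x\) is the fluence map, \(D\) the dose-deposition matrix, \(d_v(x)\) the dose delivered to voxel \(v\) of structure \(s\), \(L_v\) and \(U_v\) the lower and upper dose bounds, \(w_s^\pm\) the penalty weights, and \((\cdot)_+ = \max(\cdot, 0)\). This is one common form only; the exact model used in [1] may differ.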
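To make the Q-learning component concrete, the following minimal sketch shows a tabular implementation of the scheme described above: five states per structure, one Q-table per structure, epsilon-greedy selection of fuzzy rules during learning, a reward coupled across structures, and the Bellman update [2]. All names, counts, thresholds and weights are illustrative assumptions, not the authors' actual implementation.

```python
import random

N_STATES = 5     # five states per structure, as described in the abstract
N_ACTIONS = 4    # candidate fuzzy rules (illustrative count)
ALPHA = 0.1      # learning rate (assumed)
GAMMA = 0.9      # discount factor (assumed)
EPSILON = 0.2    # exploration rate during the learning phase (assumed)

STRUCTURES = ("PTV", "bladder", "rectum", "left_femur", "right_femur")

def discretize(achieved, prescribed):
    """Map the relative gap between the achieved dose metric and the
    medical prescription to one of five states (thresholds are assumed)."""
    gap = (achieved - prescribed) / prescribed
    if gap < -0.10:
        return 0   # far below prescription
    if gap < -0.02:
        return 1   # slightly below
    if gap <= 0.02:
        return 2   # complies with the prescription
    if gap <= 0.10:
        return 3   # slightly above
    return 4       # far above

# One Q-table per structure of interest, initialised to zero.
q_tables = {s: [[0.0] * N_ACTIONS for _ in range(N_STATES)]
            for s in STRUCTURES}

def choose_action(q_row):
    """Epsilon-greedy rule selection: random during exploration, greedy
    otherwise (setting EPSILON = 0 turns this into a pure lookup)."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q_row[a])

def coupled_reward(structure, improvements):
    """Reward for one structure that also accounts for what happened to
    all the other, correlated structures (weighting scheme is assumed)."""
    own = improvements[structure]
    others = sum(v for s, v in improvements.items() if s != structure)
    return own + 0.5 * others

def q_update(q_table, state, action, reward, next_state):
    """Bellman update for tabular Q-learning:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[next_state])
    q_table[state][action] += ALPHA * (reward + GAMMA * best_next
                                       - q_table[state][action])
```

After the learning phase, EPSILON would be set to zero so that each Q-table acts as a fixed lookup table, matching the deployment mode described above.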

Results:

The approach described in [1] had already been shown to produce high-quality treatment plans. With the inclusion of Q-learning, the fuzzy rules are dynamically changed as the algorithm progresses, instead of being fixed. This led to a decrease in the total number of iterations needed to reach a treatment plan complying with the medical prescription: average values obtained under cross-validation show a reduction in the total number of iterations ranging from 50% to 63%.

Conclusion:
