ESTRO 2023 - Abstract Book



Saturday 13 May


Several ML methods were considered: Logistic Regression (LR), Lasso, Elastic Net, k-Nearest Neighbors (KNN), Random Forest (RF), Support Vector Machines (SVM), Gaussian Naive Bayes (GNB) and Multi-Layer Perceptron (MLP). First, we standardized the training dataset using robust scaling, which limits the impact of outliers; the test dataset was then scaled accordingly. Categorical features were one-hot encoded, variables with a high p-value at univariate LR were excluded, and features with Spearman ρ > 0.8 were also dropped. The Synthetic Minority Oversampling Technique (SMOTE) was applied to generate synthetic samples of the minority class and compensate for class imbalance. Finally, we ran each model with a Bayesian search to maximize the chosen metrics (balanced accuracy and AUC) under stratified K-fold cross-validation. For each model, we derived the feature importance and complementary metric scores for both the training and test datasets. A Sequential Feature Selector was applied; we chose a parsimonious number of features for which the AUC falls within the minimum of 1% of the maximum metric value and its SD error. Procedures were implemented in Python v.3.7.9.

Results
As shown in Fig. 1, model performance was compared between the training and test datasets over different metrics. The models are listed in ascending order, starting from the one with the smallest training-test discrepancy. Overall, the best models were MLP and GNB. LR, more often used in the literature, showed similar performance (as shown in Fig. 2, which includes feature importance). KNN, RF and SVM instead tend to overfit. The AUC on the test data is overall slightly above 0.6; the F1 score for patients with toxicity is around 0.25, while for patients without toxicity it depends on the model. MLP and GNB have the best Brier scores, while LR is worse and shows a larger discrepancy between training and test. The test slope derived from the calibration plot is low compared with the expected one, and its R² differs considerably between training and test.
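As a minimal illustration of two steps described above, the sketch below shows robust scaling fitted on the training data only and then applied to the test data, together with the balanced accuracy and Brier score used to compare models. It uses only the Python standard library. The function names and the toy values are hypothetical and are not taken from the study, and the median/interquartile-range form of robust scaling is an assumption about the scaler that was used.

```python
from statistics import median, quantiles


def fit_robust_scaler(train_values):
    """Estimate the center (median) and scale (IQR) from training values only."""
    q1, _, q3 = quantiles(train_values, n=4, method="inclusive")
    iqr = q3 - q1
    # Guard against a zero IQR (constant feature) to avoid division by zero.
    return median(train_values), (iqr if iqr > 0 else 1.0)


def robust_scale(values, center, scale):
    """Apply a previously fitted robust scaling to any dataset."""
    return [(v - center) / scale for v in values]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy feature with one outlier (95.0) in the training set.
train_feature = [10.0, 12.0, 11.0, 13.0, 95.0]
test_feature = [12.0, 14.0]

# Fit on training data, then reuse the same statistics for the test data.
center, scale = fit_robust_scaler(train_feature)
test_scaled = robust_scale(test_feature, center, scale)

# Hypothetical labels (1 = toxicity), hard predictions and probabilities.
y_true = [1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0]
y_prob = [0.8, 0.1, 0.2, 0.6, 0.3, 0.1]

ba = balanced_accuracy(y_true, y_pred)
bs = brier_score(y_true, y_prob)
print(center, scale, test_scaled, ba, round(bs, 4))
```

Because the median and interquartile range are computed from the training values alone, the outlier barely affects the scaling and no information from the test set leaks into preprocessing. Balanced accuracy weights both classes equally, which is why it suits an imbalanced toxicity outcome better than plain accuracy.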
