for the organs-at-risk (e.g. mean heart dose) or relative dose differences for the clinical target volume (CTV), with different levels within these constraints. In addition, multilabel classification was implemented, using the constraints for the different OARs as separate labels. The full dataset was split into training (70%), validation/hyperparameter-optimization (10%) and hold-out test (20%) sets, with all fractions of one patient assigned to the same set. In addition, 5-fold cross-validation was performed on the combined training and validation sets. The γ-maps, together with their corresponding labels, were used as input for convolutional neural networks (CNNs). The CNN architecture and corresponding hyperparameters were optimized using Bayesian optimization. Binary Cross Entropy (BCE) is the loss function commonly used for binary classification; however, because class imbalance was present in this study, the Sigmoid Focal Cross Entropy (SFCE) loss, which is suited to binary classification on imbalanced datasets [6], was also evaluated, and the performance of the two loss functions was compared. The models were trained using the area under the precision-recall curve (AUC_PR) as the monitored value. Model performance was evaluated using the AUC_PR, the area under the ROC curve (AUC_ROC), precision, recall and F1-score. For the F1-score, a classification threshold of 0.5 for the error class was used, as well as an optimized threshold based on equal precision and recall in the validation set. For the multilabel classification, the weighted F1-score with a 0.25 threshold was used as the monitored value.
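The training setup described above could look roughly as follows. This is a minimal sketch only, assuming a Keras/TensorFlow-Addons implementation; the CNN layers, input size, and the SFCE parameters (alpha, gamma) shown here are illustrative placeholders, not the Bayesian-optimized architecture and hyperparameters used in the study.

```python
# Minimal sketch (assumed Keras/TensorFlow-Addons setup, not the authors' exact code):
# a small CNN on gamma-maps, trained with Sigmoid Focal Cross Entropy and
# monitored on the validation precision-recall AUC (AUC_PR).
import tensorflow as tf
import tensorflow_addons as tfa

def build_cnn(input_shape=(128, 128, 1)):  # input size is an assumption
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # binary: error / no error
    ])

model = build_cnn()
model.compile(
    optimizer="adam",
    # SFCE down-weights easy majority-class examples; alpha/gamma are illustrative
    loss=tfa.losses.SigmoidFocalCrossEntropy(alpha=0.25, gamma=2.0),
    metrics=[tf.keras.metrics.AUC(curve="PR", name="auc_pr"),
             tf.keras.metrics.AUC(curve="ROC", name="auc_roc")],
)

# Keep the weights with the best validation AUC_PR (the monitored value above)
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.keras", monitor="val_auc_pr", mode="max", save_best_only=True)

# Training call, assuming tf.data datasets of (gamma_map, label) pairs:
# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[checkpoint])
```

Swapping the loss for `tf.keras.losses.BinaryCrossentropy()` in the same setup gives the BCE baseline against which SFCE was compared.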
Results:
Preliminary results show that training and validation performance of the CNN was good for the ground-truth classification based on clinical constraints, but also indicated overfitting (Table 1). The Sigmoid Focal Cross Entropy loss showed better performance on the validation set in terms of AUC_PR and AUC_ROC. Confusion matrices comparing the ground-truth DVH classification with the CNN classification trained with Sigmoid Focal Cross Entropy are shown in Figure 1 for the training and validation sets. The 5-fold cross-validation results showed considerable overfitting on the training set (Table 1).
For the ground-truth classification based on a 2% relative dose difference in the CTV D95%, results showed moderate performance (Table 1).
The multilabel classification model using SFCE showed poorer performance in terms of the F1-score (threshold 0.25), calculated per instance and then averaged (training set F1 = 0.349, validation set F1 = 0.408).
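For reference, the per-instance F1-score at a 0.25 threshold reported above corresponds to a sample-averaged F1. The sketch below shows one way to compute it; it is an assumed illustration using scikit-learn and synthetic labels, not the authors' evaluation code.

```python
# Minimal sketch (assumed): instance-averaged F1 at a 0.25 threshold for the
# multilabel model, one label per OAR constraint. Data here are synthetic.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(8, 4))   # ground-truth DVH-based labels (illustrative)
y_prob = rng.random(size=(8, 4))           # predicted probabilities per OAR constraint

y_pred = (y_prob >= 0.25).astype(int)      # classification threshold of 0.25

# F1 computed per instance (sample) and then averaged over instances
f1_instance_avg = f1_score(y_true, y_pred, average="samples", zero_division=0)
print(f"instance-averaged F1: {f1_instance_avg:.3f}")
```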