ESTRO 2025 - Abstract Book
Physics - Radiomics, functional and biological imaging and outcome prediction
Results: The 3-fold cross-validation results are shown in Table 1. The VLM model using only the pre-treatment CT together with text/genomic features surpassed both the vision-only and the language-only models in accuracy (VLM: 0.728 vs. LLM: 0.667 vs. CT: 0.712 AUC). The VLM model that additionally included the first post-treatment CT achieved the highest AUC of 0.779, indicating the benefit of integrating multimodal and longitudinal imaging information. The ROC curves of the various models are shown in Figure 2.
Table 1. Prediction accuracy of the LLM-only model, the vision-only models, and the combinations of vision and LLM. Abbreviations: LLM (large language model), Baseline (pre-treatment CT), FU (follow-up CT).
Models                        AUC          Specificity  Sensitivity
LLM                           0.667±0.038  0.567±0.038  0.680±0.044
Vision (Baseline)             0.712±0.030  0.597±0.008  0.697±0.022
Vision (Baseline) + LLM       0.728±0.038  0.608±0.016  0.714±0.027
Vision (Baseline + FU)        0.766±0.032  0.619±0.011  0.732±0.008
Vision (Baseline + FU) + LLM  0.779±0.040  0.639±0.029  0.767±0.039
Figure 2. ROC curves of the different models.
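For illustration of the evaluation protocol only, the following is a minimal sketch, not the authors' implementation: it assumes pre-extracted CT and LLM-derived embeddings, simple late fusion by concatenation, a logistic-regression classifier, and 3-fold stratified cross-validation with scikit-learn. All feature names, dimensions, and the 0.5 operating point are hypothetical placeholders.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)

# Hypothetical pre-computed features for n patients:
#   vision_feats - embeddings from a CT encoder (baseline +/- follow-up scans)
#   text_feats   - LLM-derived embeddings of clinical text / genomic features
n = 120
vision_feats = rng.normal(size=(n, 64))
text_feats = rng.normal(size=(n, 32))
labels = rng.integers(0, 2, size=n)  # 1 = responder, 0 = non-responder

# Late fusion by concatenating the two embedding spaces (an assumption).
fused = np.concatenate([vision_feats, text_feats], axis=1)

aucs, sens, spec = [], [], []
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(fused, labels):
    clf = LogisticRegression(max_iter=1000).fit(fused[train_idx], labels[train_idx])
    prob = clf.predict_proba(fused[test_idx])[:, 1]
    aucs.append(roc_auc_score(labels[test_idx], prob))

    # Sensitivity / specificity at a fixed 0.5 threshold (another assumption).
    pred = (prob >= 0.5).astype(int)
    tp = np.sum((pred == 1) & (labels[test_idx] == 1))
    fn = np.sum((pred == 0) & (labels[test_idx] == 1))
    tn = np.sum((pred == 0) & (labels[test_idx] == 0))
    fp = np.sum((pred == 1) & (labels[test_idx] == 0))
    sens.append(tp / (tp + fn))
    spec.append(tn / (tn + fp))

print(f"AUC         {np.mean(aucs):.3f} ± {np.std(aucs):.3f}")
print(f"Sensitivity {np.mean(sens):.3f} ± {np.std(sens):.3f}")
print(f"Specificity {np.mean(spec):.3f} ± {np.std(spec):.3f}")

The per-fold held-out probabilities from the same loop could also be used to draw ROC curves analogous to Figure 2.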
Conclusion: Incorporating multimodal and longitudinal information into AI models may lead to more accurate prediction of treatment response. Independent evaluation on additional patient cohorts is necessary to establish feasibility for large-scale use.
Keywords: Immunotherapy, Response prediction