ESTRO 2020 Abstract book


PO-1732 Feasibility of virtual dual-energy imaging through deep learning for markerless tumour tracking
S. Amador Sánchez 1,2, J. Dhont 1,2, J. De Mey 3, M. Malbrain 4, J. Vandemeulebroucke 1,2
1 Vrije Universiteit Brussel, Department of Electronics and Informatics, Brussels, Belgium; 2 IMEC, Leuven, Belgium; 3 UZ Brussel, Department of Radiology, Brussels, Belgium; 4 UZ Brussel, Department of Intensive Care, Brussels, Belgium

Purpose or Objective
Dual-energy imaging (DEI) has been shown to significantly increase soft-tissue visibility in X-ray images, and several groups have demonstrated its added value in radiotherapy (RT), especially with respect to markerless tumour tracking [1-2]. However, as it requires dedicated equipment and increases imaging dose, clinical implementation is limited. In deep learning-based DEI (DL-DEI), a neural network (NN) trained on a retrospective DE dataset can generate soft-tissue images from single X-ray images in real time, without the need for special equipment. As the bony anatomy in chest X-rays spans multiple spatial frequencies, multi-level resolution decomposition with level-specific bone suppression is expected to improve the DL-DEI workflow. The purpose of this study was to determine the most effective DL-DEI technique with respect to network topology and resolution decomposition technique, and to evaluate the feasibility of its use in RT.

Material and Methods
Based on a literature search, two high-performing network topologies were evaluated: a three-layer massive training artificial NN (MTANN), as proposed by [3], and a five-layer convolutional NN (CNN), as proposed by [4]. The input to each NN is a regular X-ray image and the output is the corresponding soft-tissue image, so the NN effectively performs bone suppression. Both topologies were implemented using TensorFlow, and each was equipped with a resolution decomposition scheme, as illustrated in Figure 1. Briefly, both input and output images are decomposed into a high-, medium- and low-resolution image, and a separate NN is applied to each resolution level. Three reversible decomposition techniques were evaluated: bilinear interpolation, B-splines, and wavelet-based decomposition. Reversibility is required in order to recompose the three output images into a single soft-tissue image during application. From the 35 DE image pairs, a test set of 10 was set aside. CNNs were trained on full images and required data augmentation, expanding the remaining 25 image pairs to 6425. In contrast, MTANNs were trained at the pixel level, requiring only 4 image pairs to reach convergence.

Results
Evaluated on the test set using the structural similarity index (SSIM) and the normalized root mean squared error (NRMSE) between the predicted soft-tissue images and their ground truth, the CNN topology significantly (p < 0.05) outperformed the MTANN for each decomposition technique. Based on the same metrics, there was no significant difference between decomposition techniques (see Figure 2). However, novel dedicated metrics are currently being defined, as visually, bony anatomy appeared most suppressed when wavelet-based decomposition was applied. Once trained, both networks were able to render soft-tissue images on the order of a second.

Conclusion
A limited retrospective dataset of 25 DE image pairs is sufficient to train a CNN for DL-DEI, which can be employed in real time to increase soft-tissue visibility in X-ray images acquired during RT, requiring no special equipment or increase in imaging dose.
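[Editor's note] The abstract does not give implementation details for the reversible decomposition. Below is a minimal sketch of a three-level, bilinear-interpolation-based decomposition (one of the three techniques evaluated), written against TensorFlow, which the authors report using. The `nets` dictionary of level-specific bone-suppression networks in the commented usage is hypothetical; the scheme is reversible up to floating-point precision.

```python
import tensorflow as tf

def decompose(image):
    """Split an X-ray image [H, W, 1] into a low-resolution base plus
    medium- and high-resolution detail bands (Laplacian-pyramid style),
    using bilinear down/upsampling."""
    h, w = image.shape[0], image.shape[1]
    low = tf.image.resize(image, (h // 4, w // 4), method="bilinear")
    mid = tf.image.resize(image, (h // 2, w // 2), method="bilinear")
    # Detail bands: what each finer level adds over the coarser one.
    mid_detail = mid - tf.image.resize(low, (h // 2, w // 2), method="bilinear")
    high_detail = image - tf.image.resize(mid, (h, w), method="bilinear")
    return low, mid_detail, high_detail

def recompose(low, mid_detail, high_detail):
    """Invert the decomposition: upsample each level and add the detail
    bands back, recovering a single full-resolution image."""
    h2, w2 = mid_detail.shape[0], mid_detail.shape[1]
    mid = tf.image.resize(low, (h2, w2), method="bilinear") + mid_detail
    h, w = high_detail.shape[0], high_detail.shape[1]
    return tf.image.resize(mid, (h, w), method="bilinear") + high_detail

# Hypothetical usage: 'nets' maps each level to its trained
# bone-suppression NN (MTANN or CNN), applied level by level:
#   low, mid_d, high_d = decompose(xray)
#   soft = recompose(nets["low"](low), nets["mid"](mid_d), nets["high"](high_d))
```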
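[Editor's note] The SSIM/NRMSE evaluation could be reproduced as sketched below; the use of scikit-image is an assumption, as the abstract does not state which implementation was used.

```python
from skimage.metrics import structural_similarity, normalized_root_mse

def evaluate(pred_soft, gt_soft):
    """Compare a predicted soft-tissue image against its DE ground truth.
    Both arrays are 2-D floats on the same intensity scale."""
    data_range = gt_soft.max() - gt_soft.min()
    ssim = structural_similarity(gt_soft, pred_soft, data_range=data_range)
    nrmse = normalized_root_mse(gt_soft, pred_soft)
    return ssim, nrmse
```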

coronal plane, generating 3 different sCT volumes respectively. The final sCT was obtained by computing, voxel by voxel, the median value of the 3 predictions. The method was tested using a 4-fold cross-validation approach: in turn, for each fold, 13 patients served as the training set, 2 as the validation set, and 5 were left out for testing. Conversion accuracy was quantified in terms of the Mean Absolute Error (MAE) and Mean Error (ME) between pCT and sCT.

Results
Median ± interquartile values for the MAE and ME were 40.2 ± 3.3 HU and -3.5 ± 3.2 HU, respectively; the 5th/95th percentile ranges were 33.8/52.7 HU and -14.5/9.4 HU. For a single fold, the training step took, on average, 2 days. On a GNU/Linux workstation equipped with an Nvidia P6000 graphics card, CBCT conversion took about 3.5 minutes. Figure 1 shows an exemplary case.
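[Editor's note] A minimal NumPy sketch of the voxel-wise median fusion and the MAE/ME computation described above. The optional `mask` restriction (e.g. to a body contour) is an assumption, as the abstract does not state over which voxels the errors were computed.

```python
import numpy as np

def fuse_and_score(sct_axial, sct_sagittal, sct_coronal, pct, mask=None):
    """Fuse the three per-plane sCT predictions by a voxel-wise median and
    score the result against the planning CT (all volumes in HU)."""
    sct = np.median(np.stack([sct_axial, sct_sagittal, sct_coronal]), axis=0)
    diff = sct - pct if mask is None else sct[mask] - pct[mask]
    mae = float(np.abs(diff).mean())  # Mean Absolute Error
    me = float(diff.mean())           # Mean Error (signed bias)
    return sct, mae, me
```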

Conclusion
In this work, the capability of a deep learning strategy to convert CBCT into sCT in the pelvic region was evaluated. The proposed strategy could be used to improve the current prostate radiotherapy workflow, in which a new CT scan of the patient is acquired when the anatomical changes detected on CBCT are too large. This would reduce both the ionizing dose delivered to the patient and the time required for the whole treatment. Future efforts will aim at testing the approach on a larger cohort of patients and at validating the method in terms of dosimetry and segmentation of the structures of interest.
