ESTRO 2024 - Abstract Book


Physics - Autosegmentation


first Foundation Model (FM) for this problem. However, its application to medical images is challenging [5-7]. In this work, we evaluated different FMs with various prompting strategies to demonstrate how well they can facilitate organ delineation in medical images.

Material/Methods:

Dataset

The test dataset comprised 55 pelvis and 50 head-and-neck (HN) cases, each including a volumetric MR scan. Ground-truth segmentations were manually defined for all organs of interest in all test cases: 10 organs in the pelvis and 27 in the HN region.

Foundation model

We evaluated the original SAM ViT-B and ViT-H models, MedSAM, and the LVM-Med [8] encoder combined with SAM ViT-B components. The prompting strategies included bounding box only, points only, bounding box combined with points, and a refinement step. Prompting was automated with scripts based on the ground truth. The FMs were applied to the 2D axial slices of the 3D volumes. The Dice Similarity Coefficient (DSC) was computed in 3D with respect to the ground truth and compared to that of a supervised 3D SegResNet [9] model trained on a separate dataset.

Prompting strategies

Possible prompts for the model are a bounding box, points with a positive or negative label, and the logits of a previous inference result passed as a mask. The first prompt is always generated from the ground truth: either its centroid as a positive point or its bounding box. Subsequent prompts are always points generated from the largest error region relative to the ground truth, which is either an over-segmented area (negative point) or an under-segmented area (positive point). This leads to three evaluation modes: box, points, and box+points. For each additional point, the logits of the previous iteration are also passed to the model. After the last prompt, an additional inference, referred to as the refinement step, is performed: no new prompt is computed, and the output is refined by passing the last logits.
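The prompt geometry described above can be illustrated with a minimal NumPy sketch. It is not the authors' script: the function names, the (x, y) coordinate order, the use of 4-connectivity, and the choice of the region pixel closest to the region centroid as the correction point are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import deque

import numpy as np


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the True pixels in a 2D mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def centroid_point(mask):
    """Return the rounded (x, y) centroid of the True pixels in a 2D mask."""
    ys, xs = np.nonzero(mask)
    return int(round(float(xs.mean()))), int(round(float(ys.mean())))


def largest_component(mask):
    """Return the largest 4-connected component of a 2D boolean mask."""
    seen = np.zeros(mask.shape, dtype=bool)
    best = []
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        seen[start] = True
        queue, pixels = deque([start]), []
        while queue:  # breadth-first flood fill from the seed pixel
            y, x = queue.popleft()
            pixels.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                inside = 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                if inside and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(pixels) > len(best):
            best = pixels
    out = np.zeros(mask.shape, dtype=bool)
    for y, x in best:
        out[y, x] = True
    return out


def correction_point(pred, gt):
    """Return ((x, y), label) for the largest error region, or None if pred == gt.

    label 1 = positive point (under-segmented area),
    label 0 = negative point (over-segmented area).
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    under = largest_component(gt & ~pred)  # missed ground-truth pixels
    over = largest_component(pred & ~gt)   # predicted pixels outside ground truth
    if not under.any() and not over.any():
        return None
    region, label = (under, 1) if under.sum() >= over.sum() else (over, 0)
    ys, xs = np.nonzero(region)
    # Assumption: place the point on the region pixel closest to its centroid.
    i = int(np.argmin((ys - ys.mean()) ** 2 + (xs - xs.mean()) ** 2))
    return (int(xs[i]), int(ys[i])), label
```

In this sketch, `bounding_box` or `centroid_point` would supply the first prompt from a ground-truth slice, and `correction_point` would supply each subsequent point from the current prediction.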

Metric computation

The MR scans were processed slice by slice, and the results were stacked into a 3D volume. If the ground truth contained more than one connected component in a slice, each component was prompted individually and the results were merged. The DSC was computed on the resulting 3D masks. The experiment was applied to all organs of interest.
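The metric computation can be sketched in a few lines of NumPy. The helper names below are hypothetical. Merging by logical OR and scoring two empty masks as 1.0 are assumptions, as the abstract does not state how merging or the empty case was handled.

```python
import numpy as np


def merge_components(component_masks):
    """Union the 2D masks predicted for each prompted component of one slice."""
    return np.logical_or.reduce(component_masks)


def stack_slices(slice_masks):
    """Stack per-slice 2D masks along a new first (axial) axis into a 3D volume."""
    return np.stack(slice_masks, axis=0)


def dice_3d(pred, gt):
    """DSC = 2|P ∩ G| / (|P| + |G|), computed over all voxels of two volumes."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)
```

Computing the DSC once on the stacked volume, rather than averaging per-slice scores, keeps slices with small structures from dominating the result.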

Results:

We first checked whether multi-mask or single-mask output (SMO) provides better scores. In almost all cases SMO provided better values, especially for point prompts, where the largest gain was 40% DSC.

Region   Model   Box     Points   Box+points
HN       ViT-B   74.6%   85.6%    90.1%
