
# Benchmarks

All numbers are on the held-out test split: 387 axial slices from 11 TCGA patients never seen during training. Hardware: RTX 3080 10 GB. Inference is single-slice, FP32.

## Main results

| Model | Dice | IoU | Pixel acc | Params | Inference (ms/slice, RTX 3080) |
| --- | --- | --- | --- | --- | --- |
| SegFormer-B2 (ours, main) | 65.5% | 66.2% | 99.73% | ~27 M | ~22 ms |
| U-Net (ours, baseline, 32→256 ch) | 51.9% | 57.7% | 99.66% | ~1.9 M | ~5 ms |
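The three metric columns use the standard binary-mask definitions. A minimal sketch of how they can be computed, assuming PyTorch tensors; the function name and the `eps` smoothing constant are illustrative, not this repo's exact evaluation code:

```python
import torch

def seg_metrics(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """Dice, IoU, and pixel accuracy for binary masks.

    pred, target: 0/1 or bool tensors of the same shape, e.g. (H, W).
    The eps smoothing term is an illustrative choice, not this repo's.
    """
    pred, target = pred.bool(), target.bool()
    inter = (pred & target).sum().float()
    union = (pred | target).sum().float()
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    pixel_acc = (pred == target).float().mean()
    return dice.item(), iou.item(), pixel_acc.item()
```

Pixel accuracy is dominated by background pixels on brain MRI, which is why it sits above 99% for both models even while Dice differs by ~14 pp.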

## Literature context

Reported numbers on the same LGG / TCGA kaggle_3m split vary by paper; here are representative figures:

| Model | Dice | Source |
| --- | --- | --- |
| U-Net (literature, Buda 2019) | ~82% | Original paper, larger U-Net, more epochs |
| U-Net++ | ~70% | Published replications, 2020-2022 |
| Attention U-Net | ~68% | Published replications |
| SegFormer-B2 (ours) | 65.5% | This repo, v0.1.0, early-stopped at epoch 13/99 |
| U-Net-tiny (ours, baseline) | 51.9% | This repo, 1.9 M-param reference |

Our SegFormer-B2 run was early-stopped after just 13 epochs, once validation Dice plateaued; it demonstrates the pipeline rather than a tuned SOTA result. More training epochs, data augmentation, and auxiliary deep-supervision losses would likely close the gap to Buda's original ~82%.
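For concreteness, a plateau rule of that kind looks like the sketch below. `train_one_epoch`, `evaluate`, `patience`, and `min_delta` are illustrative placeholders, not this repo's actual training loop:

```python
import torch

def fit_with_early_stopping(model, train_one_epoch, evaluate,
                            max_epochs=99, patience=5, min_delta=1e-4):
    """Train until validation Dice plateaus.

    train_one_epoch and evaluate are caller-supplied callables;
    evaluate() returns a scalar validation Dice. patience and
    min_delta are illustrative values, not this repo's settings.
    """
    best_dice, bad_epochs = 0.0, 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val_dice = evaluate()
        if val_dice > best_dice + min_delta:   # meaningful improvement
            best_dice, bad_epochs = val_dice, 0
            torch.save(model.state_dict(), "best.pt")
        else:
            bad_epochs += 1
            if bad_epochs >= patience:         # plateau: stop early
                break
    return best_dice
```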

## Trade-offs

- SegFormer vs U-Net: SegFormer wins on Dice by ~14 pp thanks to global self-attention, but at ~14x the parameters and ~4x the inference latency (a latency-measurement sketch follows this list). For edge deployment the tiny U-Net is a reasonable choice despite the weaker Dice.
- Why not bigger SegFormer variants: SegFormer-B4/B5 would likely add 3-5 pp Dice but don't fit comfortably alongside a training pipeline in 10 GB of VRAM at batch 16 × 256² input.
- Why 2D, not 3D: a 3D U-Net would exploit through-plane context and likely add 5-10 pp Dice, but it needs roughly 10x the memory and doesn't fit the "single RTX 3080" constraint of this portfolio.
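Per-slice latencies like the ~22 ms and ~5 ms above are typically measured with CUDA events after a warmup phase. A minimal sketch, assuming a GPU-resident model and a single (1, C, 256, 256) input; the `warmup` and `iters` counts are illustrative:

```python
import torch

@torch.inference_mode()
def ms_per_slice(model, x, warmup=10, iters=100):
    """Median FP32 latency in ms for one slice, via CUDA events.

    model and x must already live on the GPU; x is a single input
    batch, e.g. shape (1, 3, 256, 256). Illustrative harness only.
    """
    for _ in range(warmup):          # warm up kernels and caches
        model(x)
    times = []
    for _ in range(iters):
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        model(x)
        end.record()
        torch.cuda.synchronize()     # wait so elapsed_time is valid
        times.append(start.elapsed_time(end))   # milliseconds
    return sorted(times)[len(times) // 2]       # median
```

Reporting the median rather than the mean keeps one-off kernel-compilation or clock-ramp outliers from skewing the number.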

## Reproducing these numbers

See REPRODUCIBILITY.md for the one-command re-run. Expect roughly ±1% variation in the reported metrics, from the small test set (n = 387) and floating-point noise.
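To see how much of that spread comes from the small test set alone, you can bootstrap over the 387 per-slice scores. A sketch, assuming a per-slice Dice array is available (this repo's evaluation may aggregate differently):

```python
import numpy as np

def bootstrap_ci(per_slice_dice, n_boot=10_000, seed=0):
    """95% bootstrap confidence interval for mean Dice over test slices."""
    rng = np.random.default_rng(seed)
    n = len(per_slice_dice)
    # Resample slices with replacement and recompute the mean each time.
    means = [rng.choice(per_slice_dice, size=n, replace=True).mean()
             for _ in range(n_boot)]
    return np.percentile(means, [2.5, 97.5])
```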