## Benchmarks
All numbers are computed on the held-out test split: 387 axial slices from 11 TCGA patients never seen during training. Hardware: a single RTX 3080 (10 GB). Inference is single-slice, FP32.
### Main results
| Model | Dice | IoU | Pixel acc | Params | Inference (ms/slice, RTX 3080) |
|---|---|---|---|---|---|
| SegFormer-B2 (ours, main) | 65.5% | 66.2% | 99.73% | ~27 M | ~22 ms |
| U-Net (ours, baseline, 32→256 ch) | 51.9% | 57.7% | 99.66% | ~1.9 M | ~5 ms |
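For context, the evaluation loop behind these metrics is essentially a per-slice accumulation of Dice, IoU, and pixel accuracy under the single-slice FP32 protocol described above. The sketch below is illustrative only; the function and loader names are assumptions, not the repo's actual API.

```python
import torch

@torch.no_grad()
def evaluate(model, test_loader, device="cuda", threshold=0.5, eps=1e-7):
    """Accumulate Dice, IoU and pixel accuracy over single-slice FP32 inference.
    Illustrative sketch: assumes a binary tumour-vs-background head and a
    DataLoader yielding (image, mask) pairs with batch_size=1."""
    model.eval().to(device)
    dice_sum = iou_sum = acc_sum = n = 0
    for image, mask in test_loader:                     # one axial slice at a time
        image, mask = image.to(device).float(), mask.to(device).float()
        prob = torch.sigmoid(model(image))              # per-pixel foreground probability
        pred = (prob > threshold).float()
        inter = (pred * mask).sum()
        dice_sum += (2 * inter + eps) / (pred.sum() + mask.sum() + eps)
        iou_sum  += (inter + eps) / (pred.sum() + mask.sum() - inter + eps)
        acc_sum  += (pred == mask).float().mean()
        n += 1
    return {k: (v / n).item() for k, v in
            {"dice": dice_sum, "iou": iou_sum, "pixel_acc": acc_sum}.items()}
```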
### Literature context
Reported numbers on the same LGG / TCGA kaggle_3m split vary from paper to paper; the figures below are representative:
| Model | Dice | Source |
|---|---|---|
| U-Net (literature, Buda 2019) | ~82% | Original paper, larger U-Net, more epochs |
| U-Net++ | ~70% | Published replications 2020-2022 |
| Attention U-Net | ~68% | Published replications |
| SegFormer-B2 (ours) | 65.5% | This repo, v0.1.0, early-stopped at epoch 13/99 |
| U-Net-tiny (ours, baseline) | 51.9% | This repo, 1.9M-param reference |
Our SegFormer-B2 run was early-stopped after just 13 epochs once validation Dice plateaued; it is a demonstration of the pipeline, not a tuned state-of-the-art result. More training epochs, data augmentation, and auxiliary deep-supervision losses would likely narrow the gap to Buda's original ~82%.
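The plateau criterion is nothing exotic. A minimal sketch of the idea follows; the patience and min-delta values are assumptions for illustration, not the repo's actual config.

```python
class EarlyStopping:
    """Stop training when validation Dice stops improving.
    Illustrative sketch; hyperparameters are assumed, not taken from the repo."""
    def __init__(self, patience=5, min_delta=1e-4):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = float("-inf"), 0

    def step(self, val_dice):
        if val_dice > self.best + self.min_delta:
            self.best, self.bad_epochs = val_dice, 0    # new best: reset counter
        else:
            self.bad_epochs += 1                        # plateau epoch: count it
        return self.bad_epochs >= self.patience         # True -> stop training

# Usage sketch: break out of the epoch loop once validation Dice plateaus.
# stopper = EarlyStopping(patience=5)
# for epoch in range(99):
#     ...train one epoch...
#     if stopper.step(compute_val_dice(model)):
#         break
```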
### Trade-offs
- SegFormer vs U-Net: SegFormer wins on Dice by ~14 pp thanks to global self-attention, but costs ~14x more parameters and ~4x more inference latency (see the timing sketch after this list). For edge deployment the tiny U-Net is a reasonable choice despite the weaker Dice.
- Why not bigger SegFormer variants: SegFormer-B4/B5 would likely add 3-5pp Dice but don't fit comfortably alongside a training pipeline on 10 GB VRAM at batch 16 × 256² input.
- Why 2D, not 3D: a 3D U-Net would exploit through-plane context and likely add 5-10pp Dice, but requires 10x the memory and doesn't fit the "single-RTX-3080" constraint of this portfolio.
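The ms/slice figures above come from a simple warm-up-then-average timing loop on the GPU. The sketch below shows the measurement pattern; the function name and input shape are assumptions rather than the repo's exact benchmarking script.

```python
import time
import torch

def time_single_slice(model, input_size=(1, 3, 256, 256), n_warmup=10, n_runs=100):
    """Rough ms/slice for FP32 single-slice inference on CUDA.
    Illustrative sketch; input shape assumed to match the 256x256 training resolution."""
    model.eval().cuda()
    x = torch.randn(*input_size, device="cuda")
    with torch.no_grad():
        for _ in range(n_warmup):          # warm up kernels / cuDNN autotuning
            model(x)
        torch.cuda.synchronize()           # make sure warm-up work has finished
        t0 = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        torch.cuda.synchronize()           # wait for async GPU work before reading the clock
    return (time.perf_counter() - t0) / n_runs * 1000  # ms per slice
```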
### Reproducing these numbers
See REPRODUCIBILITY.md for the one-command re-run. Expect roughly ±1% variation, driven by the small test set (n = 387) and floating-point non-determinism.