Limitations & Failure Modes

This page deliberately calls out where the published model falls short. Read it before using the weights for anything beyond demos or research.

Dataset scope

  • Small cohort: 110 patients (~3929 2D slices) from TCGA-LGG — the Buda et al. (2019) Kaggle release. Low-grade glioma only; high-grade glioma (HGG), metastases, meningiomas, and non-brain pathology are out of distribution.
  • Scanner bias: The kaggle_3m set comes from ~4 scanner models in TCGA archives. Field-strength and sequence variations across real-world scanners will shift performance.
  • 2D slice-level model: The architecture processes each axial slice independently. Volumetric context (through-plane continuity of tumor boundary) is not exploited — a 3D U-Net or volumetric transformer would likely do better.
  • Small held-out test set: 387 slices from 11 patients. The reported Dice of 65.5% carries a confidence interval of roughly ±3 percentage points.
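
With only 11 test patients, slices from the same patient are strongly correlated, so an interval on the mean Dice is better estimated by resampling patients than by resampling slices. The sketch below shows one way to do that with a percentile bootstrap. It is not necessarily how the interval quoted above was obtained. The function name `patient_bootstrap_ci`, its parameters, and the toy scores are all illustrative assumptions.

```python
import random
from statistics import mean


def patient_bootstrap_ci(dice_by_patient, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean slice-level Dice.

    dice_by_patient maps a patient ID to a list of per-slice Dice scores.
    Whole patients are resampled with replacement so that correlated
    slices from one patient always stay together.
    """
    rng = random.Random(seed)
    patients = list(dice_by_patient)
    boot_means = []
    for _ in range(n_boot):
        sample = rng.choices(patients, k=len(patients))
        scores = [s for p in sample for s in dice_by_patient[p]]
        boot_means.append(mean(scores))
    boot_means.sort()
    # Clamp indices so very small n_boot values cannot go out of range.
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, max(0, int((1 - alpha / 2) * n_boot) - 1))
    return boot_means[lo_idx], boot_means[hi_idx]


# Toy, made-up scores for three hypothetical patients (not real results).
toy_scores = {
    "patient_a": [0.71, 0.64, 0.80],
    "patient_b": [0.42, 0.55],
    "patient_c": [0.90, 0.88, 0.79, 0.83],
}
ci_low, ci_high = patient_bootstrap_ci(toy_scores)
print(f"95% bootstrap interval: [{ci_low:.3f}, {ci_high:.3f}]")
```

With so few patients the interval is itself unstable, so treat it as a rough indication rather than a precise bound.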

Segmentation-specific failure modes

  • Thin tumor edges get under-segmented: Dice of ~65% reflects edge-voxel misses. The model is confident in the tumor core but less so at the periphery.
  • Hyperintense non-tumor regions get over-segmented: FLAIR-hyperintense tissue that is not tumor (edema, gliosis) is sometimes predicted as tumor.
  • Cross-midline tumors underperform: when a tumor extends across the brain midline, the far-side extension is often missed.
  • Empty-mask slices: ~40% of slices in the dataset are tumor-free. The model handles these correctly most of the time but occasionally predicts tiny spurious blobs.
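
A common mitigation for tiny spurious blobs is to drop connected components below a minimum area after thresholding. The following is a minimal, dependency-free sketch of that idea, not part of the published pipeline. The function name `remove_small_blobs` and the `min_area` threshold are assumptions. In practice a library routine such as `scipy.ndimage.label` would usually replace the hand-written search.

```python
from collections import deque


def remove_small_blobs(mask, min_area=10):
    """Return a copy of a binary 2D mask (list of lists of 0/1) in which
    4-connected components smaller than min_area pixels are set to 0."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component = [(y, x)]
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        component.append((ny, nx))
                        queue.append((ny, nx))
            # Erase the component if it is below the area threshold.
            if len(component) < min_area:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out


# Toy example: a 2x2 blob is kept, an isolated pixel is removed.
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
cleaned = remove_small_blobs(toy_mask, min_area=3)
```

The threshold trades false positives on empty slices against missing genuinely small tumor regions, so it would need tuning on validation data rather than being fixed in advance.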

Not a medical device

  • Not FDA-cleared, not CE-marked, not clinically validated.
  • Any clinical use requires independent validation by qualified neuroradiologists.
  • Do not use for treatment planning or patient-facing diagnostics.

Adversarial & reliability

  • No adversarial or corruption robustness testing.
  • No uncertainty estimation — single mask output, no per-voxel confidence. Monte Carlo dropout or a Bayesian head would help.
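
To illustrate what per-voxel confidence from Monte Carlo dropout could look like, the sketch below assumes that several stochastic forward passes have already been collected with dropout left active at inference, which the published model does not do. It only shows the aggregation step: the mean probability and the binary predictive entropy for each voxel. The function name `summarize_mc_samples` and the flat-list input format are illustrative assumptions.

```python
import math


def summarize_mc_samples(samples, eps=1e-7):
    """Aggregate T stochastic passes into per-voxel mean and entropy.

    samples is a list of T probability maps, each a flat list of
    per-voxel tumor probabilities from one stochastic forward pass.
    Returns (mean_prob, entropy), two lists with one value per voxel.
    Entropy is in nats and peaks at ln(2) when the mean is 0.5.
    """
    n_passes = len(samples)
    n_voxels = len(samples[0])
    mean_prob, entropy = [], []
    for i in range(n_voxels):
        p = sum(s[i] for s in samples) / n_passes
        mean_prob.append(p)
        # Clamp away from 0 and 1 to avoid log(0).
        q = min(max(p, eps), 1 - eps)
        entropy.append(-(q * math.log(q) + (1 - q) * math.log(1 - q)))
    return mean_prob, entropy


# Toy, made-up probabilities for three voxels over two passes.
toy_samples = [
    [0.9, 0.4, 0.0],
    [0.9, 0.6, 0.0],
]
toy_mean, toy_entropy = summarize_mc_samples(toy_samples)
```

Voxels with high entropy, typically near the tumor periphery, could then be flagged for review instead of being reported with the same weight as the confident core.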

What this project is good for

  • A production ML pipeline template: Lightning + Hydra + MLflow + DVC + FastAPI + Docker + HF Hub + CI/CD.
  • A reproducible baseline for brain-tumor segmentation research.
  • Comparing SegFormer vs a small U-Net on medical imaging with a modest compute budget.