# Training

## Prerequisites
```bash
uv sync --all-groups
bash scripts/sync_data.sh /path/to/kaggle/chest_xray
uv run python -m chest_xray_classifier.data.prepare --raw data/raw --out data/processed
```
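The prepare step reorganizes the raw Kaggle dump into processed splits. As an illustration of the kind of work involved, here is a minimal stratified train/val split in plain Python — the class names, 80/20 ratio, and function name are assumptions for the sketch, not taken from the actual `prepare` module:

```python
# Illustrative sketch only: a seeded, per-class (stratified) split.
# The real prepare script's logic and ratios may differ.
import random
from collections import defaultdict

def stratified_split(items, label_fn, val_frac=0.2, seed=42):
    """Split items into (train, val), preserving per-class proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item in items:
        by_label[label_fn(item)].append(item)
    train, val = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_val = max(1, int(len(group) * val_frac))  # at least one val sample per class
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val

# Toy file list: 30 NORMAL, 60 PNEUMONIA
files = [(f"img_{i:03d}.png", "NORMAL" if i % 3 == 0 else "PNEUMONIA")
         for i in range(90)]
train, val = stratified_split(files, label_fn=lambda f: f[1])
```

Seeding the split keeps the val set stable across reruns, which matters when comparing checkpoints trained at different times.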
## Main — ConvNeXt-V2-Tiny

```bash
uv run python -m chest_xray_classifier.training.train experiment=sota
```
Expected wall time: ~90 min on an A10/A100. The best checkpoint is written to `artifacts/checkpoints/best.ckpt`.
## Baseline — DINOv2 linear probe

```bash
uv run python -m chest_xray_classifier.training.train \
  model=baseline \
  trainer.max_epochs=20 \
  trainer.output_dir=artifacts/baseline
```
## MLflow tracking

```bash
mlflow ui --backend-store-uri ./mlruns
```
Browse at http://localhost:5000. Every Hydra run maps to one MLflow run, with the fully resolved config logged as params and `train/loss`, `val/loss`, `val/acc`, and `val/f1_macro` logged as metrics.
## Hydra overrides (common)
| Override | Effect |
|---|---|
| `trainer.max_epochs=50` | Longer training |
| `trainer.accelerator=gpu` | Force GPU |
| `data.batch_size=64` | Larger batches |
| `model.lr=1e-4` | Different learning rate |
| `seed=7` | Reproducibility |
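Each override is a dotted path into the nested config plus a value. Conceptually it works like the simplified sketch below — this is an illustration only, not Hydra's actual override grammar, which also supports appends, deletions, and typed literals:

```python
# Simplified sketch of applying a "dotted.key=value" override to a
# nested config dict (illustration; Hydra's real parser is richer).
def apply_override(cfg: dict, override: str) -> dict:
    key, _, raw = override.partition("=")
    # Naive value parsing: try int, then float, else keep the string.
    try:
        value = int(raw)
    except ValueError:
        try:
            value = float(raw)
        except ValueError:
            value = raw
    node = cfg
    parts = key.split(".")
    for part in parts[:-1]:
        node = node.setdefault(part, {})  # descend, creating nodes as needed
    node[parts[-1]] = value
    return cfg

cfg = {"trainer": {"max_epochs": 10}, "model": {}}
apply_override(cfg, "trainer.max_epochs=50")
apply_override(cfg, "model.lr=1e-4")
```

The upshot: any key visible in the resolved config printed at startup can be overridden from the command line without touching the YAML files.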
Multi-run sweep example:
```bash
uv run python -m chest_xray_classifier.training.train -m \
  model.lr=1e-5,3e-5,1e-4 \
  data.batch_size=32,64
```
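With `-m` (multirun), Hydra sweeps the Cartesian product of the comma-separated value lists, so the command above launches 3 × 2 = 6 training runs:

```python
# The -m sweep expands to every combination of the swept overrides.
from itertools import product

lrs = [1e-5, 3e-5, 1e-4]
batch_sizes = [32, 64]
jobs = list(product(lrs, batch_sizes))
print(len(jobs))  # 6 (lr, batch_size) combinations
```

Keep this in mind when adding a third swept parameter: the run count multiplies quickly.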