
API Reference

Auto-generated reference for the vehicle_keypoints package via mkdocstrings. Each module below lists its public classes and functions with their docstrings and type signatures.

Package root

vehicle_keypoints

Production-grade vehicle keypoint detection (14 anatomical car keypoints, CarFusion).

Data

coco_dataset

Top-down COCO-keypoints dataset for ViTPose training.

Yields (crop_tensor, target_heatmap, visibility) for each GT car instance. Crop is extracted from bbox (with margin) and resized to a fixed (H, W). Heatmap targets are Gaussian blobs at GT keypoint locations in the crop.
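The target construction described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names, the `sigma` value, and the `(64, 48)` heatmap size are hypothetical (the actual `DEFAULT_HEATMAP` is not shown on this page), and a real implementation would likely vectorise this with NumPy or PyTorch.

```python
import math


def expand_bbox(bbox_xywh, margin):
    """Grow a COCO [x, y, w, h] box by `margin` (a fraction of w and h) on every side."""
    x, y, w, h = bbox_xywh
    return (x - margin * w, y - margin * h, w * (1 + 2 * margin), h * (1 + 2 * margin))


def to_heatmap_coords(kpt_xy, crop_xywh, heatmap_hw):
    """Map an image-space keypoint into heatmap pixel coordinates of the resized crop."""
    cx, cy, cw, ch = crop_xywh
    hm_h, hm_w = heatmap_hw
    return ((kpt_xy[0] - cx) / cw * hm_w, (kpt_xy[1] - cy) / ch * hm_h)


def gaussian_heatmap(center_xy, heatmap_hw, sigma=2.0):
    """Render one unnormalised Gaussian blob (peak value 1.0) as a list of rows."""
    hm_h, hm_w = heatmap_hw
    mu_x, mu_y = center_xy
    return [
        [math.exp(-((x - mu_x) ** 2 + (y - mu_y) ** 2) / (2 * sigma**2)) for x in range(hm_w)]
        for y in range(hm_h)
    ]


# Hypothetical instance: a 200x100 box with one keypoint at its centre.
crop = expand_bbox([100.0, 50.0, 200.0, 100.0], margin=0.1)
center = to_heatmap_coords((200.0, 100.0), crop, heatmap_hw=(64, 48))
heatmap = gaussian_heatmap(center, heatmap_hw=(64, 48))
```

Because the margin is applied symmetrically, a keypoint at the box centre stays at the centre of the heatmap, which makes the sketch easy to sanity-check by hand.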

Classes

CocoKeypointsDataset
CocoKeypointsDataset(
    images_root: Path | str,
    annotations_json: Path | str,
    crop_hw: tuple[int, int] = DEFAULT_CROP,
    heatmap_hw: tuple[int, int] = DEFAULT_HEATMAP,
    margin: float = 0.1,
)

Bases: Dataset

Iterate per-instance (one crop per annotation).

Source code in src/vehicle_keypoints/data/coco_dataset.py
def __init__(
    self,
    images_root: Path | str,
    annotations_json: Path | str,
    crop_hw: tuple[int, int] = DEFAULT_CROP,
    heatmap_hw: tuple[int, int] = DEFAULT_HEATMAP,
    margin: float = 0.1,
) -> None:
    self.images_root = Path(images_root)
    self.crop_hw = crop_hw
    self.heatmap_hw = heatmap_hw
    self.margin = margin
    raw: dict[str, Any] = json.loads(Path(annotations_json).read_text(encoding="utf-8"))
    self.images_by_id = {img["id"]: img for img in raw["images"]}
    self.annotations = [a for a in raw["annotations"] if a["num_keypoints"] > 0]

datamodule

Lightning DataModule for ViTPose baseline (COCO-style top-down pose).

Classes

prepare

Prepare COCO -> Ultralytics YOLO-format dataset layout.

Models

factory

Model factory -- main YOLO26-pose + baseline ViTPose-S.

Functions

build_model
build_model(name: str, num_keypoints: int, pretrained: bool = True) -> Any

Return either an ultralytics.YOLO or a torch.nn.Module.

YOLO path (name startswith "yolo"):
  - loads an Ultralytics .pt (pretrained on COCO human pose; we fine-tune the head on CarFusion's 14-kpt layout at train time).
ViTPose path (name startswith "vitpose"):
  - returns a ViTPoseSmall nn.Module emitting heatmaps of shape (B, num_keypoints, H', W').

Source code in src/vehicle_keypoints/models/factory.py
def build_model(name: str, num_keypoints: int, pretrained: bool = True) -> Any:
    """Return either an `ultralytics.YOLO` or a `torch.nn.Module`.

    YOLO path (name startswith "yolo"):
      - loads an Ultralytics `.pt` (pretrained on COCO human pose; we fine-tune the head
        on CarFusion's 14-kpt layout at train time).
    ViTPose path (name startswith "vitpose"):
      - returns a `ViTPoseSmall` nn.Module emitting heatmaps of shape (B, num_keypoints, H', W').
    """
    if name.startswith("yolo"):
        from ultralytics import YOLO

        if pretrained:
            for candidate in (f"{name}-pose.pt", *YOLO_FALLBACKS):
                try:
                    return YOLO(candidate)
                except Exception:  # any load failure: fall through to the next candidate
                    continue
            raise ValueError(f"No pretrained pose checkpoint found for {name} (tried fallbacks)")
        return YOLO(f"{name}-pose.yaml")

    if name.startswith("vitpose"):
        from .vitpose import ViTPoseSmall

        return ViTPoseSmall(num_keypoints=num_keypoints, pretrained=pretrained)

    raise ValueError(f"Unknown model: {name}")
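The "try each candidate checkpoint in order" pattern used in the YOLO branch can be isolated as a small helper. The sketch below is not the package's code: `load_first_available`, `fake_loader`, and the `recoverable` exception tuple are hypothetical, and which exceptions a real loader raises for a missing checkpoint is an assumption here.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def load_first_available(
    candidates: Iterable[str],
    loader: Callable[[str], T],
    recoverable: tuple[type[BaseException], ...] = (FileNotFoundError, OSError),
) -> T:
    """Return loader(candidate) for the first candidate that loads."""
    tried = []
    for candidate in candidates:
        try:
            return loader(candidate)
        except recoverable:
            # Remember what failed so the final error message is actionable.
            tried.append(candidate)
    raise ValueError(f"No checkpoint could be loaded (tried: {', '.join(tried)})")


def fake_loader(path: str) -> str:
    """Hypothetical stand-in for a real loader such as `ultralytics.YOLO`."""
    if path != "yolo11n-pose.pt":
        raise FileNotFoundError(path)
    return f"model<{path}>"


model = load_first_available(["yolo26n-pose.pt", "yolo11n-pose.pt"], fake_loader)
```

Listing the recoverable exception types explicitly lets unexpected errors (for example a corrupted checkpoint) surface instead of being silently skipped; whether that trade-off suits this package is a judgment call.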

lightning_module

Lightning wrapper for ViTPose (heatmap-regression head).

Classes

KeypointsModule
KeypointsModule(
    model: Module,
    num_keypoints: int,
    lr: float = 0.0005,
    model_name: str | None = None,
)

Bases: LightningModule

Heatmap-regression Lightning module for top-down pose estimation.

Forward expects a (B, 3, H, W) crop. Output is (B, K, H', W') heatmaps. Training target is a set of Gaussian-blobbed heatmaps centered on GT kpt locations. Loss is MSE, masked by keypoint visibility.
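A visibility-masked MSE can be sketched as follows for a single sample. Plain nested lists are used so the example runs without dependencies; the function name is hypothetical, and averaging over the pixels of visible keypoints only is an assumption, since the exact normalisation used by the module is not shown here.

```python
def masked_mse(pred, target, visibility):
    """Mean squared error over heatmap pixels of visible keypoints only.

    pred, target: nested lists shaped [K][H][W]; visibility: K flags (0 = ignore).
    """
    total, count = 0.0, 0
    for pred_k, target_k, vis in zip(pred, target, visibility):
        if not vis:
            continue  # masked keypoints contribute neither error nor pixel count
        for pred_row, target_row in zip(pred_k, target_k):
            for p, t in zip(pred_row, target_row):
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


# Two keypoints with 1x2 heatmaps; the second keypoint is masked out.
pred = [[[0.0, 1.0]], [[5.0, 5.0]]]
target = [[[0.0, 0.0]], [[0.0, 0.0]]]
loss = masked_mse(pred, target, visibility=[1, 0])
```

In a tensor framework the same idea is usually expressed by multiplying the squared error by a broadcast visibility mask before reducing.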

Source code in src/vehicle_keypoints/models/lightning_module.py
def __init__(
    self,
    model: nn.Module,
    num_keypoints: int,
    lr: float = 5e-4,
    model_name: str | None = None,
) -> None:
    super().__init__()
    self.model = model
    self.num_keypoints = num_keypoints
    self.save_hyperparameters(ignore=["model"])

vitpose

ViTPose-Small wrapper returning heatmap predictions for N keypoints.

Classes

ViTPoseSmall
ViTPoseSmall(num_keypoints: int = 14, pretrained: bool = True)

Bases: Module

Thin wrapper around HF VitPose model, re-headed to N keypoints.

We use usyd-community/vitpose-small-simple as the pretrained backbone (ImageNet + MS-COCO human pose). The head is replaced to emit num_keypoints heatmaps.

Source code in src/vehicle_keypoints/models/vitpose.py
def __init__(self, num_keypoints: int = 14, pretrained: bool = True) -> None:
    super().__init__()
    from transformers import VitPoseConfig, VitPoseForPoseEstimation

    model_id = "usyd-community/vitpose-small-simple"
    if pretrained:
        try:
            self.backbone = VitPoseForPoseEstimation.from_pretrained(
                model_id, num_labels=num_keypoints, ignore_mismatched_sizes=True
            )
        except Exception:
            cfg = VitPoseConfig(num_labels=num_keypoints)  # type: ignore[call-arg]
            self.backbone = VitPoseForPoseEstimation(cfg)
    else:
        cfg = VitPoseConfig(num_labels=num_keypoints)  # type: ignore[call-arg]
        self.backbone = VitPoseForPoseEstimation(cfg)
    self.num_keypoints = num_keypoints

Training

train

Training entrypoint (Hydra-powered).

train_vitpose

ViTPose baseline training entrypoint (Lightning + Hydra).

Separate from train.py - the main YOLO path bypasses Lightning entirely and is handled by Ultralytics' own training loop in train.py.

Evaluation

evaluate

Evaluate pose predictions against COCO-keypoints GT.

Supports two inputs:
  1. YOLO checkpoint (.pt) - auto-predicts over data/processed/images/test/
  2. Precomputed predictions JSON (COCO result format)
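For the second input, a predictions file in COCO keypoint result format is a JSON list with one entry per detection. The snippet below builds a minimal example; the ids, coordinates, and score are made-up placeholder values, and the category id used for cars in this project's annotations is an assumption.

```python
import json

NUM_KEYPOINTS = 14  # CarFusion layout described on this page

# One detection per entry; keypoints are flattened as [x1, y1, v1, x2, y2, v2, ...].
predictions = [
    {
        "image_id": 1,  # placeholder; must match an image id in the GT file
        "category_id": 1,  # placeholder; assumed id of the car category
        "keypoints": [120.0, 80.0, 2.0] * NUM_KEYPOINTS,
        "score": 0.9,  # detection confidence
    }
]
payload = json.dumps(predictions)
```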

Metrics: OKS-mAP via pycocotools, plus PCK@0.05 (a keypoint counts as correct when its prediction lies within 5% of the bbox diagonal of the GT location).
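The PCK@0.05 definition can be made concrete with a short sketch for one instance. The function name and signature are hypothetical, and skipping unlabeled keypoints (visibility 0) is an assumption about how the evaluator treats them.

```python
import math


def pck(pred_kpts, gt_kpts, visibility, bbox_wh, alpha=0.05):
    """Fraction of labeled keypoints predicted within alpha * bbox diagonal of GT."""
    threshold = alpha * math.hypot(*bbox_wh)
    correct, labeled = 0, 0
    for (px, py), (gx, gy), vis in zip(pred_kpts, gt_kpts, visibility):
        if vis == 0:  # unlabeled keypoints are skipped (assumption)
            continue
        labeled += 1
        if math.hypot(px - gx, py - gy) <= threshold:
            correct += 1
    return correct / labeled if labeled else float("nan")


# A 300x400 box has a 500 px diagonal, so the threshold is 25 px.
gt = [(10.0, 10.0), (50.0, 50.0), (90.0, 90.0)]
pred = [(10.0, 20.0), (50.0, 90.0), (0.0, 0.0)]
score = pck(pred, gt, visibility=[2, 2, 0], bbox_wh=(300.0, 400.0))
```

Here the first keypoint is 10 px off (correct), the second is 40 px off (incorrect), and the third is unlabeled, so the sketch returns 0.5.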

Classes

Inference

overlay

Render keypoints + skeleton overlays on input images (CPU, OpenCV).

predict

Inference helper: load YOLO pose checkpoint and run detection.

Classes

Detector dataclass
Detector(model: Any)

Thin wrapper around an ultralytics.YOLO predictor.

Functions
from_pretrained_or_random classmethod
from_pretrained_or_random(base_name: str = 'yolo26n') -> Detector

Factory used in tests - loads pretrained pose .pt if available, else YAML.

Source code in src/vehicle_keypoints/inference/predict.py
@classmethod
def from_pretrained_or_random(cls, base_name: str = "yolo26n") -> Detector:
    """Factory used in tests - loads pretrained pose `.pt` if available, else YAML."""
    from ultralytics import YOLO

    for candidate in (f"{base_name}-pose.pt", "yolo11n-pose.pt", f"{base_name}-pose.yaml"):
        try:
            return cls(model=YOLO(candidate))
        except Exception:  # nosec B112 - intentional fallback over candidate list
            continue
    raise RuntimeError(f"Could not instantiate YOLO for {base_name}")

Serving

dependencies

FastAPI DI - lazy-loaded Detector singleton.
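One common way to get a lazy-loaded singleton provider is to cache a zero-argument factory. The sketch below shows only that idea with the standard library; the `Detector` class here is a hypothetical stand-in, and the package's actual dependency code is not shown on this page.

```python
from functools import lru_cache


class Detector:
    """Hypothetical stand-in for the package's Detector; counts constructions."""

    instances = 0

    def __init__(self) -> None:
        Detector.instances += 1


@lru_cache(maxsize=1)
def get_detector() -> Detector:
    """Build the detector on first call and reuse it afterwards."""
    return Detector()


first = get_detector()
second = get_detector()
```

In a FastAPI app, a route could then declare a parameter such as `detector: Detector = Depends(get_detector)` so the model is loaded on the first request rather than at import time.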

Classes

errors

Exception types and handlers.

main

FastAPI application.

routes

FastAPI routes for the pose detection service.

Classes

schemas

Pydantic schemas for the /detect endpoint.

Script helpers

convert_carfusion

Convert raw CarFusion (CMU) dumps to COCO keypoints JSON.

Reusable functions (import-friendly for tests); a thin CLI wrapper lives at scripts/convert_carfusion_to_coco.py.

Input layout (raw CarFusion):
  raw_dir//gt/.txt - per-frame keypoint rows
  raw_dir///.jpg

Each .txt row has 5 comma-separated fields: x, y, keypoint_id(1..14), instance_id, visibility(1|2|3)

CarFusion visibility convention -> COCO visibility:
  1 (visible) -> 2 (labeled + visible)
  2 (labeled occluded) -> 1 (labeled but not visible)
  3 (occluded) -> 2 (labeled + visible)  # legacy script treated 3 as 1
  other -> 0 (not labeled)
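The row format and the visibility mapping above translate into a few lines of Python. This is a sketch, not the converter's code: `parse_row` and the output field names are hypothetical, and treating `instance_id` and `visibility` as integers is an assumption about the raw files.

```python
# CarFusion visibility -> COCO visibility, following the mapping above.
CARFUSION_TO_COCO_VIS = {1: 2, 2: 1, 3: 2}


def parse_row(row: str) -> dict:
    """Parse one `x, y, keypoint_id, instance_id, visibility` row."""
    x, y, kpt_id, instance_id, vis = (field.strip() for field in row.split(","))
    return {
        "x": float(x),
        "y": float(y),
        "keypoint_index": int(kpt_id) - 1,  # 1..14 in the file -> 0..13
        "instance_id": int(instance_id),
        "visibility": CARFUSION_TO_COCO_VIS.get(int(vis), 0),  # other -> 0
    }


parsed = parse_row("512.0, 300.5, 3, 7, 1")
```

Grouping the parsed rows by `instance_id` would then give the per-car keypoint lists that a COCO annotation needs.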

Utilities

hf_hub

HuggingFace Hub helpers.

logging

Structured logging configuration.

seed

Deterministic seeding across libraries.
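A seeding helper of this kind typically looks something like the sketch below. The function name `seed_all` is hypothetical, and which libraries the package actually seeds is an assumption; NumPy and PyTorch are seeded only if they are installed.

```python
import os
import random


def seed_all(seed: int) -> None:
    """Seed the RNGs that are installed; missing libraries are skipped."""
    # PYTHONHASHSEED only takes effect for interpreters started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)
    except ImportError:
        pass


seed_all(0)
first_draw = random.random()
seed_all(0)
second_draw = random.random()
```

Seeding makes the random streams repeatable, but full run-to-run determinism on GPU may also require framework-specific settings that are outside this sketch.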