
ue5-vehicle-synth

Synthetic vehicle-keypoint datasets, rendered inside Epic's City Sample with Unreal Engine 5.

ue5-vehicle-synth demo: a car drives, gains its 24-point keypoint skeleton, then drives on as a wireframe ghost

Full demo: videos/demo.mp4 - three scenes, each a single vehicle driving normally, then with its 24-point keypoint skeleton, then as a wireframe ghost.

Real vehicle-keypoint data is scarce and expensive to label. A game engine renders the same scene with perfect, free, pixel-exact annotations and unlimited variety of viewpoint, vehicle, weather, and time of day. This project builds that generator as a reusable Unreal Engine C++ plugin and uses it to extend the vehicle-keypoints model from a 14-point to a 24-point schema with a sim-to-real recipe.

  • Rendered in the Matrix city


    Frames come from Epic's City Sample, so the dataset carries a recognizable, high-fidelity look while staying fully redistributable.

    How it works

  • 24-point schema


    The first 14 points are the CarFusion canonical order; ten new points (mirrors, bumper and window corners) extend it.

    The schema

  • Honest evaluation


    A kill switch gates the work: synthetic pre-training must measurably beat the real baseline, or the result is reported as-is.

    Status

  • Run it yourself


    Step-by-step setup, capture, configuration, and troubleshooting guides.

    Project Wiki

How it works

The pipeline decouples annotation (fast, in-editor) from rendering (gold-path, offline). A keyframed Level Sequence keeps the two in lockstep, so every rendered frame has a matching label.

A C++ wrapper reads City Sample's baked ZoneGraph lane network directly from the engine and returns lane positions plus travel directions. The vehicle is placed on a real lane, aligned to traffic.
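The placement step can be illustrated with plain geometry. The sketch below is not the plugin's C++ code and does not call any ZoneGraph API; it assumes a lane has already been exported as an ordered 2D polyline, and the name `place_on_lane` is hypothetical. It walks a given distance along the lane and returns a position plus a yaw that follows the direction of travel.

```python
import math


def place_on_lane(lane_points, distance):
    """Return (x, y, yaw_degrees) at `distance` along a 2D lane polyline.

    lane_points: list of (x, y) tuples ordered in the direction of travel.
    Distances beyond the lane end are clamped to the last point.
    """
    if len(lane_points) < 2:
        raise ValueError("a lane needs at least two points")
    remaining = max(float(distance), 0.0)
    last = None
    for (x0, y0), (x1, y1) in zip(lane_points, lane_points[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        if seg_len == 0.0:
            continue  # skip duplicated points
        # Heading of this segment, measured from the +X axis.
        yaw = math.degrees(math.atan2(y1 - y0, x1 - x0))
        if remaining <= seg_len:
            t = remaining / seg_len
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0), yaw)
        remaining -= seg_len
        last = (x1, y1, yaw)
    if last is None:
        raise ValueError("lane has no non-degenerate segment")
    return last
```

For example, `place_on_lane([(0, 0), (10, 0), (10, 10)], 15)` lands halfway up the second segment, facing along it.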

Road network detected from ZoneGraph

For each pose, 24 anatomical keypoints are projected to pixels, with a bidirectional occlusion trace providing CarFusion-style visibility. A bounding box is derived from the mesh bounds so that it reaches the tire bottoms. Every visible city vehicle is labelled, not just the rig.
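As a rough illustration of the projection step, here is a minimal pinhole-camera sketch. It is an assumption-laden stand-in, not the plugin's implementation: points are taken to be already in camera space (x right, y down, z forward), the principal point is assumed to sit at the image center, and `focal_px`, `project_point`, and `bbox_from_corners` are hypothetical names. The occlusion trace itself needs the engine's scene and is not reproduced here.

```python
def project_point(point_cam, focal_px, width, height):
    """Pinhole projection of a camera-space point to pixel coordinates.

    Returns None for points at or behind the camera plane.
    """
    x, y, z = point_cam
    if z <= 0.0:
        return None
    u = focal_px * x / z + width / 2.0
    v = focal_px * y / z + height / 2.0
    return (u, v)


def bbox_from_corners(corners_cam, focal_px, width, height):
    """Pixel box [x, y, w, h] around projected bounds corners, clipped to the image."""
    pts = [project_point(c, focal_px, width, height) for c in corners_cam]
    pts = [p for p in pts if p is not None]
    if not pts:
        return None
    xs = [min(max(p[0], 0.0), float(width)) for p in pts]
    ys = [min(max(p[1], 0.0), float(height)) for p in pts]
    x0, y0 = min(xs), min(ys)
    return [x0, y0, max(xs) - x0, max(ys) - y0]
```

Projecting the corners of the mesh bounds, rather than only the keypoints, is what lets the box extend to parts of the vehicle that carry no keypoint.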

24 keypoints and bounding box on a City Sample vehicle

A single camera is keyframed through every pose; Movie Render Queue renders each as a sharp 1280x720 frame with full Lumen global illumination.

Three poses from one keyframed Movie Render Queue sequence

Per-frame records become a validated multi-instance COCO keypoint dataset, consumed directly by the vehicle-keypoints training pipeline.
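To make the target format concrete, the sketch below builds and checks one COCO-style keypoint annotation. The field names follow the public COCO keypoint format (flattened `[x, y, v]` triples, with `v` = 0 not labelled, 1 labelled but not visible, 2 visible). The function name `make_annotation`, the use of box area for `area`, and the way this project maps its visibility test onto those flags are assumptions for illustration.

```python
NUM_KEYPOINTS = 24


def make_annotation(ann_id, image_id, category_id, bbox, keypoints):
    """Build one COCO keypoint annotation from 24 (x, y, v) triples.

    bbox is [x, y, width, height] in pixels.
    """
    if len(keypoints) != NUM_KEYPOINTS:
        raise ValueError(f"expected {NUM_KEYPOINTS} keypoints, got {len(keypoints)}")
    flat = []
    for x, y, v in keypoints:
        if v not in (0, 1, 2):
            raise ValueError(f"invalid visibility flag: {v}")
        if v == 0:
            x, y = 0, 0  # COCO convention: unlabelled points are stored as zeros
        flat.extend([x, y, v])
    x, y, w, h = bbox
    if w <= 0 or h <= 0:
        raise ValueError("bbox must have positive width and height")
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, w, h],
        "area": w * h,  # assumption: box area stands in for mask area
        "iscrowd": 0,
        "keypoints": flat,
        "num_keypoints": sum(1 for _, _, v in keypoints if v > 0),
    }
```

A multi-instance frame is then simply several such annotations sharing one `image_id`.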

Read the full engineering walkthrough

The 24-point schema

| Range | Points | Group |
|-------|--------|-------|
| 0-3 | 4 wheels | CarFusion canonical |
| 4-7 | head / tail lights | CarFusion canonical |
| 8 | exhaust | CarFusion canonical |
| 9-12 | 4 roof corners | CarFusion canonical |
| 13 | body center | CarFusion canonical |
| 14-15 | 2 side mirrors | extension |
| 16-19 | 4 bumper corners | extension |
| 20-23 | 4 window-base corners | extension |

Keeping points 0-13 in the exact CarFusion order means a 24-point model stays backward-compatible with the v1 14-point evaluation. Anchors are defined per vehicle type in configs/vehicles/.
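The backward-compatibility argument can be shown in a few lines. The sketch below encodes the index ranges from the table and reduces a 24-point prediction to the 14-point layout by slicing. The names `SCHEMA_GROUPS`, `group_of`, and `to_v1` are hypothetical and are not taken from the repository.

```python
# (label, first index, last index, group), copied from the schema table.
SCHEMA_GROUPS = [
    ("wheels", 0, 3, "canonical"),
    ("head / tail lights", 4, 7, "canonical"),
    ("exhaust", 8, 8, "canonical"),
    ("roof corners", 9, 12, "canonical"),
    ("body center", 13, 13, "canonical"),
    ("side mirrors", 14, 15, "extension"),
    ("bumper corners", 16, 19, "extension"),
    ("window-base corners", 20, 23, "extension"),
]

V1_KEYPOINTS = 14


def group_of(index):
    """Return the schema label for a keypoint index in 0..23."""
    for label, first, last, _ in SCHEMA_GROUPS:
        if first <= index <= last:
            return label
    raise IndexError(f"keypoint index out of range: {index}")


def to_v1(flat_keypoints):
    """Keep the first 14 points of a flattened [x, y, v, ...] keypoint list."""
    return flat_keypoints[: V1_KEYPOINTS * 3]
```

Because the canonical points come first and keep their order, no index remapping is needed; a reordered schema would require a lookup table here instead of a slice.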

Status

Phase 0 vertical slice

The capture pipeline, the C++ plugin, ZoneGraph road placement, the gold-path render, and the sim-to-real training driver are built and validated end to end. The current focus is the Phase 0 kill switch: synthetic pre-training must lift the v1 model's OKS-mAP by at least 2 percentage points on the CarFusion test set. Results, positive or negative, are documented openly in the repo. A narrow single-city slice is a hard case for sim-to-real transfer; the pipeline and an honest write-up are the deliverable either way.
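For readers unfamiliar with the metric, the sketch below shows the per-instance object keypoint similarity (OKS) as defined for COCO keypoint evaluation, followed by a trivial version of the kill-switch comparison. The per-keypoint `sigmas` for vehicles are not given in this document, so any values passed in are placeholders, and `kill_switch_passed` is a hypothetical helper, not the project's evaluation code.

```python
import math


def oks(gt, pred, area, sigmas):
    """COCO-style OKS for one instance.

    gt: list of (x, y, v); pred: list of (x, y); area: object area in px^2;
    sigmas: per-keypoint falloff constants (placeholders here).
    Only ground-truth points with v > 0 contribute.
    """
    total, count = 0.0, 0
    for (gx, gy, v), (px, py), sigma in zip(gt, pred, sigmas):
        if v <= 0:
            continue
        d2 = (gx - px) ** 2 + (gy - py) ** 2
        total += math.exp(-d2 / (2.0 * area * (2.0 * sigma) ** 2))
        count += 1
    return total / count if count else 0.0


def kill_switch_passed(baseline_map, pretrained_map, min_lift_pp=2.0):
    """Compare two mAP values given as fractions in [0, 1].

    Returns (passed, lift in percentage points).
    """
    lift_pp = round((pretrained_map - baseline_map) * 100.0, 6)
    return lift_pp >= min_lift_pp, lift_pp
```

With made-up numbers, `kill_switch_passed(0.50, 0.53)` reports a 3.0 pp lift and passes, while `kill_switch_passed(0.50, 0.51)` does not.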

Results & label quality

A model trained only on the synthetic frames reaches 0.86 box mAP@50 on held-out synthetic data, which indicates that the 24-point labels are clean and learnable. On the real CarFusion test set, synthetic pre-training did not beat the real-only baseline. A narrow single-city slice is a hard sim-to-real case, and part of the gap is a label-convention mismatch, because the kill switch grades against CarFusion's own annotations.

That matters because our labels are demonstrably cleaner. Below, our synthetic ground truth (24 points, pixel-exact by construction) versus CarFusion's real ground truth (14 points, multi-view triangulated):

Label quality: our pixel-exact synthetic 24-point labels (left) vs CarFusion's triangulated 14-point labels (right)

Ours are denser and sit precisely on the anatomy; CarFusion's are sparser and visibly scattered. A model pulled toward our convention is therefore penalised by a noisier reference, so the negative-transfer number conflates label-convention mismatch with true transfer quality. It is reported as measured rather than hidden.

Get it

Frames are non-interactive media rendered with Unreal Engine from City Sample. Under the UE EULA, such rendered images are freely distributable; only the underlying UE-Only-Content asset files are not. This project ships rendered PNG frames and JSON annotations only - never Epic asset files. Full analysis: City Sample EULA review.