ue5-vehicle-synth

Synthetic vehicle-keypoint datasets, rendered inside Epic's City Sample with Unreal Engine 5.

ue5-vehicle-synth demo: a car drives, gains its 24-point keypoint skeleton, then drives on as a wireframe ghost

Full demo: videos/demo.mp4 - three scenes, each a single vehicle driving normally, then with its 24-point keypoint skeleton, then as a wireframe ghost.

Real vehicle-keypoint data is scarce and expensive to label. A game engine renders the same scene with perfect, free, pixel-exact annotations and unlimited variety of viewpoint, vehicle, weather, and time of day. This project builds that generator as a reusable Unreal Engine C++ plugin and uses it to extend the vehicle-keypoints model from a 14-point to a 24-point schema with a sim-to-real recipe.

Rendered in the Matrix city

Frames come from Epic's City Sample, so the dataset carries a recognizable, high-fidelity look while staying fully redistributable.

How it works
24-point schema

The first 14 points are the CarFusion canonical order; ten new points (mirrors, bumper and window corners) extend it.

The schema
Honest evaluation

A kill switch gates the work: synthetic pre-training must measurably beat the real baseline, or the result is reported as-is.

Status
Run it yourself

Step-by-step setup, capture, configuration, and troubleshooting guides.

Project Wiki

How it works

The pipeline decouples annotation (fast, in-editor) from rendering (gold-path, offline), kept in lockstep by a keyframed Level Sequence so every rendered frame has a matching label.

1. Road detection2. Keypoint projection3. Gold-path render4. Export

A C++ wrapper reads City Sample's baked ZoneGraph lane network directly from the engine and returns lane positions plus travel directions. The vehicle is placed on a real lane, aligned to traffic.

Road network detected from ZoneGraph

For each pose, 24 anatomical keypoints are projected to pixels with a bidirectional occlusion trace for CarFusion-style visibility, and a mesh-bounds bounding box that reaches the tire bottoms. Every visible city vehicle is labelled, not just the rig.

24 keypoints and bounding box on a City Sample vehicle

A single camera is keyframed through every pose; Movie Render Queue renders each as a sharp 1280x720 frame with full Lumen global illumination.

Three poses from one keyframed Movie Render Queue sequence

Per-frame records become a validated multi-instance COCO keypoint dataset, consumed directly by the vehicle-keypoints training pipeline.

Read the full engineering walkthrough

The 24-point schema

Range	Points	Group
0-3	4 wheels	CarFusion canonical
4-7	head / tail lights	CarFusion canonical
8	exhaust	CarFusion canonical
9-12	4 roof corners	CarFusion canonical
13	body center	CarFusion canonical
14-15	2 side mirrors	extension
16-19	4 bumper corners	extension
20-23	4 window-base corners	extension

Keeping points 0-13 in the exact CarFusion order means a 24-point model stays backward-compatible with the v1 14-point evaluation. Anchors are defined per vehicle type in configs/vehicles/.

Status

Phase 0 vertical slice

The capture pipeline, the C++ plugin, ZoneGraph road placement, the gold-path render, and the sim-to-real training driver are built and validated end to end. The current focus is the Phase 0 kill switch - synthetic pre-training must lift the v1 model's OKS-mAP by at least +2pp on the CarFusion test set. Results, positive or negative, are documented openly in the repo. A narrow single-city slice is a hard case for sim-to-real transfer; the pipeline and an honest write-up are the deliverable either way.

Results & label quality

A model trained only on the synthetic frames reaches 0.86 box mAP@50 on held-out synthetic data - the 24-point labels are clean and learnable. On the real CarFusion test set the synthetic pre-training did not beat the real-only baseline (a narrow single-city slice is a hard sim-to-real case), and that gap is partly a label-convention mismatch: the kill switch grades against CarFusion's own annotations.

That matters because our labels are demonstrably cleaner. Below, our synthetic ground truth (24 points, pixel-exact by construction) versus CarFusion's real ground truth (14 points, multi-view triangulated):

Label quality: our pixel-exact synthetic 24-point labels (left) vs CarFusion's triangulated 14-point labels (right)

Ours are denser and sit precisely on the anatomy; CarFusion's are sparser and visibly scattered. So a model pulled toward our convention is penalised by a noisier reference - the negative transfer number conflates label-convention mismatch with true transfer quality, and is reported honestly rather than hidden.

Get it

Source

Plugin, capture scripts, COCO export, configs.

github.com/kiselyovd/ue5-vehicle-synth
Dataset

1,440 rendered frames + multi-instance COCO on the Hub.

HuggingFace dataset

Legal

Frames are non-interactive media rendered with Unreal Engine from City Sample. Under the UE EULA, such rendered images are freely distributable; only the underlying UE-Only-Content asset files are not. This project ships rendered PNG frames and JSON annotations only - never Epic asset files. Full analysis: City Sample EULA review.