This repository contains the research code behind Unsupervised Future Occupancy Prediction from Single Frame FMCW LiDAR and Camera Fusion. The model ingests one FMCW LiDAR sweep (xyz + Doppler) plus synchronized camera views, fuses them with CLIP-derived semantic priors, and forecasts BEV occupancy/uncertainty at multiple horizons (Δ=0.3 s, 0.6 s in the default configuration). Training is supervision-free: future LiDAR sweeps provide the occupancy targets via ray casting.
- Single-frame inference with FMCW LiDAR radial velocity channels and CLIP image semantics.
- Future occupancy & uncertainty heads trained with future-LiDAR supervision only.
- Semantic priors & gating (Semantics-Conditioned Motion Prior) regularize the fusion backbone.
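To make the supervision-free training concrete: future LiDAR sweeps are turned into occupancy targets by ray casting, i.e. BEV cells containing a return become occupied while cells the ray traverses on the way are marked as observed free. The sketch below is a simplified, illustrative version of that idea (straight-line ray marching on a square ego-centred grid); the function name `raycast_bev_targets` and all grid parameters are made up here and are not the repository's implementation.

```python
import numpy as np

def raycast_bev_targets(fut_points, grid=200, res=0.5, step=0.25):
    """Toy BEV occupancy targets from one future LiDAR sweep (illustration only).

    fut_points : (N, 3) xyz of the future sweep, already in the current BEV frame.
    Returns an int8 grid: 1 = occupied, 0 = observed free, -1 = unobserved.
    """
    occ = np.full((grid, grid), -1, dtype=np.int8)
    half = grid * res / 2.0  # grid is centred on the ego vehicle

    def to_cell(x, y):
        return int((x + half) // res), int((y + half) // res)

    def in_grid(i, j):
        return 0 <= i < grid and 0 <= j < grid

    for x, y, _ in fut_points:
        rng = float(np.hypot(x, y))
        if rng < 1e-3:
            continue
        ux, uy = x / rng, y / rng
        # March from the sensor towards the return, marking traversed cells as free.
        for r in np.arange(0.0, rng, step):
            i, j = to_cell(ux * r, uy * r)
            if in_grid(i, j) and occ[i, j] != 1:
                occ[i, j] = 0
        # The cell containing the return itself is occupied.
        i, j = to_cell(x, y)
        if in_grid(i, j):
            occ[i, j] = 1
    return occ
```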
- `code/` – core Python package (datasets, preprocessing, fusion network, heads, metrics, training/eval loops).
- `tools/` – CLI utilities for inspecting cached frames and rendering qualitative predictions.
- `config.yaml` – example training config; edit the config instead of hard-coding paths.
- `requirements.txt` – pinned dependencies (PyTorch 2.7.1 + CUDA 12.8 wheels, Transformers, Open3D, etc.).
- Use Python 3.10+ with a CUDA-capable GPU. The pinned PyTorch wheels (`torch==2.7.1+cu128`) expect CUDA 12.8 (a quick sanity check follows this list).
- Create a virtual environment and install dependencies:

  ```bash
  python -m venv venv
  source venv/bin/activate   # Windows: venv\Scripts\activate
  pip install --upgrade pip
  pip install -r requirements.txt
  ```
- (Optional) Install the repo as a package to enable `python -m code.*` entry points from anywhere:

  ```bash
  pip install -e .
  ```
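After installation, a quick check (not part of the repository, just a convenience snippet) confirms that the pinned CUDA 12.8 wheels actually see your GPU:

```python
import torch

print(torch.__version__)          # expect 2.7.1+cu128 with the pinned wheels
print(torch.version.cuda)         # expect "12.8"
print(torch.cuda.is_available())  # must be True for GPU preprocessing/training
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```

If this reports `False`, GPU-based preprocessing (`--device cuda`) and training will not work.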
- The code assumes access to the AevaScenes FMCW LiDAR dataset (not included). Each sequence is stored as a `UUID.tar.gz` bundle that contains LiDAR sweeps, synchronized camera frames, calibration, and metadata. Place them under a common root, e.g. `data/aevascenes/`.
- Many scripts reference the environment variable `AEVASCENES` or `data_root` (see configs). Layout example:

  ```
  data/aevascenes/
    metadata.json
    exclude.txt
    0b0a1559-9c8d-4b28-8b92-103f5e1d0051.tar.gz
    ...
  ```

- Download the dataset from: https://scenes.aeva.com/download

Training any of the networks expects a `dataset_cache_dir` that stores per-frame tensors (LiDAR BEV, camera BEV features, CLIP semantic priors, masks, calibration, and future-occupancy targets). Use `code/preprocess/build_cache.py`:

```bash
python -m code.preprocess.build_cache \
--data-root /path/to/extracted/aevascenes \
--cache-root /path/to/cache/ \
--sensor-id front_wide \
--deltas 0.3 0.6 \
--target-dilation "0.3:1,0.6:2" \
--device cuda \
--feature-device cuda \
--camera-backbone clip \
--save-sem-prior \
--bev-cam-dtype fp16 \
--bev-cam-compressed-dim 64 \
--sequences-file sequences.txt \
--num-workers 0
```

Key options:

- `--data-root` points to extracted sequences; untar before calling the script (a minimal extraction sketch follows this list) or use the SLURM helper in `cluster/preprocess.sh`.
- `--cache-root` is where the `.pt` tensors are produced (`<frame_uuid>/lidar_bev.pt`, `bev_cam.pt`, masks, `target_delta*.pt`, etc.).
- `--camera-backbone clip` together with `--save-sem-prior` builds CLIP-based semantic movability priors and confidences alongside the BEV tensors.
- `--bev-cam-compressed-dim 64` projects high-dimensional camera features to a compact footprint, reducing cache size.
- `--target-dilation` controls post-processing of occupancy targets per horizon.
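Since `build_cache.py` reads extracted sequences, the `.tar.gz` bundles have to be unpacked first. A minimal extraction loop might look like the following; `raw_root`, `out_root`, and the assumption that each bundle unpacks into its own UUID folder are placeholders for your setup (the SLURM helper in `cluster/preprocess.sh` covers the same step on a cluster):

```python
import tarfile
from pathlib import Path

raw_root = Path("data/aevascenes")       # downloaded *.tar.gz bundles
out_root = Path("/scratch/aevascenes")   # extraction target, later passed as --data-root
out_root.mkdir(parents=True, exist_ok=True)

for bundle in sorted(raw_root.glob("*.tar.gz")):
    # Assumes each archive unpacks into a folder named after its UUID.
    dest = out_root / bundle.name.replace(".tar.gz", "")
    if dest.exists():
        continue  # already extracted
    with tarfile.open(bundle, "r:gz") as tar:
        tar.extractall(out_root)
    print(f"extracted {bundle.name}")
```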
Utilities for inspecting cached content:
```bash
python tools/inspect_cache_frame.py --frame-dir <cache_root>/<frame_uuid> --out-dir viz/inspect
```

This plots LiDAR BEV channels, semantic priors, and cached targets.
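For programmatic access (e.g. in a notebook), the cached tensors can also be loaded directly with `torch.load`. A small sketch, assuming the per-frame file layout described above; the exact set of files and their contents depend on the preprocessing flags you used:

```python
from pathlib import Path
import torch

frame_dir = Path("/path/to/cache/<frame_uuid>")  # one cached frame

for pt_file in sorted(frame_dir.glob("*.pt")):
    obj = torch.load(pt_file, map_location="cpu")
    if isinstance(obj, torch.Tensor):
        print(f"{pt_file.name}: shape={tuple(obj.shape)}, dtype={obj.dtype}, "
              f"min={obj.float().min():.3f}, max={obj.float().max():.3f}")
    elif isinstance(obj, dict):
        print(f"{pt_file.name}: dict with keys {sorted(obj.keys())}")
    else:
        print(f"{pt_file.name}: {type(obj).__name__}")
```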
- Pick or duplicate a config (e.g. `config.yaml`); an illustrative skeleton follows this list. Important fields:
  - `data_root`: path to extracted sequences (can be a temporary scratch directory).
  - `dataset_cache_dir`: path to the cache built above. The trainer refuses to start if the placeholder `"/path/to/cache-root"` is left unchanged.
  - `train_seqs` / `val_seqs`: UUID lists; reuse the provided YAML helpers or load from `cluster/sequences.yaml`.
  - `deltas`, `target_dilation`: horizons and dilations that must match the cached targets.
  - `use_camera`, `use_lidar`, `use_sem_prior`, `sem_gate`, `use_velocity_channels`: ablation switches for fusion inputs.
  - `batch_size`, `epochs`, `lr`, `weight_decay`, `workers`, `amp`: training hyperparameters.
- Launch training:

  ```bash
  python -m code.train --config config.yaml
  ```
- Logs, checkpoints, and plots land in `runs/<timestamp>/`.
- Mixed precision (`amp`) and gradient clipping are enabled by default; override them in the YAML if needed.
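For orientation, a config skeleton covering the fields listed above might look like the following. All paths and hyperparameter values are placeholders, and the exact value formats (sequence lists, `target_dilation`) are guesses based on the CLI flags; the shipped `config.yaml` is authoritative:

```yaml
data_root: /path/to/extracted/aevascenes
dataset_cache_dir: /path/to/cache/clip_front_wide_compressed64

train_seqs: [0b0a1559-9c8d-4b28-8b92-103f5e1d0051]   # UUID lists (see cluster/sequences.yaml)
val_seqs: []

deltas: [0.3, 0.6]                 # must match the cached targets
target_dilation: {0.3: 1, 0.6: 2}

use_camera: true
use_lidar: true
use_sem_prior: true
sem_gate: true
use_velocity_channels: true

batch_size: 2
epochs: 50
lr: 1.0e-4
weight_decay: 1.0e-2
workers: 4
amp: true
```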
`code/eval.py` loads a checkpoint and reports IoU/AP/Brier/ECE metrics, optionally per distance band or per class. Example:

```bash
python -m code.eval \
--ckpt runs/checkpoints/best.pt \
--root /path/to/extracted/aevascenes \
--dataset-cache-dir /path/to/cache/clip_front_wide_compressed64 \
--sequences-file cluster/sequences_val.txt \
--sensor-id front_wide \
--deltas 0.3 0.6 \
--target-dilation "0.3:1,0.6:2" \
--batch-size 2 \
--use-anno-targets \
--anno-root /path/to/gt_annotations \
--x-bands "0:30,30:60,60:100"
```

Outputs include aggregated tables, TensorBoard scalars, PR/reliability plots, and optional per-class breakdowns.
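The probabilistic metrics follow their standard definitions. As a reference, here is a minimal, illustrative computation of the Brier score and a reliability-based ECE over per-cell occupancy probabilities; the repository's metric code may apply masking, distance bands, and binning differently:

```python
import numpy as np

def brier_score(prob, target):
    """Mean squared error between predicted occupancy probability and binary target."""
    return float(np.mean((prob - target) ** 2))

def expected_calibration_error(prob, target, n_bins=10):
    """ECE: confidence-vs-frequency gap, averaged over equally spaced probability bins."""
    prob, target = prob.ravel(), target.ravel()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (prob >= lo) & (prob <= hi) if hi >= 1.0 else (prob >= lo) & (prob < hi)
        if mask.any():
            gap = abs(prob[mask].mean() - target[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of cells in this bin
    return float(ece)

# Toy usage on a fake 4x4 BEV patch.
p = np.random.rand(4, 4)
t = (np.random.rand(4, 4) > 0.5).astype(float)
print(brier_score(p, t), expected_calibration_error(p, t))
```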
```bash
python tools/visualize_quali.py --root <data_root> --dataset-cache-dir <cache> --ckpt <checkpoint> --sequence <uuid> --out-dir runs/quali
```

This renders LiDAR BEV channels, semantic priors, fused features (PCA), and predicted occupancy masks for each Δ. Add `--anno-root` to overlay ground-truth occupancy BEVs if you have them. `tests/test.py` includes point-cloud→image projection utilities useful during calibration debug sessions.
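The projection utilities in `tests/test.py` implement the usual pinhole camera mapping. A rough reference sketch is shown below; the matrix conventions (`T_cam_from_lidar` as a 4x4 extrinsic, `K` as a 3x3 intrinsic) are assumptions and may differ from the repository's calibration format:

```python
import numpy as np

def project_lidar_to_image(points_xyz, T_cam_from_lidar, K):
    """Project LiDAR points (N,3) into pixels; returns (u, v, depth) for points in front of the camera."""
    pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])   # (N,4) homogeneous
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]                  # LiDAR -> camera frame
    front = pts_cam[:, 2] > 1e-3                                     # keep points in front of the camera
    pts_cam = pts_cam[front]
    uvw = (K @ pts_cam.T).T                                          # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]
    return uv[:, 0], uv[:, 1], pts_cam[:, 2]
```

Points that additionally fall inside the image bounds can then be overlaid on the camera frame to eyeball the calibration.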
- Missing cache: ensure `dataset_cache_dir` points to the cache root and that its manifest matches the sequences requested by the loader. Use `python tools/inspect_cache_frame.py` on a few frames to confirm channel statistics.
- Config placeholders: `config.yaml` ships with `/path/to/cache-root`; replace it before running. The trainer explicitly checks for this placeholder to prevent silent failures.
- Worker deadlocks: when extracting CLIP features on GPU during preprocessing, set `--num-workers 0` (PyTorch limitation). For CPU feature extraction you may increase the worker count.
- Changing horizons: re-run preprocessing with the new `--deltas` / `--target-dilation` so the cache contains the necessary `target_delta*` tensors.
- Ablations: to reproduce LiDAR-only or camera-only baselines, toggle `use_camera` / `use_lidar` / `use_sem_prior` in the config or through CLI overrides in `code.eval`.