Unsupervised Future Occupancy Prediction from Single Frame FMCW LiDAR and Camera Fusion

This repository contains the research code behind Unsupervised Future Occupancy Prediction from Single Frame FMCW LiDAR and Camera Fusion. The model ingests one FMCW LiDAR sweep (xyz + Doppler) plus synchronized camera views, fuses them with CLIP-derived semantic priors, and forecasts BEV occupancy/uncertainty at multiple horizons (Δ=0.3 s, 0.6 s in the default configuration). Training is supervision-free: future LiDAR sweeps provide the occupancy targets via ray casting.
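
The actual target generation lives in the preprocessing code; purely to illustrate the idea, a minimal sketch that rasterizes a future LiDAR sweep into a BEV occupancy grid might look like the following (grid extent and resolution are assumed values, and the free-space ray casting mentioned above is omitted):

# Illustrative sketch only, not the repo's target builder.
# Marks BEV cells that contain at least one future LiDAR return.
import numpy as np

def bev_occupancy_target(points_xyz, x_range=(-50.0, 50.0),
                         y_range=(-50.0, 50.0), resolution=0.5):
    nx = int((x_range[1] - x_range[0]) / resolution)
    ny = int((y_range[1] - y_range[0]) / resolution)
    grid = np.zeros((ny, nx), dtype=np.float32)

    # Keep only points inside the BEV extent.
    x, y = points_xyz[:, 0], points_xyz[:, 1]
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])

    # Convert metric coordinates to cell indices and mark those cells occupied.
    ix = ((x[keep] - x_range[0]) / resolution).astype(np.int64)
    iy = ((y[keep] - y_range[0]) / resolution).astype(np.int64)
    grid[iy, ix] = 1.0
    return grid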

Highlights

  • Single-frame inference with FMCW LiDAR radial velocity channels and CLIP image semantics.
  • Future occupancy & uncertainty heads trained with future-LiDAR supervision only.
  • Semantic priors & gating (Semantics-Conditioned Motion Prior) regularize the fusion backbone.

Repository layout

  • code/ – core Python package (datasets, preprocessing, fusion network, heads, metrics, training/eval loops).
  • tools/ – CLI utilities for inspecting cached frames and rendering qualitative predictions.
  • config.yaml – example training config; duplicate and edit it instead of hard-coding paths.
  • requirements.txt – pinned dependencies (PyTorch 2.7.1 + CUDA 12.8 wheels, Transformers, Open3D, etc.).

Environment setup

  1. Use Python 3.10+ with a CUDA-capable GPU. The pinned PyTorch wheels (torch==2.7.1+cu128) expect CUDA 12.8; a quick sanity check follows these steps.
  2. Create a virtual environment and install dependencies:
    python -m venv venv
    source venv/bin/activate      # Windows: venv\Scripts\activate
    pip install --upgrade pip
    pip install -r requirements.txt
  3. (Optional) install the repo as a package to enable python -m code.* entry points from anywhere:
    pip install -e .
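
After installation, a small check (not part of the repo) can confirm that the pinned GPU wheels are usable:

import torch

print(torch.__version__)             # expected: 2.7.1+cu128
print(torch.cuda.is_available())     # should be True on a CUDA 12.8-capable machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))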

Dataset expectations

  • The code assumes access to the AevaScenes FMCW LiDAR dataset (not included). Each sequence is stored as a UUID.tar.gz bundle that contains LiDAR sweeps, synchronized camera frames, calibration, and metadata. Place the bundles under a common root, e.g. data/aevascenes/ (a small extraction sketch follows this list).

  • Many scripts reference the environment variable AEVASCENES or data_root (see configs). Layout example:

    data/aevascenes/
      metadata.json
      exclude.txt
      0b0a1559-9c8d-4b28-8b92-103f5e1d0051.tar.gz
      ...
    
  • Download the dataset from https://scenes.aeva.com/download.
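
The preprocessing step below expects extracted sequences; a hypothetical helper (not part of the repo) for unpacking the bundles under the layout shown above could look like this:

# Hypothetical helper, not part of the repo: extract every sequence bundle
# found under the dataset root shown above.
import tarfile
from pathlib import Path

def extract_sequences(data_root, out_root):
    out_dir = Path(out_root)
    out_dir.mkdir(parents=True, exist_ok=True)
    for bundle in sorted(Path(data_root).glob("*.tar.gz")):
        with tarfile.open(bundle) as tar:
            tar.extractall(out_dir)   # each bundle holds one sequence UUID

extract_sequences("data/aevascenes", "data/aevascenes_extracted")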

Precomputing dataset caches

Training any of the networks expects a dataset_cache_dir that stores per-frame tensors (LiDAR BEV, camera BEV features, CLIP semantic priors, masks, calibration, and future-occupancy targets). Build it with code/preprocess/build_cache.py:

python -m code.preprocess.build_cache \
  --data-root /path/to/extracted/aevascenes \
  --cache-root /path/to/cache/ \
  --sensor-id front_wide \
  --deltas 0.3 0.6 \
  --target-dilation "0.3:1,0.6:2" \
  --device cuda \
  --feature-device cuda \
  --camera-backbone clip \
  --save-sem-prior \
  --bev-cam-dtype fp16 \
  --bev-cam-compressed-dim 64 \
  --sequences-file sequences.txt \
  --num-workers 0

Key options:

  • --data-root points to extracted sequences (untar before calling the script or use the SLURM helper in cluster/preprocess.sh).
  • --cache-root is where .pt tensors are produced (<frame_uuid>/lidar_bev.pt, bev_cam.pt, masks, target_delta*.pt, etc.).
  • --camera-backbone clip + --save-sem-prior builds CLIP-based semantic movability priors and confidences alongside BEV tensors.
  • --bev-cam-compressed-dim 64 projects high-dimensional camera features to a compact footprint, reducing cache size.
  • --target-dilation controls post-processing of occupancy targets per horizon.

Utilities for inspecting cached content:

  • python tools/inspect_cache_frame.py --frame-dir <cache_root>/<frame_uuid> --out-dir viz/inspect plots LiDAR BEV channels, semantic priors, and cached targets.
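
For quick programmatic checks, a small sketch like the one below (placeholder path; exact file names and shapes depend on your cache settings) prints what a cached frame contains:

# Illustrative sketch: list the cached tensors of one frame and print their
# shapes. The path is a placeholder; contents depend on preprocessing options.
from pathlib import Path
import torch

frame_dir = Path("/path/to/cache/<frame_uuid>")
for f in sorted(frame_dir.glob("*.pt")):
    try:
        obj = torch.load(f, map_location="cpu")
    except Exception as err:          # e.g. pickled objects rejected by weights_only
        print(f.name, "could not be loaded:", err)
        continue
    if torch.is_tensor(obj):
        print(f.name, tuple(obj.shape), obj.dtype)
    else:
        print(f.name, type(obj).__name__)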

Training

  1. Pick or duplicate a config (e.g. config.yaml). Important fields:
    • data_root: path to extracted sequences (can be a temporary scratch directory).
    • dataset_cache_dir: path to the cache built above. The trainer refuses to start if the placeholder "/path/to/cache-root" is left unchanged (see the validation sketch after these steps).
    • train_seqs / val_seqs: UUID lists; reuse the provided YAML helpers or load from cluster/sequences.yaml.
    • deltas, target_dilation: horizons and dilations that must match the cached targets.
    • use_camera, use_lidar, use_sem_prior, sem_gate, use_velocity_channels: ablation switches for fusion inputs.
    • batch_size, epochs, lr, weight_decay, workers, amp: training hyperparameters.
  2. Launch training:
    python -m code.train --config config.yaml
    • Logs, checkpoints, and plots land in runs/<timestamp>/.
    • Mixed precision (amp) and gradient clipping are enabled by default; override in the YAML if needed.
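
As a pre-flight check, something like the sketch below (assuming PyYAML and the config keys listed above; the trainer performs its own, stricter validation) catches the most common misconfiguration:

# Illustrative pre-flight check, not the trainer's actual validation code.
import yaml
from pathlib import Path

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# The trainer refuses to start while the shipped placeholder is still present.
if cfg["dataset_cache_dir"] == "/path/to/cache-root":
    raise SystemExit("dataset_cache_dir still holds the placeholder; point it at your cache")
if not Path(cfg["dataset_cache_dir"]).is_dir():
    raise SystemExit(f"dataset_cache_dir does not exist: {cfg['dataset_cache_dir']}")

# Horizons and dilations must match what was baked into the cache.
print("deltas:", cfg["deltas"], "| target_dilation:", cfg["target_dilation"])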

Evaluation

code/eval.py loads a checkpoint and reports IoU, AP, Brier, and ECE metrics, optionally per distance band or per class (a minimal metric sketch follows the example below). Example:

python -m code.eval \
  --ckpt runs/checkpoints/best.pt \
  --root /path/to/extracted/aevascenes \
  --dataset-cache-dir /path/to/cache/clip_front_wide_compressed64 \
  --sequences-file cluster/sequences_val.txt \
  --sensor-id front_wide \
  --deltas 0.3 0.6 \
  --target-dilation "0.3:1,0.6:2" \
  --batch-size 2 \
  --use-anno-targets \
  --anno-root /path/to/gt_annotations \
  --x-bands "0:30,30:60,60:100"

Outputs include aggregated tables, TensorBoard scalars, PR/reliability plots, and optional per-class breakdowns.
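
For reference, the headline occupancy metrics reduce to simple functions of the predicted probabilities; a minimal sketch (not the repo's metric code; threshold and binning are assumptions) is:

# Illustrative metric sketch on flattened BEV occupancy maps; the repo's eval
# code additionally handles masking, distance bands, and per-class splits.
import numpy as np

def iou(pred_prob, target, thresh=0.5):
    pred, tgt = pred_prob >= thresh, target >= 0.5
    union = np.logical_or(pred, tgt).sum()
    return float(np.logical_and(pred, tgt).sum() / union) if union else float("nan")

def brier(pred_prob, target):
    return float(np.mean((pred_prob - target) ** 2))

def ece(pred_prob, target, n_bins=10):
    # Expected calibration error with equal-width confidence bins.
    idx = np.minimum((pred_prob * n_bins).astype(int), n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        m = idx == b
        if m.any():
            err += m.sum() / pred_prob.size * abs(pred_prob[m].mean() - target[m].mean())
    return float(err)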

Visualization

  • python tools/visualize_quali.py --root <data_root> --dataset-cache-dir <cache> --ckpt <checkpoint> --sequence <uuid> --out-dir runs/quali renders LiDAR BEV channels, semantic priors, fused features (via PCA; see the sketch after this list), and predicted occupancy masks for each Δ. Add --anno-root to overlay ground-truth occupancy BEVs if you have them.
  • tests/test.py includes point-cloud→image projection utilities that are useful when debugging calibration.
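
The PCA rendering of fused features boils down to projecting each BEV cell's feature vector onto the top three principal components; a small sketch (not the repo's plotting code) is:

# Illustrative sketch: project a (C, H, W) feature map onto its first three
# principal components so it can be displayed as an RGB image.
import numpy as np

def features_to_rgb(feat):
    c, h, w = feat.shape
    x = feat.reshape(c, -1).T                 # (H*W, C) samples
    x = x - x.mean(axis=0, keepdims=True)     # center each feature channel
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    rgb = x @ vt[:3].T                        # top-3 principal components
    rgb = (rgb - rgb.min(0)) / (rgb.max(0) - rgb.min(0) + 1e-8)
    return rgb.reshape(h, w, 3)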

Tips & troubleshooting

  • Missing cache: ensure dataset_cache_dir points to the cache root and that its manifest matches the sequences requested by the loader. Use python tools/inspect_cache_frame.py on a few frames to confirm channel statistics.
  • Config placeholders: config.yaml ships with /path/to/cache-root; replace it before running. The trainer explicitly checks for this placeholder to prevent silent failures.
  • Worker deadlocks: when extracting CLIP features on the GPU during preprocessing, set --num-workers 0 (CUDA cannot be re-initialized in forked DataLoader workers). For CPU feature extraction you may increase the worker count.
  • Changing horizons: re-run preprocessing with the new --deltas/--target-dilation so the cache contains the necessary target_delta* tensors.
  • Ablations: to reproduce LiDAR-only or camera-only baselines, toggle use_camera / use_lidar / use_sem_prior in the config or through CLI overrides in code.eval.
