Unsupervised Future Occupancy Prediction from Single Frame FMCW LiDAR and Camera Fusion

This repository contains the research code behind Unsupervised Future Occupancy Prediction from Single Frame FMCW LiDAR and Camera Fusion. The model ingests one FMCW LiDAR sweep (xyz + Doppler) plus synchronized camera views, fuses them with CLIP-derived semantic priors, and forecasts BEV occupancy/uncertainty at multiple horizons (Δ=0.3 s, 0.6 s in the default configuration). Training is supervision-free: future LiDAR sweeps provide the occupancy targets via ray casting.
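
The actual target generation lives in the preprocessing code; purely to illustrate the idea, a minimal sketch that rasterizes a future LiDAR sweep into a BEV occupancy grid might look like the following (grid extent and resolution are assumed values, and the free-space ray casting mentioned above is omitted):

# Illustrative sketch only, not the repo's target builder.
# Marks BEV cells that contain at least one future LiDAR return.
import numpy as np

def bev_occupancy_target(points_xyz, x_range=(-50.0, 50.0),
                         y_range=(-50.0, 50.0), resolution=0.5):
    nx = int((x_range[1] - x_range[0]) / resolution)
    ny = int((y_range[1] - y_range[0]) / resolution)
    grid = np.zeros((ny, nx), dtype=np.float32)

    # Keep only points inside the BEV extent.
    x, y = points_xyz[:, 0], points_xyz[:, 1]
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])

    # Convert metric coordinates to cell indices and mark those cells occupied.
    ix = ((x[keep] - x_range[0]) / resolution).astype(np.int64)
    iy = ((y[keep] - y_range[0]) / resolution).astype(np.int64)
    grid[iy, ix] = 1.0
    return grid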

Highlights

  • Single-frame inference with FMCW LiDAR radial velocity channels and CLIP image semantics.
  • Future occupancy & uncertainty heads trained with future-LiDAR supervision only.
  • Semantic priors & gating (Semantics-Conditioned Motion Prior) regularize the fusion backbone.

Repository layout

  • code/ – core Python package (datasets, preprocessing, fusion network, heads, metrics, training/eval loops).
  • tools/ – CLI utilities for inspecting cached frames and rendering qualitative predictions.
  • config.yaml – example training config; duplicate and edit it instead of hard-coding paths.
  • requirements.txt – pinned dependencies (PyTorch 2.7.1 + CUDA 12.8 wheels, Transformers, Open3D, etc.).

Environment setup

  1. Use Python 3.10+ with a CUDA-capable GPU. The pinned PyTorch wheels (torch==2.7.1+cu128) expect CUDA 12.8; a quick sanity check follows these steps.
  2. Create a virtual environment and install dependencies:
    python -m venv venv
    source venv/bin/activate      # Windows: venv\Scripts\activate
    pip install --upgrade pip
    pip install -r requirements.txt
  3. (Optional) install the repo as a package to enable python -m code.* entry points from anywhere:
    pip install -e .
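
After installation, a small check (not part of the repo) can confirm that the pinned GPU wheels are usable:

import torch

print(torch.__version__)             # expected: 2.7.1+cu128
print(torch.cuda.is_available())     # should be True on a CUDA 12.8-capable machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))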

Dataset expectations

  • The code assumes access to the AevaScenes FMCW LiDAR dataset (not included). Each sequence is stored as a UUID.tar.gz bundle that contains LiDAR sweeps, synchronized camera frames, calibration, and metadata. Place the bundles under a common root, e.g. data/aevascenes/ (a small extraction sketch follows this list).

  • Many scripts reference the environment variable AEVASCENES or data_root (see configs). Layout example:

    data/aevascenes/
      metadata.json
      exclude.txt
      0b0a1559-9c8d-4b28-8b92-103f5e1d0051.tar.gz
      ...
    
  • Download the dataset from https://scenes.aeva.com/download.
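
The preprocessing step below expects extracted sequences; a hypothetical helper (not part of the repo) for unpacking the bundles under the layout shown above could look like this:

# Hypothetical helper, not part of the repo: extract every sequence bundle
# found under the dataset root shown above.
import tarfile
from pathlib import Path

def extract_sequences(data_root, out_root):
    out_dir = Path(out_root)
    out_dir.mkdir(parents=True, exist_ok=True)
    for bundle in sorted(Path(data_root).glob("*.tar.gz")):
        with tarfile.open(bundle) as tar:
            tar.extractall(out_dir)   # each bundle holds one sequence UUID

extract_sequences("data/aevascenes", "data/aevascenes_extracted")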

Precomputing dataset caches

Training any of the networks expects a dataset_cache_dir that stores per-frame tensors (LiDAR BEV, camera BEV features, CLIP semantic priors, masks, calibration, and future-occupancy targets). Build it with code/preprocess/build_cache.py:

python -m code.preprocess.build_cache \
  --data-root /path/to/extracted/aevascenes \
  --cache-root /path/to/cache/ \
  --sensor-id front_wide \
  --deltas 0.3 0.6 \
  --target-dilation "0.3:1,0.6:2" \
  --device cuda \
  --feature-device cuda \
  --camera-backbone clip \
  --save-sem-prior \
  --bev-cam-dtype fp16 \
  --bev-cam-compressed-dim 64 \
  --sequences-file sequences.txt \
  --num-workers 0

Key options:

  • --data-root points to extracted sequences (untar before calling the script or use the SLURM helper in cluster/preprocess.sh).
  • --cache-root is where .pt tensors are produced (<frame_uuid>/lidar_bev.pt, bev_cam.pt, masks, target_delta*.pt, etc.).
  • --camera-backbone clip + --save-sem-prior builds CLIP-based semantic movability priors and confidences alongside BEV tensors.
  • --bev-cam-compressed-dim 64 projects high-dimensional camera features to a compact footprint, reducing cache size.
  • --target-dilation controls post-processing of occupancy targets per horizon.

Utilities for inspecting cached content:

  • python tools/inspect_cache_frame.py --frame-dir <cache_root>/<frame_uuid> --out-dir viz/inspect plots LiDAR BEV channels, semantic priors, and cached targets.
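
For quick programmatic checks, a small sketch like the one below (placeholder path; exact file names and shapes depend on your cache settings) prints what a cached frame contains:

# Illustrative sketch: list the cached tensors of one frame and print their
# shapes. The path is a placeholder; contents depend on preprocessing options.
from pathlib import Path
import torch

frame_dir = Path("/path/to/cache/<frame_uuid>")
for f in sorted(frame_dir.glob("*.pt")):
    try:
        obj = torch.load(f, map_location="cpu")
    except Exception as err:          # e.g. pickled objects rejected by weights_only
        print(f.name, "could not be loaded:", err)
        continue
    if torch.is_tensor(obj):
        print(f.name, tuple(obj.shape), obj.dtype)
    else:
        print(f.name, type(obj).__name__)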

Training

  1. Pick or duplicate a config (e.g. config.yaml). Important fields:
    • data_root: path to extracted sequences (can be a temporary scratch directory).
    • dataset_cache_dir: path to the cache built above. The trainer refuses to start if the placeholder "/path/to/cache-root" is left unchanged (see the validation sketch after these steps).
    • train_seqs / val_seqs: UUID lists; reuse the provided YAML helpers or load from cluster/sequences.yaml.
    • deltas, target_dilation: horizons and dilations that must match the cached targets.
    • use_camera, use_lidar, use_sem_prior, sem_gate, use_velocity_channels: ablation switches for fusion inputs.
    • batch_size, epochs, lr, weight_decay, workers, amp: training hyperparameters.
  2. Launch training:
    python -m code.train --config config.yaml
    • Logs, checkpoints, and plots land in runs/<timestamp>/.
    • Mixed precision (amp) and gradient clipping are enabled by default; override in the YAML if needed.
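
As a pre-flight check, something like the sketch below (assuming PyYAML and the config keys listed above; the trainer performs its own, stricter validation) catches the most common misconfiguration:

# Illustrative pre-flight check, not the trainer's actual validation code.
import yaml
from pathlib import Path

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# The trainer refuses to start while the shipped placeholder is still present.
if cfg["dataset_cache_dir"] == "/path/to/cache-root":
    raise SystemExit("dataset_cache_dir still holds the placeholder; point it at your cache")
if not Path(cfg["dataset_cache_dir"]).is_dir():
    raise SystemExit(f"dataset_cache_dir does not exist: {cfg['dataset_cache_dir']}")

# Horizons and dilations must match what was baked into the cache.
print("deltas:", cfg["deltas"], "| target_dilation:", cfg["target_dilation"])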

Evaluation

code/eval.py loads a checkpoint and reports IoU, AP, Brier, and ECE metrics, optionally per distance band or per class (a minimal metric sketch follows the example below). Example:

python -m code.eval \
  --ckpt runs/checkpoints/best.pt \
  --root /path/to/extracted/aevascenes \
  --dataset-cache-dir /path/to/cache/clip_front_wide_compressed64 \
  --sequences-file cluster/sequences_val.txt \
  --sensor-id front_wide \
  --deltas 0.3 0.6 \
  --target-dilation "0.3:1,0.6:2" \
  --batch-size 2 \
  --use-anno-targets \
  --anno-root /path/to/gt_annotations \
  --x-bands "0:30,30:60,60:100"

Outputs include aggregated tables, TensorBoard scalars, PR/reliability plots, and optional per-class breakdowns.
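
For reference, the headline occupancy metrics reduce to simple functions of the predicted probabilities; a minimal sketch (not the repo's metric code; threshold and binning are assumptions) is:

# Illustrative metric sketch on flattened BEV occupancy maps; the repo's eval
# code additionally handles masking, distance bands, and per-class splits.
import numpy as np

def iou(pred_prob, target, thresh=0.5):
    pred, tgt = pred_prob >= thresh, target >= 0.5
    union = np.logical_or(pred, tgt).sum()
    return float(np.logical_and(pred, tgt).sum() / union) if union else float("nan")

def brier(pred_prob, target):
    return float(np.mean((pred_prob - target) ** 2))

def ece(pred_prob, target, n_bins=10):
    # Expected calibration error with equal-width confidence bins.
    idx = np.minimum((pred_prob * n_bins).astype(int), n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        m = idx == b
        if m.any():
            err += m.sum() / pred_prob.size * abs(pred_prob[m].mean() - target[m].mean())
    return float(err)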

Visualization

  • python tools/visualize_quali.py --root <data_root> --dataset-cache-dir <cache> --ckpt <checkpoint> --sequence <uuid> --out-dir runs/quali renders LiDAR BEV channels, semantic priors, fused features (via PCA; see the sketch after this list), and predicted occupancy masks for each Δ. Add --anno-root to overlay ground-truth occupancy BEVs if you have them.
  • tests/test.py includes point-cloud→image projection utilities that are useful when debugging calibration.
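
The PCA rendering of fused features boils down to projecting each BEV cell's feature vector onto the top three principal components; a small sketch (not the repo's plotting code) is:

# Illustrative sketch: project a (C, H, W) feature map onto its first three
# principal components so it can be displayed as an RGB image.
import numpy as np

def features_to_rgb(feat):
    c, h, w = feat.shape
    x = feat.reshape(c, -1).T                 # (H*W, C) samples
    x = x - x.mean(axis=0, keepdims=True)     # center each feature channel
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    rgb = x @ vt[:3].T                        # top-3 principal components
    rgb = (rgb - rgb.min(0)) / (rgb.max(0) - rgb.min(0) + 1e-8)
    return rgb.reshape(h, w, 3)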

Tips & troubleshooting

  • Missing cache: ensure dataset_cache_dir points to the cache root and that its manifest matches the sequences requested by the loader. Use python tools/inspect_cache_frame.py on a few frames to confirm channel statistics.
  • Config placeholders: config.yaml ships with /path/to/cache-root; replace it before running. The trainer explicitly checks for this placeholder to prevent silent failures.
  • Worker deadlocks: when extracting CLIP features on the GPU during preprocessing, set --num-workers 0 (CUDA cannot be re-initialized in forked DataLoader workers). For CPU feature extraction you may increase the worker count.
  • Changing horizons: re-run preprocessing with the new --deltas/--target-dilation so the cache contains the necessary target_delta* tensors.
  • Ablations: to reproduce LiDAR-only or camera-only baselines, toggle use_camera / use_lidar / use_sem_prior in the config or through CLI overrides in code.eval.
