
TODO

Completed

  • Setup project structure
  • Add dependencies (OpenCV, nalgebra, bevy, serde)
  • Read video frames from file
  • Convert frames to grayscale
  • Detect ORB features
  • Create OrbDetector wrapper (modularized in src/feature/)
  • Write basic tests for feature detection
  • Match features between frames
  • Create FeatureMatcher with BFMatcher
  • Filter matches by distance
  • Create visualizer example
  • Extract matched point coordinates (Point2f)
  • Create CameraIntrinsics struct
  • Compute essential matrix from point pairs
  • Recover pose (R, t) from essential matrix
  • Convert OpenCV Mat to nalgebra types
  • Build 4x4 transformation matrix
  • Initialize global pose tracking
  • Update global pose (compose transformations)
  • Store trajectory points
  • Calculate total distance traveled
  • Save trajectory to JSON file
  • Add unit tests for pose recovery
  • Add example for full visual odometry pipeline with trajectory visualization
  • Implement keyframe selection (translation, rotation, match quality criteria)
  • Add 3D point triangulation/mapping
  • Create MapPoint struct for 3D points
  • Implement Triangulator for computing 3D points from 2D correspondences
  • Add point cloud export (PLY and JSON formats)
  • Create point cloud example with triangulation
  • Add real-time 3D visualization with Rerun
  • Map management - track points, deduplicate, prune outliers
  • Point reobservation - match against existing map points
  • Bundle adjustment - local BA for refining poses and points
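
The pose-tracking steps above (build a 4x4 transform from the recovered R and t, compose it into a global pose, store trajectory points, accumulate distance) can be sketched roughly as follows. This is a minimal illustration using plain arrays instead of the project's nalgebra types; all names here are hypothetical:

```rust
// Sketch of the global-pose bookkeeping: compose each frame-to-frame
// transform into a running 4x4 pose and accumulate distance traveled.
// Plain arrays stand in for the project's nalgebra types.
type Mat4 = [[f64; 4]; 4];

const IDENTITY: Mat4 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
];

/// Standard 4x4 matrix product.
fn mat4_mul(a: &Mat4, b: &Mat4) -> Mat4 {
    let mut out = [[0.0; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            for k in 0..4 {
                out[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    out
}

/// Fold the relative transform recovered from the essential matrix
/// into the global pose.
fn update_global_pose(global: &Mat4, relative: &Mat4) -> Mat4 {
    mat4_mul(global, relative)
}

/// Camera position is the translation column of the pose.
fn position(pose: &Mat4) -> [f64; 3] {
    [pose[0][3], pose[1][3], pose[2][3]]
}

fn dist(a: [f64; 3], b: [f64; 3]) -> f64 {
    ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2) + (a[2] - b[2]).powi(2)).sqrt()
}

fn main() {
    // Pretend each frame moves the camera 1 unit along +z.
    let mut step = IDENTITY;
    step[2][3] = 1.0;

    let mut global = IDENTITY;
    let mut trajectory = vec![position(&global)];
    let mut total = 0.0;
    for _ in 0..3 {
        global = update_global_pose(&global, &step);
        let p = position(&global);
        total += dist(*trajectory.last().unwrap(), p);
        trajectory.push(p);
    }
    assert!((total - 3.0).abs() < 1e-9);
    println!("distance traveled: {total}");
}
```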

Current Issues

  • Point cloud is sparse - only ~1000-2000 ORB features per frame
  • This is VO, not SLAM - missing loop closure and global optimization
  • Scale drift - no absolute scale, accumulates error over time
  • The bundle adjustment optimize function itself needs optimizing: dense LU is not a scalable solver here; a sparse approach ported from COLMAP could help.

SLAM Roadmap (Priority Order)

Basic SLAM Infrastructure (Current Focus)

  • Visual odometry (camera tracking)
  • Sparse 3D reconstruction (triangulation)
  • Real-time visualization with Rerun - see what's being mapped!
  • Map management - track which points exist, deduplicate, prune outliers
  • Point reobservation - match against existing map points, not just previous frame
  • Local bundle adjustment - optimize sliding window of keyframes and points
  • Monocular Depth Estimation - MonoDepth2 integration with tch-rs
  • Local mapping - maintain sliding window of recent keyframes and points with BA integration
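
The point-reobservation item above amounts to matching a new frame's descriptors against the whole map, not just the previous frame. A minimal sketch of that lookup by Hamming distance over 256-bit ORB descriptors; the names and threshold are assumptions, not the crate's actual API:

```rust
// Illustrative point-reobservation step: match a new frame's ORB
// descriptors against the existing map by Hamming distance.
fn hamming(a: &[u8; 32], b: &[u8; 32]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// For each frame descriptor, the index of the best-matching map point,
/// or None if even the best distance exceeds `max_dist`.
fn reobserve(frame: &[[u8; 32]], map: &[[u8; 32]], max_dist: u32) -> Vec<Option<usize>> {
    frame
        .iter()
        .map(|d| {
            map.iter()
                .enumerate()
                .map(|(i, m)| (hamming(d, m), i))
                .min()
                .filter(|&(dist, _)| dist <= max_dist)
                .map(|(_, i)| i)
        })
        .collect()
}

fn main() {
    let map = [[0u8; 32], [0xFFu8; 32]];
    let frame = [[0u8; 32]];
    // The all-zero descriptor reobserves map point 0 exactly.
    assert_eq!(reobserve(&frame, &map, 40), vec![Some(0)]);
    println!("reobservation sketch ok");
}
```

A real implementation would also gate matches by projecting map points into the frame first, so only geometrically plausible candidates are compared.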

Dense/Semi-Dense Reconstruction

  • Increase point density
    • Use SIFT/SURF (more features than ORB)
    • Semi-dense tracking (high gradient pixels, not just corners)
    • Depth map estimation between keyframes
  • Depth filtering - probabilistic depth estimation for each pixel
  • Depth fusion - merge depth estimates from multiple views
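
Depth filtering and fusion as listed above typically model each pixel's (inverse) depth as a Gaussian and merge estimates from different views by the standard product of Gaussians. A purely illustrative sketch of that fusion step; a real filter would also handle outliers and convergence checks:

```rust
// Fuse two Gaussian inverse-depth estimates (mean, variance).
fn fuse(d1: (f64, f64), d2: (f64, f64)) -> (f64, f64) {
    let (m1, v1) = d1;
    let (m2, v2) = d2;
    // Precision-weighted mean; variances combine harmonically,
    // so the fused estimate is always more certain than either input.
    let v = (v1 * v2) / (v1 + v2);
    let m = (m1 * v2 + m2 * v1) / (v1 + v2);
    (m, v)
}

fn main() {
    // Two noisy inverse-depth estimates of the same pixel.
    let fused = fuse((0.50, 0.02), (0.54, 0.04));
    // The fused variance is smaller than either input's.
    assert!(fused.1 < 0.02);
    println!("fused inverse depth: {:?}", fused);
}
```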

Loop Closure & Global Optimization

  • Place recognition - DBoW2/DBoW3 for detecting revisited locations
  • Loop closure detection - geometric verification of loop candidates
  • Pose graph optimization - correct drift when loop is detected
  • Global bundle adjustment - optimize all poses and points together (expand current local BA)
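
Geometric verification of a loop candidate usually reduces to fitting a model (e.g. an essential matrix via RANSAC) between the candidate pair and accepting the loop only if enough matches survive as inliers. A hedged sketch of that final acceptance check, with illustrative thresholds:

```rust
// Accept a loop-closure candidate only if the RANSAC inlier count and
// inlier ratio both clear (assumed) thresholds.
fn verify_loop(inliers: usize, total_matches: usize, min_inliers: usize, min_ratio: f64) -> bool {
    total_matches > 0
        && inliers >= min_inliers
        && (inliers as f64 / total_matches as f64) >= min_ratio
}

fn main() {
    assert!(verify_loop(80, 100, 30, 0.5)); // strong candidate: accept
    assert!(!verify_loop(12, 100, 30, 0.5)); // too few inliers: reject
    println!("loop verification sketch ok");
}
```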

Robustness & Production

  • Relocalization - recover from tracking loss
  • Map saving/loading - persist maps between runs
  • Multi-threading - separate tracking, local mapping, loop closing threads
  • IMU integration - use IMU for better tracking (VI-SLAM)
  • Camera calibration module - estimate intrinsics from video

Future Phases

  • Support stereo cameras (true scale from stereo baseline)
  • Support RGB-D cameras (direct depth from sensor)
  • Object detection integration
  • Semantic SLAM (label 3D points with objects)
  • Neural depth estimation (monocular depth networks)

Technical Debt

  • Handle degenerate cases (insufficient matches, pure rotation, etc.)
  • Better error handling throughout
  • More comprehensive tests
  • Benchmark on KITTI dataset with ground truth comparison
  • GPU acceleration (feature detection, matching)
  • Add more camera presets

Suggest a TODO?

  • Contributors: please submit a pull request, or add a TODO item here along with a ticket.

How to Run

Run feature visualizer:

cargo run --example visualize_features /path/to/video.mp4

Run full visual odometry with trajectory:

# Use default KITTI intrinsics
cargo run --example visual_odometry /path/to/video.mp4

# Specify custom camera intrinsics
cargo run --example visual_odometry /path/to/video.mp4 -- --fx 500 --fy 500 --cx 320 --cy 240

Run point cloud generation with triangulation:

# With Rerun 3D viewer (shows map, trajectory, matches, video in real-time!)
cargo run --example point_cloud --features rerun /path/to/video.mp4 -- --rerun

# Or save to PLY file (default, no Rerun)
cargo run --example point_cloud /path/to/video.mp4 -- --save-ply

# With custom camera intrinsics
cargo run --example point_cloud --features rerun /path/to/video.mp4 -- --rerun --fx 718.856 --fy 718.856 --cx 607.1928 --cy 185.2157

Run bundle adjustment demo:

cargo run --example bundle_adjustment

Run depth estimation:

# Single image
cargo run --example depth_estimation --features depth -- test.jpg --encoder weights/encoder.pt --decoder weights/depth.pt

# Video with Rerun visualization
cargo run --example depth_estimation --features depth,rerun -- test.mp4 --cuda --rerun

# Video with OpenCV (no Rerun)
cargo run --example depth_estimation --features depth -- test.mp4 --cuda --save

See docs/Deep-Learning.md for model installation and setup instructions.

Run tests:

cargo test

Run main:

cargo run -- /path/to/video.mp4