Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age

Author: Cesar Cadena

Year: 2016

Notes:

Introduction:

Do robots need SLAM? Is SLAM solved?
(1986-2004) = classical age: introduction of the basic probabilistic formulations
(2004-2015) = algorithmic analysis age: study of observability, convergence and consistency, role of sparsity
recent odometry systems have very low drift (<0.5% of the trajectory length): do we still need SLAM?
loop closure enables the robot to understand topology
Vision-based SLAM with slowly moving robots can be considered a mature research field
(2015-) = robust perception age: self-tuning capabilities, high-level understanding, resource awareness

Anatomy:

Posterior = belief over X given the measurements
when there is no prior p(x), maximum a posteriori (MAP) estimation reduces to maximum likelihood estimation
unlike Kalman filtering, the MAP formulation doesn't require a distinction between motion and measurement models, but Kalman filtering and MAP return the same estimate in the linear Gaussian case
connectivity of the factor graph influences the sparsity of the SLAM problem
the mismatch between filtering and MAP estimation gets smaller when the linearization point is good enough
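
A minimal sketch of the linear Gaussian equivalence noted above, assuming a scalar robot with two poses x0, x1, one odometry step u and one direct measurement z of x1. The function names and the numbers are made up for illustration. The MAP side treats the prior, the motion and the measurement as three factors of the same kind and solves the 2x2 normal equations; the information matrix it builds is also a tiny instance of the sparsity induced by factor graph connectivity.

```python
def kalman_step(mu0, P0, u, Q, z, R):
    """One predict + update cycle of a scalar Kalman filter."""
    mu_pred = mu0 + u            # motion model: x1 = x0 + u + noise(Q)
    P_pred = P0 + Q
    K = P_pred / (P_pred + R)    # gain for measurement model: z = x1 + noise(R)
    mu = mu_pred + K * (z - mu_pred)
    P = (1.0 - K) * P_pred
    return mu, P


def map_two_poses(mu0, P0, u, Q, z, R):
    """MAP estimate of [x0, x1] from three factors (prior, motion, measurement).

    Minimises (x0-mu0)^2/P0 + (x1-x0-u)^2/Q + (z-x1)^2/R by solving
    the normal equations H x = g, where H is the information matrix.
    """
    h00 = 1.0 / P0 + 1.0 / Q
    h01 = -1.0 / Q
    h11 = 1.0 / Q + 1.0 / R
    g0 = mu0 / P0 - u / Q
    g1 = u / Q + z / R
    det = h00 * h11 - h01 * h01
    x0 = (h11 * g0 - h01 * g1) / det
    x1 = (h00 * g1 - h01 * g0) / det
    var_x1 = h00 / det           # marginal variance of x1 = inverse(H)[1][1]
    return x0, x1, var_x1


kf_mean, kf_var = kalman_step(mu0=0.0, P0=1.0, u=1.0, Q=0.5, z=1.4, R=0.2)
map_x0, map_x1, map_var = map_two_poses(mu0=0.0, P0=1.0, u=1.0, Q=0.5, z=1.4, R=0.2)
```

The filtered mean and variance of x1 match the MAP values for x1. MAP additionally returns a refined x0, which the filter has already marginalized out.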

Long term autonomy robustness:

addressing failure modes for long term exploration
bag-of-words place recognition avoids brute-force feature matching for long-term data association
make the system failure-aware? -> recovery mechanisms with tighter communication between front end and back end
for metric relocalization: place recognition with one sensor (camera) but registration with another (lidar)
automatic parameter tuning
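
A rough sketch of the bag-of-words idea mentioned above, assuming each image has already been quantized into a list of integer visual-word ids (the vocabulary, the ids and the helper names are hypothetical). Each image becomes a normalized word histogram, and loop-closure candidates are ranked by cosine similarity instead of matching raw features against every stored image.

```python
import math
from collections import Counter


def bow_vector(word_ids):
    """L2-normalised histogram of visual-word ids, stored sparsely as a dict."""
    counts = Counter(word_ids)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()}


def similarity(a, b):
    """Cosine similarity between two sparse, already normalised BoW vectors."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def best_match(query, database):
    """Return (index, score) of the stored image most similar to the query."""
    scores = [similarity(query, d) for d in database]
    i = max(range(len(scores)), key=scores.__getitem__)
    return i, scores[i]


# Three previously visited places and one new query image (made-up word ids).
database = [bow_vector(ids) for ids in ([1, 2, 2, 7], [3, 4, 4, 9], [5, 5, 6, 8])]
query = bow_vector([4, 4, 3, 10])
index, score = best_match(query, database)
```

Practical systems usually add tf-idf weighting, an inverted index and a geometric verification step; none of that is shown here.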

Long term autonomy scalability:

design SLAM systems with bounded memory complexity
sparsification (YEAH!) vs out-of-core algorithms (parallel SLAM)
continuous-time trajectory estimation
open problems: map representation (compressed known map), what can be forgotten?, severe computational constraints

Metric map models:

in 2D either occupancy grid or landmark based
TSDF (truncated signed distance function): space is discretized into a set of voxels $v$, each storing $d(v)$, the signed distance to the nearest surface, and a weight $w(v)$
direct methods require a GPU for real-time operation
how should we choose optimal representation?
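
A minimal sketch of how a single TSDF voxel could be maintained, assuming the common weighted running-average fusion rule (the notes only define $d(v)$ and $w(v)$; the update rule, the truncation distance `trunc` and the cap `w_max` are assumptions for illustration).

```python
def update_voxel(d, w, d_obs, w_obs=1.0, trunc=0.1, w_max=100.0):
    """Fuse one signed-distance observation into a voxel storing (d, w)."""
    # Truncate the observed signed distance to [-trunc, +trunc].
    d_obs = max(-trunc, min(trunc, d_obs))
    # Weighted running average of the signed distance.
    d_new = (w * d + w_obs * d_obs) / (w + w_obs)
    # Cap the weight so the voxel can still adapt to later changes.
    w_new = min(w + w_obs, w_max)
    return d_new, w_new


# One voxel, initially unobserved, receiving three measurements (metres).
d, w = 0.0, 0.0
for measurement in (0.04, 0.06, 0.5):
    d, w = update_voxel(d, w, measurement)
```

The third measurement (0.5) is clipped to the truncation distance before fusion, so a far-away surface cannot dominate the average.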

Semantic Map models:

move from path planning to task planning
topological mapping: building a graph with distinguishable places
SLAM helps semantics: use of MonoSLAM to boost object detection in videos, 3D map for classification
Semantics helps SLAM: use prior knowledge of object shape to improve the map
joint SLAM and semantic inference
semantic-based reasoning, actionability, levels of semantics... many concepts that come with it

New theoretical tools for SLAM:

factor graph optimization provides an elegant framework which is more amenable to analysis
Carlone shows that 2D rotation estimation can be computed in closed form
Carlone and Dellaert show that in most cases the ML estimate is unique and the pose graph can be solved globally via convex semidefinite programming
computing a suitable initialization is also a good start
theoretical work done on posegraph, can it be extended to other type of factor graphs?

Active SLAM:

problem of controlling the robot's motion in order to minimize the uncertainty of its map representation and localisation
3 steps:
- selecting vantage points
- computing the utility of each action thinking about the evolution of the posterior over the robot pose and the map
- estimating if a task is over
predicting the effects of future action is an expensive task => when do we switch between active SLAM and passive SLAM
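
The three steps above can be sketched on a toy problem, assuming a scalar robot position with Gaussian uncertainty. Each candidate action is described only by hypothetical motion noise `Q` and measurement noise `R` at the vantage point it reaches. Utility here is the predicted entropy reduction of the pose posterior, which is one possible choice among several, and the stopping threshold `min_gain` is made up.

```python
import math


def predicted_variance(P, Q, R):
    """Posterior variance after moving (noise Q) and measuring (noise R)."""
    P_pred = P + Q
    return P_pred * R / (P_pred + R)


def entropy(var):
    """Differential entropy of a scalar Gaussian with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)


def choose_action(P, actions, min_gain=0.05):
    """Pick the action with the largest predicted entropy reduction.

    actions maps a name to (Q, R). Returns (name, gain), with name set to
    None when no action is worth taking (step 3: the task is considered over).
    """
    gains = {
        name: entropy(P) - entropy(predicted_variance(P, Q, R))
        for name, (Q, R) in actions.items()
    }
    best = max(gains, key=gains.get)
    if gains[best] < min_gain:
        return None, gains[best]
    return best, gains[best]


candidates = {
    "revisit_landmark": (0.1, 0.2),   # short move, informative measurement
    "explore_corridor": (0.5, 5.0),   # long move, weak measurement
}
best, gain = choose_action(1.0, candidates)
stop, stop_gain = choose_action(1.0, {"explore_corridor": (0.5, 5.0)})
```

This only scores pose uncertainty; a fuller utility would also account for map uncertainty and the cost of executing each action.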

New frontiers:

new sensors: depth cameras, light field cameras (?)
event-based cameras: send no frames, only per-pixel changes, 1 MHz, 140 dB of dynamic range => SLAM for high-speed motion and high dynamic range
BUT the high event rate makes SLAM intractable because of the amount of data; low spatial resolution
deep learning: depth estimation, 6-DOF localization, inter-frame pose with DNNs
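
A small sketch of the event-based sensing model noted above, assuming the usual idealization in which a pixel fires an event each time its log intensity changes by a fixed contrast threshold. The threshold value, the sample data and the function name are illustrative assumptions, not taken from the paper.

```python
import math


def events_from_samples(samples, threshold=0.2):
    """Turn (time, intensity) samples of ONE pixel into (time, polarity) events.

    An event is emitted each time the log intensity moves by `threshold`
    away from the level at which the previous event was emitted.
    """
    events = []
    ref = math.log(samples[0][1])
    for t, intensity in samples[1:]:
        log_i = math.log(intensity)
        while abs(log_i - ref) >= threshold:
            polarity = 1 if log_i > ref else -1
            events.append((t, polarity))
            ref += polarity * threshold
    return events


# A pixel that stays constant, brightens, stays constant, then darkens.
pixel = [(0.0, 100.0), (1e-3, 100.0), (2e-3, 150.0), (3e-3, 150.0), (4e-3, 90.0)]
events = events_from_samples(pixel)
static = events_from_samples([(0.0, 50.0), (1.0, 50.0)])
```

A static pixel produces no output at all, which is why the data rate depends on scene motion rather than on a fixed frame rate.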