Optimize Levenberg-Marquardt elimination by caching JunctionTree #2340
Conversation
woohoo, that's a big win! Will review Thursday.
Pull request overview
This PR introduces an optimization to Levenberg-Marquardt iterations by caching the elimination tree structure across iterations. A new JunctionIndexEliminationTree class stores cluster roots using factor indices instead of full factor objects, avoiding repeated elimination tree construction.
Key changes:
- New `JunctionIndexEliminationTree` class that caches elimination tree structure using factor indices
- Modified `GaussianFactorGraph::optimize()` to use the cached elimination tree when available
- Added a caching mechanism in `LevenbergMarquardtOptimizer` for the damped-system elimination tree
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 19 comments.
| File | Description |
|---|---|
| gtsam/linear/JunctionIndexEliminationTree.h | Header defining the new JunctionIndexEliminationTree class for cached elimination |
| gtsam/linear/JunctionIndexEliminationTree-inl.h | Internal IndexedCluster data structure implementation |
| gtsam/linear/JunctionIndexEliminationTree.cpp | Implementation of index-based elimination tree construction and elimination |
| gtsam/nonlinear/LevenbergMarquardtOptimizer.h | Added cached elimination tree member variable |
| gtsam/nonlinear/LevenbergMarquardtOptimizer.cpp | Cache creation and usage in tryLambda method |
| gtsam/linear/GaussianFactorGraph.h | Added setCachedClusterRoots method and cache member |
| gtsam/linear/GaussianFactorGraph.cpp | Modified optimize methods to use cached elimination tree |
| gtsam/linear/tests/testJunctionIndexEliminationTree.cpp | Comprehensive unit tests for JunctionIndexEliminationTree |
@tzvist I have not read the code in detail yet; it would help me to know the philosophy a little bit better. How do the new values of the damped system enter into the equation? Am I right to assume the system is still built as usual, but because you’re using indices, the new factors are accessed?
Yes, the damped system is built exactly the same as before (the buildDampedSystem call at line 143), but instead of recomputing the elimination structure every time, we now cache a JunctionIndexEliminationTree that only stores the indices into the factor graph.
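The caching idea can be illustrated with a minimal, self-contained sketch (the class and member names here are hypothetical stand-ins, not the actual GTSAM types): the cluster structure stores factor *indices*, so when the damped system is rebuilt each LM iteration, the same cached structure automatically sees the updated factor values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a linear factor; here just a scalar "value".
using Factor = double;
using FactorGraph = std::vector<Factor>;

// One cluster of the cached tree: indices into the factor graph,
// not copies of the factors themselves.
struct IndexedCluster {
  std::vector<std::size_t> factorIndices;
};

// Built once from the symbolic structure; reused every iteration.
struct CachedEliminationTree {
  std::vector<IndexedCluster> clusters;

  // Toy "elimination": combine the *current* factor values per cluster.
  // The point: the cached indices pick up the updated damped factors.
  std::vector<double> eliminate(const FactorGraph& graph) const {
    std::vector<double> perCluster;
    for (const auto& c : clusters) {
      double sum = 0.0;
      for (std::size_t i : c.factorIndices) sum += graph[i];
      perCluster.push_back(sum);
    }
    return perCluster;
  }
};
```

With this shape, an optimizer can mutate the graph in place between iterations (as buildDampedSystem does to the damping factors) and call `eliminate` again without reconstructing the tree.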
I think you have the right idea here. I’ve actually used this idea recently in the new multifrontal solver as well. By the way, if you’re not aware of it, here is the link to the last PR that I just submitted: #2343. That new solver is on track to be integrated with the nonlinear optimizers, hopefully to get even more speed-up than 75%. But I think that your idea can be used to speed up any eliminatable factor graph, including hybrid, so I’m interested in exploring this regardless. Also, only linear systems without constraints can be sped up by the new multifrontal solver. That being said, I’d prefer if we could discuss the API and make it a little bit more functional. Would you be available to meet some time next week?
Sure, I’d be happy to discuss the API.
Awesome. Here is a link that knows about time zones and my calendar :-) https://calendly.com/dellaert/teams-meeting
I have two more performance-oriented PRs: maybe it would be best if you could take a look at them as well, and then we can meet next week and discuss them all.
Please take a look at the junction tree I built in the multi-frontal solver. I use a symbolic factor with some extra payload. I think it might be a good strategy to template the symbolic factor with an optional payload, so we can just use the strategy in both places. In terms of using this junction tree (or elimination tree), which will now store the extra payload in its clique factors, I'd rather see that used in a functional interface rather than storing something in a mutable field in the Gaussian FG |
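The "templated symbolic factor with an optional payload" suggestion could look something like the following sketch (names are hypothetical, not GTSAM's actual classes): the symbolic machinery only needs the keys, and an optional payload rides along so the junction tree's clique factors can carry extra data, such as the index into the original graph, without introducing new tree classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Empty {};  // default: a plain symbolic factor with no payload

template <typename Payload = Empty>
struct SymbolicFactorT {
  std::vector<std::size_t> keys;  // what symbolic elimination looks at
  Payload payload;                // extra data carried through elimination
};

// Plain symbolic use, as in existing symbolic elimination:
using SymbolicFactor = SymbolicFactorT<>;

// Indexed use, as in this PR: the payload is the factor's index
// into the original (Gaussian) factor graph.
using IndexedSymbolicFactor = SymbolicFactorT<std::size_t>;
```

Because both instantiations share the same elimination code path, the same junction-tree machinery would serve both the multifrontal solver and the LM cache, which is the reuse being discussed here.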
I have added "Refactor MultifrontalSolver to use JunctionIndexEliminationTree". I started doing the templating, but this was a simpler approach; is it good in your opinion? Also, do you think we should try to add mergeSmallClusters to the existing LM as well?
I think we're tripping over ourselves a little here :-) I would prefer to hold off on this one. I'm hesitant about adding entire junction trees and cluster trees. Basically, by adapting the symbolic factor just a tiny bit, we get all the machinery without touching those larger classes. I have another refactor in the works that splits out the SymbolicAnalysis for the new solvers. For that, I might try my hand at the templated approach first, potentially accommodating your case as well. When it's done, you can check it out and see whether we can take your idea and make it work in a functional API.
Force-pushed from ea2455d to bbf7783
Could you review the last two commits? I think they’re close to what you had in mind. |
dellaert left a comment
Alright, I think we are starting to converge :-)
I still have structural comments and I'm hoping that you'll bear with me while we collaborate on making this a really cool addition to GTSAM :-)
Force-pushed from c9b2998 to 9ee0d42
dellaert left a comment
Okay, this is brilliant. Can you feel the stars align? :-)
Super happy with this right now. I propose we also wait for the benchmarking PR by @ProfFan so we can merge with confidence.
Cool, maybe it will be interesting to compare the multifrontal solver with LM after my changes.
Force-pushed from 3bc79fa to a3d41d8
/bench |
timeSFMBAL benchmark
/bench |
@dellaert Benchmark results look good :)
They look mixed, right? It seems like 25% slower on Mac, no? I’m wondering whether that BAL16 dataset is too small, as well. I’ve been using the one with 135 cameras.
Also, I’m now thinking we should run with and without TBB.
I can add that.
/bench |
Refactor IndexedSymbolicFactor into shared IndexedJunctionTreeBuilder.h, enabling efficient reuse of junction tree structure across multiple nonlinear optimization iterations.
Key changes:
- Add IndexedJunctionTreeBuilder.h with a buildIndexedJunctionTree() template
- Add GaussianFactorGraph::optimize() overload using a cached junction tree
- Cache SymbolicJunctionTree in NonlinearOptimizer for the multifrontal solve
- Refactor MultifrontalSolver and NonlinearMultifrontalSolver to use the shared builder functions
Performance improvement: reduces runtime from ~120s to ~90s (~25% faster) in specific test cases by avoiding repeated elimination tree construction.
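A shared builder template of this kind might have roughly the following shape (a hedged sketch under assumed names; the real `buildIndexedJunctionTree()` lives in GTSAM's IndexedJunctionTreeBuilder.h and is more involved): given any factor-graph-like container, record which factor indices belong to each cluster, here keyed by a caller-supplied function, so both the LM cache and the multifrontal solver can share the structure-building step.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical shared builder: group factor *indices* into clusters.
// Graph only needs size() and operator[]; KeyOf maps a factor to the
// cluster key it belongs to (in GTSAM this would come from the
// symbolic elimination ordering, which is elided here).
template <typename Graph, typename KeyOf>
std::map<std::size_t, std::vector<std::size_t>> buildIndexedJunctionTree(
    const Graph& graph, KeyOf keyOf) {
  std::map<std::size_t, std::vector<std::size_t>> clusters;
  for (std::size_t i = 0; i < graph.size(); ++i)
    clusters[keyOf(graph[i])].push_back(i);  // store the index, not the factor
  return clusters;
}
```

Since the template never touches the factor values, the returned index structure stays valid as long as the graph's layout is unchanged, which is exactly what successive nonlinear iterations guarantee.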
Force-pushed from a3d41d8 to 8633b7b
Rebased; maybe this will help.
/bench |
BTW, bench results will be refreshed at the original comment.
Oops, does not look good with TBB on… @tzvist any theories?
Turns out I had accidentally removed the parallel eliminate implementation. Luckily this benchmark exposed it. Pushed a fix commit.
/bench |
Now aligned with the ClusterTree implementation.
Force-pushed from 6ce0fa6 to 1cc5dec
My example running with TBB on 12 cores; "After" timings attached as an image.
Please rerun.
/bench |
@ProfFan I cannot find the new benchmark results.
They failed |
Oops. How can that be?
CI fails as well. @tzvist, can you please check whether you can compile (and build the wrapper) locally?
Force-pushed from 1f50b08 to 1cc5dec
Removed the problematic commit.
/bench
@dellaert bench results are up.
Add JunctionIndexEliminationTree class that stores cluster roots using factor indices instead of full factor objects, enabling efficient reuse of elimination tree structure across multiple Levenberg-Marquardt iterations.
Key changes:
- New JunctionIndexEliminationTree class that caches elimination tree structure using factor indices
- GaussianFactorGraph::optimize() overload that uses the cached elimination tree
- Caching of the damped-system elimination tree in LevenbergMarquardtOptimizer

Performance improvement: reduces runtime from ~120s to ~90s (~25% faster) in specific test cases by avoiding repeated elimination tree construction.