Conversation


@cdtwigg cdtwigg commented Jan 21, 2026

Summary:
Changes the tJacobian_ and tResidual_ member variables in SolverFunctionT from
MatrixX/VectorX to ResizeableMatrix. This provides more efficient memory management:

  • ResizeableMatrix is backed by a std::vector, which keeps its capacity when the
    logical size decreases and only reallocates when that capacity is exceeded
  • Uses resizeAndSetZero() which combines resize and zeroing in one operation
  • Provides aligned memory access via Eigen::Aligned16 maps

Benchmarks show ~1% improvement for double-precision workloads on the getJtJR()
path. Float performance is unchanged.

Also added ResizeableMatrix benchmarks to jtj_interface_benchmark.cpp for
comparison testing.
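As a rough illustration of the first two points, a ResizeableMatrix-like buffer can be sketched as below. This is a hypothetical, minimal sketch, not the actual Momentum class; the Eigen::Aligned16 map accessors are omitted so the sketch stays self-contained.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a ResizeableMatrix-style buffer. The backing
// std::vector keeps its capacity when the logical size shrinks, so
// repeated resizes across solver iterations avoid reallocation. The real
// class additionally exposes Eigen::Map<..., Eigen::Aligned16> views over
// this storage; that part is omitted here.
class ResizeableMatrixSketch {
 public:
  // Resize the logical dimensions and zero the contents in one operation.
  void resizeAndSetZero(std::size_t rows, std::size_t cols) {
    rows_ = rows;
    cols_ = cols;
    const std::size_t n = rows * cols;
    if (n > data_.size()) {
      data_.assign(n, 0.0);  // grows; reallocates only past capacity
    } else {
      std::fill(data_.begin(), data_.begin() + n, 0.0);  // no reallocation
    }
  }
  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }
  std::size_t capacity() const { return data_.capacity(); }
  double* data() { return data_.data(); }

 private:
  std::vector<double> data_;
  std::size_t rows_ = 0;
  std::size_t cols_ = 0;
};
```

Shrinking from 100x10 to 10x10 with this sketch only zeroes the live region; the vector's capacity (and allocation) is untouched.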

Reviewed By: jeongseok-meta

Differential Revision: D90813663

The meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Jan 21, 2026.

meta-codesync bot commented Jan 21, 2026

@cdtwigg has exported this pull request. If you are a Meta employee, you can view the originating Diff in D90813663.

cdtwigg added a commit that referenced this pull request Jan 21, 2026
cdtwigg added a commit that referenced this pull request Jan 29, 2026

Summary:
Introduced a new block-wise interface for computing Jacobians that allows
solvers to access individual Jacobian blocks without requiring the full Jacobian
matrix to be materialized in memory. Previously, getJtJR() computed JtJ block-by-block
internally, which limited optimization opportunities in the solver. The new interface
exposes initializeJacobianComputation(), getJacobianBlockCount(), getJacobianBlockSize(),
and computeJacobianBlock() methods that enable the solver to control how blocks are
processed and combined.

Both getJacobian() and getJtJR() are now implemented in the parent SolverFunctionT
class using the block interface, removing code duplication from SkeletonSolverFunctionT.
The block interface becomes the primitive, with the full-matrix methods being derived
implementations. This provides flexibility for solvers to implement custom optimization
strategies while maintaining backward compatibility.
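Under the simplifying assumption that matrices are flat row-major std::vector<double> arrays (the real code uses Eigen types, and the exact signatures may differ), deriving getJtJR() from the block primitive can be sketched as:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the block-wise Jacobian interface. Method names
// mirror the summary above; the types are simplified so the sketch is
// self-contained.
struct JacobianBlock {
  std::vector<double> J;  // rows x numParams, row-major
  std::vector<double> r;  // residual, length rows
  std::size_t rows = 0;
};

class BlockwiseFunction {
 public:
  BlockwiseFunction(std::vector<JacobianBlock> blocks, std::size_t numParams)
      : blocks_(std::move(blocks)), numParams_(numParams) {}

  std::size_t getJacobianBlockCount() const { return blocks_.size(); }
  std::size_t getJacobianBlockSize(std::size_t i) const { return blocks_[i].rows; }
  const JacobianBlock& computeJacobianBlock(std::size_t i) const { return blocks_[i]; }

  // getJtJR() derived from the block primitive: accumulate JtJ and JtR
  // block by block, never materializing the full Jacobian.
  void getJtJR(std::vector<double>& JtJ, std::vector<double>& JtR) const {
    JtJ.assign(numParams_ * numParams_, 0.0);
    JtR.assign(numParams_, 0.0);
    for (std::size_t b = 0; b < getJacobianBlockCount(); ++b) {
      const JacobianBlock& blk = computeJacobianBlock(b);
      for (std::size_t row = 0; row < blk.rows; ++row) {
        const double* Jrow = &blk.J[row * numParams_];
        for (std::size_t i = 0; i < numParams_; ++i) {
          JtR[i] += Jrow[i] * blk.r[row];
          for (std::size_t j = 0; j < numParams_; ++j)
            JtJ[i * numParams_ + j] += Jrow[i] * Jrow[j];
        }
      }
    }
  }

 private:
  std::vector<JacobianBlock> blocks_;
  std::size_t numParams_;
};
```

With this shape, a solver is free to consume blocks one at a time, regroup them, or assemble the full Jacobian, which is the flexibility the summary describes.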

Differential Revision: D89145782

Summary:
Split the GaussNewtonSolver into two separate classes: GaussNewtonSolverT
(dense) and SparseGaussNewtonSolverT (sparse). Previously, GaussNewtonSolver automatically
switched between dense and sparse implementations based on sparseMatrixThreshold,
creating code complexity and mixing two distinct implementations in one class.

Changes:
- Created SparseGaussNewtonSolverT in gauss_newton_solver_sparse.{h,cpp}
- Moved all sparse-specific code (sparse matrix operations, SimplicialLLT, etc.) to sparse solver
- Removed sparse-specific options from GaussNewtonSolverOptions (useBlockJtJ, directSparseJtJ, sparseMatrixThreshold)
- Updated GaussNewtonSolverT to be dense-only, removed doIterationSparse() method
- Updated all tests to remove references to removed sparse options
- Updated transform_pose.cpp and other users to not set removed options
- Added SparseGaussNewtonSolverOptions with sparse-specific options

Benefits:
- Clearer separation of concerns - each solver focuses on one implementation
- Simpler code - no runtime switching between dense/sparse
- Easier to maintain and test each solver independently
- Users explicitly choose which solver to use based on their needs
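The resulting structure can be sketched roughly as follows. All names below are illustrative stand-ins, not the actual Momentum API: each solver carries only the options that apply to it, and callers pick the dense or sparse implementation explicitly instead of relying on a runtime threshold switch.

```cpp
#include <cstring>

// Hypothetical sketch: dense-only and sparse-only solvers with separate
// option structs, chosen explicitly by the caller.
struct GaussNewtonSolverOptionsSketch {
  double regularization = 1e-4;  // dense-only options live here
};

struct SparseGaussNewtonSolverOptionsSketch {
  double regularization = 1e-4;
  bool directSparseJtJ = true;  // sparse-specific options live here
};

class GaussNewtonSolverSketch {  // dense-only: no doIterationSparse()
 public:
  explicit GaussNewtonSolverSketch(GaussNewtonSolverOptionsSketch o) : opts_(o) {}
  const char* kind() const { return "dense"; }
 private:
  GaussNewtonSolverOptionsSketch opts_;
};

class SparseGaussNewtonSolverSketch {  // sparse-only: SimplicialLLT etc.
 public:
  explicit SparseGaussNewtonSolverSketch(SparseGaussNewtonSolverOptionsSketch o)
      : opts_(o) {}
  const char* kind() const { return "sparse"; }
 private:
  SparseGaussNewtonSolverOptionsSketch opts_;
};
```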

Differential Revision: D89145780

Summary:
The base class SolverT::solve() already initializes lastError_ to max before
calling initializeSolver(), so the derived solvers don't need to do it again.

Differential Revision: D91158885

Summary:
The alpha_ member was intended to allow per-iteration adjustment of the
regularization parameter, but it was never actually modified during iterations.
It was simply copied from regularization_ at initialization and used unchanged.
This removes the unnecessary indirection and uses regularization_ directly.

Differential Revision: D91158884

Summary:
Refactors GaussNewtonSolverQRT to use the generic SolverFunctionT interface with
the new blockwise Jacobian API instead of directly depending on SkeletonSolverFunctionT.

Key changes:
- Constructor now takes SolverFunctionT* instead of SkeletonSolverFunctionT*
- Removed internal SkeletonState and MeshState management (now handled by SolverFunction
  via initializeJacobianComputation)
- Replaced direct iteration over error functions with the blockwise interface:
  getJacobianBlockCount(), getJacobianBlockSize(), computeJacobianBlock()
- Added vec() accessor to ResizeableMatrix for single-column matrix access

This makes the QR solver more generic and decoupled from skeleton-specific code,
allowing it to work with any SolverFunction that implements the blockwise interface.
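The idea behind the vec() accessor can be sketched as below. This is a hypothetical stand-in: the real accessor presumably returns an Eigen vector map over the ResizeableMatrix storage, while here a raw pointer/length pair plays that role so the sketch stays self-contained.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of a vec()-style accessor: for a buffer holding a
// single column, vec() hands back a view of the contiguous storage so
// callers can treat it as a vector without copying.
class SingleColumnBufferSketch {
 public:
  void resizeAndSetZero(std::size_t rows) { data_.assign(rows, 0.0); }

  // View of the single column as a vector (pointer + length).
  std::pair<double*, std::size_t> vec() { return {data_.data(), data_.size()}; }

 private:
  std::vector<double> data_;
};
```

Writes through the returned view land directly in the underlying storage, which is what lets the QR solver consume residual columns without an intermediate copy.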

Differential Revision: D90812388

…port

Summary:
Two related optimizations to GaussNewtonSolver:

**Chunked Jacobian Accumulation:**
- Uses a single large JᵀJ matrix multiplication per chunk instead of many small rankUpdate() calls
- Adds targetRowsPerChunk option to control memory vs speed trade-off:
  - SIZE_MAX (default): Process all rows in one chunk (fastest, most memory)
  - 0: Process each error function block separately (slowest, minimum memory)
  - Intermediate values: Group blocks via bin-packing up to target row count
- Uses ResizeableMatrix for Jacobian/residual storage to avoid reallocations

**BlockJtJ Support:**
- Adds useBlockJtJ option with proper handling of non-contiguous parameter subsets
- Simplified the parameter mapping to a plain vector<int> enabledParameters_
- Added computeJtJFromBlockJtJ() with in-place row/column compaction, avoiding temporary allocation
- In-place approach shows 6-17% speedup over temporary-allocation approach

Performance results (ManyBlocks benchmark with 26 error functions, 10 iterations):
| Solver              | float (μs) | double (μs) |
|---------------------|------------|-------------|
| GaussNewton         | 244        | 257         |
| SubsetGaussNewton   | 243        | 256         |

GaussNewton now matches SubsetGaussNewton performance (within 1%).
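The targetRowsPerChunk grouping can be sketched as a greedy pass over consecutive blocks. This is an illustration of the three regimes only; the solver's actual bin-packing policy may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of targetRowsPerChunk grouping: consecutive
// error-function blocks are packed into chunks until adding the next
// block would exceed the target row count. SIZE_MAX yields one big chunk
// (fastest, most memory); 0 yields one chunk per block (slowest, least
// memory); intermediate values trade between the two.
std::vector<std::vector<std::size_t>> packBlocksIntoChunks(
    const std::vector<std::size_t>& blockRows, std::size_t targetRowsPerChunk) {
  std::vector<std::vector<std::size_t>> chunks;
  std::vector<std::size_t> current;
  std::size_t currentRows = 0;
  for (std::size_t b = 0; b < blockRows.size(); ++b) {
    // Close the current chunk if adding this block would exceed the target.
    if (!current.empty() && currentRows + blockRows[b] > targetRowsPerChunk) {
      chunks.push_back(current);
      current.clear();
      currentRows = 0;
    }
    current.push_back(b);
    currentRows += blockRows[b];
  }
  if (!current.empty()) chunks.push_back(current);
  return chunks;
}
```

Each resulting chunk can then be accumulated with a single JᵀJ multiplication over its stacked rows instead of one rankUpdate() per block.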

Differential Revision: D89145781