Draft of NVIDIA cuDSS (GPU sparse linear solver) integration in Ipopt #849
base: stable/3.14
Conversation
…er options (not finished yet). Will also need to modify the configure script and makefiles.
… cuDSS handles, config, data, etc... to actually call the factor/solve functions.
…echeck to be sure that everything is OK. Now trying to play with the configure scripts to understand how to link the correct libraries and provide the necessary header files (CUDA and cuDSS).
…configuration scripts to install.
…pCuDSS linear solver.
…files to build with cuDSS! Still to be solved: seg fault in the C example and with warm restart (maybe from reallocating the matrix and vectors twice?)
…) - cuDSS fails with error CUDSS_STATUS_ALLOC_FAILED (due to pointers being reallocated but not freed before Ipopt's second restart?). See the call-sequence sketch below.
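For context on the commits above, here is a minimal, self-contained sketch of the cuDSS call sequence they refer to (handle, solver config, solver data, the analysis/factorization/solve phases, and teardown). This is not the PR's code: the matrix is a toy example, the names are illustrative, and the calls follow the cuDSS 0.x C API as I understand it from the documentation, so double-check them against the version actually used.

```cpp
// Minimal cuDSS usage sketch (illustrative only, not the PR's linear-solver interface).
// Assumed build: nvcc cudss_sketch.cu -lcudss -o cudss_sketch
#include <cudss.h>
#include <cuda_runtime.h>
#include <cstdio>

#define CUDSS_CHECK(call)                                                       \
    do {                                                                        \
        cudssStatus_t s_ = (call);                                              \
        if (s_ != CUDSS_STATUS_SUCCESS) {                                       \
            std::printf("cuDSS error %d at line %d\n", (int)s_, __LINE__);      \
            return 1;                                                           \
        }                                                                       \
    } while (0)

int main()
{
    // Lower triangle of a 3x3 symmetric matrix in 0-based CSR:
    //   [ 4  1  0 ]
    //   [ 1  3  0 ]
    //   [ 0  0  2 ]
    const int64_t n = 3, nnz = 4;
    int    h_rowptr[] = {0, 1, 3, 4};
    int    h_colind[] = {0, 0, 1, 2};
    double h_val[]    = {4.0, 1.0, 3.0, 2.0};
    double h_b[]      = {1.0, 2.0, 3.0};

    int    *d_rowptr, *d_colind;
    double *d_val, *d_b, *d_x;
    cudaMalloc(&d_rowptr, sizeof(h_rowptr));
    cudaMalloc(&d_colind, sizeof(h_colind));
    cudaMalloc(&d_val,    sizeof(h_val));
    cudaMalloc(&d_b,      sizeof(h_b));
    cudaMalloc(&d_x,      n * sizeof(double));
    cudaMemcpy(d_rowptr, h_rowptr, sizeof(h_rowptr), cudaMemcpyHostToDevice);
    cudaMemcpy(d_colind, h_colind, sizeof(h_colind), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val,    h_val,    sizeof(h_val),    cudaMemcpyHostToDevice);
    cudaMemcpy(d_b,      h_b,      sizeof(h_b),      cudaMemcpyHostToDevice);

    // Handle, solver configuration and solver data (the objects mentioned in the commits).
    cudssHandle_t handle;  CUDSS_CHECK(cudssCreate(&handle));
    cudssConfig_t config;  CUDSS_CHECK(cudssConfigCreate(&config));
    cudssData_t   data;    CUDSS_CHECK(cudssDataCreate(handle, &data));

    // Wrap the CSR data (note the int64_t dimensions) and the dense rhs/solution vectors.
    cudssMatrix_t A, x, b;
    CUDSS_CHECK(cudssMatrixCreateCsr(&A, n, n, nnz, d_rowptr, NULL, d_colind, d_val,
                                     CUDA_R_32I, CUDA_R_64F,
                                     CUDSS_MTYPE_SYMMETRIC, CUDSS_MVIEW_LOWER, CUDSS_BASE_ZERO));
    CUDSS_CHECK(cudssMatrixCreateDn(&x, n, 1, n, d_x, CUDA_R_64F, CUDSS_LAYOUT_COL_MAJOR));
    CUDSS_CHECK(cudssMatrixCreateDn(&b, n, 1, n, d_b, CUDA_R_64F, CUDSS_LAYOUT_COL_MAJOR));

    // The three phases: reordering/symbolic analysis, numerical factorization, triangular solves.
    CUDSS_CHECK(cudssExecute(handle, CUDSS_PHASE_ANALYSIS,      config, data, A, x, b));
    CUDSS_CHECK(cudssExecute(handle, CUDSS_PHASE_FACTORIZATION, config, data, A, x, b));
    CUDSS_CHECK(cudssExecute(handle, CUDSS_PHASE_SOLVE,         config, data, A, x, b));
    cudaDeviceSynchronize();

    double h_x[3];
    cudaMemcpy(h_x, d_x, sizeof(h_x), cudaMemcpyDeviceToHost);
    std::printf("x = %g %g %g\n", h_x[0], h_x[1], h_x[2]);

    // Teardown: destroy the cuDSS objects before a re-solve recreates them.
    cudssMatrixDestroy(A); cudssMatrixDestroy(x); cudssMatrixDestroy(b);
    cudssDataDestroy(handle, data);
    cudssConfigDestroy(config);
    cudssDestroy(handle);
    cudaFree(d_rowptr); cudaFree(d_colind); cudaFree(d_val); cudaFree(d_b); cudaFree(d_x);
    return 0;
}
```

The teardown order at the end is the part that seems relevant to the re-solve problem: the data/matrix objects hold device allocations, so recreating them without destroying the old ones would leak device memory, which is consistent with the ALLOC_FAILED hypothesis in the last commit above.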
Yes, definitely. There have been requests for such an interface before. It's great to have someone doing the actual work for this. I'll need to find some time (and machine) to look into this in more detail and try it out.
Great then! I will continue to work on this (testing with the provided examples and trying to add the other available cuDSS features).
It would actually require a complete revamp of Ipopt and a rewrite of the nonlinear solver. I developed the solver described in the paper you mentioned, and many components would need to be adapted, including the automatic differentiation and the way memory is allocated and provided from the GPU.

Even adapting the KKT systems to be solved without an LBL' factorization would already be challenging with your PR, since saddle-point systems are not guaranteed to admit such a factorization (unlike MA27, MA57, PARDISO, or MUMPS). What the solver typically does internally is perturb the pivots to ensure they are nonzero, but that has a strong impact on the numerical factors and on the quality of the descent direction. As a result, more iterations might be needed, if convergence can even be reached.

For Ipopt, the main benefit would only come from solving the linear systems on GPU, but with all the caveats mentioned above and the additional CPU–GPU communication overhead. Note that solving sparse linear systems is mostly limited by communication rather than computation, so the main advantage of multi-GPU support would be having more memory to store all factors on the device. I'm not saying it's impossible, but given the development effort, it would essentially mean building a new solver from scratch.
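For readers less familiar with Ipopt's internals, the saddle-point (augmented) system being discussed here is, roughly, the one from the Wächter–Biegler paper:

$$
\begin{bmatrix} W_k + \Sigma_k + \delta_w I & A_k \\ A_k^{T} & -\delta_c I \end{bmatrix}
\begin{bmatrix} d_x \\ d_\lambda \end{bmatrix}
= -\begin{bmatrix} \nabla \varphi_\mu(x_k) + A_k \lambda_k \\ c(x_k) \end{bmatrix},
\qquad A_k = \nabla c(x_k),
$$

where $W_k$ is the Hessian of the Lagrangian, $\Sigma_k$ the barrier term, and $\delta_w, \delta_c \ge 0$ regularization parameters. Ipopt asks the factorization for the inertia of this matrix and increases $\delta_w$ until the inertia is $(n, m, 0)$; perturbing pivots inside the solver changes the factors this decision is based on, which is the impact on the descent direction mentioned above.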
Yes, I guess that this could then be more of a long-term project (especially for the AD part). Maybe it could be possible (and easier) to adapt some AD source-code transformation tool to evaluate the gradients and Hessian directly on the GPU.
Thank you very much for the help and the feedback! Do you think that using the LDU factorization (with global pivoting) available in cuDSS (even if this means switching from a symmetric matrix to a general matrix) would improve robustness?
If we can evaluate the objective and constraints on the GPU, it should not be too hard to perform "forward" AD on them to compute gradients, Jacobians, and Hessians. The coloring itself is purely CPU-friendly (like the ordering for KKT systems), but the directional derivatives can be evaluated on the GPU. Vector-mode AD can easily exploit the GPU to compute compressed columns of Jacobians and Hessians. The same applies to the decompression phase, which can also be efficiently vectorized on GPU.

For reference, I'm adding a blog post about these topics: https://iclr-blogposts.github.io/2025/blog/sparse-autodiff/

Another paradigm would be to use a dedicated modeling language or DSL (like AMPL, JuMP, or GAMS) where we can define the objective and constraints as generators, such that we can "kernelize" the Jacobian and Hessian when we have redundant sub-expressions. In both cases, one issue we will face is how to tell Ipopt whether the vectors are in host, device, or unified memory.
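To make the vector-mode picture concrete, here is a toy sketch (not code from this PR or from the blog post): each GPU thread evaluates the forward directional derivatives of one residual component for a small set of seed vectors, producing compressed Jacobian columns that are decompressed afterwards. The residual, the coloring, and all names are made up for illustration, and the derivative rules are hand-written rather than generated by an AD tool.

```cpp
// Toy vector-mode forward sketch: evaluate J(x) * S on the GPU, where S holds one seed
// vector per "color".  Residual: F_i(x) = x_i^2 + x_{i+1}  (F_{n-1} = x_{n-1}^2), so row i
// of the Jacobian only touches columns i and i+1, and two colors suffice.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

__global__ void jacobian_times_seeds(const double* x, const double* S, double* C,
                                     int n, int ncolors)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one output row per thread
    if (i >= n) return;
    for (int c = 0; c < ncolors; ++c) {
        double sx_i  = S[i * ncolors + c];                               // seed of x_i
        double sx_ip = (i + 1 < n) ? S[(i + 1) * ncolors + c] : 0.0;     // seed of x_{i+1}
        // Forward rule for F_i = x_i^2 + x_{i+1}:  dF_i = 2*x_i*dx_i + dx_{i+1}
        C[i * ncolors + c] = 2.0 * x[i] * sx_i + sx_ip;
    }
}

int main()
{
    const int n = 8, kColors = 2;
    std::vector<double> h_x(n), h_S(n * kColors, 0.0), h_C(n * kColors);
    for (int i = 0; i < n; ++i) {
        h_x[i] = 1.0 + 0.1 * i;
        h_S[i * kColors + (i % kColors)] = 1.0;   // coloring: variable i gets color i mod 2
    }

    double *d_x, *d_S, *d_C;
    cudaMalloc(&d_x, n * sizeof(double));
    cudaMalloc(&d_S, n * kColors * sizeof(double));
    cudaMalloc(&d_C, n * kColors * sizeof(double));
    cudaMemcpy(d_x, h_x.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_S, h_S.data(), n * kColors * sizeof(double), cudaMemcpyHostToDevice);

    jacobian_times_seeds<<<(n + 127) / 128, 128>>>(d_x, d_S, d_C, n, kColors);
    cudaMemcpy(h_C.data(), d_C, n * kColors * sizeof(double), cudaMemcpyDeviceToHost);

    // Decompression (could also run on the GPU): entry (i, j) of J lives in the column
    // of C that corresponds to the color of variable j.
    for (int i = 0; i < n; ++i) {
        std::printf("dF%d/dx%d = %g", i, i, h_C[i * kColors + (i % kColors)]);
        if (i + 1 < n)
            std::printf(", dF%d/dx%d = %g", i, i + 1, h_C[i * kColors + ((i + 1) % kColors)]);
        std::printf("\n");
    }
    cudaFree(d_x); cudaFree(d_S); cudaFree(d_C);
    return 0;
}
```

In a real setup, the seeds would come from a coloring of the Jacobian/Hessian sparsity pattern computed on the CPU, exactly as described in the comment above.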
Yes, I highly expect the LDU factorization of cuDSS to be more robust for the KKT systems in Ipopt.
Great, then I will proceed to change the interface so that the user can choose between the two algorithms (LDL' on the symmetric matrix and LDU on the general matrix) via Ipopt options, as sketched below.
I guess this remains the main issue to be solved for "porting" it to GPU... Maybe it would be simpler to just fork Ipopt and try to develop a GPU-only version rather than trying to adapt everything to work on GPU.
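Just as a sketch of what that switch could boil down to on the cuDSS side (the option name, values, and function below are hypothetical, not from this PR): the choice mainly affects which matrix type and view are passed when the KKT matrix object is created.

```cpp
// Sketch: map a hypothetical Ipopt option value ("ldlt" vs "ldu") to the cuDSS matrix
// type/view used when wrapping the KKT matrix.  Not the PR's actual option handling.
#include <cudss.h>
#include <cstring>

static cudssStatus_t create_kkt_matrix(
    cudssMatrix_t* A, int64_t n, int64_t nnz,
    void* d_rowptr, void* d_colind, void* d_val,
    const char* algo /* "ldlt" or "ldu" (hypothetical option values) */)
{
    if (std::strcmp(algo, "ldu") == 0)
        // General matrix: both triangles must be stored (roughly doubling the nonzeros),
        // and cuDSS applies the pivoted LDU factorization discussed above.
        return cudssMatrixCreateCsr(A, n, n, nnz, d_rowptr, nullptr, d_colind, d_val,
                                    CUDA_R_32I, CUDA_R_64F,
                                    CUDSS_MTYPE_GENERAL, CUDSS_MVIEW_FULL, CUDSS_BASE_ZERO);
    // Default: symmetric indefinite LDL' from the lower triangle only.
    return cudssMatrixCreateCsr(A, n, n, nnz, d_rowptr, nullptr, d_colind, d_val,
                                CUDA_R_32I, CUDA_R_64F,
                                CUDSS_MTYPE_SYMMETRIC, CUDSS_MVIEW_LOWER, CUDSS_BASE_ZERO);
}
```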
… for AppVeyor tests).
… to work with cuDSS and avoid ambiguities (some calls need int64_t values for the matrix dimensions, for example; I think it's better to always use int64_t).
…lls with the same problem. Compiled and tested with the IPOPT_SINGLE option.
Hi, I am currently trying to integrate NVIDIA cuDSS 0.7.0 using CUDA 13 (https://docs.nvidia.com/cuda/cudss/) into Ipopt as an additional sparse linear solver option (running on the GPU).
Would you be interested in adding such a feature to Ipopt?
Unfortunately, I have no experience with autotools and how to correctly integrate/compile the new files using the existing toolchain (nvcc is needed to compile at least one file). For now I have added some code to the configure script and the Makefiles to compile everything. The procedure I am currently using is to first build a shared library with nvcc (libCuDSS folder) and then link this library directly to Ipopt.
I was able to run the tests shipped with Ipopt and everything seems to work fine (correct results). There is still an issue when trying to re-solve a problem (solving two consecutive times, as is done in hs071_c.c, for example); a minimal sketch of that calling pattern is shown below. I am still working on this and performing more checks to verify the correctness of the implementation.
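To make the re-solve scenario concrete, here is a stripped-down sketch of that pattern against Ipopt's C interface (assuming the 3.14 type names ipindex/ipnumber/bool; the one-variable toy problem and the "cudss" option value are placeholders, not taken from this PR). Calling IpoptSolve twice on the same IpoptProblem is the case that currently fails.

```cpp
/* Re-solve reproducer sketch: min (x - 1)^2, 0 <= x <= 5, no constraints,
 * solved twice with the same IpoptProblem. */
#include <cstdio>
#include "IpStdCInterface.h"

static bool eval_f(ipindex, ipnumber* x, bool, ipnumber* obj_value, UserDataPtr)
{
    *obj_value = (x[0] - 1.0) * (x[0] - 1.0);
    return true;
}

static bool eval_grad_f(ipindex, ipnumber* x, bool, ipnumber* grad_f, UserDataPtr)
{
    grad_f[0] = 2.0 * (x[0] - 1.0);
    return true;
}

static bool eval_g(ipindex, ipnumber*, bool, ipindex, ipnumber*, UserDataPtr)
{
    return true;   /* m = 0: nothing to do */
}

static bool eval_jac_g(ipindex, ipnumber*, bool, ipindex, ipindex,
                       ipindex*, ipindex*, ipnumber*, UserDataPtr)
{
    return true;   /* empty constraint Jacobian */
}

static bool eval_h(ipindex, ipnumber*, bool, ipnumber obj_factor, ipindex,
                   ipnumber*, bool, ipindex, ipindex* iRow, ipindex* jCol,
                   ipnumber* values, UserDataPtr)
{
    if (values == NULL) { iRow[0] = 0; jCol[0] = 0; }   /* sparsity of the 1x1 Hessian */
    else                { values[0] = 2.0 * obj_factor; }
    return true;
}

int main()
{
    ipnumber x_L[1] = {0.0}, x_U[1] = {5.0};
    ipnumber g_L[1] = {0.0}, g_U[1] = {0.0};   /* unused, m = 0 */
    ipnumber x[1]   = {3.0};
    ipnumber obj;

    IpoptProblem nlp = CreateIpoptProblem(1, x_L, x_U, 0, g_L, g_U, 0, 1, 0,
                                          &eval_f, &eval_g, &eval_grad_f, &eval_jac_g, &eval_h);
    AddIpoptStrOption(nlp, (char*)"linear_solver", (char*)"cudss");   /* placeholder name */

    /* First solve works; the second solve on the same problem object is the case that
     * currently ends in a segfault / CUDSS_STATUS_ALLOC_FAILED. */
    enum ApplicationReturnStatus status1 = IpoptSolve(nlp, x, NULL, &obj, NULL, NULL, NULL, NULL);
    enum ApplicationReturnStatus status2 = IpoptSolve(nlp, x, NULL, &obj, NULL, NULL, NULL, NULL);
    std::printf("first solve: %d, second solve: %d, x = %g\n", (int)status1, (int)status2, x[0]);

    FreeIpoptProblem(nlp);
    return (status1 == Solve_Succeeded && status2 == Solve_Succeeded) ? 0 : 1;
}
```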
I would like to test it with the IPOPT_SINGLE option activated, too.
P.S.: In the future it would be nice to add the multi-GPU and multi-GPU multi-node support provided by cuDSS, and maybe port Ipopt to work natively on the GPU (avoiding expensive device-to-host and host-to-device data copies), as was done in https://arxiv.org/html/2405.14236v2.