CuPIQP is a GPU-accelerated convex Quadratic Programming (QP) solver implementing the PIQP (Proximal Interior Point Quadratic Programming) algorithm entirely on NVIDIA GPUs. Its core strength is solving large batches of small-to-medium QPs in a single GPU launch, while exposing the solve as a differentiable layer for PyTorch and JAX. It also scales to large-scale sparse and dense QPs, in the same class as GPU solvers such as cuClarabel, cuOpt, and QOCO-GPU.
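The "proximal" in PIQP refers to the proximal method of multipliers, which the PIQP authors combine with an interior-point method. As a rough illustration of that one ingredient, the NumPy sketch below applies the proximal method of multipliers to an equality-constrained QP. It is not cuPIQP's or PIQP's implementation: it has no inequality constraints, no interior-point barrier and no GPU code. The function name `pmm_eq_qp` and the values of `rho`, `mu` and `iters` are illustrative assumptions.

```python
import numpy as np


def pmm_eq_qp(P, c, A, b, rho=1e-3, mu=1e-3, iters=500):
    """Proximal method of multipliers (illustrative sketch) for

        minimize 0.5 x^T P x + c^T x   subject to   A x = b,

    where P only needs to be positive semidefinite. Each iteration minimizes

        0.5 x^T P x + c^T x + y_k^T (A x - b)
            + (1 / (2 mu)) ||A x - b||^2 + (rho / 2) ||x - x_k||^2

    in closed form, then updates the multipliers y <- y + (A x - b) / mu.
    """
    n = P.shape[0]
    x = np.zeros(n)
    y = np.zeros(A.shape[0])
    # With rho > 0 this matrix is positive definite even if P is singular.
    # It does not change between iterations, so it is factorized once.
    H = P + rho * np.eye(n) + (A.T @ A) / mu
    L = np.linalg.cholesky(H)
    for _ in range(iters):
        # Setting the subproblem gradient to zero gives H x = rhs.
        rhs = rho * x - c - A.T @ y + (A.T @ b) / mu
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        y = y + (A @ x - b) / mu
    return x, y


# Usage on a deliberately singular P (rank n - 2).
rng = np.random.default_rng(1)
n, p = 6, 2
M = rng.standard_normal((n, n - 2))
P = M @ M.T
c = rng.standard_normal(n)
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)
x, y = pmm_eq_qp(P, c, A, b)
```

The proximal term $(\rho/2)\lVert x - x_k\rVert^2$ keeps the linear system positive definite even when $P$ is singular, which is why the usage example uses a rank-deficient $P$.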
cuPIQP solves convex QPs of the form:
where
- Native batched solving: solve $B$ independent QPs in parallel from a single solver instance by stacking inputs along a leading batch axis; the inner kernels operate on `(B, …)` tensors with no Python-side loop. Built for sampling-based control, RL rollouts, and parameter sweeps.
- Differentiable: VJPs are computed efficiently via implicit differentiation, reusing the condensed factor from the forward solve. Integrations with PyTorch and JAX are on the way!
- Scales to large QPs: the same solver handles large sparse and dense QPs, competing with GPU solvers such as cuClarabel, cuOpt, and QOCO-GPU.
- Fully GPU-resident solver: all iterations, KKT factorizations, and linear algebra run on the GPU, with very few host–device synchronizations during the solve.
- CUDA Graph capture: solver iterations are recorded as CUDA graphs and replayed with near-zero kernel-launch overhead.
- Versatile problem types: supports general dense and sparse QPs, as well as multistage optimization problems such as optimal control problems (OCPs).
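To illustrate the `(B, …)` layout described under native batched solving, here is a NumPy sketch that stacks $B$ strictly convex, unconstrained QPs along a leading axis and solves all of them with one broadcast call. It does not use cuPIQP's API (the batched `setup` signature is not shown in this README), and the helper names `make_batch` and `solve_unconstrained_batch` are hypothetical. CuPy's `cupy.linalg.solve` also accepts stacked `(..., M, M)` inputs, so the array code should port to the GPU with few changes, though that is not checked here.

```python
import numpy as np


def make_batch(B, n, seed=0):
    """Build B random strictly convex QPs stacked along a leading batch axis.

    P has shape (B, n, n) and c has shape (B, n), following the (B, ...) layout.
    """
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((B, n, n))
    # M M^T + n I is symmetric positive definite for every batch element.
    P = M @ np.swapaxes(M, -1, -2) + n * np.eye(n)
    c = rng.standard_normal((B, n))
    return P, c


def solve_unconstrained_batch(P, c):
    """Minimize 0.5 x^T P x + c^T x for all B problems at once.

    Stationarity gives P x = -c. np.linalg.solve broadcasts over the leading
    batch axis, so there is no Python loop over the B problems. The trailing
    axis added to c makes each right-hand side an explicit (n, 1) column.
    """
    return np.linalg.solve(P, -c[..., None])[..., 0]


P, c = make_batch(B=1024, n=8)
x = solve_unconstrained_batch(P, c)
print(x.shape)  # (1024, 8)
```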
- Python 3.10 or later.
- Linux with an NVIDIA GPU and a working CUDA driver/runtime stack.
- CUDA Python packages compatible with the installed CUDA stack. This repository defines extras for CUDA 12.x and CUDA 13.x, including CuPy and nvmath runtime libraries.
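One rough way to choose between the `cuda12` and `cuda13` extras is to check which CUDA major version the installed driver supports. The helper below is a hypothetical convenience, not part of cuPIQP: it reads the driver's CUDA version through the CUDA driver API (`cuDriverGetVersion` in `libcuda.so.1`, via `ctypes`), decodes the integer encoding `1000 * major + 10 * minor`, and maps it to an extra name. The mapping policy is an assumption; drivers are backward compatible, so `cuda12` also remains usable on a CUDA 13 driver.

```python
import ctypes


def cuda_major_minor(version: int) -> tuple[int, int]:
    """Decode CUDA's integer version encoding (1000 * major + 10 * minor)."""
    return version // 1000, (version % 1000) // 10


def suggested_extra(driver_version: int) -> str:
    """Map the driver's maximum supported CUDA version to an install extra.

    Assumed policy: pick the newest extra the driver can run.
    """
    major, _ = cuda_major_minor(driver_version)
    if major >= 13:
        return "cuda13"
    if major == 12:
        return "cuda12"
    raise RuntimeError(f"CUDA {major}.x driver: neither cuda12 nor cuda13 applies")


def query_driver_version() -> int:
    """Ask the CUDA driver (libcuda, Linux) for its supported CUDA version."""
    libcuda = ctypes.CDLL("libcuda.so.1")
    version = ctypes.c_int()
    status = libcuda.cuDriverGetVersion(ctypes.byref(version))
    if status != 0:
        raise RuntimeError(f"cuDriverGetVersion failed with status {status}")
    return version.value
```

On a machine with an NVIDIA driver, `suggested_extra(query_driver_version())` returns the extra name to pass to `pip install`.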
cuPIQP is not currently published on PyPI. From a local clone, install it with one CUDA extra:
```bash
git clone https://github.com/PREDICT-EPFL/cupiqp.git
cd cupiqp
python -m pip install ".[cuda12]"  # choose for a CUDA 12.x CuPy environment
# or:
python -m pip install ".[cuda13]"  # choose for a CUDA 13.x CuPy environment
```

If an appropriate CuPy installation is already present in the environment, the base local install is:

```bash
python -m pip install .
```

A minimal example, solving a 3-variable QP with $P = I$ and $c = 0$:

```python
import cupy as cp
from cupiqp import DenseSolver

solver = DenseSolver()
solver.settings.verbose = True
solver.setup(P=cp.eye(3), c=cp.zeros(3))
solver.solve()
```

Pulled automatically by the relevant extras above:
- CuPy: GPU array library (`cupy-cuda12x` or `cupy-cuda13x`).
- Warp: JIT-compiled CUDA kernels.
- nvmath-python: cuBLAS / cuSOLVER / cuSPARSE / cuDSS bindings and CUDA runtime packages via the selected CUDA extra.
- NVTX: profiling annotations.
- socu: required by `MultistageSolver` as its linear system solver.
Refer to this simple example to get started.
CuPIQP implements the same Proximal Interior Point algorithm as PIQP, targeting large-scale QPs on NVIDIA GPUs:
| | PIQP (CPU) | CuPIQP (GPU) |
|---|---|---|
| Language | C++ (with C / Python / Matlab / Julia / Rust bindings) | Python (CuPy + Warp) |
| Execution | CPU (multi-threaded via OpenMP) | Fully GPU-resident (CUDA) |
| Batched solving | Designed for single solves | Designed for batched solves with massive parallelism |
| Differentiable | No | Yes, via implicit differentiation |
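The implicit-differentiation idea can be shown on the simplest case. For an unconstrained QP with positive definite $P$, the solution satisfies $P x^\star = -c$, and differentiating this identity yields the VJP with a single extra solve against the factor already computed in the forward pass. The NumPy sketch below is only an illustration under that assumption, not cuPIQP's backward pass (which, per the feature list, reuses a condensed factor of the interior-point system and handles constraints). The function names are hypothetical, and a central finite-difference check is included.

```python
import numpy as np


def chol_solve(L, rhs):
    """Solve (L @ L.T) v = rhs given a lower-triangular Cholesky factor L.

    General solves keep this NumPy-only; they do not exploit triangularity.
    """
    return np.linalg.solve(L.T, np.linalg.solve(L, rhs))


def qp_forward(P, c):
    """Solve minimize 0.5 x^T P x + c^T x for positive definite P.

    Stationarity gives P x = -c. The factor L is returned so that the
    backward pass can reuse it instead of refactorizing P.
    """
    L = np.linalg.cholesky(P)
    return chol_solve(L, -c), L


def qp_vjp(L, x, grad_x):
    """Vector-Jacobian product of x*(P, c) via implicit differentiation.

    Differentiating P x = -c gives dP x + P dx = -dc. With z = P^{-1} grad_x:
        dloss/dc = -z
        dloss/dP = -0.5 * (z x^T + x z^T)   (symmetrized, since P is symmetric)
    Only one extra solve with the cached factor L is needed.
    """
    z = chol_solve(L, grad_x)
    grad_P = -0.5 * (np.outer(z, x) + np.outer(x, z))
    return grad_P, -z


# Usage: loss = 0.5 * ||x*(P, c) - target||^2, checked by central differences.
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)
c = rng.standard_normal(n)
target = rng.standard_normal(n)


def loss(P, c):
    x, _ = qp_forward(P, c)
    return 0.5 * np.sum((x - target) ** 2)


x, L = qp_forward(P, c)
grad_P, grad_c = qp_vjp(L, x, x - target)  # upstream gradient dloss/dx = x - target

eps = 1e-6
E = np.eye(n)
fd_c = np.array([(loss(P, c + eps * e) - loss(P, c - eps * e)) / (2 * eps) for e in E])
# Perturbing only P[i, i] keeps P symmetric, so it probes the diagonal of grad_P.
fd_P_diag = np.array([
    (loss(P + eps * np.outer(e, e), c) - loss(P - eps * np.outer(e, e), c)) / (2 * eps)
    for e in E
])
print(np.allclose(grad_c, fd_c, atol=1e-6), np.allclose(np.diag(grad_P), fd_P_diag, atol=1e-6))
```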
If you use cuPIQP in academic work, please cite the underlying PIQP algorithm paper and this implementation. A BibTeX entry will be provided once a cuPIQP-specific publication is available.
BSD-2-Clause. See LICENSE.