Bug: SIGSEGV under memory pressure #779

@jberg5

Description

There are a number of places in the code where OSQP allocates memory but doesn't check whether the allocation succeeded. In a memory-constrained environment, when the process is running near its limits, malloc can return NULL, and the resulting NULL pointer dereferences cause segfaults in production. I've observed this personally running OSQP with an MKL backend, but a Claude-powered code audit suggests the same unchecked allocations exist in the builtin backend as well, and I've included a repro for the builtin backend below.

As with all such issues, reproducing this deterministically is difficult, but you can do it with an LD_PRELOAD library that makes malloc and calloc fail after a set number of calls:

repro_python_crash.py:

"""
Reproduce OSQP crash under allocation failure from Python.

Requires the failmalloc.so LD_PRELOAD library (see failmalloc.c).

Usage:
  # Single crash (SIGSEGV on unpatched OSQP):
  LD_PRELOAD=./failmalloc.so FAILMALLOC_AFTER=25810 python3 repro_python_crash.py

  # Sweep to find all crash points:
  for i in $(seq 25000 10 26100); do
    output=$(LD_PRELOAD=./failmalloc.so FAILMALLOC_AFTER=$i python3 repro_python_crash.py 2>&1; echo "EXIT:$?")
    exitcode=$(echo "$output" | grep -o "EXIT:[0-9]*" | cut -d: -f2)
    if [ "$exitcode" = "139" ] || [ "$exitcode" = "134" ]; then
      echo "AFTER=$i -> exit=$exitcode"
    fi
  done

The exact FAILMALLOC_AFTER values depend on the Python version, numpy/scipy
versions, and platform. The values above were found on Ubuntu 22.04 with
Python 3.10, numpy 2.2.6, scipy 1.15.3, and OSQP built with the builtin
algebra backend. A successful run uses ~26,087 malloc/calloc calls total.

On unpatched OSQP, several values in the 25800-26080 range produce either
SIGSEGV (exit 139) or SIGABRT (exit 134). SIGSEGV crashes are caused by
NULL pointer dereferences in OSQP's linear system solver initialization
when malloc returns NULL.
"""

import osqp
import numpy as np
from scipy import sparse

P = sparse.triu([[4, 1], [1, 2]], format='csc')
q = np.array([1., 1.])
A = sparse.csc_matrix([[1, 1], [1, 0], [0, 1]])
l = np.array([1., 0., 0.])
u = np.array([1., 0.7, 0.7])

solver = osqp.OSQP()
solver.setup(P, q, A, l, u, verbose=False)
result = solver.solve()
print(f"Solve status: {result.info.status}")

failmalloc.c:

/*
 * LD_PRELOAD library that makes malloc/calloc fail after N successful calls.
 *
 * Build:
 *   gcc -shared -fPIC -o failmalloc.so failmalloc.c -ldl
 *
 * Usage:
 *   LD_PRELOAD=./failmalloc.so FAILMALLOC_AFTER=57 ./repro
 *
 * Environment variables:
 *   FAILMALLOC_AFTER=N   Fail the (N+1)th allocation and every one after it.
 */

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <stdio.h>

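/* Allocation counter and the threshold read from FAILMALLOC_AFTER
 * (-1 means injection is disabled). */
static int alloc_count = 0;
static int fail_after = -1;
static int initialized = 0;

/* Read FAILMALLOC_AFTER once, on the first intercepted allocation. */
static void init_failmalloc(void) {
    if (initialized) return;
    initialized = 1;
    const char* env = getenv("FAILMALLOC_AFTER");
    if (env) fail_after = atoi(env);
}

/* Count this allocation and decide whether it should fail. The increment is
 * atomic so that allocations made concurrently from several threads are not
 * lost to a data race. */
static int should_fail(void) {
    init_failmalloc();
    if (fail_after < 0) return 0;
    int n = __atomic_add_fetch(&alloc_count, 1, __ATOMIC_SEQ_CST);
    return n > fail_after;
}

/* Interposed malloc: resolve the real implementation lazily via RTLD_NEXT,
 * then either fail or forward the request. */
void* malloc(size_t size) {
    static void* (*real_malloc)(size_t) = NULL;
    if (!real_malloc) real_malloc = dlsym(RTLD_NEXT, "malloc");
    if (should_fail()) return NULL;
    return real_malloc(size);
}

/* Interposed calloc: same logic as malloc. */
void* calloc(size_t nmemb, size_t size) {
    static void* (*real_calloc)(size_t, size_t) = NULL;
    if (!real_calloc) real_calloc = dlsym(RTLD_NEXT, "calloc");
    if (should_fail()) return NULL;
    return real_calloc(nmemb, size);
}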
