Skip to content

API ergonomics: intervention signal + bulk request-header ABI #93

@jptosso

Description

@jptosso

While building a Python SDK on top of libcoraza (Flask / FastAPI / Starlette middleware) I ran into three API shapes that cost real cycles or correctness at runtime and tripped me up during development. They're unrelated enough to fix independently but landing them together is cheap and gives downstream bindings a much cleaner shape.

Profile (Ryzen 7 7800X3D, CRS v4.11, 550 rules, benign GET /search?q=hello+world, Python 3.13, cffi):

Phase Cost
coraza_process_request_body 224 µs
coraza_process_request_headers 124 µs
coraza_process_logging + coraza_intervention polling 17 µs
Other libcoraza calls (connection, uri, 5× add_request_header, new/free tx) ~4 µs
Python wrapper layer 39 µs
Total 408 µs/txn

None of the three items below is on the critical path, but all three are friction points a binding author hits immediately.


1. Interruption signaling is out-of-band

Each phase function returns 0 on success and 0 on disruption. The only way to detect that Coraza decided to block is to poll coraza_intervention() after every interruptible phase (1, 2, 3, 4). That:

  • Confused every binding author I know. When I first wrote the Python wrapper I assumed rc == 1 meant interrupted (the way coraza-proxy-wasm signals). Integration tests passed against benign traffic but silently failed to block attacks, because the middleware never detected the disruption. This is a bypass-shaped bug in every downstream binding that follows the same wrong assumption.
  • Forces a round-trip per phase. coraza_intervention() is a cgo boundary cross + a Go lookup + an alloc of the returned struct (plus a mandatory coraza_free_intervention). Each call is cheap (~5 µs) but we pay it on every phase on every request, including the 99% of requests that aren't disrupted.

Proposed fix — return 1 on disruption

ProcessRequestHeaders, ProcessRequestBody, ProcessResponseHeaders, ProcessResponseBody already have a thread-local disruption flag internally (tx.interrupted). Plumb it to the return code:

//export coraza_process_request_headers
func coraza_process_request_headers(t C.coraza_transaction_t) C.int {
    tx := fromRaw[types.Transaction](t)
    tx.ProcessRequestHeaders()
    if tx.IsInterrupted() {
        return 1
    }
    return 0
}

Happy path: zero behavior change, same alloc profile.
Disrupted path: caller learns at phase-return time, calls coraza_intervention() once, and can skip subsequent phases.

This is a breaking ABI change for return codes. Easiest rollout:

  • Add coraza_process_request_headers_v2 (returning 0/1 properly), deprecate the old one.
  • OR bump the major, document "return codes: 0 = ok, 1 = interrupted, -1 = error" like most C WAF wrappers do.

Either way, the cgo return value becomes meaningful and downstream bindings stop needing their per-phase polling loop.


2. Bulk request-header ABI

coraza_add_request_header is one call per header. For a typical request with 5 headers, that's 5 cgo boundary crossings + 5 C.GoString allocations. Each boundary is ~300 ns cgo overhead + a Go-heap string alloc of (name+value) bytes.

Relative cost in my profile is low (~2 µs / 1.5% of txn time) but the pattern forces binding authors to loop in Python/Ruby/whatever and pay N times the round-trip. A bulk API would let the binding send one packed buffer.

Proposed signature

int coraza_add_request_headers(
    coraza_transaction_t t,
    const char *packed,
    size_t packed_len,
    int count
);

Encoding (one byte-buffer, single alloc on the caller side):

[name_len u16][name_bytes][value_len u32][value_bytes]   × count

u16 for name (HTTP disallows >64KB header names; 255 would be tight) and u32 for value because some auth headers get huge.

Go side parses the buffer in one goroutine, calls the existing tx.AddRequestHeader(name, value) N times, returns 0 on success.

Additive only. No breaking change. Minor bump.

Same idea would apply to coraza_add_response_headers and coraza_add_get_args but #1 is the one that matters for hot-path workloads.


3. coraza_matched_rule_get_error_log has no paired coraza_free_string

coraza_matched_rule_get_error_log returns a char * with a comment stating that the caller is responsible for freeing it. No coraza_free_string is exported. Callers are expected to call libc free() on the returned pointer.

Problems this causes for binding authors:

  • Allocator mismatch on Windows. The Go runtime builds with the MinGW CRT when compiled as a c-shared library, while Python on Windows is built against the MSVC CRT. free() from MSVC on a pointer allocated by C.CString in MinGW's Go runtime is undefined behavior and crashes in release builds. Same trap exists for Rust bindings built with MSVC toolchain. This isn't theoretical — it's the single most common trap with Go c-shared DLLs on Windows.
  • Harder to bind correctly in safe languages. Rust's CString::from_raw wants a pointer that was allocated by Rust, not a Go-side C.CString. Same for managed languages that wrap the pointer in a SafeHandle-style finalizer — they need a known free function to call, not libc's.
  • No way to interpose. If an allocator is different (e.g. jemalloc preloaded, tcmalloc), the mismatch on free becomes a corrupted heap.

A paired free function also makes the ownership contract obvious at the symbol level — you see coraza_matched_rule_get_error_log in the header and immediately know to look for coraza_free_* next to it.

Proposed fix

Add a symmetric coraza_free_string(char *s) that calls Go's C.free(unsafe.Pointer(s)) (which in turn routes through Go's cgo-allocated memory correctly regardless of host CRT):

//export coraza_free_string
func coraza_free_string(s *C.char) {
    C.free(unsafe.Pointer(s))
}

Document coraza_matched_rule_get_error_log explicitly: "returned pointer must be released with coraza_free_string; do not call libc free directly."

Additive only. No breaking change. Minor bump.

Also consider: if there are other string-returning APIs we don't have today but might add (rule metadata, tag enumeration, debug dumps), having coraza_free_string on the table early saves a future compatibility break.


Why this matters for downstream bindings

coraza-node (WASM) hides most of this at the TS layer with a custom ABI. For C-linked bindings (Python/Ruby/Java/Rust via SWIG or FFI), we're on the public C surface, and every language ecosystem hits the same sharp edges above. #1 is correctness (bypass-shaped), #2 is ergonomics (~1% perf), #3 is portability (Windows + safe-language bindings).

I'm happy to prototype the three changes against the current main and send a PR if you'd like. Flag your preference for migration path on #1 (versioned symbol vs ABI major bump) and I'll match it. #2 and #3 are purely additive.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions