Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 28 additions & 12 deletions coraza.i
Original file line number Diff line number Diff line change
Expand Up @@ -184,11 +184,14 @@ int coraza_set_debug_log_callback(coraza_waf_config_t cfg, PyObject *cb) {
%typemap(jstype) (const unsigned char *data, int length) "byte[]"
%typemap(javain) (const unsigned char *data, int length) "$javainput"
%typemap(in) (const unsigned char *data, int length) {
$1 = (unsigned char *)JCALL2(GetByteArrayElements, jenv, $input, NULL);
/* GetPrimitiveArrayCritical avoids a copy on most JVMs by pinning the
* heap array directly. The critical section ends in argout below; no
* other JNI calls may occur between Get and Release. */
$1 = (unsigned char *)JCALL2(GetPrimitiveArrayCritical, jenv, $input, NULL);
$2 = (int)JCALL1(GetArrayLength, jenv, $input);
}
%typemap(argout) (const unsigned char *data, int length) {
JCALL3(ReleaseByteArrayElements, jenv, $input, (jbyte *)$1, JNI_ABORT);
JCALL3(ReleasePrimitiveArrayCritical, jenv, $input, (jbyte *)$1, JNI_ABORT);
Comment on lines +190 to +194
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using GetPrimitiveArrayCritical here keeps the Java byte[] pinned until the typemap(argout) runs (i.e., after the Go call returns). That can block/stall JVM GC for the full duration of request/response body processing, and the current code also calls GetArrayLength while the array is in a critical section (contradicting the comment that no other JNI calls may occur). Prefer a non-critical access pattern (e.g., GetByteArrayElements / GetByteArrayRegion) and ensure any critical region is entered only after length is known and held for the shortest possible time.

Copilot uses AI. Check for mistakes.
}
Comment on lines 186 to 195
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, locate and examine the coraza.i file around lines 186-195
find . -name "coraza.i" -type f

Repository: corazawaf/libcoraza

Length of output: 73


🏁 Script executed:

#!/bin/bash
# Check the structure to understand the codebase layout
git ls-files | head -30

Repository: corazawaf/libcoraza

Length of output: 613


🏁 Script executed:

#!/bin/bash
# Search for coraza_append_request_body and coraza_append_response_body
rg "coraza_append_(request|response)_body" --type c

Repository: corazawaf/libcoraza

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Search for callback registration patterns in Go files
rg "call_debug_log_cb|call_error_cb" --type go -A 5

Repository: corazawaf/libcoraza

Length of output: 1064


🏁 Script executed:

#!/bin/bash
# Read coraza.i around lines 186-195
sed -n '180,200p' coraza.i

Repository: corazawaf/libcoraza

Length of output: 1062


🏁 Script executed:

#!/bin/bash
# Search for body append functions more broadly
rg "append.*body|body.*append" --type-list | head -5
rg "append.*body|body.*append" -i

Repository: corazawaf/libcoraza

Length of output: 5219


🏁 Script executed:

#!/bin/bash
# Look at the Go file structure to understand callback registration
sed -n '100,180p' libcoraza/coraza.go

Repository: corazawaf/libcoraza

Length of output: 3034


🏁 Script executed:

#!/bin/bash
# Check if there are any references to WAF body processing functions
rg "waf.*body|body.*waf" -i

Repository: corazawaf/libcoraza

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Get the full implementation of coraza_append_request_body
rg -A 20 "//export coraza_append_request_body" libcoraza/coraza.go

Repository: corazawaf/libcoraza

Length of output: 872


🏁 Script executed:

#!/bin/bash
# Get the full implementation of coraza_append_response_body
rg -A 20 "//export coraza_append_response_body" libcoraza/coraza.go

Repository: corazawaf/libcoraza

Length of output: 753


🏁 Script executed:

#!/bin/bash
# Check if AppendRequestBody or AppendResponseBody can trigger callbacks
rg -A 10 "AppendRequestBody|AppendResponseBody" libcoraza/coraza.go

Repository: corazawaf/libcoraza

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Look at lines around where the callbacks are actually used
sed -n '250,350p' libcoraza/coraza.go

Repository: corazawaf/libcoraza

Length of output: 3093


🏁 Script executed:

#!/bin/bash
# Check what the WAF does during body processing - search for callback invocations
rg "WithErrorCallback|WithDebugLogger" libcoraza/coraza.go -B 2 -A 10

Repository: corazawaf/libcoraza

Length of output: 1525


🏁 Script executed:

#!/bin/bash
# Check JNI documentation about GetArrayLength in critical sections
# Also search for what actually happens during WriteRequestBody/WriteResponseBody
# Check if there's Go interface or type definition showing what WriteRequestBody does
rg "WriteRequestBody|WriteResponseBody" libcoraza/coraza.go -B 5 -A 5

Repository: corazawaf/libcoraza

Length of output: 796


🏁 Script executed:

#!/bin/bash
# Check if the WAF processes/matches rules during WriteRequestBody or only during ProcessRequestBody
rg "WriteRequestBody|WriteResponseBody" -i --type go | head -20

Repository: corazawaf/libcoraza

Length of output: 368


🏁 Script executed:

#!/bin/bash
# Look at the coraza.i file more broadly to understand all the typemaps
sed -n '180,250p' coraza.i

Repository: corazawaf/libcoraza

Length of output: 3522


🏁 Script executed:

#!/bin/bash
# Check if there are any other JNI calls or callback invocations that happen during body writing
# Look for what the Java callback examples do
cat examples/java/CorazaDebugLogCallback.java

Repository: corazawaf/libcoraza

Length of output: 982


🏁 Script executed:

#!/bin/bash
# Check the error callback example too
cat examples/java/CorazaErrorCallback.java

Repository: corazawaf/libcoraza

Length of output: 1038


GetArrayLength inside GetPrimitiveArrayCritical violates the JNI critical-section contract.

Line 191 calls GetArrayLength, which is a JNI function. The JNI specification prohibits any JNI calls inside a critical section opened by GetPrimitiveArrayCritical. The comment on line 187–189 even acknowledges this restriction: "no other JNI calls may occur between Get and Release."

Minimal safe fallback
 %typemap(in)     (const unsigned char *data, int length) {
-    $1 = (unsigned char *)JCALL2(GetPrimitiveArrayCritical, jenv, $input, NULL);
     $2 = (int)JCALL1(GetArrayLength, jenv, $input);
+    $1 = (unsigned char *)JCALL2(GetByteArrayElements, jenv, $input, NULL);
 }
 %typemap(argout) (const unsigned char *data, int length) {
-    JCALL3(ReleasePrimitiveArrayCritical, jenv, $input, (jbyte *)$1, JNI_ABORT);
+    JCALL3(ReleaseByteArrayElements, jenv, $input, (jbyte *)$1, JNI_ABORT);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@coraza.i` around lines 186 - 195, The code opens a JNI critical section with
GetPrimitiveArrayCritical in the %typemap(in) block and then calls
GetArrayLength inside that section, violating the JNI rule that no other JNI
calls may occur while the array is pinned; fix by calling GetArrayLength
(JCALL1(GetArrayLength, jenv, $input)) before calling
JCALL2(GetPrimitiveArrayCritical, jenv, $input, NULL) so $2 (length) is
populated prior to entering the critical region, leaving
ReleasePrimitiveArrayCritical (JCALL3(ReleasePrimitiveArrayCritical, jenv,
$input, (jbyte *)$1, JNI_ABORT)) unchanged in the %typemap(argout) to release
the pinned array.


/*
Expand All @@ -215,25 +218,21 @@ typedef struct {
static void _swig_java_error_trampoline(void *ctx, coraza_matched_rule_t rule) {
_swig_java_cb_ctx_t *jctx = (_swig_java_cb_ctx_t *)ctx;
JNIEnv *env = NULL;
jboolean attached = JNI_FALSE;
if ((*jctx->jvm)->GetEnv(jctx->jvm, (void **)&env, JNI_VERSION_1_6) == JNI_EDETACHED) {
/* Attach as a daemon if needed and leave the thread attached. Daemon
* threads are cleaned up by the JVM on exit, so DetachCurrentThread is
* unnecessary and adds ~microseconds of JVM registry overhead per call. */
if ((*jctx->jvm)->GetEnv(jctx->jvm, (void **)&env, JNI_VERSION_1_6) == JNI_EDETACHED)
(*jctx->jvm)->AttachCurrentThreadAsDaemon(jctx->jvm, (void **)&env, NULL);
Comment on lines +221 to 225
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This callback trampoline now attaches threads to the JVM but never detaches them. JNI generally requires DetachCurrentThread before a native thread exits to avoid leaking JVM-side thread resources; with Go creating/destroying OS threads over time, this can accumulate. If the goal is to avoid per-callback attach/detach overhead, consider caching attachment per thread and detaching via a thread-exit hook (or keep the previous detach-once-attached behavior).

Copilot uses AI. Check for mistakes.
Comment on lines +221 to 225
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "coraza.i" -type f

Repository: corazawaf/libcoraza

Length of output: 73


🏁 Script executed:

cat -n coraza.i | sed -n '210,245p'

Repository: corazawaf/libcoraza

Length of output: 2131


🌐 Web query:

JNI AttachCurrentThreadAsDaemon detach requirement specification

💡 Result:

Yes, a native thread attached to the Java VM using AttachCurrentThreadAsDaemon must call DetachCurrentThread before terminating. This requirement applies identically to both AttachCurrentThread (non-daemon) and AttachCurrentThreadAsDaemon (daemon), as the JNI specification treats attached native threads uniformly regarding detachment. Key points from the specification: - "A native thread attached to the VM must call DetachCurrentThread to detach itself before terminating." (Explicitly stated multiple times.) - Failure to detach can cause memory leaks, prevent VM shutdown (for non-daemon threads), or crash the VM (e.g., in older Android Dalvik). - DetachCurrentThread releases held monitors, notifies waiting threads, and marks the native thread as terminated. - Cannot detach if Java methods are on the call stack; must unwind first. - Calling on already-detached or never-attached threads is a no-op. - Daemon status (set by AttachCurrentThreadAsDaemon) affects VM shutdown waiting (VM doesn't wait for daemon threads), but detachment is still mandatory. This is unchanged in recent Java versions (up to 22+), per official docs.

Citations:


🌐 Web query:

JNI ExceptionDescribe vs ExceptionClear pending exception behavior

💡 Result:

JNI ExceptionDescribe prints the pending exception and its stack trace to stderr (or similar), and clears the pending exception as a side effect. In contrast, ExceptionClear only clears the pending exception without printing or describing it. Key behaviors with pending exceptions: - ExceptionDescribe: Describes (prints) the exception + clears it. - ExceptionClear: Clears it (no description). Official JNI spec notes: "The pending exception is cleared as a side-effect of calling this function [ExceptionDescribe]." This is consistent across Java 7, 8, 18, and 21 documentation. When a pending exception exists, safe JNI calls are limited (e.g., ExceptionOccurred, ExceptionDescribe, ExceptionClear). ExceptionDescribe is for debugging and consumes the exception by clearing it.

Citations:


🌐 Web query:

JNI AttachCurrentThreadAsDaemon return value failure error handling

💡 Result:

The JNI function JavaVM::AttachCurrentThreadAsDaemon attaches the current native thread to the JVM as a daemon thread and returns a JNIEnv pointer via the p_env argument. Return value: - JNI_OK (0) on success: Thread attached successfully, p_env is set to valid JNIEnv. - Negative value (JNI error code) on failure: Common codes include: - JNI_ERR (-1): Unknown error. - JNI_EVERSION (-3): Unsupported JNI version. - JNI_ENOMEM (-4): Out of memory. - JNI_EINVAL (-6): Invalid arguments (e.g., invalid JavaVMAttachArgs). - Other implementation-specific errors. Error handling: Always check the return value before using the JNIEnv*: jint res = jvm->AttachCurrentThreadAsDaemon((void)&env, &attachArgs); if (res != JNI_OK) { // Handle failure: log error, abort operation, etc. // Do NOT use env, as it may be NULL or invalid. fprintf(stderr, "Attach failed: %d\n", res); return; // or appropriate error path } // Now safe to use env - On failure, *p_env may be NULL or invalid; never dereference it. - Common failure causes (from examples): - Insufficient thread stack space (e.g., < default Java thread stack). - Incompatible JNI_VERSION in JavaVMAttachArgs. - JVM shutting down or invalid state (late attach may return JNI_OK but invalid env->functions; fixed in later JDKs). - Thread already attached to another VM. - No exceptions thrown; errors returned via jint. - Attached threads must call DetachCurrentThread before exit to avoid leaks/crashes (Android/ART enforces strictly). - Best practice: Use GetEnv first to check attachment status before attaching. This matches AttachCurrentThread behavior.

Citations:


Check the return value of AttachCurrentThreadAsDaemon before dereferencing env.

AttachCurrentThreadAsDaemon can fail and return negative error codes, leaving env NULL or invalid. Lines 226 and 236 dereference env immediately without checking the return value, creating a potential NULL dereference bug. Additionally, the code comment on lines 221–223 is misleading—the JNI specification explicitly requires calling DetachCurrentThread before native thread exit, even for daemon threads. Daemon status only affects VM shutdown waiting; it does not eliminate the detach requirement. Leaving threads attached can accumulate JVM thread state. Please check the attach return value, ensure proper detach on thread exit, and correct the misleading comment.

Also applies to: 234-235

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@coraza.i` around lines 221 - 225, The code calls
(*jctx->jvm)->AttachCurrentThreadAsDaemon(jctx->jvm, (void **)&env, NULL)
without checking its return value and then dereferences env; change this to
store the return code, only use env if the attach returned JNI_OK, and handle
attach failures (e.g., log/error-out or avoid dereferencing env); also ensure
any successful attach is paired with (*jctx->jvm)->DetachCurrentThread() on
native thread exit to avoid leaking JVM thread state (update the misleading
comment about daemon threads and DetachCurrentThread accordingly); apply the
same checks for the other AttachCurrentThreadAsDaemon usage that currently
dereferences env.

attached = JNI_TRUE;
}
(*env)->CallVoidMethod(env, jctx->obj, jctx->mid, (jlong)(uintptr_t)rule);
if ((*env)->ExceptionCheck(env)) (*env)->ExceptionDescribe(env);
Comment on lines 226 to 227
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After CallVoidMethod, ExceptionDescribe is invoked but the exception is not cleared. With the new “leave the thread attached” behavior, a pending exception can persist on this JNIEnv and cause subsequent JNI calls on the same thread to fail. Clear the exception (and consider returning early) after reporting it.

Copilot uses AI. Check for mistakes.
if (attached) (*jctx->jvm)->DetachCurrentThread(jctx->jvm);
}

static void _swig_java_debug_trampoline(void *ctx, coraza_debug_log_level_t level,
const char *msg, const char *fields) {
_swig_java_cb_ctx_t *jctx = (_swig_java_cb_ctx_t *)ctx;
JNIEnv *env = NULL;
jboolean attached = JNI_FALSE;
if ((*jctx->jvm)->GetEnv(jctx->jvm, (void **)&env, JNI_VERSION_1_6) == JNI_EDETACHED) {
if ((*jctx->jvm)->GetEnv(jctx->jvm, (void **)&env, JNI_VERSION_1_6) == JNI_EDETACHED)
(*jctx->jvm)->AttachCurrentThreadAsDaemon(jctx->jvm, (void **)&env, NULL);
attached = JNI_TRUE;
}
jstring jmsg = (*env)->NewStringUTF(env, msg ? msg : "");
Comment on lines 233 to 236
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetEnv/AttachCurrentThreadAsDaemon results aren’t checked here. If GetEnv returns JNI_EVERSION/JNI_ERR or Attach fails, env may remain NULL and will be dereferenced immediately. Please handle the non-OK cases and bail out safely if an env cannot be obtained.

Copilot uses AI. Check for mistakes.
jstring jfields = (*env)->NewStringUTF(env, fields ? fields : "");
/* NewStringUTF returns NULL on OOM; skip the call rather than crash. */
Expand All @@ -243,7 +242,6 @@ static void _swig_java_debug_trampoline(void *ctx, coraza_debug_log_level_t leve
if (jmsg) (*env)->DeleteLocalRef(env, jmsg);
if (jfields) (*env)->DeleteLocalRef(env, jfields);
if ((*env)->ExceptionCheck(env)) (*env)->ExceptionDescribe(env);
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in the error trampoline, ExceptionDescribe does not clear the pending exception. Since the thread may remain attached across callbacks now, leaving the exception pending can break later callbacks on the same thread. Clear the exception after describing it.

Copilot uses AI. Check for mistakes.
if (attached) (*jctx->jvm)->DetachCurrentThread(jctx->jvm);
}

JNIEXPORT jint JNICALL Java_coraza_coraza_1set_1error_1callback(
Expand Down Expand Up @@ -353,7 +351,25 @@ typedef enum coraza_severity_t {

/*
* Function declarations
*
* Release the Python GIL for calls that cross into Go and do real work, so
* other Python threads can run concurrently. Trivial accessors and free
* functions are excluded because the GIL release/reacquire overhead would
* exceed any benefit.
*/
#ifdef SWIGPYTHON
%thread coraza_new_waf;
%thread coraza_rules_merge;
%thread coraza_process_connection;
%thread coraza_process_uri;
%thread coraza_process_request_headers;
%thread coraza_process_request_body;
%thread coraza_append_request_body;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "coraza.i" | head -5

Repository: corazawaf/libcoraza

Length of output: 73


🏁 Script executed:

find . -type f -path "*libcoraza/coraza.go" | head -5

Repository: corazawaf/libcoraza

Length of output: 84


🏁 Script executed:

git ls-files | grep -E "(coraza\.i|coraza\.go)" | head -10

Repository: corazawaf/libcoraza

Length of output: 91


🏁 Script executed:

wc -l coraza.i

Repository: corazawaf/libcoraza

Length of output: 75


🏁 Script executed:

cat -n coraza.i | sed -n '60,90p'

Repository: corazawaf/libcoraza

Length of output: 1292


🏁 Script executed:

cat -n coraza.i | sed -n '360,380p'

Repository: corazawaf/libcoraza

Length of output: 1095


🏁 Script executed:

rg "coraza_append_request_body|coraza_append_response_body" coraza.i -A 3 -B 1

Repository: corazawaf/libcoraza

Length of output: 979


🏁 Script executed:

rg "coraza_process_request_body|coraza_process_response_body" coraza.i -A 3 -B 1

Repository: corazawaf/libcoraza

Length of output: 1000


🏁 Script executed:

wc -l libcoraza/coraza.go

Repository: corazawaf/libcoraza

Length of output: 86


🏁 Script executed:

rg "func.*append_request_body|func.*append_response_body" libcoraza/coraza.go -A 20

Repository: corazawaf/libcoraza

Length of output: 1554


🏁 Script executed:

cat -n libcoraza/coraza.go | sed -n '1,50p'

Repository: corazawaf/libcoraza

Length of output: 1570


🏁 Script executed:

rg "%thread" coraza.i -B 5 -A 5 | head -40

Repository: corazawaf/libcoraza

Length of output: 843


🏁 Script executed:

cat -n coraza.i | sed -n '67,90p'

Repository: corazawaf/libcoraza

Length of output: 1028


🏁 Script executed:

python3 << 'EOF'
# Check Python's bytearray behavior regarding pointer validity after resize
# This is to understand the hazard

print("C.GoBytes behavior:")
print("- Makes a memcpy of data from C pointer")
print("- Happens AFTER GIL is released (due to %thread)")
print("")
print("PyByteArray_AS_STRING behavior:")
print("- Returns direct pointer to internal buffer")
print("- If bytearray is resized, pointer becomes invalid")
print("- If GIL is released, another thread can resize")
print("")
print("Race condition window:")
print("1. SWIG typemap calls PyByteArray_AS_STRING (GIL held)")
print("2. %thread decorator releases GIL")
print("3. Another Python thread resizes/reallocates bytearray")
print("4. Pointer extracted in step 1 now points to freed memory")
print("5. C.GoBytes attempts memcpy from invalid pointer")
print("6. Result: undefined behavior (crash, data corruption)")
EOF

Repository: corazawaf/libcoraza

Length of output: 678


Remove %thread from coraza_append_request_body and coraza_append_response_body or copy the buffer first.

These functions use the typemap that extracts a raw pointer from bytearray via PyByteArray_AS_STRING (line 77). When %thread releases the GIL, another Python thread can resize or reallocate that same bytearray, invalidating the pointer. The subsequent C.GoBytes memcpy then reads from freed memory. PyBytes are immutable and safe; the other %thread additions take no buffer parameters and are safe.

Minimal fix
 %thread coraza_process_request_body;
-%thread coraza_append_request_body;
 %thread coraza_process_response_headers;
 %thread coraza_process_response_body;
-%thread coraza_append_response_body;
 %thread coraza_process_logging;

Also applies to: 370-370

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@coraza.i` at line 367, The wrappers coraza_append_request_body and
coraza_append_response_body use a typemap that extracts a raw pointer via
PyByteArray_AS_STRING and therefore must not release the GIL with %thread;
either remove the %thread directive for these two functions or modify the call
so you make a GIL-protected copy of the bytearray buffer (e.g., convert/copy the
contents to a new bytes/heap buffer while the GIL is held) before any C.GoBytes
or memcpy occurs; ensure the fix targets the typemap usage around
PyByteArray_AS_STRING and the subsequent C.GoBytes usage so the pointer cannot
be invalidated by another Python thread.

%thread coraza_process_response_headers;
%thread coraza_process_response_body;
%thread coraza_append_response_body;
%thread coraza_process_logging;
#endif

extern coraza_waf_config_t coraza_new_waf_config();
extern int coraza_rules_add_file(coraza_waf_config_t c, const char *file);
Expand Down
Loading