perf: fix JNI/CGo correctness issues and reduce allocation overhead#83
perf: fix JNI/CGo correctness issues and reduce allocation overhead#83jptosso wants to merge 5 commits into
Conversation
📝 WalkthroughWalkthroughOptimizations to SWIG bindings improve memory handling and thread management across Java, Python, and Go language bindings. Java interop gains efficient byte-buffer access via critical sections and thread-safe JNI environment caching with pthread destructors. Python bindings acquire GIL release for selected WAF operations. Go bindings eliminate unnecessary memory copying and optimize buffer allocation. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
coraza.i (1)
396-406: The new%threadlist still misses the long rule/file paths.
coraza_rules_add,coraza_rules_add_file, andcoraza_request_body_from_filestill hold the GIL even though they parse directives or, inlibcoraza/coraza.go, Line 385-399, do file I/O plus repeated body writes. If the defaultconst char *typemap is safe under%thread, these probably belong in the same list.Possible follow-up
`#ifdef` SWIGPYTHON %thread coraza_new_waf; +%thread coraza_rules_add_file; +%thread coraza_rules_add; %thread coraza_rules_merge; %thread coraza_process_connection; %thread coraza_process_uri; %thread coraza_process_request_headers; %thread coraza_process_request_body; %thread coraza_process_response_headers; %thread coraza_process_response_body; %thread coraza_process_logging; +%thread coraza_request_body_from_file; `#endif`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@coraza.i` around lines 396 - 406, The %thread declaration is missing long-running/file I/O functions which hold the GIL; add coraza_rules_add, coraza_rules_add_file, and coraza_request_body_from_file to the SWIG %thread list so they release the GIL during rule parsing and file I/O (matching the existing entries like coraza_new_waf and coraza_process_*); also verify that the default const char * typemap used by SWIG is safe under %thread or adjust the typemap to ensure thread-safety for those functions if needed.libcoraza/coraza.go (1)
297-299: Please lock the zero-copy lifetime contract down with a regression test.
Line 299,Line 322, and the reused buffer atLine 390are only safe ifWriteRequestBody/WriteResponseBodyfully consume the caller slice before returning. If a futuretypes.Transactionimplementation starts retaining that memory, these paths become buffer-aliasing bugs. A small test that mutates or reuses the source buffer immediately after the call would make that assumption explicit.Also applies to: 320-322, 390-399
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libcoraza/coraza.go` around lines 297 - 299, Add regression tests that assert WriteRequestBody and WriteResponseBody do not retain the caller-provided slice: implement tests (e.g., TestWriteRequestBodyDoesNotRetainCallerSlice and TestWriteResponseBodyDoesNotRetainCallerSlice) in the libcoraza package that (1) allocate a distinct byte buffer, (2) call tx.WriteRequestBody / tx.WriteResponseBody with unsafe.Slice of that buffer, (3) immediately mutate/overwrite the original buffer, and (4) verify the transaction’s stored body (via the transaction implementation or a read accessor) is unaffected by the mutation. Use a concrete types.Transaction instance (or a test fake that exposes the stored body) so the test will fail if WriteRequestBody / WriteResponseBody retain the caller memory; this locks down the zero-copy lifetime contract referenced at WriteRequestBody, WriteResponseBody and the reused buffer sites.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@coraza.i`:
- Around line 233-236: The C++-style call uses member access on a JavaVM
pointer; change _swig_jni_detach_thread to use the C JNI invocation form: cast
jvm_ptr to JavaVM * (e.g. JavaVM *jvm = (JavaVM *)jvm_ptr;), check jvm, then
call the function via the function-pointer table:
(*jvm)->DetachCurrentThread(jvm); referencing _swig_jni_detach_thread and
JavaVM/DetachCurrentThread to locate where to make this change.
---
Nitpick comments:
In `@coraza.i`:
- Around line 396-406: The %thread declaration is missing long-running/file I/O
functions which hold the GIL; add coraza_rules_add, coraza_rules_add_file, and
coraza_request_body_from_file to the SWIG %thread list so they release the GIL
during rule parsing and file I/O (matching the existing entries like
coraza_new_waf and coraza_process_*); also verify that the default const char *
typemap used by SWIG is safe under %thread or adjust the typemap to ensure
thread-safety for those functions if needed.
In `@libcoraza/coraza.go`:
- Around line 297-299: Add regression tests that assert WriteRequestBody and
WriteResponseBody do not retain the caller-provided slice: implement tests
(e.g., TestWriteRequestBodyDoesNotRetainCallerSlice and
TestWriteResponseBodyDoesNotRetainCallerSlice) in the libcoraza package that (1)
allocate a distinct byte buffer, (2) call tx.WriteRequestBody /
tx.WriteResponseBody with unsafe.Slice of that buffer, (3) immediately
mutate/overwrite the original buffer, and (4) verify the transaction’s stored
body (via the transaction implementation or a read accessor) is unaffected by
the mutation. Use a concrete types.Transaction instance (or a test fake that
exposes the stored body) so the test will fail if WriteRequestBody /
WriteResponseBody retain the caller memory; this locks down the zero-copy
lifetime contract referenced at WriteRequestBody, WriteResponseBody and the
reused buffer sites.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e02a98b5-8064-4f3e-a688-a8c5358b5f6a
📒 Files selected for processing (2)
coraza.ilibcoraza/coraza.go
| %typemap(in) (const unsigned char *data, int length) { | ||
| $1 = (unsigned char *)JCALL2(GetByteArrayElements, jenv, $input, NULL); | ||
| /* GetArrayLength must be called before GetPrimitiveArrayCritical: no other | ||
| * JNI calls are permitted while a critical section is open. */ | ||
| $2 = (int)JCALL1(GetArrayLength, jenv, $input); | ||
| $1 = (unsigned char *)JCALL2(GetPrimitiveArrayCritical, jenv, $input, NULL); | ||
| } | ||
| %typemap(argout) (const unsigned char *data, int length) { | ||
| JCALL3(ReleaseByteArrayElements, jenv, $input, (jbyte *)$1, JNI_ABORT); | ||
| JCALL3(ReleasePrimitiveArrayCritical, jenv, $input, (jbyte *)$1, JNI_ABORT); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -name "coraza.i" -o -name "coraza.go" | head -20Repository: corazawaf/libcoraza
Length of output: 95
🏁 Script executed:
sed -n '180,200p' ./coraza.iRepository: corazawaf/libcoraza
Length of output: 1051
🏁 Script executed:
sed -n '295,305p' ./libcoraza/coraza.goRepository: corazawaf/libcoraza
Length of output: 509
🏁 Script executed:
sed -n '318,328p' ./libcoraza/coraza.goRepository: corazawaf/libcoraza
Length of output: 403
🏁 Script executed:
sed -n '1,50p' ./libcoraza/coraza.go | grep -A 5 "WriteRequestBody\|WriteResponseBody"Repository: corazawaf/libcoraza
Length of output: 45
🏁 Script executed:
grep -n "GetPrimitiveArrayCritical" ./coraza.iRepository: corazawaf/libcoraza
Length of output: 231
🏁 Script executed:
sed -n '186,210p' ./coraza.iRepository: corazawaf/libcoraza
Length of output: 1130
🏁 Script executed:
sed -n '175,200p' ./coraza.iRepository: corazawaf/libcoraza
Length of output: 1137
🏁 Script executed:
rg -n "coraza_append_request_body|coraza_append_response_body" ./coraza.i -A 10Repository: corazawaf/libcoraza
Length of output: 1933
🏁 Script executed:
rg -n "NULL" ./coraza.i | head -30Repository: corazawaf/libcoraza
Length of output: 768
🏁 Script executed:
sed -n '186,210p' ./coraza.i && echo "---" && sed -n '290,310p' ./libcoraza/coraza.goRepository: corazawaf/libcoraza
Length of output: 1851
Guard GetPrimitiveArrayCritical against NULL and prevent JNI callback re-entrancy during array pinning.
Line 190 does not check if GetPrimitiveArrayCritical returns NULL. In Go 1.20+, unsafe.Slice(nil, length) where length > 0 panics; non-empty input will hit this path in libcoraza/coraza.go lines 299 and 322. Additionally, the array remains pinned from line 190 to the argout release, and if WriteRequestBody or WriteResponseBody trigger Java callbacks (error or debug logging), those callbacks invoke JNI while the critical section is open, violating JNI rules. Either check for NULL and fail fast, or use a copying approach to avoid holding the critical section across callbacks.
| static void _swig_jni_detach_thread(void *jvm_ptr) { | ||
| if (jvm_ptr) | ||
| ((JavaVM *)jvm_ptr)->DetachCurrentThread((JavaVM *)jvm_ptr); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate the coraza.i file
find . -name "coraza.i" -type fRepository: corazawaf/libcoraza
Length of output: 73
🏁 Script executed:
# Read the coraza.i file around lines 233-236
sed -n '230,250p' coraza.iRepository: corazawaf/libcoraza
Length of output: 797
🏁 Script executed:
# Search for other JNI invocation patterns to understand the code style
rg 'DetachCurrentThread|AttachCurrentThread' -B 2 -A 2 coraza.iRepository: corazawaf/libcoraza
Length of output: 582
Use the C JNI invocation form in _swig_jni_detach_thread.
Line 235 uses C++ member-access syntax (->) on a cast JavaVM pointer. The correct C JNI ABI requires dereferencing the function pointer structure: (*jvm)->MethodName(jvm, ...). This matches the pattern used in _swig_ensure_jni_env and is required for C compilation.
Proposed fix
static void _swig_jni_detach_thread(void *jvm_ptr) {
- if (jvm_ptr)
- ((JavaVM *)jvm_ptr)->DetachCurrentThread((JavaVM *)jvm_ptr);
+ if (jvm_ptr) {
+ JavaVM *jvm = (JavaVM *)jvm_ptr;
+ (*jvm)->DetachCurrentThread(jvm);
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| static void _swig_jni_detach_thread(void *jvm_ptr) { | |
| if (jvm_ptr) | |
| ((JavaVM *)jvm_ptr)->DetachCurrentThread((JavaVM *)jvm_ptr); | |
| } | |
| static void _swig_jni_detach_thread(void *jvm_ptr) { | |
| if (jvm_ptr) { | |
| JavaVM *jvm = (JavaVM *)jvm_ptr; | |
| (*jvm)->DetachCurrentThread(jvm); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@coraza.i` around lines 233 - 236, The C++-style call uses member access on a
JavaVM pointer; change _swig_jni_detach_thread to use the C JNI invocation form:
cast jvm_ptr to JavaVM * (e.g. JavaVM *jvm = (JavaVM *)jvm_ptr;), check jvm,
then call the function via the function-pointer table:
(*jvm)->DetachCurrentThread(jvm); referencing _swig_jni_detach_thread and
JavaVM/DetachCurrentThread to locate where to make this change.
Summary
Addresses correctness issues identified in #82 review and adds Go-layer allocation improvements. Each fix is a separate commit.
GetArrayLengthwas called inside theGetPrimitiveArrayCriticalcritical section, violating the JNI spec (no other JNI calls permitted while an array is pinned). Fixed by callingGetArrayLengthfirst.AttachCurrentThreadAsDaemonreturn value was unchecked; on failure,envwas NULL and immediately dereferenced. Replaced the ad-hoc attach/detach pattern with a_swig_ensure_jni_envhelper that checks the return value and registersDetachCurrentThreadas a pthread destructor so attached threads are cleaned up when the OS thread exits.%threadoncoraza_append_request_bodyandcoraza_append_response_bodywas unsafe: the typemap holds a raw pointer into the Python buffer viaPyBytes_AS_STRING/PyByteArray_AS_STRING, which can be invalidated if the GIL is released and another thread resizes or deallocates the object. These two functions are excluded from the%threadlist; the remaining Go-crossing calls still release the GIL.C.GoBytesincoraza_append_request_bodyandcoraza_append_response_bodyallocated and copied the entire buffer on every call. Replaced withunsafe.Sliceto construct a Go slice header over the C buffer in-place. Safe becauseWriteRequestBody/WriteResponseBodydo not retain the slice.coraza_request_body_from_filewas allocating a fresh 1 KB buffer on every loop iteration. Moved the allocation before the loop and increased the size to 32 KB (matchingio.Copy).Test plan
make -C examples/python runpasses all tests locally before pushSummary by CodeRabbit