fix(node): fix critical bugs in node_client WebSocket client#4
Conversation
❌ CLA Signature Required@skyxiaobai Some contributors need to sign the CLA:
Please:
📋 View detailed check results: Action Run #24448380475 💡 Tip: All contributors must sign the CLA before the PR can be merged. |
There was a problem hiding this comment.
Pull request overview
Fixes multiple stability and safety issues in the node_client WebSocket client, focusing on OOM safety, entropy failure handling, and safer teardown to avoid races with in-flight TLS reads/writes.
Changes:
- Add NULL check for
cJSON_PrintUnformatted()insend_json_request()to avoid crash on OOM. - Introduce mutex + joinable worker thread and implement a two-phase disconnect (shutdown fd first, free TLS after join).
- Check
entropy_func()return values at all call sites where random bytes are required.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| s_running = false; | ||
| do_disconnect(); | ||
| /* Phase 1: close fd to unblock any blocking recv in the worker */ | ||
| do_shutdown_fd(); | ||
| /* Wait for worker thread to fully exit */ | ||
| if (s_thread_valid) { | ||
| pthread_join(s_thread, NULL); | ||
| s_thread_valid = false; |
There was a problem hiding this comment.
s_running is used for cross-thread coordination but is only declared as volatile and is read/written without synchronization. volatile does not provide atomicity or a happens-before relationship, so this is a data race between the controller thread (start/stop) and node_client_thread(). Use an atomic (e.g., atomic_bool with atomic_load/atomic_store) or guard s_running accesses with a mutex/condition variable to make thread termination reliable.
| pthread_mutex_lock(&s_mutex); | ||
| if (!ws->connected) { | ||
| pthread_mutex_unlock(&s_mutex); | ||
| return -1; | ||
| } | ||
|
|
||
| int result = -1; |
There was a problem hiding this comment.
ws_send_text() holds s_mutex across the entire function, including potentially blocking raw_write()/mbedtls_ssl_write() calls. This can stall node_client_stop()/node_client_start() because do_shutdown_fd() also needs s_mutex to close the fd and unblock the worker, creating a shutdown hang under backpressure or a stuck socket. Consider narrowing the lock to only validate/copy the connection state (or using a refcount/RW-lock for TLS object lifetime) and performing the actual I/O without holding the global mutex; alternatively set a send timeout / non-blocking mode so raw_write cannot block indefinitely while the mutex is held.
| pthread_mutex_lock(&s_mutex); | ||
| s_ws.connected = true; | ||
| pthread_mutex_unlock(&s_mutex); |
There was a problem hiding this comment.
s_ws.connected is written under s_mutex here, but it is read elsewhere (e.g., the recv loop condition) without holding the mutex. This is a data race and can lead to undefined behavior on some architectures. Either make connected an atomic (e.g., atomic_bool) or consistently guard all reads/writes of s_ws.connected with s_mutex (or copy it to a local under the mutex before using it in loop conditions).
- P0: Add NULL check for cJSON_PrintUnformatted() return value in send_json_request() to prevent crash on OOM - P0: Add pthread_mutex to protect shared state (s_ws, config globals), use two-phase disconnect (shutdown fd first, free TLS after join) to eliminate race between teardown and active mbedtls_ssl_read, replace detached thread with joinable thread + pthread_join - P1: Check entropy_func() return value at all call sites (ws_send_text mask, do_connect WS key, ws_recv_frame pong, main loop ping) to fail safely on entropy failure Signed-off-by: zhouwenjie1 <zhouwenjie1@xiaomi.com>
❌ CLA Signature Required@skyxiaobai Some contributors need to sign the CLA:
Please:
📋 View detailed check results: Action Run #24448647800 💡 Tip: All contributors must sign the CLA before the PR can be merged. |
❌ CLA Signature Required@skyxiaobai Some contributors need to sign the CLA:
Please:
📋 View detailed check results: Action Run #24448380475 💡 Tip: All contributors must sign the CLA before the PR can be merged. |
|
/check-cla |
❌ CLA Signature Still Required@skyxiaobai Some contributors still need to sign the CLA:
Please:
📋 View detailed check results: Action Run #24452319581 💡 Tip: All contributors must sign the CLA before the PR can be merged. |
|
/check-cla |
✅ CLA Verification Complete@skyxiaobai All contributors have signed the CLA!
📋 View detailed check results: Action Run #24452453012 Your pull request can now proceed with the review process! 🎉 |
✅ CLA Verification Complete@skyxiaobai All contributors have signed the CLA!
📋 View detailed check results: Action Run #24452319581 Your pull request can now proceed with the review process! 🎉 |
Signed-off-by: zhouwenjie1 zhouwenjie1@xiaomi.com"