Skip to content

fix(node): fix critical bugs in node_client WebSocket client#4

Merged
TangMeng12 merged 1 commit into
open-vela:devfrom
skyxiaobai:dev
Apr 15, 2026
Merged

fix(node): fix critical bugs in node_client WebSocket client#4
TangMeng12 merged 1 commit into
open-vela:devfrom
skyxiaobai:dev

Conversation

@skyxiaobai
Copy link
Copy Markdown
Collaborator

@skyxiaobai skyxiaobai commented Apr 15, 2026

  • P0: Add NULL check for cJSON_PrintUnformatted() return value in send_json_request() to prevent crash on OOM
  • P0: Add pthread_mutex to protect shared state (s_ws, config globals), use two-phase disconnect (shutdown fd first, free TLS after join) to eliminate race between teardown and active mbedtls_ssl_read, replace detached thread with joinable thread + pthread_join
  • P1: Check entropy_func() return value at all call sites (ws_send_text mask, do_connect WS key, ws_recv_frame pong, main loop ping) to fail safely on entropy failure

Signed-off-by: zhouwenjie1 zhouwenjie1@xiaomi.com"

@skyxiaobai skyxiaobai requested a review from TangMeng12 as a code owner April 15, 2026 10:01
Copilot AI review requested due to automatic review settings April 15, 2026 10:01
@github-actions
Copy link
Copy Markdown

❌ CLA Signature Required

@skyxiaobai Some contributors need to sign the CLA:

  • zhouwenjie1@xiaomi.comNeeds to sign CLA

Please:

  1. Sign the CLA at: https://www.openvela.com/#/community/cla
  2. After signing, comment /check-cla to recheck

📋 View detailed check results: Action Run #24448380475


💡 Tip: All contributors must sign the CLA before the PR can be merged.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes multiple stability and safety issues in the node_client WebSocket client, focusing on OOM safety, entropy failure handling, and safer teardown to avoid races with in-flight TLS reads/writes.

Changes:

  • Add NULL check for cJSON_PrintUnformatted() in send_json_request() to avoid crash on OOM.
  • Introduce mutex + joinable worker thread and implement a two-phase disconnect (shutdown fd first, free TLS after join).
  • Check entropy_func() return values at all call sites where random bytes are required.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/node/node_client.c
Comment on lines 839 to +845
s_running = false;
do_disconnect();
/* Phase 1: close fd to unblock any blocking recv in the worker */
do_shutdown_fd();
/* Wait for worker thread to fully exit */
if (s_thread_valid) {
pthread_join(s_thread, NULL);
s_thread_valid = false;
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s_running is used for cross-thread coordination but is only declared as volatile and is read/written without synchronization. volatile does not provide atomicity or a happens-before relationship, so this is a data race between the controller thread (start/stop) and node_client_thread(). Use an atomic (e.g., atomic_bool with atomic_load/atomic_store) or guard s_running accesses with a mutex/condition variable to make thread termination reliable.

Copilot uses AI. Check for mistakes.
Comment thread src/node/node_client.c
Comment on lines +130 to +136
pthread_mutex_lock(&s_mutex);
if (!ws->connected) {
pthread_mutex_unlock(&s_mutex);
return -1;
}

int result = -1;
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ws_send_text() holds s_mutex across the entire function, including potentially blocking raw_write()/mbedtls_ssl_write() calls. This can stall node_client_stop()/node_client_start() because do_shutdown_fd() also needs s_mutex to close the fd and unblock the worker, creating a shutdown hang under backpressure or a stuck socket. Consider narrowing the lock to only validate/copy the connection state (or using a refcount/RW-lock for TLS object lifetime) and performing the actual I/O without holding the global mutex; alternatively set a send timeout / non-blocking mode so raw_write cannot block indefinitely while the mutex is held.

Copilot uses AI. Check for mistakes.
Comment thread src/node/node_client.c
Comment on lines +454 to +456
pthread_mutex_lock(&s_mutex);
s_ws.connected = true;
pthread_mutex_unlock(&s_mutex);
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s_ws.connected is written under s_mutex here, but it is read elsewhere (e.g., the recv loop condition) without holding the mutex. This is a data race and can lead to undefined behavior on some architectures. Either make connected an atomic (e.g., atomic_bool) or consistently guard all reads/writes of s_ws.connected with s_mutex (or copy it to a local under the mutex before using it in loop conditions).

Copilot uses AI. Check for mistakes.
- P0: Add NULL check for cJSON_PrintUnformatted() return value
  in send_json_request() to prevent crash on OOM
- P0: Add pthread_mutex to protect shared state (s_ws, config globals),
  use two-phase disconnect (shutdown fd first, free TLS after join)
  to eliminate race between teardown and active mbedtls_ssl_read,
  replace detached thread with joinable thread + pthread_join
- P1: Check entropy_func() return value at all call sites
  (ws_send_text mask, do_connect WS key, ws_recv_frame pong,
  main loop ping) to fail safely on entropy failure

Signed-off-by: zhouwenjie1 <zhouwenjie1@xiaomi.com>
@github-actions
Copy link
Copy Markdown

❌ CLA Signature Required

@skyxiaobai Some contributors need to sign the CLA:

  • zhouwenjie1@xiaomi.comNeeds to sign CLA

Please:

  1. Sign the CLA at: https://www.openvela.com/#/community/cla
  2. After signing, comment /check-cla to recheck

📋 View detailed check results: Action Run #24448647800


💡 Tip: All contributors must sign the CLA before the PR can be merged.

@github-actions
Copy link
Copy Markdown

❌ CLA Signature Required

@skyxiaobai Some contributors need to sign the CLA:

  • zhouwenjie1@xiaomi.comNeeds to sign CLA

Please:

  1. Sign the CLA at: https://www.openvela.com/#/community/cla
  2. After signing, comment /check-cla to recheck

📋 View detailed check results: Action Run #24448380475


💡 Tip: All contributors must sign the CLA before the PR can be merged.

@TangMeng12
Copy link
Copy Markdown
Collaborator

/check-cla

@github-actions
Copy link
Copy Markdown

❌ CLA Signature Still Required

@skyxiaobai Some contributors still need to sign the CLA:

  • zhouwenjie1@xiaomi.comNeeds to sign CLA

Please:

  1. Sign the CLA at: https://www.openvela.com/#/community/cla
  2. After signing, comment /check-cla to recheck

📋 View detailed check results: Action Run #24452319581


💡 Tip: All contributors must sign the CLA before the PR can be merged.

@TangMeng12
Copy link
Copy Markdown
Collaborator

/check-cla

@github-actions
Copy link
Copy Markdown

✅ CLA Verification Complete

@skyxiaobai All contributors have signed the CLA!

  • zhouwenjie1@xiaomi.com

📋 View detailed check results: Action Run #24452453012

Your pull request can now proceed with the review process! 🎉

@github-actions
Copy link
Copy Markdown

✅ CLA Verification Complete

@skyxiaobai All contributors have signed the CLA!

  • zhouwenjie1@xiaomi.com

📋 View detailed check results: Action Run #24452319581

Your pull request can now proceed with the review process! 🎉

@TangMeng12 TangMeng12 closed this Apr 15, 2026
@TangMeng12 TangMeng12 reopened this Apr 15, 2026
@TangMeng12 TangMeng12 merged commit b3bfd6a into open-vela:dev Apr 15, 2026
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants