Skip to content

fix(node): fix critical bugs in node_client WebSocket client#4

Merged
TangMeng12 merged 1 commit into
open-vela:devfrom
skyxiaobai:dev
Apr 15, 2026
Merged

fix(node): fix critical bugs in node_client WebSocket client#4
TangMeng12 merged 1 commit into
open-vela:devfrom
skyxiaobai:dev

Conversation

@skyxiaobai

@skyxiaobai skyxiaobai commented Apr 15, 2026

Copy link
Copy Markdown
Collaborator
  • P0: Add NULL check for cJSON_PrintUnformatted() return value in send_json_request() to prevent crash on OOM
  • P0: Add pthread_mutex to protect shared state (s_ws, config globals), use two-phase disconnect (shutdown fd first, free TLS after join) to eliminate race between teardown and active mbedtls_ssl_read, replace detached thread with joinable thread + pthread_join
  • P1: Check entropy_func() return value at all call sites (ws_send_text mask, do_connect WS key, ws_recv_frame pong, main loop ping) to fail safely on entropy failure

Signed-off-by: zhouwenjie1 zhouwenjie1@xiaomi.com"

@skyxiaobai skyxiaobai requested a review from TangMeng12 as a code owner April 15, 2026 10:01
Copilot AI review requested due to automatic review settings April 15, 2026 10:01
@github-actions

Copy link
Copy Markdown

❌ CLA Signature Required

@skyxiaobai Some contributors need to sign the CLA:

  • zhouwenjie1@xiaomi.comNeeds to sign CLA

Please:

  1. Sign the CLA at: https://www.openvela.com/#/community/cla
  2. After signing, comment /check-cla to recheck

📋 View detailed check results: Action Run #24448380475


💡 Tip: All contributors must sign the CLA before the PR can be merged.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes multiple stability and safety issues in the node_client WebSocket client, focusing on OOM safety, entropy failure handling, and safer teardown to avoid races with in-flight TLS reads/writes.

Changes:

  • Add NULL check for cJSON_PrintUnformatted() in send_json_request() to avoid crash on OOM.
  • Introduce mutex + joinable worker thread and implement a two-phase disconnect (shutdown fd first, free TLS after join).
  • Check entropy_func() return values at all call sites where random bytes are required.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/node/node_client.c
Comment on lines 839 to +845
s_running = false;
do_disconnect();
/* Phase 1: close fd to unblock any blocking recv in the worker */
do_shutdown_fd();
/* Wait for worker thread to fully exit */
if (s_thread_valid) {
pthread_join(s_thread, NULL);
s_thread_valid = false;

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s_running is used for cross-thread coordination but is only declared as volatile and is read/written without synchronization. volatile does not provide atomicity or a happens-before relationship, so this is a data race between the controller thread (start/stop) and node_client_thread(). Use an atomic (e.g., atomic_bool with atomic_load/atomic_store) or guard s_running accesses with a mutex/condition variable to make thread termination reliable.

Copilot uses AI. Check for mistakes.
Comment thread src/node/node_client.c
Comment on lines +130 to +136
pthread_mutex_lock(&s_mutex);
if (!ws->connected) {
pthread_mutex_unlock(&s_mutex);
return -1;
}

int result = -1;

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ws_send_text() holds s_mutex across the entire function, including potentially blocking raw_write()/mbedtls_ssl_write() calls. This can stall node_client_stop()/node_client_start() because do_shutdown_fd() also needs s_mutex to close the fd and unblock the worker, creating a shutdown hang under backpressure or a stuck socket. Consider narrowing the lock to only validate/copy the connection state (or using a refcount/RW-lock for TLS object lifetime) and performing the actual I/O without holding the global mutex; alternatively set a send timeout / non-blocking mode so raw_write cannot block indefinitely while the mutex is held.

Copilot uses AI. Check for mistakes.
Comment thread src/node/node_client.c
Comment on lines +454 to +456
pthread_mutex_lock(&s_mutex);
s_ws.connected = true;
pthread_mutex_unlock(&s_mutex);

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s_ws.connected is written under s_mutex here, but it is read elsewhere (e.g., the recv loop condition) without holding the mutex. This is a data race and can lead to undefined behavior on some architectures. Either make connected an atomic (e.g., atomic_bool) or consistently guard all reads/writes of s_ws.connected with s_mutex (or copy it to a local under the mutex before using it in loop conditions).

Copilot uses AI. Check for mistakes.
- P0: Add NULL check for cJSON_PrintUnformatted() return value
  in send_json_request() to prevent crash on OOM
- P0: Add pthread_mutex to protect shared state (s_ws, config globals),
  use two-phase disconnect (shutdown fd first, free TLS after join)
  to eliminate race between teardown and active mbedtls_ssl_read,
  replace detached thread with joinable thread + pthread_join
- P1: Check entropy_func() return value at all call sites
  (ws_send_text mask, do_connect WS key, ws_recv_frame pong,
  main loop ping) to fail safely on entropy failure

Signed-off-by: zhouwenjie1 <zhouwenjie1@xiaomi.com>
@github-actions

Copy link
Copy Markdown

❌ CLA Signature Required

@skyxiaobai Some contributors need to sign the CLA:

  • zhouwenjie1@xiaomi.comNeeds to sign CLA

Please:

  1. Sign the CLA at: https://www.openvela.com/#/community/cla
  2. After signing, comment /check-cla to recheck

📋 View detailed check results: Action Run #24448647800


💡 Tip: All contributors must sign the CLA before the PR can be merged.

@github-actions

Copy link
Copy Markdown

❌ CLA Signature Required

@skyxiaobai Some contributors need to sign the CLA:

  • zhouwenjie1@xiaomi.comNeeds to sign CLA

Please:

  1. Sign the CLA at: https://www.openvela.com/#/community/cla
  2. After signing, comment /check-cla to recheck

📋 View detailed check results: Action Run #24448380475


💡 Tip: All contributors must sign the CLA before the PR can be merged.

@TangMeng12

Copy link
Copy Markdown
Collaborator

/check-cla

@github-actions

Copy link
Copy Markdown

❌ CLA Signature Still Required

@skyxiaobai Some contributors still need to sign the CLA:

  • zhouwenjie1@xiaomi.comNeeds to sign CLA

Please:

  1. Sign the CLA at: https://www.openvela.com/#/community/cla
  2. After signing, comment /check-cla to recheck

📋 View detailed check results: Action Run #24452319581


💡 Tip: All contributors must sign the CLA before the PR can be merged.

@TangMeng12

Copy link
Copy Markdown
Collaborator

/check-cla

@github-actions

Copy link
Copy Markdown

✅ CLA Verification Complete

@skyxiaobai All contributors have signed the CLA!

  • zhouwenjie1@xiaomi.com

📋 View detailed check results: Action Run #24452453012

Your pull request can now proceed with the review process! 🎉

@github-actions

Copy link
Copy Markdown

✅ CLA Verification Complete

@skyxiaobai All contributors have signed the CLA!

  • zhouwenjie1@xiaomi.com

📋 View detailed check results: Action Run #24452319581

Your pull request can now proceed with the review process! 🎉

@TangMeng12 TangMeng12 closed this Apr 15, 2026
@TangMeng12 TangMeng12 reopened this Apr 15, 2026
@TangMeng12 TangMeng12 merged commit b3bfd6a into open-vela:dev Apr 15, 2026
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants