-
Notifications
You must be signed in to change notification settings - Fork 174
fix(websocket): Fix websocket client race on abort and memory leak(IDFGH-16555) #924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
|
67bd7e3 to
46871bf
Compare
| #else | ||
| // When separate TX lock is not configured, we already hold client->lock | ||
| // which protects the transport, so we can send PONG directly | ||
| esp_transport_ws_send_raw(client->transport, WS_TRANSPORT_OPCODES_PONG | WS_TRANSPORT_OPCODES_FIN, data, client->payload_len, |
Check warning
Code scanning / clang-tidy
The value '138' provided to the cast expression is not in the valid range of values for 'ws_transport_opcodes' [clang-analyzer-optin.core.EnumCastOutOfRange] Warning
46871bf to
5577e03
Compare
ca2956e to
0e58789
Compare
| } | ||
| ESP_LOGD(TAG, "Calling abort_connection due to send error"); | ||
| #ifdef CONFIG_ESP_WS_CLIENT_SEPARATE_TX_LOCK | ||
| xSemaphoreGiveRecursive(client->tx_lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to move this verification to abort connection function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@euripedesrocha I am not sure about it, as abort connection is used in 5 different places, if we move it inside abort_connection it adds complex detection logic "which lock am I holding?" Only 1 place (send error path) needs lock switching (tx_lock → client->lock)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right in that sense but looking at it again it might be better to release the tx_lock after acquiring the main lock maybe at the end of this section after we complete the abort process to avoid task to be preempted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Thank you! Updated accordingly
0e58789 to
62925a5
Compare
15dcb35 to
f474654
Compare
a6e4259 to
50e3068
Compare
50e3068 to
22eb17e
Compare
52abfc0 to
30778c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is being reviewed by Cursor Bugbot
Details
Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.
To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
- Add state check in abort_connection to prevent double-close - Fix memory leak: free errormsg_buffer on disconnect - Reset connection state on reconnect to prevent stale data - Implement lock ordering for separate TX lock mode - Read buffered data immediately after connection to prevent data loss - Added sdkconfig.ci.tx_lock config
30778c0 to
d202ae4
Compare
| esp_event_loop_run(client->event_handle, 0); | ||
| } | ||
| } | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Missing error handling for recv failure after connect
The new immediate poll code after connection calls esp_websocket_client_recv but does not handle the ESP_FAIL return case. When recv_result is not ESP_OK, the code simply skips the event loop run without logging an error or calling esp_websocket_client_abort_connection. This is inconsistent with the existing error handling at lines 1293-1296, which properly logs the error and aborts the connection when esp_websocket_client_recv fails. Since WEBSOCKET_EVENT_CONNECTED is already dispatched at line 1206, failing silently here could leave the client in an inconsistent state where the user believes the connection succeeded but data reception failed.
TODO - Will remove comments after the review ( left it for easier review)
Description
This PR fixes critical memory leaks and crashes in the ESP WebSocket client that occur during reconnection scenarios(CONFIG_ESP_WS_CLIENT_SEPARATE_TX_LOCK = y).
Changes Made:
Related
#898
Checklist
Before submitting a Pull Request, please ensure the following:
Note
Harden websocket client abort/reconnect paths with proper state checks and lock ordering, fix error-buffer leak, reset/handle early data on connect, and add tx-lock CI config.
components/esp_websocket_client/esp_websocket_client.c):esp_websocket_client_abort_connection()with state checks to skip redundant aborts and close transport safely.client->errormsg_bufferon disconnect to prevent leaks.rx_bufferis freed on early returns.tx_lock, abort connection underclient->lock, and exit cleanly.payload_len/offset,last_fin,last_opcodeand non-blocking poll to process data arriving during handshake.client->lockbeforerecvwhen data is ready; avoid double-locking.client->transport_listandclient->transporttoNULL.components/esp_websocket_client/examples/target/sdkconfig.ci.tx_lockenablingCONFIG_ESP_WS_CLIENT_SEPARATE_TX_LOCKwith timeout and Ethernet settings.Written by Cursor Bugbot for commit d202ae4. This will update automatically on new commits. Configure here.