zephyr: fix pthread stack pool overflow on reconnect (#1064) #1122
+95
−9
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
On Zephyr, _z_task_init() assigns preallocated pthread stacks from thread_stack_area[]. During repeated link resets/reconnects, zenoh-pico recreates the read/lease threads.
Previous implementation used a constantly increasing thread_index++, which eventually indexed past thread_stack_area[], corrupting memory and causing crashes.
Replace thread_index++ with a stack-slot pool. When attr == NULL, pick a free slot in thread_stack_area[], set it with pthread_attr_setstack(), and start the thread.
Slot release is now done with a thread-specific data key destructor.
Add CONFIG_ZENOH_PICO_ZEPHYR_THREADS_NUM (default 4) to configure the stack pool size.
Fixes: #1064
Reproducing
Verify messages arrive:
[sub] demo/example/h753/sub = ...Reproduce reconnection:
Expected result before the fix (fail)
After several reconnects, the firmware will crash due to corrupted stack like in issue #1064.
Expected result (pass)
alive tick=...[sub] ...messages.-DZENOH_LOG_DEBUG, you may also see zenoh-pico debug logs; there should be no "slot OOM" errors.🏷️ Label-Based Checklist
Based on the labels applied to this PR, please complete these additional requirements:
Labels:
bug🐛 Bug Fix Requirements
Since this PR is labeled as a bug fix, please ensure:
Why this matters: Bugs without tests often reoccur.
Instructions:
- [ ]to- [x])This checklist updates automatically when labels change, but preserves your checked boxes.