Skip to content

Conversation

HoZHel
Copy link
Contributor

@HoZHel HoZHel commented May 5, 2025

Increase main stack size when Bluetooth is enabled to avoid stack overflow.
samples/bluetooth/beacon was considered for this change. It has also been tested with samples/bluetooth/peripheral_hr.

@HoZHel HoZHel marked this pull request as ready for review May 5, 2025 14:16
@HoZHel HoZHel requested a review from erwango May 5, 2025 14:25
erwango
erwango previously approved these changes May 5, 2025
@cvinayak
Copy link
Contributor

cvinayak commented May 6, 2025

FYI, #74345

Have you done some extensive Bluetooth use case testing on 96b_carbon board? This is using hci_spi on the nRF51 SoC, don't you see assertions in the Controller?

@cvinayak cvinayak self-assigned this May 6, 2025
@cvinayak cvinayak requested review from jhedberg and nordicjm May 6, 2025 07:43
nordicjm
nordicjm previously approved these changes May 6, 2025
@jhedberg
Copy link
Member

jhedberg commented May 6, 2025

Increase main stack size when Bluetooth is enabled to avoid stack overflow. samples/bluetooth/beacon was considered for this change. It has also been tested with samples/bluetooth/peripheral_hr.

I can understand peripheral_hr since it passes a NULL callback to bt_enable() (i.e. runs the init in the main thread), however beacon passes an async callback to bt_enable(), i.e. the init runs in a separate context. It'd be good to understand where exactly the increased stack consumption comes from in that case.

@HoZHel
Copy link
Contributor Author

HoZHel commented May 6, 2025

I can understand peripheral_hr since it passes a NULL callback to bt_enable() (i.e. runs the init in the main thread), however beacon passes an async callback to bt_enable(), i.e. the init runs in a separate context. It'd be good to understand where exactly the increased stack consumption comes from in that case.

After migrating from tinycript to mbedtls, this happened to at least ST SoCs which don't have a TRNG peripheral.

@HoZHel
Copy link
Contributor Author

HoZHel commented May 6, 2025

FYI, #74345

Have you done some extensive Bluetooth use case testing on 96b_carbon board? This is using hci_spi on the nRF51 SoC, don't you see assertions in the Controller?

I tested the aforementioned samples and they worked fine. I flashed the upstream hci_spi sample on nRF51. I think if there were a problem with the controller, it wouldn't be responsive to the host.

@HoZHel
Copy link
Contributor Author

HoZHel commented Jun 16, 2025

@cvinayak, PTAL

Copy link
Member

@jhedberg jhedberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense to me. The Zephyr Bluetooth stack doesn't inherently use any main thread stack at all - it all depends on the application, which could e.g. do Bluetooth initialization in some completely different thread. Such changes should be done on a per-application level. Either update the main stack for the app in prj.conf (if it's justifiable) or add a board-specific overlay to the app.

@cvinayak
Copy link
Contributor

update the main stack for the app in prj.conf (if it's justifiable) or add a board-specific overlay to the app.

I agree with @jhedberg here. In the case of the use of hci_spi sample, it will have to be a board-specific extra config file be placed in the boards folder of the sample.

@msmttchr
Copy link
Contributor

msmttchr commented Aug 5, 2025

@HoZHel, I understand the carbon board is a two chip platform with BLE host running on STM32F4 and BLE controller in nRF51. I also understand that the problem you are reporting is located in the host, so it’s unlikely that increasing main stack size in the host could fix or mask a problem in the controller.
Did you initially discover the problem with beacon? If yes in which function did you observe stack overflow?

Copy link

github-actions bot commented Oct 5, 2025

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Oct 5, 2025
@HoZHel HoZHel removed the Stale label Oct 9, 2025
@HoZHel
Copy link
Contributor Author

HoZHel commented Oct 10, 2025

@HoZHel, I understand the carbon board is a two chip platform with BLE host running on STM32F4 and BLE controller in nRF51. I also understand that the problem you are reporting is located in the host, so it’s unlikely that increasing main stack size in the host could fix or mask a problem in the controller. Did you initially discover the problem with beacon? If yes in which function did you observe stack overflow?

Yes. The application is beacon and mbedtls_internal_sha256_process causes stack overflow for the main stack. The attached screenshots depict clearly.

#if defined(CONFIG_MBEDTLS_INIT)
SYS_INIT(_mbedtls_init, POST_KERNEL, CONFIG_KERNEL_INIT_PRIORITY_DEFAULT);
#endif

In order to be sure that calling _mbedtls_init function results in calling mbedtls_internal_sha256_process function, and then contributes to stack overflow, I disabled CONFIG_MBEDTLS_INIT and called the _mbedtls_init inside the main function of beacon sample.

image

Disassembled zephyr.elf:
image

0x20005240 - 0x20004e40 = 0x400 which corresponds to CONFIG_MAIN_STACK_SIZE of 1024 bytes.
image

image

Therefore, any Bluetooth sample on this board requires the main stack size of greater than 1024 bytes.

@msmttchr
Copy link
Contributor

Therefore, any Bluetooth sample on this board requires the main stack size of greater than 1024 bytes.

@HoZHel, thanks a lot for your detailed explanation and the reason why main stack should be increased, so we can say that if mbedtls is needed, main stack size should be increased. Possibly the check in KConfig should be:
if MBEDTLS rather than if BT

@HoZHel
Copy link
Contributor Author

HoZHel commented Oct 10, 2025

update the main stack for the app in prj.conf (if it's justifiable) or add a board-specific overlay to the app.

I agree with @jhedberg here. In the case of the use of hci_spi sample, it will have to be a board-specific extra config file be placed in the boards folder of the sample.

What you are suggesting is related to hci_spi sample that is for the controller MCU (nRF51). I'm talking about the main MCU (STM32F4) which uses other samples. Please, see the comment above.

@cvinayak
Copy link
Contributor

cvinayak commented Oct 10, 2025

KConfig should be:
if MBEDTLS rather than if BT

Agree to above. The stack size increase is due to inclusion of MBEDTLS (in the main MCU, STM32F4, firmware) which could be any other subsystem depending on MBEDTLS (hence, BT is not the true dependency for the required increase).

(@HoZHel, thank you for the thorough analysis)

Increase main stack size when MBEDTLS_INIT is enabled to avoid stack
overflow as it consumes 1536 bytes.

Signed-off-by: Ali Hozhabri <[email protected]>
@HoZHel HoZHel dismissed stale reviews from nordicjm and erwango via 274aade October 10, 2025 16:17
@HoZHel HoZHel force-pushed the 96b_carbon_main_stack_inc branch from 2b8766b to 274aade Compare October 10, 2025 16:17
@HoZHel
Copy link
Contributor Author

HoZHel commented Oct 10, 2025

Changed if BT to if MBEDTLS_INIT.

Copy link

Copy link
Contributor

@msmttchr msmttchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cvinayak cvinayak requested a review from jhedberg October 13, 2025 11:11
@jhedberg
Copy link
Member

The change looks more reasonable now, but is it really only this board that's affected by the increased stack consumption when MBEDTLS_INIT is enabled?

@msmttchr
Copy link
Contributor

The change looks more reasonable now, but is it really only this board that's affected by the increased stack consumption when MBEDTLS_INIT is enabled?

That’s good point in my opinion, it can be linked to all STM32F4 or to all CM4 based SoC without embedded RNG. We could accept the currency proposal at board level, as first step and investigate later or we could limit to STM32F4. Any suggestion or preference? In general, I think the C stack sizing is a difficult problem and I don’t know whether a standard policy exists in Zephyr.

@jhedberg
Copy link
Member

The change looks more reasonable now, but is it really only this board that's affected by the increased stack consumption when MBEDTLS_INIT is enabled?

That’s good point in my opinion, it can be linked to all STM32F4 or to all CM4 based SoC without embedded RNG. We could accept the currency proposal at board level, as first step and investigate later or we could limit to STM32F4. Any suggestion or preference? In general, I think the C stack sizing is a difficult problem and I don’t know whether a standard policy exists in Zephyr.

My worry is that we already have a bunch of these and there are more on their way , e.g. in #96692 there's this:

config MAIN_STACK_SIZE
	default 2048 if MBEDTLS

The above would likely need to be to be changed to use MBEDTLS_INIT as well.

I'm not 100% objecting to the current solution in this PR, but the more we let into the tree the harder it will be to track everything down later when we want to make a cleaner and more generic solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants