
Conversation

ludvigsj
Contributor

According to MshPrt 5.4.4, the Provisioner, upon receiving the Provisioning Failed PDU, shall assume that the provisioning failed and immediately disconnect the provisioning bearer.

Also changes the link close upon success to use the `prov_link_close` helper function instead of doing it manually, as minor cleanup.
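For context, the change described above amounts to routing the failure path through the link-close helper. A minimal, compilable sketch of the idea (hypothetical stand-ins, not the actual Zephyr provisioner code; the reason constant is illustrative):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of the provisioning bearer; not the real Zephyr API. */
static int bearer_open = 1;

static void prov_link_close(int reason)
{
	/* In the real stack this asks the active bearer (PB-ADV or PB-GATT)
	 * to tear down the link; the bearer later reports link_closed, which
	 * is where the provisioning state gets reset. */
	bearer_open = 0;
	printf("link closed, reason %d\n", reason);
}

/* Per MshPrt 5.4.4: on receiving a Provisioning Failed PDU, assume the
 * provisioning failed and immediately disconnect the provisioning bearer. */
static void prov_failed(const uint8_t *data)
{
	printf("Error: 0x%02x\n", data[0]);
	prov_link_close(1 /* stand-in for PROV_BEARER_LINK_STATUS_FAIL */);
}
```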

@PavelVPV
Contributor

The Also part should be moved into a separate commit. Otherwise looks fine.

According to MshPrt 5.4.4, The Provisioner, upon receiving the
Provisioning Failed PDU, shall assume that the provisioning failed and
immediately disconnect the provisioning bearer.

Signed-off-by: Ludvig Jordet <[email protected]>
Changes the link close upon success to use the `prov_link_close` helper
function instead of doing it manually, as minor cleanup.

Signed-off-by: Ludvig Jordet <[email protected]>
@ludvigsj ludvigsj force-pushed the develop/prov_close_link_on_failed branch from 210402f to a93a15a Compare October 14, 2025 07:47
```c
	send_random();
}

static void prov_failed(const uint8_t *data)
```
Contributor

I guess after the changes, the function just duplicates prov_fail.

Contributor Author

Sort of. The behavior is identical, but one of them is a PDU handler and the other is a generic function that is called from various locations. I'd argue for keeping it like this for clarity (handler function vs. actual behavior).

```c
static void prov_failed(const uint8_t *data)
{
	LOG_WRN("Error: 0x%02x", data[0]);

	reset_state();
```
Contributor

I didn't get why the provisioner does not clear the CDB and the state flags and doesn't release the Diffie-Hellman key pair upon receiving the Provisioning Failed frame.
Seems incorrect to me.
It should be:

```c
prov_fail(PROV_BEARER_LINK_STATUS_FAIL);
reset_state();
```

Contributor Author

If you look in the `prov_link_closed` callback, the reset is done there.

Contributor

@alxelax alxelax Oct 14, 2025

There is no guarantee this callback is called; for PB-GATT it will never be invoked at all.
Cleaning data in this callback is mostly meant for unpredictable link closing, such as a Host timeout on GATT communication or the application closing the link via the API.

Contributor

Even a simple case with PB-ADV will produce a zombie node in the CDB and a heap leak through mbedtls once the mesh cannot find a free adv structure; that can happen at any time and depends on the customer configuration.

Contributor

`reset_state` will be called after the link is closed (`prov_link_close` -> `bearer->link_close`):

For PB-ADV:
`prov_link_close`:pb_adv.c -> send Link Close PDU -> `buf_sent` -> `close_link` -> `role->link_closed` -> `prov_link_closed`:provisioner.c -> `reset_state`.

For PB-GATT:
`prov_link_close`:pb_gatt.c -> `bt_conn_disconnect` -> `gatt_disconnected`:pb_gatt_srv.c -> `bt_mesh_pb_gatt_close`:pb_gatt.c -> `link_closed` -> `role->link_closed` -> `prov_link_closed`:provisioner.c -> `reset_state`.
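The claim in both chains is that `reset_state` runs as the tail of the bearer's close sequence. A toy, compilable model of that convergence (assumed names only, not the real pb_adv.c / pb_gatt.c code):

```c
#include <stdio.h>

static int state_reset_count;

static void reset_state(void)
{
	state_reset_count++;
}

/* Role side (provisioner.c, modeled): the link_closed callback is where
 * the provisioner clears its state. */
static void prov_link_closed(int status)
{
	printf("link closed, status %d\n", status);
	reset_state();
}

/* Bearer side (modeled): whichever way the link goes down, the close
 * sequence ends by notifying the role.
 * PB-ADV:  Link Close PDU sent -> buf_sent -> close_link -> role->link_closed
 * PB-GATT: bt_conn_disconnect -> gatt_disconnected -> link_closed
 *          -> role->link_closed */
static void bearer_link_close(int status)
{
	prov_link_closed(status);
}

/* Entry point used by the prov_failed handler. */
static void prov_link_close(int status)
{
	bearer_link_close(status);
}
```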

Contributor

> Also, I'm pretty sure this change breaks cleaning the provisioner that works above the RPR Client. When it receives a link report "Closed by device" it doesn't clear the CDB and keys anymore

Pretty sure -> can you elaborate? Because I don't really see this. As I showed above, for PB-ADV, `reset_state` is eventually called after sending the Link Close PDU. For PB-GATT, the link is a regular BLE link and thus closed through the BLE API; once it is closed, `reset_state` is called. I don't see how it can not be called. The only difference between the old and new behavior is message sending (in case of PB-ADV) and link close (in case of PB-GATT).

Also, PB-GATT is not supported by the RPR Client at the moment.

Contributor

If you meant an error on `bt_conn_disconnect`? It wasn't actually clear from the comment that this is what you meant. Then yes, we need to fix this, I agree.

Contributor

On the other hand, in the case of a timeout (which will happen here, as there is no communication between the devices), `reset_state` will eventually be called:

```c
static void protocol_timeout(struct k_work *work)
{
	if (!atomic_test_bit(link.flags, ADV_LINK_ACTIVE)) {
		return;
	}

	LOG_DBG("");

	link.rx.seg = 0U;
	prov_link_close(PROV_BEARER_LINK_STATUS_TIMEOUT);
```

```c
static void link_closed(enum prov_bearer_link_status status)
{
	const struct prov_bearer_cb *cb = link.cb;
	void *cb_data = link.cb_data;

	reset_state();

	cb->link_closed(&bt_mesh_pb_gatt, cb_data, status);
}
```
```c
static void protocol_timeout(struct k_work *work)
{
	if (!atomic_test_bit(bt_mesh_prov_link.flags, LINK_ACTIVE)) {
		return;
	}

	/* If connection failed or timeout, not allow establish connection */
	if (IS_ENABLED(CONFIG_BT_MESH_PB_GATT_CLIENT) &&
	    atomic_test_bit(bt_mesh_prov_link.flags, PROVISIONER)) {
		if (link.conn) {
			(void)bt_conn_disconnect(link.conn,
						 BT_HCI_ERR_REMOTE_USER_TERM_CONN);
		} else {
			(void)bt_mesh_pb_gatt_cli_setup(NULL);
		}
	}

	LOG_DBG("Protocol timeout");

	link_closed(PROV_BEARER_LINK_STATUS_TIMEOUT);
```

Contributor

> > Also, I'm pretty sure this change breaks cleaning the provisioner that works above the RPR Client. When it receives a link report "Closed by device" it doesn't clear the CDB and keys anymore
>
> Pretty sure -> can you elaborate? Because I don't really see this. As I showed above, for PB-ADV, `reset_state` is eventually called after sending the Link Close PDU. For PB-GATT, the link is a regular BLE link and thus closed through the BLE API; once it is closed, `reset_state` is called. I don't see how it can not be called. The only difference between the old and new behavior is message sending (in case of PB-ADV) and link close (in case of PB-GATT).
>
> Also, PB-GATT is not supported by the RPR Client at the moment.

You described it correctly, but this is applicable only to the RPR Server. The RPR Server further handles pb_link_closed, which will initiate a Link Report with status BT_MESH_RPR_ERR_LINK_CLOSED_BY_DEVICE (https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/mesh/rpr_srv.c#L478).
Then the RPR Client receives it and calls the closed callback for its own provisioner: https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/mesh/rpr_cli.c#L74
That causes the same code to run, but already on the RPR Client side: https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/mesh/provisioner.c#L698-L706
You removed the reset functionality here and rely on the RPR Client bearer callback, which for some reason is called again after this PR (that seems weird by itself). But follow further:
https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/mesh/rpr_cli.c#L742-L750

This implementation does not assume any callbacks at all, neither on success nor on failure.
As a result, a provisioner based on the RPR Client gets a zombie node and lost keys.

Contributor

> You described correctly, but this is applicable only to rpr server

Why? provisioner.c runs on the RPR Client, not the RPR Server.

> you removed reset functionality here and rely on rpr client bearer callback that is called for some reason again after this PR (it seems weird by self).

This I don't get. This discussion points to the `reset_state` -> `prov_link_close` change in the `prov_failed` callback, which is called when a Provisioning Failed PDU is received. I don't really understand your sentence: what does "that is called for some reason again after this PR" mean (where is the "again")?

In case of RPR, when a Provisioning Failed PDU is received, the RPR Client will with this new change have this call flow:
`prov_link_close`:provisioner.c -> `pb_link_close`:rpr_cli.c -> `link_close` -> `link_timeout` / `handle_link_status` -> `bearer.cb->link_closed`, which is `prov_link_closed`:prov.c -> `role->link_closed`, which is `prov_link_closed`:provisioner.c -> `reset_state`.
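The RPR flow above can be modeled the same way (stand-in functions, not the real rpr_cli.c API): the point is that the client's link-status handling is what finally delivers `link_closed` back to provisioner.c, so `reset_state` is still reached.

```c
#include <stdio.h>

static int reset_done;

static void reset_state(void)
{
	reset_done = 1;
}

/* provisioner.c (modeled): role callback at the end of the chain. */
static void provisioner_link_closed(int status)
{
	printf("provisioner sees link_closed, status %d\n", status);
	reset_state();
}

/* rpr_cli.c (modeled): closing the PB-Remote link ends either in a Link
 * Report from the server or in a local timeout; both raise the bearer's
 * link_closed callback (link_timeout / handle_link_status). */
static void rpr_cli_pb_link_close(int status)
{
	provisioner_link_closed(status);
}

/* provisioner.c (modeled): prov_link_close dispatches to the active
 * bearer, here the RPR Client bearer. */
static void prov_link_close(int status)
{
	rpr_cli_pb_link_close(status);
}
```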


@ludvigsj ludvigsj requested a review from alxelax October 14, 2025 08:29
Contributor

@PavelVPV PavelVPV left a comment

Though this is not stated in the contribution guidelines, it is generally better to start the commit header with a verb. Approving anyway.
