Skip to content

Linux chip-tool network recovery feature example #38421

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

dengpan-liu
Copy link

@dengpan-liu dengpan-liu commented Apr 16, 2025

This is the Linux platform chip-tool example supports the Network Recovery feature defined in Matter 1.5.

Usage:

  • For discovering nodes who advertising network recovery message in BLE
    chip-tool networkrecovery discover --timeout 10
    If don't specific the timeout value, it will take 30 seconds as default for Network Recovery scan.
    Once scan completed, the RecoveryIdentifier value from each recoverable device will be listed in the chip-tool log, for example:
[1744899521.197] [1746:1751] [CTL] Discovery timed out
[1744899521.198] [1746:1751] [TOO] Find recoverable devices:
[1744899521.198] [1746:1751] [TOO] 0x8877665544332211
[1744899521.199] [1746:1747] [BLE] BLE removing known devices
[1744899521.202] [1746:1747] [BLE] BLE initiating scan
[1744899521.217] [1746:1751] [BLE] ChipDeviceScanner has started scanning!
[1744899521.218] [1746:1746] [CTL] Shutting down the commissioner
  • For recovering a node's Wi-Fi network in BLE
    chip-tool networkrecovery recover-wifi node-id recovery-identifier ssid password breadcrumb, for example:
    chip-tool networkrecovery recover-wifi 1 9833440827789222417 my-ssid my-password 0
  • For recovering a node's Thread network in BLE
    chip-tool networkrecovery recover-thread node-id recovery-identifier operational-dataset breadcrumb, for example:
    chip-tool networkrecovery recover-thread 1 9833440827789222417 hex:0e080000000000010000000300001535060004001fffe0020824228a3f6322559a0708fd9917d14c5f5ea1051062ab56ee4a14977b28eaddda12339592030f4f70656e5468726561642d333239640102329d04107bbc1fd5bd41afd8321398d0e17607f20c0402a0fff8 0
    The chip-tool log when network recovery success:
[1744907638.711] [164591:164602] [DMG] },
[1744907638.711] [164591:164602] [DMG] Received Command Response Data, Endpoint=0 Cluster=0x0000_0030 Command=0x0000_0005
[1744907638.711] [164591:164602] [CTL] Received CommissioningComplete response, errorCode=0
[1744907638.711] [164591:164602] [CTL] Network Recovery succeeded.
[1744907638.711] [164591:164602] [TOO] Recovery complete for devicice 0000000000000001

Testing

manually validated in WSL2 (Ubuntu 22.04) with sample ESP dongle act as recoverable node.

@CLAassistant
Copy link

CLAassistant commented Apr 16, 2025

CLA assistant check
All committers have signed the CLA.

@dengpan-liu dengpan-liu changed the title chip-tool example chip-tool newtork recovery example Apr 16, 2025
Copy link

PR #38421: Size comparison from fe31731 to a510121

Full report (1 build for stm32)
platform target config section fe31731 a510121 change % change
stm32 light STM32WB5MM-DK FLASH 461384 461464 80 0.0
RAM 141488 141520 32 0.0

mState = State::ResolvingAddress;
mTransportPayloadCapability = transportPayloadCapability;

UpdateDeviceData(params.GetPeerAddress(), params.GetMRPConfig());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just trying to understand this: we are establishing CASE here, right? For a device that is presumably not on-network? So are we doing CASE over some non-IP transport in this case?

We should put some thought into how to make this zero-cost or close to it for nodes that never perform network recovery but do need CASE establishment in general (e.g. for bindings).

@@ -190,6 +191,17 @@ void OperationalSessionSetup::Connect(Callback::Callback<OnDeviceConnected> * on
Connect(onConnection, onFailure, nullptr, transportPayloadCapability);
}

void OperationalSessionSetup::Connect(Callback::Callback<OnDeviceConnected> * onConnection, Callback::Callback<OnDeviceConnectionFailure> * onFailure,
RendezvousParameters & params,
TransportPayloadCapability transportPayloadCapability)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does all this stuff have TransportPayloadCapability arguments? I don't expect anything to require TCP for network recovery....

Copy link
Author

@dengpan-liu dengpan-liu Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea, the "TransportPayloadCapability" arguments are specified for all OperationalSessionSetup::Connect functions, just want to keep the format is same.

Copy link
Contributor

@bzbarsky-apple bzbarsky-apple Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're there for the normal case because some things need to happen over TCP. But network recovery does not, in fact, happen over TCP; if you can do TCP you're already on the network, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: running network recovery protocols over IP still makes sense if there is a proxy in the chain somewhere. The proper analogy is that any commissioning transport (anything that's used to setup and conduct a PASE session) should be usable for network recovery; so if we ever define commissioning over TCP, then we would be doing network recovery over TCP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the real question is: would we ever require TCP for network recovery? If not, we can just ask for a CASE session over UDP for network recovery purposes....

But OK, if network recovery can happen over IP then in theory someone could decide they want to do it over TCP, not UDP....

// cancel that attempt before we can update the address.
if (mState == State::Connecting)
{
CancelSessionSetupReattempt();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CancelSessionSetupReattempt() does not in fact take us out of the Connecting state, right? It just cancels a timer...

@@ -834,6 +858,20 @@ class DLL_EXPORT DeviceCommissioner : public DeviceController,
/* fireAndForget = */ true);
}

void RegisterNetworkRecoverDelegate(NetworkRecoverDelegate * delegate) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API naming makes it sound like multiple delegates can be registered.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is used for each platform to register its own NetworkRecoverDelegate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there is only one delegate that can be registered at a time in the implementation, as far as I could see.

/**
* @brief
* Discover all devices advertising as recoverable.
* Should be called on main loop thread.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't generally prescribe anything about threads on which things should be called. The rule is that you have to use whatever synchronization you use for the rest of the Matter API (which may be locks, using a single thread, or some other things).

I know you copied this comment from elsewhere in the file, but we should just fix those bits instead of copying them.

{
public:
NetworkRecover(DeviceCommissioner * commissioner) : mCommissioner(commissioner) {}
virtual ~NetworkRecover() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so what happens if a NetworkRecover is destroyed while its async callbacks are waiting to be called? When those get called we get use-after-free, no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NetworkRecover instance will never be destroyed because it is constructed when DeviceController initialize, should be only one NetworkRecover object all the time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NetworkRecover instance will never be destroyed because it is constructed when DeviceController initialize

Yes, and DeviceControllers can be destroyed....

@dengpan-liu dengpan-liu changed the title chip-tool newtork recovery example Linux chip-tool network recovery feature example Apr 16, 2025
Copy link

PR #38421: Size comparison from fe31731 to b17a037

Full report (3 builds for cc32xx, stm32)
platform target config section fe31731 b17a037 change % change
cc32xx air-purifier CC3235SF_LAUNCHXL FLASH 542142 542334 192 0.0
RAM 205144 205144 0 0.0
lock CC3235SF_LAUNCHXL FLASH 576170 576298 128 0.0
RAM 205392 205392 0 0.0
stm32 light STM32WB5MM-DK FLASH 461384 461008 -376 -0.1
RAM 141488 141488 0 0.0

@github-actions github-actions bot added the tizen For Tizen platform label Apr 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

6 participants