controller: add commissioner-side commissioning building blocks#456
controller: add commissioner-side commissioning building blocks#456glennswest wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements the Matter controller role, adding end-to-end commissioning orchestration, onboarding payload decoders for manual and QR codes, and persistence support for fabric credentials. The review feedback identifies several critical issues regarding TLV encoding and decoding, specifically the need to use OctetString types for nonces and certificates to avoid interoperability and validation failures. Further improvements were suggested to ensure proper response verification during the fail-safe arming process and to increase certificate buffer sizes to prevent capacity errors.
| // CommandFields struct at CmdDataTag::Data | ||
| // containing field 0: CSRNonce (32-byte octstr). | ||
| w.start_struct(&TLVTag::Context(CmdDataTag::Data as u8))?; | ||
| w.str(&TLVTag::Context(0), csr_nonce)?; |
There was a problem hiding this comment.
The CSRNonce field in Matter is an OctetString, not a UTF8String. Using w.str (which typically encodes as UTF-8 in this crate) for a random 32-byte nonce will likely fail if the nonce contains non-UTF-8 sequences, or it will encode the wrong TLV type tag, causing interoperability issues with strict responders.
| w.str(&TLVTag::Context(0), csr_nonce)?; | |
| w.octets(&TLVTag::Context(0), csr_nonce)?; |
| w.str(&TLVTag::Context(0), noc)?; | ||
| w.str(&TLVTag::Context(1), icac)?; | ||
| w.str(&TLVTag::Context(2), ipk)?; |
There was a problem hiding this comment.
The fields NOCValue, ICACValue, and IPKValue are defined as OctetString in the Matter specification (§11.18.6.8). They should be encoded using w.octets rather than w.str to ensure the correct TLV type tag is used and to avoid potential UTF-8 validation errors on binary certificate data.
| w.str(&TLVTag::Context(0), noc)?; | |
| w.str(&TLVTag::Context(1), icac)?; | |
| w.str(&TLVTag::Context(2), ipk)?; | |
| w.octets(&TLVTag::Context(0), noc)?; | |
| w.octets(&TLVTag::Context(1), icac)?; | |
| w.octets(&TLVTag::Context(2), ipk)?; |
| // CommandFields struct at CmdDataTag::Data | ||
| // containing field 0: RootCACertificate (octstr). | ||
| w.start_struct(&TLVTag::Context(CmdDataTag::Data as u8))?; | ||
| w.str(&TLVTag::Context(0), rcac_tlv)?; |
There was a problem hiding this comment.
| .map_err(ControllerError::from)?; | ||
| let csr_nonce = root | ||
| .ctx(2) | ||
| .map_err(ControllerError::from)? | ||
| .str() |
There was a problem hiding this comment.
The csr and CSRNonce fields in the NOCSRElements structure are OctetString types. Calling .str() on a TLVElement that contains an OctetString will return an InvalidDataType error in rs-matter because it expects a UTF8String tag. Use .octets() instead.
let csr_der = root
.ctx(1)
.map_err(ControllerError::from)?
.octets()
.map_err(ControllerError::from)?;
let csr_nonce = root
.ctx(2)
.map_err(ControllerError::from)?
.octets()
.map_err(ControllerError::from)?;| let mut got_response = false; | ||
| loop { | ||
| if !got_response { | ||
| if let Some(resp) = chunk.response().map_err(ControllerError::from)? { | ||
| for (_endpoint, r) in resp.responses::<ArmFailSafeResponse>( | ||
| CL_GENERAL_COMMISSIONING, | ||
| // InvokeResponseIB carries the *response* command ID, | ||
| // not the request — ArmFailSafe(0x00) → | ||
| // ArmFailSafeResponse(0x01). | ||
| CMD_ARM_FAIL_SAFE_RESPONSE, | ||
| ) { | ||
| match r { | ||
| Ok(afs) => { | ||
| let code = afs.error_code().map_err(ControllerError::from)?; | ||
| if (code as u8) != 0 { | ||
| return Err(ControllerError::FailSafeExpired); | ||
| } | ||
| got_response = true; | ||
| break; | ||
| } | ||
| Err(e) => return Err(ControllerError::Inner(e)), | ||
| } | ||
| } | ||
| } | ||
| } | ||
| match chunk.complete().await.map_err(ControllerError::from)? { | ||
| Some(next) => chunk = next, | ||
| None => break, | ||
| } | ||
| } | ||
| Ok(()) |
There was a problem hiding this comment.
The arm_fail_safe function does not verify that a valid ArmFailSafeResponse was actually received and processed. If the device returns a StatusResponse or an unexpected command ID, the loop will terminate and the function will return Ok(()), leading the controller to believe the fail-safe was armed when it might not have been. Since ArmFailSafe is not a DefaultSuccess command, it must return a specific response struct.
let mut got_response = false;
loop {
if !got_response {
if let Some(resp) = chunk.response().map_err(ControllerError::from)? {
for (_endpoint, r) in resp.responses::<ArmFailSafeResponse>(
CL_GENERAL_COMMISSIONING,
CMD_ARM_FAIL_SAFE_RESPONSE,
) {
match r {
Ok(afs) => {
let code = afs.error_code().map_err(ControllerError::from)?;
if (code as u8) != 0 {
return Err(ControllerError::FailSafeExpired);
}
got_response = true;
break;
}
Err(e) => return Err(ControllerError::Inner(e)),
}
}
}
}
match chunk.complete().await.map_err(ControllerError::from)? {
Some(next) => chunk = next,
None => break,
}
}
if got_response {
Ok(())
} else {
Err(ControllerError::PaseFailed)
}| pub noc_der: heapless::Vec<u8, 400>, | ||
| pub icac_der: heapless::Vec<u8, 400>, |
There was a problem hiding this comment.
While the Matter specification mentions a 400-byte limit for NOCs, actual certificates (especially with CATs or vendor extensions) can exceed this. The crate uses MAX_CERT_TLV_LEN (600 bytes) internally for certificate storage. Using a smaller limit here may cause BufferTooSmall errors during commissioning for valid certificates issued by the controller's own CA.
| pub noc_der: heapless::Vec<u8, 400>, | |
| pub icac_der: heapless::Vec<u8, 400>, | |
| pub noc_der: heapless::Vec<u8, { crate::cert::MAX_CERT_TLV_LEN }>, | |
| pub icac_der: heapless::Vec<u8, { crate::cert::MAX_CERT_TLV_LEN }>, |
| let mut noc_out: heapless::Vec<u8, 400> = heapless::Vec::new(); | ||
| noc_out.extend_from_slice(&device_creds.noc).map_err(|_| { | ||
| ControllerError::Inner(crate::error::Error::new( | ||
| crate::error::ErrorCode::BufferTooSmall, | ||
| )) | ||
| })?; | ||
| let mut icac_out: heapless::Vec<u8, 400> = heapless::Vec::new(); |
There was a problem hiding this comment.
The output buffers for noc_der and icac_der should match the internal MAX_CERT_TLV_LEN to ensure that valid certificates generated by the FabricCredentials can be returned without truncation or capacity errors.
| let mut noc_out: heapless::Vec<u8, 400> = heapless::Vec::new(); | |
| noc_out.extend_from_slice(&device_creds.noc).map_err(|_| { | |
| ControllerError::Inner(crate::error::Error::new( | |
| crate::error::ErrorCode::BufferTooSmall, | |
| )) | |
| })?; | |
| let mut icac_out: heapless::Vec<u8, 400> = heapless::Vec::new(); | |
| let mut noc_out: heapless::Vec<u8, { crate::cert::MAX_CERT_TLV_LEN }> = heapless::Vec::new(); | |
| noc_out.extend_from_slice(&device_creds.noc).map_err(|_| { | |
| ControllerError::Inner(crate::error::Error::new( | |
| crate::error::ErrorCode::BufferTooSmall, | |
| )) | |
| })?; | |
| let mut icac_out: heapless::Vec<u8, { crate::cert::MAX_CERT_TLV_LEN }> = heapless::Vec::new(); |
4e6917b to
dabfefc
Compare
Introduces a new `controller` module that provides the IM-invoke and
orchestration primitives a controller needs to drive a Matter accessory
from a freshly-established PASE session through to
CommissioningComplete.
The accessory role is already well-covered in this crate. This module
is the inverse role — the thing that *commissions* accessories — and
adds a deliberately small, validated surface:
Public API (controller::commissioner):
- arm_fail_safe(matter, expiry_seconds, breadcrumb)
GeneralCommissioning::ArmFailSafe over PASE, decodes
ArmFailSafeResponse.errorCode → maps non-OK to FailSafeExpired.
- csr_request(matter, &csr_nonce) -> CsrPayload
OperationalCredentials::CSRRequest with full CSRResponse decode.
Returns NOCSRElements (octstr<400>) + AttestationSignature (64 B).
- decode_nocsr_elements(blob) -> DecodedNocsr<'_>
TLV decoder for the NOCSRElements struct (§11.18.6.5.2) — pulls
the PKCS#10 CSR DER and the device's nonce-echo so the controller
can verify nonce freshness before issuing a NOC.
- add_noc(matter, noc, icac, ipk, admin_subject, admin_vendor_id)
OperationalCredentials::AddNOC with NOCResponse status +
FabricIndex decode. Non-OK status → AddNocRejected.
- commission_pase(matter, crypto, fabric_creds, admin_subject,
admin_vendor_id, fail_safe_secs) -> PaseCommissionResult
Orchestrator that chains the above plus the two unexposed steps
(AddTrustedRootCertificate, CommissioningComplete) and calls
FabricCredentials::generate_device_credentials to mint the NOC.
Returns the device-assigned FabricIndex + NodeID + cert chain.
Public API (controller::setup_code):
- Manual pairing-code + QR-code (MT:...) parser per Matter spec §5.1.4,
with version / vendor-id / product-id / discriminator / passcode /
discovery-capabilities-bitmask decoding.
Supporting additions to existing files:
- commissioner::FabricCredentials gains `root_secret_key()`,
`rcac_id()`, and `from_persisted(...)` so a controller can
persist its CA material and reload it on next boot.
- commissioner::NocGenerator gains the matching `root_secret_key()`
accessor.
What is intentionally *not* in this PR (planned follow-ups):
- BLE central + BTP framing (the bootstrap transport).
- DCL fetch + Device Attestation chain verification.
- NetworkCommissioning cluster (Thread / Wi-Fi credential delivery).
- Operational discovery (_matter._tcp mDNS-SD client) + CASE.
- A higher-level Commissioner state machine wrapping all of the above.
Each of those is its own design conversation and they don't belong in
a single 3000-line PR.
Validation
----------
End-to-end test: examples/src/bin/pase_smoke_test.rs. With this PR
plus project-chip#454 (cert: NotBefore=0 sentinel fix — required for CHIP interop)
applied, the smoke test commissions chip-tool's all-clusters-app
end-to-end:
✓ PASE handshake
✓ ArmFailSafe(60s)
✓ CSRRequest → 243B NOCSRElements + 64B AttestationSignature
✓ NOCSRElements decoded; nonce verified
✓ AddTrustedRootCertificate
✓ AddNOC → device returns fabric_index=1
✓ CommissioningComplete
Responder log (chip-tool all-clusters-app):
OpCreds: successfully created fabric index 0x1 via AddNOC
`cargo test -p rs-matter --lib --features=os,zbus` passes (532
existing tests; no behavior change to existing crates).
Without project-chip#454 the smoke test will still complete PASE/ArmFailSafe/
CSRRequest but AddTrustedRootCertificate is rejected by CHIP as
'Invalid signature' — see project-chip#454 for the root cause.
dabfefc to
f7fe853
Compare
|
PR #456: Size comparison from da2d79e to f7fe853 Full report (8 builds for (core), dimmable-light, onoff-light, onoff-light-bt, speaker)
|
|
Same here - please assess the code review feedback from Gemini, then I'll review. |
|
Also a review from Opus 4.7 below. I think some of the comments from Opus are a bit nit-picking, but quite a few are good. My feeling is this PR needs а few more iterations, but given that the author fronting the LLM that created this PR is not replying to any pings from my side, and not addressing even the initial Gemini code review, I'll likely close.
================= PR #456: Controller-Side Commissioning Building BlocksWhat it doesAdds an inverse-direction module ( Files added:
Files touched:
The flow (
Returns My pros
My cons
|
Similar in spirit to #456. This adds an (initial) commissioning flow to rs-matter. My feeling is that for the controller/commissioner use-case we should initially only provide "the building blocks" (unlike for devices where we are almost at a plug-your-RGB-hardware-and-play). That's because for now it seems to me that each controller and commissioner use case might be a bit unique, and I don't know to what extent we want to decide how the controller/commissioner should be arranged end-to-end on behalf of the user (meaning, we don't want to). The commissioning flow (`Commissioner` and `NocGenerator`) is covered with: - An extended e2e test (the pre-existing `commissioning.rs` module in test/ which - however - only did exercise a handful of the commissioning steps previously. - A new integration test against the "all-clusters-app" executable from the C++ SDK - `commissioning_tests` - where this test completes successfully only if the app reports it had been successfully commissioned. I also took the opportunity to rename a few modules: - `commissioning` became `onboard` even if the latter is less precise because it is much shorter - `credentials` - which is really about **attestation** credentials (and not only credentials but really about attestation) became `attest` as this is more concrete and shorter ==== Forgot to add: there were some heavy `heapless::Vec`s returned and allocated on-stack throughout the commissioning code. These are now gone.
Summary
Introduces a new
controllermodule that provides the IM-invoke and orchestration primitives a controller needs to drive a Matter accessory from a freshly-established PASE session through toCommissioningComplete.The accessory role is already well-covered in this crate. This PR adds the inverse — the role that commissions accessories — with a deliberately small, validated surface. Larger follow-ups (BLE, attestation chain, network commissioning, operational discovery + CASE, a state-machine wrapper) are explicitly scoped out so this can land as a focused review.
Public API
controller::commissionerarm_fail_safeinvokesGeneralCommissioning::ArmFailSafeover PASE and decodesArmFailSafeResponse.errorCode(non-OK →FailSafeExpired).csr_requestdoesOperationalCredentials::CSRRequestand decodesCSRResponse(returnsNOCSRElements+AttestationSignature).decode_nocsr_elementswalks the TLV struct from §11.18.6.5.2 and returns the borrowed PKCS#10 CSR DER + the device's nonce-echo so callers can verify freshness.add_nocdoesOperationalCredentials::AddNOCand decodesNOCResponse(non-OK status →AddNocRejected); returns the device-assignedFabricIndex.commission_pasechains all of the above plusAddTrustedRootCertificateandCommissioningComplete, callingcrate::commissioner::FabricCredentials::generate_device_credentialsto mint the NOC.controller::setup_code— manual pairing-code + QR-code (MT:…) parser per Matter spec §5.1.4.Existing-file additions (small):
commissioner::FabricCredentials::root_secret_key(),::rcac_id(),::from_persisted(...)— persistence accessors so a controller can store CA material and reload it on next boot.commissioner::NocGenerator::root_secret_key()— matching accessor on the inner type.Out of scope (planned follow-ups)
Each of these is its own design conversation and doesn't belong in a single PR:
AttestationRequest/CertificateChainRequest).NetworkCommissioningcluster (Thread / Wi-Fi credential delivery)._matter._tcpmDNS-SD client) + CASE establishment.Validation
examples/src/bin/pase_smoke_test.rsis included as both an example and an interop smoke test. Drives PASE +commission_paseagainst an external responder on[::1]:5540with passcode 20202021.With this PR plus #454 (
fix(cert): NotBefore=0 collides with CHIP no-expiry sentinel— required for CHIP signature interop), the smoke test commissions chip-tool'sall-clusters-append-to-end:Responder log:
Without #454 the smoke test still completes PASE / ArmFailSafe / CSRRequest, but
AddTrustedRootCertificateis rejected by CHIP asInvalid signature— see #454 for the root cause.Test plan
cargo build -p rs-matter --features=os,zbus— clean, zero warnings on the controller crate.cargo test -p rs-matter --lib --features=os,zbus— 532 tests pass (no behavior change to existing modules).cargo build --release -p rs-matter-examples --bin pase_smoke_test— clean.chip-toolall-clusters-app(with fix(cert): NotBefore=0 collides with CHIP no-expiry sentinel #454 layered in).