Skip to content

fix: substraction issue in transport#460

Closed
skoky wants to merge 2 commits into
project-chip:mainfrom
skoky:subst_wrapping_fix
Closed

fix: substraction issue in transport#460
skoky wants to merge 2 commits into
project-chip:mainfrom
skoky:subst_wrapping_fix

Conversation

@skoky
Copy link
Copy Markdown

@skoky skoky commented May 23, 2026

Please cnsider this small AI generated fix that has solved my problem with commissioning. Thanks.

Fix: Wrapping Subtraction in Message Counter Dedup

Problem

RxCtrState::post_recv in src/transport/dedup.rs computed the signed difference
between two message counters using plain as i32 casts followed by subtraction:

let idiff = (msg_ctr as i32) - (self.max_ctr as i32);

Matter message counters are u32 values initialised to a random number. When
max_ctr is larger than i32::MAX (2 147 483 647), casting it to i32 produces
a negative value. Subtracting that negative from a small positive msg_ctr can
exceed i32::MAX, which causes a panic in Rust debug builds
(attempt to subtract with overflow).

In practice this was triggered during CASE (Sigma) session establishment right
after a successful commissioning sequence (ArmFailSafe ? CSR ? AddTrustedRootCert
? AddNOC), because the new operational session started with a fresh random counter
that happened to land in the problematic range relative to the stored max_ctr.

Fix

Changed the subtraction to use wrapping_sub, which matches the intended modular
counter arithmetic and is safe for all u32 input values:

let idiff = (msg_ctr as i32).wrapping_sub(self.max_ctr as i32);

This produces the correct signed distance between the two counters across the
entire u32 range without panicking in debug mode or producing undefined behaviour
in release mode.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented May 23, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the message deduplication logic in rs-matter to use wrapping subtraction for message counter differences, preventing potential panics in debug builds. The reviewer suggested a more idiomatic implementation of the wrapping arithmetic and recommended updating the forward-check logic to correctly handle 32-bit counter wrap-around by utilizing the signed difference.

Comment thread rs-matter/src/transport/dedup.rs Outdated
@github-actions
Copy link
Copy Markdown

PR #460: Size comparison from b23a6ab to c733487

Full report (8 builds for (core), dimmable-light, onoff-light, onoff-light-bt, speaker)
platform target config section b23a6ab c733487 change % change
(core) riscv32imac-unknown-none-elf infodefmt-optz-ltofat FLASH 446972 446972 0 0.0
RAM 71056 71056 0 0.0
thumbv6m-none-eabi infodefmt-optz-ltofat FLASH 361528 361528 0 0.0
RAM 66628 66628 0 0.0
thumbv7em-none-eabi infodefmt-optz-ltofat FLASH 339296 339296 0 0.0
RAM 66404 66404 0 0.0
x86_64-unknown-linux-gnu infologs-optz-ltofat FLASH 875171 875171 0 0.0
RAM 71386 71386 0 0.0
dimmable-light x86_64-unknown-linux-gnu infologs-optz-ltofat FLASH 2054256 2054256 0 0.0
RAM 60368 60368 0 0.0
onoff-light x86_64-unknown-linux-gnu infologs-optz-ltofat FLASH 1981256 1981256 0 0.0
RAM 59536 59536 0 0.0
onoff-light-bt x86_64-unknown-linux-gnu infologs-optz-ltofat FLASH 3364912 3364912 0 0.0
RAM 5776 5776 0 0.0
speaker x86_64-unknown-linux-gnu infologs-optz-ltofat FLASH 2016808 2016808 0 0.0
RAM 5472 5472 0 0.0

@ivmarkov
Copy link
Copy Markdown
Contributor

@skoky Please look at the suggestions from Gemini in its code review. Gemini sometimes suggests wrong things, but in either case, all comments must be addressed:

  • If comments are valid -> fix the code
  • If comments are invalid -> explain why

The above can also be done by whatever llm you are using, but must be done.

@ivmarkov
Copy link
Copy Markdown
Contributor

Also, you must sign the CLA, or else I cannot merge your changes.

Improving subst logic

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

PR #460: Size comparison from b23a6ab to f3d3261

Full report (8 builds for (core), dimmable-light, onoff-light, onoff-light-bt, speaker)
platform target config section b23a6ab f3d3261 change % change
(core) riscv32imac-unknown-none-elf infodefmt-optz-ltofat FLASH 446972 446972 0 0.0
RAM 71056 71056 0 0.0
thumbv6m-none-eabi infodefmt-optz-ltofat FLASH 361528 361528 0 0.0
RAM 66628 66628 0 0.0
thumbv7em-none-eabi infodefmt-optz-ltofat FLASH 339296 339296 0 0.0
RAM 66404 66404 0 0.0
x86_64-unknown-linux-gnu infologs-optz-ltofat FLASH 875171 875171 0 0.0
RAM 71386 71386 0 0.0
dimmable-light x86_64-unknown-linux-gnu infologs-optz-ltofat FLASH 2054256 2054256 0 0.0
RAM 60368 60368 0 0.0
onoff-light x86_64-unknown-linux-gnu infologs-optz-ltofat FLASH 1981256 1981256 0 0.0
RAM 59536 59536 0 0.0
onoff-light-bt x86_64-unknown-linux-gnu infologs-optz-ltofat FLASH 3364912 3364912 0 0.0
RAM 5776 5776 0 0.0
speaker x86_64-unknown-linux-gnu infologs-optz-ltofat FLASH 2016808 2016808 0 0.0
RAM 5472 5472 0 0.0

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a debug-build overflow/panic in transport message-counter deduplication by changing how the signed counter delta is computed in RxCtrState::post_recv, making the subtraction safe across the full u32 range.

Changes:

  • Compute the counter difference using u32::wrapping_sub before casting to i32, avoiding debug overflow panics in counter delta calculation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread rs-matter/src/transport/dedup.rs
@ivmarkov
Copy link
Copy Markdown
Contributor

@ivmarkov ivmarkov closed this May 25, 2026
@ivmarkov
Copy link
Copy Markdown
Contributor

ivmarkov commented May 25, 2026

@skoky One more thing and this thing is VERY important!:

What controller/commissioner do you use?

Why?

Because hitting the integer overflow in debug - while it should not happen - means likely the controller/commissioner has a bug as well.

The fact that you have such a big message counter (that it causes an i32 overflow) is just not normal.

I ran my concern with the AI, and here's the outcome:

===

My question:

Look what another AI produced as a diagnostics of the original overflow-in-debug:

"'
... (your subject)
"

What I'm really surprised by, is that it says
"Matter message counters are u32 values initialised to a random number".

Yes, they SHOULD be initialized with a random number, but MUST always be &-ed with 0x0fff_ffff, right? So to me getting a so-big msg_ctr or self.max_ctr sounds almost impossible?

What do you think?

Its answer:

You're right and the diagnostic is incomplete. Let me lay out what's actually true.

What the spec mandates (§4.6.1.1)

All message counters SHALL be initialized with a random value using the Crypto_DRBG(len = 28) + 1 primitive. The random initializer ranges from 1 to 2²⁸.

So 0x0fffffff isn't arbitrary — it's what the spec requires. A spec-compliant peer's initial counter is always in [1, 2²⁸]. The diagnostic saying "initialised to a random number" without the 28-bit qualifier is misleading.

Why max_ctr on the receiver should also stay small

A fresh CASE/PASE session initializes RxCtrState::new(0). That matches spec §11422 / §11568 ("set the synchronized max_message_counter of the peer to 0"). So at session start: max_ctr = 0 on our side, and the peer's first message brings it to whatever the peer sent — which for a compliant peer is in [1, 2²⁸].

For the panic (msg_ctr as i32) - (max_ctr as i32) to overflow i32, we need one side ≥ 2³¹ and the other near zero — e.g. msg_ctr = 0x8000_0000, max_ctr = 1 gives i32::MIN - 1, panic.

So under a strict spec-compliant scenario, you're right — this should be unreachable in commissioning.

The diagnostic's narrative ("the new operational session started with a fresh random counter that happened to land in the problematic range") is implausible given a [1, 2²⁸] initial range. Either the diagnostic is wrong about the exact trigger, or the peer isn't fully spec-compliant.

So what likely caused the original report

A guess, not a claim: it was probably a non-compliant peer (or a test/fuzz harness, or a packet replay with crafted counters during commissioning debugging). The diagnostic's framing of "the new session started with a fresh random counter that happened to land in the problematic range" is the AI rationalising what it saw — but a spec-compliant DRBG(28)+1 literally cannot land there. Worth being skeptical of that part of the diagnostic, while accepting that the panic itself is real and the fix is correct.

My next question, a correctness of a wrong assumption on its side

You are missing that we actually do not initialize the counter with 0 always, but with a true random value between 0 and 0x0fff_ffff. Look at Sessions::add body and how it is called.

But I think even if we correctly initialize our sessions with a random value between 0 and 0x0fff_ffff, your analysis from above that we should not get an i32 overflow if the other peer is compliant still holds, right?

Answer:

Right, I see — Sessions::add passes rand.next_u32() and Session::init masks it with MATTER_MSG_CTR_RANGE, so our local tx counter is uniform in [0, 2²⁸). (Off-by-one vs the spec's DRBG(28)+1 = [1, 2²⁸], but within tolerance.)

And yes, my analysis holds. Here's why this changes nothing for the panic reachability:

The i32 overflow in RxCtrState::post_recv depends only on the peer's counters, not ours:

msg_ctr is whatever the peer transmitted.
self.max_ctr is the running max of the peer's transmitted counters (seeded at 0, advanced by post_recv).
Our own tx counter never feeds into RxCtrState on this side — it goes to the peer's dedup state, not ours. So whether we use rand.next_u32() & 0x0fff_ffff or any other masking scheme on our outgoing side has zero effect on whether our RxCtrState can overflow i32.

The constraint that matters is: does the peer keep its counter in [1, 2²⁸] initially and increment from there? If yes (spec-compliant), then within a single session max_ctr stays ≤ 2²⁸ + N (where N is the message count), and you'd need ~2³¹ - 2²⁸ ≈ 1.9 billion messages on one session before the overflow becomes reachable. Not happening during commissioning.

So the conclusion is the same: with a spec-compliant peer, the original i32 arithmetic doesn't overflow in unicast RxCtrState during any realistic session lifetime.

@skoky
Copy link
Copy Markdown
Author

skoky commented May 25, 2026

@ivmarkov I am using rust-matc crate custom app similar to example provided with the crate. I will check of there is any issue in there. Thanks for noticing.

@skoky
Copy link
Copy Markdown
Author

skoky commented May 28, 2026

@ivmarkov It was fixed in the controller rust-matc now here tom-code/rust-matc#10 Thanks for pointing out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants