fix: substraction issue in transport by skoky · Pull Request #460 · project-chip/rs-matter

skoky · 2026-05-23T14:37:28Z

Please consider this small AI-generated fix, which solved my problem with commissioning. Thanks.

Fix: Wrapping Subtraction in Message Counter Dedup

Problem

RxCtrState::post_recv in src/transport/dedup.rs computed the signed difference
between two message counters using plain as i32 casts followed by subtraction:

let idiff = (msg_ctr as i32) - (self.max_ctr as i32);

Matter message counters are u32 values initialised to a random number. When
max_ctr is larger than i32::MAX (2 147 483 647), casting it to i32 produces
a negative value. Subtracting that negative from a small positive msg_ctr can
exceed i32::MAX, which causes a panic in Rust debug builds
(attempt to subtract with overflow).
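
The overflow described above can be reproduced without a panic by swapping the plain `-` for `checked_sub`, which returns `None` exactly where a debug build would panic. This is a minimal sketch; `naive_diff` is a hypothetical helper written for illustration, not a function from rs-matter.

```rust
/// Hypothetical helper mirroring the original expression, but using
/// `checked_sub` so an overflow shows up as `None` instead of a panic.
fn naive_diff(msg_ctr: u32, max_ctr: u32) -> Option<i32> {
    // Values above i32::MAX become negative after the `as i32` cast.
    (msg_ctr as i32).checked_sub(max_ctr as i32)
}
```

With both counters below i32::MAX, `naive_diff(10, 3)` is `Some(7)`. With `max_ctr = 0x8000_0000` the cast yields i32::MIN, so `naive_diff(5, 0x8000_0000)` is `None`, and the mirrored case `naive_diff(0x8000_0000, 1)` is `None` as well.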

In practice this was triggered during CASE (Sigma) session establishment right
after a successful commissioning sequence (ArmFailSafe → CSR → AddTrustedRootCert
→ AddNOC), because the new operational session started with a fresh random counter
that happened to land in the problematic range relative to the stored max_ctr.

Fix

Changed the subtraction to use wrapping_sub, which matches the intended modular
counter arithmetic and is safe for all u32 input values:

let idiff = (msg_ctr as i32).wrapping_sub(self.max_ctr as i32);

This produces the correct signed distance between the two counters across the
entire u32 range without panicking in debug mode. In release mode, where overflow
checks are disabled by default, the plain subtraction already wrapped silently,
so the result there is unchanged.
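
To make the behaviour of the fixed expression concrete, the sketch below wraps it in a free function. The name `signed_delta` is an assumption for illustration; in the PR the expression sits inside `RxCtrState::post_recv`.

```rust
/// Signed distance from `max_ctr` to `msg_ctr`, computed modulo 2^32.
/// Hypothetical free function; not part of the rs-matter API.
fn signed_delta(msg_ctr: u32, max_ctr: u32) -> i32 {
    // `wrapping_sub` never panics: the result is reduced modulo 2^32.
    (msg_ctr as i32).wrapping_sub(max_ctr as i32)
}
```

For example, `signed_delta(2, 0xFFFF_FFFE)` is `4` (the counter wrapped past zero and is four steps ahead), while `signed_delta(0xFFFF_FFFE, 2)` is `-4`. When the two counters are exactly 2³¹ apart the direction is ambiguous and the result is i32::MIN.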

CLAassistant · 2026-05-23T14:37:35Z

All committers have signed the CLA.

gemini-code-assist

Code Review

This pull request updates the message deduplication logic in rs-matter to use wrapping subtraction for message counter differences, preventing potential panics in debug builds. The reviewer suggested a more idiomatic implementation of the wrapping arithmetic and recommended updating the forward-check logic to correctly handle 32-bit counter wrap-around by utilizing the signed difference.

github-actions · 2026-05-23T15:16:22Z

PR #460: Size comparison from b23a6ab to c733487

Full report (8 builds for (core), dimmable-light, onoff-light, onoff-light-bt, speaker)

platform	target	config	section	`b23a6ab`	`c733487`
(core)	riscv32imac-unknown-none-elf	infodefmt-optz-ltofat	FLASH	446972	446972
			RAM	71056	71056
	thumbv6m-none-eabi	infodefmt-optz-ltofat	FLASH	361528	361528
			RAM	66628	66628
	thumbv7em-none-eabi	infodefmt-optz-ltofat	FLASH	339296	339296
			RAM	66404	66404
	x86_64-unknown-linux-gnu	infologs-optz-ltofat	FLASH	875171	875171
			RAM	71386	71386
dimmable-light	x86_64-unknown-linux-gnu	infologs-optz-ltofat	FLASH	2054256	2054256
			RAM	60368	60368
onoff-light	x86_64-unknown-linux-gnu	infologs-optz-ltofat	FLASH	1981256	1981256
			RAM	59536	59536
onoff-light-bt	x86_64-unknown-linux-gnu	infologs-optz-ltofat	FLASH	3364912	3364912
			RAM	5776	5776
speaker	x86_64-unknown-linux-gnu	infologs-optz-ltofat	FLASH	2016808	2016808
			RAM	5472	5472

ivmarkov · 2026-05-24T10:35:44Z

@skoky Please look at the suggestions from Gemini in its code review. Gemini sometimes suggests wrong things, but in either case, all comments must be addressed:

If comments are valid -> fix the code
If comments are invalid -> explain why

The above can also be done by whatever LLM you are using, but it must be done.

ivmarkov · 2026-05-24T10:36:59Z

Also, you must sign the CLA, or else I cannot merge your changes.

Improving subst logic Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

github-actions · 2026-05-24T19:36:49Z

PR #460: Size comparison from b23a6ab to f3d3261

Full report (8 builds for (core), dimmable-light, onoff-light, onoff-light-bt, speaker)

platform	target	config	section	`b23a6ab`	`f3d3261`
(core)	riscv32imac-unknown-none-elf	infodefmt-optz-ltofat	FLASH	446972	446972
			RAM	71056	71056
	thumbv6m-none-eabi	infodefmt-optz-ltofat	FLASH	361528	361528
			RAM	66628	66628
	thumbv7em-none-eabi	infodefmt-optz-ltofat	FLASH	339296	339296
			RAM	66404	66404
	x86_64-unknown-linux-gnu	infologs-optz-ltofat	FLASH	875171	875171
			RAM	71386	71386
dimmable-light	x86_64-unknown-linux-gnu	infologs-optz-ltofat	FLASH	2054256	2054256
			RAM	60368	60368
onoff-light	x86_64-unknown-linux-gnu	infologs-optz-ltofat	FLASH	1981256	1981256
			RAM	59536	59536
onoff-light-bt	x86_64-unknown-linux-gnu	infologs-optz-ltofat	FLASH	3364912	3364912
			RAM	5776	5776
speaker	x86_64-unknown-linux-gnu	infologs-optz-ltofat	FLASH	2016808	2016808
			RAM	5472	5472

Copilot

Pull request overview

Fixes a debug-build overflow/panic in transport message-counter deduplication by changing how the signed counter delta is computed in RxCtrState::post_recv, making the subtraction safe across the full u32 range.

Changes:

Compute the counter difference using u32::wrapping_sub before casting to i32, avoiding debug overflow panics in counter delta calculation.
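
The two formulations that appear in this PR (wrapping subtraction on the i32 casts, and `u32::wrapping_sub` followed by a cast) yield the same bit pattern. The sketch below checks that on a handful of edge values; all three function names are hypothetical and exist only for this comparison.

```rust
/// Variant described in the review: subtract as u32, then reinterpret as i32.
fn delta_via_u32(msg_ctr: u32, max_ctr: u32) -> i32 {
    msg_ctr.wrapping_sub(max_ctr) as i32
}

/// Variant from the first commit: cast to i32 first, then wrapping-subtract.
fn delta_via_i32(msg_ctr: u32, max_ctr: u32) -> i32 {
    (msg_ctr as i32).wrapping_sub(max_ctr as i32)
}

/// Returns true if both variants agree on every ordered pair of samples.
fn forms_agree(samples: &[u32]) -> bool {
    samples.iter().all(|&a| {
        samples
            .iter()
            .all(|&b| delta_via_u32(a, b) == delta_via_i32(a, b))
    })
}
```

Both variants are plain two's-complement subtraction modulo 2³², so choosing between them is a matter of readability rather than behaviour.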

ivmarkov · 2026-05-25T12:25:19Z

@skoky I'm closing in favor of #462 (Fix potential panic on sub of max_ctr and msg_ctr; fix logic for group counters which can rollover):

#462 avoids using i32 completely
#462 fixes another problem, specific to group counters (they can roll over)
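
The thread does not show the code from #462, so the following is not that implementation. It is only a sketch of one way to order two counters using u32 arithmetic alone, assuming the usual convention that a forward distance below 2³¹ means "ahead". `CtrOrder` and `compare_ctr` are hypothetical names.

```rust
/// Where `msg_ctr` sits relative to `max_ctr` (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum CtrOrder {
    /// Same value as the current maximum.
    Same,
    /// Newer than the maximum by this many steps.
    Ahead(u32),
    /// Older than the maximum by this many steps.
    Behind(u32),
}

/// Compare two counters modulo 2^32 without any signed arithmetic.
fn compare_ctr(msg_ctr: u32, max_ctr: u32) -> CtrOrder {
    const HALF_RANGE: u32 = 1 << 31;
    // Forward distance from max_ctr to msg_ctr, wrapping past zero.
    let forward = msg_ctr.wrapping_sub(max_ctr);
    if forward == 0 {
        CtrOrder::Same
    } else if forward < HALF_RANGE {
        CtrOrder::Ahead(forward)
    } else {
        CtrOrder::Behind(max_ctr.wrapping_sub(msg_ctr))
    }
}
```

Because the distance is taken modulo 2³², a counter that has rolled over still compares as expected: `compare_ctr(2, 0xFFFF_FFFE)` is `Ahead(4)`.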

ivmarkov · 2026-05-25T12:34:10Z

@skoky One more thing, and this one is VERY important:

What controller/commissioner do you use?

Why?

Because hitting the integer overflow in debug, while it should not happen, likely means the controller/commissioner has a bug as well.

The fact that you have such a big message counter (that it causes an i32 overflow) is just not normal.

I ran my concern by the AI, and here's the outcome:

===

My question:

Look at what another AI produced as a diagnosis of the original overflow-in-debug:

"... (your subject)"

What really surprises me is that it says
"Matter message counters are u32 values initialised to a random number".

Yes, they SHOULD be initialized with a random number, but it MUST always be &-ed with 0x0fff_ffff, right? So to me, getting such a big msg_ctr or self.max_ctr sounds almost impossible?

What do you think?

Its answer:

You're right and the diagnostic is incomplete. Let me lay out what's actually true.

What the spec mandates (§4.6.1.1)

All message counters SHALL be initialized with a random value using the Crypto_DRBG(len = 28) + 1 primitive. The random initializer ranges from 1 to 2²⁸.

So 0x0fffffff isn't arbitrary — it's what the spec requires. A spec-compliant peer's initial counter is always in [1, 2²⁸]. The diagnostic saying "initialised to a random number" without the 28-bit qualifier is misleading.
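
The range argument can be checked with a few lines of arithmetic. In this sketch `raw` stands in for the DRBG output; the real primitive is Crypto_DRBG, which is not modelled here, and `CTR_INIT_MASK` and `initial_counter` are hypothetical names.

```rust
/// Low 28 bits, matching the `len = 28` in the quoted spec text.
const CTR_INIT_MASK: u32 = 0x0fff_ffff;

/// Map an arbitrary 32-bit value to the spec's initial range [1, 2^28].
/// `raw` is a stand-in for 28 bits of DRBG output (assumption).
fn initial_counter(raw: u32) -> u32 {
    (raw & CTR_INIT_MASK) + 1
}
```

The smallest result is `initial_counter(0) == 1` and the largest is `initial_counter(u32::MAX) == 1 << 28`, which is far below i32::MAX, so an initial value in this range stays positive when cast to i32.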

Why max_ctr on the receiver should also stay small

A fresh CASE/PASE session initializes RxCtrState::new(0). That matches spec §11422 / §11568 ("set the synchronized max_message_counter of the peer to 0"). So at session start: max_ctr = 0 on our side, and the peer's first message brings it to whatever the peer sent — which for a compliant peer is in [1, 2²⁸].

For the panic (msg_ctr as i32) - (max_ctr as i32) to overflow i32, we need one side ≥ 2³¹ and the other near zero — e.g. msg_ctr = 0x8000_0000, max_ctr = 1 gives i32::MIN - 1, panic.

So under a strict spec-compliant scenario, you're right — this should be unreachable in commissioning.

The diagnostic's narrative ("the new operational session started with a fresh random counter that happened to land in the problematic range") is implausible given a [1, 2²⁸] initial range. Either the diagnostic is wrong about the exact trigger, or the peer isn't fully spec-compliant.

So what likely caused the original report

A guess, not a claim: it was probably a non-compliant peer (or a test/fuzz harness, or a packet replay with crafted counters during commissioning debugging). The diagnostic's framing of "the new session started with a fresh random counter that happened to land in the problematic range" is the AI rationalising what it saw — but a spec-compliant DRBG(28)+1 literally cannot land there. Worth being skeptical of that part of the diagnostic, while accepting that the panic itself is real and the fix is correct.

My next question, correcting a wrong assumption on its side

You are missing that we do not actually always initialize the counter with 0, but with a true random value between 0 and 0x0fff_ffff. Look at the body of Sessions::add and how it is called.

But I think even if we correctly initialize our sessions with a random value between 0 and 0x0fff_ffff, your analysis from above that we should not get an i32 overflow if the other peer is compliant still holds, right?

Answer:

Right, I see — Sessions::add passes rand.next_u32() and Session::init masks it with MATTER_MSG_CTR_RANGE, so our local tx counter is uniform in [0, 2²⁸). (Off-by-one vs the spec's DRBG(28)+1 = [1, 2²⁸], but within tolerance.)

And yes, my analysis holds. Here's why this changes nothing for the panic reachability:

The i32 overflow in RxCtrState::post_recv depends only on the peer's counters, not ours:

msg_ctr is whatever the peer transmitted.
self.max_ctr is the running max of the peer's transmitted counters (seeded at 0, advanced by post_recv).

Our own tx counter never feeds into RxCtrState on this side — it goes to the peer's dedup state, not ours. So whether we use rand.next_u32() & 0x0fff_ffff or any other masking scheme on our outgoing side has zero effect on whether our RxCtrState can overflow i32.

The constraint that matters is: does the peer keep its counter in [1, 2²⁸] initially and increment from there? If yes (spec-compliant), then within a single session max_ctr stays ≤ 2²⁸ + N (where N is the message count), and you'd need ~2³¹ - 2²⁸ ≈ 1.9 billion messages on one session before the overflow becomes reachable. Not happening during commissioning.

So the conclusion is the same: with a spec-compliant peer, the original i32 arithmetic doesn't overflow in unicast RxCtrState during any realistic session lifetime.
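
The "≈ 1.9 billion messages" figure follows directly from the two bounds. A small sketch of that arithmetic, with `headroom` as a hypothetical helper name:

```rust
/// First counter value that no longer fits in a non-negative i32.
const I32_LIMIT: u32 = 1 << 31;

/// Number of increments a counter starting at `initial_ctr` can take
/// before it reaches 2^31 (zero if it is already there or beyond).
fn headroom(initial_ctr: u32) -> u32 {
    I32_LIMIT.saturating_sub(initial_ctr)
}
```

For the largest compliant initial value, `headroom(1 << 28)` is 1,879,048,192, which matches the estimate in the answer above.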

skoky · 2026-05-25T19:43:47Z

@ivmarkov I am using a custom app built on the rust-matc crate, similar to the example provided with the crate. I will check if there is an issue in there. Thanks for noticing.

skoky · 2026-05-28T06:55:04Z

@ivmarkov It has now been fixed in the controller, rust-matc; see tom-code/rust-matc#10. Thanks for pointing it out.

fix: substraction issue in transport

c733487

gemini-code-assist Bot reviewed May 23, 2026

Comment thread rs-matter/src/transport/dedup.rs Outdated

Update rs-matter/src/transport/dedup.rs

f3d3261

ivmarkov requested a review from Copilot May 25, 2026 07:52

Copilot started reviewing on behalf of ivmarkov May 25, 2026 07:52

Copilot AI reviewed May 25, 2026

Comment thread rs-matter/src/transport/dedup.rs

ivmarkov closed this May 25, 2026

skoky mentioned this pull request May 26, 2026

Potential issue reported from rs-matter tom-code/rust-matc#10

Closed

skoky wants to merge 2 commits into project-chip:main from skoky:subst_wrapping_fix
