Skip to content

Conversation

@mdecourcy
Copy link

ROUTER_LATE nodes intentionally delay rebroadcasts to provide coverage. This delay can be up to ~7 seconds (using getTxDelayMsecWeightedWorst with pow_of_2(CWmax) * slotTimeMsec).

The retransmission timeout was previously ~5-6 seconds, causing the sender to hit MAX_RETRANSMIT and report failure before ROUTER_LATE nodes could rebroadcast. This resulted in false negative acknowledgments where messages successfully propagated but the sender incorrectly reported delivery failure.

Extended the timeout calculation to account for worst-case ROUTER_LATE delay by changing from pow_of_2((CWmax+CWmin)/2) to pow_of_2(CWmax), adding approximately 4.5-11 seconds depending on modem preset.

This prevents clients from implementing unnecessary retry logic that creates duplicate messages on the mesh.

Closes #9185

🤝 Attestations

  • [ x ] I have tested that my proposed changes behave as described.

  • [ x ] I have tested that my proposed changes do not cause any obvious regressions on the following devices:

    • Heltec (Lora32) V3
    • LilyGo T-Deck
    • LilyGo T-Beam
    • RAK WisBlock 4631
    • Seeed Studio T-1000E tracker card
    • [ x ] Other (please specify below)

    Rak Wismesh Tag, Seeed Solar Node

@github-actions
Copy link
Contributor

github-actions bot commented Jan 5, 2026

@mdecourcy, Welcome to Meshtastic!

Thanks for opening your first pull request. We really appreciate it.

We discuss work as a team in discord, please join us in the #firmware channel.
There's a big backlog of patches at the moment. If you have time,
please help us with some code review and testing of other PRs!

Welcome to the team 😄

@mdecourcy mdecourcy force-pushed the fix/router-late-implicit-ack-timeout branch 2 times, most recently from e972293 to 9a6bf44 Compare January 5, 2026 18:36
@thebentern thebentern requested a review from GUVWAF January 5, 2026 18:37
@mdecourcy mdecourcy force-pushed the fix/router-late-implicit-ack-timeout branch from 9a6bf44 to 5ebf52d Compare January 5, 2026 18:52
@thebentern thebentern added the bugfix Pull request that fixes bugs label Jan 5, 2026
@GUVWAF
Copy link
Member

GUVWAF commented Jan 5, 2026

I'm not in favor of this. It will result in very long waiting times before retransmissions for all packets even though ROUTER_LATE is not common.

And more importantly, ROUTER_LATE will first use the normal CLIENT delay window, and only when someone else already rebroadcasts before this, it will defer to the late rebroadcast window. Hence, this situation will only happen when for some reason the original transmitter did not receive the first rebroadcast (while the ROUTER_LATE did).

@mdecourcy
Copy link
Author

mdecourcy commented Jan 5, 2026

I'm not in favor of this. It will result in very long waiting times before retransmissions for all packets even though ROUTER_LATE is not common.

And more importantly, ROUTER_LATE will first use the normal CLIENT delay window, and only when someone else already rebroadcasts before this, it will defer to the late rebroadcast window. Hence, this situation will only happen when for some reason the original transmitter did not receive the first rebroadcast (while the ROUTER_LATE did).

ROUTER_LATE window applies to all CLIENT_BASE since #8567 was merged, and until running this PR on my clients I have been getting a lot of MAX_RETRANSMIT errors.

Looking at the code

https://github.com/meshtastic/firmware/blob/develop/src/mesh/RadioInterface.cpp#L304-L322

shouldRebroadcastEarlyLikeRouter is only utilized by router. getTxDelayMsec is only used for local packets. Unless I'm missing something, the late window is always utilized for CLIENT_BASE, as intended in #8567, which I agree with.

The only issue I see is all of my CLIENT_MUTES hitting that max retransmit as the wait time is too short for the ROUTER_LATE (and now CLIENT_BASE) window.

@nullrouten0
Copy link

nullrouten0 commented Jan 5, 2026

Router Late + Client Base = 20% of our mesh (165 nodes out of 815). 3.8% of the mesh is Router Late (31 nodes)

@GUVWAF
Copy link
Member

GUVWAF commented Jan 6, 2026

Okay, so it's actually for CLIENT_BASE nodes, which indeed now always use the late rebroadcast window.

Then I tend to agree with it, although it's a rather impactful change. It means that for each packet you'll have to wait longer until you get the MAX_RETRANSMISSON error.

@h3lix1
Copy link
Contributor

h3lix1 commented Jan 6, 2026

This MR will work, but I'm concerned it will increase timeouts for all packets (not just ROUTER_LATE) based on getRetransmissionMsec, so everything starts to use the worst-case delay.

The added delay is actually more at 2 * pow_of_2(CWmax) + 2 * CWmax
(note 2 * on RadioInterface.cpp line 252ish?)

Removing airTime->channelUtilizationPercent() does not adapt due to channel load, even for node roles that aren't ROUTER_LATE.. Might be unintentional?

The fix for this gets a little hairy since it requires re-doing retransmission logic.

@h3lix1
Copy link
Contributor

h3lix1 commented Jan 6, 2026

I have clanker written code (mostly because trying to wrap my head around timeout logic is painful) at https://github.com/h3lix1/mesh-firmware-sunl/tree/fix/router-late-implicit-ack-timeout if you want to pull it in and call it your own.

@GUVWAF
Copy link
Member

GUVWAF commented Jan 6, 2026

I think @h3lix1's approach is the less impactful one, but it does not solve the issue of unnecessary retransmissions.

We could probably also do some more intelligent things like detecting whether we have other 0-hop neighbors than a CLIENT_BASE node, and if so, do not use the extended retransmission delay.

@mdecourcy
Copy link
Author

I think @h3lix1's approach is the less impactful one, but it does not solve the issue of unnecessary retransmissions.

We could probably also do some more intelligent things like detecting whether we have other 0-hop neighbors than a CLIENT_BASE node, and if so, do not use the extended retransmission delay.

After running h3lix branch on my wismesh tag, I still see the issue.

I can try my approach, but more targeted as per your suggestion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Pull request that fixes bugs first-contribution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: MAX_RETRANSMIT false positives with router_late broadcast window

5 participants