Skip to content

Fix countBefore regression: replace sentinel with getOutboundTotal()#1877

Open
DanielNovak wants to merge 3 commits intomeshcore-dev:devfrom
DanielNovak:fix-countbefore-sentinel-regression
Open

Fix countBefore regression: replace sentinel with getOutboundTotal()#1877
DanielNovak wants to merge 3 commits intomeshcore-dev:devfrom
DanielNovak:fix-countbefore-sentinel-regression

Conversation

@DanielNovak
Copy link

@DanielNovak DanielNovak commented Feb 28, 2026

Summary

PR #1795 introduced a millis wraparound fix in PacketQueue::countBefore(), changing the comparison from unsigned to signed 2's complement arithmetic:

if ((int32_t)(_schedule_table[j] - now) > 0) continue;

This correctly handles millis wraparound (~49.7 day rollover), but broke the 0xFFFFFFFF sentinel pattern used by callers to mean "count all packets regardless of schedule". With the signed comparison, countBefore(0xFFFFFFFF) always returns 0.

As noted in #1795 and #1687, this causes:

  • hasPendingWork() to always return false — repeaters sleep with packets still queued
  • Stats reporting to always show queue_len = 0

Fix

This PR adds a dedicated getOutboundTotal() method to the PacketManager interface that returns the total queue size via PacketQueue::count(), without any time filtering. This makes the two operations explicit in the API:

  • getOutboundCount(now) — count packets scheduled at or before time now
  • getOutboundTotal() — count all queued packets

All 5 call sites that previously passed 0xFFFFFFFF now use the new method:

  • examples/simple_repeater/MyMesh.cpphasPendingWork() + repeater stats
  • examples/simple_room_server/MyMesh.cpp — server status
  • examples/companion_radio/MyMesh.cpp — companion stats
  • src/helpers/StatsFormatHelper.h — core stats JSON

Additionally, countBefore(0xFFFFFFFF) itself is fixed to return _num, so the function is correct for all inputs even if called directly with the max value.

Impact

  • Performance: None — getOutboundTotal() returns _num directly, avoiding the loop
  • Backwards compatibility: Adds a new pure virtual method to PacketManager. Out-of-tree implementations will get a compile error guiding them to add it. StaticPoolPacketManager is the only implementation in-tree.

@DanielNovak
Copy link
Author

Sorry for breaking this, I should have checked the callers of that function for any special constants :-/.

PR meshcore-dev#1795 changed PacketQueue::countBefore() to use signed 2's complement
arithmetic for millis wraparound safety. However, this broke the
0xFFFFFFFF sentinel pattern used by callers to mean "count all packets
regardless of schedule".

With the signed comparison, countBefore(0xFFFFFFFF) always returns 0,
causing hasPendingWork() to report false and repeaters to sleep with
packets still queued. Stats reporting also shows queue_len as 0.

Add an early-return for the sentinel value before the loop, and document
the sentinel convention on the virtual interface and implementation.
@fschrempf
Copy link
Contributor

Instead of continuing to use the error prone magic number 0xFFFFFFFF, I suggest to introduce:

int PacketQueue::count() const {
  return _num;
}

And then replacing countBefore(0xFFFFFFFF) with count() everywhere. This is much more readable and less likely to cause problems in the future.

@DanielNovak
Copy link
Author

Definitely better. I just didn't want to make the surface of the change bigger - but I guess it makes sense to do it now because someone else will step on this again :). I will push the commit in few minutes.

Instead of overloading getOutboundCount() with a magic sentinel value,
add a dedicated getOutboundTotal() method to the PacketManager interface
that returns the total queue size without time filtering.

This eliminates the fragile convention that caused the regression and
makes the two operations — time-filtered count vs total count —
explicitly separate in the API.
@DanielNovak DanielNovak changed the title Fix countBefore sentinel regression from millis wraparound fix Fix countBefore regression: replace sentinel with getOutboundTotal() Feb 28, 2026
@fschrempf
Copy link
Contributor

Even better: count() already exists.

@DanielNovak Thanks! Looks good to me now!

@IoTThinks
Copy link
Contributor

IoTThinks commented Mar 1, 2026

I agree we should use the new getOutboundTotal() as it is easier to understand.

However, countBefore(0xFFFFFFFF) should return the same value as getOutboundTotal();
Using another method does not fix the bug in the method countBefore itself.

I suggest to use getOutboundTotal() and to fix the countBefore(0xFFFFFFFF) to return _num too.

int PacketQueue::countBefore(uint32_t now) const {
  if (now == 0xFFFFFFFF) return _num;  // sentinel: count all entries regardless of schedule

  int n = 0;
  for (int j = 0; j < _num; j++) {
    if ((int32_t)(_schedule_table[j] - now) > 0) continue;   // scheduled for future... ignore for now
    n++;
  }
  return n;
}

Have a nice day.

The signed comparison in countBefore breaks for the max uint32_t value.
Even though callers now use getOutboundTotal(), the function itself
should be correct for all inputs.
@DanielNovak DanielNovak force-pushed the fix-countbefore-sentinel-regression branch from 4ee5752 to 0d87dcc Compare March 1, 2026 06:41
@DanielNovak
Copy link
Author

@IoTThinks I added the handling of 0xFFFFFFFF to the original countBefore.

Btw. these functions are great candidates for unit tests - this would prevent these regression as we would have tests documenting and testing some of the specific behaviour.

@IoTThinks
Copy link
Contributor

Thanks a lot for quick fix.

I hope this PR to get merged soon.

Have a nice weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants