noderoster: configurable host cache size and eviction duration #1246
Main Issue
Noderoster processors can lag and never catch up.

Findings
#1206 helps speed up the query that lists events, but processing them is still an issue. Specifically, we end up making a large number of queries to look up existing host IDs in the DB (or create them).
Evicting cache too aggressively
Currently, the host ID cache is hard-coded to evict entries 1 minute after last access.
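As a minimal sketch of what "configurable eviction duration" means here (the class and method names below are hypothetical, not Concord's actual API): the expire-after-access window becomes a constructor parameter instead of a hard-coded minute.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a host ID cache whose eviction duration is passed in
// via configuration rather than hard-coded to 1 minute.
class ExpiringHostCache {

    private static final class Entry {
        final long hostId;
        volatile long lastAccessNanos;

        Entry(long hostId, long now) {
            this.hostId = hostId;
            this.lastAccessNanos = now;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long evictAfterNanos;

    ExpiringHostCache(Duration evictAfter) {
        this.evictAfterNanos = evictAfter.toNanos();
    }

    /** Returns the cached host ID, or null if absent or expired. */
    Long get(String host) {
        long now = System.nanoTime();
        Entry e = entries.get(host);
        if (e == null) {
            return null;
        }
        if (now - e.lastAccessNanos > evictAfterNanos) {
            entries.remove(host); // expired since last access
            return null;
        }
        e.lastAccessNanos = now; // expire-after-access, not expire-after-write
        return e.hostId;
    }

    void put(String host, long hostId) {
        entries.put(host, new Entry(hostId, System.nanoTime()));
    }
}
```

A real implementation would more likely pass the configured duration to the cache library's builder (e.g. Guava's `CacheBuilder.expireAfterAccess`) rather than hand-roll the bookkeeping.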
Non-singleton cache
The cache gets created per processor -- so, currently, that's 3 copies. This can repeat host lookups multiple times for the same event and makes the cache less effective.
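The fix direction is to share one cache-owning manager across all processors. A rough stdlib-only sketch (names are illustrative; in practice this would be a DI-managed `@Singleton` rather than manual wiring):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a single shared manager owns the host ID cache...
class SharedHostManager {
    private final Map<String, Long> hostIds = new ConcurrentHashMap<>();

    Long cachedId(String host) {
        return hostIds.get(host);
    }

    void cacheId(String host, long id) {
        hostIds.put(host, id);
    }
}

// ...and each processor receives the SAME instance instead of
// constructing its own copy of the cache.
class EventProcessor {
    private final SharedHostManager hosts;

    EventProcessor(SharedHostManager hosts) {
        this.hosts = hosts; // injected, not created per processor
    }

    Long lookup(String host) {
        return hosts.cachedId(host);
    }
}
```

With three processors built from one `SharedHostManager`, an entry cached by any of them is visible to the others, so the same host isn't looked up three times.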
Unbounded cache size
Very unlikely to actually be an issue today since the cache evicts so aggressively, but it's technically possible to put an unlimited number of entries in the cache, which could lead to OOM in the most extreme cases. Bounding the size becomes more important once the other issues are fixed, to keep memory usage predictable.
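A size bound is cheap to add. Cache libraries expose it directly (e.g. Guava's `maximumSize(...)`); the same idea in plain JDK terms, as an illustrative sketch, is an access-ordered `LinkedHashMap` that drops the least-recently-used entry past a configurable cap:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a size-bounded, LRU-evicting map, so the cache can
// never grow without limit regardless of how many distinct hosts show up.
class BoundedHostCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    BoundedHostCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true -> LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put; evicts the least-recently-used entry
        // once the configured cap is exceeded.
        return size() > maxEntries;
    }
}
```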
Per-lookup (non-singleton) cache loaders
This doesn't really impact lookup performance, but it's generally not great hygiene if it can be avoided. It does have an impact in the realm of garbage collection -- not critical, but might as well be cleaned up, e.g. create one loader instance for the life of the server rather than millions per hour under normal usage.
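To make the allocation point concrete, here is a small hypothetical sketch (the counter exists only to make the behavior observable; the hash-based loader body is a stand-in for the real DB query). The loader is built once per object and reused for every lookup, instead of allocating a fresh loader per call:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: one loader instance for the life of the service.
class HostLookup {

    // Visible counter so the single-allocation behavior can be verified.
    static final AtomicInteger LOADERS_CREATED = new AtomicInteger();

    // Created once, when HostLookup is constructed...
    private final Function<String, Long> loader = newLoader();

    private static Function<String, Long> newLoader() {
        LOADERS_CREATED.incrementAndGet();
        // Stand-in for the real "select or insert host ID" DB query.
        return host -> (long) host.hashCode();
    }

    Long load(String host) {
        return loader.apply(host); // ...and reused on every call.
    }
}
```

The anti-pattern being fixed would be the equivalent of calling `newLoader()` inside `load(...)`: correct output, but one short-lived object per lookup for the GC to chew through.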
Fixes
Makes `HostManager` a `@Singleton`, but moves the lookup out of the cache loader.

Benchmarks
Method
Synthetic data
Short story, without wasting time on charts: the new implementation was ~71% faster. Skip to the next section, though, since it's more representative.
Real Data
Modeled after 1 hour of production data.
That's not an improvement, it's a 23% drop! BUT this is with the default settings and <1ms DB latency. Tweaking the cache size from 10,000 entries to 50,000 (expect ~15MB of heap usage at full capacity) changes the picture.
Ok, that is an improvement, but not by much. Since latency is so low, the current-implementation cache stays busy enough to keep warm for the most part.
Now, let's introduce 10ms of latency, which is less than some real-world latencies depending on server and DB locations.
Aaaah, now the new fixes get to shine. That's a 74% reduction in processing time.
The real-world execution of the Noderoster event processors is intermittent. A production multi-server Concord instance (say, 5 servers) will see event processing switch between servers regularly, so the likelihood of the current-implementation host cache going cold is much higher. Couple that with cross-region latency for re-loading the cache, and it just can't keep up with a busy environment (with regard to `ANSIBLE` process events). The current-implementation cache never held more than ~5,000 entries while running.