
Conversation

@iblancasa
Contributor

Description

Linear scanning to match files by fingerprint became a bottleneck at high file counts, because each poll iterated through all readers for every match operation.

I implemented some changes that I believe can help reduce CPU usage:

  • Bucket maps for common fingerprint sizes (~1000 bytes) are preallocated with capacity 64.
  • Replaced the reflect.ValueOf() comparison dispatch (which allocated ~48 bytes per call) with a simple CompareMode enum.
  • Fingerprint bytes are converted to strings with unsafe.String, avoiding a copy; results are cached per fingerprint.
  • Files are indexed in buckets by fingerprint length and prefix, so a match operation is two map lookups instead of thousands of comparisons.
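
The bucketing and zero-copy lookup ideas above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual code: `readerIndex`, `CompareMode`, `prefixLen`, and the other identifiers are hypothetical, and the real implementation in `fileconsumer` differs in detail.

```go
package main

import (
	"fmt"
	"unsafe"
)

// CompareMode stands in for the reflect.ValueOf-based dispatch: a plain
// enum selects the comparison strategy with no per-call allocation.
// (Hypothetical names; the PR's actual identifiers may differ.)
type CompareMode int

const (
	CompareEqual CompareMode = iota
	ComparePrefix
)

const prefixLen = 8 // illustrative; real code would tune this

// prefix returns the bucket prefix of a fingerprint.
func prefix(fp []byte) []byte {
	if len(fp) > prefixLen {
		return fp[:prefixLen]
	}
	return fp
}

// readerIndex buckets readers by fingerprint length, then by prefix,
// so matching is two map lookups instead of a linear scan.
type readerIndex struct {
	byLen map[int]map[string][]string // length -> prefix -> reader paths
}

func newReaderIndex() *readerIndex {
	return &readerIndex{byLen: make(map[int]map[string][]string)}
}

func (ri *readerIndex) add(fp []byte, path string) {
	bucket, ok := ri.byLen[len(fp)]
	if !ok {
		bucket = make(map[string][]string, 64) // preallocated capacity
		ri.byLen[len(fp)] = bucket
	}
	key := string(prefix(fp)) // copied, so it is safe to retain as a map key
	bucket[key] = append(bucket[key], path)
}

func (ri *readerIndex) match(fp []byte) []string {
	bucket, ok := ri.byLen[len(fp)]
	if !ok {
		return nil
	}
	p := prefix(fp)
	// Zero-copy string view, used only for this lookup and never retained,
	// so the fingerprint bytes are not aliased by a live map key.
	return bucket[unsafe.String(unsafe.SliceData(p), len(p))]
}

func main() {
	idx := newReaderIndex()
	idx.add([]byte("0123456789abcdef"), "a.log")
	idx.add([]byte("fedcba9876543210"), "b.log")
	fmt.Println(idx.match([]byte("0123456789abcdef"))) // prints [a.log]
}
```

Note the asymmetry: insertion copies the prefix into a regular string (map keys outlive the call), while lookup uses `unsafe.String` for a transient zero-copy view, which is only safe because the key is not retained.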

There may be further improvements we can make; hopefully this PR also sparks ideas for other enhancements.

Link to tracking issue

Fixes #27404

Testing

  • Added tests and benchmarks

go test -bench=BenchmarkPollManyFiles -benchmem ./pkg/stanza/fileconsumer

| Files watched | Baseline ns/op | Optimized ns/op | CPU improvement |
|--------------:|---------------:|----------------:|----------------:|
| 100 | 2,124,092 | 1,941,299 | +8.6% |
| 500 | 13,721,717 | 8,458,379 | +38.4% |
| 1,000 | 38,285,195 | 18,616,022 | +51.4% |
| 2,000 | 116,922,611 | 38,755,824 | +66.9% |
| 2,500 | 170,480,583 | 49,986,882 | +70.7% |
| 3,000 | 226,774,400 | 60,671,268 | +73.2% |
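
The widening gap in the table follows from the change in asymptotics: a linear scan costs O(n) per match while a map lookup is roughly O(1), so the speedup grows with the number of watched files. A standalone micro-benchmark in that spirit can be sketched with `testing.Benchmark`; the fingerprint shapes and helper names here are made up for illustration and do not exercise the real fileconsumer code.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// makeFingerprints builds n distinct synthetic fingerprints
// (illustrative only; real fingerprints are file-content prefixes).
func makeFingerprints(n int) [][]byte {
	fps := make([][]byte, n)
	for i := range fps {
		fps[i] = []byte(fmt.Sprintf("%08d-fingerprint", i))
	}
	return fps
}

func main() {
	for _, n := range []int{100, 1000} {
		fps := makeFingerprints(n)
		target := fps[n-1] // worst case for the scan: last entry

		// Baseline: scan every known fingerprint per match, as the old
		// per-poll loop over all readers did.
		linear := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				for _, fp := range fps {
					if bytes.Equal(fp, target) {
						break
					}
				}
			}
		})

		// Optimized: one map lookup, independent of n.
		index := make(map[string]int, n)
		for i, fp := range fps {
			index[string(fp)] = i
		}
		bucketed := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				_ = index[string(target)]
			}
		})

		fmt.Printf("n=%d linear=%s bucketed=%s\n", n, linear, bucketed)
	}
}
```

Running this should show the linear variant's ns/op growing with `n` while the map lookup stays roughly flat, mirroring the trend in the benchmark table above.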

@atoulme
Contributor

atoulme commented Jan 1, 2026

Please address the CI and mark ready to review.

@atoulme atoulme marked this pull request as draft January 1, 2026 23:04
…provement at high file counts

Signed-off-by: Israel Blancas <[email protected]>

Development

Successfully merging this pull request may close these issues.

[receiver/filelog] CPU consumption increases (roughly) linearly with number of files watched
