flatbasemodel: remove CacheProxyDb wrapper from _rebuild_filter#2361
Open
dsblank wants to merge 1 commit into
Open
flatbasemodel: remove CacheProxyDb wrapper from _rebuild_filter#2361dsblank wants to merge 1 commit into
dsblank wants to merge 1 commit into
Conversation
CacheProxyDb accumulates all deserialized objects in an LRU during the filter loop. On a 100K-person database every filter visit each person exactly once, so there are no cache hits — only the cost of keeping ~100K live Python objects in memory simultaneously, which causes GC pressure that adds ~3s to every filter application. Benchmarking with five configurations confirmed the overhead comes entirely from object accumulation: a plain dict cache (same memory pressure, no LRU overhead) is equally slow, while a no-op proxy (no accumulation) matches the raw-db baseline. Even rules that do secondary cross-object lookups (MissingParent, HaveChildren) are faster without the cache. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
c31e93b to
a2ee99b
Compare
Member
Author
|
@Nick-Hall you can consider this a bug, or an enhancement:
Let me know what branch to target. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Remove the
CacheProxyDbwrapper from_rebuild_filterinFlatBaseModel, passingself.dbdirectly tosearch.apply()instead. Removes the now-unused import.Benchmark results (101,518-person database)
Times in seconds. "Without cache" is the raw db baseline; "With cache" is the current
CacheProxyDbpath.Standard filters
Complex and multi-rule compound filters
These are the cases most likely to benefit from caching: rules that do heavy secondary lookups (events, families) and compound AND filters where multiple rules share secondary objects.
HasEvent(98% match rate) andProbablyAlive(recursive ancestor traversal, repeated lookups on shared ancestors) are the best-case scenarios for cache effectiveness. Both are still neutral-to-negative.Why the cache hurts
CacheProxyDbwraps the database with an LRU cache (size 131,071)._rebuild_filtervisits each person handle exactly once in the outer loop, so there are no cache hits on the primary person objects — only the cost of accumulating all 100K+ fully-deserialized Person objects in memory simultaneously. This causes significant GC pressure throughout the loop.To confirm it is object accumulation (not LRU linked-list overhead), a plain
dict-based cache was also benchmarked: it showed the same ~3s overhead asCacheProxyDb, while a no-op passthrough proxy matched the raw-db baseline exactly.Pros of removal
Cons of removal
ProbablyAlive(recursive ancestor traversal) and compound AND filters with shared family lookups — shows this is still net positive in every case.Test plan
🤖 Generated with Claude Code