flatbasemodel: remove CacheProxyDb wrapper from _rebuild_filter #2361

Open
dsblank wants to merge 1 commit into gramps-project:maintenance/gramps61 from dsblank:remove-cacheproxy-in-filter

dsblank (Member) commented Jun 9, 2026

Summary

Remove the CacheProxyDb wrapper from _rebuild_filter in FlatBaseModel, passing self.db directly to search.apply() instead. Removes the now-unused import.

Benchmark results (101,518-person database)

Times in seconds. "Without cache" is the raw db baseline; "With cache" is the current CacheProxyDb path.

Standard filters

| Filter | Count | Without cache | With cache | Overhead |
|---|---|---|---|---|
| IsMale | 47,706 | 3.9s | 7.0s | +3.1s |
| IsFemale | 47,589 | 3.8s | 7.0s | +3.2s |
| HasUnknownGender | 3,848 | 3.8s | 7.4s | +3.6s |
| HasOtherGender | 2,375 | 3.8s | 7.0s | +3.2s |
| Disconnected | 2,518 | 3.9s | 6.9s | +3.0s |
| NeverMarried | 9,908 | 4.1s | 7.4s | +3.3s |
| MultipleMarriages | 0 | 3.8s | 6.9s | +3.1s |
| HasNickname | 3,806 | 3.9s | 7.3s | +3.4s |
| HasAlternateName | 30,099 | 3.9s | 6.6s | +2.7s |
| MissingParent | 7,585 | 7.2s | 8.9s | +1.7s |
| HaveChildren | 38,414 | 6.0s | 9.2s | +3.2s |
| NoBirthdate | 21,512 | 5.3s | 9.0s | +3.7s |
| NoDeathdate | 36,488 | 5.1s | 8.1s | +3.0s |
| HaveAltFamilies | 9,900 | 7.3s | 9.1s | +1.8s |
| IncompleteNames | 9,959 | 3.9s | 7.2s | +3.3s |
| PeoplePrivate | 10,068 | 3.9s | 6.9s | +3.0s |
| PeoplePublic | 91,450 | 3.9s | 6.9s | +3.0s |
| FamilyPrivate | 4,537 | 1.1s | 1.9s | +0.8s |
| EventPrivate | 43,847 | 5.5s | 8.7s | +3.2s |

Complex and multi-rule compound filters

These are the cases most likely to benefit from caching: rules that do heavy secondary lookups (events, families) and compound AND filters where multiple rules share secondary objects.

| Filter | Count | Without cache | With cache | Overhead |
|---|---|---|---|---|
| HasBirth | 80,006 | 6.8s | 12.0s | +5.2s |
| HasDeath | 65,030 | 7.8s | 12.5s | +4.7s |
| HasRelationship | 101,518 | 6.6s | 9.7s | +3.1s |
| HasEvent | 98,172 | 12.3s | 12.5s | +0.2s |
| ProbablyAlive | 14,358 | 24.9s | 25.8s | +0.9s |
| IsMale + HaveChildren | 18,972 | 5.5s | 9.8s | +4.3s |
| HaveChildren + MissingParent | 2,007 | 7.8s | 9.6s | +1.8s |
| IsMale + HasBirth | 38,192 | 5.6s | 9.5s | +3.9s |
| IsMale + HaveChildren + MissingParent | 995 | 6.3s | 9.2s | +2.9s |
| ProbablyAlive + HasBirth | 11,689 | 24.1s | 25.7s | +1.6s |

HasEvent (about 97% match rate: 98,172 of 101,518) and ProbablyAlive (recursive ancestor traversal, repeated lookups on shared ancestors) are the best-case scenarios for cache effectiveness. Both are still neutral to negative: +0.2s and +0.9s with the cache.

Why the cache hurts

CacheProxyDb wraps the database with an LRU cache (size 131,071). _rebuild_filter visits each person handle exactly once in the outer loop, so there are no cache hits on the primary person objects — only the cost of accumulating all 100K+ fully-deserialized Person objects in memory simultaneously. This causes significant GC pressure throughout the loop.
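
The single-pass access pattern described above can be illustrated with a minimal sketch. It uses `functools.lru_cache` as a stand-in for CacheProxyDb (an assumption; the real proxy's internals are not shown in this PR), and `get_person` and the handle format are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=131_071)  # same capacity the PR quotes for CacheProxyDb
def get_person(handle: str) -> dict:
    """Hypothetical stand-in for a database read plus deserialization."""
    return {"handle": handle, "gender": int(handle[1:]) % 2}

# One unique handle per person, each visited once, as in the outer filter loop.
handles = [f"H{i:06d}" for i in range(10_000)]
matches = [h for h in handles if get_person(h)["gender"] == 1]

# Every lookup is a miss, and every result stays referenced by the cache.
info = get_person.cache_info()
print(info.hits, info.misses, info.currsize)
```

Because no handle repeats, the hit counter never moves while the cache grows to one entry per person, which is the accumulation the PR identifies as the cost.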

To confirm it is object accumulation (not LRU linked-list overhead), a plain dict-based cache was also benchmarked: it showed the same ~3s overhead as CacheProxyDb, while a no-op passthrough proxy matched the raw-db baseline exactly.
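
The dict-cache versus passthrough comparison might look roughly like the sketch below. All three classes are hypothetical stand-ins, not the proxies used in the benchmark, and the collector-run count from `gc.get_stats()` is only a rough proxy for GC pressure (CPython schedules collections based on net allocations of tracked container objects).

```python
import gc

class ToyDb:
    """Hypothetical stand-in for the raw database."""
    def get_person_from_handle(self, handle):
        # A fresh container object per call, like a deserialized record.
        return {"handle": handle, "events": [handle]}

class DictCacheProxy:
    """Hypothetical plain-dict cache: keeps every object it has seen."""
    def __init__(self, db):
        self.db = db
        self.cache = {}

    def get_person_from_handle(self, handle):
        if handle not in self.cache:
            self.cache[handle] = self.db.get_person_from_handle(handle)
        return self.cache[handle]

class PassthroughProxy:
    """Hypothetical no-op proxy: forwards calls and retains nothing."""
    def __init__(self, db):
        self.db = db

    def get_person_from_handle(self, handle):
        return self.db.get_person_from_handle(handle)

def gc_runs():
    # Total collector runs so far, summed over all generations.
    return sum(gen["collections"] for gen in gc.get_stats())

def scan(db, handles):
    before = gc_runs()
    count = sum(1 for h in handles if db.get_person_from_handle(h)["events"])
    return count, gc_runs() - before

handles = [f"H{i:06d}" for i in range(50_000)]
raw = ToyDb()
cached_proxy = DictCacheProxy(raw)
count_cached, runs_cached = scan(cached_proxy, handles)
count_plain, runs_plain = scan(PassthroughProxy(raw), handles)
print(runs_cached, runs_plain)
```

Both scans return the same count; the difference is that the dict proxy ends the loop holding every record, while the passthrough lets each one be freed as soon as the rule has looked at it.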

Pros of removal

  • ~1.75× faster for simple filters (IsMale, etc.): ~7s → ~4s on 100K people, roughly 43% less wall time
  • Faster even for cross-lookup rules (MissingParent, HaveChildren, HaveAltFamilies) that do secondary object lookups — cache hits on shared family objects do not compensate for the GC pressure from holding all Person objects live
  • Faster for compound multi-rule filters — even when multiple rules share secondary objects, the cache is still a net loss
  • Lower peak memory during filter application
  • Simpler code
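
The relationship between the overhead column and the headline speedup can be worked through for a few rows. This is plain arithmetic on figures copied from the benchmark tables; the `summarize` helper is introduced here only for illustration.

```python
# (without_cache_s, with_cache_s) copied from the benchmark tables.
rows = {
    "IsMale": (3.9, 7.0),
    "MissingParent": (7.2, 8.9),
    "HasEvent": (12.3, 12.5),
}

def summarize(without_s: float, with_s: float) -> tuple[float, float, float]:
    overhead = with_s - without_s             # seconds added by the cache
    speedup = with_s / without_s              # how many times faster without it
    time_saved_pct = 100.0 * overhead / with_s
    return round(overhead, 1), round(speedup, 2), round(time_saved_pct, 1)

summary = {name: summarize(*times) for name, times in rows.items()}
for name, (overhead, speedup, saved) in summary.items():
    print(f"{name}: +{overhead}s overhead, {speedup}x faster without cache, "
          f"{saved}% less time")
```

For IsMale this gives a 1.79x speedup, equivalent to about 44% less wall time; for HasEvent the two paths are within 2% of each other.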

Cons of removal

  • Rules that perform many repeated secondary lookups on shared objects (e.g. a family with many children) will do more SQLite reads. However, benchmarking across all tested rule types, including ProbablyAlive (recursive ancestor traversal) and compound AND filters with shared family lookups, shows that removal is still a net win in every case.
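
The extra reads can be made concrete with a toy model of a rule that looks up each person's family. `CountingDb`, `FamilyCache`, and `missing_father` are hypothetical and only count reads; they say nothing about how expensive a real SQLite read is.

```python
class CountingDb:
    """Hypothetical stand-in for the raw database that counts family reads."""
    def __init__(self, family_of):
        self.family_of = family_of
        self.family_reads = 0

    def get_family(self, family_handle):
        self.family_reads += 1
        return {"handle": family_handle, "father": None}

class FamilyCache:
    """Hypothetical caching wrapper that memoizes families by handle."""
    def __init__(self, db):
        self.db = db
        self.family_of = db.family_of
        self._families = {}

    def get_family(self, family_handle):
        if family_handle not in self._families:
            self._families[family_handle] = self.db.get_family(family_handle)
        return self._families[family_handle]

def missing_father(db, person_handles):
    """Toy rule: keep people whose family records no father."""
    return [p for p in person_handles
            if db.get_family(db.family_of[p])["father"] is None]

# 1,000 people spread over 250 families (four children each).
people = list(range(1_000))
family_of = {p: p // 4 for p in people}

raw = CountingDb(family_of)
missing_father(raw, people)
reads_uncached = raw.family_reads

raw_behind_cache = CountingDb(family_of)
missing_father(FamilyCache(raw_behind_cache), people)
reads_cached = raw_behind_cache.family_reads

print(reads_uncached, reads_cached)
```

Siblings share a family, so the uncached path reads each family four times here. The benchmarks indicate that on the tested database these repeated reads cost less than the memory retained by the cache.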

Test plan

  • Apply filter in People view on a large database and confirm it is faster
  • Verify filtered results are correct for IsMale, MissingParent, HaveChildren
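
The correctness check amounts to comparing filter output with and without the wrapper. A minimal sketch of that comparison, using toy stand-ins (none of these classes are Gramps APIs, and the `apply(db, handles)` signature is assumed for illustration):

```python
class ToyDb:
    """Hypothetical raw database mapping person handles to a gender code."""
    def __init__(self, genders):
        self.genders = genders

    def get_gender(self, handle):
        return self.genders[handle]

class CachingProxy:
    """Hypothetical caching wrapper around ToyDb."""
    def __init__(self, db):
        self.db = db
        self._seen = {}

    def get_gender(self, handle):
        if handle not in self._seen:
            self._seen[handle] = self.db.get_gender(handle)
        return self._seen[handle]

class IsMaleFilter:
    """Toy filter that returns matching handles in input order."""
    def apply(self, db, handles):
        return [h for h in handles if db.get_gender(h) == "M"]

handles = [f"H{i:04d}" for i in range(200)]
db = ToyDb({h: "M" if i % 3 == 0 else "F" for i, h in enumerate(handles)})

with_cache = IsMaleFilter().apply(CachingProxy(db), handles)
without_cache = IsMaleFilter().apply(db, handles)
assert with_cache == without_cache
```

The same pattern applies to MissingParent and HaveChildren: run the rule against both database objects and require identical handle lists.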

🤖 Generated with Claude Code

CacheProxyDb accumulates all deserialized objects in an LRU during the
filter loop. On a 100K-person database every filter visits each person
exactly once, so there are no cache hits — only the cost of keeping
~100K live Python objects in memory simultaneously, which causes GC
pressure that adds ~3s to every filter application.

Benchmarking with five configurations confirmed the overhead comes
entirely from object accumulation: a plain dict cache (same memory
pressure, no LRU overhead) is equally slow, while a no-op proxy (no
accumulation) matches the raw-db baseline. Even rules that do secondary
cross-object lookups (MissingParent, HaveChildren) are faster without
the cache.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dsblank force-pushed the remove-cacheproxy-in-filter branch from c31e93b to a2ee99b on June 9, 2026 17:56

dsblank (Member, Author) commented Jun 9, 2026

@Nick-Hall you can consider this a bug, or an enhancement:

  1. The bug is that we should not have assumed that a cache would help. It does not.
  2. The enhancement is that removing the cache improves performance.

Let me know what branch to target.

dsblank added this to the v6.1 milestone on Jun 9, 2026