
feat: Two-tier cache system with bug fixes and refactoring #60

Merged
wesm merged 18 commits into main from cache-improvements on Dec 25, 2025

@wesm
Owner

@wesm wesm commented Dec 18, 2025

Summary

This PR implements a two-tier cache system for optimized data fetching, along with several bug fixes and refactoring improvements discovered during development and testing.

Two-Tier Cache System

Split transaction cache into hot (recent 90 days, 6h refresh) and cold (historical, 30-day refresh) tiers to reduce unnecessary API calls while maintaining data freshness for recent transactions.
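A minimal sketch of how a 90-day boundary could split transactions into the two tiers. It assumes transactions are plain dicts with a `date` key and applies the 30-day cold overlap listed under the key changes; `tier_boundary`, `split_tiers`, and the constant names are hypothetical, not moneyflow's API.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical constants mirroring the numbers in this PR description.
HOT_WINDOW_DAYS = 90    # hot tier covers the most recent 90 days
COLD_OVERLAP_DAYS = 30  # cold tier reaches 30 days past the boundary

Txn = Dict[str, object]


def tier_boundary(today: date) -> date:
    """First date that belongs to the hot tier."""
    return today - timedelta(days=HOT_WINDOW_DAYS)


def split_tiers(txns: List[Txn], today: date) -> Tuple[List[Txn], List[Txn]]:
    """Return (hot, cold); rows inside the overlap land in both tiers."""
    boundary = tier_boundary(today)
    cold_end = boundary + timedelta(days=COLD_OVERLAP_DAYS)
    hot = [t for t in txns if t["date"] >= boundary]
    cold = [t for t in txns if t["date"] < cold_end]
    return hot, cold
```

With `today=date(2025, 12, 22)` the boundary is 2025-09-23, so a transaction dated 2025-10-01 is stored in both tiers while one from January sits only in the cold tier.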

Key changes:

  • Add RefreshStrategy enum (NONE, HOT_ONLY, COLD_ONLY, BOTH, ALL)
  • Implement tier validation methods (is_hot_cache_valid, is_cold_cache_valid)
  • Add split save/load logic for hot and cold cache files
  • Integrate partial refresh in app.py (_partial_refresh method)
  • MTD optimization: skip the cold cache when --mtd is used or --since falls within the last 90 days
  • Bump cache version to 3.0 (old caches auto-dropped)
  • Add 30-day overlap in cold cache to handle boundary transactions
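The key changes above name a RefreshStrategy enum and per-tier validity checks. The following is a minimal sketch of how a strategy might be chosen from each tier's last save time, assuming the 6-hour and 30-day limits from this description; apart from the enum members, the names (including `needs_cold_tier` for the MTD optimization) are hypothetical. It omits `BOTH`, which a later commit in this PR removed in favor of `ALL`.

```python
from datetime import date, datetime, timedelta
from enum import Enum
from typing import Optional

HOT_MAX_AGE = timedelta(hours=6)    # hot tier refresh interval
COLD_MAX_AGE = timedelta(days=30)   # cold tier refresh interval
HOT_WINDOW_DAYS = 90


class RefreshStrategy(Enum):
    NONE = "none"            # both tiers fresh: serve everything from cache
    HOT_ONLY = "hot_only"    # refetch recent data, reuse the cold tier
    COLD_ONLY = "cold_only"  # refetch history, reuse the hot tier
    ALL = "all"              # nothing reusable: full fetch


def is_tier_valid(saved_at: Optional[datetime], max_age: timedelta, now: datetime) -> bool:
    """A tier is valid if it exists and is younger than its max age."""
    return saved_at is not None and now - saved_at < max_age


def get_refresh_strategy(
    hot_saved_at: Optional[datetime], cold_saved_at: Optional[datetime], now: datetime
) -> RefreshStrategy:
    hot_ok = is_tier_valid(hot_saved_at, HOT_MAX_AGE, now)
    cold_ok = is_tier_valid(cold_saved_at, COLD_MAX_AGE, now)
    if hot_ok and cold_ok:
        return RefreshStrategy.NONE
    if cold_ok:
        return RefreshStrategy.HOT_ONLY
    if hot_ok:
        return RefreshStrategy.COLD_ONLY
    return RefreshStrategy.ALL


def needs_cold_tier(since: Optional[date], today: date) -> bool:
    """Views starting inside the hot window (e.g. --mtd) can skip the cold tier."""
    if since is None:
        return True
    return since < today - timedelta(days=HOT_WINDOW_DAYS)
```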

Benefits:

  • Historical data refreshed every 30 days instead of 6h
  • Partial refresh fetches only the expired tier and loads the other from cache
  • --mtd loads only the hot cache for faster startup
  • Graceful fallback to full fetch if partial refresh fails
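The partial-refresh and fallback benefits can be sketched as a single function that handles either stale tier, in the spirit of the later commit that unifies the HOT_ONLY/COLD_ONLY branches. This is a hedged illustration: the callables `fetch_tier`, `load_cached_tier`, and `fetch_all` are hypothetical stand-ins for the API client and cache manager, and the sketch concatenates results instead of deduplicating them.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)

Txn = Dict[str, object]


def partial_refresh(
    stale_tier: str,
    fetch_tier: Callable[[str], List[Txn]],
    load_cached_tier: Callable[[str], List[Txn]],
    fetch_all: Callable[[], List[Txn]],
) -> List[Txn]:
    """Refetch only the expired tier; fall back to a full fetch on any error."""
    if stale_tier not in ("hot", "cold"):
        raise ValueError(f"unknown tier: {stale_tier!r}")
    fresh_tier = "cold" if stale_tier == "hot" else "hot"
    try:
        fetched = fetch_tier(stale_tier)       # API call for the expired tier
        cached = load_cached_tier(fresh_tier)  # still-valid tier from disk
    except Exception:
        logger.warning("Partial %s refresh failed; doing a full fetch", stale_tier, exc_info=True)
        return fetch_all()
    # A deduplicating merge of the overlap would normally go here.
    return cached + fetched
```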

Cache System Hardening

  • Fix cache invalidation from display filters (filters no longer affect cache state)
  • Preserve cold cache when committing edits in filtered view mode
  • Add cache structure sanity check to detect inconsistencies
  • Handle empty merchant cache edge case when concatenating
  • Improve cache refresh status messages with clear date ranges
  • Add date ranges to transaction fetch progress messages

New Module: Cache Orchestrator

Extract cache orchestration logic into cache_orchestrator.py for better separation of concerns and testability. This module handles the coordination between cache manager, data fetching, and refresh strategies.

Bug Fixes (Non-Cache Related)

  • fix: Escape regex special characters in merchant search - Merchants with special characters like () or . now search correctly
  • fix: Prevent delete/details actions in sub-grouped detail view - These actions are now properly disabled when viewing sub-grouped transactions

Refactoring

  • Extract testable business logic from edit_screens.py into pure functions
  • Reduce code duplication in cache loading with extracted helper methods
  • Consolidate and simplify cache tests

Test Coverage

  • Comprehensive cache system tests in test_cache.py
  • New test_cache_orchestrator.py with 224 lines of orchestrator tests
  • New test_edit_screens.py with 130 lines of edit screen logic tests
  • New test_app_controller.py tests for sub-grouped view behavior

Test Plan

  • All 750+ tests pass
  • Type checking clean (pyright moneyflow/)
  • Code formatting and linting pass
  • Manual testing with real Monarch Money account
  • Verified cache refresh behavior with --mtd and --refresh flags
  • Verified merchant search with special characters

Files Changed

  • moneyflow/cache_manager.py - Core two-tier cache implementation
  • moneyflow/cache_orchestrator.py - New orchestration module
  • moneyflow/app.py - Integration and UI updates
  • moneyflow/app_controller.py - Sub-grouped view fixes
  • moneyflow/screens/edit_screens.py - Extracted business logic
  • moneyflow/data_manager.py - Minor cache integration updates
  • tests/test_cache.py - Consolidated cache tests
  • tests/test_cache_orchestrator.py - New orchestrator tests
  • tests/test_edit_screens.py - New edit screen tests
  • tests/test_app_controller.py - New controller tests

🤖 Generated with Claude Code

Split transaction cache into hot (recent 90 days, 6h refresh) and cold
(historical, 30-day refresh) tiers to reduce unnecessary API calls while
maintaining data freshness for recent transactions.

Key changes:
- Add RefreshStrategy enum (NONE, HOT_ONLY, COLD_ONLY, BOTH, ALL)
- Implement tier validation methods (is_hot_cache_valid, is_cold_cache_valid)
- Add split save/load logic for hot and cold cache files
- Integrate partial refresh in app.py (_partial_refresh method)
- MTD optimization: skip cold cache when --mtd or --since within 90 days
- Bump cache version to 3.0 (old caches auto-dropped)
- Add 50 comprehensive tests in test_tiered_cache.py

Benefits:
- Historical data refreshed every 30 days instead of 6h
- Partial refresh fetches only expired tier, loads other from cache
- --mtd loads only hot cache for faster startup
- Graceful fallback to full fetch if partial refresh fails

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add _load_merchant_cache() for merchant cache loading with error handling
- Add _merge_hot_cold_dfs() for deduplication merge logic
- Simplify _check_and_load_cache() using helper methods
- Unify HOT_ONLY/COLD_ONLY branches in _partial_refresh()

Reduces app.py by ~85 lines while maintaining identical functionality.
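Because the tiers overlap, merging them needs deduplication. A minimal sketch of that merge, assuming each transaction has a unique `id` and a `date`, and assuming the hot copy should win when the same id appears in both tiers (it is refreshed more often). The name `merge_tiers` matches a later commit, but this signature and tie-break rule are illustrative assumptions.

```python
from datetime import date
from typing import Dict, List

Txn = Dict[str, object]


def merge_tiers(hot: List[Txn], cold: List[Txn], key: str = "id") -> List[Txn]:
    """Combine tiers, keeping one row per id; the hot copy wins on overlap."""
    merged: Dict[object, Txn] = {}
    for txn in cold:
        merged[txn[key]] = txn
    for txn in hot:  # later writes overwrite, so hot rows replace cold duplicates
        merged[txn[key]] = txn
    # Newest first (an arbitrary ordering choice for this sketch).
    return sorted(merged.values(), key=lambda t: t["date"], reverse=True)
```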

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

This PR implements a sophisticated two-tier cache system that optimizes data fetching by splitting transactions into hot (recent 90 days) and cold (historical) tiers with different refresh intervals. The hot cache refreshes every 6 hours while the cold cache refreshes every 30 days, reducing unnecessary API calls for historical data.

Key changes:

  • Introduced RefreshStrategy enum with five strategies (NONE, HOT_ONLY, COLD_ONLY, BOTH, ALL) for intelligent cache refresh decisions
  • Implemented separate hot/cold cache files with split save/load logic and merge functionality with deduplication
  • Added MTD optimization that skips cold cache loading when queries are within the 90-day hot window

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Summary per file:

  • tests/test_tiered_cache.py - Comprehensive test suite with 50 tests covering tier splitting, validation, refresh strategies, merge logic, and data integrity
  • tests/test_cache_manager.py - Updated existing tests for backwards compatibility with the two-tier cache structure
  • tests/test_cache.py - Updated integration tests to work with the new hot/cold file paths and metadata structure
  • moneyflow/cache_manager.py - Core implementation of the two-tier cache: RefreshStrategy enum, tier validation methods, split save/load operations, and merge logic
  • moneyflow/app.py - Integration of partial refresh logic, hot-only optimization for MTD queries, and strategy-based cache loading


wesm and others added 15 commits December 18, 2025 16:23
Address Copilot review comments on PR #60:
- Fix module docstring: "24 hours" → "6 hours" to match HOT_MAX_AGE_HOURS
- Remove unused List import from typing
- Rename test methods to reflect actual 6-hour expiry policy
- Update test values from 25 hours to 7 hours for more precise testing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Three critical fixes for the two-tier cache system:

1. Separate display filters from cache behavior
   - --mtd and --year now only filter the VIEW, not what's cached
   - Cache always stores full data (year=None, since=None)
   - Prevents --mtd from nuking an existing full cache

2. Fix partial refresh API calls
   - Monarch API requires BOTH startDate and endDate
   - Was passing None for one date, causing API failure
   - Added get_hot_refresh_date_range() and get_cold_refresh_date_range()

3. Prevent gaps between cache tiers
   - Hot refresh now uses cold's latest_date (from metadata)
   - Cold refresh now uses hot's earliest_date (from metadata)
   - Both use 7-day overlap (TIER_OVERLAP_DAYS)
   - Fixes gap that would grow daily as boundary moves
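Fixes 2 and 3 together mean each partial refresh range is always a pair of concrete dates anchored on the other tier's metadata. A hedged sketch: the function names come from the commit message, but these signatures, and the `history_start` floor for the cold range, are assumptions for illustration.

```python
from datetime import date, timedelta
from typing import Tuple

TIER_OVERLAP_DAYS = 7  # value stated in the commit message


def get_hot_refresh_date_range(cold_latest: date, today: date) -> Tuple[date, date]:
    """Start a week before cold's latest date so the tiers overlap."""
    return cold_latest - timedelta(days=TIER_OVERLAP_DAYS), today


def get_cold_refresh_date_range(hot_earliest: date, history_start: date) -> Tuple[date, date]:
    """End a week after hot's earliest date; history_start is a hypothetical floor."""
    return history_start, hot_earliest + timedelta(days=TIER_OVERLAP_DAYS)
```

Both functions return two dates, so the API never receives a `None` for `startDate` or `endDate`, and anchoring on metadata rather than on "today minus 90 days" keeps the ranges from drifting apart as the boundary moves.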

Added 12 regression tests covering:
- Display filters don't invalidate cache
- Partial refresh date ranges always non-None
- Tier overlap ensures no gaps

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

When running with --mtd or --year, the app loads only filtered (recent)
transactions. Previously, committing edits would call save_cache() with
this filtered data, causing the cold cache to be overwritten with empty
data since all transactions were within the hot window.

Now handle_commit_result() detects filtered view mode and uses
save_hot_cache() instead, which preserves the cold tier data.

- Add is_filtered_view parameter to handle_commit_result()
- Use save_hot_cache() when operating on filtered data
- Add logging to cache save operations for debugging
- Add 3 regression tests to prevent this bug from recurring
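The bug and its fix can be reproduced with an in-memory stand-in for the two cache files. Everything here (`TieredCache`, the exact save semantics, the boundary field) is a hypothetical sketch; only the names `save_cache`, `save_hot_cache`, `handle_commit_result`, and `is_filtered_view` come from the commit message.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List

Txn = Dict[str, object]


@dataclass
class TieredCache:
    """In-memory stand-in for the hot and cold cache files (hypothetical)."""
    boundary: date
    hot: List[Txn] = field(default_factory=list)
    cold: List[Txn] = field(default_factory=list)

    def save_cache(self, txns: List[Txn]) -> None:
        """Full save: re-split everything. Only safe when txns is unfiltered."""
        self.hot = [t for t in txns if t["date"] >= self.boundary]
        self.cold = [t for t in txns if t["date"] < self.boundary]

    def save_hot_cache(self, txns: List[Txn]) -> None:
        """Hot-only save: the cold tier is left untouched."""
        self.hot = [t for t in txns if t["date"] >= self.boundary]


def handle_commit_result(cache: TieredCache, data: List[Txn], is_filtered_view: bool) -> None:
    if is_filtered_view:
        # e.g. --mtd: data holds only recent rows, so a full save would empty cold.
        cache.save_hot_cache(data)
    else:
        cache.save_cache(data)
```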

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove RefreshStrategy.BOTH (use ALL when both tiers stale)
- Remove dead _filter_covered() method and year/since parameters
- Simplify get_refresh_strategy() and is_cache_valid() signatures
- Rename _merge_dataframes to public merge_tiers()
- Add logging to cache save operations

Test consolidation:
- Merge test_tiered_cache.py and test_cache_manager.py into test_cache.py
- Remove ~1700 lines of duplicate test code
- Add edge case tests (unicode, large data, corrupt files)
- Add display filtering tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Show explicit date ranges in status messages during refresh
- Partial refresh: "Refreshing recent transactions (2024-09-23 to 2024-12-22)"
- Full refresh: "Full refresh: fetching 2025-01-01 to 2024-12-22"
- Clearer notifications showing what was fetched vs cached
- Remove unused boundary_str variable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Progress messages now show the date range being fetched, making it clear
whether it's a full refresh or a partial cache update. Messages like
"Fetching all transactions..." or "Downloading 1,069 transactions (2024-09-15 to 2024-12-22)..."
provide better visibility into what the app is doing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Two fixes for cache continuity:

1. Cold cache now includes 30 days of overlap into the hot window.
   This prevents gaps when cold expires: after 30 days the boundary
   moves forward, but cold data still reaches the new boundary.

2. --mtd --refresh (or similar hot-only views) now only refreshes
   the hot tier, not all historical data. The override logic is kept
   in app.py where view context exists, not in the cache layer.

Added test_cold_cache_has_30_day_overlap to verify the overlap guarantee.
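The overlap guarantee in fix 1 is simple date arithmetic: the boundary advances one day per day, so during the cold tier's 30-day lifetime it moves forward by exactly 30 days, which a 30-day overlap absorbs. A small check of that claim, with hypothetical helper names:

```python
from datetime import date, timedelta

HOT_WINDOW_DAYS = 90
COLD_MAX_AGE_DAYS = 30


def boundary(on: date) -> date:
    """Hot/cold boundary as seen on a given day."""
    return on - timedelta(days=HOT_WINDOW_DAYS)


def cold_reaches_boundary(cold_built_on: date, today: date, overlap_days: int) -> bool:
    """Does a cold tier built on cold_built_on still cover today's boundary?"""
    cold_end = boundary(cold_built_on) + timedelta(days=overlap_days)
    return cold_end >= boundary(today)
```

On the last day before the cold tier expires, a 30-day overlap still reaches the new boundary, while a cold tier with no overlap has already fallen a month short.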

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…hants

When the merchant cache is empty, pl.Series creates a Series with
dtype null, which can't be concatenated with str Series. Fixed by
explicitly setting dtype=pl.Utf8 when creating the cached_series.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Adds _is_cache_structure_valid() that runs before using cached data.
If any check fails, forces a full refresh to prevent serving stale
or inconsistent data after code changes to cache logic.

Checks performed:
1. Required metadata fields exist for both tiers
2. Cold cache extends to boundary (within 7-day tolerance)
3. No gap between cold latest_date and hot earliest_date

This is a defensive measure that will auto-heal cache issues from
future changes to the cache handling logic.
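The three checks can be sketched over plain metadata dicts. This is a hypothetical illustration: the field names, the exact comparisons, and the one-day allowance in the gap check are assumptions; the 7-day tolerance comes from this commit and the `GAP_TOLERANCE_DAYS` constant introduced later in the PR.

```python
from datetime import date, timedelta
from typing import Mapping, Optional

GAP_TOLERANCE_DAYS = 7
REQUIRED_FIELDS = ("earliest_date", "latest_date")  # hypothetical field set

Meta = Optional[Mapping[str, date]]


def is_cache_structure_valid(hot_meta: Meta, cold_meta: Meta, boundary: date) -> bool:
    """Return False (forcing a full refresh) if the tiers look inconsistent."""
    # 1. Required metadata fields exist for both tiers.
    for meta in (hot_meta, cold_meta):
        if meta is None or any(meta.get(f) is None for f in REQUIRED_FIELDS):
            return False
    # 2. Cold extends to the boundary, within the tolerance.
    if cold_meta["latest_date"] < boundary - timedelta(days=GAP_TOLERANCE_DAYS):
        return False
    # 3. No gap: hot must start no later than the day after cold ends.
    if hot_meta["earliest_date"] > cold_meta["latest_date"] + timedelta(days=1):
        return False
    return True
```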

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The merchant search was using str.contains(), which interprets the
input as regex by default. Characters like * ? ( ) would cause
regex parse errors. Fixed by adding literal=True to treat the
search pattern as a plain string.
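The same failure mode exists in Python's standard `re` module, which makes it easy to demonstrate. The sketch below is a stdlib analogue, not moneyflow's Polars code: it shows that a raw query like `(` is not a valid pattern, and that escaping the query turns it into a literal match (so `.` no longer matches any character). Function names are hypothetical.

```python
import re
from typing import List


def is_valid_regex(pattern: str) -> bool:
    """True if the re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def search_merchants(merchants: List[str], query: str) -> List[str]:
    """Case-insensitive substring search; re.escape makes the query literal."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [m for m in merchants if pattern.search(m)]
```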

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Extracted two pure functions from EditMerchantScreen:
- filter_merchants(): Filters merchant Series by query with proper
  regex escaping (literal=True)
- parse_merchant_option_id(): Parses __new__: prefix to distinguish
  new vs existing merchants

Added comprehensive unit tests (13 tests) including:
- Case-insensitive matching
- Partial string matching
- Deduplication and sorting
- Limit parameter
- Regex special character handling (* ? ( ) + [ ])
- Option ID parsing for new vs existing merchants

The regex test would have caught the bug fixed in the previous commit.
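Pure functions like these are easy to unit test because they take plain inputs and return plain values. A hedged sketch of what they might look like: the two function names and the `__new__:` prefix come from the commit message, while the signatures, the default `limit`, the `(name, is_new)` return shape, and the use of Python lists instead of a Polars Series are assumptions.

```python
from typing import Iterable, List, Tuple

NEW_MERCHANT_PREFIX = "__new__:"


def filter_merchants(merchants: Iterable[str], query: str, limit: int = 10) -> List[str]:
    """Case-insensitive literal substring match, deduplicated, sorted, capped."""
    needle = query.casefold()
    matches = {m for m in merchants if needle in m.casefold()}
    return sorted(matches)[:limit]


def parse_merchant_option_id(option_id: str) -> Tuple[str, bool]:
    """Return (merchant_name, is_new) for a picker option id."""
    if option_id.startswith(NEW_MERCHANT_PREFIX):
        return option_id[len(NEW_MERCHANT_PREFIX):], True
    return option_id, False
```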

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

When in detail view with sub-grouping enabled, the current_data contains
aggregate fields (merchant, count, total) instead of transaction fields
(id). This caused a KeyError when trying to access row_data["id"].

Added sub_grouping_mode check to both action_delete_transaction and
action_show_transaction_details guards, matching the pattern already
used in the hide/unhide toggle code.
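The guard amounts to refusing transaction-level actions whenever the visible rows are aggregates. A minimal sketch with hypothetical names (`ViewState`, `in_detail_view`, `can_act_on_transaction`); only `sub_grouping_mode` appears in the commit message, and the extra `"id" in row` check is a defensive addition of this sketch rather than something the commit describes.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class ViewState:
    in_detail_view: bool      # hypothetical flag: rows are individual transactions
    sub_grouping_mode: bool   # detail view re-aggregated by a sub-group


def can_act_on_transaction(state: ViewState, row: Mapping[str, Any]) -> bool:
    """Guard for delete/details actions: they need a real transaction row."""
    if not state.in_detail_view or state.sub_grouping_mode:
        return False
    # Aggregate rows carry merchant/count/total but no "id".
    return "id" in row
```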

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@wesm wesm changed the title feat: Implement two-tier cache system for optimized data fetching feat: Two-tier cache system with bug fixes and refactoring Dec 23, 2025
@wesm wesm requested a review from Copilot December 23, 2025 17:42
@wesm
Owner Author

wesm commented Dec 23, 2025

This PR has been quite a lot of bug whack-a-mole. I'm going to use this branch locally for a while until I stop seeing serious bugs before merging.


Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.


- Move TIER_OVERLAP_DAYS to class constants section
- Add GAP_TOLERANCE_DAYS constant (was hardcoded as 7)
- Add comments to empty except clauses explaining fallback behavior

Addresses code review feedback from Copilot.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@wesm wesm merged commit 3080944 into main Dec 25, 2025
8 checks passed
@wesm wesm deleted the cache-improvements branch December 25, 2025 17:31

2 participants