feat: Two-tier cache system with bug fixes and refactoring by wesm · Pull Request #60 · wesm/moneyflow

wesm · 2025-12-18T20:37:23Z

Summary

This PR implements a two-tier cache system for optimized data fetching, along with several bug fixes and refactoring improvements discovered during development and testing.

Two-Tier Cache System

Split transaction cache into hot (recent 90 days, 6h refresh) and cold (historical, 30-day refresh) tiers to reduce unnecessary API calls while maintaining data freshness for recent transactions.

Key changes:

Add RefreshStrategy enum (NONE, HOT_ONLY, COLD_ONLY, BOTH, ALL)
Implement tier validation methods (is_hot_cache_valid, is_cold_cache_valid)
Add split save/load logic for hot and cold cache files
Integrate partial refresh in app.py (_partial_refresh method)
MTD optimization: skip cold cache when --mtd or --since within 90 days
Bump cache version to 3.0 (old caches auto-dropped)
Add 30-day overlap in cold cache to handle boundary transactions

Benefits:

Historical data refreshed every 30 days instead of 6h
Partial refresh fetches only expired tier, loads other from cache
--mtd loads only hot cache for faster startup
Graceful fallback to full fetch if partial refresh fails
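
As a rough illustration of how the two refresh intervals above could map onto a strategy, here is a minimal sketch. The function name `choose_refresh_strategy`, the enum values, and the use of `None` for a missing tier are assumptions made for this example, not the code in cache_manager.py. `BOTH` is left out because a later commit in this PR removes it in favor of `ALL`.

```python
from datetime import timedelta
from enum import Enum

HOT_MAX_AGE = timedelta(hours=6)   # hot tier refresh interval from the description
COLD_MAX_AGE = timedelta(days=30)  # cold tier refresh interval from the description


class RefreshStrategy(Enum):
    # Member names come from the PR; the string values are placeholders.
    NONE = "none"
    HOT_ONLY = "hot_only"
    COLD_ONLY = "cold_only"
    ALL = "all"


def choose_refresh_strategy(hot_age, cold_age):
    """Pick a strategy from tier ages; None means the tier file is missing."""
    hot_stale = hot_age is None or hot_age > HOT_MAX_AGE
    cold_stale = cold_age is None or cold_age > COLD_MAX_AGE
    if hot_stale and cold_stale:
        return RefreshStrategy.ALL
    if hot_stale:
        return RefreshStrategy.HOT_ONLY
    if cold_stale:
        return RefreshStrategy.COLD_ONLY
    return RefreshStrategy.NONE
```

With a 7-hour-old hot tier and a 2-day-old cold tier, this sketch selects `HOT_ONLY`, so only recent transactions would be fetched again.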

Cache System Hardening

Fix cache invalidation from display filters (filters no longer affect cache state)
Preserve cold cache when committing edits in filtered view mode
Add cache structure sanity check to detect inconsistencies
Handle empty merchant cache edge case when concatenating
Improve cache refresh status messages with clear date ranges
Add date ranges to transaction fetch progress messages

New Module: Cache Orchestrator

Extract cache orchestration logic into cache_orchestrator.py for better separation of concerns and testability. This module handles the coordination between cache manager, data fetching, and refresh strategies.

Bug Fixes (Non-Cache Related)

fix: Escape regex special characters in merchant search - Merchants with special characters like () or . now search correctly
fix: Prevent delete/details actions in sub-grouped detail view - These actions are now properly disabled when viewing sub-grouped transactions

Refactoring

Extract testable business logic from edit_screens.py into pure functions
Reduce code duplication in cache loading with extracted helper methods
Consolidate and simplify cache tests

Test Coverage

Comprehensive cache system tests in test_cache.py
New test_cache_orchestrator.py with 224 lines of orchestrator tests
New test_edit_screens.py with 130 lines of edit screen logic tests
New test_app_controller.py tests for sub-grouped view behavior

Test Plan

All 750+ tests pass
Type checking clean (pyright moneyflow/)
Code formatting and linting pass
Manual testing with real Monarch Money account
Verified cache refresh behavior with --mtd and --refresh flags
Verified merchant search with special characters

Files Changed

moneyflow/cache_manager.py - Core two-tier cache implementation
moneyflow/cache_orchestrator.py - New orchestration module
moneyflow/app.py - Integration and UI updates
moneyflow/app_controller.py - Sub-grouped view fixes
moneyflow/screens/edit_screens.py - Extracted business logic
moneyflow/data_manager.py - Minor cache integration updates
tests/test_cache.py - Consolidated cache tests
tests/test_cache_orchestrator.py - New orchestrator tests
tests/test_edit_screens.py - New edit screen tests
tests/test_app_controller.py - New controller tests

🤖 Generated with Claude Code

Split transaction cache into hot (recent 90 days, 6h refresh) and cold (historical, 30-day refresh) tiers to reduce unnecessary API calls while maintaining data freshness for recent transactions.

Key changes:
- Add RefreshStrategy enum (NONE, HOT_ONLY, COLD_ONLY, BOTH, ALL)
- Implement tier validation methods (is_hot_cache_valid, is_cold_cache_valid)
- Add split save/load logic for hot and cold cache files
- Integrate partial refresh in app.py (_partial_refresh method)
- MTD optimization: skip cold cache when --mtd or --since within 90 days
- Bump cache version to 3.0 (old caches auto-dropped)
- Add 50 comprehensive tests in test_tiered_cache.py

Benefits:
- Historical data refreshed every 30 days instead of 6h
- Partial refresh fetches only expired tier, loads other from cache
- --mtd loads only hot cache for faster startup
- Graceful fallback to full fetch if partial refresh fails

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add _load_merchant_cache() for merchant cache loading with error handling
- Add _merge_hot_cold_dfs() for deduplication merge logic
- Simplify _check_and_load_cache() using helper methods
- Unify HOT_ONLY/COLD_ONLY branches in _partial_refresh()

Reduces app.py by ~85 lines while maintaining identical functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot

Pull request overview

This PR implements a sophisticated two-tier cache system that optimizes data fetching by splitting transactions into hot (recent 90 days) and cold (historical) tiers with different refresh intervals. The hot cache refreshes every 6 hours while the cold cache refreshes every 30 days, reducing unnecessary API calls for historical data.

Key changes:

Introduced RefreshStrategy enum with five strategies (NONE, HOT_ONLY, COLD_ONLY, BOTH, ALL) for intelligent cache refresh decisions
Implemented separate hot/cold cache files with split save/load logic and merge functionality with deduplication
Added MTD optimization that skips cold cache loading when queries are within the 90-day hot window
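
The MTD optimization in the last item comes down to a date comparison against the 90-day boundary. The sketch below shows one way that check might look. `needs_cold_cache` and its parameters are hypothetical names for this example and are not taken from the PR.

```python
from datetime import date, timedelta

HOT_WINDOW_DAYS = 90  # the hot tier covers the most recent 90 days, per the overview


def needs_cold_cache(since, today):
    """Return True when the requested start date predates the hot window."""
    if since is None:
        # No start filter means the full history is wanted.
        return True
    boundary = today - timedelta(days=HOT_WINDOW_DAYS)
    return since < boundary
```

For a month-to-date view, `since` is the first day of the current month, which always falls inside the hot window, so the cold file would not need to be read.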

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

File	Description
tests/test_tiered_cache.py	Comprehensive test suite with 50 tests covering tier splitting, validation, refresh strategies, merge logic, and data integrity
tests/test_cache_manager.py	Updated existing tests for backwards compatibility with two-tier cache structure
tests/test_cache.py	Updated integration tests to work with new hot/cold file paths and metadata structure
moneyflow/cache_manager.py	Core implementation of two-tier cache with RefreshStrategy enum, tier validation methods, split save/load operations, and merge logic
moneyflow/app.py	Integration of partial refresh logic, hot-only optimization for MTD queries, and strategy-based cache loading

tests/test_tiered_cache.py

moneyflow/cache_manager.py

moneyflow/app.py

moneyflow/cache_manager.py

tests/test_tiered_cache.py

moneyflow/cache_manager.py

Address Copilot review comments on PR #60:
- Fix module docstring: "24 hours" → "6 hours" to match HOT_MAX_AGE_HOURS
- Remove unused List import from typing
- Rename test methods to reflect actual 6-hour expiry policy
- Update test values from 25 hours to 7 hours for more precise testing

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Three critical fixes for the two-tier cache system:

1. Separate display filters from cache behavior
   - --mtd and --year now only filter the VIEW, not what's cached
   - Cache always stores full data (year=None, since=None)
   - Prevents --mtd from nuking an existing full cache
2. Fix partial refresh API calls
   - Monarch API requires BOTH startDate and endDate
   - Was passing None for one date, causing API failure
   - Added get_hot_refresh_date_range() and get_cold_refresh_date_range()
3. Prevent gaps between cache tiers
   - Hot refresh now uses cold's latest_date (from metadata)
   - Cold refresh now uses hot's earliest_date (from metadata)
   - Both use 7-day overlap (TIER_OVERLAP_DAYS)
   - Fixes gap that would grow daily as boundary moves

Added 12 regression tests covering:
- Display filters don't invalidate cache
- Partial refresh date ranges always non-None
- Tier overlap ensures no gaps

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
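
To make fixes 2 and 3 concrete, the following sketch computes refresh ranges that always have both endpoints set and that overlap the neighboring tier by 7 days. The function names, the parameters, and the `history_start` argument are assumptions for illustration. The PR's own get_hot_refresh_date_range() and get_cold_refresh_date_range() may be shaped differently.

```python
from datetime import date, timedelta

TIER_OVERLAP_DAYS = 7  # overlap named in the commit message


def hot_refresh_date_range(cold_latest, today):
    """Start the hot fetch a little before the point where cold data ends."""
    start = cold_latest - timedelta(days=TIER_OVERLAP_DAYS)
    return start, today  # both endpoints are always concrete dates


def cold_refresh_date_range(history_start, hot_earliest):
    """End the cold fetch a little after the point where hot data begins."""
    end = hot_earliest + timedelta(days=TIER_OVERLAP_DAYS)
    return history_start, end
```

Anchoring each range on the other tier's stored dates, rather than on a boundary recomputed from today's date, is what keeps a gap from opening as the boundary moves forward each day.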

When running with --mtd or --year, the app loads only filtered (recent) transactions. Previously, committing edits would call save_cache() with this filtered data, causing the cold cache to be overwritten with empty data since all transactions were within the hot window.

Now handle_commit_result() detects filtered view mode and uses save_hot_cache() instead, which preserves the cold tier data.

- Add is_filtered_view parameter to handle_commit_result()
- Use save_hot_cache() when operating on filtered data
- Add logging to cache save operations for debugging
- Add 3 regression tests to prevent this bug from recurring

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove RefreshStrategy.BOTH (use ALL when both tiers stale)
- Remove dead _filter_covered() method and year/since parameters
- Simplify get_refresh_strategy() and is_cache_valid() signatures
- Rename _merge_dataframes to public merge_tiers()
- Add logging to cache save operations

Test consolidation:
- Merge test_tiered_cache.py and test_cache_manager.py into test_cache.py
- Remove ~1700 lines of duplicate test code
- Add edge case tests (unicode, large data, corrupt files)
- Add display filtering tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Show explicit date ranges in status messages during refresh
  - Partial refresh: "Refreshing recent transactions (2024-09-23 to 2024-12-22)"
  - Full refresh: "Full refresh: fetching 2025-01-01 to 2024-12-22"
- Clearer notifications showing what was fetched vs cached
- Remove unused boundary_str variable

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Progress messages now show the date range being fetched, making it clear whether it's a full refresh or a partial cache update. Messages like "Fetching all transactions..." or "Downloading 1,069 transactions (2024-09-15 to 2024-12-22)..." provide better visibility into what the app is doing. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Two fixes for cache continuity:

1. Cold cache now includes 30 days of overlap into the hot window. This prevents gaps when cold expires: after 30 days the boundary moves forward, but cold data still reaches the new boundary.
2. --mtd --refresh (or similar hot-only views) now only refreshes the hot tier, not all historical data. The override logic is kept in app.py where view context exists, not in the cache layer.

Added test_cold_cache_has_30_day_overlap to verify the overlap guarantee.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
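
The overlap in the first fix can be pictured with a small split routine. This is a sketch over plain dictionaries rather than the DataFrames the app uses, and `split_tiers` and `COLD_OVERLAP_DAYS` are hypothetical names chosen for the example.

```python
from datetime import date, timedelta

HOT_WINDOW_DAYS = 90    # hot tier size from the PR description
COLD_OVERLAP_DAYS = 30  # hypothetical constant name for the 30-day overlap


def split_tiers(transactions, today):
    """Split rows into hot and cold lists; cold extends 30 days past the boundary."""
    boundary = today - timedelta(days=HOT_WINDOW_DAYS)
    cold_end = boundary + timedelta(days=COLD_OVERLAP_DAYS)
    hot = [t for t in transactions if t["date"] >= boundary]
    # Rows inside the overlap land in both tiers and are deduplicated on merge.
    cold = [t for t in transactions if t["date"] < cold_end]
    return hot, cold
```

Because the cold tier is allowed to go 30 days without a refresh, the boundary can advance by at most 30 days in that time, which is exactly the span the overlap covers.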

…hants When the merchant cache is empty, pl.Series creates a Series with dtype null, which can't be concatenated with str Series. Fixed by explicitly setting dtype=pl.Utf8 when creating the cached_series. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Adds _is_cache_structure_valid() that runs before using cached data. If any check fails, forces a full refresh to prevent serving stale or inconsistent data after code changes to cache logic.

Checks performed:
1. Required metadata fields exist for both tiers
2. Cold cache extends to boundary (within 7-day tolerance)
3. No gap between cold latest_date and hot earliest_date

This is a defensive measure that will auto-heal cache issues from future changes to the cache handling logic.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
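
The three checks could be sketched roughly as follows. The metadata layout (dictionaries with `earliest_date` and `latest_date` keys), the function signature, and the one-day allowance in check 3 are assumptions for this example. Only the checks themselves and the 7-day tolerance come from the commit message.

```python
from datetime import date, timedelta

GAP_TOLERANCE_DAYS = 7  # tolerance stated in the commit message
REQUIRED_FIELDS = ("earliest_date", "latest_date")  # assumed metadata keys


def is_cache_structure_valid(hot_meta, cold_meta, boundary):
    """Return False if any sanity check fails, which would force a full refresh."""
    # 1. Required metadata fields exist for both tiers.
    for meta in (hot_meta, cold_meta):
        if any(meta.get(field) is None for field in REQUIRED_FIELDS):
            return False
    # 2. Cold data reaches the boundary, within the tolerance.
    tolerance = timedelta(days=GAP_TOLERANCE_DAYS)
    if cold_meta["latest_date"] < boundary - tolerance:
        return False
    # 3. No gap between the end of cold data and the start of hot data
    #    (adjacent days count as contiguous in this sketch).
    if hot_meta["earliest_date"] > cold_meta["latest_date"] + timedelta(days=1):
        return False
    return True
```

A caller would treat a `False` result the same way as a missing cache and fetch everything again.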

The merchant search was using str.contains() which interprets the input as regex by default. Characters like * ? ( ) would cause regex parse errors. Fixed by adding literal=True to treat the search pattern as a plain string. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
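
The same failure mode can be reproduced with Python's standard `re` module, which is used here only as a stand-in for the regex matching described above. The merchant strings are invented for the example, and escaping the input is a standard-library analogue of the `literal=True` fix, not the change made in the commit.

```python
import re

query = "Trader Joe's (Downtown"  # user input with an unbalanced parenthesis

# Treating raw user input as a pattern fails to compile.
try:
    re.compile(query)
    pattern_error = None
except re.error as exc:
    pattern_error = exc

# Escaping the input makes every character match literally.
literal = re.compile(re.escape(query), re.IGNORECASE)
matched = bool(literal.search("TRADER JOE'S (DOWNTOWN) #123"))
```

After this runs, `pattern_error` holds the compile error from the raw pattern, while the escaped pattern finds the merchant.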

Extracted two pure functions from EditMerchantScreen:
- filter_merchants(): Filters merchant Series by query with proper regex escaping (literal=True)
- parse_merchant_option_id(): Parses __new__: prefix to distinguish new vs existing merchants

Added comprehensive unit tests (13 tests) including:
- Case-insensitive matching
- Partial string matching
- Deduplication and sorting
- Limit parameter
- Regex special character handling (* ? ( ) + [ ])
- Option ID parsing for new vs existing merchants

The regex test would have caught the bug fixed in the previous commit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
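
A list-based sketch of what two such pure functions might look like is shown below. The real functions operate on a Series, so the signatures, the default `limit`, and the `(name, is_new)` return shape are all assumptions for illustration.

```python
NEW_PREFIX = "__new__:"  # prefix named in the commit message


def filter_merchants(merchants, query, limit=10):
    """Case-insensitive literal substring match, deduplicated and sorted."""
    needle = query.lower()
    # Plain substring matching, so characters like ( or * carry no special meaning.
    matches = sorted({m for m in merchants if needle in m.lower()})
    return matches[:limit]


def parse_merchant_option_id(option_id):
    """Return (merchant_name, is_new) for a selected option ID."""
    if option_id.startswith(NEW_PREFIX):
        return option_id[len(NEW_PREFIX):], True
    return option_id, False
```

Keeping both functions free of UI state is what allows them to be tested without constructing a screen.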

When in detail view with sub-grouping enabled, the current_data contains aggregate fields (merchant, count, total) instead of transaction fields (id). This caused a KeyError when trying to access row_data["id"]. Added sub_grouping_mode check to both action_delete_transaction and action_show_transaction_details guards, matching the pattern already used in the hide/unhide toggle code. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
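
The guard described here might reduce to a check like the one below. `can_act_on_transaction` and its parameters are hypothetical. The sketch only assumes that aggregate rows lack an `id` key, as the commit message states.

```python
def can_act_on_transaction(in_detail_view, sub_grouping_mode, row_data):
    """Allow per-transaction actions only when the row is a real transaction."""
    if not in_detail_view:
        return False
    if sub_grouping_mode is not None:
        # Rows are aggregates (merchant, count, total) and carry no transaction id.
        return False
    return "id" in row_data
```

Checking the mode before touching `row_data["id"]` avoids the KeyError instead of catching it after the fact.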

wesm · 2025-12-23T17:43:58Z

This PR has been quite a lot of bug whack-a-mole. I'm going to use this branch locally for a while, until I stop seeing serious bugs, before merging.

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

moneyflow/screens/edit_screens.py

moneyflow/data_manager.py

moneyflow/cache_manager.py

- Move TIER_OVERLAP_DAYS to class constants section
- Add GAP_TOLERANCE_DAYS constant (was hardcoded as 7)
- Add comments to empty except clauses explaining fallback behavior

Addresses code review feedback from Copilot.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

wesm requested a review from Copilot December 18, 2025 20:37

Copilot started reviewing on behalf of wesm December 18, 2025 20:37

Copilot AI reviewed Dec 18, 2025

wesm and others added 15 commits December 18, 2025 16:23

Codex refactoring to reduce code duplication

a8b0e86

refactor: extract cache orchestration and harden tests

b7c017d

Fix formatting / lint issues

1ab5338

wesm changed the title ~~feat: Implement two-tier cache system for optimized data fetching~~ feat: Two-tier cache system with bug fixes and refactoring Dec 23, 2025

wesm requested a review from Copilot December 23, 2025 17:42

Copilot started reviewing on behalf of wesm December 23, 2025 17:42

Copilot AI reviewed Dec 23, 2025

wesm merged commit 3080944 into main Dec 25, 2025
8 checks passed

wesm deleted the cache-improvements branch December 25, 2025 17:31

wesm merged 18 commits into main from cache-improvements
