Fix: Automatic detection and refresh of corrupted cache data #136

Mohd-Mursaleen · 2025-10-09T18:15:41Z

Description

Fixes #135

This PR fixes a critical issue where corrupted cache files with null resume data would prevent the system from working correctly, even after configuration issues were resolved.

Problem

- In development mode, the system caches resume and GitHub data to improve performance.
- If the initial extraction failed due to configuration issues, corrupted cache files with all null values would persist.
- Users would see null resume data even after fixing their environment/configuration.
- The only workaround was manually deleting the cache files.

Solution

Added automatic cache validation that:

- Detects corrupted resume cache - Checks if main sections (basics, work, education, skills, projects) contain meaningful data
- Detects corrupted GitHub cache - Validates presence of profile or project data
- Automatically removes invalid cache - Deletes corrupted cache files and reprocesses fresh data
- Provides clear feedback - Shows users when cache is being refreshed due to corruption

Changes Made

Added is_valid_resume_cache() function:
- Validates that resume data contains meaningful information
- Checks for non-null data in critical sections
- Returns False if all main sections are empty/null
Added is_valid_github_cache() function:
- Validates GitHub data has profile or project information
- Returns False for empty or malformed data
Enhanced cache loading logic:
- Validates cache data before using it
- Automatically removes corrupted cache files
- Falls back to fresh processing when cache is invalid
- Provides informative messages about cache status
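
The bodies of the two validation functions are not shown in this description. A minimal sketch of what the checks described above might look like, assuming the cached data is available as plain dictionaries (the PR itself validates a JSONResume object, and the `profile` and `projects` key names for the GitHub cache are assumptions, as is the `_has_content` helper):

```python
from collections.abc import Mapping
from typing import Any, Optional

# Section names taken from the PR description.
MAIN_RESUME_SECTIONS = ("basics", "work", "education", "skills", "projects")


def _has_content(value: Any) -> bool:
    """Hypothetical helper: True if value is not None, blank, or an empty/all-null container."""
    if value is None:
        return False
    if isinstance(value, Mapping):
        return any(_has_content(v) for v in value.values())
    if isinstance(value, (list, tuple, set)):
        return any(_has_content(v) for v in value)
    if isinstance(value, str):
        return bool(value.strip())
    return True


def is_valid_resume_cache(resume: Optional[Mapping[str, Any]]) -> bool:
    """Return False when every main resume section is empty or null."""
    if not isinstance(resume, Mapping):
        return False
    return any(_has_content(resume.get(name)) for name in MAIN_RESUME_SECTIONS)


def is_valid_github_cache(github: Optional[Mapping[str, Any]]) -> bool:
    """Return False for empty or malformed GitHub data (key names are assumed)."""
    if not isinstance(github, Mapping):
        return False
    return _has_content(github.get("profile")) or _has_content(github.get("projects"))
```

Checking recursively means a cache such as `{"basics": {"name": null}}` is still treated as corrupted, which matches the "all null values" failure mode described in the Problem section.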

Code Example

Before (Problematic):

if DEVELOPMENT_MODE and os.path.exists(cache_filename):
    cached_data = json.loads(Path(cache_filename).read_text())
    resume_data = JSONResume(**cached_data)  # Always loads, even if null

After (Fixed):

if DEVELOPMENT_MODE and os.path.exists(cache_filename):
    cached_data = json.loads(Path(cache_filename).read_text())
    temp_resume_data = JSONResume(**cached_data)
    
    if is_valid_resume_cache(temp_resume_data):
        resume_data = temp_resume_data
        print("✅ Valid cache data loaded")
    else:
        print("⚠️  Cache contains corrupted data, will refresh automatically")
        os.remove(cache_filename)  # Remove and reprocess

Testing

✅ Test Case 1: Normal operation with valid cache

Valid cache files are loaded and used correctly
No unnecessary reprocessing occurs

✅ Test Case 2: Corrupted resume cache detection

Created cache file with all null values
System detected corruption and automatically refreshed
Fresh data was extracted and cached

✅ Test Case 3: Corrupted GitHub cache detection

Created empty GitHub cache file
System detected corruption and refetched data
Valid GitHub data was retrieved and cached

✅ Test Case 4: JSON parsing errors

Malformed JSON cache files are handled gracefully
Invalid files are removed and fresh data is processed
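
The snippet in the Code Example section does not show how a JSON parsing failure is caught. One way the behaviour tested above could be sketched, assuming a hypothetical helper named `load_cache_or_none` that takes the validation function as a parameter (neither the helper nor its messages are taken from score.py):

```python
import json
from pathlib import Path
from typing import Any, Callable, Optional


def load_cache_or_none(
    cache_path: Path,
    is_valid: Callable[[Any], bool],
) -> Optional[Any]:
    """Return cached data if it parses and validates; otherwise delete the file and return None."""
    if not cache_path.exists():
        return None
    try:
        data = json.loads(cache_path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        # Malformed file: remove it so the caller falls back to fresh processing.
        print("Cache file is not valid JSON, will refresh automatically")
        cache_path.unlink(missing_ok=True)
        return None
    if not is_valid(data):
        # Parsed fine but holds no meaningful data (for example, all nulls).
        print("Cache contains corrupted data, will refresh automatically")
        cache_path.unlink(missing_ok=True)
        return None
    return data
```

A caller would treat a `None` result as a cache miss and run the normal extraction path, so malformed JSON and all-null data are handled the same way.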

User Experience Improvements

Before:

Loading cached data from cache/resumecache_resume.json
❌ Resume shows all null data - user confused

After:

Loading cached data from cache/resumecache_resume.json
⚠️  Cache contains corrupted data, will refresh automatically
✅ Fresh resume data extracted and cached

Impact

- No breaking changes - Existing functionality remains unchanged
- Automatic recovery - Users no longer need manual intervention
- Better debugging - Clear messages indicate when cache is being refreshed
- Improved reliability - System self-heals from cache corruption
- Performance maintained - Valid cache is still used to avoid reprocessing

Code Formatting

Applied Black formatting to ensure consistent code style per contributing guidelines

Smoke Tests Performed

✅ Test with corrupted cache: Created cache with all null values, system detected and refreshed automatically
✅ Test with valid cache: Existing valid cache files are loaded correctly without reprocessing
✅ Test with malformed JSON: Invalid cache files are handled gracefully and removed
✅ Test end-to-end flow: Full resume processing works correctly after cache refresh

Files Modified

score.py - Added cache validation functions and enhanced cache loading logic

This fix ensures users always get valid resume data regardless of cache state, while maintaining the performance benefits of caching for valid data.

Mohd-Mursaleen · 2025-10-10T04:51:34Z

@sp2hari @anxkhn-hacker
Review and merge please

sp2hari · 2025-10-10T05:24:15Z

@Mohd-Mursaleen - Can you explain the scenarios by which the cache is getting corrupted in the first place?

Mohd-Mursaleen · 2025-10-10T07:07:30Z

> @Mohd-Mursaleen - Can you explain the scenarios by which the cache is getting corrupted in the first place?

@sp2hari
A corrupted cache usually happens when there’s an API key error. In my case, I entered my Gemini key but forgot to change the model name, so the code ran and created a null cache. Even after I fixed the model name, things didn’t work; it took me a few minutes to figure out that the existing null cache was the cause.

Fix: Automatic detection and refresh of corrupted cache data

a435cd2
