
@Mohd-Mursaleen

Description

Fixes #135

This PR fixes a critical issue where corrupted cache files containing null resume data kept being loaded, so the system returned null data even after the underlying configuration issue was resolved.

Problem

  • In development mode, the system caches resume and GitHub data to improve performance
  • If the initial extraction failed because of a configuration issue, the null results were still written to the cache, and those corrupted files (all values null) would persist
  • Users would see null resume data even after fixing their environment/configuration
  • The only workaround was manually deleting cache files
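Here is a minimal sketch of this failure mode. It rests on a few assumptions: `extract_resume()` and `load_resume()` are hypothetical stand-ins, not functions from `score.py`, and a `config_ok` flag simulates a misconfigured run (for example, a wrong model name). Because the naive loader trusts any existing cache file, the first run's null result is written to disk and served again after the configuration is fixed.

```python
import json
import tempfile
from pathlib import Path

SECTIONS = ("basics", "work", "education", "skills", "projects")


def extract_resume(config_ok: bool) -> dict:
    """Hypothetical extractor: returns all-null sections when misconfigured."""
    if not config_ok:
        return {name: None for name in SECTIONS}  # e.g. wrong model name
    return {"basics": {"name": "Ada"}, "work": [], "education": [],
            "skills": [], "projects": []}


def load_resume(cache_file: Path, config_ok: bool) -> dict:
    """Naive cache: trusts any existing file and never checks its contents."""
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    data = extract_resume(config_ok)
    cache_file.write_text(json.dumps(data))  # the failed result is cached too
    return data


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "resume_cache.json"
    first = load_resume(cache, config_ok=False)   # misconfigured first run
    second = load_resume(cache, config_ok=True)   # config fixed, stale cache still wins

print(second["basics"])  # None
```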

Solution

Added automatic cache validation that:

  1. Detects corrupted resume cache - Checks if main sections (basics, work, education, skills, projects) contain meaningful data
  2. Detects corrupted GitHub cache - Validates presence of profile or project data
  3. Automatically removes invalid cache - Deletes corrupted cache files and reprocesses fresh data
  4. Provides clear feedback - Shows users when cache is being refreshed due to corruption
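The resume check in item 1 can be sketched as a small predicate. This is an illustration of the rule, not the PR's code. It assumes the cached resume is a plain dict keyed by the JSON Resume section names listed above, while the real `is_valid_resume_cache()` in `score.py` receives a `JSONResume` object. The helper `has_content()` is hypothetical. In this sketch an empty string, list, or dict, or a nested object whose fields are all null, counts as "not meaningful", the same as `None`.

```python
from collections.abc import Mapping
from typing import Any

# Main JSON Resume sections that must carry data for the cache to be trusted.
MAIN_SECTIONS = ("basics", "work", "education", "skills", "projects")


def has_content(value: Any) -> bool:
    """True if value is non-null and, for strings/containers, non-empty.

    Containers count as content only if some nested value is meaningful,
    so {"name": None, "email": None} is treated as empty.
    """
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, Mapping):
        return any(has_content(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(has_content(v) for v in value)
    return True  # numbers, booleans, and other scalars


def is_valid_resume_cache(data: Mapping[str, Any]) -> bool:
    """Return False when every main section is missing, null, or empty."""
    return any(has_content(data.get(section)) for section in MAIN_SECTIONS)


print(is_valid_resume_cache({s: None for s in MAIN_SECTIONS}))  # False
print(is_valid_resume_cache({"basics": {"name": "Ada"}}))       # True
```

A single meaningful section is enough to keep the cache. This mirrors the "returns False only if all main sections are empty/null" rule rather than requiring every section to be filled.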

Changes Made

  1. Added is_valid_resume_cache() function:

    • Validates that resume data contains meaningful information
    • Checks for non-null data in critical sections
    • Returns False if all main sections are empty/null
  2. Added is_valid_github_cache() function:

    • Validates GitHub data has profile or project information
    • Returns False for empty or malformed data
  3. Enhanced cache loading logic:

    • Validates cache data before using it
    • Automatically removes corrupted cache files
    • Falls back to fresh processing when cache is invalid
    • Provides informative messages about cache status
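Items 2 and 3 can be combined into a generic load-or-refresh helper. This is a hypothetical sketch, not the code in `score.py`. GitHub data is assumed to be a dict with `profile` and `projects` keys (assumed names), and `load_valid_cache()` is an illustrative helper. It returns `None` whenever the caller should fall back to fresh processing: when the file is missing, when it is not valid JSON (as in Test Case 4), or when the validator rejects it. In the last two cases it deletes the file, so the next run rebuilds it.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional


def is_valid_github_cache(data: Any) -> bool:
    """Accept only a dict with a non-empty profile or a non-empty project list.

    The "profile" and "projects" key names are assumptions for this sketch.
    """
    if not isinstance(data, dict):
        return False  # malformed: a list, null, number, or bare string
    return bool(data.get("profile")) or bool(data.get("projects"))


def load_valid_cache(path: Path, validator: Callable[[Any], bool]) -> Optional[Any]:
    """Return cached data if it parses and passes `validator`.

    Otherwise delete the file and return None, signalling the caller to
    reprocess fresh data (which then rewrites the cache).
    """
    if not path.exists():
        return None
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        print(f"⚠️  {path.name} is not valid JSON, will refresh automatically")
        path.unlink(missing_ok=True)
        return None
    if not validator(data):
        print(f"⚠️  {path.name} contains corrupted data, will refresh automatically")
        path.unlink(missing_ok=True)
        return None
    return data


# Usage: an empty GitHub cache, as in Test Case 3, is rejected and removed.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "github_cache.json"
    cache.write_text("{}", encoding="utf-8")
    result = load_valid_cache(cache, is_valid_github_cache)
    still_cached = cache.exists()

print(result, still_cached)  # None False
```

Passing the validator as a parameter lets the same load/delete/fallback path serve both the resume cache and the GitHub cache.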

Code Example

Before (Problematic):

if DEVELOPMENT_MODE and os.path.exists(cache_filename):
    cached_data = json.loads(Path(cache_filename).read_text())
    resume_data = JSONResume(**cached_data)  # Always loads, even if null

After (Fixed):

if DEVELOPMENT_MODE and os.path.exists(cache_filename):
    cached_data = json.loads(Path(cache_filename).read_text())
    temp_resume_data = JSONResume(**cached_data)
    
    if is_valid_resume_cache(temp_resume_data):
        resume_data = temp_resume_data
        print("✅ Valid cache data loaded")
    else:
        print("⚠️  Cache contains corrupted data, will refresh automatically")
        os.remove(cache_filename)  # Remove and reprocess

Testing

Test Case 1: Normal operation with valid cache

  • Valid cache files are loaded and used correctly
  • No unnecessary reprocessing occurs

Test Case 2: Corrupted resume cache detection

  • Created cache file with all null values
  • System detected corruption and automatically refreshed
  • Fresh data was extracted and cached

Test Case 3: Corrupted GitHub cache detection

  • Created empty GitHub cache file
  • System detected corruption and refetched data
  • Valid GitHub data was retrieved and cached

Test Case 4: JSON parsing errors

  • Malformed JSON cache files are handled gracefully
  • Invalid files are removed and fresh data is processed
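Test Cases 1, 2, and 4 can be automated with the standard-library `unittest` module. This is a hypothetical sketch, not the PR's test suite. `get_resume()` is an illustrative load-or-extract function that overwrites an invalid cache file with fresh data instead of deleting it first. It also uses a simpler, shallow check that at least one main section is truthy. A fake extractor counts its calls, which makes "no unnecessary reprocessing" directly checkable.

```python
import json
import tempfile
import unittest
from pathlib import Path

SECTIONS = ("basics", "work", "education", "skills", "projects")


def get_resume(cache: Path, extract) -> dict:
    """Hypothetical flow: use the cache only if it parses and has at least one
    non-empty main section; otherwise re-extract and rewrite the cache."""
    try:
        data = json.loads(cache.read_text())
        if isinstance(data, dict) and any(data.get(s) for s in SECTIONS):
            return data
    except (FileNotFoundError, json.JSONDecodeError):
        pass  # missing or malformed cache: fall through to fresh extraction
    fresh = extract()
    cache.write_text(json.dumps(fresh))
    return fresh


class CacheRefreshTests(unittest.TestCase):
    def setUp(self):
        self.tmp = tempfile.TemporaryDirectory()
        self.cache = Path(self.tmp.name) / "resume_cache.json"
        self.calls = 0

    def tearDown(self):
        self.tmp.cleanup()

    def extract(self) -> dict:
        """Fake extractor that records how often fresh processing runs."""
        self.calls += 1
        return {"basics": {"name": "Ada"}}

    def test_valid_cache_is_reused(self):  # Test Case 1
        self.cache.write_text(json.dumps({"work": [{"name": "Acme"}]}))
        data = get_resume(self.cache, self.extract)
        self.assertEqual(self.calls, 0)
        self.assertEqual(data["work"][0]["name"], "Acme")

    def test_null_cache_is_refreshed(self):  # Test Case 2
        self.cache.write_text(json.dumps({s: None for s in SECTIONS}))
        data = get_resume(self.cache, self.extract)
        self.assertEqual(self.calls, 1)
        self.assertEqual(json.loads(self.cache.read_text()), data)

    def test_malformed_json_is_refreshed(self):  # Test Case 4
        self.cache.write_text("{not valid json")
        data = get_resume(self.cache, self.extract)
        self.assertEqual(self.calls, 1)
        self.assertEqual(data["basics"]["name"], "Ada")
```

If this is saved as a module (for example `test_cache_refresh.py`, a hypothetical name), `python -m unittest test_cache_refresh` runs the three cases.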

User Experience Improvements

Before:

Loading cached data from cache/resumecache_resume.json
❌ Resume shows all null data - user confused

After:

Loading cached data from cache/resumecache_resume.json
⚠️  Cache contains corrupted data, will refresh automatically
✅ Fresh resume data extracted and cached

Impact

  • No breaking changes - Existing functionality remains unchanged
  • Automatic recovery - Users no longer need manual intervention
  • Better debugging - Clear messages indicate when cache is being refreshed
  • Improved reliability - System self-heals from cache corruption
  • Performance maintained - Valid cache is still used to avoid reprocessing

Code Formatting

  • Applied Black formatting to ensure consistent code style per contributing guidelines

Smoke Tests Performed

  • Corrupted cache: created a cache with all null values; the system detected it and refreshed automatically
  • Valid cache: existing valid cache files are loaded correctly without reprocessing
  • Malformed JSON: invalid cache files are handled gracefully and removed
  • End-to-end flow: full resume processing works correctly after a cache refresh

Files Modified

  • score.py - Added cache validation functions and enhanced cache loading logic

This fix ensures users always get valid resume data regardless of cache state, while maintaining the performance benefits of caching for valid data.


@Mohd-Mursaleen
Author

Mohd-Mursaleen commented Oct 10, 2025

@sp2hari @anxkhn-hacker
Please review and merge.

@sp2hari
Member

sp2hari commented Oct 10, 2025

@Mohd-Mursaleen - Can you explain the scenarios by which the cache is getting corrupted in the first place?

@Mohd-Mursaleen
Author

Mohd-Mursaleen commented Oct 10, 2025


@sp2hari
The cache usually gets corrupted when there's an API key or model configuration error. In my case, I entered my Gemini key but forgot to change the model name, so the code still ran and wrote a cache full of null values. Even after I fixed the model name, it still didn't work, and it took me a few minutes to figure out that the existing null cache was the cause.



Development

Successfully merging this pull request may close these issues.

Resume data showing as null due to stale cache files
