Skip to content

[Bug]: FileStorage crashes on corrupted run JSON instead of failing gracefully #1488

@AadiSharma49

Description

@AadiSharma49

Describe the Bug

When a run JSON file is corrupted or invalid, FileStorage.load_run() crashes with a Pydantic ValidationError.
Instead of handling the error gracefully, the exception propagates and breaks execution.
This can easily happen if a file is partially written, manually edited, or corrupted on disk.
For a storage layer, this behavior is risky and can crash long-running systems.

To Reproduce

Steps to reproduce the behavior:

  1. Go to
core/framework/storage/backend.py
  1. Go to the load_run() method:
def load_run(self, run_id: str) -> Run  { ... }
  1. Create a corrupted JSON file and try to load it.
def test_corrupted_run_file_is_handled_gracefully():
    with tempfile.TemporaryDirectory() as tmp:
        storage = FileStorage(tmp)

        run_path = Path(tmp) / "runs" / "bad.json"
        run_path.parent.mkdir(parents=True, exist_ok=True)
        run_path.write_text("{ invalid json", encoding="utf-8")

        result = storage.load_run("bad")
        assert result is None
  1. See error

Actual error raised during test execution:

pydantic_core._pydantic_core.ValidationError: 1 validation error for Run
Invalid JSON: key must be a string at line 1 column 3
[type=json_invalid, input_value='{ invalid json', input_type=str]

Stack trace (shortened):

File "framework/storage/backend.py", line 76, in load_run
    return Run.model_validate_json(f.read())
pydantic_core._pydantic_core.ValidationError

Expected Behavior

  • load_run() should not crash
  • It should return None if JSON is corrupted
  • Optionally log a warning for debugging
  • This keeps the storage layer safe and fault-tolerant.

Actual Behavior

  • The method raises a ValidationError
  • The exception propagates
  • Caller crashes instead of handling the failure

Screenshots

If applicable, add screenshots to help explain your problem.

Environment

  • OS: Windows 11
  • Python version: 3.12.x
  • Project version: Latest main branch
  • File: core/framework/storage/backend.py

Additional Context

  • Corrupted files can happen due to partial writes or disk issues
  • Storage backends should always fail gracefully
  • A simple try/except around JSON parsing would solve this

Request

If this issue is accepted, please assign it to me @bryanadenhq , @TimothyZhang7 , @vincentjiang777 review it

I would like to work on the fix and add proper test coverage.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions