feat(checkpoint): filesystem checkpointing and rollback before destructive operations #559

Closed
alireza78a wants to merge 3 commits into NousResearch:main from alireza78a:feat/filesystem-checkpointing-v2


@alireza78a
Contributor

Closes #452

What changed

Added filesystem checkpointing using a shadow git repository before any destructive operation (delete, overwrite, move).

How it works

  • Shadow git repo at ~/.hermes/checkpoints/{dir-hash}/
  • Automatic snapshot before rm, mv, and file overwrite
  • restore and list commands via checkpoint tool
  • Uses GIT_DIR env var — never conflicts with project git
  • Nested git repo detection — checkpointing disabled if .git exists in working dir
  • Concurrency guard — threading.Lock() per directory prevents duplicate snapshots
  • Configurable timeout via HERMES_CHECKPOINT_TIMEOUT (10–60s, default 30s)
  • Respects project .gitignore via shadow repo core.excludesFile
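The path and environment handling described in the list above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the helper names (`shadow_repo_dir`, `shadow_git_env`, `checkpoint_timeout`) are hypothetical, the hash scheme (SHA-256 truncated to 16 hex characters) is an assumption since the description only says `{dir-hash}`, and clamping out-of-range timeouts (rather than rejecting them) is likewise assumed.

```python
import hashlib
import os
from pathlib import Path
from typing import Mapping, Optional

# Root taken from the PR description: ~/.hermes/checkpoints/{dir-hash}/
CHECKPOINT_ROOT = Path.home() / ".hermes" / "checkpoints"


def shadow_repo_dir(workdir: str) -> Path:
    """Map a working directory to a deterministic shadow repo location."""
    resolved = str(Path(workdir).resolve())
    # Assumption: SHA-256 truncated to 16 hex chars; the real scheme may differ.
    digest = hashlib.sha256(resolved.encode("utf-8")).hexdigest()[:16]
    return CHECKPOINT_ROOT / digest


def shadow_git_env(workdir: str) -> dict:
    """Environment that points git at the shadow repo, not the project's .git."""
    env = os.environ.copy()
    env["GIT_DIR"] = str(shadow_repo_dir(workdir))
    env["GIT_WORK_TREE"] = str(Path(workdir).resolve())
    return env


def checkpoint_timeout(environ: Optional[Mapping[str, str]] = None) -> int:
    """Read HERMES_CHECKPOINT_TIMEOUT, defaulting to 30s, kept within 10-60s."""
    environ = os.environ if environ is None else environ
    raw = environ.get("HERMES_CHECKPOINT_TIMEOUT", "")
    try:
        value = int(raw)
    except ValueError:
        return 30  # unset or unparsable: fall back to the documented default
    # Assumption: out-of-range values are clamped rather than rejected.
    return max(10, min(60, value))
```

Passing the returned environment to `subprocess.run([...], env=shadow_git_env(workdir))` would make ordinary git commands operate on the shadow repository while leaving any project-level git state untouched.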

Files changed

  • tools/checkpoint_tool.py — new tool
  • tools/patch_parser.py — checkpoint before delete/move
  • tools/file_operations.py — checkpoint before overwrite
  • model_tools.py — register new tool
  • tests/tools/test_checkpoint_tool.py — 50 tests, all passing

Tested on

macOS

@alireza78a alireza78a force-pushed the feat/filesystem-checkpointing-v2 branch from f17715d to 6fb5756 Compare March 7, 2026 00:27
@teknium1
Contributor

Thanks for this PR @alireza78a! We loved the concept and the quality of your CheckpointStore implementation. We've taken your core approach (shadow git repos via GIT_DIR + GIT_WORK_TREE, deterministic path hashing, exclude patterns) and built it into a proper integration.

What we changed from your approach:

  1. Not a tool — checkpoints are transparent infrastructure. The LLM never sees it, no prompt tokens consumed, no tool schema overhead.

  2. Once per turn, not per write — instead of hooking into every write_file call, we checkpoint lazily on the first file-mutating operation per conversation turn. This avoids the performance overhead of running git add -A && git commit on every single write.

  3. Opt-in — enabled via --checkpoints CLI flag or checkpoints: { enabled: true } in config.yaml. No mandatory overhead for users who don't want it.

  4. /rollback command — user-facing restore via slash command in both CLI and gateway (Telegram/Discord/etc.). Lists checkpoints with timestamps, restore by number.

  5. No injection into file_operations.py or patch_parser.py — trigger lives in run_agent.py's tool dispatch loop, keeping core file tools clean and decoupled.
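To make points 2, 3, and 5 concrete, here is a minimal sketch of a turn-scoped trigger that a tool dispatch loop could call before each tool runs. It is not the merged implementation: the class name `TurnCheckpointer`, its methods, and the choice to mark a directory as handled even when the snapshot fails are all assumptions made for illustration. The tool names `write_file` and `patch` come from the thread.

```python
import logging
from typing import Callable, Set

logger = logging.getLogger("checkpoints")

# File-mutating tools named in the thread; a real list may be longer.
MUTATING_TOOLS = {"write_file", "patch"}


class TurnCheckpointer:
    """Take at most one snapshot per directory per conversation turn."""

    def __init__(self, take_snapshot: Callable[[str], None], enabled: bool = False):
        self._take_snapshot = take_snapshot
        self._enabled = enabled  # opt-in: off unless explicitly enabled
        self._seen: Set[str] = set()

    def new_turn(self) -> None:
        """Reset the dedup state at the start of each conversation turn."""
        self._seen.clear()

    def before_tool(self, tool_name: str, workdir: str) -> bool:
        """Call before dispatching a tool; returns True if a snapshot was taken."""
        if not self._enabled or tool_name not in MUTATING_TOOLS:
            return False
        if workdir in self._seen:
            return False  # already checkpointed this turn
        # Assumption: mark as seen first so a failing snapshot is not retried.
        self._seen.add(workdir)
        try:
            self._take_snapshot(workdir)
        except Exception:
            # A checkpoint failure must not block the file operation itself.
            logger.exception("checkpoint failed for %s", workdir)
            return False
        return True
```

Because the file tools themselves are never touched, the dispatch loop only needs one `before_tool(...)` call per tool invocation and one `new_turn()` call per user message.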

Your shadow git approach was the right design — we just adjusted the integration points. Thanks for the inspiration and the solid foundation! 🎉

@teknium1 teknium1 closed this Mar 10, 2026
teknium1 added a commit that referenced this pull request Mar 10, 2026
Automatic filesystem snapshots before destructive file operations,
with user-facing rollback.  Inspired by PR #559 (by @alireza78a).

Architecture:
- Shadow git repos at ~/.hermes/checkpoints/{hash}/ via GIT_DIR
- CheckpointManager: take/list/restore, turn-scoped dedup, pruning
- Transparent — the LLM never sees it, no tool schema, no tokens
- Once per turn — only first write_file/patch triggers a snapshot

Integration:
- Config: checkpoints.enabled + checkpoints.max_snapshots
- CLI flag: hermes --checkpoints
- Trigger: run_agent.py _execute_tool_calls() before write_file/patch
- /rollback slash command in CLI + gateway (list, restore by number)
- Pre-rollback snapshot auto-created on restore (undo the undo)

Safety:
- Never blocks file operations — all errors silently logged
- Skips root dir, home dir, dirs >50K files
- Disables gracefully when git not installed
- Shadow repo completely isolated from project git

Tests: 35 new tests, all passing (2798 total suite)
Docs: feature page, config reference, CLI commands reference
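
The guards in the Safety section of the commit message could look something like the sketch below. The function names and the capped directory walk are assumptions for illustration; only the rules themselves (skip the root directory, the home directory, directories with more than 50K files, and machines without git) come from the commit message.

```python
import os
import shutil
from pathlib import Path
from typing import Optional

MAX_FILES = 50_000  # threshold from the commit message ("dirs >50K files")


def count_files_capped(root: Path, cap: int) -> int:
    """Count files under root, stopping early once the count exceeds cap."""
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        count += len(filenames)
        if count > cap:
            break  # no need to walk the rest of a huge tree
    return count


def should_checkpoint(workdir: str, git_available: Optional[bool] = None) -> bool:
    """Return True only when snapshotting workdir is considered safe."""
    if git_available is None:
        git_available = shutil.which("git") is not None
    if not git_available:
        return False  # disable gracefully when git is not installed
    path = Path(workdir).resolve()
    if path == Path(path.anchor):
        return False  # filesystem root
    if path == Path.home().resolve():
        return False  # user's home directory
    return count_files_capped(path, MAX_FILES) <= MAX_FILES
```

A caller would check `should_checkpoint(workdir)` once before attempting a snapshot and simply proceed with the file operation when it returns `False`.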

Development

Successfully merging this pull request may close these issues.

[Feature]: feat: filesystem checkpointing and rollback before destructive operations

2 participants