@tanishkaa08

Implements #1400

This PR implements a snapshot and checkpoint system for the Sail RISC-V emulator that lets users save the architectural state of a simulation and restore it later. This allows users to skip lengthy boot processes by resuming a simulation from a saved state.

Features Implemented

Snapshot System

  • Full state capture: Saves PC, all registers (x0-x31, f0-f31), key machine-mode CSRs, complete memory state, and platform state
  • Snapshot creation: --snapshot-save <file> saves state at end of simulation
  • Snapshot restoration: --snapshot-restore <file> restores simulation from saved state
  • File format: JSON metadata + sparse binary memory dump for efficient storage

Checkpoint System

Automatic snapshot creation based on configurable conditions:

  • Instruction intervals: --checkpoint-interval <N> creates checkpoint every N instructions
  • PC-based: --checkpoint-pc <hex> creates checkpoint when PC reaches specified value
  • Memory access: --checkpoint-memory-read <addr> and --checkpoint-memory-write <addr> trigger on memory access
  • Register writes: --checkpoint-register-write <reg> triggers when integer register is written
  • Floating-point register writes: --checkpoint-freg-write <reg> triggers when FP register is written
  • CSR access: --checkpoint-csr-read <csr> and --checkpoint-csr-write <csr> trigger on CSR access
  • Statistics: Reports checkpoint creation statistics at end of simulation
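
As a rough illustration of how the interval and PC conditions above could be evaluated, here is a minimal sketch. It is not the PR's CheckpointManager: the names CheckpointConditions, CheckpointTracker and on_instruction are hypothetical, and the sketch assumes a hook that is called once per retired instruction.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical configuration; field names are illustrative only.
struct CheckpointConditions {
    uint64_t interval = 0;       // mirrors --checkpoint-interval <N>; 0 disables it
    std::vector<uint64_t> pcs;   // mirrors --checkpoint-pc <hex>
};

// Hypothetical tracker, assumed to be driven once per retired instruction.
class CheckpointTracker {
public:
    explicit CheckpointTracker(CheckpointConditions c) : cond_(std::move(c)) {}

    // Returns the reasons (possibly none) for writing a checkpoint now.
    std::vector<std::string> on_instruction(uint64_t pc) {
        std::vector<std::string> reasons;
        ++retired_;
        if (cond_.interval != 0 && retired_ % cond_.interval == 0) {
            reasons.push_back("interval");
        }
        for (uint64_t target : cond_.pcs) {
            if (pc == target) {
                reasons.push_back("pc");
                break;
            }
        }
        // Several conditions firing together still yield one checkpoint.
        if (!reasons.empty()) {
            ++created_;
        }
        return reasons;
    }

    // Basis for the end-of-simulation statistics.
    uint64_t checkpoints_created() const { return created_; }

private:
    CheckpointConditions cond_;
    uint64_t retired_ = 0;
    uint64_t created_ = 0;
};
```

Counting one checkpoint per instruction, even when several conditions match, is a design choice of this sketch; the PR may count differently.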

Technical Implementation

Architecture

  1. SnapshotManager (snapshot_manager.h/cpp)
    • Orchestrates snapshot creation and restoration
    • Coordinates with Sail interface and memory functions
    • Handles file I/O and JSON serialization
  2. CheckpointManager (checkpoint_manager.h/cpp)
    • Implements callbacks_if interface to monitor simulation events
    • Tracks checkpoint conditions and triggers snapshot creation
    • Provides statistics on checkpoint creation
  3. SnapshotSailInterface (snapshot_sail_interface.h/cpp)
    • Provides C++ wrapper for accessing Sail model state
    • Directly accesses model members (PC, registers, CSRs) for safety
    • Handles type conversions between Sail types and C++ types
  4. Memory Snapshot Functions (snapshot_memory.h/c)
    • C functions interfacing with Sail's runtime memory system (rts.c)
    • Enumerates and dumps memory blocks
    • Handles memory restoration
  5. Snapshot Format (snapshot_format.h/cpp)
    • JSON serialization/deserialization using jsoncons library
    • Defines data structures for snapshot state
    • Handles versioning and metadata
  6. Sail State Access (model/postlude/snapshot.sail)
    • Sail functions for accessing architectural state
    • Compiled by Sail into C++ functions

State Capture Details

Registers:

  • Integer registers (x0-x31): Direct access via model.zxN members
  • Floating-point registers (f0-f31): Direct access via model.zfN members
  • Vector registers: Placeholder implementation (needs GMP handling)

CSRs:

  • Direct access to model members (e.g., model.zmstatus.zbits, model.zmepc)
  • Captures 18 key machine-mode CSRs (mstatus, misa, mtvec, mepc, mcause, etc.)
  • Read-only CSR protection during restoration (mvendorid, marchid, mimpid, mhartid, mconfigptr)
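
The read-only protection can be sketched as follows. This is an assumption-laden illustration rather than the PR's code: the PR accesses model members directly, whereas this sketch works on CSR addresses and a hypothetical write_csr callback. It relies on the privileged-spec convention that CSR address bits [11:10] equal to 0b11 mark a read-only CSR, which covers mvendorid (0xF11) through mconfigptr (0xF15).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// RISC-V convention: CSR address bits [11:10] == 0b11 means read-only.
constexpr bool is_read_only_csr(uint16_t addr) {
    return ((addr >> 10) & 0x3) == 0x3;
}

// Hypothetical restore loop. `saved` maps CSR address to saved value and
// `write_csr` stands in for whatever actually writes the model state.
// Returns the number of CSRs that were written.
template <typename WriteFn>
std::size_t restore_csrs(const std::map<uint16_t, uint64_t>& saved,
                         WriteFn write_csr) {
    std::size_t restored = 0;
    for (const auto& entry : saved) {
        if (is_read_only_csr(entry.first)) {
            continue;  // skip mvendorid, marchid, mimpid, mhartid, mconfigptr
        }
        write_csr(entry.first, entry.second);
        ++restored;
    }
    return restored;
}
```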

Memory:

  • Sparse binary format with index file
  • Enumerates all allocated memory blocks from Sail runtime
  • Efficient storage of only allocated memory regions
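
One way to picture a sparse dump with an index is sketched below. The layout is an assumption for illustration; the PR's actual on-disk format is not described beyond "sparse binary format with index file", and the types BlockIndexEntry, SparseDump and MemoryBlock are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical index entry: where a block sits in guest memory and in the blob.
struct BlockIndexEntry {
    uint64_t guest_addr;
    uint64_t blob_offset;
    uint64_t length;
};

// Hypothetical in-memory form of the index file plus the binary data file.
struct SparseDump {
    std::vector<BlockIndexEntry> index;
    std::vector<uint8_t> blob;
};

// Stand-in for one allocated block reported by the runtime.
struct MemoryBlock {
    uint64_t base;
    std::vector<uint8_t> bytes;
};

// Appends one allocated block; unallocated regions never reach the blob.
void append_block(SparseDump& dump, const MemoryBlock& block) {
    BlockIndexEntry entry;
    entry.guest_addr = block.base;
    entry.blob_offset = static_cast<uint64_t>(dump.blob.size());
    entry.length = static_cast<uint64_t>(block.bytes.size());
    dump.index.push_back(entry);
    dump.blob.insert(dump.blob.end(), block.bytes.begin(), block.bytes.end());
}

// Rebuilds the blocks from the index and the blob.
std::vector<MemoryBlock> restore_blocks(const SparseDump& dump) {
    std::vector<MemoryBlock> blocks;
    for (const BlockIndexEntry& entry : dump.index) {
        MemoryBlock block;
        block.base = entry.guest_addr;
        auto first = dump.blob.begin() + static_cast<std::ptrdiff_t>(entry.blob_offset);
        block.bytes.assign(first, first + static_cast<std::ptrdiff_t>(entry.length));
        blocks.push_back(block);
    }
    return blocks;
}
```

With no allocated blocks the blob stays empty, which corresponds to a 0-byte memory file.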

Platform State:

  • Privilege level, hart state
  • HTIF state (tohost, done, exit_code)
  • Timer state (mtime, mtimecmp)
  • Reservation state

Integration Points

  1. ModelImpl (riscv_model_impl.h/cpp)
    • Added SnapshotManager and CheckpointManager as members
    • Initialized in constructor
    • Checkpoint manager registered as callback
  2. CLI Integration (riscv_sim.cpp)
    • Added 12 new CLI options using CLI11 library
    • Snapshot save/restore logic in finish() and init_sail()
    • Checkpoint configuration in init_sail()
  3. Build System (CMakeLists.txt)
    • Added all new source files to riscv_model library
  4. Sail Build (riscv.sail_project)
    • Added postlude/snapshot.sail to build
    • Added FD_core and V_core to requirements for register access

Design Decisions

  1. Direct Model Access: Instead of using zread_CSR() API (which can crash for non-existent CSRs), we directly access model members for safety and reliability.
  2. Sparse Memory Format: Memory is stored in sparse binary format to avoid storing large zero-filled regions, improving efficiency.
  3. Callback-Based Checkpoints: Checkpoint system uses existing callback infrastructure, making it non-intrusive and efficient.
  4. Read-Only CSR Protection: During restoration, read-only CSRs (machine information registers) are skipped to prevent errors.

Files Changed

  • c_emulator/snapshot_format.h/cpp - JSON serialization/deserialization
  • c_emulator/snapshot_sail_interface.h/cpp - Sail model state access
  • c_emulator/snapshot_manager.h/cpp - Snapshot orchestration
  • c_emulator/checkpoint_manager.h/cpp - Checkpoint system
  • c_emulator/snapshot_memory.h/c - Memory dump/restore functions
  • model/postlude/snapshot.sail - Sail state access functions
  • c_emulator/riscv_model_impl.h/cpp - Added snapshot/checkpoint managers
  • c_emulator/riscv_sim.cpp - Added CLI options and integration
  • c_emulator/CMakeLists.txt - Added new source files to build
  • model/riscv.sail_project - Added snapshot.sail to Sail build

Testing

The feature has been tested with:

  • Snapshot creation and restoration
  • All checkpoint types (interval, PC, memory, register, CSR)
  • Multiple checkpoint conditions simultaneously
  • Checkpoint file restoration
  • State file validation (JSON structure, completeness)
  • Read-only CSR protection during restoration

Known Limitations

  • Vector registers: Currently placeholder implementation (needs GMP handling for lbits)
  • CSR coverage: Only key machine-mode CSRs captured (18 CSRs, not all 4096 possible)
  • PC checkpoints: Trigger only if the specified PC value is actually reached during execution
  • Memory blocks: Empty memory may result in 0-byte memory files (expected behavior)

@jrtc27
Collaborator

jrtc27 commented Jan 6, 2026

Did you write this PR description entirely by yourself?

@tanishkaa08
Author

tanishkaa08 commented Jan 6, 2026

Hi, I wrote a rough implementation design based on the feature requested and formatted it with the help of AI.
However, I assure you I don't intend to make low-quality PRs; before opening this PR, I took 2 days to review all the changes entirely by myself.

@jrtc27
Collaborator

jrtc27 commented Jan 6, 2026

The code itself was written by an LLM too, not just the PR description?

@Timmmm
Collaborator

Timmmm commented Jan 6, 2026

Yeah the AI definitely lowers my expectations, and my desire to review it. In any case this is a ton of code and I'm not sure the approach is the best option. I was thinking we'd just have Sail auto-generate functions to save/restore the model state. Much easier and probably less error-prone.

@jrtc27
Collaborator

jrtc27 commented Jan 6, 2026

Some communities have jumped onboard the vibe-coding hype train, but you should really disclose your use of generative AI unless you know the community in question does not expect it.

@tanishkaa08
Author

The implementation design and the code were written by me, but I did take help in a few areas, such as resolving errors and syntax. I apologize if I violated the community guidelines in any way; that was not my intention.

@nadime15
Collaborator

nadime15 commented Jan 7, 2026

Hey @tanishkaa08, regardless of whether you used AI or not and to what extent, thank you for your contribution!

I will definitely check your submitted code in the next few days. We can still figure out the best approach to implementing this feature, since we have only discussed the different approaches in our Monday weekly meeting and not publicly. To what extent AI tools can be used is a separate topic, even though, as the others have said, we should avoid heavy use of them or at least disclose it.

When it comes to formatting, I think it is totally fine and to be honest, I wish that some would pipe their (sail) code into some tool to at least format it before submitting.

@Timmmm
Collaborator

Timmmm commented Jan 12, 2026

Hello, we discussed this in the meeting (there's a weekly meeting for this project btw) and we're going to go with the approach of having the Sail compiler automatically generate snapshotting functions for the model. I made a very initial start on this here: rems-project/sail#1602

Basically how it will work:

  1. The Sail compiler generates model_serialize() and model_deserialize() functions that save/restore the state of all register globals.
  2. In order to prevent errors when you save from one model and restore from another, it will find all the code for the registers, and the types that they use, and then pretty print that code and hash it. The hash will be saved in the snapshot file and checked when you read it. This means that you can still do stuff like adding printf debugging or even changing logic as long as it doesn't change the state space of the model.
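
The hash check in step 2 might look roughly like the sketch below. Everything here is an assumption for illustration: the hash function (FNV-1a, picked only for brevity), the SnapshotHeader struct, and the idea that the pretty-printed register and type declarations arrive as a single string. It does not reflect what the Sail compiler generates.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a; a stand-in for whatever hash the real design uses.
uint64_t fnv1a64(const std::string& text) {
    uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (unsigned char c : text) {
        h ^= c;
        h *= 0x100000001b3ULL;           // FNV prime
    }
    return h;
}

// Hypothetical snapshot header carrying the state-space hash.
struct SnapshotHeader {
    uint64_t state_hash;
};

// `state_decls` stands in for the pretty-printed register and type code.
SnapshotHeader make_header(const std::string& state_decls) {
    SnapshotHeader header;
    header.state_hash = fnv1a64(state_decls);
    return header;
}

// A snapshot is accepted only if the reading model has the same state space.
bool compatible(const SnapshotHeader& header, const std::string& state_decls) {
    return header.state_hash == fnv1a64(state_decls);
}
```

Because only the declarations feed the hash, changes to instruction logic or added debug printing leave it unchanged, while adding or retyping a register changes it.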

Sorry that's probably not what you wanted to hear after making a big PR 😬

@tanishkaa08
Author

tanishkaa08 commented Jan 13, 2026

Hi, thank you for the follow-up message. Please let me know if I can contribute further, and whether I can join the weekly meetings if they are open to new contributors. I believe that understanding the community would help me make a much more meaningful contribution.

@Timmmm
Collaborator

Timmmm commented Jan 13, 2026

It's the "RV-LFX Golden Model" meeting on Monday on this calendar - open to everyone!
