Add simulated filesystem and barriers for crash-consistency testing #253
#15 for tracking
mcches
left a comment
Nothing is blocking. Thank you for this contribution.
    pub struct FsHandleGuard {
        _private: (),
    }
Suggested change:
    -pub struct FsHandleGuard {
    -    _private: (),
    -}
    +pub struct FsHandleGuard;
Since this is pub, keeping the private field prevents external construction (users must go through FsHandle::enter()). This also preserves forward compatibility if we ever need to add state to the guard. Added a comment explaining this.
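The private-field argument can be illustrated with a minimal sketch. The module name `sim_fs`, the `&self` receiver on `enter`, and the empty `FsHandle` are assumptions made for illustration; only the `FsHandleGuard` shape and the `FsHandle::enter()` name come from this thread.

```rust
mod sim_fs {
    /// Guard that callers can hold but cannot construct themselves.
    pub struct FsHandleGuard {
        // Private field: code outside this module cannot write
        // `FsHandleGuard { .. }`, so `enter()` is the only way to get one.
        // A unit struct (`pub struct FsHandleGuard;`) would be constructible
        // by anyone who can name the type.
        _private: (),
    }

    /// Stand-in for the real handle (hypothetical: it carries no state here).
    pub struct FsHandle;

    impl FsHandle {
        /// Assumed signature; the actual method may differ.
        pub fn enter(&self) -> FsHandleGuard {
            FsHandleGuard { _private: () }
        }
    }
}
```

The `()` field is zero-sized, so the guard costs nothing at runtime, and state can later be added to the struct without breaking callers, since they never name its fields.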
src/fs/mod.rs
Outdated
    }

    /// Clear entire cache.
    #[allow(dead_code)]
Do we need to keep this?
Removed - YAGNI.
src/fs/mod.rs
Outdated
    /// - Files in `persisted_files` but not in `synced_entries` are orphaned on crash
    pub(crate) struct Fs {
        /// Persisted file data (survives crash)
These comments conflict.
Good catch - the comments were ambiguous. Updated to clarify the inode vs directory entry distinction: data in persisted_* survives crashes, but is only reachable if the path is also in synced_entries. Without a synced directory entry, the data is orphaned and cleaned up on crash.
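The inode versus directory-entry distinction can be sketched with a toy model. This is not the PR's `Fs` struct: `ModelFs`, its `pending` map, and the per-path `sync_dir` are simplifying assumptions (the real `sync_dir()` presumably operates on a directory), and only the field names `persisted_files` and `synced_entries` are taken from the quoted doc comment.

```rust
use std::collections::{HashMap, HashSet};

/// Toy model of the durability rules (hypothetical, for illustration only).
#[derive(Default)]
struct ModelFs {
    /// Written but not yet fsynced; lost on crash.
    pending: HashMap<String, Vec<u8>>,
    /// File data made durable by `sync_all`.
    persisted_files: HashMap<String, Vec<u8>>,
    /// Directory entries made durable by `sync_dir`.
    synced_entries: HashSet<String>,
}

impl ModelFs {
    fn write(&mut self, path: &str, data: &[u8]) {
        self.pending.insert(path.to_string(), data.to_vec());
    }

    /// Moves pending data for `path` into durable storage.
    fn sync_all(&mut self, path: &str) {
        if let Some(data) = self.pending.remove(path) {
            self.persisted_files.insert(path.to_string(), data);
        }
    }

    /// Simplification: marks a single entry durable instead of a whole directory.
    fn sync_dir(&mut self, path: &str) {
        self.synced_entries.insert(path.to_string());
    }

    /// Drops pending writes, then drops persisted data that no durable
    /// directory entry points to (the orphan cleanup).
    fn crash(&mut self) {
        self.pending.clear();
        let entries = &self.synced_entries;
        self.persisted_files.retain(|path, _| entries.contains(path));
    }

    fn read(&self, path: &str) -> Option<&[u8]> {
        self.pending
            .get(path)
            .or_else(|| self.persisted_files.get(path))
            .map(|v| v.as_slice())
    }
}
```

In this model a file that was written, passed to `sync_all`, and passed to `sync_dir` is readable after `crash()`; a file that skipped `sync_dir` is dropped as an orphan, and a file that was never synced disappears with the pending writes.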
src/fs/shim/std/fs/mod.rs
Outdated
    /// O_DIRECT flag value for bypassing page cache.
    /// Platform-specific: Linux=0x4000, macOS doesn't have O_DIRECT (we use F_NOCACHE via fcntl).
    /// For our simulation, we just need a consistent non-zero value users can pass.
    #[cfg(target_os = "linux")]
Why bother with cfg here?
Good point - simplified. All branches had the same value because this is a simulation constant, not the real platform flag. macOS doesn't have O_DIRECT at all (uses fcntl F_NOCACHE), so we just use Linux's 0x4000 as the canonical value for portability.
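A rough sketch of the simplified form, assuming the constant is an `i32` flag tested with a bit mask. The helper `bypasses_page_cache` is hypothetical and not part of the PR; only the 0x4000 value and the absence of `cfg` branches come from this thread.

```rust
/// Simulation-only flag value: the same on every platform, so no `cfg` is needed.
/// It is a marker understood by the simulated filesystem, not the host's real flag.
pub const O_DIRECT: i32 = 0x4000;

/// Hypothetical helper: does this set of open flags request a cache bypass?
pub fn bypasses_page_cache(custom_flags: i32) -> bool {
    (custom_flags & O_DIRECT) != 0
}
```

Because the value never reaches a real `open(2)` call, it only needs to be non-zero and stable across hosts.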
Filesystem (`unstable-fs` feature):
- Drop-in replacements for std::fs and tokio::fs with durability simulation
- POSIX semantics: separate file data sync (fsync) and directory entry sync
- Crash consistency testing: pending writes lost on crash unless synced
- Configurable behaviors: sync probability, capacity limits, I/O errors, silent corruption, torn writes, I/O latency, and page cache simulation
- Per-host isolation with FsHandle for worker thread support
Barriers (`unstable-barriers` feature):
- Inject hooks at specific code points for deterministic test control
- Support for suspend/resume, panic injection, and observation
- Integrates with fs corruption events for targeted testing
Summary
Adds two related features behind unstable feature flags:
- Filesystem (`unstable-fs`) - Drop-in replacements for `std::fs` and `tokio::fs` with crash-consistency testing support
- Barriers (`unstable-barriers`) - Observable synchronization points for testing (see Add barriers to turmoil #229)
Filesystem Usage
Key Design Choices
POSIX durability model: File data and directory entries are tracked separately. `sync_all()` makes file data durable; `sync_dir()` makes directory entries durable. Both are required for a new file to survive a crash.
Torn writes: With `block_size` configured, pending writes may be partially applied on crash (0 to N blocks survive), simulating real disk behavior.
I/O latency + page cache (tokio shim only): Async operations can simulate realistic I/O timing with cache hits (~100ns) vs misses (configurable latency). O_DIRECT bypasses the cache. Sync operations (`std::fs` shim) complete immediately.
Worker thread support: `FsHandle::current()` captures context for use in `spawn_blocking` threads.
Limitations
- … (`flock`/`fcntl`)
- … (`tokio::fs`) operations
Test Coverage
86 tests covering durability, crash recovery, torn writes, symlinks, hard links, metadata, page cache, and worker threads.
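The torn-write behavior described under Key Design Choices can be modeled with a small standalone function. This is a sketch under stated assumptions, not the PR's implementation: the name `apply_torn_write` and its parameters are hypothetical, blocks are counted from the start of the write rather than from device block boundaries, and the surviving blocks are assumed to be a prefix of the write.

```rust
/// Applies only the first `surviving_blocks` blocks of a pending write to the
/// persisted bytes, as if a crash interrupted the write partway through.
/// Hypothetical model: the real simulation may pick surviving blocks differently.
fn apply_torn_write(
    persisted: &mut Vec<u8>,
    offset: usize,
    data: &[u8],
    block_size: usize,
    surviving_blocks: usize,
) {
    // Number of bytes that made it to disk, capped at the length of the write.
    let keep = data.len().min(surviving_blocks.saturating_mul(block_size));
    if keep == 0 {
        return; // nothing survived; persisted data is unchanged
    }
    let end = offset + keep;
    if persisted.len() < end {
        persisted.resize(end, 0); // extend the file with zeros if needed
    }
    persisted[offset..end].copy_from_slice(&data[..keep]);
}
```

With `block_size = 4`, an 8-byte write over `AAAAAAAA` that loses its second block leaves `BBBBAAAA` on disk, which is the kind of half-applied state a recovery path has to tolerate.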