Fix sharing-violation 500s + carry forward review fixes (v0.5.4)#23
Merged
Conversation
The previous PR squash-merged with four Copilot review threads unaddressed (auto-merge fired before the fixes landed). Brings those forward, and adds the actual bug they were a symptom of — sharing violations leaking through io_to_s3_error's catch-all arm as 500 InternalError with noisy `error:` log lines. Sharing violations are an expected transient under concurrent writes (the SMB server refuses a file held by another handle). The typed io::ErrorKind::ResourceBusy mapping was already correct; the HTTP layer just had no arm for it. Now maps to 503 SlowDown — the standard retryable S3 status — and logs via slog! instead of serr!, so SDKs auto-retry instead of seeing a fatal 500 and CI stops painting these red. Adds an io_to_s3_error unit test covering all four kinds (NotFound, PermissionDenied, ResourceBusy, fallback) so this regression has a test gate next time. Adds a stderr guard to the three CI integration scripts (test-sccache, test-extended, stress-concurrent) that fails the run if spiceio emits `[spiceio] error:` lines. Expected transients log through slog! without the prefix, so any hit means a new failure mode is leaking through the 500 InternalError arm. Would have caught the original regression. Review fixes carried forward from edc1579: - decode_read_response_from_msg rejects data_offset < SMB2_HEADER_SIZE so a malformed server response cannot slice into the SMB2 header. Adds two unit tests. - GetObject streaming channel capped by an 8 MiB per-request memory budget on top of READ_PIPELINE_DEPTH; at 1 MiB chunks the channel falls from 64 to 8 instead of buffering up to 64 MiB. - bench-live.sh mount_smbfs baseline no longer embeds the SMB password in argv (leaked via shell history / ps) and no longer shells out to perl -MURI::Escape (not in minimal Perl installs). Uses macOS Keychain credentials; skips clearly if not configured.
There was a problem hiding this comment.
Pull request overview
This PR addresses CI-noisy “sharing violation” errors by correctly mapping SMB sharing violations (io::ErrorKind::ResourceBusy) to a retryable S3 response (503 SlowDown) instead of a generic 500, and it carries forward several safety/operational fixes that were missed in the #22 squash merge. It also adds an integration-script guardrail that fails CI if unexpected [spiceio] error: lines appear on stderr, preventing similar regressions from being auto-merged silently.
Changes:
- Map
io::ErrorKind::ResourceBusyto S3503 SlowDown(info-level logging) and add unit tests coveringio_to_s3_errormappings. - Harden SMB read-response decoding by rejecting
data_offsetvalues that point into the SMB2 header, with new unit tests. - Add CI stderr guards to integration scripts and adjust benchmarking baseline to avoid leaking SMB credentials; bump version to v0.5.4.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/smb/protocol.rs |
Adds header-offset validation to zero-copy read decode and corresponding unit tests. |
src/s3/router.rs |
Caps GetObject streaming channel memory; maps ResourceBusy to 503; adds unit tests for error mapping. |
scripts/test-sccache.sh |
Captures spiceio stderr via tee and fails run on unexpected [spiceio] error: lines. |
scripts/test-extended.sh |
Adds stderr capture + guard to catch unexpected spiceio error logs during extended integration run. |
scripts/stress-concurrent.sh |
Adds stderr capture + guard to catch unexpected spiceio error logs during stress run. |
scripts/bench-live.sh |
Removes password-from-argv mounting; uses macOS Keychain-backed mount_smbfs -N and skips if not configured. |
Cargo.toml |
Bumps crate version to 0.5.4. |
Cargo.lock |
Updates locked package version to 0.5.4. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- stream_channel_capacity: extracted from the inline GetObject channel sizing into a tested helper. Floors chunk_size at 1 before the divide-by, so an SMB-negotiated max_read_size of 0 cannot panic the streaming task with a divide-by-zero. Four new unit tests cover the zero-chunk guard plus the default 64 KiB, large 1 MiB, and budget- exceeding cases. - Integration test stderr guards: kill+wait the spiceio PID(s) explicitly before grepping the captured stderr. The previous `sync` was a no-op for the tee process-substitution pipe, and the cleanup trap hadn't fired yet so spiceio's stderr stream was still open. test-sccache.sh has two spiceio instances; the guard now kills both. Misleading comment removed.
| // uncapped channel would buffer up to 64 MiB per concurrent GetObject. | ||
| // At default 64 KiB chunks the channel stays at the full pipeline depth | ||
| // (4 MiB); at 1 MiB chunks it falls to 8 (still room to overlap). | ||
| let chunk_size = handle.max_chunk; |
Comment on lines
+507
to
514
| // `data_offset` is from the start of the SMB2 message. It must not point | ||
| // inside the header (or before it) — a malformed server response with an | ||
| // offset < SMB2_HEADER_SIZE would otherwise slice into header bytes. | ||
| if data_offset < SMB2_HEADER_SIZE { | ||
| return None; | ||
| } | ||
| let start = data_offset; | ||
| let end = start.checked_add(data_length)?; |
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two things in one PR — they're tightly coupled by the post-mortem of #22's auto-merge:
Fix the real bug behind the "error: sharing violation" log lines you noticed in CI for Coalesced pipelined SMB I/O for higher 10G throughput (v0.5.3) #22.
io_to_s3_errorhad arms forNotFoundandPermissionDeniedbut no arm forResourceBusy— which is the typedio::ErrorKindwe mapSTATUS_SHARING_VIOLATION(0xC0000043) to. Sharing violations were falling into the_catch-all and getting returned as 500 InternalError with a noisy[spiceio] error:stderr line. They're an expected transient under concurrent writes; the correct mapping is 503 SlowDown (SDKs auto-retry on it) logged viaslog!at info level. Adds a unit test for every arm ofio_to_s3_errorso the regression has a gate next time.Carry forward the four Copilot review fixes from Coalesced pipelined SMB I/O for higher 10G throughput (v0.5.3) #22 that didn't make the squash merge. Auto-merge fired before the review feedback was addressed. Picking the same fixes from
edc1579(the commit that was stranded on the closed branch) and landing them properly:decode_read_response_from_msgrejectsdata_offset < SMB2_HEADER_SIZEso a malformed server response cannot trick us into slicing into the SMB2 header bytes — two new unit tests.READ_PIPELINE_DEPTH. At default 64 KiB chunks it stays at the full pipeline depth of 64 (4 MiB); at 1 MiB chunks it falls to 8 so a single GetObject can't buffer 64 MiB regardless ofSPICEIO_SMB_MAX_IO.bench-live.shmount baseline stops embedding the SMB password in themount_smbfsargv (leaked via shell history /ps) and drops theperl -MURI::Escapedependency (not in minimal Perl). Uses macOS Keychain credentials and skips with a clear message if not configured.Add a CI stderr guard to all three integration scripts (
test-sccache.sh,test-extended.sh,stress-concurrent.sh): captures spiceio stderr viatee(so CI still streams live) and fails the build if any[spiceio] error:lines slip through. Would have caught the original regression — expected transients log viaslog!without theerror:prefix, so a hit means a new failure mode is leaking through the 500 InternalError arm.Bumps version to v0.5.4.
Why this happened
#22had auto-merge enabled with squash. The Copilot reviewer comments arrived after the required status checks went green; auto-merge picked up the green-CI signal and squashed onto trunk before any human acknowledged the review threads. The sharing-violation logs in the CI run were never going to fail CI because the integration tests count contended outcomes as "losers" by HTTP status, not by stderr content — so a 500 in a contended slot was indistinguishable from a 503 from the test's perspective.The stderr guard closes that loophole.
Test plan
make lint— fmt, clippy strict, rustdoc warnings all cleancargo test --locked --lib— 151/151 pass (was 142 on trunk; +5 for theio_to_s3_errormapping tests, +2 fordata_offsetbounds checks, +1 for the existingdecode_read_response_from_msg_validalready on trunk, +1 for too-short / overflow)./target/release/spiceio --version→spiceio 0.5.4bash -nio_to_s3_errorno longer routesResourceBusythrough the noisy fallbackNotes for review
This is a follow-up to the merged #22, not a re-do of it — the perf changes from #22 are already on trunk via
5736b5f. This PR adds only the corrections.