Skip to content

Fix sharing-violation 500s + carry forward review fixes (v0.5.4)#23

Merged
lukekim merged 2 commits into
trunkfrom
fix/sharing-violation-and-review-feedback
May 14, 2026
Merged

Fix sharing-violation 500s + carry forward review fixes (v0.5.4)#23
lukekim merged 2 commits into
trunkfrom
fix/sharing-violation-and-review-feedback

Conversation

@lukekim
Copy link
Copy Markdown
Contributor

@lukekim lukekim commented May 14, 2026

Summary

Two things in one PR — they're tightly coupled by the post-mortem of #22's auto-merge:

  1. Fix the real bug behind the "error: sharing violation" log lines you noticed in CI for Coalesced pipelined SMB I/O for higher 10G throughput (v0.5.3) #22. io_to_s3_error had arms for NotFound and PermissionDenied but no arm for ResourceBusy — which is the typed io::ErrorKind we map STATUS_SHARING_VIOLATION (0xC0000043) to. Sharing violations were falling into the _ catch-all and getting returned as 500 InternalError with a noisy [spiceio] error: stderr line. They're an expected transient under concurrent writes; the correct mapping is 503 SlowDown (SDKs auto-retry on it) logged via slog! at info level. Adds a unit test for every arm of io_to_s3_error so the regression has a gate next time.

  2. Carry forward the four Copilot review fixes from Coalesced pipelined SMB I/O for higher 10G throughput (v0.5.3) #22 that didn't make the squash merge. Auto-merge fired before the review feedback was addressed. Picking the same fixes from edc1579 (the commit that was stranded on the closed branch) and landing them properly:

    • decode_read_response_from_msg rejects data_offset < SMB2_HEADER_SIZE so a malformed server response cannot trick us into slicing into the SMB2 header bytes — two new unit tests.
    • GetObject streaming channel capped by an 8 MiB per-request memory budget on top of READ_PIPELINE_DEPTH. At default 64 KiB chunks it stays at the full pipeline depth of 64 (4 MiB); at 1 MiB chunks it falls to 8 so a single GetObject can't buffer 64 MiB regardless of SPICEIO_SMB_MAX_IO.
    • bench-live.sh mount baseline stops embedding the SMB password in the mount_smbfs argv (leaked via shell history / ps) and drops the perl -MURI::Escape dependency (not in minimal Perl). Uses macOS Keychain credentials and skips with a clear message if not configured.
  3. Add a CI stderr guard to all three integration scripts (test-sccache.sh, test-extended.sh, stress-concurrent.sh): captures spiceio stderr via tee (so CI still streams live) and fails the build if any [spiceio] error: lines slip through. Would have caught the original regression — expected transients log via slog! without the error: prefix, so a hit means a new failure mode is leaking through the 500 InternalError arm.

  4. Bumps version to v0.5.4.

Why this happened

#22 had auto-merge enabled with squash. The Copilot reviewer comments arrived after the required status checks went green; auto-merge picked up the green-CI signal and squashed onto trunk before any human acknowledged the review threads. The sharing-violation logs in the CI run were never going to fail CI because the integration tests count contended outcomes as "losers" by HTTP status, not by stderr content — so a 500 in a contended slot was indistinguishable from a 503 from the test's perspective.

The stderr guard closes that loophole.

Test plan

  • make lint — fmt, clippy strict, rustdoc warnings all clean
  • cargo test --locked --lib151/151 pass (was 142 on trunk; +5 for the io_to_s3_error mapping tests, +2 for data_offset bounds checks, +1 for the existing decode_read_response_from_msg_valid already on trunk, +1 for too-short / overflow)
  • ./target/release/spiceio --versionspiceio 0.5.4
  • All four shell scripts pass bash -n
  • CI integration run against the NAS: sccache + extended + stress should all pass; the new stderr guard should not flag anything since io_to_s3_error no longer routes ResourceBusy through the noisy fallback

Notes for review

This is a follow-up to the merged #22, not a re-do of it — the perf changes from #22 are already on trunk via 5736b5f. This PR adds only the corrections.

The previous PR squash-merged with four Copilot review threads
unaddressed (auto-merge fired before the fixes landed). Brings those
forward, and adds the actual bug they were a symptom of — sharing
violations leaking through io_to_s3_error's catch-all arm as 500
InternalError with noisy `error:` log lines.

Sharing violations are an expected transient under concurrent writes
(the SMB server refuses a file held by another handle). The typed
io::ErrorKind::ResourceBusy mapping was already correct; the HTTP
layer just had no arm for it. Now maps to 503 SlowDown — the standard
retryable S3 status — and logs via slog! instead of serr!, so SDKs
auto-retry instead of seeing a fatal 500 and CI stops painting these
red.

Adds an io_to_s3_error unit test covering all four kinds (NotFound,
PermissionDenied, ResourceBusy, fallback) so this regression has a
test gate next time.

Adds a stderr guard to the three CI integration scripts
(test-sccache, test-extended, stress-concurrent) that fails the run
if spiceio emits `[spiceio] error:` lines. Expected transients log
through slog! without the prefix, so any hit means a new failure
mode is leaking through the 500 InternalError arm. Would have caught
the original regression.

Review fixes carried forward from edc1579:
- decode_read_response_from_msg rejects data_offset <
  SMB2_HEADER_SIZE so a malformed server response cannot slice into
  the SMB2 header. Adds two unit tests.
- GetObject streaming channel capped by an 8 MiB per-request memory
  budget on top of READ_PIPELINE_DEPTH; at 1 MiB chunks the channel
  falls from 64 to 8 instead of buffering up to 64 MiB.
- bench-live.sh mount_smbfs baseline no longer embeds the SMB
  password in argv (leaked via shell history / ps) and no longer
  shells out to perl -MURI::Escape (not in minimal Perl installs).
  Uses macOS Keychain credentials; skips clearly if not configured.
Copilot AI review requested due to automatic review settings May 14, 2026 09:00
@lukekim lukekim self-assigned this May 14, 2026
@lukekim lukekim added the enhancement New feature or request label May 14, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses CI-noisy “sharing violation” errors by correctly mapping SMB sharing violations (io::ErrorKind::ResourceBusy) to a retryable S3 response (503 SlowDown) instead of a generic 500, and it carries forward several safety/operational fixes that were missed in the #22 squash merge. It also adds an integration-script guardrail that fails CI if unexpected [spiceio] error: lines appear on stderr, preventing similar regressions from being auto-merged silently.

Changes:

  • Map io::ErrorKind::ResourceBusy to S3 503 SlowDown (info-level logging) and add unit tests covering io_to_s3_error mappings.
  • Harden SMB read-response decoding by rejecting data_offset values that point into the SMB2 header, with new unit tests.
  • Add CI stderr guards to integration scripts and adjust benchmarking baseline to avoid leaking SMB credentials; bump version to v0.5.4.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/smb/protocol.rs Adds header-offset validation to zero-copy read decode and corresponding unit tests.
src/s3/router.rs Caps GetObject streaming channel memory; maps ResourceBusy to 503; adds unit tests for error mapping.
scripts/test-sccache.sh Captures spiceio stderr via tee and fails run on unexpected [spiceio] error: lines.
scripts/test-extended.sh Adds stderr capture + guard to catch unexpected spiceio error logs during extended integration run.
scripts/stress-concurrent.sh Adds stderr capture + guard to catch unexpected spiceio error logs during stress run.
scripts/bench-live.sh Removes password-from-argv mounting; uses macOS Keychain-backed mount_smbfs -N and skips if not configured.
Cargo.toml Bumps crate version to 0.5.4.
Cargo.lock Updates locked package version to 0.5.4.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/s3/router.rs
Comment thread scripts/stress-concurrent.sh Outdated
- stream_channel_capacity: extracted from the inline GetObject channel
  sizing into a tested helper. Floors chunk_size at 1 before the
  divide-by, so an SMB-negotiated max_read_size of 0 cannot panic the
  streaming task with a divide-by-zero. Four new unit tests cover the
  zero-chunk guard plus the default 64 KiB, large 1 MiB, and budget-
  exceeding cases.
- Integration test stderr guards: kill+wait the spiceio PID(s)
  explicitly before grepping the captured stderr. The previous `sync`
  was a no-op for the tee process-substitution pipe, and the cleanup
  trap hadn't fired yet so spiceio's stderr stream was still open.
  test-sccache.sh has two spiceio instances; the guard now kills both.
  Misleading comment removed.
@lukekim lukekim requested a review from Copilot May 14, 2026 09:19
@lukekim lukekim enabled auto-merge (squash) May 14, 2026 09:19
@lukekim lukekim merged commit e31c63c into trunk May 14, 2026
6 checks passed
@lukekim lukekim deleted the fix/sharing-violation-and-review-feedback branch May 14, 2026 09:20
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 2 comments.

Comment thread src/s3/router.rs
// uncapped channel would buffer up to 64 MiB per concurrent GetObject.
// At default 64 KiB chunks the channel stays at the full pipeline depth
// (4 MiB); at 1 MiB chunks it falls to 8 (still room to overlap).
let chunk_size = handle.max_chunk;
Comment thread src/smb/protocol.rs
Comment on lines +507 to 514
// `data_offset` is from the start of the SMB2 message. It must not point
// inside the header (or before it) — a malformed server response with an
// offset < SMB2_HEADER_SIZE would otherwise slice into header bytes.
if data_offset < SMB2_HEADER_SIZE {
return None;
}
let start = data_offset;
let end = start.checked_add(data_length)?;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants