Skip to content

fix(rest): cp box->host extracts single file to dest (POL-37)#633

Open
G4614 wants to merge 4 commits into
boxlite-ai:mainfrom
G4614:fix/rest-cp-out-extract-via-shared-tar
Open

fix(rest): cp box->host extracts single file to dest (POL-37)#633
G4614 wants to merge 4 commits into
boxlite-ai:mainfrom
G4614:fix/rest-cp-out-extract-via-shared-tar

Conversation

@G4614
Copy link
Copy Markdown
Contributor

@G4614 G4614 commented Jun 1, 2026

when REST copy_out a regular file from box to outside, a dir created in host because of the tar logic

Test plan

cargo test -p boxlite --lib --features rest tar_extract — 4 unit tests, in-memory tar construction:

scenario pre-fix post-fix
cp box:/etc/hosts ./hosts (tar = ./ + ./etc/ + ./etc/hosts) ./hosts/etc/hosts (directory tree) ./hosts is a regular file, content = /etc/hosts from the box
cp box:/foo ./out/ (trailing slash) dir dir (no regression)
cp box:/dir ./dir-copy (multi-file) dir dir (no regression)
--no-overwrite + dst exists old unpack overwrites silently error exists and overwrite=false, pre-existing content preserved

Two-side: temporarily restore the old archive.unpack body for extract_tar_bytes_to_pathsingle_regular_file_lands_at_dest_path fails with dest must be a regular file, not a directory tree and overwrite_false_refuses_existing_dest fails with expected overwrite-refusal error, got: failed to unpack — those are exactly the two POL-37 symptoms. Restore the fix: 4/4 pass.

Summary by CodeRabbit

  • Bug Fixes

    • Improved tar archive extraction to correctly handle synthetic directory entries and symlinks.
    • Enhanced overwrite protection during file extraction operations.
  • Tests

    • Expanded test coverage for archive extraction scenarios, including docker-cp-style archives and symlink handling.
    • Added validation for proper error handling when archives are missing.

… dir (POL-37)

The REST copy_out extracted the response tar with archive.unpack(host_dst)
unconditionally — i.e. it always treated host_dst as a target directory.
For `boxlite cp box:/etc/hosts ./hosts` the server's tar carries the
parent path entries (`./`, `./etc/`) along with `./etc/hosts`, so the
client built `./hosts/etc/hosts` instead of writing the file's content
to `./hosts`. The user reported this as "writes raw tar instead of the
file." Now the client counts regular file entries (ignoring the
synthetic directory entries the server interleaves) and, when the tar
has exactly one regular file and the dest doesn't look like a directory,
extracts that one entry's body to host_dst — matching `docker cp`'s
single-file UX. Multi-file and trailing-slash cases keep the previous
"unpack into directory" path with proper overwrite-refusal added.

Also threads include_parent / follow_symlinks through the query
string so a future server-side fix that honors them sends leaner tars
that the existing single-file branch will pick up automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 force-pushed the fix/rest-cp-out-extract-via-shared-tar branch from f56bb75 to e95a4ec Compare June 1, 2026 08:55
…cker

The first cut of POL-37 inlined an extract_tar_bytes_to_path in
src/boxlite/src/rest/litebox.rs that paralleled
src/shared/src/tar.rs::unpack (the helper the local copy_out path
already uses via box_impl.rs:582). That meant two detection rules,
two extraction loops, two test suites, and two paths a future fix
has to remember to update. The PR description claimed "the client
still delegates to the shared tar unpacker and inherits its
detection logic" — it didn't.

This change makes that statement true.

Changes:

  - shared::tar gains `unpack_bytes(bytes, dest, opts)` as the
    sibling of the existing path-based `unpack`. Both go through a
    new private `unpack_archive_blocking` helper that takes a
    re-openable reader factory (`File::open(&path)` for unpack,
    `Cursor::new(bytes.as_slice())` for unpack_bytes), so neither
    public entry has to duplicate the FileToFile / IntoDirectory
    branches.

  - `detect_extraction_mode` now skips entries whose type is
    `is_dir()` when counting. The whole reason the REST handler
    needed its own detector was that shared counted every entry —
    the docker-cp `./` + `./etc/` + `./etc/hosts` tar tripped
    `count > 1` and got routed to IntoDirectory mode. Skipping
    pure-directory entries lands the single regular file at the
    user-supplied destination, matching docker semantics. Other
    non-content entries (Symlink, hardlink Link, sparse, char/block,
    fifo) DO still count toward `content_count`, so a tar carrying
    Regular + Symlink stays in directory mode and the symlink is
    not silently dropped — covering the previously-latent multi-
    entry footgun the REST-only detector had.

  - The FileToFile extraction path likewise tolerates synthetic
    directory entries: it walks past `is_dir()` headers before
    unpacking the first content entry. Confirmed via the detection
    pass that there is exactly one such entry, so there's no
    ambiguity about which one we land at `dest`.

  - REST `litebox.rs::copy_out` becomes a 7-line
    `boxlite_shared::tar::unpack_bytes(bytes.to_vec(),
    host_dst.to_path_buf(), UnpackContext { overwrite, mkdir_parents:
    true, force_directory: false }).await`. The 100-line inline
    helper, its stale "Extract a tar archive to a host directory"
    doc, and the unreachable "tar promised a single regular file
    but the second pass found none" branch all go away.

  - The four inline `tar_extract_tests` (single-file-to-dest,
    trailing slash, multi-file, overwrite refusal) move to shared
    as `unpack_bytes_*` tests where they actually exercise the
    helper, plus a new `unpack_bytes_regular_plus_symlink_uses_
    dir_mode_not_file_mode` to pin the symlink-not-dropped
    behavior the old REST-side detector could not have caught.
    Net diff for tests is positive — the shared coverage strictly
    supersets the deleted inline coverage.

Local copy_out (src/boxlite/src/litebox/box_impl.rs:582) inherits
the same detection refinement automatically. shared::tar::pack
doesn't emit synthetic ancestor-dir entries for single-file packs
(verified via `append_path_with_name` at shared/src/tar.rs:76), so
local behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@G4614 G4614 marked this pull request as ready for review June 2, 2026 10:25
@G4614 G4614 marked this pull request as draft June 2, 2026 10:29
The Codecov patch report on boxlite-ai#633 flagged 10 uncovered lines in
shared/tar.rs; most were `?` propagation paths in the new
`unpack_archive_blocking` core that fire only when the `open` closure
returns `Err`. The path-based `unpack` is the only entry point whose
closure can actually fail (the `unpack_bytes` Cursor never errors),
so one test against a nonexistent tar path covers them all:

  - the `map_err` closure body (formatting the named error)
  - the `?` after `detect_extraction_mode` (line 161)
  - the `?` inside `detect_extraction_mode` calling `open()?` (line 277)

Asserting on the error *message* (rather than just `is_err`) is the
load-bearing piece: callers that don't control the tar location
(e.g. local `copy_out` reading a tar staged by a peer process) need
the operation name AND the missing path so the operator can act,
not a generic spawn_blocking traceback.

Patch coverage 96.0% → 96.5%. The remaining 5 uncovered lines are
`tokio::JoinError` map_err arms (124 / 142 — spawn_blocking panic,
not reachable without panic injection), the symmetric second-pass
`open()?` failure inside FileToFile / IntoDirectory branches
(188 / 238 — a path-side race or a Cursor that can't fail), and
the malformed-tar-entry `continue` arm (282 — covering it ties the
test to the `tar` crate's specific header-decode failure modes).
Those are defensive paths that would require either panic injection
or brittle malformed-tar fixtures to hit; covering them would
trade one regression-guard for several Codecov-driven near-stubs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 marked this pull request as ready for review June 2, 2026 10:41
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 3, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR refactors tar extraction into a shared, generic routine that accepts a configurable reader opener, enabling consistent extraction behavior across on-disk and in-memory archives. REST copy_out now threads CopyOptions through the download step and delegates to the shared unpack_bytes routine, replacing local tar handling.

Changes

Shared tar extraction and REST integration

Layer / File(s) Summary
Shared tar unpacking refactoring
src/shared/src/tar.rs
Introduces unpack_archive_blocking as the common extraction path; refactors unpack and new unpack_bytes to route through it via spawn_blocking. Strengthens detect_extraction_mode to ignore synthetic directory entries, correctly routing single-file and docker-cp-style archives. Updates FileToFile and IntoDirectory extraction to work with the generic reader opener.
Shared tar extraction tests
src/shared/src/tar.rs
New tests validate unpack_bytes with docker-cp synthetic parents, single-file routing, multi-entry directory mode, symlink preservation, overwrite=false, and missing tar file error reporting.
REST copy_out integration
src/boxlite/src/rest/litebox.rs
copy_out now accepts CopyOptions, passes include_parent and follow_symlinks as query parameters to the REST /files endpoint, delegates extraction to unpack_bytes with UnpackContext from opts, and removes the local extract_tar_to_path helper.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

A tarball once lived in two separate ways,
on disk and in bytes, through REST's winding maze.
Now one unpacker guides them with care,
while RestBox threads choices through network air. 📦✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main fix: enabling single-file extraction to destination in the REST copy_out flow instead of always unpacking as a directory structure.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@cla-assistant
Copy link
Copy Markdown

cla-assistant Bot commented Jun 3, 2026

CLA assistant check
All committers have signed the CLA.

@cla-assistant
Copy link
Copy Markdown

cla-assistant Bot commented Jun 3, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

❌ gamnaansong
❌ DorianZheng


gamnaansong seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/shared/src/tar.rs (1)

215-237: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Logic error: overwrite=false check fires after creating the directory with mkdir_parents.

When dest doesn't exist, mkdir_parents=true creates it at line 218. Then line 232 checks dest.exists() && !opts.overwrite, which is now true — causing an error for a directory we just created. This breaks the mkdir_parents + overwrite=false combination for IntoDirectory mode.

The check should occur before potentially creating the directory:

🐛 Proposed fix: reorder the overwrite check
         ExtractionMode::IntoDirectory => {
+            if dest.exists() && !opts.overwrite {
+                return Err(BoxliteError::Storage(format!(
+                    "destination {} exists and overwrite=false",
+                    dest.display()
+                )));
+            }
             if !dest.exists() {
                 if opts.mkdir_parents {
                     std::fs::create_dir_all(dest).map_err(|e| {
                         BoxliteError::Storage(format!(
                             "failed to create destination {}: {}",
                             dest.display(),
                             e
                         ))
                     })?;
                 } else {
                     return Err(BoxliteError::Storage(format!(
                         "destination {} does not exist",
                         dest.display()
                     )));
                 }
             }
-            if dest.exists() && !opts.overwrite {
-                return Err(BoxliteError::Storage(format!(
-                    "destination {} exists and overwrite=false",
-                    dest.display()
-                )));
-            }
             let mut archive = tar::Archive::new(open()?);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/shared/src/tar.rs` around lines 215 - 237, The overwrite check is done
after creating the destination when handling ExtractionMode::IntoDirectory,
causing mkdir_parents=true to create dest then fail if opts.overwrite is false;
move the check that returns an error when dest.exists() && !opts.overwrite to
occur before any call to std::fs::create_dir_all(dest) so that the code first
refuses to proceed when an existing destination disallows overwrite (refer to
ExtractionMode::IntoDirectory, dest, opts.mkdir_parents, opts.overwrite and the
std::fs::create_dir_all call).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/shared/src/tar.rs`:
- Around line 215-237: The overwrite check is done after creating the
destination when handling ExtractionMode::IntoDirectory, causing
mkdir_parents=true to create dest then fail if opts.overwrite is false; move the
check that returns an error when dest.exists() && !opts.overwrite to occur
before any call to std::fs::create_dir_all(dest) so that the code first refuses
to proceed when an existing destination disallows overwrite (refer to
ExtractionMode::IntoDirectory, dest, opts.mkdir_parents, opts.overwrite and the
std::fs::create_dir_all call).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 0b518b0e-ae7b-40af-af80-9c2d0f0f9985

📥 Commits

Reviewing files that changed from the base of the PR and between 6802ca7 and 14dc56d.

📒 Files selected for processing (2)
  • src/boxlite/src/rest/litebox.rs
  • src/shared/src/tar.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants