Add an option to limit scope of an upload run #381

@Peeja

Description

(Description adapted from a Claude plan.)

Context

A customer has a very large dataset and wants to upload it in smaller chunks (subdirectories) while still ending up with a single root CID. They could upload the pieces as separate uploads, but even within a space, each upload has its own shards and its own scans, so that wouldn't help. Instead, this approach works within a single upload, scoping each run to a subdirectory, then doing a final full run that builds the root.

How It Works

This builds on #336, in which --assume-unchanged-sources is extended to work per FSEntry, not just at the root as a whole.

# Run 1: Upload only subdir1
guppy upload --only subdir1/ --assume-unchanged-sources

# Run 2: Upload only subdir2
guppy upload --only subdir2/ --assume-unchanged-sources

# Final run: Complete the upload (builds root, calls upload/add)
guppy upload --assume-unchanged-sources

Each --only run:

  1. Scans only the targeted subtree (walks FS from subdir1/ instead of "."; skips things already scanned, if --assume-unchanged-sources is used)
  2. DAG-scans only the files in that subtree (DAGScans created only for new FSEntries)
  3. Shards, indexes, and uploads those nodes to the network
  4. Skips upload/add because the root CID isn't known yet

The final run (no --only):

  1. Scans from "." (again, skips things already scanned via SkipEntry, if --assume-unchanged-sources is used)
  2. Creates FSEntries/DAGScans only for the root directory and any new top-level files
  3. Root directory's DAGScan completes because HasIncompleteChildren finds all children's DAGScans already have CIDs from previous runs
  4. Shards, indexes, and uploads only the new node(s)
  5. Calls upload/add with ALL shards (from all runs — they're all the same upload)

Note that --assume-unchanged-sources is optional in each case, but useful if there's a lot of data to scan, which is generally when you'd use this.

Implementation details from Claude's plan

(I haven't closely reviewed this yet; it's mostly here to keep track of it. Feel free to ignore it while considering the proposal.)

Implementation

1. Add --only <path> flag to upload command

File: cmd/upload/root.go (or wherever upload start is defined)

Add a --only string flag that specifies a subdirectory path relative to the source root. Pass it through to the upload execution.
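As a rough illustration of the flag's shape, here is a minimal sketch using only the standard library flag package; guppy's actual CLI framework and wiring may differ, and normalizeOnly is a hypothetical helper for making "subdir1/" and "subdir1" equivalent before passing the value down to the scan layer:

```go
package main

import (
	"flag"
	"fmt"
	"path/filepath"
	"strings"
)

// normalizeOnly cleans the --only value into a path relative to the
// source root, returning "." when the flag is unset (full upload).
// Hypothetical helper; the real CLI wiring may differ.
func normalizeOnly(only string) string {
	if only == "" {
		return "."
	}
	// Trim a trailing separator so "subdir1/" and "subdir1" are equivalent.
	return filepath.Clean(strings.TrimSuffix(only, "/"))
}

func main() {
	only := flag.String("only", "", "limit this run to a subdirectory of the source root")
	flag.Parse()
	fmt.Println(normalizeOnly(*only))
}
```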

2. Scope the scan to the subtree

File: pkg/preparation/scans/scans.go

Change executeScan to accept an optional subtree path. Instead of always passing "." to WalkDir, pass the subtree path:

func (a API) executeScan(ctx context.Context, upload *uploadmodel.Upload, subtree string, fsEntryCb func(model.FSEntry) error) (model.FSEntry, error) {
    fsys, err := a.SourceAccessor(ctx, upload.SourceID())
    if err != nil {
        return nil, err
    }
    root := "."
    if subtree != "" {
        root = subtree
    }
    fsEntry, err := a.WalkerFn(fsys, root, visitor.NewScanVisitor(...))
    if err != nil {
        return nil, err
    }
    return fsEntry, nil
}

Key detail: When subtree is set, ExecuteScan should not set rootFSEntryID on the upload, because the returned FSEntry is for the subtree root, not the source root. Only set rootFSEntryID when doing a full scan (no --only).
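The key detail above can be sketched in isolation. The types and method names below are illustrative stand-ins for the real models in pkg/preparation, not the actual API:

```go
package main

import "fmt"

// Minimal stand-in for the real upload model; names are illustrative only.
type Upload struct {
	rootFSEntryID string // empty means unset
}

func (u *Upload) SetRootFSEntryID(id string) { u.rootFSEntryID = id }
func (u *Upload) HasRootFSEntryID() bool     { return u.rootFSEntryID != "" }

// recordScanRoot captures the rule: only a full scan (subtree == "") may
// set the upload's root FS entry, because a subtree scan returns the
// subtree's entry, not the source root's.
func recordScanRoot(u *Upload, subtree, scannedEntryID string) {
	if subtree != "" {
		return // partial run: leave rootFSEntryID unset
	}
	u.SetRootFSEntryID(scannedEntryID)
}

func main() {
	u := &Upload{}
	recordScanRoot(u, "subdir1", "entry-123")
	fmt.Println(u.HasRootFSEntryID()) // false: partial run leaves it unset
	recordScanRoot(u, "", "entry-root")
	fmt.Println(u.HasRootFSEntryID()) // true: full scan sets it
}
```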

3. Skip upload/add when root CID is not set

File: pkg/preparation/uploads/uploads.go, runPostProcessShardWorker finalize step

Currently AddStorachaUploadForUpload is always called in the finalize step. Change to skip it when rootCID is unset:

// finalize
func() error {
    upload, err := api.Repo.GetUploadByID(ctx, uploadID)
    if err != nil { return err }
    if upload.RootCID() == cid.Undef {
        log.Infow("Skipping upload/add: root CID not yet set (partial upload)", "upload", uploadID)
        return nil
    }
    return api.AddStorachaUploadForUpload(ctx, uploadID, spaceDID)
}

4. Skip root CID finalization when subtree-only

File: pkg/preparation/uploads/uploads.go, runDAGScanWorker finalize step

The finalize step currently sets the root CID by looking up CIDForFSEntry(upload.RootFSEntryID()). When rootFSEntryID is unset (partial run), skip this:

// finalize
func() error {
    upload, err := api.Repo.GetUploadByID(ctx, uploadID)
    if err != nil { return err }
    if !upload.HasRootFSEntryID() {
        log.Infow("Skipping root CID finalization: no root FS entry (partial upload)", "upload", uploadID)
        close(nodeUploadsAvailable)
        return nil
    }
    // ... existing root CID logic ...
}

5. Handle the --assume-unchanged-sources check

File: pkg/preparation/uploads/uploads.go, runScanWorker

runScanWorker currently skips the entire scan when HasRootFSEntryID() is true. With --only, rootFSEntryID isn't set after partial runs, so the check already does the right thing: it runs the scan because there's no root entry yet.

On the final full run, rootFSEntryID is still unset (from partial runs), so the scan runs. Combined with --assume-unchanged-sources, SkipEntry skips already-scanned subdirectories.

No changes needed here.

6. Pass subtree path through the API

Files:

  • pkg/preparation/uploads/uploads.go: ExecuteUpload and the API struct need a Subtree string field
  • pkg/preparation/scans/scans.go: the API struct or ExecuteScan needs the subtree parameter
  • Thread --only flag value from CLI → upload API → scan API
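The threading can be pictured with a toy version of the two API layers. The struct shapes below are assumptions for illustration; the real API structs in pkg/preparation differ:

```go
package main

import "fmt"

// Toy scan layer: the subtree parameter decides where the walk starts.
type ScanAPI struct{}

func (ScanAPI) ExecuteScan(subtree string) string {
	root := "."
	if subtree != "" {
		root = subtree
	}
	return "walking from " + root
}

// Toy upload layer: carries the --only value down to the scan layer.
type UploadAPI struct {
	Scans   ScanAPI
	Subtree string // populated from the --only flag
}

func (u UploadAPI) ExecuteUpload() string { return u.Scans.ExecuteScan(u.Subtree) }

func main() {
	fmt.Println(UploadAPI{Subtree: "subdir1"}.ExecuteUpload())
	fmt.Println(UploadAPI{}.ExecuteUpload())
}
```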

Files to Modify

| File | Change |
| --- | --- |
| cmd/upload/root.go | Add --only flag |
| pkg/preparation/scans/scans.go | Accept optional subtree path, pass to WalkDir |
| pkg/preparation/uploads/uploads.go | Thread subtree through; skip root CID finalization and upload/add for partial runs |
| pkg/preparation/storacha/storacha.go | (Maybe) Make AddStorachaUploadForUpload gracefully handle missing root CID |

Edge Cases

  • Overlapping subtrees: If the user runs --only a/ and then --only a/b/, the second run would find existing FSEntries for a/b/ and its contents via FindOrCreate. DAGScans already exist too. This is safe — duplicate creation is idempotent.
  • Subtree doesn't exist: Walker would error on stat. Standard filesystem error handling.
  • Running final without all subtrees: Works fine — HasIncompleteChildren would block the root directory's DAGScan until all children are complete. The user would need to run again after uploading missing subtrees. (Or: the pipeline would process whatever's complete and skip directories with incomplete children, just like today.)
  • Concurrent partial runs: Two --only runs on different subtrees could run concurrently. Shard creation is already per-upload with open shard tracking. Concurrent runs could contend on the same open shard — this is an existing concern, not new. Could document as "run subtrees sequentially."
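Why overlapping subtrees are safe can be shown with a toy find-or-create over an in-memory map: re-scanning a path returns the existing entry instead of creating a duplicate. The real repo is database-backed and its API differs; this only illustrates the idempotency argument:

```go
package main

import "fmt"

// Toy repository keyed by path; stand-in for the real FSEntry store.
type repo struct{ entries map[string]int }

// FindOrCreateFSEntry returns the existing entry ID for a path, creating
// one only on first sight. A second run over the same subtree therefore
// reuses entries rather than duplicating them.
func (r *repo) FindOrCreateFSEntry(path string) (id int, created bool) {
	if id, ok := r.entries[path]; ok {
		return id, false
	}
	id = len(r.entries) + 1
	r.entries[path] = id
	return id, true
}

func main() {
	r := &repo{entries: map[string]int{}}
	a, created := r.FindOrCreateFSEntry("a/b/file.txt") // first run: --only a/
	fmt.Println(a, created)                             // 1 true
	b, created := r.FindOrCreateFSEntry("a/b/file.txt") // later run: --only a/b/
	fmt.Println(b, created)                             // 1 false
}
```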

Verification

  1. Create a test directory: mkdir -p testdata/{a,b,c} with files in each
  2. Add source: guppy upload source add test ./testdata
  3. Run partial: guppy upload start --only a/ --assume-unchanged-sources
    • Verify: FSEntries and DAGScans created only for a/
    • Verify: Shards created and uploaded
    • Verify: No upload/add call (no root CID)
  4. Run partial: guppy upload start --only b/ --assume-unchanged-sources
    • Same verifications
  5. Run full: guppy upload start --assume-unchanged-sources
    • Verify: Only root dir and c/ are scanned (a/ and b/ skipped)
    • Verify: Root CID is set
    • Verify: upload/add is called with shards from all runs
  6. Run make test

Depends On

  • feat/faster-scans branch (for SkipEntry / per-entry scan skipping)
