
Add pull flow testing to validate SP piece retrieval from external sources #300

@rvagg

Description


DealBot should test the new SP "pull" pathway: the flow where an SP fetches a piece from an external URL rather than receiving it via direct upload. This is a critical path for multi-copy durability (where a secondary SP pulls from the primary or another source) and needs validation alongside existing data storage checks.

Background

The current data storage check tests the direct upload path: DealBot uploads a piece to an SP, then verifies onchain confirmation, discoverability, and retrieval. But the SDK's multi-copy durability flow uses a different Curio pathway where the SP is told to pull a piece from a URL. This path has distinct failure modes and performance characteristics that aren't currently covered.

Curio Pull Implementation

Curio exposes POST /pdp/piece/pull (see curio/pdp/handlers_pull.go). Key details:

Request format:

{
  "extraData": "0x...",
  "dataSetId": 0,
  "recordKeeper": "0x...",
  "pieces": [
    {
      "pieceCid": "baga...",
      "sourceUrl": "https://example.com/api/piece/baga..."
    }
  ]
}

Synapse should have utilities for this soon.
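In the meantime, a minimal sketch of issuing the request shown above. Only the POST /pdp/piece/pull path and body shape come from this issue (curio/pdp/handlers_pull.go); the type names and use of fetch are illustrative assumptions, not Synapse's eventual API:

```typescript
// Hypothetical client for Curio's pull endpoint. Field names mirror the
// request format shown above; `requestPull` and `curioBase` are made up.
interface PullPiece {
  pieceCid: string;
  sourceUrl: string;
}

interface PullRequestBody {
  extraData: string;
  dataSetId: number;
  recordKeeper: string;
  pieces: PullPiece[];
}

async function requestPull(curioBase: string, body: PullRequestBody): Promise<Response> {
  return fetch(`${curioBase}/pdp/piece/pull`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
}
```

Because the request is idempotent (see polling below), the same body can be re-sent to check status.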

There is some additional source URL validation (curio/pdp/pull_types.go):

  • Path must end in /piece/{pieceCid}
  • CID in path must match the pieceCid field in the request
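The two rules above can be mirrored client-side so DealBot rejects a bad source URL before Curio does. A sketch (the function name is hypothetical; the rules come from curio/pdp/pull_types.go as described above):

```typescript
// Client-side mirror of Curio's source URL validation: the path must
// end in /piece/{pieceCid}, and that CID must equal the request's
// pieceCid field.
function validateSourceUrl(sourceUrl: string, pieceCid: string): boolean {
  const path = new URL(sourceUrl).pathname;
  const parts = path.split("/").filter((p) => p.length > 0);
  // The last two path segments must be "piece" and the expected CID.
  return (
    parts.length >= 2 &&
    parts[parts.length - 2] === "piece" &&
    parts[parts.length - 1] === pieceCid
  );
}
```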

The API returns a per-piece status that progresses through pending => inProgress => retrying and terminates at complete or failed.

You can poll the API by repeating the same POST /pdp/piece/pull request: it's idempotent, keyed on (service, sha256(extraData), dataSetId, recordKeeper). The SP returns the current status for each piece, so you can hit it repeatedly and wait for success. This is built into synapse-core as a "waitFor" function.
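The real implementation is synapse-core's "waitFor"; as an illustration, the polling loop looks roughly like this (names and the `getStatuses` callback, which stands in for re-sending the idempotent POST, are assumptions):

```typescript
// Statuses from the progression described above.
type PullStatus = "pending" | "inProgress" | "retrying" | "complete" | "failed";

// Repeatedly fetch per-piece statuses until every piece reaches a
// terminal state (complete or failed), or the polling budget runs out.
async function waitForPull(
  getStatuses: () => Promise<PullStatus[]>,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<PullStatus[]> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const statuses = await getStatuses();
    if (statuses.every((s) => s === "complete" || s === "failed")) {
      return statuses;
    }
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("pull did not settle within the polling budget");
}
```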

Pull task mechanics (curio/tasks/pdp/task_pull_piece.go):

  • Background task polls every 10s for pending items
  • Downloads piece from source URL
  • Computes CommP and verifies against expected CID
  • Stores in StashStore, creates parked_pieces entry, StorePiece task then moves to "long-term" storage
  • Max 5 retries with exponential backoff (10s base, 2x factor, 5min cap)
  • 1 hour download timeout
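The retry schedule above implies the following delays; a small sketch (constants mirror the description of task_pull_piece.go in this issue, and whether the first retry waits the base delay or base × factor is an assumption on my part):

```typescript
// Retry delays for the pull task as described: 10s base, 2x factor,
// 5 minute cap, at most 5 retries.
function retryDelaysMs(
  maxRetries = 5,
  baseMs = 10_000,
  factor = 2,
  capMs = 300_000,
): number[] {
  return Array.from({ length: maxRetries }, (_, i) =>
    Math.min(baseMs * factor ** i, capMs),
  );
}
```

With these constants the cap is never reached within 5 retries: the delays are 10s, 20s, 40s, 80s, 160s.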

Discrete Testable Units

The pull flow decomposes into three independently testable operations:

  1. SP can receive + park an uploaded piece: already tested by data storage check
  2. SP can pull from a URL + park: new, this issue
  3. SP can do add-pieces on chain for a parked piece: already tested by data storage check

Unit 2 is the gap. Once a piece is parked via pull, the add-pieces flow is identical to direct upload.

Possible Approach: DealBot-Hosted Piece Endpoint

(From chat with @SgtPooki) DealBot hosts a temporary piece retrieval endpoint on the existing backend.

  • Serve via /api/piece/{pieceCid}. This already routes through Caddy to the Node backend (the Caddy container can't dynamically serve new assets, but /api/* forwards to the backend)
  • Generate the random piece, convert to CAR, compute pieceCID
  • Enable the endpoint for a specific pieceCID, limited time
  • Tell the SP to pull from https://dealbot.filoz.org/api/piece/{pieceCid}
  • Poll the SP for status until complete or failed
  • Clean up the endpoint + stored piece data after confirmation
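The "enable for a specific pieceCID, limited time" and "clean up" steps above amount to a small gate in front of the piece route. A minimal sketch (PieceGate and its method names are hypothetical; the actual Caddy/backend wiring is out of scope):

```typescript
// Tracks which pieceCIDs the temporary endpoint may serve, and until when.
class PieceGate {
  private enabled = new Map<string, number>(); // pieceCid -> expiry (ms epoch)

  // Enable serving a piece for a limited time.
  enable(pieceCid: string, ttlMs: number, now = Date.now()): void {
    this.enabled.set(pieceCid, now + ttlMs);
  }

  // Decide whether GET /api/piece/{pieceCid} should serve bytes.
  shouldServe(pieceCid: string, now = Date.now()): boolean {
    const expiry = this.enabled.get(pieceCid);
    return expiry !== undefined && now < expiry;
  }

  // Clean up after confirmation (or on expiry).
  revoke(pieceCid: string): void {
    this.enabled.delete(pieceCid);
  }
}
```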

Lastly, we want to verify the piece is actually there. This is tricky: the SP can claim it has the piece, but do we believe it? We have two options: download the piece back from the SP and check that the bytes match the original (hash or plain byte compare), or proceed to AddPieces and make the SP prove it, which gives a level of assurance that's probably acceptable.
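The first option, comparing the retrieved bytes to the original, could look like this (the function name is made up; comparing SHA-256 digests rather than raw bytes is a design choice, so DealBot only needs to retain a digest of the original piece, not the piece itself):

```typescript
import { createHash } from "node:crypto";

// Verify the piece retrieved back from the SP matches the original
// by comparing SHA-256 digests.
function sameBytes(original: Uint8Array, retrieved: Uint8Array): boolean {
  const digest = (b: Uint8Array) => createHash("sha256").update(b).digest("hex");
  return digest(original) === digest(retrieved);
}
```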

An alternative flow exists where we use an existing SP and do a proper SP-to-SP pull. We wouldn't even need to do an AddPieces; we'd just use a second SP as a staging ground and expect it to GC the piece. But the DealBot-hosted endpoint has a few advantages over this:

  • No ambiguity: if the pull fails, the SP is the problem, not the source
  • No dependency on another SP being healthy
  • Tighter metrics: DealBot can measure time-to-first-byte and throughput from its own side

Metrics

We could add new metrics for pull checks; some ideas:

  • pullRequestMs: DealBot => SP pull request latency (initial POST)
  • pullCompletionMs: Pull request => SP reports complete
  • pullFirstByteMs: Time from SP connecting to DealBot endpoint to first byte served (if DealBot-hosted)
  • pullThroughputBps: Bytes served / time (if DealBot-hosted)
  • pullStatus: Counter with a value label: complete, failed, retrying (would need deeper control than the higher-level Synapse APIs offer)
