DealBot should test the new SP "pull" pathway: the flow where an SP fetches a piece from an external URL rather than receiving it via direct upload. This is a critical path for multi-copy durability (where a secondary SP pulls from the primary or another source) and needs validation alongside existing data storage checks.
Background
The current data storage check tests the direct upload path: DealBot uploads a piece to an SP, then verifies onchain confirmation, discoverability, and retrieval. But the SDK's multi-copy durability flow uses a different Curio pathway where the SP is told to pull a piece from a URL. This path has distinct failure modes and performance characteristics that aren't currently covered.
Curio Pull Implementation
Curio exposes POST /pdp/piece/pull (see curio/pdp/handlers_pull.go). Key details:
Request format:
```json
{
  "extraData": "0x...",
  "dataSetId": 0,
  "recordKeeper": "0x...",
  "pieces": [
    {
      "pieceCid": "baga...",
      "sourceUrl": "https://example.com/api/piece/baga..."
    }
  ]
}
```

Synapse should have utilities for this (soon):
- feat(synapse-core): add SP-to-SP piece pull functionality (synapse-sdk#544): the synapse-core piece to interact with this API
- feat(storage): multi-copy upload with store->pull->commit flow (synapse-sdk#593): the rest of the picture, which also adds `pull()` to `StorageContext`, wrapping it all up
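Until those SDK utilities land, the request above can be issued directly. A minimal TypeScript sketch, where the SP base URL and auth are left out and `buildPullRequest` is a hypothetical helper (the field names mirror the request format shown above):

```typescript
// Shape of the POST /pdp/piece/pull body, mirroring the request format above.
interface PullPiece {
  pieceCid: string;   // e.g. "baga..."
  sourceUrl: string;  // must end in /piece/{pieceCid}
}

interface PullRequest {
  extraData: string;     // 0x-prefixed hex blob
  dataSetId: number;
  recordKeeper: string;  // 0x-prefixed address
  pieces: PullPiece[];
}

// Hypothetical helper to assemble the body.
function buildPullRequest(
  extraData: string,
  dataSetId: number,
  recordKeeper: string,
  pieces: PullPiece[],
): PullRequest {
  return { extraData, dataSetId, recordKeeper, pieces };
}

// Issue (or re-issue, when polling) the pull request against the SP.
async function requestPull(spBaseUrl: string, body: PullRequest): Promise<unknown> {
  const res = await fetch(`${spBaseUrl}/pdp/piece/pull`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`pull request failed: ${res.status}`);
  return res.json();
}
```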
There is some additional source URL validation (curio/pdp/pull_types.go):
- Path must end in `/piece/{pieceCid}`
- CID in path must match the `pieceCid` field in the request
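These two checks are cheap to replicate on the DealBot side before submitting the request, e.g. with a small (hypothetical) helper:

```typescript
// Mirror Curio's source-URL validation: the URL path must end in
// /piece/{pieceCid}, and that trailing CID must equal the pieceCid field.
function isValidSourceUrl(sourceUrl: string, pieceCid: string): boolean {
  let path: string;
  try {
    path = new URL(sourceUrl).pathname;
  } catch {
    return false; // not a parseable URL at all
  }
  const match = path.match(/\/piece\/([^/]+)$/);
  return match !== null && match[1] === pieceCid;
}
```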
The API returns per-piece status with a progression of: pending => inProgress => retrying => complete => failed
You can poll the API by repeating the same POST /pdp/piece/pull request (it's idempotent via the (service, sha256(extraData), dataSetId, recordKeeper) key). The SP returns the current status for each piece, so you can hit it repeatedly and wait for success. This is built into synapse-core as a "waitFor" function.
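Until synapse-core's "waitFor" helper is available, that polling loop could look roughly like this. The response shape and the `waitForPullSettled` name are assumptions; the status values are the progression listed above:

```typescript
type PieceStatus = "pending" | "inProgress" | "retrying" | "complete" | "failed";

// Assumed response shape for the idempotent re-POST: per-piece status.
interface PullStatusResponse {
  pieces: { pieceCid: string; status: PieceStatus }[];
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-POST the same pull request (via `poll`) until every piece settles
// as either "complete" or "failed", or the deadline passes.
async function waitForPullSettled(
  poll: () => Promise<PullStatusResponse>,
  intervalMs = 10_000,
  timeoutMs = 60 * 60_000,
): Promise<PullStatusResponse> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await poll();
    const settled = status.pieces.every(
      (p) => p.status === "complete" || p.status === "failed",
    );
    if (settled) return status;
    if (Date.now() >= deadline) throw new Error("timed out waiting for pull");
    await sleep(intervalMs);
  }
}
```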
Pull task mechanics (curio/tasks/pdp/task_pull_piece.go):
- Background task polls every 10s for pending items
- Downloads piece from source URL
- Computes CommP and verifies against expected CID
- Stores in StashStore, creates parked_pieces entry, StorePiece task then moves to "long-term" storage
- Max 5 retries with exponential backoff (10s base, 2x factor, 5min cap)
- 1 hour download timeout
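For reference, the retry schedule implied by those parameters (10s base, 2x factor, 5min cap) works out as follows. This is an illustration of the documented numbers, not Curio's actual code:

```typescript
// Delay before retry `attempt` (0-indexed): base * factor^attempt, capped.
function retryDelaySeconds(attempt: number): number {
  const baseSeconds = 10;
  const factor = 2;
  const capSeconds = 300; // 5 minute cap
  return Math.min(baseSeconds * Math.pow(factor, attempt), capSeconds);
}

// Attempts 0..4 => 10, 20, 40, 80, 160 seconds; the 5min cap is never
// actually reached within the 5-retry budget.
```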
Discrete Testable Units
The pull flow decomposes into three independently testable operations:
- SP can receive + park an uploaded piece: already tested by data storage check
- SP can pull from a URL + park: new, this issue
- SP can do add-pieces on chain for a parked piece: already tested by data storage check
Unit 2 is the gap. Once a piece is parked via pull, the add-pieces flow is identical to direct upload.
Possible Approach: DealBot-Hosted Piece Endpoint
(From chat with @SgtPooki) DealBot hosts a temporary piece retrieval endpoint on the existing backend.
- Serve via `/api/piece/{pieceCid}`. This routes through Caddy to the Node backend already (the Caddy container can't dynamically serve new assets, but `/api/*` forwards to the backend)
- Generate a random piece, convert it to CAR, compute the pieceCID
- Enable the endpoint for a specific pieceCID, for a limited time
- Tell the SP to pull from https://dealbot.filoz.org/api/piece/{pieceCid}
- Poll the SP for status until complete or failed
- Clean up the endpoint + stored piece data after confirmation
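A sketch of what that endpoint could look like on the Node backend, using plain node:http; the in-memory registry, the TTL handling, and the helper names (`enablePiece`, `lookupPiece`) are all assumptions, not existing DealBot code:

```typescript
import { createServer } from "node:http";

// Pieces currently enabled for pull, keyed by pieceCID, with an expiry.
const activePieces = new Map<string, { bytes: Buffer; expiresAt: number }>();

// Enable serving a piece at /api/piece/{pieceCid} for a limited time.
function enablePiece(pieceCid: string, bytes: Buffer, ttlMs: number): void {
  activePieces.set(pieceCid, { bytes, expiresAt: Date.now() + ttlMs });
}

// Return the piece bytes if the entry exists and hasn't expired.
function lookupPiece(pieceCid: string, now = Date.now()): Buffer | undefined {
  const entry = activePieces.get(pieceCid);
  if (!entry || now > entry.expiresAt) {
    activePieces.delete(pieceCid); // lazy cleanup of expired entries
    return undefined;
  }
  return entry.bytes;
}

const server = createServer((req, res) => {
  const match = req.url?.match(/^\/api\/piece\/([^/]+)$/);
  const bytes = match ? lookupPiece(decodeURIComponent(match[1])) : undefined;
  if (!bytes) {
    res.writeHead(404).end();
    return;
  }
  res.writeHead(200, { "Content-Type": "application/octet-stream" }).end(bytes);
});
```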
Lastly, we want to verify the piece is actually there. This is tricky because the SP can claim it has the piece, but do we believe it? We have two options: download the piece back from the SP and check that the bytes match the original (hash or direct byte comparison), or proceed to AddPieces and make the SP prove possession, which gives a level of assurance that's probably acceptable.
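The download-and-compare option is simple to express; a sketch (the actual retrieval from the SP is out of scope here, and `pieceMatchesOriginal` is a hypothetical name):

```typescript
import { createHash } from "node:crypto";

// Compare the piece retrieved from the SP against the original bytes.
// A sha256 digest comparison is equivalent to a full byte compare and
// gives a compact value to log alongside the check result.
function pieceMatchesOriginal(original: Buffer, retrieved: Buffer): boolean {
  const digest = (b: Buffer) => createHash("sha256").update(b).digest("hex");
  return original.length === retrieved.length && digest(original) === digest(retrieved);
}
```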
An alternative flow exists where we could use an existing SP and do a proper SP-to-SP pull. We wouldn't even need to do an AddPieces: we'd just use a second SP as a staging ground and expect it to GC the piece. But the DealBot-hosted approach has a few advantages over this:
- Clear attribution: if the pull fails, the SP under test is the problem, not the source
- No dependency on another SP being healthy
- Tighter metrics: DealBot could measure time-to-first-byte and throughput from its own side
Metrics
We could add new metrics for pull checks, some ideas:
- `pullRequestMs`: DealBot => SP pull request latency (initial POST)
- `pullCompletionMs`: Pull request => SP reports complete
- `pullFirstByteMs`: Time from SP connecting to DealBot endpoint to first byte served (if DealBot-hosted)
- `pullThroughputBps`: Bytes served / time (if DealBot-hosted)
- `pullStatus`: Counter with value label: `complete`, `failed`, `retrying` (would need deeper control than the higher-level Synapse APIs)