When Piri receives a UCAN invocation to allocate a replica, here is what happens:
1. The node receives the transfer invocation and creates an async task to allocate a replica:
piri/pkg/service/storage/ucan/replica_allocate.go
Lines 128 to 137 in c50ea6b
2. The transfer request is enqueued into a replication job queue:
piri/pkg/service/replicator/replicate.go
Lines 86 to 88 in c50ea6b
2.a. Note: this queue is configured to retry the request up to 10 times before giving up:
piri/pkg/service/replicator/replicate.go
Line 63 in c50ea6b
2.b. This queue allows as many parallel transfer requests as the node has CPUs:
piri/pkg/service/replicator/replicate.go
Line 64 in c50ea6b
3. When the node has capacity to execute the transfer request, the `Transfer` method is called:
piri/pkg/service/replicator/replicate.go
Lines 69 to 82 in c50ea6b
4. The `Transfer` method pulls the data from the source node it is replicating from:
piri/pkg/service/storage/handlers/replica/transfer.go
Lines 135 to 172 in c50ea6b
4.a. If any part of this transfer fails due to network issues, `Transfer` returns an error, requeueing the transfer task. This is good: we want to make sure we pull the data we are supposed to replicate.
5. Once the transfer of data is complete, the node creates a receipt for the transfer invocation and fires it off to the upload service:
piri/pkg/service/storage/handlers/replica/transfer.go
Lines 174 to 235 in c50ea6b
Problems arise at step 5.
Once the node successfully performs the transfer and accepts the blob (`blob/accept`), the piece is stored in Piri. However, if a failure occurs while accepting the blob or while issuing the receipt to the upload service, an error is returned and the entire transfer operation starts over. This can lead to multiple redundant transfers when post-transfer failures occur. We actually saw this play out in our replication tests: some nodes repeatedly allocated the same blob multiple times when communicating with the upload service.
This is a classic idempotency problem. The transfer operation could instead be made idempotent: if the data has already been transferred successfully, we shouldn't transfer it again just because the blob accept or the receipt delivery failed.
I think there are (at least) two issues here:

1. `blobhandler.Accept` is not idempotent: when called N times with the same blob it will create N allocations. I believe this is intended by design.
2. Transfers (moving the bytes between nodes) may occur multiple times during failure cases.

Together these issues compound into a larger problem, e.g.:

- First attempt: transfer succeeds, accept succeeds, but receipt sending fails.
- Retry:
  - Transfer happens AGAIN (wasteful but succeeds)
  - Accept is called AGAIN (might fail or create duplicates)
  - Receipt sending is attempted again
The goal here is to discuss solutions to these issues.
- Should `blobhandler.Accept` be made idempotent, such that it doesn't actually "store" the same blob twice?
- How can the entire `Transfer` operation be made idempotent, such that it doesn't perform repeated transfers between nodes and doesn't allocate blobs that have already been allocated?