feat: add receipts to the EgressTrackingService on retrieval #258
Conversation
frrist
left a comment
I'd like to see us move the record keeping of receipts closer to where they are produced. wdyt?
```go
func withReceiptLogger(ets *egresstracking.EgressTrackingService) ucanretrieval.Option {
	return ucanretrieval.WithReceiptLogger(func(ctx context.Context, rcpt receipt.AnyReceipt, inv invocation.Invocation) {
```
While this option is convenient I am unsure this is the right usage for it. Would we be open to performing this record keeping in the ucan handler instead? i.e. https://github.com/storacha/piri/blob/main/pkg/service/retrieval/ucan/space_content_retrieve.go#L32
This would provide us more control of which receipts get added to the batch. Based on my read of things right now, we'll be appending all receipts produced for a retrieval, which includes failures?
I agree it is a bit weird this logic is so far from the actual handler. However, the issue is that receipts are not issued in the handlers, but in the ucanto server, i.e. handlers don't have access to them.
I can assure you this is exactly the right usage for WithReceiptLogger because I implemented it specifically for this 😄. I agree that appending failure receipts might not be great. We can probably pass more information to the receipt logging function from the ucanto server so that it can make better decisions. I'll revisit the current implementation.
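To illustrate the filtering idea, here is a minimal, self-contained Go sketch. The types (`Receipt`, `EgressTracker`) and field names are stand-ins invented for illustration; the real `receipt.AnyReceipt` and `EgressTrackingService` APIs will differ.

```go
package main

import "fmt"

// Stand-in types approximating the shapes in the PR; the real
// receipt.AnyReceipt and EgressTrackingService APIs may differ.
type Receipt struct {
	Capability string // ability invoked, e.g. "space/content/retrieve"
	IsError    bool   // whether the result in the receipt is an error
}

type EgressTracker struct{ recorded []Receipt }

func (e *EgressTracker) AddReceipt(r Receipt) { e.recorded = append(e.recorded, r) }

// receiptLogger records only successful retrieval receipts, skipping
// failures and receipts for other capabilities.
func receiptLogger(ets *EgressTracker) func(Receipt) {
	return func(rcpt Receipt) {
		if rcpt.Capability != "space/content/retrieve" || rcpt.IsError {
			return // no egress to bill for failed or unrelated invocations
		}
		ets.AddReceipt(rcpt)
	}
}

func main() {
	ets := &EgressTracker{}
	logr := receiptLogger(ets)
	logr(Receipt{Capability: "space/content/retrieve", IsError: false})
	logr(Receipt{Capability: "space/content/retrieve", IsError: true})
	logr(Receipt{Capability: "ucan/conclude", IsError: false})
	fmt.Println(len(ets.recorded)) // only the successful retrieval is kept
}
```

The point is only that, given enough information in the callback, failure receipts can be dropped before they ever reach the batch.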
I thought this was basically a receipt?
https://github.com/storacha/piri/blob/main/pkg/service/retrieval/ucan/space_content_retrieve.go#L88
oh, that is the result, which is part of the receipt. But the receipt contains some other stuff required for the UCAN flow (see https://github.com/storacha/go-ucanto/blob/main/core/receipt/datamodel/receipt.ipldsch).
The handler handles the request and produces a result, and the ucanto server takes care of issuing a proper receipt: adding the other bits, doing the signing, etc. If we want the receipt exactly as it is sent to the client, we need to hijack the server process to fetch it, and that is what I did here by letting the user pass a callback that the server invokes right before sending the receipt back to the client.
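The split between handler and server described above can be mocked in a few lines. Everything here is illustrative (the real ucanto server flow is more involved, and these names are invented), but it shows where the callback hook sits:

```go
package main

import "fmt"

// Simplified mock of the flow described above: the handler returns only a
// result; the server wraps it into a signed receipt and invokes the logger
// callback just before responding. All names are illustrative.
type Result struct{ Ok string }

type SignedReceipt struct {
	Out Result
	Iss string // issuer DID added by the server
	Sig string // server signature over the outcome
}

type Server struct {
	handler       func() Result
	receiptLogger func(SignedReceipt)
}

func (s *Server) Handle() SignedReceipt {
	out := s.handler() // the handler never sees the receipt
	rcpt := SignedReceipt{Out: out, Iss: "did:web:example", Sig: "sig(...)"}
	if s.receiptLogger != nil {
		s.receiptLogger(rcpt) // hook invoked before the receipt is sent
	}
	return rcpt
}

func main() {
	var seen []SignedReceipt
	srv := &Server{
		handler:       func() Result { return Result{Ok: "bytes served"} },
		receiptLogger: func(r SignedReceipt) { seen = append(seen, r) },
	}
	srv.Handle()
	fmt.Println(len(seen)) // the logger saw the finished receipt
}
```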
```go
		if err := ets.AddReceipt(context.Background(), retrievalRcpt); err != nil {
			log.Errorw("adding receipt to egress tracking service", "error", err)
			return
		}
	}()
```
I liked this better when we were passing the context from the caller here 😄 can we add it back?
I know, right? The issue is that the request context is cancelled once the request is fully handled. Since this is done asynchronously near the end of the request flow, if adding the receipt causes a new egress track task to be added to the job queue, the context it gets is likely to be cancelled already when the worker gets to it. During my testing I got "context cancelled" errors from the tasks in the queue.
All in all, I think it makes sense for this not to be tied to the request context, as it is a somewhat independent action. Also, since the task will be retried a number of times and then dropped, I didn't see much value in setting a timeout. WDYT?
My concern is that spawning a goroutine per request with context.Background() effectively turns the Go scheduler into an unbounded queue. Under load this can create memory pressure, especially since there’s no back-pressure on the caller and AddReceipt does disk I/O.
Because net/http already runs each request in its own goroutine, I think it’s reasonable to block on recording the receipt. If we serve the data without recording it, the operator doesn’t get compensated, so it may be better not to serve it at all. If AddReceipt ever proves too heavy, the safer alternative would be a bounded worker pool.
Not blocking, just leaving this note for future reference.
I agree with your concerns. A bounded pool or any other rate limiting mechanism would be the way to ensure there is some back-pressure on clients, but happy to remove the goroutine in the meantime.
About making the request fail if adding the receipt fails, there is a problem with the current implementation. AddReceipt enqueues a task in a jobqueue if there is a new batch to send, so this is happening asynchronously. If we think the client should not get its content delivered if recording the receipt fails (which is a fair point), then we shouldn't use the jobqueue and just implement a synchronous version of the task.
I created #263 to help keep this in mind.
For now let's remove the goroutine.
As far as receipt recording failures:
- If the node is unable to append a receipt to the store (due to lack of space, failure to rotate, etc.), that's when I'd want to abort serving the data for the request, since the node is unable to record the content it served for payment.
- Failing to report a batch to the egress service is also not ideal, but I view it as a less critical error, since a record of the content served will still exist on the filesystem, and manual intervention could be taken to "re-report" it for payment.
Merged commit d9298af into vic/feat/egress-tracking-service
Ref: #174

Implement an `EgressBatchStore` that allows appending `space/content/retrieve` receipts to a batch. When the batch is above the max, it is flushed automatically.

Things that are missing and will be implemented in follow up PRs:
- sending the batch to the egress tracking service in a `space/egress/track` invocation
- exposing an `http.FileSystem` and adding the corresponding `http.FileServer` endpoint

Once these are done, I'll hook it up in Alan's `space/content/retrieve` handler.

Co-authored-by: ash <alan@storacha.network>
Ref: #174
Depends on: #245 and #246
Attach a `ReceiptLogger` to the `space/content/retrieve` handler to collect retrieval receipts and add them to the `EgressTrackingService` to be stored, batched and sent in `space/egress/track` invocations.