-
Notifications
You must be signed in to change notification settings - Fork 240
Operation handling for reliable receipts #1637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
New callback that can be called to insert a number of operation updates reliably Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
|
Just thought of something different, instead of using a |
pkg/core/operation.go
Outdated
|
|
||
| type OperationCallbacks interface { | ||
| OperationUpdate(update *OperationUpdate) | ||
| BulkOperationUpdates(ctx context.Context, updates []*OperationUpdate, onCommit chan<- error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think passing a channel might be a little sticky. We've gone back and forth on this a few times in other code interfaces across the FireFly ecosystem.
My feeling is a callback function is much easier to pass in, and you then leave the caller the responsibility to decide if they want that callback to use a channel or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! #1637 (comment) I got to the same conclusion after reading more code recently and that is what I'm reworking 💪🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool - I think maybe with the discussion below around making the interface synchronous, this might become a moot point
internal/operations/manager.go
Outdated
| } | ||
| log.L(om.ctx).Debugf("%s updating operation %s status=%s%s", update.Plugin, update.NamespacedOpID, update.Status, errString) | ||
| } | ||
| om.updater.SubmitBulkOperationUpdates(ctx, updates, onCommit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is it that absolutely guarantees that this list of receipts in the order they were submitted, will be committed before any other receipts submitted to this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very insightful, I realize that today that is up to the consumer to call this function only once at a time so this function should handle the case and not allow that to happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 - with the other discussions to make it a synchronous interface, I think you could just handle this as a comment on the API doc:
An insertion ordering guarantee is only provided when this code is called on a single goroutine inside of the connector. It is the responsibility of the connector code to allocate that routine, and ensure that there is only one.
| // Notice how this is not using the workers | ||
| // The reason is because we want for all updates to be stored at once in this order | ||
| // If offloaded into workers the updates would be processed in parallel, in different DB TX and in a different order | ||
| go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think this is the fundamental question of design in this PR, but I'm unclear currently the way this solves it.
This go routine just gets fired off - who's responsibility is it, and exactly how, to do ordering against the API in a way that means future batches are extremely strictly ordered against completion of this batch.
Fundamentally - isn't this over complicated? Shouldn't we just make it synchronous and use the goroutine of the caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about making it synchronous and wanted the connector to be able to wait for a notification back but you are right if we are treating this API as a Persist and wait until done in the connector this should absolutely be synchronous
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think that could be a big simplification overall to this. Push the responsibility to the connector to:
- Manage the goroutine that is the receipt writer
- Perform suitable performance batching into this code
- Handle all errors/retries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that simplifies massively this code to a synchronous bulk write of receipts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a caveat to the above with is doBatchUpdateWithRetry retries indefinitely, I think we need to break this if we are making it synchronous
| } | ||
| } else { | ||
| // Less than the max inserts, just do it in one go | ||
| err := ou.doBatchUpdateWithRetry(ctx, updates) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you make it syncrhonous you do not need retry here - the caller can be informed it is their responsibility to retry
|
|
||
| // This retries forever until there is no error | ||
| // but returns on cancelled context | ||
| if len(updates) > ou.conf.maxInserts { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can choose to push this responsibility back to the connector. I don't think you need to try and do sub-pagination inside of this implementation.
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
5acdf0e to
283dce4
Compare
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1637 +/- ##
=======================================
Coverage 99.95% 99.95%
=======================================
Files 337 337
Lines 29524 29590 +66
=======================================
+ Hits 29512 29578 +66
Misses 8 8
Partials 4 4 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
|
@peterbroadhurst This is good to re-review. Test coverage should be good as well. There was a slight doubt on if this function should restrict and validate the Plugin name inside each operation update, I left it up to the connector plugin to put the right one but I might add validation there or make it pass a |
| return err | ||
| } | ||
| for _, update := range updates { | ||
| if update.OnComplete != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this in practice now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I struggled with this one, it's still available in the struct core.OperationUpdate but it doesn't serve any purpose as we are returning synchronously so the top connector go routine can do what it needs when the updates have been committed.
So we can remove but we need to leave it for the existing flow and just comment on this flow that it will not be used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why @EnriqueL8 ?
Because you think there's other stuff on core.OperationUpdate that needs to be on this batch API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest of the fields in core.OperationUpdate are needed in this batch API so yes we could create a new type that has that subset of fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @peterbroadhurst for the nudge and suggestion - move to OperationUpdateAsync and OperationUpdate
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
Signed-off-by: Enrique Lacal <enrique.lacal@kaleido.io>
peterbroadhurst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @EnriqueL8 - looks great, and appreciate you going round the loop with me on this
Proposed changes
Add a new function in the Operation Handler called
BulkOperationUpdatesthat takes in a series of OperationUpdate and will notify the submitter when the TX transaction has been committed to the database.This is important for blockchain plugins that want to preserve ordering of receipts and updating the operation before acknowledging to the blockchain connector that FireFly has accepted those receipts.
Reasons why I went this way:
idempotency keyis the Operation ID so correlation is easy in the blockchain plugin between the receipt and the operationKey part of the code to looks at
doBatchUpdateWithRetryruns in aretry.Doand in a DB RunAsGroup, it retries forever seefirefly/internal/operations/operation_updater.go
Lines 184 to 199 in c7e6058
Update worker get picked based on operation ID here
firefly/internal/operations/operation_updater.go
Line 90 in c7e6058
Workers call
doBatchUpdateWithRetryafter timeout or reached a limit to writefirefly/internal/operations/operation_updater.go
Lines 172 to 180 in c7e6058
Contributes to #1622
Example Blockchain plugin code that has a durable websocket communication:
Types of changes
Please make sure to follow these points
< issue name >egAdded links in the documentation.