[improve][pip] PIP-439: Adding Transaction Support to Pulsar Functions Through Managed Transaction Wrapping #24704
…s Through Auto-Transaction Wrapping
@wzhramc Please start a new thread on the dev mailing list about this proposal. You can reference the previous mailing list thread https://lists.apache.org/thread/rll8qyovpd7t9v5yxth25qo44zksbgkn in the new thread.
…s Through Managed Transaction Wrapping
lhotari
left a comment
LGTM, just a minor comment about the pulsar_function_txn_latency metric: it should be disabled by default to reduce the volume and cardinality of the metrics.
One potential future improvement could be to add support for transaction batching. In transaction batching, multiple incoming messages would be processed within the same transaction for performance reasons, so that the number of transactions can be reduced. There would need to be
@wzhramc For high-scale use cases, it might be necessary to implement "transaction batching" so that unnecessary load isn't added to the broker (including the transaction coordinator) when enabling transactions. It might be useful to cover the "transaction batching" aspect directly in PIP-439 if you'd be ready to do that before proceeding to the vote. It's also possible to handle that later in another PIP.
Waiting for feedback on the "transaction batching" aspect.
jiangpengcheng
left a comment
LGTM, I just have one question:
For an async function that calls the context.newOutputMessage method, how do we make newOutputMessage use the right transaction? Should we add a new newOutputMessage method that accepts a transaction as an argument?
@jiangpengcheng The async function doesn't have access to the Transaction in managed transactions. I think that the context must have a reference so that any new messages would get added to the active transaction for that asynchronous call chain.
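To illustrate the idea of the context holding the reference, here is a minimal, self-contained sketch. It assumes one context instance is created per invocation and bound to that invocation's transaction; `FakeTxn`, `TxnBoundContext`, and `AsyncEcho` are hypothetical names for illustration only and are not part of the Pulsar Functions API or of PIP-439.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stand-in for a transaction; it only records staged messages. */
class FakeTxn {
    final List<String> staged = new CopyOnWriteArrayList<>();
}

/** Hypothetical per-invocation context bound to exactly one transaction. */
class TxnBoundContext {
    private final FakeTxn activeTxn;

    TxnBoundContext(FakeTxn activeTxn) {
        this.activeTxn = activeTxn;
    }

    /** Stages an output message in the transaction this context is bound to. */
    void newOutputMessage(String topic, String payload) {
        activeTxn.staged.add(topic + ":" + payload);
    }
}

/** An async function that never sees the transaction, only the context. */
class AsyncEcho {
    static CompletableFuture<Void> process(String input, TxnBoundContext ctx) {
        return CompletableFuture
                .supplyAsync(input::toUpperCase)
                // The later stage captures ctx, so the message lands in the
                // transaction of the invocation that started this call chain.
                .thenAccept(result -> ctx.newOutputMessage("out", result));
    }
}
```

Because each call chain closes over its own context, two concurrent invocations stage their output in different transactions without the function code ever handling a transaction object.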
I'm not really familiar with batch processing in Pulsar functions in general, so it would take a bit more time for me to figure it out... I would eventually need help with that as well. It would be nice if someone could help me out with this ^^'
Transaction batching wouldn't be directly related to Pulsar batching. It's simply about using the same transaction to process multiple incoming messages. This would be necessary to achieve high performance without putting too much pressure on the transaction coordinator and adding too much latency to processing. We could proceed without having this in PIP-439 and come back to this after testing the performance.
Ah! That makes a lot of sense! Sorry, I mixed it up with Pulsar batching. I can include it in the PIP, it would be pretty neat to have this right away.
It might be tricky to implement directly, so a phased implementation could be useful in any case. One benefit of having it in this PIP is that we could find a way to make the function configuration "future proof". The transaction batching feature makes sense only in the MANAGED mode.
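As a rough sketch of what "using the same transaction to process multiple incoming messages" could look like, the following self-contained example commits once per group of messages instead of once per message. The `Txn` interface, the `TxnBatcher` class, and the `maxBatchSize` parameter are assumptions made for illustration; they are not taken from PIP-439 or the Pulsar client API, and a real implementation would presumably also need a time-based flush and abort handling.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a transaction; not the Pulsar client API. */
interface Txn {
    void commit();
}

/** Shares one transaction across up to maxBatchSize processed messages. */
class TxnBatcher {
    private final Supplier<Txn> txnFactory;
    private final int maxBatchSize;
    private Txn current;       // transaction shared by the current batch
    private int pending;       // messages processed in the current transaction
    private int committedTxns; // number of commits issued so far

    TxnBatcher(Supplier<Txn> txnFactory, int maxBatchSize) {
        this.txnFactory = txnFactory;
        this.maxBatchSize = maxBatchSize;
    }

    /** Returns the transaction the next message should be processed in. */
    Txn acquire() {
        if (current == null) {
            current = txnFactory.get();
        }
        return current;
    }

    /** Marks one message as processed; commits once the batch is full. */
    void messageDone() {
        pending++;
        if (pending >= maxBatchSize) {
            flush();
        }
    }

    /** Commits the open transaction, if any, and starts a fresh batch. */
    void flush() {
        if (current == null) {
            return;
        }
        current.commit();
        committedTxns++;
        current = null;
        pending = 0;
    }

    int committedTxns() {
        return committedTxns;
    }
}
```

With a batch size of 3, seven messages result in two commits plus one final `flush()`, so three transactions instead of seven reach the coordinator.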
@lhotari I've added a section and updated the configuration classes/definitions + metrics for Transaction Batching. I would appreciate it if you could review it when you have a chance.
I fixed the minor details in the latest commit. I'll start the PIP Voting now.
Motivation
This PIP proposes extending Pulsar Functions with configuration-based automatic/managed transaction support to enable exactly-once processing semantics. Currently, Pulsar Functions cannot publish to multiple topics transactionally, which is a significant limitation for use cases requiring atomic multi-topic operations.
Does this pull request potentially affect one of the following parts:
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: