Is Key_Shared suitable for stateful stream processing of small throughput and high cardinality #24044
-
You are right that it's not optimal when batching isn't used. One possible solution is to calculate an intermediate sharding key that reduces the cardinality significantly, so that batching can happen. My gut feeling is that the number of these sharding keys would have to be fairly low for it to actually help enable batching. For such high-volume use cases, it could be better to use partitioned topics with failover consumers, and then have a single consumer per partition. (For strict ordering, it might be necessary to use an exclusive subscription to work around #15189.) If you happen to be using Reactive Spring & Spring Pulsar Reactive, there's an alternative to Key_Shared subscriptions which works with Failover subscriptions, retaining key order.
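The intermediate-sharding-key idea can be sketched like this. This is plain Java, not a Pulsar API; `NUM_SHARDS` and the naming are my own illustrative assumptions:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: derive a low-cardinality sharding key from a high-cardinality
// device ID, so that many devices map onto the same message key and
// KEY_BASED batching can actually fill batches.
public class ShardingKey {
    // Assumption: 16 shards; tune to your consumer count / throughput.
    static final int NUM_SHARDS = 16;

    // Stable mapping: the same device always gets the same shard,
    // so per-device ordering is preserved under Key_Shared.
    static String shardKeyFor(String deviceId) {
        int shard = Math.floorMod(deviceId.hashCode(), NUM_SHARDS);
        return "shard-" + shard;
    }

    public static void main(String[] args) {
        // 10,000 devices collapse to at most NUM_SHARDS distinct keys.
        Set<String> keys = new HashSet<>();
        for (int i = 0; i < 10_000; i++) {
            keys.add(shardKeyFor("device-" + i));
        }
        System.out.println("distinct keys: " + keys.size());
    }
}
```

You would set this derived value as the message key when producing, instead of the raw device ID.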
In pulsar-client-reactive, ReactiveMessagePipelineBuilder provides an option for this. Under the covers it uses Project Reactor's groupBy operator to ensure that processing happens in key order. The benefit over a plain failover subscription is a tunable concurrency level, which can go into the hundreds with low resource consumption. This is actually better than Key_Shared subscriptions for certain use cases where the processing itself is not costly and most of the work is performed in external API backends. That is the sweet spot for Reactive Spring and Spring Pulsar Reactive / pulsar-client-reactive. There are examples of using the Pulsar Reactive Client in https://github.com/lhotari/reactive-iot-backend-ApacheCon2021 and https://github.com/lhotari/reactive-pulsar-showcase.
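As a rough illustration of what key-ordered concurrent processing means, here is a conceptual sketch in plain JDK Java (this is NOT the pulsar-client-reactive implementation, which uses Reactor's groupBy; class and method names are mine): each key is hashed onto one of N single-threaded workers, so messages with the same key run in order while different keys proceed in parallel.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Conceptual sketch of key-ordered concurrency: a fixed pool of
// single-threaded workers, with each message key pinned to one worker.
public class KeyOrderedExecutor {
    private final ExecutorService[] workers;

    KeyOrderedExecutor(int concurrency) {
        workers = new ExecutorService[concurrency];
        for (int i = 0; i < concurrency; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    void submit(String key, Runnable task) {
        // Same key -> same worker -> sequential, in-order processing.
        workers[Math.floorMod(key.hashCode(), workers.length)].submit(task);
    }

    void shutdown() throws InterruptedException {
        for (ExecutorService w : workers) {
            w.shutdown();
            w.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    // Demo helper: submit n tasks under a single key and return the
    // order in which they actually ran.
    static List<Integer> processInOrder(int n) throws InterruptedException {
        KeyOrderedExecutor ex = new KeyOrderedExecutor(8);
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < n; i++) {
            final int seq = i;
            ex.submit("device-1", () -> seen.add(seq));
        }
        ex.shutdown();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("observed order: " + processInOrder(100));
    }
}
```

The concurrency level here is just the worker count; in the reactive pipeline it is a tunable parameter rather than a thread pool, which is why it scales to hundreds of keys cheaply.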
-
I really wanted to like Key_Shared but I keep bumping into limitations, the last one being so severe that I'm wondering if this feature was ever designed for such a use case?
The use case is tens of thousands of IoT devices sending a reasonably small amount of telemetry (say 1 message per second each) that needs to be statefully processed. On the publishing side, we have a handful of machines receiving the telemetry and dumping it onto Pulsar using batching. To be able to consume using Key_Shared, batches must be produced using the KEY_BASED batching strategy, meaning that instead of having loads of telemetry messages per batch I end up with basically no batching: the key is the device ID and each device produces a mere 1 msg/sec... Since the absence of batching adds a very significant load on the cluster (broker & bookies), I'm wondering if this is just not the appropriate use case for the feature? Or... am I doing something wrong?
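To make the degenerate case concrete, here is an illustrative plain-Java sketch (not Pulsar's actual batcher) of how key-based batching behaves when every message in a window carries a distinct key:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration of why KEY_BASED batching degenerates at high key
// cardinality: a batch may only contain messages sharing one key, so
// with one message per device in the batching window, every "batch"
// holds a single message. Names here are illustrative, not Pulsar API.
public class KeyBasedBatching {
    // Group a window of (key, payload) pairs the way a key-based
    // batcher would: one batch per distinct key.
    static Map<String, List<String>> batchByKey(List<String[]> window) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (String[] msg : window) {
            batches.computeIfAbsent(msg[0], k -> new ArrayList<>()).add(msg[1]);
        }
        return batches;
    }

    public static void main(String[] args) {
        // 1000 devices, 1 message each in the window:
        // 1000 batches, each containing a single message.
        List<String[]> window = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            window.add(new String[] {"device-" + i, "telemetry"});
        }
        System.out.println("batches: " + batchByKey(window).size());
    }
}
```

With an intermediate sharding key (many devices per key), the same window would collapse into a handful of well-filled batches instead.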