Kafka scaler on Azure Event Hub misbehaving #7309

krisnashaypp · 2025-12-11T10:52:45Z

krisnashaypp
Dec 11, 2025

Hi guys, I've been having issues with my scaler in Azure Container Apps.

I have an application that should scale between 0-3 based on lag, when lag is over 10 it should scale up once more. We are using Azure Event Hub on the kafka protocol and that works fine but the scaling itself isn't working as we want. There are 2 different issues that pop up.

scaleToZeroOnInvalidOffset true

Here it works most of the time but then seemingly from nowhere it stops scaling from 0 and we have to force the app to start and then it works again for a while

scaleToZeroOnInvalidOffset false

Here we get the reverse problem, it refuses to scale down to 0 and just keeps going even though there is no lag. Usually it sticks to 1 replica but I've also seen it get stuck on 3 once which could have been expensive if I hadn't been watching.

Consumer details

There is only a single application listening to this topic, the keda scaler is configured to use the same consumerGroup and topic of course, it reads in and commits messages just fine as far as we can see.

Would appreciate any suggestions on our settings, or if it's maybe a problem with Event Hub on Kafka protocol?
Thank you!

zroubalik · 2025-12-11T11:15:35Z

zroubalik
Dec 11, 2025
Maintainer

@dttung2905 any idea here?

0 replies

dttung2905 · 2025-12-11T21:40:20Z

dttung2905
Dec 11, 2025
Collaborator

Hi both, let me take a look at it. I think I might know where the problem lies but will need some time to validate my thought :D

0 replies

dttung2905 · 2025-12-13T13:40:30Z

dttung2905
Dec 13, 2025
Collaborator

Hi @krisnashaypp,

I've investigated the issue and found the root cause.

When Azure Event Hub intermittently doesn't return partition information in the offset response, the code was accessing partition offsets without checking if they exist, causing incorrect lag calculations and errors that break the entire scaling calculation.

Scenario 1 (scaleToZeroOnInvalidOffset: true): Missing partitions caused errors before the scaleToZeroOnInvalidOffset logic could execute.

Scenario 2 (scaleToZeroOnInvalidOffset: false): Missing partitions returned errors, breaking the entire lag calculation and preventing scale-to-zero.

The original code accessed topicPartitionOffsets[topic][partitionID] without checking if the partition exists:

if _, found := topicPartitionOffsets[topic]; !found {
    return 0, 0, fmt.Errorf("error finding partition offset for topic %s", topic)
}
latestOffset := topicPartitionOffsets[topic][partitionID]  //  No partition check!

When a partition is missing, this returns 0 (zero value) instead of detecting it as missing, causing incorrect lag calculations.

I will create a PR to fix that soon!

0 replies

krisnashaypp · 2025-12-15T07:20:47Z

krisnashaypp
Dec 15, 2025
Author

Hi @dttung2905 !

Thank you so much for your response, I'm happy to hear you found this. If and when you have time, I just had some hopefully small questions regarding this.

The issue is the code doesn't check if the partition exists as you say, but our partitions were created months ago in the Event Hub interface and haven't been touched so not sure how they could be "missing". Is this my poor understanding of Kafka and/or Event Hub internals and that is expected behaviour, for partitions to change behind the scenes?
On a normal kafka broker (say confluent kafka), would this issue still appear or is it due to Event Hub?
Since we're on Azure Container Apps, we sadly don't control which version of KEDA we are on (currently 2.17.2), if it is possible, can you guesstimate which release of KEDA this might go out in so I can keep an eye out? (Say a 2.17.3 or will it be on 2.18?)

Thank you again and happy holidays 🙂

1 reply

dttung2905 Dec 16, 2025
Collaborator

but our partitions were created months ago in the Event Hub interface and haven't been touched so not sure how they could be "missing". Is this my poor understanding of Kafka and/or Event Hub internals and that is expected behaviour, for partitions to change behind the scenes?

Yes the topic in your Event Hub can be created before and you are right about that. The issue (that I suspect) here is that the response we get back from Event Hub is blipping and missing some information sometimes which I think cause the wrong calculation

On a normal kafka broker (say confluent kafka), would this issue still appear or is it due to Event Hub?

There is still a chance if some of the info about some partitions is missing from the response payload.

Since we're on Azure Container Apps, we sadly don't control which version of KEDA we are on (currently 2.17.2), if it is possible, can you guesstimate which release of KEDA this might go out in so I can keep an eye out? (Say a 2.17.3 or will it be on 2.18?)

I have created a PR for this #7324 so you can follow it for more details. It will go out in the next release either 2.19 or 2.18.3. You can also test it out by running KEDA operator locally https://github.com/kedacore/keda/blob/main/BUILD.md or build the custom image from source

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Kafka scaler on Azure Event Hub misbehaving #7309

Uh oh!

{{title}}

Uh oh!

Replies: 4 comments 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Kafka scaler on Azure Event Hub misbehaving #7309

Uh oh!

krisnashaypp Dec 11, 2025

scaleToZeroOnInvalidOffset true

scaleToZeroOnInvalidOffset false

Consumer details

Replies: 4 comments · 1 reply

Uh oh!

zroubalik Dec 11, 2025 Maintainer

Uh oh!

dttung2905 Dec 11, 2025 Collaborator

Uh oh!

dttung2905 Dec 13, 2025 Collaborator

Uh oh!

krisnashaypp Dec 15, 2025 Author

Uh oh!

dttung2905 Dec 16, 2025 Collaborator

krisnashaypp
Dec 11, 2025

Replies: 4 comments 1 reply

zroubalik
Dec 11, 2025
Maintainer

dttung2905
Dec 11, 2025
Collaborator

dttung2905
Dec 13, 2025
Collaborator

krisnashaypp
Dec 15, 2025
Author

dttung2905 Dec 16, 2025
Collaborator