Kafka scaler on Azure Event Hub misbehaving #7309
-
@dttung2905 any idea here?
-
Hi both, let me take a look at it. I think I might know where the problem lies but will need some time to validate my thought :D
-
Hi @krisnashaypp, I've investigated the issue and found the root cause. When Azure Event Hub intermittently doesn't return partition information in the offset response, the code was accessing partition offsets without checking whether they exist, causing incorrect lag calculations and errors that break the entire scaling calculation.

Scenario 1 (`scaleToZeroOnInvalidOffset: true`): the bad lag value makes the scaler think there is nothing to consume, which matches the app getting stuck at 0 replicas.
Scenario 2 (`scaleToZeroOnInvalidOffset: false`): the bad lag value keeps the scaler from ever seeing zero lag, which matches the app refusing to scale down.

The original code accessed the offsets like this:

```go
if _, found := topicPartitionOffsets[topic]; !found {
	return 0, 0, fmt.Errorf("error finding partition offset for topic %s", topic)
}
latestOffset := topicPartitionOffsets[topic][partitionID] // No partition check!
```

When a partition is missing, this returns 0 (Go's zero value for a missing map key) rather than an error, so the lag is computed from a bogus latest offset. I will create a PR to fix that soon!
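To illustrate, here's a minimal sketch of the kind of check that would fix this (a hypothetical `lookupLatestOffset` helper, not necessarily the shape of the eventual PR):

```go
package main

import "fmt"

// lookupLatestOffset checks both the topic map and the partition map
// before reading, so a partition missing from the broker's response
// surfaces as an error instead of silently yielding Go's zero value.
func lookupLatestOffset(offsets map[string]map[int32]int64, topic string, partitionID int32) (int64, error) {
	partitionOffsets, found := offsets[topic]
	if !found {
		return 0, fmt.Errorf("error finding partition offset for topic %s", topic)
	}
	latestOffset, found := partitionOffsets[partitionID]
	if !found {
		return 0, fmt.Errorf("error finding offset for partition %d of topic %s", partitionID, topic)
	}
	return latestOffset, nil
}

func main() {
	// Partition 1 is absent, mimicking an incomplete Event Hub response.
	offsets := map[string]map[int32]int64{
		"my-topic": {0: 42},
	}
	if _, err := lookupLatestOffset(offsets, "my-topic", 1); err != nil {
		fmt.Println(err) // the unchecked index would have silently returned 0 here
	}
}
```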
-
Hi @dttung2905! Thank you so much for your response, I'm happy to hear you found this. If and when you have time, I have a few hopefully small questions about this.
Thank you again and happy holidays 🙂
-
Hi guys, I've been having issues with my scaler in Azure Container Apps.
I have an application that should scale between 0 and 3 replicas based on lag; when the lag goes over 10, it should scale up one more replica. We are using Azure Event Hub over the Kafka protocol, and that part works fine, but the scaling itself isn't behaving the way we want. Two different issues pop up.
scaleToZeroOnInvalidOffset: true
Here it works most of the time, but then, seemingly out of nowhere, it stops scaling up from 0 and we have to force the app to start; after that it works again for a while.
scaleToZeroOnInvalidOffset: false
Here we get the reverse problem: it refuses to scale down to 0 and just keeps running even though there is no lag. Usually it sticks at 1 replica, but I've also seen it get stuck at 3 once, which could have been expensive if I hadn't been watching.
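For reference, this is how I understood the setting is supposed to work from the KEDA docs (a simplified sketch; the `effectiveLag` helper and its exact return values are my own illustration, not KEDA's actual code):

```go
package main

import "fmt"

// effectiveLag sketches the documented semantics of
// scaleToZeroOnInvalidOffset: when a partition has no valid committed
// offset, "true" reports zero lag so the app may scale to zero, while
// "false" reports enough lag to keep one consumer running.
func effectiveLag(consumerOffset, latestOffset, lagThreshold int64, scaleToZeroOnInvalidOffset bool) int64 {
	const invalidOffset = -1 // sentinel for "no committed offset yet"
	if consumerOffset == invalidOffset {
		if scaleToZeroOnInvalidOffset {
			return 0 // no lag reported: scaling to zero is allowed
		}
		return lagThreshold // report one threshold's worth of lag: one replica stays up
	}
	return latestOffset - consumerOffset // normal case: real lag
}

func main() {
	fmt.Println(effectiveLag(30, 42, 10, true))  // valid offset: real lag, 12
	fmt.Println(effectiveLag(-1, 42, 10, true))  // invalid offset: 0, may scale to zero
	fmt.Println(effectiveLag(-1, 42, 10, false)) // invalid offset: 10, keeps one replica
}
```

Either branch only makes sense if the latest offset and committed offset the scaler sees are trustworthy.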
Consumer details
There is only a single application listening to this topic; the KEDA scaler is configured to use the same consumerGroup and topic, of course, and it reads and commits messages just fine as far as we can see.
Would appreciate any suggestions on our settings, or whether it's maybe a problem with Event Hub on the Kafka protocol?
Thank you!