KAFKA-17503: Fix incorrect calculation of poll-idle-ratio-avg for con… #17139

srdo · 2024-09-09T16:15:36Z

…sumers

The calculation of the metric was slightly off because the divisor ends up covering more than one full poll cycle. It instead covers a full poll cycle, plus the duration of the latest call to poll.

As an example, say a thread consistently spends 10 ms in poll and 10 ms outside poll. We would expect a fraction of 0.5.

The metric as it was implemented would return 0.33.

At time 0, we enter poll.
At time 10, we exit poll, registering a meaningless metric value.
At time 20, we enter poll, and update timeSinceLastPollMs to 20.
At time 30, we exit poll, and compute the metric value to 10 / (10 + 20) = 0.33

This commit changes the implementation to track the timestamp when we enter and exit poll, and updates the metric when we enter a new poll cycle.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

gaurav-narula · 2024-09-13T14:30:18Z

.../src/main/java/org/apache/kafka/clients/consumer/internals/metrics/KafkaConsumerMetrics.java

-        long pollTimeMs = pollEndMs - pollStartMs;
-        double pollIdleRatio = pollTimeMs * 1.0 / (pollTimeMs + timeSinceLastPollMs);
-        this.pollIdleSensor.record(pollIdleRatio);
+        this.pollEndMs = pollEndMs;


think this wouldn't record a metric when there's only one invocation of poll(). This is also evident from the failing KafkaConsumerTest::testPollIdleRatio test.

You are right. This is on purpose. If you only have 1 invocation of poll, then it doesn't make sense to ask "What fraction of the polling interval time was spent inside poll". The interval is not defined yet. The existing implementation just decides that the fraction should be 1.0, but that's not really true, and slightly skews the metric if there are few calls to poll. Until you have a full polling interval (i.e. two calls to poll), I think the metric should stay undefined.

Thanks for letting me know about the test, I've adjusted the test to account for the changes. I also added a bit to demonstrate the issue: Repeatedly running poll while spending 10ms inside and outside the poll causes the metric to approach 0.33. With these changes, the metric instead approaches 0.5.

Please let me know if the changes look reasonable. I note that there's a similar metric in KafkaShareConsumerMetricsTest. Should that be updated too? If so, let me know and I'll apply the same changes to that one.

…sumers The calculation of the metric was slightly off because the divisor ends up covering more than one full poll cycle. It instead covers a full poll cycle, plus the duration of the latest call to poll. As an example, say a thread consistently spends 10 ms in poll and 10 ms outside poll. We would expect a fraction of 0.5. The metric as it was implemented would return 0.33. At time 0, we enter poll. At time 10, we exit poll, registering a meaningless metric value. At time 20, we enter poll, and update timeSinceLastPollMs to 20. At time 30, we exit poll, and compute the metric value to 10 / (10 + 20) = 0.33 This commit changes the implementation to track the timestamp when we enter and exit poll, and updates the metric when we enter a new poll cycle.

gaurav-narula · 2024-11-10T21:39:04Z

Apologies for the late reply. I stumbled upon a similar metric in KafkaRaftMetrics which was modified in #13207. Perhaps we should consider a similar logic for parity? One concern I have is unlike the KRaft metric, which is limited to the KRaft Controller, a change in the consumer's metric would have a wider impact. WDYT @hachikuji @jsancio ?

srdo · 2024-11-11T15:39:18Z

One concern I have is unlike the KRaft metric, which is limited to the KRaft Controller, a change in the consumer's metric would have a wider impact

While this is true, I think it's also worth keeping in mind that the metric that exists today is so far off correct that it's not very helpful.

In the example of an even split between time inside and outside poll, the current metric will report 33%. That's very far from accurate.

ahuang98 · 2024-12-17T19:53:59Z

.../src/main/java/org/apache/kafka/clients/consumer/internals/metrics/KafkaConsumerMetrics.java

+        if (this.pollEndMs != 0) {
+            long pollIntervalMs = pollStartMs - this.pollStartMs;
+            long pollTimeMs = this.pollEndMs - this.pollStartMs;
+            double pollIdleRatio = pollTimeMs * 1.0 / pollIntervalMs;
+            this.pollIdleSensor.record(pollIdleRatio);
+        }


this method would benefit from some comments

Will add some, though I'll likely move these updates into recordPollEnd along with changing the polling interval to be based on the end timestamps instead of the start timestamps.

The poll(0ms) case you're thinking of I assume is regarding division by 0 here?

ahuang98 · 2024-12-17T20:00:57Z

clients/src/test/java/org/apache/kafka/clients/consumer/KafkaConsumerTest.java

+        // This also means the metric will always exclude the latest poll, since we don't know how much time is spent outside poll for that interval
+        // until poll is called again


wouldn't this not be an issue if we instead calculated
(pollEnd - pollStart) / (pollEnd - lastPollEnd)

this would mean our metric includes the latest poll, because it updates at the end of the poll

Yes, that would probably work. I'll try changing it.

I can change this, but the consequence is that we will exclude the first poll rather than the last poll.

The issue is essentially that this metric is measuring a property of the interval between two polls, which means you're going to have to either exclude either the first or the last poll depending on whether you choose to start intervals at the start or end of the poll.

If you define the poll interval as being from start to start (as I did here), that means the last poll is excluded because until the poll after that begins, the interval for the last poll has no endpoint.

If you define the poll interval as being from end to end (as you suggest), the first poll is excluded, because there is no starting point for that interval.

Here's a small illustration:

Picking the start time of the poll call as the boundary of the interval, the poll intervals for 3 sequential completed polls look like this:

[start1, start2], [start2, start3], [start3, ?].

By using these boundaries, each poll interval starts when poll is called, and ends when the subsequent poll is called. The time you spent outside of poll in an interval is the time after poll returns.

The third interval has an unknown boundary because that boundary point will be the start time of the fourth poll, and that poll hasn't started yet. This means we have to wait to record that last interval until then, which means the last poll is not included in the metric.

If we instead pick the end time as the boundary, we get these intervals:

[?, end1], [end1, end2], [end2, end3]

By using these boundaries, each poll interval starts when the previous poll returns, and ends when the current poll returns. The time you spent outside of poll in a specific interval is the time before you called poll.

The first interval has an unknown boundary, because there is no "previous poll" to get an end point from, and so that interval will be excluded from the metric.

I don't know which is preferable, they seem pretty equal to me. Do you have an opinion @ahuang98?

That's a fair point, I do see you decided to try implementing TimeRatio so I'll hold off on continuing this thread

ahuang98 · 2024-12-17T20:05:23Z

clients/src/test/java/org/apache/kafka/clients/consumer/KafkaConsumerTest.java

@@ -3098,31 +3098,53 @@ public void testPollIdleRatio(GroupProtocol groupProtocol) {
        assertEquals(Double.NaN, consumer.metrics().get(pollIdleRatio).metricValue());

        // 1st poll
-        // Spend 50ms in poll so value = 1.0
+        // Spend 50ms in poll. value=NaN because "the fraction of time spent inside poll" is undefined until the polling interval has an end point,


before the first poll, maybe worth adding an additional sleep

Sure, I can add that. Is it just because the starting time is never 0 in production, or is there something else you want to cover by doing this?

I've added this increment, but I don't understand what it's supposed to demonstrate. If you let me know what your reasoning is, I can put that in as a comment

I realize there isn't a benefit to adding the additional sleep, recordPollStart (at least at that commit) doesn't factor in time before the first pollStart

👍 I'll remove the sleep again then

ahuang98 · 2024-12-17T20:13:34Z

Apologies for the late reply. I stumbled upon a similar metric in KafkaRaftMetrics which was modified in #13207. Perhaps we should consider a similar logic for parity? One concern I have is unlike the KRaft metric, which is limited to the KRaft Controller, a change in the consumer's metric would have a wider impact. WDYT @hachikuji @jsancio ?

Interesting, I had a read of #13207 and I would agree that it would make sense to have parity here (given that this PR has been open for some time already, seems worth it to take the additional time to get the ratio right).
Having the poll(0ms) case tested would be great

Turn timeSinceLastPollMs into a local Add some comments to recordPollStart Make recordPollStart handle polling intervals of 0 ms by ignoring them, rather than letting the metric value become NaN Try to improve the explanation in KafkaConsumerTest of how this metric is intended to work, and why it excludes the last poll. Move time increments in the test around a bit. A polling interval runs from poll start to poll start, which means that the time in an interval we consider to be "outside of poll" is the time after the poll completes, until the next poll starts. This means the test is easier to read if we increment the time after each poll, rather than before each poll. Add bit to the test that verifies the 0ms interval handling correctly ignores such intervals.

srdo · 2024-12-22T09:30:04Z

@ahuang98 I have addressed your comments in the latest commit. Please take another look when you get a chance.

As outlined in #17139 (comment), I don't think there is any benefit to choosing to measure the intervals between poll endpoints instead of poll start points. We will have to exclude a poll from the metric either way, the only difference is whether it's the first or last poll.

So I've stuck to measuring intervals from poll start to poll start for now.

srdo · 2024-12-22T10:17:01Z

Regarding #13207, I think that is likely a better solution than the changes made here, since this suffers from exactly the problem described in that PR.

Happy to update this to use TimeRatio instead, if people agree that this is the way to go.

I have some minor concerns about that class though:

It's trying to give a more accurate metric by not collapsing each poll interval into a percentage. Instead, it determines the measurement interval by resetting when measure is called. If I understand the metrics code correctly, that means metrics reporters can "interfere" in the sense that the width of these windows will depend on how many metrics reporters you have, and how often they run.
I think measure is called by the metrics reporters, and measure mutates the state of TimeRatio. Doesn't that mean the internals of TimeRatio need to be made thread safe?
Since the calls to measure determine when TimeRatio resets the interval it's tracking, that means poll intervals can intersect with the measurement interval boundary, which means TimeRatio has to throw some values away (e.g. because a poll started before the current measurement interval but completed within it).
It could be slightly misleading to insert "fake" data points with a defaultRatio, rather than more clearly indicating that the value is not valid with e.g. NaN or -1 or something like that.

I'm unsure if these matter in practice.

The issue TimeRatio is trying to solve is that we're collapsing a bunch of not-equally-large time intervals into ratios and then averaging those ratios, which is wrong.

Would we be better off with a metric that makes the windows fixed-width?

If polling intervals are defined as running from the start of a poll until the start of the next poll, we know that the non-polling time in an interval "comes after" the polling time in that interval.

I think that would let us define fixed-width samples that can be averaged without losing precision. When we hit the first pollStart that exceeds the target time window, we can "backfill" time in the previous window because we know what kind of time is missing from it.

Example:

Say the window size is 10s

Poll 2s
Non-poll 2s
Poll 0s
Non-poll 0s
Poll 7s
Non-poll 2s

Say that we end up calling pollStart at t=13s

When we call pollStart, the counters which track all but the most recent poll would say we've spent 2s in poll and 2s outside poll. We'd know that we've accounted for 4/10 seconds.

We can then look at the latest poll and "fill in" the interval, drawing first from polling seconds (since we know poll "comes before" non-poll), and next from non-polling seconds to hit the exact window size.

In this case we're missing 6 seconds, so we pull that from the poll bucket, and the final numbers for the interval are 2+6 seconds in poll, and 2 seconds outside poll, meaning we'd log a sample ratio of 0.8.

The final 1 polling second and 2 non-poll seconds roll over to the next window. We'd start the next window at the end of the previous window, and add the remaning seconds to it. So the next interval in this case starts at 10s, and starts out with 1s spent polling and 2s spent not polling.

If the polling interval is much larger than the window size, we can apply the rollover method repeatedly. For example, if the window size is 10s and a poll interval takes 62s, a pollStart call can just submit 6 samples for the 6 intervals covered since the last pollStart, and carry over the last few seconds into the next interval.

I think this would let us accurately track the average ratio, without too many edge cases. The metrics reporters would just use Avg and SampledStat to get their value, like they do now. This means reporters can't interfere with each other. The samples for the fixed-size intervals would "age out" the same way samples do today.

This is to demonstrate the idea, needs polish before merging.

srdo · 2024-12-26T13:15:50Z

I've implemented the above idea in the latest commit, as input to the discussion. It's not quite mergeable, but it should be good enough to show that the idea can work. Rather than doing approximation like TimeRatio, it instead divides time into fixed-size windows, and submits one sample to the metric per window.

I believe this fixes the issue TimeRatio is also trying to solve, which is that it's bad to do averages over time intervals that are variable size. TimeRatio fixes it by not doing an average and instead ending each sampling window when a reporter asks for a value. This commit fixes it by making the intervals all equal in size.

I think the benefits over TimeRatio is that this metric is exact and doesn't have any edge cases related to 0-wide polling intervals, nor does it have any potential threading issues or impact from having multiple reporters looking at the metric.

The drawback is that the metric will be slightly behind, since the measured time within a sampling window is only recorded in the metric once the window closes. For example, if you have a sampling window size of 10 seconds, the metric will be showing values for polls between 0 and 10 seconds in the past, depending on how far through the window you are.

If we decide that we'd like to go with this type of metric over the TimeRatio approximation, I'd be happy to replace TimeRatio with the new fixed-window version, and adapt the tests as necessary.

Otherwise, let me know and I can make this metric use the current version of TimeRatio instead.

srdo · 2024-12-26T13:54:41Z

.../src/main/java/org/apache/kafka/clients/consumer/internals/metrics/KafkaConsumerMetrics.java

@@ -68,6 +67,7 @@ public KafkaConsumerMetrics(Metrics metrics, String metricGrpPrefix) {
                metricGroupName,
                "The average fraction of time the consumer's poll() is idle as opposed to waiting for the user code to process records."),
                new Avg());
+        this.pollIdleRatio = new TimeRatio2(metrics.config().timeWindowMs(), pollIdleSensor);


I'm a bit unsure whether it would be better to select a smaller time window here. The defaults for the metrics system is a window of 30 seconds, which is a bit large for this purpose, and it also only keeps 2 samples. It might make sense to aim for e.g. a sample every few seconds, or 1/5th the window size the metric system uses, or something like that.

ahuang98 · 2025-01-13T23:24:27Z

Since the calls to measure determine when TimeRatio resets the interval it's tracking, that means poll intervals can intersect with the measurement interval boundary, which means TimeRatio has to throw some values away (e.g. because a poll started before the current measurement interval but completed within it).

I see, this was hard to visualize, but is the issue essentially - if poll() was called at time=10, TimeRatio.measure() called at time=20, and poll had not completed yet, the 10 ongoing seconds of polling time will not accounted for in the calculated ratio. It is actually treated as non-polling time as well?

ahuang98 · 2025-01-13T23:39:42Z

It's trying to give a more accurate metric by not collapsing each poll interval into a percentage. Instead, it determines the measurement interval by resetting when measure is called. If I understand the metrics code correctly, that means metrics reporters can "interfere" in the sense that the width of these windows will depend on how many metrics reporters you have, and how often they run.

I think measure is called by the metrics reporters, and measure mutates the state of TimeRatio. Doesn't that mean the internals of TimeRatio need to be made thread safe?

I don't have enough knowledge with how folks generally configure metric reporters to comment on this, but these concerns seem reasonable. I'll see if I can get a second opinion on this

I am having a hard time following along with your example - specifically starting at this point

We can then look at the latest poll and "fill in" the interval, drawing first from polling seconds (since we know poll "comes before" non-poll), and next from non-polling seconds to hit the exact window size.

What I've gleaned is that with a window size of 10s, we are able to calculate the ratio for window 1 (which ends during the 6th second of the third poll) and window 2 is partial and includes everything after that.

srdo · 2025-01-14T00:23:34Z

I see, this was hard to visualize, but is the issue essentially - if poll() was called at time=10, TimeRatio.measure() called at time=20, and poll had not completed yet, the 10 ongoing seconds of polling time will not accounted for in the calculated ratio. It is actually treated as non-polling time as well?

Thanks for reviewing.

Yes. Polling time can essentially "leak" as long as a poll starts before the measurement window, and finishes during the window.

Example:
Assume we've called measure at t=0, which initialized the measurement window.
Call poll at t=10
Call measure at t=20. This reports 0 in the metric as the poll isn't done yet.
Complete the poll at t=30
Call measure at t=31.

We now have 20 ms of polling time inside an 11 ms measurement window. This causes the metric to report 1.0 and reset the counters to 0.

Those remaining 9 ms of polling time have "leaked" at this point, because they're simply lost from the tracking. The next time we call measure, it will not include those 9 ms, because we discarded them when resetting the counters.

A way to visualize this is that if we have any incomplete polls when we began a measurement window, then when the measurement window ends, we will try to "squeeze in" the pre-window-start polling time by pretending it actually occurred within the measurement window. If there isn't room for that time inside the window, we'll truncate it and report 1.0 for that window.

But either way, moving the time to the measurement window is a bit incorrect. If it doesn't fit, we lose time. If it does fit, we report a lower idle time for the window than we really should, because we included polling time in the window that actually didn't belong in that window.

That being said, the inaccuracy this introduces is probably minor as long as the measurement windows are large in comparison to the time it takes to run poll, since if you have very short polls and very long windows, only a small fraction of the window's tracked polling time can be coming from before the window began.

srdo · 2025-01-14T00:33:04Z

I am having a hard time following along with your example - specifically starting at this point

We can then look at the latest poll and "fill in" the interval, drawing first from polling seconds (since we know poll "comes before" non-poll), and next from non-polling seconds to hit the exact window size.

What I've gleaned is that with a window size of 10s, we are able to calculate the ratio for window 1 (which ends during the 6th second of the third poll) and window 2 is partial and includes everything after that.

Yes, exactly. The basic idea is to let each window be flushed to the metric as soon as we know we've accounted for all seconds from that window. Since windows are fixed-size, we can know when a window is ready to flush.

So in the example, once we finish the third poll, we have enough seconds to complete window 1, because we have 2 seconds from the first poll, 2 non-polling seconds, and then 6 polling seconds from the third poll.

The last second from the third poll then gets "saved up" for the next window which isn't complete yet. The same thing happens to the two non-polling seconds from the third poll. At some future point, we'll complete polls enough to get the remaining 7 seconds, and then window 2 can be flushed.

In the implementation the point where we account for the latest poll's seconds and consider whether to flush the open window is in pollStart.

ahuang98 · 2025-01-14T18:49:43Z

Hm, I take back at least this portion of my statement

It is actually treated as non-polling time as well?

It looks like measure ignores currentTimestamp, so it will exclude the current partial poll time from both the divisor and dividend.

double intervalDurationMs = Math.max(lastRecordedTimestampMs - intervalStartTimestampMs, 0);

ahuang98 · 2025-01-14T19:15:25Z

Complete the poll at t=30
Call measure at t=31.
We now have 20 ms of polling time inside an 11 ms measurement window. This causes the metric to report 1.0 and reset the counters to 0.

[edited] I see, I wrote the following test for understanding if partial polls would be accounted for correctly (and it seems to be the case) but I just duplicated your scenario and it doesn't seem to behave nicely (just as you mentioned)

    @Test
    public void testPartialPoll() {
        MetricConfig config = new MetricConfig();
        MockTime time = new MockTime();
        TimeRatio ratio = new TimeRatio(1.0);

        ratio.record(config, 0.0, time.milliseconds());
        time.sleep(20);
        ratio.record(config, 10, time.milliseconds());

        time.sleep(10);
        // pretend a poll() starts at this point
        time.sleep(10);

        // We ignore the last 20ms that lapsed after the last record() was called. Cumulative value captured by calls
        // to record since last call to measure is 10, lastRecordedTimestampMs - intervalStartTimestampMs = 20s - 0s.
        assertEquals(0.5, ratio.measure(config, time.milliseconds()));

        // Now the current poll() finishes.
        ratio.record(config, 10, time.milliseconds());
        // Cumulative value captured by calls to record since last call to measure is 10,
        // lastRecordedTimestampMs - intervalStartTimestampMs = 40s - 20s.
        assertEquals(0.5, ratio.measure(config, time.milliseconds()));

        // If we had calculated the ratio over the entire window of the test (40s), the ratio would have been
        // (10 + 10) / 40 = 0.5.
    }

ahuang98 · 2025-01-14T19:34:44Z

Hm, okay your edge case only happens on the call to measure after the first call to record. All following measurements shouldn't have this issue of 'leaking' poll time (because lastRecordedTimestampMs will not be -1)

e.g.

That being said, the inaccuracy this introduces is probably minor as long as the measurement windows are large in comparison to the time it takes to run poll, since if you have very short polls and very long windows, only a small fraction of the window's tracked polling time can be coming from before the window began.

Only on startup would we see at most two measurement windows with an inflated value (first window will always report 1.0, and the window where the poll ended due to the edge case you found) if poll time greatly surpasses the measurement window.

srdo · 2025-01-14T19:56:31Z

You are completely correct. My mistake. TimeRatio can't "lose time" the way I thought. The scenario I described was wrong, I misunderstood the code.

So if the two other concerns mentioned above (thread safety, the effect of multiple metrics reporters) turn out to not be relevant, I guess TimeRatio is fine as is?

Not sure if this matters in practice, so maybe just for fun: Here's another example demonstrating funny metric outputs:

    @Test
    public void testPartialPoll() {
        MetricConfig config = new MetricConfig();
        MockTime time = new MockTime();
        TimeRatio ratio = new TimeRatio(1.0);

        ratio.record(config, 0.0, time.milliseconds());
        time.sleep(10);
        // Pretend poll starts here
        time.sleep(10);

        //This emits the default value because the poll isn't done, so the current window has size 0
        assertEquals(1.0, ratio.measure(config, time.milliseconds()));

        time.sleep(10);
        ratio.record(config, 20, time.milliseconds());

        // This is technically the correct value, but this measurement ends up covering the entire time interval
        // and we're still left with a meaningless 1.0 metric value above.
        // When a user looks at this in their metrics, they'll see 1.0 followed by 0.66, which is pretty hard to interpret.
        assertEquals(2 / 3.0, ratio.measure(config, time.milliseconds()));
        
        // To demonstrate that this can be repeated
        // Pretend another poll starts here
        time.sleep(10);
        // We again get a meaningless 1.0 metric since the window is 0 in size
        assertEquals(1.0, ratio.measure(config, time.milliseconds()));
        time.sleep(10);
        ratio.record(config, 10, time.milliseconds());
        // followed by the "real" value in this call
        assertEquals(0.5, ratio.measure(config, time.milliseconds()));
    }

In this test, the user will observe the following emitted metric values: 1.0, 0.66, 1.0, 0.5, even though two of those values are meaningless. This happens if measure is called twice in a row without a completed poll between them.

ahuang98 · 2025-01-14T20:30:41Z

(haven't read your last example yet, but will today) I think your concern about multithreading behavior is still valid though, if we do decide to change how we calculate poll ratios, perhaps it may make sense to introduce a KIP to get more eyes on the implementation (since this may impact the values of both the consumer and raft poll metric)

ahuang98 · 2025-01-14T21:36:20Z

Basically the metric can oscillate if the reporting window is consistently skewed with when polls end. I think the same issue could occur even with pre-determined window sizes right?

srdo · 2025-01-14T21:42:15Z

The example I posted above isn't showing a real oscillation, it's showing that we might report the metric default value (in the example, this is 1.0) if measure is called twice without a poll completing between the two calls. The 1.0 values aren't "real", they're placeholder values inserted because measure must return some value when called.

ahuang98 · 2025-01-31T16:21:21Z

@srdo what did you think about the KIP? Your point about the multithreading issue is valid, and since fixing TimeRatio would impact the metric reporting for the existing idle ratio metric it would be good to get more eyes on your proposed solution

github-actions · 2025-05-02T03:43:52Z

This PR is being marked as stale since it has not had any activity in 90 days. If you
would like to keep this PR alive, please leave a comment asking for a review. If the PR has
merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

github-actions · 2025-06-01T03:51:52Z

This PR has been closed since it has not had any activity in 120 days. If you feel like this
was a mistake, or you would like to continue working on it, please feel free to re-open the
PR and ask for a review.

kirktrue added the consumer label Sep 12, 2024

gaurav-narula reviewed Sep 13, 2024

View reviewed changes

srdo force-pushed the idle-ratio-fix branch 3 times, most recently from ce75605 to 1bdc192 Compare September 13, 2024 15:54

srdo added 2 commits October 24, 2024 22:18

fix test

d25e358

srdo force-pushed the idle-ratio-fix branch from 1bdc192 to d25e358 Compare October 24, 2024 20:18

github-actions bot added clients small Small PRs labels Oct 24, 2024

ahuang98 reviewed Dec 17, 2024

View reviewed changes

Implement fixed-window metric

0cd6848

This is to demonstrate the idea, needs polish before merging.

github-actions bot removed the small Small PRs label Dec 26, 2024

srdo commented Dec 26, 2024

View reviewed changes

github-actions bot added the stale Stale PRs label May 2, 2025

github-actions bot added the closed-stale PRs that were closed due to inactivity label Jun 1, 2025

github-actions bot closed this Jun 1, 2025

		// This also means the metric will always exclude the latest poll, since we don't know how much time is spent outside poll for that interval
		// until poll is called again

KAFKA-17503: Fix incorrect calculation of poll-idle-ratio-avg for con… #17139

KAFKA-17503: Fix incorrect calculation of poll-idle-ratio-avg for con… #17139

Uh oh!

Conversation

srdo commented Sep 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Committer Checklist (excluded from commit message)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

srdo Sep 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gaurav-narula commented Nov 10, 2024

Uh oh!

srdo commented Nov 11, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ahuang98 Dec 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

srdo Dec 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ahuang98 commented Dec 17, 2024

Uh oh!

srdo commented Dec 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

srdo commented Dec 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

srdo commented Dec 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

srdo Dec 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ahuang98 commented Jan 13, 2025

Uh oh!

ahuang98 commented Jan 13, 2025

Uh oh!

srdo commented Jan 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

srdo commented Jan 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ahuang98 commented Jan 14, 2025

Uh oh!

ahuang98 commented Jan 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ahuang98 commented Jan 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

srdo commented Jan 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

srdo commented Sep 9, 2024 •

edited

Loading

srdo Sep 13, 2024 •

edited

Loading

ahuang98 Dec 17, 2024 •

edited

Loading

srdo Dec 18, 2024 •

edited

Loading

srdo commented Dec 22, 2024 •

edited

Loading

srdo commented Dec 22, 2024 •

edited

Loading

srdo commented Dec 26, 2024 •

edited

Loading

srdo Dec 26, 2024 •

edited

Loading

srdo commented Jan 14, 2025 •

edited

Loading

srdo commented Jan 14, 2025 •

edited

Loading

ahuang98 commented Jan 14, 2025 •

edited

Loading

ahuang98 commented Jan 14, 2025 •

edited

Loading

srdo commented Jan 14, 2025 •

edited

Loading