Add quota limit for auto evaluations by davidgisbey · Pull Request #720 · alphagov/govuk-chat

davidgisbey · 2025-12-19T15:14:49Z

Description

If we run our auto-evaluations on every generated answer, we could end up spending a significant amount of money, because many metrics make multiple LLM calls and each of those calls is run 3 times.

This adds an hourly quota for evaluations, which we've set to 300. Each job that runs an evaluation:

- checks for the presence of the auto_evaluations_count key in the cache
- if the key is not found, adds it with a value of 1 and sets it to expire at the end of the hour
- if the key is found, checks whether the value is >= the threshold of 300
  - if it is, logs that the quota has been reached and exits the job
  - if it isn't, increments the counter and runs the evaluation
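A minimal sketch of the flow described above, in plain Ruby so it runs without Rails. `FakeCache` is a hypothetical in-memory stand-in for `Rails.cache` (expiry is stored but never enforced), and `within_quota?` is a hypothetical helper name, not code from the PR. The end of the hour is computed in UTC epoch terms.

```ruby
require "logger"

# Hypothetical stand-in for Rails.cache: only read, write and increment are
# modelled, and the expiry time is recorded but never enforced.
class FakeCache
  def initialize
    @data = {}
  end

  def read(key)
    @data.dig(key, :value)
  end

  def write(key, value, expires_in:)
    @data[key] = { value: value, expires_at: Time.now + expires_in }
  end

  def increment(key, amount = 1)
    @data[key][:value] += amount
  end
end

MAX_EVALUATIONS = 300 # the hourly threshold from the description
KEY = "auto_evaluations_count"

# Returns true if the caller may run an evaluation, false once the quota is used up.
def within_quota?(cache, logger, now: Time.now)
  count = cache.read(KEY)

  if count.nil?
    # First evaluation this hour: create the key, expiring at the end of the (UTC) hour.
    end_of_hour = Time.at((now.to_i / 3600 + 1) * 3600)
    cache.write(KEY, 1, expires_in: end_of_hour - now)
    return true
  end

  if count >= MAX_EVALUATIONS
    logger.warn("Auto evaluation quota of #{MAX_EVALUATIONS} reached, skipping")
    return false
  end

  # Under the threshold: count this evaluation and let it run.
  cache.increment(KEY)
  true
end
```

Because the read and the increment are separate cache calls in this sketch, two jobs running at the same moment could both read 299 and both proceed; doing the increment first and checking its return value closes that gap.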

I've added the functionality to the base worker so it can be reused across the workers and added a shared example to test it.

I've updated the AnswerTopicsJob to inherit from the base worker and to adhere to the quota.

I've also added another shared example for retry logic to remove some duplication.

Trello card

https://trello.com/c/lfwcnNaz/3045-add-quota-for-auto-evaluation-llm-requests

kevindew

Great job, just a few small changes

davidgisbey · 2026-01-07T14:46:17Z

Thanks for the review @kevindew, I've made those changes with the exception of one comment I left.

davidgisbey · 2026-01-07T17:22:47Z

I ran into an interesting issue when testing on Integration where it wasn't working as implemented. I pushed up a new commit which outlines the issue and gives a few suggestions.

Any feedback on the approach we should take would be great.

kevindew · 2026-01-07T17:30:18Z

Ooh tricky problem - thanks for outlining options. I think I like:

expires_in = (Time.current.end_of_minute - Time.current).to_i
count = Rails.cache.increment("auto_evaluations_count", expires_in:)

if count >= max_evaluations
  # do the various logging things
  return false
end

Though I would suggest doing: if count > max_evaluations as we've incremented it before.
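A small sketch of why `>` is the right comparison once the increment happens first: the returned count already includes the current job. `FakeCache` is a hypothetical stand-in for `Rails.cache#increment` (missing keys start from zero, expiry is not modelled) and `allowed?` is a hypothetical helper name.

```ruby
# Hypothetical stand-in for Rails.cache#increment: creates the key when it is
# missing, otherwise adds to it, and returns the new value. Expiry is ignored.
class FakeCache
  def initialize
    @data = Hash.new(0)
  end

  def increment(key, amount = 1, expires_in: nil)
    @data[key] += amount
  end
end

def allowed?(cache, max_evaluations)
  # Increment first, so the returned count already includes this job.
  count = cache.increment("auto_evaluations_count", 1, expires_in: 60)

  # With this job already counted, `>` lets exactly max_evaluations through;
  # `>=` would stop one evaluation short.
  if count > max_evaluations
    # logging would go here
    return false
  end

  true
end
```

With a limit of 3, four calls give `[true, true, true, false]`.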

Did you have problems with expires_at? I notice that's changed from expires_in

davidgisbey · 2026-01-07T17:36:25Z

I did. It didn't seem to work, so I had a look at the docs, and increment expects expires_in.

davidgisbey · 2026-01-07T18:14:55Z

    Rack::Attack.cache.store = old_store
  end

+  config.before(:each, :with_memory_store) do


This is required for the system specs that run auto evaluations, because NullStore returns nil when you call increment, which causes this line to blow up.

I think rather than doing this, because it's pretty heavy, it'd be better to change the usage code to be:

# fallback to 1 in scenarios where we have a null cache (dev/test) and this returns nil
count = Rails.cache.increment("auto_evaluations_count", 1, expires_in:) || 1
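A minimal sketch of the `|| 1` fallback, assuming a hypothetical `FakeNullCache` that mimics the behaviour described above (increment stores nothing and returns nil). `current_count` is a hypothetical helper name.

```ruby
# Hypothetical stand-in for a null cache store: nothing is stored, so
# increment always returns nil.
class FakeNullCache
  def increment(_key, _amount = 1, **_options)
    nil
  end
end

def current_count(cache, key)
  # Fall back to 1 when the store returns nil (e.g. a null store in dev/test),
  # so a later `count > limit` comparison doesn't raise NoMethodError on nil.
  cache.increment(key, 1, expires_in: 3600) || 1
end
```

Without the fallback, comparing the nil result with the limit (`nil > 300`) raises `NoMethodError`, which is the failure the memory-store spec hook was working around.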

davidgisbey · 2026-01-08T09:49:46Z

Thanks for the feedback @kevindew i've made those changes.

kevindew · 2026-01-08T10:30:41Z

I did. it didn't seem to work so i had a look at the docs and increment expects expires_in

So it should be fine, because the merged_options method is supposed to convert it to an expires_in.

kevindew · 2026-01-08T11:01:18Z

Hmm, I've had a bit of a play with it and, while it does work, there is an edge case you can hit: if you are very close to the end of an hour, the computed expiry can be negative and you get an exception:

govuk-chat(prod)> Rails.cache.increment("test", expires_at: expiry)
(govuk-chat):37:in '<main>': Cache expiration time is invalid, cannot be negative: -16.07605283 (ArgumentError)

            raise error

which would be annoying to hit.
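A tiny sketch of the arithmetic behind that edge case: an absolute expiry is worked out first, but by the time it is turned into a relative TTL the clock has moved past it. `ttl_from_expires_at` is a hypothetical helper that only mirrors the conversion; it is not the Rails implementation.

```ruby
# Converts an absolute expiry into a relative TTL in seconds (hypothetical helper).
def ttl_from_expires_at(expires_at, now)
  expires_at - now
end

end_of_hour = Time.utc(2026, 1, 8, 11, 0, 0) # computed just before 11:00
cache_call_time = end_of_hour + 0.5          # the cache call lands just after 11:00
ttl = ttl_from_expires_at(end_of_hour, cache_call_time)
# ttl is negative here, the kind of value Rails rejected with the ArgumentError above
```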

I then considered using expires_in instead, but realised that if we hit the same sort of edge case there, the cache entry could end up spanning more than one hour.

So, I've tried to think of a way to do it without these edge cases that might be a little more robust.

Rather than relying on expiry, I think we should build the key from the time and then just allow it to expire after an hour. I think that's actually what Rack::Attack does, if I remember right.

So something like:

key = "auto_evaluations_count_#{Time.current.beginning_of_hour.to_i}"
count = Rails.cache.increment(key, 1, expires_in: 1.hour)
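A plain-Ruby sketch of the per-hour key idea, runnable without Rails. `hourly_key` computes the start of the hour from the Unix timestamp, which matches `Time.current.beginning_of_hour.to_i` for UTC or any whole-hour offset (an assumption worth checking for other zones). `FakeCache` and `record_evaluation` are hypothetical stand-ins; expiry is not modelled.

```ruby
# Hypothetical stand-in for Rails.cache#increment: missing keys start from zero.
class FakeCache
  def initialize
    @data = Hash.new(0)
  end

  def increment(key, amount = 1, expires_in: nil)
    @data[key] += amount
  end
end

# Same prefix plus the Unix timestamp of the start of the current hour.
def hourly_key(now)
  seconds = now.to_i
  "auto_evaluations_count_#{seconds - (seconds % 3600)}"
end

def record_evaluation(cache, now)
  cache.increment(hourly_key(now), 1, expires_in: 3600)
end
```

Because each hour gets its own key, a TTL that lands a little late only leaves an old key lingering; it can never carry counts into the next hour's quota.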

I think this would also have the advantage that every hour gets a fresh cache key, which is something explicit we can debug if we hit problems, whereas relying on the TTL in MemCache is very cryptic because you can't look it up. I learnt this in my little terminal session when I accidentally ran fetch on a key once and it then seemed never to expire:

govuk-chat(prod)> Rails.cache.increment("test", expires_in: 10)
=> 13
govuk-chat(prod)> Rails.cache.increment("test", expires_in: 10)
=> 14
govuk-chat(prod)> Rails.cache.increment("test", expires_in: 10)
=> 15
govuk-chat(prod)> Time.now
=> 2026-01-08 10:41:27.049026017 +0000
govuk-chat(prod)> Time.now
=> 2026-01-08 10:41:43.991510293 +0000
govuk-chat(prod)> Rails.cache.increment("test", expires_in: 10)
=> 16

davidgisbey · 2026-01-08T16:12:08Z

@kevindew good shout on that. I'll remember that approach going forward 🎉 I've made the requested change.

This adds the ability to limit the number of auto evaluations that can be run per hour. If the limit is reached, further evaluations are skipped and a warning is logged. The cache key is formed from the 'auto_evaluations_count_' prefix combined with the start of the current hour's timestamp (to_i). The key expires an hour after the last evaluation in that hour, and when the first evaluation runs in the next hour a new key is created. We've set the limit to 300 evaluations per hour based on discussion with data science. We're treating topic tagging as an auto evaluation, so in order to reuse the quota functionality I've updated the AnswerTopicsJob to inherit from the base job and added the quota checks.
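A plain-Ruby sketch of how a base job could own the quota check so that a subclass only supplies its evaluation. `BaseJobSketch`, `AnswerTopicsJobSketch`, `FakeCache` and the injectable `clock` are all hypothetical stand-ins, not the PR's ActiveJob classes; the key format, the `|| 1` fallback and the 300 limit follow the thread.

```ruby
require "logger"

# Hypothetical in-memory stand-in for Rails.cache#increment (expiry not modelled).
class FakeCache
  def initialize
    @data = Hash.new(0)
  end

  def increment(key, amount = 1, expires_in: nil)
    @data[key] += amount
  end
end

# Base class holding the shared quota check.
class BaseJobSketch
  MAX_EVALUATIONS_PER_HOUR = 300

  def initialize(cache:, logger:, clock: -> { Time.now })
    @cache = cache
    @logger = logger
    @clock = clock
  end

  def perform(*args)
    return :skipped unless quota_available?

    run_evaluation(*args)
  end

  private

  def quota_available?
    now = @clock.call.to_i
    key = "auto_evaluations_count_#{now - (now % 3600)}"
    # Fall back to 1 if the store returns nil (e.g. a null store).
    count = @cache.increment(key, 1, expires_in: 3600) || 1
    return true if count <= MAX_EVALUATIONS_PER_HOUR

    @logger.warn("Auto evaluation quota reached (#{MAX_EVALUATIONS_PER_HOUR}/hour), skipping")
    false
  end

  def run_evaluation(*)
    raise NotImplementedError, "subclasses implement the evaluation"
  end
end

# The subclass only supplies its evaluation; the quota check is inherited.
class AnswerTopicsJobSketch < BaseJobSketch
  private

  def run_evaluation(answer)
    "tagged topics for #{answer}"
  end
end
```

Keeping the check in the base class means any further evaluation job picks up the same hourly quota simply by inheriting from it.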

There are essentially duplicate tests for retries in the topics and relevancy job specs. Any 4xx or 5xx error is raised via the OpenAI or Anthropic clients, so we can generalise the shared example to cover both cases.

kevindew

Great stuff

davidgisbey changed the base branch from main to add-metrics-data-models-and-integrate-into-workflow December 19, 2025 15:15

davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from a8be0c2 to 5d8e312 Compare December 19, 2025 15:15

govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 19, 2025 15:16 Inactive

davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from 5d8e312 to 3fbe2f0 Compare December 19, 2025 15:19

govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 19, 2025 15:20 Inactive

govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 19, 2025 15:24 Inactive

davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from 9b5ae54 to e2550d1 Compare December 19, 2025 15:48

govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 19, 2025 15:49 Inactive

davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from e2550d1 to 7cdcb58 Compare December 19, 2025 16:01

govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 19, 2025 16:01 Inactive

davidgisbey marked this pull request as ready for review December 19, 2025 16:11

davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch from ff73c90 to 884228c Compare December 23, 2025 09:17

davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from 7cdcb58 to 67881a2 Compare December 23, 2025 09:21

govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 23, 2025 09:21 Inactive

davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch from 884228c to e4adb49 Compare December 23, 2025 09:26

davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from 67881a2 to e4850b6 Compare December 23, 2025 09:31

govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 23, 2025 09:31 Inactive

davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch from e4adb49 to f96ea91 Compare December 23, 2025 12:00

davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from e4850b6 to dfda1b4 Compare December 23, 2025 12:02

govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 23, 2025 12:02 Inactive

davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch from f96ea91 to 2f964ea Compare December 23, 2025 12:07

davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from dfda1b4 to b563490 Compare December 23, 2025 12:09

govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 23, 2025 12:09 Inactive

davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch from 2f964ea to 9883cc1 Compare December 23, 2025 12:12

davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from b563490 to 7fb2370 Compare December 23, 2025 12:13

govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 23, 2025 12:13 Inactive

davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch 2 times, most recently from 09e9852 to 3e18724 Compare December 23, 2025 12:50

davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from 7fb2370 to 7296534 Compare December 23, 2025 12:50

govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 23, 2025 12:50 Inactive

davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch from 524abaa to 8d2835a Compare January 6, 2026 11:29

davidgisbey force-pushed the add-quota-limit-for-metric-requests branch 2 times, most recently from bf41a84 to 5517936 Compare January 6, 2026 11:54

davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch 5 times, most recently from e3709b8 to 4cabe13 Compare January 7, 2026 09:51

Base automatically changed from add-metrics-data-models-and-integrate-into-workflow to main January 7, 2026 10:31

davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from 5517936 to df2b79b Compare January 7, 2026 10:37

davidgisbey changed the title ~~Add quota limit for metric requests and utilise shared logic for Topics~~ Add quota limit for metric requests Jan 7, 2026

davidgisbey changed the title ~~Add quota limit for metric requests~~ Add quota limit for auto evaluation LLM requests Jan 7, 2026

davidgisbey force-pushed the add-quota-limit-for-metric-requests branch 3 times, most recently from 57c49c1 to 43aeb61 Compare January 7, 2026 11:29

kevindew reviewed Jan 7, 2026

Comment thread app/jobs/answer_analysis/base_job.rb Outdated

kevindew reviewed Jan 7, 2026

Comment thread spec/support/job_examples.rb Outdated

Comment thread spec/support/job_examples.rb Outdated

Comment thread app/jobs/answer_analysis/base_job.rb

davidgisbey commented Jan 7, 2026

davidgisbey added 2 commits January 8, 2026 16:13

Add shared example for retries on specific errors

6c683f4

kevindew approved these changes Jan 8, 2026

3 participants
