Add quota limit for auto evaluations#720

Merged
davidgisbey merged 2 commits into
mainfrom
add-quota-limit-for-metric-requests
Jan 8, 2026

Conversation

@davidgisbey
Contributor

@davidgisbey davidgisbey commented Dec 19, 2025

Description

If we run our auto-evaluation on every answer that's generated, we could end up spending a significant amount of money, as many metrics make multiple LLM calls, each of which is run 3 times.

This adds a quota for evaluations per hour, which we've set to 300. Each job that runs an evaluation:

  1. Checks for the presence of the auto_evaluations_count key in the cache.
  2. If it is not found, adds the key with a value of 1 and sets it to expire at the end of the hour.
  3. If it is found, checks whether the value is >= the threshold of 300:
  • If it is, it logs that the quota is at the threshold and exits the job.
  • If it isn't, it increments the counter and runs the evaluation.
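
The steps above can be sketched in plain Ruby as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `FakeCache`, `QUOTA`, `KEY` and `within_quota?` are hypothetical names, and the hash-backed cache only stands in for `Rails.cache`.

```ruby
# Hypothetical in-memory stand-in for Rails.cache, for illustration only.
class FakeCache
  def initialize
    @store = {}
  end

  # Returns the stored value, or nil if the key is missing or expired.
  def read(key, now: Time.now)
    entry = @store[key]
    return nil if entry.nil? || entry[:expires_at] <= now

    entry[:value]
  end

  def write(key, value, expires_at:)
    @store[key] = { value: value, expires_at: expires_at }
  end
end

QUOTA = 300 # evaluations per hour, as in the description
KEY = "auto_evaluations_count"

# Returns true if the evaluation may run, false if the quota has been reached.
def within_quota?(cache, now: Time.now)
  # End of the current hour, built without ActiveSupport.
  end_of_hour = Time.new(now.year, now.month, now.day, now.hour, 0, 0, now.utc_offset) + 3600
  count = cache.read(KEY, now: now)

  if count.nil?
    cache.write(KEY, 1, expires_at: end_of_hour)
    true
  elsif count >= QUOTA
    warn "Auto evaluation quota of #{QUOTA} reached, skipping"
    false
  else
    cache.write(KEY, count + 1, expires_at: end_of_hour)
    true
  end
end
```

Note that the read followed by a write in this sketch is not atomic, so concurrent workers sharing a cache could both read the same count.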

I've added the functionality to the base worker so it can be reused across the workers and added a shared example to test it.

I've updated the AnswerTopicsJob to inherit from the base worker and updated it to adhere to the quota.

I've also added another shared example for retry logic to remove some duplication.

Trello card

https://trello.com/c/lfwcnNaz/3045-add-quota-for-auto-evaluation-llm-requests

@davidgisbey davidgisbey changed the base branch from main to add-metrics-data-models-and-integrate-into-workflow December 19, 2025 15:15
@davidgisbey davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from a8be0c2 to 5d8e312 Compare December 19, 2025 15:15
@govuk-ci govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 19, 2025 15:16 Inactive
@davidgisbey davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from 5d8e312 to 3fbe2f0 Compare December 19, 2025 15:19
@govuk-ci govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 19, 2025 15:20 Inactive
@govuk-ci govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 19, 2025 15:24 Inactive
@davidgisbey davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from 9b5ae54 to e2550d1 Compare December 19, 2025 15:48
@govuk-ci govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 19, 2025 15:49 Inactive
@davidgisbey davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from e2550d1 to 7cdcb58 Compare December 19, 2025 16:01
@govuk-ci govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 19, 2025 16:01 Inactive
@davidgisbey davidgisbey marked this pull request as ready for review December 19, 2025 16:11
@davidgisbey davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch from ff73c90 to 884228c Compare December 23, 2025 09:17
@davidgisbey davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from 7cdcb58 to 67881a2 Compare December 23, 2025 09:21
@govuk-ci govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 23, 2025 09:21 Inactive
@davidgisbey davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch from 884228c to e4adb49 Compare December 23, 2025 09:26
@davidgisbey davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from 67881a2 to e4850b6 Compare December 23, 2025 09:31
@govuk-ci govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 23, 2025 09:31 Inactive
@davidgisbey davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch from e4adb49 to f96ea91 Compare December 23, 2025 12:00
@davidgisbey davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from e4850b6 to dfda1b4 Compare December 23, 2025 12:02
@govuk-ci govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 23, 2025 12:02 Inactive
@davidgisbey davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch from f96ea91 to 2f964ea Compare December 23, 2025 12:07
@davidgisbey davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from dfda1b4 to b563490 Compare December 23, 2025 12:09
@govuk-ci govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 23, 2025 12:09 Inactive
@davidgisbey davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch from 2f964ea to 9883cc1 Compare December 23, 2025 12:12
@davidgisbey davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from b563490 to 7fb2370 Compare December 23, 2025 12:13
@govuk-ci govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 23, 2025 12:13 Inactive
@davidgisbey davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch 2 times, most recently from 09e9852 to 3e18724 Compare December 23, 2025 12:50
@davidgisbey davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from 7fb2370 to 7296534 Compare December 23, 2025 12:50
@govuk-ci govuk-ci temporarily deployed to govuk-chat-add-quota-li-wsdrak December 23, 2025 12:50 Inactive
@davidgisbey davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch from 524abaa to 8d2835a Compare January 6, 2026 11:29
@davidgisbey davidgisbey force-pushed the add-quota-limit-for-metric-requests branch 2 times, most recently from bf41a84 to 5517936 Compare January 6, 2026 11:54
@davidgisbey davidgisbey force-pushed the add-metrics-data-models-and-integrate-into-workflow branch 5 times, most recently from e3709b8 to 4cabe13 Compare January 7, 2026 09:51
Base automatically changed from add-metrics-data-models-and-integrate-into-workflow to main January 7, 2026 10:31
@davidgisbey davidgisbey force-pushed the add-quota-limit-for-metric-requests branch from 5517936 to df2b79b Compare January 7, 2026 10:37
@davidgisbey davidgisbey changed the title Add quota limit for metric requests and utilise shared logic for Topics Add quota limit for metric requests Jan 7, 2026
@davidgisbey davidgisbey changed the title Add quota limit for metric requests Add quota limit for auto evaluation LLM requests Jan 7, 2026
@davidgisbey davidgisbey force-pushed the add-quota-limit-for-metric-requests branch 3 times, most recently from 57c49c1 to 43aeb61 Compare January 7, 2026 11:29
Comment thread app/jobs/answer_analysis/base_job.rb Outdated
Member

@kevindew kevindew left a comment

Great job, just a few small changes

Comment thread app/jobs/answer_analysis/base_job.rb Outdated
Comment thread app/jobs/answer_analysis/base_job.rb Outdated
Comment thread spec/support/job_examples.rb Outdated
Comment thread spec/support/job_examples.rb Outdated
Comment thread spec/support/job_examples.rb
Comment thread spec/support/job_examples.rb
@davidgisbey
Contributor Author

Thanks for the review @kevindew, I've made those changes with the exception of one comment I left.

Comment thread spec/support/job_examples.rb Outdated
Comment thread spec/support/job_examples.rb Outdated
Comment thread app/jobs/answer_analysis/base_job.rb
@davidgisbey
Contributor Author

I ran into an interesting issue when testing on Integration where it wasn't working as implemented. I pushed up a new commit which outlines the issue and gives a few suggestions.

Any feedback on the approach we should take would be great.

@kevindew
Member

kevindew commented Jan 7, 2026

Ooh tricky problem - thanks for outlining options. I think I like:

expires_in = (Time.current.end_of_minute - Time.current).to_i
count = Rails.cache.increment("auto_evaluations_count", expires_in:)

if count >= max_evaluations
  # do the various logging things
  return false
end

Though I would suggest doing: if count > max_evaluations as we've incremented it before.

Did you have problems with expires_at? I notice that's changed from expires_in
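
To illustrate the `>` versus `>=` point: once the counter is incremented before the comparison, `>=` skips the job that brings the count up to the limit, so one fewer evaluation runs than the limit allows. The sketch below is only an illustration; `Counter`, `allowed_runs` and the small `MAX_EVALUATIONS` value are hypothetical and stand in for an atomic cache increment.

```ruby
MAX_EVALUATIONS = 3 # small hypothetical limit so the behaviour is easy to see

# Hypothetical counter standing in for an atomic cache increment.
class Counter
  def initialize
    @value = 0
  end

  def increment
    @value += 1
  end
end

# Counts how many of `attempts` jobs would be allowed to run. The block
# receives the already-incremented count and returns true to skip the job.
def allowed_runs(attempts)
  counter = Counter.new
  attempts.times.count do
    count = counter.increment
    !yield(count)
  end
end

with_gte = allowed_runs(5) { |count| count >= MAX_EVALUATIONS }
with_gt = allowed_runs(5) { |count| count > MAX_EVALUATIONS }

puts "allowed with >=: #{with_gte}"
puts "allowed with >:  #{with_gt}"
```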

@davidgisbey
Contributor Author

I did. It didn't seem to work, so I had a look at the docs and increment expects expires_in

Comment thread spec/spec_helper.rb Outdated
Rack::Attack.cache.store = old_store
end

config.before(:each, :with_memory_store) do
Contributor Author

@davidgisbey davidgisbey Jan 7, 2026

This is required for the system specs that run auto evaluation, because NullStore returns nil when you call increment, which causes this line to blow up.

Member

I think rather than doing this - because it's pretty heavy, it'd be better to change the usage code to be:

# fallback to 1 in scenarios where we have a null cache (dev/test) and this returns nil
count = Rails.cache.increment("auto_evaluations_count", 1, expires_in:) || 1
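
As a rough illustration of why the `|| 1` fallback helps, the sketch below uses a hypothetical `NullCache` class whose `increment` returns nil, as described above for the null store. `NullCache`, `MAX` and `quota_exceeded?` are made-up names for this example only, and the `>` comparison is an assumption about how the count is checked.

```ruby
# Hypothetical stand-in for a null cache store: increment stores nothing
# and returns nil.
class NullCache
  def increment(_key, _amount = 1, **_options)
    nil
  end
end

MAX = 300

# Falls back to 1 when the cache returns nil, so the comparison never
# runs against nil.
def quota_exceeded?(cache)
  count = cache.increment("auto_evaluations_count", 1, expires_in: 3600) || 1
  count > MAX
end

puts quota_exceeded?(NullCache.new)
```

Without the fallback, comparing nil with an integer raises NoMethodError, which is the kind of failure the comment above describes.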

@davidgisbey
Contributor Author

Thanks for the feedback @kevindew, I've made those changes.

@kevindew
Member

kevindew commented Jan 8, 2026

I did. It didn't seem to work, so I had a look at the docs and increment expects expires_in

So it should be fine because the merged_options method is supposed to convert it to an expires_in

@kevindew
Member

kevindew commented Jan 8, 2026

I did. It didn't seem to work, so I had a look at the docs and increment expects expires_in

So it should be fine because the merged_options method is supposed to convert it to an expires_in

Hmm I've had a bit of a play with it and while it does work, there is an edge case you can hit where if you were very close to the end of an hour you'd get a negative number and an exception:

govuk-chat(prod)> Rails.cache.increment("test", expires_at: expiry)
(govuk-chat):37:in '<main>': Cache expiration time is invalid, cannot be negative: -16.07605283 (ArgumentError)

            raise error

which would be annoying to hit.
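
One way this could arise, sketched in plain Ruby: if the absolute expiry is computed just before the hour ends but is converted to a relative TTL just after, the difference is negative. `ttl_from` is a hypothetical helper written for this illustration and is not the Rails implementation; the timestamps are made up.

```ruby
# Hypothetical helper mirroring the idea of turning an absolute expiry
# into a relative TTL. Not the Rails implementation.
def ttl_from(expires_at, now)
  ttl = expires_at - now
  raise ArgumentError, "Cache expiration time is invalid, cannot be negative: #{ttl}" if ttl.negative?

  ttl
end

end_of_hour = Time.utc(2026, 1, 8, 11, 0, 0)
computed_at = end_of_hour - 0.5  # expiry worked out just before the hour ends
converted_at = end_of_hour + 0.2 # conversion to a TTL happens just after

puts ttl_from(end_of_hour, computed_at) # positive TTL, accepted

begin
  ttl_from(end_of_hour, converted_at)
rescue ArgumentError => e
  puts e.message
end
```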

I then considered that it might be better to do expires_in but I realised that if we hit the same sort of edge case on that we'd end up with the cache potentially spanning more than one hour.


So, I've tried to think of a way to do it without these edge cases that might be a little more robust.

Rather than us relying on expiry I think we should set the key based on the time and then just allow it to expire after an hour. I think that's actually what Rack::Attack does if I remember right.

So something like:

key = "auto_evaluations_count_#{Time.current.beginning_of_hour.to_i}"
count = Rails.cache.increment(key, 1, expires_in: 1.hour)
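
A plain-Ruby sketch of the time-based key, assuming nothing beyond the standard library: `hourly_key` is a hypothetical helper that truncates a time to the start of its hour by hand, where the Rails version would rely on ActiveSupport's `beginning_of_hour`.

```ruby
# Builds a cache key that changes once per hour by truncating the given
# time to the start of its hour (no ActiveSupport required).
def hourly_key(now = Time.now)
  start_of_hour = Time.new(now.year, now.month, now.day, now.hour, 0, 0, now.utc_offset)
  "auto_evaluations_count_#{start_of_hour.to_i}"
end

early = hourly_key(Time.utc(2026, 1, 8, 10, 0, 0))
late = hourly_key(Time.utc(2026, 1, 8, 10, 59, 59))
next_hour = hourly_key(Time.utc(2026, 1, 8, 11, 0, 0))

puts early == late      # same hour, same key
puts late == next_hour  # new hour, new key
```

Because each hour gets its own key, the counter starts fresh without depending on the previous key's TTL expiring at exactly the right moment.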

I think this would also have an advantage that for every hour we get a fresh cache key and that is something explicit we can debug on if we hit problems, whereas relying on the TTL for MemCache is very cryptic because you can't look it up. I learnt this in my little terminal session when I accidentally ran fetch on a key once and then it seemed to never expire:

govuk-chat(prod)> Rails.cache.increment("test", expires_in: 10)
=> 13
govuk-chat(prod)> Rails.cache.increment("test", expires_in: 10)
=> 14
govuk-chat(prod)> Rails.cache.increment("test", expires_in: 10)
=> 15
govuk-chat(prod)> Time.now
=> 2026-01-08 10:41:27.049026017 +0000
govuk-chat(prod)> Time.now
=> 2026-01-08 10:41:43.991510293 +0000
govuk-chat(prod)> Rails.cache.increment("test", expires_in: 10)
=> 16

@davidgisbey
Contributor Author

@kevindew good shout on that. I'll remember that approach going forward 🎉 I've made the requested change.

This adds the ability to limit the number of auto evaluations
that can be run per hour. If the limit is reached, further evaluations
are skipped and a warning is logged.

It uses the 'auto_evaluations_count_' prefix combined with the start of
the current hour's timestamp (to_i) to form the cache key. It will expire
after the last evaluation in that hour plus an hour.

When the first evaluation is run in the next hour, a new key is created.

We've set the limit to 300 evaluations per hour based on discussion with
data science.

We're treating topic tagging as an auto evaluation. In order to reuse the
quota functionality I've updated the AnswerTopicsJob to inherit from the
base job and added the quota checks.

There are essentially duplicate tests for retries in the topics and relevancy
job specs. Any 4xx or 5xx error is raised via the OpenAI or Anthropic
clients, so we can generalise the shared example to cover both cases.
Member

@kevindew kevindew left a comment

Great stuff

3 participants