RFC: Rate limiting for MCP servers#57

Draft
jerm-dro wants to merge 6 commits into main from jerm/2026-03-18-rate-limiting

Conversation

@jerm-dro
Contributor

@jerm-dro jerm-dro commented Mar 18, 2026

Summary

Adds an RFC for configurable rate limiting in ToolHive's MCPServer and VirtualMCPServer.

Scope:

  • Per-user and global rate limits at the server level
  • Per-operation limits for individual tools, prompts, and resources
  • Consistent config across MCPServer and VirtualMCPServer
  • Redis-backed shared counters for multi-replica correctness

This is an early draft focused on problem definition, user-facing configuration, and open questions — not implementation details.

🤖 Generated with Claude Code

jerm-dro and others added 4 commits March 18, 2026 15:05
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jerm-dro jerm-dro requested a review from Derek2Tu March 18, 2026 22:53
Contributor

@JAORMX JAORMX left a comment


So, the windowing section is the one part of this RFC that I think needs to be pinned down before we move forward. Right now it reads as "we'll figure it out during implementation"... but the algorithm choice affects burst behavior, storage, and how operators should reason about the limits they configure. Worth deciding here.

I'd suggest going with a token bucket. Here's why:

Your config schema (requestsPerWindow + windowSeconds) already maps to it naturally. A token bucket has two parameters: a refill rate (requestsPerWindow / windowSeconds) and a burst capacity (requestsPerWindow). Each token represents a single allowed request. The bucket starts full, refills at a steady rate, and when a request comes in it takes one token out. Empty bucket? Request rejected.

So "100 requests per 60 seconds" means: bucket holds 100 tokens, refills at ~1.67 tokens/second. The operator doesn't need to think about buckets or tokens at all... the config they already write just works.
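The mechanics described above can be sketched as a small timestamp-driven simulation (the `TokenBucket` class and its names are illustrative, not ToolHive code):

```python
class TokenBucket:
    """Token bucket mapped from the RFC's config:
    capacity = requestsPerWindow, refill rate = requestsPerWindow / windowSeconds."""

    def __init__(self, requests_per_window: int, window_seconds: float, now: float = 0.0):
        self.capacity = float(requests_per_window)
        self.refill_rate = requests_per_window / window_seconds  # tokens per second
        self.tokens = self.capacity  # bucket starts full
        self.last_refill = now

    def allow(self, now: float) -> bool:
        # Continuous refill since the last check, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # take one token for this request
            return True
        return False  # empty bucket: reject
```

With `TokenBucket(100, 60)`, an idle (full) bucket admits a burst of 100 requests, then refills at roughly 1.67 tokens per second.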

Per-user limiting works the same way. Each user just gets their own bucket, keyed by identity. The algorithm is identical, only the Redis key changes:

  • Global limit: thv:rl:{server}:global
  • Per-user limit: thv:rl:{server}:user:{userId}
  • Per-user per-tool: thv:rl:{server}:user:{userId}:tool:{toolName}

In Redis, each bucket is a hash with two fields: token count and last refill timestamp. The check-and-decrement runs as a single atomic operation in Redis, so no race conditions across replicas.
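The key scheme from the bullets above can be captured in a small helper (the `bucket_key` name is illustrative; the `thv:rl:` prefix is from this comment, and in Redis the check-and-decrement itself would run server-side as a single atomic script rather than in client code):

```python
def bucket_key(server: str, user_id: str = None, tool: str = None) -> str:
    """Build the Redis key for a rate-limit bucket, following the
    thv:rl:{server}:... scheme sketched above."""
    parts = ["thv", "rl", server]
    if user_id is None:
        parts.append("global")          # server-wide limit
    else:
        parts += ["user", user_id]      # per-user limit
        if tool is not None:
            parts += ["tool", tool]     # per-user per-tool limit
    return ":".join(parts)
```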

Storage is O(1) per counter (two fields per hash). Total footprint is O(users × limits). So 500 users with 10 per-operation limits = 5,000 tiny hashes... basically nothing for Redis. Keys auto-expire when inactive too, so no garbage collection needed.

Compare that to a sliding window log, which stores a sorted set entry per request and is O(requests) per counter. That's the one to avoid.

Fixed window counters are simpler (single integer, also O(1)), but they have the classic boundary burst problem where a user can fire 2x their limit by timing requests across a window edge. Token bucket avoids that without adding real complexity.
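The boundary burst is easy to see in a toy simulation (assuming a 100-request limit over 60-second windows; the function name is illustrative):

```python
def fixed_window_allowed(timestamps, limit=100, window_seconds=60):
    """Count how many requests a fixed-window counter would admit.
    Each window gets a fresh counter; requests beyond the limit are rejected."""
    counts = {}
    allowed = 0
    for t in timestamps:
        window = int(t // window_seconds)
        if counts.get(window, 0) < limit:
            counts[window] = counts.get(window, 0) + 1
            allowed += 1
    return allowed

# 100 requests just before the window edge, 100 just after:
burst = [59.9] * 100 + [60.0] * 100
```

All 200 requests land within ~0.1 seconds, yet the fixed-window counter admits every one of them, twice the configured limit. A token bucket would cap the same burst at its capacity of 100.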

A few things in the RFC that would need updating if you go with this:

  • The "Windowing" section should be reframed. Token bucket is continuous refill, not windowing. The current language about "approximate windowing" wouldn't be accurate anymore.
  • Burst behavior should be documented. An idle user can send a full burst of requestsPerWindow requests at once, since their bucket is full. That's a feature (it handles legitimate traffic spikes), but operators should understand it so they set capacity accordingly.
  • Retry-After becomes easy to compute. With token bucket you know exactly when the next token arrives (1 / refill_rate). The rejection behavior section is vague on this right now... worth noting that the algorithm gives you precise Retry-After values for free.
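On the last point, the Retry-After arithmetic is just the inverse of the refill rate (a sketch under the config names above, not ToolHive code):

```python
def retry_after_seconds(requests_per_window: int, window_seconds: float) -> float:
    """Seconds until the next token arrives for an empty bucket: 1 / refill_rate."""
    refill_rate = requests_per_window / window_seconds  # tokens per second
    return 1.0 / refill_rate
```

For "100 requests per 60 seconds", a rejected client can be told to retry in 0.6 seconds.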

jerm-dro and others added 2 commits March 19, 2026 08:27
- Adopt token bucket algorithm per Ozz's review (replaces vague windowing section)
- Document burst behavior, Redis key structure, and precise Retry-After
- Clarify scope is Kubernetes-based deployments (Derek's question)
- Replace "operators" with "administrators" throughout (Derek's question)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jerm-dro
Contributor Author

@JAORMX Thank you for the suggestion. I've updated the algorithm / solution section to use the proposed algorithm.

@JAORMX
Contributor

JAORMX commented Mar 20, 2026

Hey @jerm-dro, two things I'd like to see before we move this forward:

The security considerations section is missing entirely, and it's a required one per the template. Rate limiting sits in the request hot-path and makes accept/reject decisions on every request... we need a threat model here. Think Redis as an attack surface, input validation on operation names used in Redis keys, audit logging for rejections (especially given agent exfiltration is a motivating threat), and Redis unavailability semantics (probably should be configurable like THV-0017's failure_policy).

Also, there's no alternatives considered section. At minimum, let's capture the algorithm discussion we already had. You originally proposed a sliding window approach and we went with token bucket instead. That decision and the reasoning behind it should live in the RFC so future readers have context.

Everything else is looking good. The token bucket write-up and additive limit semantics are solid.


3 participants