
feat(worker): add token bucket rate limiter Durable Object #5504

Merged: H2Shami merged 19 commits into main from new-rate-limter on Jan 16, 2026


@replicas-connector
Contributor

Summary

  • Implement a production-grade token bucket rate limiter using Cloudflare Durable Objects
  • Support request-based and cost-based (cents) limiting via the Helicone-RateLimit-Policy header
  • Three segment types for rate-limit isolation:
    • Global: One shared bucket per organization
    • Per-user: Bucket per Helicone-User-Id
    • Per-property: Bucket per Helicone-Property-[Name] (e.g., organization, tenant)
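As an illustration of how a segment could become a Durable Object key, here is a minimal sketch. The `Segment` type, the `orgId:segment:id` key layout, the 256-character cap and the choice to return `null` when the required header is missing are all assumptions for illustration, not the PR's actual `segmentExtractor.ts` logic. The control-character stripping mirrors the sanitization regex that the later CI fix mentions.

```typescript
// Hypothetical segment shapes mirroring the three kinds listed above.
type Segment = { kind: "global" } | { kind: "user" } | { kind: "property"; name: string };

const MAX_SEGMENT_ID_LENGTH = 256; // illustrative cap, not the PR's value

// Strip control characters and the ":" key delimiter so header input
// cannot forge another bucket's key, then cap the length.
function sanitizeId(value: string): string {
  // eslint-disable-next-line no-control-regex
  const noControl = value.replace(/[\x00-\x1f\x7f]/g, "");
  return noControl.replace(/:/g, "_").trim().slice(0, MAX_SEGMENT_ID_LENGTH);
}

// Hypothetical key builder: the org prefix isolates organizations from each
// other. Returns null when the header the segment needs is missing or empty.
function bucketKey(orgId: string, segment: Segment, headers: Headers): string | null {
  if (segment.kind === "global") return `${orgId}:global`;
  const headerName =
    segment.kind === "user" ? "Helicone-User-Id" : `Helicone-Property-${segment.name}`;
  const id = sanitizeId(headers.get(headerName) ?? "");
  if (!id) return null;
  const label =
    segment.kind === "user" ? "user" : `property:${sanitizeId(segment.name).toLowerCase()}`;
  return `${orgId}:${label}:${id}`;
}
```

Because `Headers.get` is case-insensitive, `s=organization` matches a `Helicone-Property-Organization` header regardless of how the client capitalizes it.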

Key Design Decisions

  • Token bucket algorithm with lazy refill - no background timers, tokens computed on demand
  • Durable Objects for enforcement - each bucket is owned by a single DO instance, whose single-threaded execution makes check-and-update atomic and consistent under high concurrency
  • Configurable failure mode - fail-open (default, preserves availability) or fail-closed (preserves cost control)
  • Policy change detection - gracefully handles updates to rate limit policies
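The lazy-refill idea and the policy-change clamp can be sketched as pure functions. This is a minimal illustration, not the code in `TokenBucketRateLimiterDO.ts`: the `BucketState`/`BucketPolicy` shapes, the function names and the meaning of `resetSeconds` (seconds until enough tokens exist for the denied request) are assumptions.

```typescript
// Illustrative shapes; field names are assumptions, not the PR's types.
interface BucketState {
  tokens: number; // tokens currently available
  lastRefillMs: number; // when tokens were last brought up to date
  policyHash: string; // fingerprint of the policy this state was built under
}

interface BucketPolicy {
  quota: number; // bucket capacity (requests or cents)
  windowSeconds: number; // time for an empty bucket to refill completely
}

// Lazy refill: credit tokens for the time elapsed since the last update,
// capped at capacity. Nothing runs between requests.
function refill(state: BucketState, policy: BucketPolicy, nowMs: number): BucketState {
  const elapsedMs = Math.max(0, nowMs - state.lastRefillMs);
  const earned = (elapsedMs * policy.quota) / (policy.windowSeconds * 1000);
  return { ...state, tokens: Math.min(policy.quota, state.tokens + earned), lastRefillMs: nowMs };
}

// Refill, then try to take `cost` tokens. When denied, resetSeconds is the
// wait until enough tokens will have accumulated.
function tryConsume(state: BucketState, policy: BucketPolicy, cost: number, nowMs: number) {
  const current = refill(state, policy, nowMs);
  if (current.tokens >= cost) {
    return { state: { ...current, tokens: current.tokens - cost }, allowed: true, resetSeconds: 0 };
  }
  const tokensPerSecond = policy.quota / policy.windowSeconds;
  const resetSeconds = Math.ceil((cost - current.tokens) / tokensPerSecond);
  return { state: current, allowed: false, resetSeconds };
}

// Policy change: keep the existing balance but clamp it to the new capacity.
function applyPolicyChange(state: BucketState, policy: BucketPolicy, newHash: string): BucketState {
  if (state.policyHash === newHash) return state;
  return { ...state, tokens: Math.min(state.tokens, policy.quota), policyHash: newHash };
}
```

For example, with a quota of 10 over a 10-second window, an empty bucket holds 5 tokens after 5 seconds. Since the DO handles one request at a time, the read, compute, write sequence needs no extra locking.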

Policy Header Format

Helicone-RateLimit-Policy: [quota];w=[time_window];u=[unit];s=[segment]

Examples:

  • 1000;w=3600 - 1000 requests per hour, global
  • 5000;w=86400;u=cents - $50 per day, global
  • 100;w=60;s=user - 100 requests per minute, per user
  • 10000;w=3600;s=organization - 10000 requests per hour, per organization
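A minimal parser for this header format might look like the sketch below. It is not the PR's `policyParser.ts`. Assumptions: `request` is the name of the default unit, `w=` is required, unknown keys are rejected, and errors are returned as result objects (the reviewer's `err({ field, message })` snippet hints at a similar style).

```typescript
// "cents" appears in the PR; "request" as the default unit name is an assumption.
type Unit = "request" | "cents";

interface RateLimitPolicy {
  quota: number;
  windowSeconds: number;
  unit: Unit;
  segment: string; // "global", "user", or a property name such as "organization"
}

type ParseResult =
  | { ok: true; policy: RateLimitPolicy }
  | { ok: false; field: string; message: string };

// Hypothetical parser for "[quota];w=[time_window];u=[unit];s=[segment]".
function parsePolicy(header: string): ParseResult {
  const [quotaPart, ...params] = header.split(";").map((part) => part.trim());
  const quota = Number(quotaPart);
  if (!quotaPart || !Number.isFinite(quota) || quota <= 0) {
    return { ok: false, field: "quota", message: "quota must be a positive number" };
  }

  let windowSeconds: number | undefined;
  let unit: Unit = "request"; // defaults when u= / s= are omitted
  let segment = "global";

  for (const param of params) {
    if (!param) continue; // tolerate a trailing ";"
    const eq = param.indexOf("=");
    if (eq < 0) return { ok: false, field: param, message: "expected key=value" };
    const key = param.slice(0, eq).trim().toLowerCase();
    const value = param.slice(eq + 1).trim();

    if (key === "w") {
      const w = Number(value);
      if (!Number.isInteger(w) || w <= 0) {
        return { ok: false, field: "w", message: "window must be a positive integer (seconds)" };
      }
      windowSeconds = w;
    } else if (key === "u") {
      if (value === "request" || value === "cents") unit = value;
      else return { ok: false, field: "u", message: "unit must be 'request' or 'cents'" };
    } else if (key === "s") {
      if (!value) return { ok: false, field: "s", message: "segment must not be empty" };
      segment = value.toLowerCase();
    } else {
      return { ok: false, field: key, message: "unknown parameter" };
    }
  }

  if (windowSeconds === undefined) {
    return { ok: false, field: "w", message: "window (w=) is required" };
  }
  return { ok: true, policy: { quota, windowSeconds, unit, segment } };
}
```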

Files Added

  • TokenBucketRateLimiterDO.ts: Core Durable Object with bucket state management
  • policyParser.ts: Policy string parsing with validation
  • segmentExtractor.ts: Header-based segment identifier extraction
  • tokenBucketClient.ts: Worker integration layer
  • test/rate-limit/*.spec.ts: 90 unit tests

Test plan

  • All 90 unit tests pass
  • Manual testing with wrangler dev (can test global/user/property isolation)
  • Integration testing in staging environment

🤖 Generated with Claude Code

Implement a production-grade token bucket rate limiter for the AI Gateway
using Cloudflare Durable Objects for consistent enforcement at high concurrency.

Key features:
- Token bucket algorithm with lazy refill (no background timers)
- Supports request-based and cost-based (cents) limiting
- Segment types: global, per-user (Helicone-User-Id), per-property
- Policy header format: [quota];w=[window];u=[unit];s=[segment]
- Atomic operations via DO's single-threaded execution model
- Configurable fail-open/fail-closed on errors
- Policy change detection and graceful handling

Files added:
- TokenBucketRateLimiterDO: Core DO with bucket state management
- policyParser: Robust policy string parsing with validation
- segmentExtractor: Header-based segment identifier extraction
- tokenBucketClient: Worker integration layer

90 unit tests covering:
- Policy parsing edge cases
- Token bucket math (refill, consume, reset time)
- Segment extraction and DO key generation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@vercel

vercel bot commented Jan 13, 2026

The latest updates on your projects.

3 Skipped Deployments

  • helicone: Deployment skipped, Review skipped (updated Jan 16, 2026 11:23pm UTC)
  • helicone-bifrost: Deployment skipped, Review skipped (updated Jan 16, 2026 11:23pm UTC)
  • helicone-eu: Deployment skipped, Review skipped (updated Jan 16, 2026 11:23pm UTC)

@supabase

supabase bot commented Jan 13, 2026

This pull request has been ignored for the connected project bolqqmqbrciybnypvklh because there are no changes detected in supbase/supabase directory. You can change this behaviour in Project Integrations Settings.

@supabase

supabase bot commented Jan 13, 2026

This pull request has been ignored for the connected project lmahfbbnchpworytrrqk because there are no changes detected in .supabase/supabase directory. You can change this behaviour in Project Integrations Settings.

@claude
Contributor

claude bot commented Jan 13, 2026

Claude finished @replicas-connector[bot]'s task.


PR Review: Token Bucket Rate Limiter

Score: 8/10 - Well-implemented production-grade rate limiter with excellent test coverage and thoughtful design decisions.

Todo List:

  • Analyze core rate limiter implementation
  • Review policy parsing and validation logic
  • Examine segment extraction for security issues
  • Check integration with existing codebase
  • Assess test coverage and quality
  • Evaluate performance implications
  • Provide final review with confidence score

Detailed Review

Strengths

Excellent Algorithm Design

  • Token bucket with lazy refill is correctly implemented - no background timers, efficient computation
  • Smart dual-strategy for request vs cost-based limiting handles the "unknown cost upfront" problem elegantly
  • Policy change detection with graceful handling (clamp existing tokens to new capacity)

Strong Security & Robustness

  • Input sanitization in segmentExtractor.ts:172-180 prevents injection attacks in DO keys
  • Cost normalization with reasonable limits (max $10,000/request) prevents abuse
  • Fail-open default preserves availability, configurable fail-closed for cost control
  • Comprehensive validation of policy parameters

Excellent Test Coverage

  • 90+ comprehensive unit tests across 3 test files (1,200+ lines of tests)
  • Tests cover edge cases, algorithm correctness, and error conditions
  • Pure function testing approach for bucket algorithm verification

Production-Ready Integration

  • DataDog tracing integration for monitoring
  • Proper error handling and logging
  • Clean separation of concerns (parser, extractor, client, DO)

⚠️ Minor Issues & Suggestions

1. Hash Function Quality (BucketRateLimiterDO.ts:317-326)

  • Uses simple string hash for policy change detection
  • Consider crypto.subtle.digest() for better collision resistance, though current approach is likely adequate
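A sketch of the suggested stronger fingerprint, using the standard Web Crypto `crypto.subtle.digest` available in Workers (and in recent Node as `globalThis.crypto`). The `policyFingerprint` name and the canonical string layout are assumptions; the PR's current hash lives in the DO and is not shown here.

```typescript
// Hypothetical helper: SHA-256 over a canonical policy string, hex-encoded.
async function policyFingerprint(policy: {
  quota: number;
  windowSeconds: number;
  unit: string;
  segment: string;
}): Promise<string> {
  // Build the string in a fixed order so equal policies always hash equally.
  const canonical = `${policy.quota};w=${policy.windowSeconds};u=${policy.unit};s=${policy.segment}`;
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(canonical));
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}
```

Because the fingerprint is only used to detect a change, storing the canonical string itself in DO state would be an equally valid alternative that avoids collisions entirely, at the cost of a few more bytes per bucket.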

2. Cost Validation Gap

  • normalizeCost() clamps to 1M cents but doesn't validate minimum cost
  • Suggest adding minimum cost validation (e.g., >= 0.01 cents) to catch precision issues

3. Error Logging Enhancement

  • Rate limiter errors are caught and silently handled in some places
  • Consider structured logging for debugging production issues

4. Documentation

  • Excellent inline documentation, but could benefit from architecture decision records (ADRs) for the dual-strategy approach

🔧 Suggested Improvements

  1. Enhanced Monitoring:

    // Add more detailed metrics in bucketClient.ts
    tracer.setTag(spanId, "tokens_before", bucketState.tokens);
    tracer.setTag(spanId, "refill_amount", tokensToAdd);
  2. Improved Error Handling:

    // In ProxyForwarder.ts:210-212, add specific error logging
    try {
      // ...existing token bucket rate limit check...
    } catch (error) {
      console.error("Rate limit check failed:", error);
      // Current fail-open behavior is correct: continue and forward the request
    }
  3. Policy Validation:

    // In policyParser.ts, add minimum cost validation
    if (policy.unit === "cents" && quota < 0.01) {
      return err({ field: "quota", message: "Minimum cost is 0.01 cents" });
    }

Architecture Decisions

The dual-strategy approach for request vs cost-based limiting is particularly well thought out:

  • Request-based: Preemptive deduction prevents race conditions
  • Cost-based: Check-only pre-request + post-request deduction handles unknown costs
  • This handles the fundamental challenge that LLM costs are only known after completion
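The two deduction strategies can be contrasted with a small in-memory sketch. Refill is omitted to keep the focus on when tokens are deducted. `Bucket`, `handleRequest` and `Forward` are hypothetical names, and letting the balance go negative after an expensive call is an assumption (the PR may clamp instead).

```typescript
// In-memory stand-in for one bucket's balance (refill omitted).
class Bucket {
  tokens: number;

  constructor(tokens: number) {
    this.tokens = tokens;
  }

  // Request-based: check and deduct one token in the same step.
  takeRequest(): boolean {
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  // Cost-based, before the request: only confirm some budget remains.
  hasBudget(): boolean {
    return this.tokens > 0;
  }

  // Cost-based, after the response is priced: deduct the real cost.
  // Assumption: the balance may go negative, so an expensive call blocks
  // later requests until refill repays the overrun.
  recordCost(cents: number): void {
    this.tokens -= cents;
  }
}

type Forward = () => Promise<{ costCents: number }>;

// Returns the HTTP status the gateway would send.
async function handleRequest(bucket: Bucket, unit: "request" | "cents", forward: Forward): Promise<number> {
  if (unit === "request") {
    if (!bucket.takeRequest()) return 429;
    await forward();
    return 200;
  }
  if (!bucket.hasBudget()) return 429;
  const { costCents } = await forward();
  bucket.recordCost(costCents);
  return 200;
}
```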

🚀 Performance Analysis

  • Durable Objects provide strong consistency with good performance
  • Lazy refill algorithm is O(1) time complexity
  • Key structure ensures efficient bucket isolation
  • Single storage operation per request is optimal

Suggestions Summary

Critical Fixes: None - this is ready for production

Recommended Improvements:

  • Enhanced error logging for debugging
  • Add minimum cost validation for cents-based policies
  • Consider stronger hash function for policy change detection
  • Add more detailed monitoring metrics

Nice-to-Have:

  • Architecture decision records documenting the dual-strategy approach
  • Performance benchmarks under load

This is a well-architected, thoroughly tested implementation that demonstrates strong understanding of distributed rate limiting challenges. The code quality is high and the approach is production-ready.


@replicas-connector
Contributor Author

replicas-connector bot commented Jan 13, 2026

CI/CD Failure - Resolved

Workflow Worker AI Gateway Tests failed on commit 3d1bc10.

Fix: Fixed lint error no-control-regex in segmentExtractor.ts by adding an eslint-disable comment for the control character regex pattern (used for sanitizing input). Also fixed prettier formatting issues and removed an unused import.

Commit: bebeff5

@replicas-connector
Contributor Author

CI/CD Failure

Workflow Worker Build Precheck failed on commit 3d1bc10. Investigating.

- Add eslint-disable comment for control-regex in segmentExtractor
- Fix prettier formatting issues in policyParser, segmentExtractor, tokenBucketClient
- Remove unused SegmentExtractionError import
- Change single quotes to double quotes for string literal

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@replicas-connector
Contributor Author

replicas-connector bot commented Jan 13, 2026

CI/CD Failure - Resolved

Workflow Worker Build Precheck failed on commit bebeff5.

Root Cause: The worker-configuration.d.ts file was generated with a different wrangler version locally (4.59.1) than what CI uses (4.53.0). Different wrangler versions embed different workerd runtimes, which produce slightly different runtime type definitions.

Fix: Regenerated worker-configuration.d.ts using npx wrangler@4.53.0 types --strict-vars false to match the CI environment exactly.

Commits:

  • ea294836b - Initial regeneration (wrong workerd version)
  • 91d1d0aee - Regenerated with wrangler 4.53.0 to match CI

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@replicas-connector
Contributor Author

replicas-connector bot commented Jan 13, 2026

CI/CD Failure - Resolved

Workflow Worker Build Precheck failed on commit ea29483.

Root Cause: This workflow ran against an intermediate commit that still had wrangler version mismatch in the generated types.

Fix: The fix was already pushed in commit 91d1d0aee which regenerates worker-configuration.d.ts using wrangler@4.53.0 to match CI exactly.

Status: The new workflow run 20973131541 on commit 91d1d0aee has passed successfully.

Use the same wrangler version as CI (4.53.0) to ensure the generated
worker-configuration.d.ts matches exactly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Integrates the TokenBucketRateLimiterDO into the proxy request handler:

- Add checkTokenBucketRateLimit call in ProxyForwarder.ts after existing
  rate limit checks
- Add addTokenBucketRateLimitHeaders method to ResponseBuilder
- Rate limiting is triggered by the Helicone-RateLimit-Policy header
- Uses fail-open behavior to preserve availability on errors
- Adds rate limit response headers (Limit, Remaining, Policy, Reset)
- Returns HTTP 429 when rate limited

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
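A sketch of what the 429 response with rate limit headers could look like. The commit names only the header roles (Limit, Remaining, Policy, Reset); the `Helicone-RateLimit-*` names, the JSON error body, the `BucketResult` shape and the reading of Reset as seconds until tokens are available are assumptions.

```typescript
// Hypothetical result shape returned by the bucket check.
interface BucketResult {
  limit: number;
  remaining: number;
  resetSeconds: number;
  policy: string; // the original Helicone-RateLimit-Policy value
}

// Assumed header names; remaining is floored because cents budgets can be fractional.
function rateLimitHeaders(result: BucketResult): Headers {
  const headers = new Headers();
  headers.set("Helicone-RateLimit-Limit", String(result.limit));
  headers.set("Helicone-RateLimit-Remaining", String(Math.max(0, Math.floor(result.remaining))));
  headers.set("Helicone-RateLimit-Policy", result.policy);
  headers.set("Helicone-RateLimit-Reset", String(result.resetSeconds));
  return headers;
}

// 429 with the same headers; the body shape is illustrative only.
function rateLimitedResponse(result: BucketResult): Response {
  const headers = rateLimitHeaders(result);
  headers.set("Content-Type", "application/json");
  return new Response(JSON.stringify({ error: "rate_limited" }), { status: 429, headers });
}
```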
@replicas-connector
Contributor Author

replicas-connector bot commented Jan 13, 2026

CI/CD Failure - Unrelated to PR Changes

Workflow Worker AI Gateway Tests failed on commit a939584.

Findings: The failure is in registry-ts.spec.ts test "openai - gpt-4o - PTB direct" with error:

Failed to pop isolated storage stack frame in registry-ts.spec.ts's test "openai - gpt-4o - PTB direct".
In particular, we were unable to pop Durable Objects storage.

This is a pre-existing flaky test issue with the Cloudflare Vitest pool workers' Durable Object storage isolation, not related to the token bucket rate limiter changes in this PR.

Evidence:

  • All 90 rate-limit tests passed successfully (policyParser: 34, segmentExtractor: 27, tokenBucket: 29)
  • The main branch also has intermittent failures with this same test (see workflow runs on 2026-01-08)
  • The error is in test infrastructure (@cloudflare/vitest-pool-workers) not in application code

Recommendation: Re-run the workflow or investigate the flaky test separately.

The rate limit filter was looking up a property filter by label, which failed
when the Helicone-Rate-Limit-Status property hadn't been used yet. This caused
the filter node to be an empty object ({}) that matched all requests instead
of only rate-limited ones.

Fixed by building the filter node directly using the known property structure.
Use an empty object {} when not filtering (a valid FilterNode type) instead of the
"all" string, which causes backend validation errors.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The rate limit filter was looking up a property filter by label, which failed
when the Helicone-Rate-Limit-Status property hadn't been used yet. This caused
the filter node to be an empty object ({}) that matched all requests instead
of only rate-limited ones.

Fixed by building the filter node directly using the known property structure
with the correct value "bucket_rate_limited" (not "rate_limited").

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changed the chart's userFilters to use the correct property value
"bucket_rate_limited" instead of "rate_limited". Also simplified the
filter structure to avoid validation errors with nested "all" strings.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
H2Shami and others added 2 commits January 16, 2026 11:28
Resolved conflicts keeping bucket rate limiter implementation while
merging DataDog tracer imports from main.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Includes tracer.setOrgId() call that was in the main branch's
rate limit tracking for correlation purposes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@replicas-connector
Contributor Author

replicas-connector bot commented Jan 16, 2026

CI/CD Failure - Unresolved (Flaky Test)

Workflow Worker AI Gateway Tests failed on commit 170e4ab.

Findings: Same pre-existing flaky test issue in registry-ts.spec.ts:

Failed to pop isolated storage stack frame in registry-ts.spec.ts's test "openai - gpt-4o - PTB direct".
In particular, we were unable to pop Durable Objects storage.

Root Cause: This is a known issue with @cloudflare/vitest-pool-workers Durable Objects storage isolation, not related to the rate limiter changes in this PR.

All rate-limit tests passed successfully. The failing test is unrelated to token bucket rate limiter implementation.

Recommendation: This flaky test should be addressed separately - it's been failing intermittently across multiple CI runs and predates this PR.

- Add tracer and traceContext parameters to checkBucketRateLimit
- Add tracer and traceContext parameters to recordBucketUsage
- Add spans with metrics: remaining, rate_limited, quota_limit,
  time_window_seconds, rate_limit_unit, segment info
- Pass tracer/traceContext from ProxyForwarder to bucket functions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@replicas-connector
Contributor Author

replicas-connector bot commented Jan 16, 2026

CI/CD Failure - Unresolved (Flaky Test)

Workflow Worker AI Gateway Tests failed on commit 7956bc9.

Findings: Same pre-existing flaky test issue in registry-ts.spec.ts:

FAIL registry-ts.spec.ts > Registry Tests > PTB Tests > with sufficient credits > openai - gpt-4o - PTB direct
Error: Test timed out in 10000ms.
AssertionError: Isolated storage failed. There should be additional logs above.

Root Cause: This is a known issue with @cloudflare/vitest-pool-workers Durable Objects storage isolation. The test times out and then the isolated storage cleanup fails. This has been occurring across multiple CI runs and predates this PR.

All 1256 other tests passed successfully, including all rate-limit tests. The failing test is unrelated to the token bucket rate limiter implementation.

Recommendation: This flaky test should be addressed separately - consider increasing the timeout for this specific test or investigating the DO storage isolation issue in the test framework.

@H2Shami H2Shami merged commit 15f5505 into main Jan 16, 2026
11 of 12 checks passed
@H2Shami H2Shami deleted the new-rate-limter branch January 16, 2026 23:37
