A production-grade, distributed rate limiting service built with Spring Boot and Redis. Supports three algorithms, dynamic per-client configuration, atomic Redis Lua scripts, and real-time observability via Prometheus and Grafana.
Live Demo: https://rate-limiter-service-y5d7.onrender.com/swagger-ui/index.html
Docker Hub: https://hub.docker.com/repository/docker/bhaveshlohana/rate-limiter-service
Rate limiting is a critical component of any production API — it protects services from abuse, ensures fair usage across clients, and prevents cascading failures under high load. This project implements rate limiting as a standalone service that any backend application can integrate with via REST API or as a Spring Boot Starter dependency.
Key features:
- Three rate limiting algorithms — Fixed Window, Sliding Window Log, Token Bucket
- Atomic Redis Lua scripts — eliminates race conditions under concurrent load
- Dynamic configuration — change limits per client type without restarting
- Default config fallback — unknown client types fall back to a DEFAULT policy
- Admin API — manage configs and inspect client state at runtime
- Real-time observability — Prometheus metrics + Grafana dashboards
- Plug and play — use as a REST service, or embed via the `@RateLimit` annotation with the Spring Boot Starter
rate-limiter/
├── rate-limiter-core/ ← shared algorithms, models, Redis logic
├── rate-limiter-service/ ← standalone REST service
└── rate-limiter-spring-boot-starter/ ← embeddable Spring Boot library
rate-limiter-core is a shared library consumed by both the service and the starter — no logic duplication.
┌─────────────────────────────────────────────────────────┐
│ Your Application │
│ │
│ Mode 1: POST /api/rate-limiter/check │
│ Mode 2: @RateLimit(clientType = "PREMIUM") │
│ via Spring Boot Starter │
└─────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Rate Limiter Service │
│ │
│ RateLimiterController │
│ │ │
│ ▼ │
│ RateLimiterFactory ──── ClientConfigService │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ Algorithm │ │ Config │ │
│ │ Selection │ │ Lookup │ │
│ └─────────────┘ └─────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────┐ │
│ │ Redis Lua Script │ │
│ │ (atomic read-check-write) │ │
│ └──────────────────────────────────┘ │
└─────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Redis │
│ │
│ ratelimit:config:PREMIUM ← client configs │
│ ratelimit:fixed:user123 ← algorithm state │
│ ratelimit:token:user456 ← algorithm state │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Observability Stack │
│ │
│ /actuator/prometheus ──► Prometheus ──► Grafana │
└─────────────────────────────────────────────────────────┘
| Algorithm | Memory | Accuracy | Burst Handling | Best For |
|---|---|---|---|---|
| Fixed Window | Low | Low (boundary burst) | Allows boundary burst | Simple APIs, low traffic |
| Sliding Window Log | High | High | Smooth, no bursts | Strict rate limiting |
| Token Bucket | Low | High | Controlled burst | Most production use cases |
Divides time into fixed buckets. Counts requests per bucket. Resets when the window expires.
Redis structure: STRING — integer counter with TTL
Known limitation: Boundary burst — a client can make 2x requests at window boundaries
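As a sketch of the logic (an illustrative in-memory analogue; the class name is hypothetical, and in the actual service the counter lives in Redis as a STRING whose TTL performs the window reset):

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of the Fixed Window check. Not thread-safe; the real
// service gets atomicity from a Redis Lua script instead.
public class FixedWindowSketch {
    private final int limit;
    private final long windowMillis;
    private final Map<String, Integer> counters = new HashMap<>();

    public FixedWindowSketch(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    // Returns true if the request at nowMillis is allowed.
    public boolean allow(String clientId, long nowMillis) {
        // The bucket key changes every windowMillis; in Redis the same effect
        // comes from letting the counter expire with the window's TTL.
        String key = clientId + ":" + (nowMillis / windowMillis);
        int count = counters.merge(key, 1, Integer::sum);
        return count <= limit;
    }

    public static void main(String[] args) {
        FixedWindowSketch limiter = new FixedWindowSketch(2, 1000);
        // Boundary burst in action: 4 requests allowed within ~100ms even
        // though the limit is 2 per second, because the bucket resets at t=1000.
        System.out.println(limiter.allow("user123", 900));  // true (bucket 0: 1/2)
        System.out.println(limiter.allow("user123", 999));  // true (bucket 0: 2/2)
        System.out.println(limiter.allow("user123", 1000)); // true (bucket 1: 1/2)
        System.out.println(limiter.allow("user123", 1001)); // true (bucket 1: 2/2)
    }
}
```

The demo in `main` shows exactly the boundary-burst limitation: back-to-back requests straddling the window edge all pass.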
Stores a timestamp log of every request in a Sorted Set. On each request, evicts entries older than the window and counts what remains.
Redis structure: ZSET — members are UUIDs, scores are timestamps
Known limitation: Memory heavy for high-traffic clients
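A sketch of the same idea in plain Java (illustrative only; the class name is hypothetical, and the real service keeps the log in a Redis ZSET where the eviction step corresponds to `ZREMRANGEBYSCORE`):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// In-memory sketch of the Sliding Window Log check for one client.
// Memory grows with traffic: one log entry per request in the window.
public class SlidingWindowLogSketch {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> log = new ArrayDeque<>(); // request timestamps

    public SlidingWindowLogSketch(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public boolean allow(long nowMillis) {
        // Evict entries older than the window (ZREMRANGEBYSCORE in Redis).
        while (!log.isEmpty() && log.peekFirst() <= nowMillis - windowMillis) {
            log.pollFirst();
        }
        if (log.size() >= limit) {
            return false; // window is full
        }
        log.addLast(nowMillis);
        return true;
    }

    public static void main(String[] args) {
        SlidingWindowLogSketch limiter = new SlidingWindowLogSketch(2, 1000);
        System.out.println(limiter.allow(900));  // true
        System.out.println(limiter.allow(999));  // true
        System.out.println(limiter.allow(1000)); // false (no boundary burst)
        System.out.println(limiter.allow(1901)); // true (the 900 entry aged out)
    }
}
```

Note the contrast with Fixed Window: the same burst straddling t=1000 is rejected, because the window slides with the request instead of resetting on a fixed boundary.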
Each client has a bucket that refills at a fixed rate. Each request consumes one token. Allows bursts up to bucket capacity while enforcing an average rate.
Redis structure: HASH — tokens and lastRefillTime
Best for: Most real-world rate limiting scenarios
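The refill arithmetic can be sketched in plain Java (an illustrative in-memory analogue with a hypothetical class name; in the actual service, `tokens` and `lastRefillTime` live in a Redis HASH and equivalent logic runs inside a Lua script):

```java
// In-memory sketch of the Token Bucket check for one client.
// O(1) state per client: just a token count and a last-refill timestamp.
public class TokenBucketSketch {
    private final double capacity;            // burst limit
    private final double refillRatePerSecond; // sustained rate
    private double tokens;
    private long lastRefillMillis;

    public TokenBucketSketch(double capacity, double refillRatePerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillRatePerSecond = refillRatePerSecond;
        this.tokens = capacity; // bucket starts full
        this.lastRefillMillis = nowMillis;
    }

    public boolean allow(long nowMillis) {
        // Lazy refill: credit tokens for elapsed time, capped at capacity.
        double elapsedSeconds = (nowMillis - lastRefillMillis) / 1000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillRatePerSecond);
        lastRefillMillis = nowMillis;
        if (tokens >= 1.0) {
            tokens -= 1.0; // each request consumes one token
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // capacity 2, refill 1 token/second
        TokenBucketSketch limiter = new TokenBucketSketch(2, 1.0, 0);
        System.out.println(limiter.allow(0));    // true  (burst: 2 -> 1 tokens)
        System.out.println(limiter.allow(0));    // true  (1 -> 0 tokens)
        System.out.println(limiter.allow(0));    // false (bucket empty)
        System.out.println(limiter.allow(1500)); // true  (1.5 tokens refilled)
    }
}
```

This is why the bucket allows bursts up to `capacity` while the long-run average stays at `refillRatePerSecond`.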
All three algorithms use Redis Lua scripts for atomic execution. The read-check-write cycle executes as a single Redis operation, eliminating race conditions under concurrent load. Both naive (non-atomic) and atomic implementations are available for comparison.
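To make the race concrete, here is a deterministic, single-threaded simulation (illustrative only; the names are hypothetical) of the interleaving that a naive implementation permits and an atomic Lua script rules out:

```java
// Sketch of the lost-update race in a naive (non-atomic) rate limit check.
// Interleaving simulated: A reads, B reads, A checks+writes, B checks+writes.
public class RaceSketch {
    // Returns true if BOTH simulated requests were allowed.
    static boolean bothAllowed(int limit, int counter) {
        int readByA = counter;              // A: read the counter
        int readByB = counter;              // B: read the same (soon stale) value
        boolean allowedA = readByA < limit; // A: check passes
        boolean allowedB = readByB < limit; // B: check also passes on the stale read
        counter = readByA + 1;              // A: write back
        counter = readByB + 1;              // B: write back (A's update is lost)
        return allowedA && allowedB;
    }

    public static void main(String[] args) {
        // One slot left (9 of 10 used), yet both concurrent requests pass.
        System.out.println(bothAllowed(10, 9)); // prints "true"
    }
}
```

With the Lua script, the read, check, and write run as a single Redis command, so no other client can interleave between the read and the write and this outcome cannot occur.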
- Docker and Docker Compose
- Java 21 (for local development)
```bash
git clone https://github.com/bhaveshlohana/rate-limiter-service
cd rate-limiter-service
docker compose up -d
```

This starts:
- Rate Limiter Service on http://localhost:8080
- Redis on localhost:6379
- Prometheus on http://localhost:9090
- Grafana on http://localhost:3000 (admin/admin)
```bash
docker run -p 8080:8080 \
  -e SPRING_DATA_REDIS_HOST=host.docker.internal \
  -e SPRING_DATA_REDIS_PORT=6379 \
  bhaveshlohana/rate-limiter-service:latest
```

Or run locally with Maven (requires Redis running on localhost:6379):

```bash
./mvnw spring-boot:run
```
`POST /api/rate-limiter/check`

```json
{
  "clientId": "user123",
  "clientType": "PREMIUM"
}
```

Responses:
- `200 OK` — request allowed
- `429 Too Many Requests` — rate limit exceeded

```json
{
  "allowed": true,
  "reason": "Request allowed",
  "remainingRequests": 47
}
```

`POST /api/admin/config`

```json
{
  "clientType": "PREMIUM",
  "algorithm": "TOKEN_BUCKET",
  "capacity": 500,
  "refillRatePerSecond": 10.0
}
```

- `GET /api/admin/config/{clientType}`
- `GET /api/admin/config`
- `DELETE /api/admin/config/{clientType}`
- `GET /api/admin/status?clientId=user123&clientType=PREMIUM`

```json
{
  "clientId": "user123",
  "clientType": "PREMIUM",
  "algorithm": "TOKEN_BUCKET",
  "currentTokens": 487.5,
  "remainingRequests": 487
}
```

Client configurations are stored dynamically in Redis. No restart is required to update limits.
| Field | Type | Required For | Description |
|---|---|---|---|
| `clientType` | String | All | Identifier for the client type |
| `algorithm` | Enum | All | `FIXED_WINDOW`, `SLIDING_WINDOW`, `TOKEN_BUCKET` |
| `limit` | Integer | Fixed/Sliding Window | Max requests per window |
| `windowSizeSeconds` | Integer | Fixed/Sliding Window | Window duration in seconds |
| `capacity` | Integer | Token Bucket | Max bucket size (burst limit) |
| `refillRatePerSecond` | Double | Token Bucket | Tokens added per second |
A DEFAULT config is seeded on startup and applies to any unknown client type:
```json
{
  "clientType": "DEFAULT",
  "algorithm": "FIXED_WINDOW",
  "limit": 10,
  "windowSizeSeconds": 60
}
```

```bash
# Anonymous users — strict
curl -X POST http://localhost:8080/api/admin/config \
  -H "Content-Type: application/json" \
  -d '{
    "clientType": "ANONYMOUS",
    "algorithm": "FIXED_WINDOW",
    "limit": 10,
    "windowSizeSeconds": 60
  }'

# Registered users — moderate
curl -X POST http://localhost:8080/api/admin/config \
  -H "Content-Type: application/json" \
  -d '{
    "clientType": "REGISTERED",
    "algorithm": "SLIDING_WINDOW",
    "limit": 100,
    "windowSizeSeconds": 60
  }'

# Premium users — generous burst
curl -X POST http://localhost:8080/api/admin/config \
  -H "Content-Type: application/json" \
  -d '{
    "clientType": "PREMIUM",
    "algorithm": "TOKEN_BUCKET",
    "capacity": 500,
    "refillRatePerSecond": 10.0
  }'
```

Any service can integrate by calling the /check endpoint before processing a request:
```java
RestTemplate restTemplate = new RestTemplate();
RateLimitRequest request = new RateLimitRequest("user123", "PREMIUM");

ResponseEntity<RateLimitResponse> response = restTemplate.postForEntity(
    "http://rate-limiter-service/api/rate-limiter/check",
    request,
    RateLimitResponse.class
);

// Note: RestTemplate's default error handler throws HttpClientErrorException
// on 4xx responses, so this check only fires if you configure an error
// handler that returns the response (or catch TooManyRequests instead).
if (response.getStatusCode() == HttpStatus.TOO_MANY_REQUESTS) {
    throw new RateLimitExceededException();
}
```

Add the dependency to your Spring Boot project:

```xml
<dependency>
    <groupId>com.bhavesh.learn</groupId>
    <artifactId>rate-limiter-spring-boot-starter</artifactId>
    <version>1.0.0</version>
</dependency>
```

Configure client types in application.yml:
```yaml
rate-limiter:
  configs:
    - clientType: DEFAULT
      algorithm: FIXED_WINDOW
      limit: 60
      windowSizeSeconds: 60
    - clientType: PREMIUM
      algorithm: TOKEN_BUCKET
      capacity: 100
      refillRatePerSecond: 10.0
```

Annotate your endpoints:

```java
@RateLimit(clientType = "PREMIUM")
@GetMapping("/api/data")
public ResponseEntity<?> getData() {
    return ResponseEntity.ok(data);
}
```

When the rate limit is exceeded, the starter automatically returns 429 Too Many Requests — no additional configuration needed.
Metrics are exposed at /actuator/prometheus and scraped by Prometheus every 5 seconds.
| Metric | Labels | Description |
|---|---|---|
| `ratelimit_request_total` | `clientType`, `algorithm`, `result` | Total requests checked |
Key queries:

```promql
# Request rate per second
rate(ratelimit_request_total[1m])

# Rejection rate by client type
rate(ratelimit_request_total{result="rejected"}[1m])

# Allowed vs rejected
ratelimit_request_total
```
Import the dashboard from grafana/dashboard.json or connect Grafana to your Prometheus instance.
Load tested with k6 — 10 concurrent users per algorithm, 30 seconds each. The test was run twice to check consistency.
| Metric | Run 1 | Run 2 |
|---|---|---|
| Total requests | 8,550 | 8,620 |
| Throughput | ~85 req/sec | ~86 req/sec |
| Rejection rate | 90.17% | 90.25% |
| Avg response time | 4.29ms | 3.49ms |
| p(95) response time | 7.8ms | 5.26ms |
| Max response time | 27.99ms | 26.95ms |
| All checks passed | ✅ 100% | ✅ 100% |
Results were consistent across both runs. All responses returned within 200ms under load with no errors or timeouts across all three algorithms.
Why Redis for config storage?
Configs and rate limit state share the same Redis instance — no extra infrastructure. Config changes are reflected immediately without restarts.
Why Lua scripts for atomicity?
Redis executes Lua scripts atomically — the entire read-check-write cycle runs as a single operation. This eliminates the race condition where two concurrent requests both read the same counter value and both get allowed when only one slot remains. Both naive and atomic implementations are provided for comparison.
Why fail closed on missing config?
If a client type has no config and no DEFAULT exists, requests are rejected. A rate limiter is a security boundary — unknown clients should not get unlimited access by default.
Why Token Bucket for most use cases?
Fixed Window allows boundary bursts. Sliding Window is memory-heavy at scale. Token Bucket provides accurate rate limiting with controlled burst support at O(1) memory per client.
Why a Spring Boot Starter?
The starter allows any Spring Boot application to add rate limiting with a single dependency and annotation — no REST calls, no manual wiring. It auto-configures all beans and seeds client configs from application.yml on startup.
- `KEYS *` used in `getAllConfigs()` — blocks Redis on large keyspaces. Production replacement: use `SCAN` for incremental iteration.
- No authentication on admin endpoints — add Spring Security before production use.
- Render free tier cold starts — app spins down after 15 minutes of inactivity, causing ~30s delay on first request.
- Single Redis instance — no Redis Cluster support. For high availability, configure Redis Sentinel or Cluster.
- Java 21 + Spring Boot 3.4
- Redis — rate limit state and config storage
- Lua Scripts — atomic Redis operations
- Prometheus + Grafana — observability
- Docker + Docker Compose — containerization
- GitHub Actions — CI/CD
- Render — cloud deployment
```bash
# run all tests across all modules
./mvnw test

# run tests for a specific module
./mvnw test -pl rate-limiter-service
./mvnw test -pl rate-limiter-core
```

Tests use embedded Redis — no external dependencies required.