
dataset rate limit #5527

Merged
chitalian merged 1 commit into main from justin/dataset-rate-limit on Jan 16, 2026

Conversation

@chitalian
Contributor

Ticket

Link to the ticket(s) this pull request addresses.

Component/Service

What part of Helicone does this affect?

  • Web (Frontend)
  • Jawn (Backend)
  • Worker (Proxy)
  • Bifrost (Marketing)
  • AI Gateway
  • Packages
  • Infrastructure/Docker
  • Documentation

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Performance improvement
  • Refactoring

Deployment Notes

  • No special deployment steps required
  • Database migrations need to run
  • Environment variable changes required
  • Coordination with other teams needed

Screenshots / Demos

Before After

Extra Notes

Any additional context, considerations, or notes for reviewers.

Context

Why are you making this change?


@vercel

vercel bot commented Jan 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

3 Skipped Deployments
Project Deployment Review Updated (UTC)
helicone Skipped Skipped Jan 16, 2026 0:33am
helicone-bifrost Skipped Skipped Jan 16, 2026 0:33am
helicone-eu Skipped Skipped Jan 16, 2026 0:33am


@chitalian chitalian merged commit 43a86f7 into main Jan 16, 2026
11 checks passed
@chitalian chitalian deleted the justin/dataset-rate-limit branch January 16, 2026 00:33
@supabase

supabase bot commented Jan 16, 2026

This pull request has been ignored for the connected project bolqqmqbrciybnypvklh because there are no changes detected in supbase/supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.


Preview Branches by Supabase.
Learn more about Supabase Branching ↗︎.

@supabase

supabase bot commented Jan 16, 2026

This pull request has been ignored for the connected project lmahfbbnchpworytrrqk because there are no changes detected in .supabase/supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.


@claude
Contributor

claude bot commented Jan 16, 2026

Claude finished @chitalian's task.


Code Review Complete ✅

  • Read the modified file to understand the dataset rate limiting implementation
  • Analyze code for bugs, security issues, and performance problems
  • Provide detailed review feedback with specific suggestions
  • Give confidence score and summary recommendations

Review of Dataset Rate Limit Implementation

Score: 7/10 - Good implementation with minor suggestions for improvement

What Was Added

The PR adds rate limiting to the dataset row creation feature:

  • New getTodayRowCount() method (valhalla/jawn/src/managers/dataset/HeliconeDatasetManager.ts:50-59)
  • Rate limit check in addRequests() method (valhalla/jawn/src/managers/dataset/HeliconeDatasetManager.ts:219-227)
  • Enforces 1000 rows per day per organization with clear error messaging

Code Quality Assessment

Strengths

  1. Clear Error Message: The rate limit error provides helpful context including current usage, requested amount, and reset time
  2. Proper Authorization: Uses this.authParams.organizationId for organization-scoped rate limiting
  3. Efficient Query: getTodayRowCount() uses CURRENT_DATE for accurate daily counting
  4. Good Integration: Rate limit check happens before expensive operations like S3 copying

⚠️ Areas for Improvement

1. Race Condition Risk (valhalla/jawn/src/managers/dataset/HeliconeDatasetManager.ts:220-227)

// Current implementation
const todayCount = await this.getTodayRowCount();
if (todayCount + addRequests.length > 1000) {
  return err(/* rate limit exceeded */);
}
// Later: INSERT INTO helicone_dataset_row...

Issue: Multiple concurrent requests could bypass the rate limit if they check the count before any of them insert rows.

Suggestion: Use a database constraint or atomic counter:

-- Option 1: Database constraint with better error handling
INSERT INTO helicone_dataset_row (organization_id, origin_request_id, dataset_id)
VALUES ${values}
RETURNING id, origin_request_id

-- Add constraint in migration:
-- ALTER TABLE helicone_dataset_row ADD CONSTRAINT daily_row_limit
-- CHECK (daily_row_count_for_org(organization_id) <= 1000);

-- Option 2: Use UPSERT with daily counter table
INSERT INTO org_daily_usage (organization_id, date, row_count)
VALUES ($1, CURRENT_DATE, $2)
ON CONFLICT (organization_id, date)
DO UPDATE SET row_count = org_daily_usage.row_count + $2
WHERE org_daily_usage.row_count + $2 <= 1000
RETURNING row_count;
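To illustrate why the Option 2 approach closes the race, here is a minimal in-memory sketch (a hypothetical `DailyCounter` class standing in for the `org_daily_usage` UPSERT, not code from this PR): the check and the increment happen as a single step, so two concurrent callers cannot both observe a stale count and slip past the limit.

```typescript
// Hypothetical in-memory stand-in for the atomic UPSERT: the limit check
// and the increment are one operation, mirroring the
// "WHERE row_count + $2 <= 1000 ... RETURNING" semantics above.
class DailyCounter {
  private counts = new Map<string, number>();

  constructor(private readonly limit: number) {}

  // Consumes `n` rows and returns true if the limit allows it;
  // returns false without consuming anything otherwise.
  tryAdd(orgId: string, n: number): boolean {
    const current = this.counts.get(orgId) ?? 0;
    if (current + n > this.limit) return false;
    this.counts.set(orgId, current + n);
    return true;
  }
}

const counter = new DailyCounter(1000);
console.log(counter.tryAdd("org-1", 900)); // true
console.log(counter.tryAdd("org-1", 200)); // false: 900 + 200 > 1000
console.log(counter.tryAdd("org-1", 100)); // true: exactly 1000
```

In the real fix the single-step guarantee comes from the database (the conditional UPSERT runs atomically per row), whereas the current check-then-insert pattern leaves a window between the SELECT COUNT and the INSERT.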

2. Performance Consideration (valhalla/jawn/src/managers/dataset/HeliconeDatasetManager.ts:51-57)

SELECT COUNT(*)::int as count
FROM helicone_dataset_row
WHERE organization_id = $1
  AND created_at >= CURRENT_DATE

Issue: This query could become slow for organizations with many dataset rows as it requires scanning all rows for the organization.

Suggestion: Add a database index:

CREATE INDEX idx_helicone_dataset_row_org_created 
ON helicone_dataset_row (organization_id, created_at);

3. Magic Number (valhalla/jawn/src/managers/dataset/HeliconeDatasetManager.ts:221)

if (todayCount + addRequests.length > 1000) {

Suggestion: Extract to a constant or configuration:

private static readonly DAILY_ROW_LIMIT = 1000;

// Or better, make it configurable per organization tier
private getDailyRowLimit(): number {
  return this.authParams.organizationTier === 'premium' ? 10000 : 1000;
}

Security & Business Logic

Security Assessment

  • ✅ Proper organization scoping prevents cross-organization rate limit bypass
  • ✅ No SQL injection risks (uses parameterized queries)
  • ✅ Input validation exists via existing request validation

💡 Enhancement Opportunities

  1. Observability: Consider adding metrics/logging for rate limit hits for monitoring
  2. Graceful Degradation: Could implement sliding window or burst capacity for better UX
  3. Admin Override: Consider adding capability for support team to temporarily raise limits
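As a sketch of the sliding-window idea in point 2 (a hypothetical `SlidingWindowLimiter`, not part of this PR): the current `CURRENT_DATE` check uses a fixed calendar-day window, so an organization can insert 1000 rows just before midnight UTC and 1000 more just after. A sliding window counts rows in the trailing 24 hours instead.

```typescript
// Hypothetical sliding-window limiter, sketched for illustration only.
// Counts accepted rows in the trailing window rather than per calendar day.
class SlidingWindowLimiter {
  // Timestamps (ms) of accepted rows per organization, oldest first.
  private events = new Map<string, number[]>();

  constructor(
    private readonly limit: number,
    private readonly windowMs: number,
  ) {}

  tryAdd(orgId: string, n: number, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop events that have aged out of the window.
    const recent = (this.events.get(orgId) ?? []).filter((t) => t > cutoff);
    if (recent.length + n > this.limit) {
      this.events.set(orgId, recent);
      return false;
    }
    for (let i = 0; i < n; i++) recent.push(now);
    this.events.set(orgId, recent);
    return true;
  }
}
```

In production this bookkeeping would live in the database (e.g. a `created_at >= NOW() - INTERVAL '24 hours'` count) rather than in process memory; the class above only demonstrates the windowing behavior.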

Suggestions Summary

  • High Priority: Fix potential race condition with atomic rate limiting
  • Medium Priority: Add database index for performance
  • Low Priority: Extract magic number to constant, add observability

Recommendation

This is a solid defensive implementation that successfully adds rate limiting to prevent abuse. The race condition is the main concern, but it's unlikely to be exploited in normal usage patterns. The performance impact should be monitored as dataset usage grows.

