Skip to content

Add proto specs for usage alerts#12035

Open
bduffany wants to merge 1 commit intomasterfrom
usage-alerts-proto
Open

Add proto specs for usage alerts#12035
bduffany wants to merge 1 commit intomasterfrom
usage-alerts-proto

Conversation

@bduffany
Copy link
Copy Markdown
Member

@bduffany bduffany commented Apr 29, 2026

Adds protos for basic usage alerting. For v0, I'm thinking the system can look something close to this:

  • Users can manage* alerts using an "Alerting" tab in the Usage page, setting SKU + optional labels, alerting window (daily, weekly, monthly), and alerting thresholds after which the alerts will fire.
  • Alerts can be stored in the primary DB (mysql).
  • The rule evaluator can run as a k8s cronjob, running at some reasonable, short-ish interval. At each interval, the evaluator:
    • (1) queries UsageAlertingRules that need to be evaluated (given the current timestamp, last_fired_at timestamp, and alerting window)
    • (2) queries the RawUsage table to compare counts vs thresholds
    • (3) sends email alerts** if any thresholds are exceeded
    • (4) updates the last_evaluated_at and last_fired_at timestamps.

* Only adding Create, List, and Delete APIs for now. "Update" can be done by deleting and recreating a rule (could maybe add a "clone" button in the UI to make it slightly less tedious, or implement Update later if we have a need)
** If the cronjob fails between steps 3 and 4, we may send duplicate alerts (at-least-once semantics). As a "nice to have," if we choose an email delivery provider that supports idempotency keys, then we can get exactly-once delivery semantics.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the usage protobuf API surface to support usage-based alerting rules, including rule metadata, configuration, status, and basic CRUD request/response messages.

Changes:

  • Added new usage alerting rule protos (UsageAlertingRule, metadata/config/status, and UsageAlertingWindow enum) to proto/usage.proto.
  • Added request/response messages for listing, creating, and deleting usage alerting rules.
  • Updated Bazel proto build targets to include new proto dependencies (user ID + timestamp) for usage proto generation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
proto/usage.proto Adds message/enum definitions and RPC message shapes for usage alerting rules, including timestamps and creator identity.
proto/BUILD Wires in user_id and timestamp deps for proto / TS / Go generation of the updated usage protos.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@bduffany bduffany force-pushed the usage-alerts-proto branch 3 times, most recently from 8bf8ad2 to 080e97e Compare April 29, 2026 19:37
Comment thread proto/usage.proto Outdated
map<string, string> labels = 2;

// Alerts are triggered if the usage exceeds this limit.
int64 threshold = 3;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another type of alert that may be popular is "Alert if this metric is X% higher than last week". Even if we don't support that in v0, the proto should be flexible enough to add that later

In check-metrics.yaml, we have a concept of "absolute" and "relative" thresholds. Maybe something similar could apply here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to "absolute_threshold" and we can add "relative_threshold" later 👍

Comment thread proto/usage.proto
}

// Usage alerting rule status.
message UsageAlertingRuleStatus {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this a bit in our meeting, but could we use a scheduled Workflow in the -internal repo rather than a k8s cron job? For example, we'd create one scheduled workflow per alert. Then we wouldn't have to re-implement some of the logic around when to evaluate the rules

This should also make editing the rules a bit easier. It does not sound like a pleasant UI experience to clone and delete rules to edit them 😅

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

open to it - once scheduled workflows are battle-tested in prod, we could consider switching this over fully. In the meantime we could maybe extract a cronutil package once we see a common pattern emerging in the evaluation logic

@bduffany bduffany force-pushed the usage-alerts-proto branch from 080e97e to 6d80add Compare April 29, 2026 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants