Conversation
There was a problem hiding this comment.
Pull request overview
This PR extends the usage protobuf API surface to support usage-based alerting rules, including rule metadata, configuration, status, and basic CRUD request/response messages.
Changes:
- Added new usage alerting rule protos (
UsageAlertingRule, metadata/config/status, andUsageAlertingWindowenum) toproto/usage.proto. - Added request/response messages for listing, creating, and deleting usage alerting rules.
- Updated Bazel proto build targets to include new proto dependencies (user ID + timestamp) for
usageproto generation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
proto/usage.proto |
Adds message/enum definitions and RPC message shapes for usage alerting rules, including timestamps and creator identity. |
proto/BUILD |
Wires in user_id and timestamp deps for proto / TS / Go generation of the updated usage protos. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
8bf8ad2 to
080e97e
Compare
| map<string, string> labels = 2; | ||
|
|
||
| // Alerts are triggered if the usage exceeds this limit. | ||
| int64 threshold = 3; |
There was a problem hiding this comment.
Another type of alert that may be popular is "Alert if this metric is X% higher than last week". Even if we don't support that in v0, the proto should be flexible enough to add that later
In check-metrics.yaml, we have a concept of "absolute" and "relative" thresholds. Maybe something similar could apply here?
There was a problem hiding this comment.
Renamed to "absolute_threshold" and we can add "relative_threshold" later 👍
| } | ||
|
|
||
| // Usage alerting rule status. | ||
| message UsageAlertingRuleStatus { |
There was a problem hiding this comment.
Discussed this a bit in our meeting, but could we use a scheduled Workflow in the -internal repo rather than a k8s cron job? For example, we'd create one scheduled workflow per alert. Then we wouldn't have to re-implement some of the logic around when to evaluate the rules
This should also make editing the rules a bit easier. It does not sound like a pleasant UI experience to clone and delete rules to edit them 😅
There was a problem hiding this comment.
open to it - once scheduled workflows are battle-tested in prod, we could consider switching this over fully. In the meantime we could maybe extract a cronutil package once we see a common pattern emerging in the evaluation logic
080e97e to
6d80add
Compare
Adds protos for basic usage alerting. For v0, I'm thinking the system can look something close to this:
* Only adding Create, List, and Delete APIs for now. "Update" can be done by deleting and recreating a rule (could maybe add a "clone" button in the UI to make it slightly less tedious, or implement Update later if we have a need)
** If the cronjob fails between steps 3 and 4, we may send duplicate alerts (at-least-once semantics). As a "nice to have," if we choose an email delivery provider that supports idempotency keys, then we can get exactly-once delivery semantics.