[NA] [BE][FE] feat: Add Feishu as native alert destination #5775

Open
rusherman wants to merge 2 commits into comet-ml:main from
rusherman:rusherman/NA-add-feishu-alert-type

Conversation

@rusherman

Details

Add Feishu/Lark (飞书) as the fourth native alert type alongside General, Slack, and PagerDuty.
Uses Feishu Interactive Card format for rich alert display in group chats via custom bot webhooks, covering all 10 alert event types with color-coded cards and "View in Opik" action buttons.

  • Backend: DB migration, AlertType enum, 6 Feishu card model classes, FeishuWebhookPayloadMapper, AlertPayloadAdapter integration
  • Frontend: ALERT_TYPE enum, Feishu SVG icon, helpers config, DestinationSelector update
  • v1 scope: no Feishu signature verification (users can use IP allowlist); signature support planned for follow-up PR

Change checklist

  • User facing
  • Documentation update

Issues

  • NA (new feature contribution)

AI-WATERMARK

AI-WATERMARK: yes

  • If yes:
    • Tools: Claude Code CLI
    • Model(s): Claude Opus 4.6
    • Scope: Full implementation (backend models, mapper, migration, frontend types/icons/config, unit tests)
    • Human verification: Code reviewed, all 36 unit tests passing

Testing

Commands run:

cd apps/opik-backend
mvn test -pl . -Dtest="com.comet.opik.api.resources.v1.events.webhooks.feishu.FeishuWebhookPayloadMapperTest" -Dsurefire.useFile=false

Result: 36 tests run, 0 failures, 0 errors, 0 skipped

Scenarios validated:

  • Card structure: msg_type=interactive, header with plain_text, div with lark_md, action with primary button
  • Template colors: red (TRACE_ERRORS), orange (threshold/guardrail events), blue (prompt/experiment events)
  • All 10 event types: PROMPT_CREATED, PROMPT_DELETED, PROMPT_COMMITTED, TRACE_ERRORS, TRACE_FEEDBACK_SCORE, TRACE_THREAD_FEEDBACK_SCORE, TRACE_COST, TRACE_LATENCY, TRACE_GUARDRAILS_TRIGGERED, EXPERIMENT_FINISHED
  • Empty metadata handling for all event types
  • Default base_url fallback when alertMetadata is empty
  • Action URL routing: single project → project link, multiple projects → projects list, thread tab type for TRACE_THREAD_FEEDBACK_SCORE
  • Cost prefix ($) and latency suffix (s) formatting
  • Window duration formatting (seconds/minutes/hours/days)
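
The card structure and template colors listed above can be sketched roughly as follows. This is a minimal illustration and not the PR's model classes: the class name `FeishuCardSketch`, both method names, the grouping of event types into colors, and the example URL are assumptions made for the sketch. The JSON field names follow the Feishu custom bot interactive card layout as described in the scenarios.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the PR uses dedicated card model classes instead of raw maps.
class FeishuCardSketch {

    // Assumed color grouping: red for errors, blue for prompt/experiment events,
    // orange for everything else (threshold and guardrail events).
    static String templateFor(String eventType) {
        if (eventType.equals("TRACE_ERRORS")) {
            return "red";
        }
        if (eventType.startsWith("PROMPT_") || eventType.equals("EXPERIMENT_FINISHED")) {
            return "blue";
        }
        return "orange";
    }

    // Builds the payload: a header with a plain_text title, a div with lark_md
    // content, and an action element holding one primary button.
    static Map<String, Object> buildCard(String title, String markdown, String url, String eventType) {
        Map<String, Object> header = Map.of(
                "title", Map.of("tag", "plain_text", "content", title),
                "template", templateFor(eventType));
        Map<String, Object> body = Map.of(
                "tag", "div",
                "text", Map.of("tag", "lark_md", "content", markdown));
        Map<String, Object> button = Map.of(
                "tag", "button",
                "text", Map.of("tag", "plain_text", "content", "View in Opik"),
                "url", url,
                "type", "primary");
        Map<String, Object> action = Map.of("tag", "action", "actions", List.of(button));
        return Map.of(
                "msg_type", "interactive",
                "card", Map.of("header", header, "elements", List.of(body, action)));
    }

    public static void main(String[] args) {
        // The URL is a placeholder, not a real Opik address.
        System.out.println(buildCard("Opik Alert: demo",
                "**1** new Trace Errors event happened",
                "https://example.invalid/workspace/projects",
                "TRACE_ERRORS"));
    }
}
```

Serializing this map with any JSON library should yield a body in the shape the webhook expects, under the assumptions stated above.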

Environment: macOS, Java 25, Maven, local process mode

Tests not run: Frontend build (requires full node_modules setup), E2E integration tests (require Docker infrastructure)

Documentation

No documentation update needed for this PR. Feishu alert type will be auto-discovered by existing UI and API patterns.

Add Feishu/Lark as the fourth native alert type alongside General,
Slack, and PagerDuty. Uses Feishu Interactive Card format for rich
alert display in group chats via custom bot webhooks.

Backend:
- DB migration to add 'feishu' to alert_type ENUM
- FEISHU enum value in AlertType
- Feishu card payload model classes (6 files)
- FeishuWebhookPayloadMapper handling all 10 event types
- AlertPayloadAdapter integration
- Unit tests (36 tests, all passing)

Frontend:
- ALERT_TYPE enum extended with 'feishu'
- Feishu SVG icon
- Labels, icons, and field mappings in helpers
- DestinationSelector updated
@github-actions github-actions bot added the java, Frontend, Backend, tests, and typescript labels Mar 22, 2026
Comment on lines +75 to +85
private static String buildContent(@NonNull WebhookEvent<Map<String, Object>> event) {
    List<?> metadata = (List<?>) event.getPayload().getOrDefault("metadata", List.of());
    int count = metadata.size();
    String eventTypeName = formatEventType(event.getEventType());

    String summary = String.format("**%d** new %s event%s happened",
            count, eventTypeName, count != 1 ? "s" : "");

    var metadataUrl = event.getAlertMetadata().getOrDefault(BASE_URL_METADATA_KEY, DEFAULT_BASE_URL);
    metadataUrl = metadataUrl.endsWith("/") ? metadataUrl : metadataUrl + "/";
    String baseUrl = metadataUrl + event.getWorkspaceName();

metadata and metadataUrl can be null if the payload contains explicit null values, causing NPE on metadata.size() and metadataUrl.endsWith("/") — should we normalize both with Optional.ofNullable(...).orElse(...) at extraction (lines 75-84 and 108-115 in buildContent and buildActionUrl)?

Finding types: Explicit null contracts | Severity: 🔴 High


Prompt for AI Agents:

In
apps/opik-backend/src/main/java/com/comet/opik/api/resources/v1/events/webhooks/feishu/FeishuWebhookPayloadMapper.java
around lines 75 to 84 (buildContent method) and lines 108 to 115 (buildActionUrl
method), the code uses event.getPayload().getOrDefault("metadata", List.of()) and
event.getAlertMetadata().getOrDefault(BASE_URL_METADATA_KEY, DEFAULT_BASE_URL) which can
return null if the payload explicitly contains "metadata": null or base_url->null,
leading to NPE when calling metadata.size() or metadataUrl.endsWith("/"). Replace the
metadata assignment with var metadata = Optional.ofNullable((List<?>)
event.getPayload().get("metadata")).orElse(List.of()) in both methods. Replace the
metadataUrl assignment with var metadataUrl =
Optional.ofNullable(event.getAlertMetadata().get(BASE_URL_METADATA_KEY)).orElse(DEFAULT_BASE_URL)
(or ObjectUtils.defaultIfNull(...)) in both methods before calling endsWith. Apply these
changes consistently in both buildContent and buildActionUrl to ensure downstream logic
always receives non-null values.
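
The failure mode described here can be reproduced in isolation. The sketch below is an illustration only: the class name `NullSafeExtraction`, the helper names, the `"base_url"` key string, and the placeholder default URL are assumptions, not values taken from the PR. It shows that `getOrDefault` returns `null` when a key is explicitly mapped to `null`, and one way to normalize both values at extraction. For the list it uses an `instanceof` check, a slightly stricter variant than the cast suggested in the comment.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of the PR.
class NullSafeExtraction {

    // Placeholder default; the real DEFAULT_BASE_URL value is not shown in the PR excerpt.
    static final String DEFAULT_BASE_URL = "https://example.invalid/";

    // Returns an empty list when "metadata" is absent, null, or not a list.
    static List<?> metadataOf(Map<String, Object> payload) {
        Object raw = payload.get("metadata");
        return raw instanceof List ? (List<?>) raw : List.of();
    }

    // Falls back to the default when the key is absent or mapped to null,
    // then guarantees a trailing slash. The key name "base_url" is assumed.
    static String baseUrlOf(Map<String, String> alertMetadata) {
        String url = Optional.ofNullable(alertMetadata.get("base_url")).orElse(DEFAULT_BASE_URL);
        return url.endsWith("/") ? url : url + "/";
    }

    public static void main(String[] args) {
        Map<String, Object> payload = new HashMap<>();
        payload.put("metadata", null);

        // getOrDefault only substitutes the default when the key is absent,
        // so an explicit null mapping is returned as null.
        System.out.println(payload.getOrDefault("metadata", List.of()) == null);
        System.out.println(metadataOf(payload).size());

        Map<String, String> alertMetadata = new HashMap<>();
        alertMetadata.put("base_url", null);
        System.out.println(baseUrlOf(alertMetadata));
    }
}
```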

Comment on lines +231 to +242
private static String formatMetricsAlertPayload(@NonNull MetricsAlertPayload payload, String type) {
    try {
        String windowDuration = formatWindowDuration(payload.windowSeconds());

        String scope = (payload.projectIds() == null || payload.projectIds().isEmpty())
                ? "*Workspace-wide*"
                : TemplateUtils.newST(PROJECTS_TEMPLATE)
                        .add("projectNames", payload.projectNames())
                        .render();

        String valuePrefix = type.equals("Cost") ? "$" : "";
        String valueSuffix = type.equals("Latency") ? " s" : "";

formatMetricsAlertPayload catches Exception and hides unrelated failures, should we catch the template-specific exceptions from TemplateUtils.newST(...).render() (e.g. IllegalArgumentException) or at least add a TODO if the broad catch must remain?

Finding type: Use specific exceptions | Severity: 🟠 Medium


Prompt for AI Agents:

In
apps/opik-backend/src/main/java/com/comet/opik/api/resources/v1/events/webhooks/feishu/FeishuWebhookPayloadMapper.java
around lines 231 to 261, the method formatMetricsAlertPayload currently catches
java.lang.Exception which masks unrelated failures. Replace the broad catch with
specific exception types thrown by the template/rendering path (for example
IllegalArgumentException and the template library's rendering exception — or
TemplateException if applicable), logging the error as now and returning the fallback
string only for those specific cases. If you must temporarily keep a broad catch while
we confirm the exact library exceptions, add a TODO above the catch that lists the
follow-up: replace the broad catch with the exact template/render exceptions (e.g.,
TemplateException, IllegalArgumentException) and include a short justification and
ticket reference.

Comment on lines +264 to +268
private static String formatWindowDuration(long seconds) {
    Duration duration = Duration.ofSeconds(seconds);

    if (duration.toDays() > 0) {
        long days = duration.toDays();

formatWindowDuration is duplicated across mappers — should we extract AlertFormattingUtils.formatWindowDuration and reuse it?

Finding type: Code Dedup and Conventions | Severity: 🟢 Low
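
A shared helper along the lines suggested could look like the sketch below. The class name comes from the comment above; the output format (largest whole unit, simple pluralization, remainder dropped) is an assumption, since the full body of the existing method is not shown in the diff excerpt.

```java
import java.time.Duration;

// Hypothetical shared utility; the exact output format of the PR's method is not known.
final class AlertFormattingUtils {

    private AlertFormattingUtils() {
    }

    // Formats a window as the largest whole unit it contains.
    // Any remainder is dropped, so 90 seconds is rendered as "1 minute".
    static String formatWindowDuration(long seconds) {
        Duration duration = Duration.ofSeconds(seconds);
        if (duration.toDays() > 0) {
            return plural(duration.toDays(), "day");
        }
        if (duration.toHours() > 0) {
            return plural(duration.toHours(), "hour");
        }
        if (duration.toMinutes() > 0) {
            return plural(duration.toMinutes(), "minute");
        }
        return plural(duration.getSeconds(), "second");
    }

    private static String plural(long count, String unit) {
        return count + " " + unit + (count == 1 ? "" : "s");
    }

    public static void main(String[] args) {
        System.out.println(formatWindowDuration(86_400));
        System.out.println(formatWindowDuration(7_200));
        System.out.println(formatWindowDuration(45));
    }
}
```

Each mapper would then call `AlertFormattingUtils.formatWindowDuration(...)` instead of carrying its own private copy.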


Comment on lines +291 to +295
private static String formatEventType(@NonNull AlertEventType eventType) {
    return switch (eventType) {
        case PROMPT_CREATED -> "Prompt Created";
        case PROMPT_DELETED -> "Prompt Deleted";
        case PROMPT_COMMITTED -> "Prompt Committed";

formatEventType duplicates SlackWebhookPayloadMapper.formatEventType, should we extract the switch into AlertEventDescriptions.formatEventType and reuse it from both mappers?

Finding type: Code Dedup and Conventions | Severity: 🟢 Low


Comment on lines +5 to +8
ALTER TABLE alerts
MODIFY COLUMN alert_type ENUM('general', 'slack', 'pagerduty', 'feishu') NOT NULL DEFAULT 'general';

--rollback ALTER TABLE alerts MODIFY COLUMN alert_type ENUM('general', 'slack', 'pagerduty') NOT NULL DEFAULT 'general';

The rollback tries to remove feishu from the enum but will fail if rows exist with alert_type = 'feishu', should we mark this migration forward-only and document remediation like -- forward-only; restore from backup or delete feishu rows before reverting?

Finding type: Migration rollback safety | Severity: 🟠 Medium


Prompt for AI Agents:

In
apps/opik-backend/src/main/resources/liquibase/db-app-state/migrations/000059_add_feishu_alert_type.sql
around lines 5-8, the rollback currently tries to remove the 'feishu' enum value which
will fail if any rows use it. Replace the rollback line with a documented forward-only
note: remove the rollback SQL and add a comment like '-- forward-only: cannot safely
remove enum value; to revert, restore from backup or first migrate/delete rows with
alert_type = ''feishu'' before altering the column'. Ensure the changeset is clearly
marked as irreversible and include suggested remediation steps so operators know how to
safely revert if needed.

Comment on lines 12 to +16
 public enum AlertType {
     GENERAL("general"),
     SLACK("slack"),
-    PAGERDUTY("pagerduty");
+    PAGERDUTY("pagerduty"),
+    FEISHU("feishu");

AlertType added FEISHU but the OpenAPI/SDKs weren't regenerated so clients will throw on alert_type: "feishu"; should we update the API/SDKs or document compatibility and allow unknown enum members?

Finding type: Breaking Changes | Severity: 🔴 High


Prompt for AI Agents:

In apps/opik-backend/src/main/java/com/comet/opik/api/AlertType.java around lines 12-16,
the enum AlertType was extended with FEISHU which will break existing SDKs and OpenAPI
consumers. Update the API surface by either: 1) regenerating the OpenAPI definition and
all public SDKs/serializers (TypeScript AlertPublicAlertType and any core.enum
declarations) so they include "feishu", or 2) make enum deserialization
backward-compatible by adding an explicit UNKNOWN/UNRECOGNIZED enum constant and a
JsonCreator/custom deserializer that maps unknown string values to UNKNOWN (and keep the
JsonValue for known values). Also add a short compatibility note to the API changelog
and add a unit/integration test that verifies clients can handle an incoming alert_type:
"feishu" without throwing.
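
Option 2 can be illustrated without any JSON library. In the sketch below, the enum name `AlertTypeSketch`, the `UNKNOWN` constant, and the `fromString` method are hypothetical; in a Jackson-based setup `fromString` would typically carry `@JsonCreator` and `getValue` would carry `@JsonValue`, which are left out here to keep the example dependency-free.

```java
import java.util.Arrays;

// Hypothetical tolerant variant of the enum; not the PR's AlertType.
enum AlertTypeSketch {
    GENERAL("general"),
    SLACK("slack"),
    PAGERDUTY("pagerduty"),
    FEISHU("feishu"),
    UNKNOWN("unknown"); // catch-all for values this client version does not know

    private final String value;

    AlertTypeSketch(String value) {
        this.value = value;
    }

    // With Jackson, this accessor would be annotated with @JsonValue.
    String getValue() {
        return value;
    }

    // With Jackson, this factory would be annotated with @JsonCreator.
    // Unrecognized or null input maps to UNKNOWN instead of throwing.
    static AlertTypeSketch fromString(String raw) {
        return Arrays.stream(values())
                .filter(type -> type != UNKNOWN && type.value.equalsIgnoreCase(raw))
                .findFirst()
                .orElse(UNKNOWN);
    }

    public static void main(String[] args) {
        System.out.println(fromString("feishu"));
        System.out.println(fromString("some-future-type"));
    }
}
```

The trade-off is that callers must handle `UNKNOWN` explicitly, which is usually preferable to a deserialization failure on the whole response.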

Comment on lines +369 to +374
[ALERT_TYPE.feishu]: [
  {
    sourceField: "name",
    targetPath: "card.header.title.content",
  },
],

ALERT_FIELD_MAPPINGS maps name to card.header.title.content for Feishu, overwriting the backend 'Opik Alert: ' prefix — should we remove that mapping for FEISHU or add the same prefix here?

Finding type: Type Inconsistency | Severity: 🟢 Low


Prompt for AI Agents:

In apps/opik-frontend/src/v1/pages/AlertsPage/AddEditAlertPage/helpers.ts around lines
369-374, the ALERT_FIELD_MAPPINGS entry for ALERT_TYPE.feishu sets sourceField "name" ->
targetPath "card.header.title.content", which will overwrite the backend's prefixed
header ("Opik Alert: "+name). Remove this feishu mapping (or if you intentionally want
the UI example to include the prefix, change the mapping to prepend "Opik Alert: " to
the source) so the frontend example matches the backend payload. After making the
change, review
apps/opik-backend/src/main/java/com/comet/opik/api/resources/v1/events/webhooks/feishu/FeishuWebhookPayloadMapper.java
around the header.title builder to confirm the backend behavior is preserved.

@rusherman rusherman marked this pull request as ready for review March 22, 2026 03:22
@rusherman rusherman requested a review from a team as a code owner March 22, 2026 03:22
Collaborator

@aadereiko aadereiko left a comment

Hey, thanks for your contribution!
The frontend code looks good, but could you please share some videos demonstrating your changes? It would help us review everything more thoroughly :)

Thanks again!

Author

@rusherman rusherman left a comment

Hi @aadereiko, thanks for the review!

Here's a video demo showing the Feishu alert integration in action:

🎬 Demo Video: https://www.youtube.com/watch?v=yE7wB1xJXLA

The video covers:

  • Selecting Feishu as an alert destination in the UI
  • Configuring the Feishu webhook URL
  • Alert cards delivered to Feishu group chat with color-coded headers and "View in Opik" action buttons

Let me know if you need anything else!
