Problem
When using the HTTP webhook publishers (workflow_publisher for workflows, task_publisher for tasks), there is no observability into publish success or failure. All outcomes are only logged via SLF4J — no Prometheus/Micrometer counters, timers, or gauges are exposed.
This means operators cannot:
- Alert on webhook delivery failures
- Track publish latency or throughput
- Monitor notification queue depth or saturation
Meanwhile, the archive listeners in the same module already integrate with Monitors:
Archive listener — has metrics:
// ArchivingWorkflowStatusListener.java ✅
Monitors.recordWorkflowArchived(workflow.getWorkflowName(), workflow.getStatus());
Webhook publisher — no metrics:
// StatusChangePublisher.java ❌
LOGGER.debug("Workflow {} publish is successful.", statusChangeNotification.getWorkflowId());
// ... catch block: only LOGGER.error
Proposed Solution
Apply the existing Monitors static facade pattern (already used by the archive listeners) to the webhook publishers. This is the same approach already proven in ArchivingWorkflowStatusListener and ArchivingWithTTLWorkflowStatusListener.
New methods in Monitors.java
// Counters
public static void recordWebhookPublishSuccess(String notificationType, String name, String status) { ... }
public static void recordWebhookPublishFailure(String notificationType, String name, String errorType) { ... }
public static void recordWebhookEnqueueFailure(String notificationType, String name) { ... }
// Gauge
public static void recordWebhookQueueDepth(String notificationType, int size) { ... }
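One way the new facade methods could be shaped is sketched below. To stay self-contained it uses only the JDK: two concurrent maps stand in for the real metrics registry. The class name WebhookMonitorsSketch, the key format, the tag order, and the count/gauge read accessors are assumptions for illustration, not the actual Monitors implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for the proposed Monitors additions. The maps below
// play the role of the metrics registry; the real facade would delegate to it.
final class WebhookMonitorsSketch {
    private static final Map<String, LongAdder> COUNTERS = new ConcurrentHashMap<>();
    private static final Map<String, AtomicInteger> GAUGES = new ConcurrentHashMap<>();

    private WebhookMonitorsSketch() {}

    static void recordWebhookPublishSuccess(String notificationType, String name, String status) {
        increment("webhook_publish_success", notificationType, name, status);
    }

    static void recordWebhookPublishFailure(String notificationType, String name, String errorType) {
        increment("webhook_publish_failure", notificationType, name, errorType);
    }

    static void recordWebhookEnqueueFailure(String notificationType, String name) {
        increment("webhook_enqueue_failure", notificationType, name);
    }

    static void recordWebhookQueueDepth(String notificationType, int size) {
        GAUGES.computeIfAbsent(key("webhook_queue_depth", notificationType), k -> new AtomicInteger())
              .set(size);
    }

    // Counters are created on first use, so a metric that is never recorded never exists.
    private static void increment(String metric, String... tags) {
        COUNTERS.computeIfAbsent(key(metric, tags), k -> new LongAdder()).increment();
    }

    // Builds a flat lookup key from the metric name and its tag values.
    private static String key(String metric, String... tags) {
        return metric + "|" + String.join("|", tags);
    }

    // Read accessors, present only so the sketch can be inspected.
    static long count(String metric, String... tags) {
        LongAdder adder = COUNTERS.get(key(metric, tags));
        return adder == null ? 0L : adder.sum();
    }

    static Integer gauge(String metric, String... tags) {
        AtomicInteger value = GAUGES.get(key(metric, tags));
        return value == null ? null : value.get();
    }
}
```

Keeping the tag values in the method signature (notification type, name, status or error type) means call sites stay one-liners, which matches how the archive listeners call Monitors today.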
| Metric | Where | Trigger |
|---|---|---|
| webhook_publish_success | StatusChangePublisher.ConsumerThread.run(), after successful publishStatusChangeNotification() | HTTP 200/202 from webhook endpoint |
| webhook_publish_success | TaskStatusPublisher.ConsumerThread.run(), after successful publishTaskNotification() | HTTP 200/202 from webhook endpoint |
| webhook_publish_failure | Same two locations, in the catch blocks | Any exception during publish |
| webhook_enqueue_failure | StatusChangePublisher.enqueueWorkflow() and TaskStatusPublisher.enqueueTask() | BlockingQueue.put() failure |
| webhook_queue_depth | StatusChangePublisher and TaskStatusPublisher, after enqueue | Queue size change |
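The enqueue-side rows of the table could be wired up roughly as follows. This is a sketch under assumptions: the class EnqueueSketch, the String payload, and the bounded capacity are hypothetical, and the two fields stand in for calls to Monitors.recordWebhookEnqueueFailure and Monitors.recordWebhookQueueDepth. The real publishers enqueue workflow and task models rather than strings.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical enqueue path. The two fields stand in for the proposed
// Monitors.recordWebhookEnqueueFailure(...) and recordWebhookQueueDepth(...).
final class EnqueueSketch {
    final BlockingQueue<String> queue;
    final AtomicLong enqueueFailures = new AtomicLong();
    final AtomicInteger queueDepth = new AtomicInteger();

    EnqueueSketch(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    void enqueue(String workflowName) {
        try {
            // put() blocks while the queue is full and throws only on interruption.
            queue.put(workflowName);
            // Record the depth right after a successful enqueue.
            queueDepth.set(queue.size());
        } catch (InterruptedException e) {
            enqueueFailures.incrementAndGet();
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
        }
    }
}
```

Because put() blocks rather than rejecting when the queue is full, the depth gauge is the signal that shows saturation, while the failure counter only moves when the enqueueing thread is interrupted.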
Example: StatusChangePublisher (before → after)
Before (current — log only):
try {
    workflow = blockingQueue.take();
    statusChangeNotification = new StatusChangeNotification(workflow.toWorkflow());
    publishStatusChangeNotification(statusChangeNotification);
    LOGGER.debug("Workflow {} publish is successful.", statusChangeNotification.getWorkflowId());
} catch (Exception e) {
    LOGGER.error("Error on publishing workflow", e);
}
After (with metrics):
try {
    workflow = blockingQueue.take();
    statusChangeNotification = new StatusChangeNotification(workflow.toWorkflow());
    publishStatusChangeNotification(statusChangeNotification);
    LOGGER.debug("Workflow {} publish is successful.", statusChangeNotification.getWorkflowId());
    Monitors.recordWebhookPublishSuccess("WORKFLOW", workflow.getWorkflowName(), workflow.getStatus().name());
} catch (Exception e) {
    LOGGER.error("Error on publishing workflow", e);
    // If take() itself threw, workflow may be null (or stale from an earlier iteration), so guard the tag value.
    String workflowName = workflow != null ? workflow.getWorkflowName() : "unknown";
    Monitors.recordWebhookPublishFailure("WORKFLOW", workflowName, e.getClass().getSimpleName());
}
Why This Approach
- Proven pattern: Archive listeners already use Monitors the same way; this is not a new integration pattern
- Zero new dependencies: Both workflow-event-listener and task-status-listener already depend on conductor-core, which transitively provides Monitors and the full Micrometer stack
- Consistent: Webhook publishers will use the same observability mechanism as archive listeners
Implementation Impact
Files to modify:
core/src/main/java/com/netflix/conductor/metrics/Monitors.java — Add new recordWebhookPublish* static methods
workflow-event-listener/.../statuschange/StatusChangePublisher.java — Add Monitors calls in consumer loop success/failure paths and enqueue failure
task-status-listener/.../TaskStatusPublisher.java — Add Monitors calls in consumer loop success/failure paths and enqueue failure
No build.gradle changes required.
Backward Compatible
Yes — fully backward compatible:
- Additive only: New methods added to Monitors, new call sites added to publishers. No existing method signatures change
- No config changes: No new properties or flags required
- Lazy metrics: Counters and gauges only materialize in the Prometheus endpoint when first recorded. If webhook listeners are not enabled, these metrics never appear
- No API changes: No REST endpoints, payloads, or wire formats change