@iyashjayesh
Contributor

@iyashjayesh iyashjayesh commented Oct 25, 2025

Summary

Implements comprehensive scheduler and job lifecycle monitoring as proposed in #785. This PR adds the complete SchedulerMonitor interface with all planned metrics for tracking scheduler health, job execution, and performance.

Changes

  • Add SchedulerMonitor interface with 12 event methods
  • Integrate monitor callbacks throughout scheduler lifecycle (scheduler.go)
  • Add execution time and scheduling delay tracking (executor.go)
  • Add concurrency limit notifications for singleton and limit modes
  • Add test coverage for all events
  • Update README with detailed documentation and examples

Implemented Metrics

Lifecycle:

  • SchedulerStarted - Called when Start() is invoked
  • SchedulerStopped - Called when scheduler stops (before cleanup)
  • SchedulerShutdown - Called when Shutdown() completes

Job Management:

  • JobRegistered - Track jobs added to scheduler
  • JobUnregistered - Track jobs removed from scheduler
  • JobStarted - Job begins execution
  • JobRunning - Job is actively running

Performance:

  • JobExecutionTime - Enables AverageExecutionTime calculations
  • JobSchedulingDelay - Enables SchedulingLag detection
  • ConcurrencyLimitReached - Detects singleton/limit mode constraints

Job Execution:

  • JobCompleted - Track successful completions
  • JobFailed - Track failures (enables ErrorRate calculation)
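
To make the event flow concrete, here is a minimal sketch of how a monitor might receive a few of these callbacks. `EventMonitor`, `CountingMonitor`, `runJob`, and `errBoom` are hypothetical names invented for this illustration. The method names follow the list above, but the parameter lists are assumptions and may not match the actual `SchedulerMonitor` interface in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// EventMonitor is a hypothetical, trimmed-down stand-in for SchedulerMonitor.
// Method names come from the PR description; the signatures are assumed.
type EventMonitor interface {
	JobStarted(jobName string)
	JobCompleted(jobName string)
	JobFailed(jobName string, err error)
	JobExecutionTime(jobName string, d time.Duration)
}

// CountingMonitor counts how often each event fires and sums execution time.
type CountingMonitor struct {
	mu     sync.Mutex
	counts map[string]int
	total  time.Duration
}

// Compile-time check that CountingMonitor satisfies the sketch interface.
var _ EventMonitor = (*CountingMonitor)(nil)

func NewCountingMonitor() *CountingMonitor {
	return &CountingMonitor{counts: make(map[string]int)}
}

func (m *CountingMonitor) bump(event string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.counts[event]++
}

func (m *CountingMonitor) JobStarted(string)       { m.bump("JobStarted") }
func (m *CountingMonitor) JobCompleted(string)     { m.bump("JobCompleted") }
func (m *CountingMonitor) JobFailed(string, error) { m.bump("JobFailed") }

func (m *CountingMonitor) JobExecutionTime(_ string, d time.Duration) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.counts["JobExecutionTime"]++
	m.total += d
}

// Count returns how many times the named event has been observed.
func (m *CountingMonitor) Count(event string) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.counts[event]
}

// errBoom is a placeholder failure used by the demo below.
var errBoom = errors.New("boom")

// runJob is a hypothetical executor showing one plausible callback order:
// started, execution time, then exactly one of completed or failed.
func runJob(mon EventMonitor, name string, fn func() error) {
	mon.JobStarted(name)
	start := time.Now()
	err := fn()
	mon.JobExecutionTime(name, time.Since(start))
	if err != nil {
		mon.JobFailed(name, err)
		return
	}
	mon.JobCompleted(name)
}

func main() {
	mon := NewCountingMonitor()
	runJob(mon, "ok", func() error { return nil })
	runJob(mon, "bad", func() error { return errBoom })
	// Prints: 2 1 1
	fmt.Println(mon.Count("JobStarted"), mon.Count("JobCompleted"), mon.Count("JobFailed"))
}
```

The mutex is there because a scheduler would typically invoke callbacks from several goroutines at once; a real implementation might prefer atomic counters or a metrics library.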

Derived Metrics (calculable by monitor implementations):

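  • Error Rate: count(JobFailed) / (count(JobCompleted) + count(JobFailed))
  • Average Execution Time: mean of the durations reported by JobExecutionTime events
  • Active Jobs: count(JobRegistered) - count(JobUnregistered), i.e. jobs currently registered with the scheduler
  • Current Queue Depth: count(JobStarted) - (count(JobCompleted) + count(JobFailed)), i.e. jobs started but not yet finished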
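
The formulas above can be sketched as plain arithmetic over event counts. `Snapshot` and its fields are hypothetical names for whatever counters a monitor implementation keeps; they are not part of this PR. The zero-denominator guards are an added assumption, so that an idle scheduler reports 0 rather than NaN.

```go
package main

import (
	"fmt"
	"time"
)

// Snapshot holds event counts that a monitor implementation might accumulate.
// The type and field names are illustrative only.
type Snapshot struct {
	Registered, Unregistered   int
	Started, Completed, Failed int
	TotalExecution             time.Duration // sum of JobExecutionTime durations
	ExecutionSamples           int           // number of JobExecutionTime events
}

// ErrorRate is failed / (completed + failed), or 0 if nothing has finished.
func (s Snapshot) ErrorRate() float64 {
	finished := s.Completed + s.Failed
	if finished == 0 {
		return 0
	}
	return float64(s.Failed) / float64(finished)
}

// AverageExecutionTime is the mean reported duration, or 0 with no samples.
func (s Snapshot) AverageExecutionTime() time.Duration {
	if s.ExecutionSamples == 0 {
		return 0
	}
	return s.TotalExecution / time.Duration(s.ExecutionSamples)
}

// ActiveJobs is the number of jobs registered and not yet unregistered.
func (s Snapshot) ActiveJobs() int { return s.Registered - s.Unregistered }

// QueueDepth is the number of jobs started but not yet finished.
func (s Snapshot) QueueDepth() int { return s.Started - (s.Completed + s.Failed) }

func main() {
	s := Snapshot{
		Registered: 5, Unregistered: 1,
		Started: 10, Completed: 6, Failed: 2,
		TotalExecution: 4 * time.Second, ExecutionSamples: 8,
	}
	fmt.Printf("error rate: %.2f\n", s.ErrorRate())         // error rate: 0.25
	fmt.Println("avg execution:", s.AverageExecutionTime()) // avg execution: 500ms
	fmt.Println("active jobs:", s.ActiveJobs())             // active jobs: 4
	fmt.Println("queue depth:", s.QueueDepth())             // queue depth: 2
}
```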

Use Cases

  • Prometheus/Grafana: Export metrics for visualization and alerting
  • Custom Dashboards: Build internal monitoring dashboards
  • Alerting Systems: Trigger alerts on high error rates or scheduling lag
  • Performance Analysis: Identify slow jobs and bottlenecks
  • Capacity Planning: Track queue depths and concurrency limits
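
As one illustration of the alerting use case, the sketch below fires a callback when a job starts later than a threshold allows. `LagAlerter`, its fields, and `demoAlerts` are hypothetical. The `JobSchedulingDelay` method name mirrors the event above, but its parameters (scheduled and actual start times) are an assumption rather than the PR's confirmed signature.

```go
package main

import (
	"fmt"
	"time"
)

// LagAlerter is a hypothetical monitor fragment that alerts on scheduling lag.
type LagAlerter struct {
	Threshold time.Duration
	Alert     func(jobName string, delay time.Duration)
}

// JobSchedulingDelay compares the planned and actual start times and calls
// Alert when the delay exceeds Threshold. The signature is assumed.
func (a *LagAlerter) JobSchedulingDelay(jobName string, scheduled, actual time.Time) {
	delay := actual.Sub(scheduled)
	if delay > a.Threshold && a.Alert != nil {
		a.Alert(jobName, delay)
	}
}

// demoAlerts feeds two synthetic delays through the alerter and returns the
// alert messages that were produced.
func demoAlerts() []string {
	var alerts []string
	a := &LagAlerter{
		Threshold: 2 * time.Second,
		Alert: func(job string, d time.Duration) {
			alerts = append(alerts, fmt.Sprintf("%s started %s late", job, d))
		},
	}
	scheduled := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)
	a.JobSchedulingDelay("report", scheduled, scheduled.Add(500*time.Millisecond)) // under threshold
	a.JobSchedulingDelay("report", scheduled, scheduled.Add(5*time.Second))        // over threshold
	return alerts
}

func main() {
	for _, msg := range demoAlerts() {
		fmt.Println("ALERT:", msg) // ALERT: report started 5s late
	}
}
```

In practice the `Alert` callback would more likely increment a metric or notify a paging system than build a string; the callback keeps the threshold logic independent of that choice.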

@JohnRoesler Ready for review! All metrics from the original plan are now implemented and tested.

…vents for lifecycle, performance, and concurrency limits.
…d `SchedulerMonitor` details with Prometheus example
@iyashjayesh iyashjayesh marked this pull request as ready for review November 21, 2025 19:04
@iyashjayesh iyashjayesh changed the title from "[WIP] feat: add scheduler lifecycle monitoring <> #785" to "feat: add scheduler lifecycle monitoring <> #785" Nov 24, 2025
@iyashjayesh
Contributor Author

@JohnRoesler Apart from the scope we discussed, could you review the PR when you get a chance?

If needed, we can add any missing metrics. So far, I’ve tried to cover all the ones we discussed.

@JohnRoesler JohnRoesler merged commit f4c6d14 into go-co-op:v2 Dec 2, 2025
3 checks passed


2 participants