Implement external monitoring of public cluster #507

@jpswinski

Description

Consider setting up a monitor that periodically sends a simple request to the public cluster (perhaps every minute, or every 5 minutes) and, if the request fails, sends a text message or email to the sliderule developers.
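A minimal sketch of such a monitor, using only the Python standard library, might look like the following. The probe URL is a placeholder (this issue does not name an endpoint), and `probe`, `run_monitor`, and the `alert` callback are hypothetical names. The sketch alerts once per outage rather than on every failed check, which is an assumption about the desired behavior.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Placeholder only: the real public-cluster endpoint is not given in this issue.
PROBE_URL = "https://example.org/health"


def probe(url: str, timeout_s: float = 10.0) -> bool:
    """Return True if a GET to `url` completes with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers HTTP errors, connection failures, timeouts, and malformed URLs.
        return False


def run_monitor(
    check: Callable[[], bool],
    alert: Callable[[str], None],
    interval_s: float = 60.0,
    failures_before_alert: int = 1,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call `check` every `interval_s` seconds and alert once per outage.

    `max_checks` and `sleep` exist so the loop can be bounded and tested;
    a deployed monitor would leave `max_checks` as None and run forever.
    """
    consecutive_failures = 0
    checks_done = 0
    while max_checks is None or checks_done < max_checks:
        ok = check()
        checks_done += 1
        if ok:
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            # Alert only when the threshold is first reached, so a long
            # outage does not send a message on every interval.
            if consecutive_failures == failures_before_alert:
                alert(f"public cluster check failed {consecutive_failures} time(s) in a row")
        if max_checks is None or checks_done < max_checks:
            sleep(interval_s)
```

To run it, something like `run_monitor(lambda: probe(PROBE_URL), alert=print, interval_s=300)` would check every 5 minutes. In practice `alert` would wrap whatever email or text-message mechanism is chosen, for example a function that sends mail through `smtplib`.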

This could also be combined with (or implemented as) the monitoring functionality in Grafana. We already have metrics on the number of container restarts, nodes registered, and discovery failures. Could we put together some heuristics based on those metrics that determine when to alert the developers?

Metadata

Assignees

No one assigned

    Labels

    Infrastructure (Infrastructure as Code, Orchestrator, Network)
    Observability (Logs, metrics, traces, alerts, telemetry)
