Closed
Description
Following the recent Prometheus outage on the OSDF director, implementing a health check for the Prometheus server would be highly beneficial. Querying `process_start_time_seconds` should provide a simple way to verify that the server is responding with valid data.
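As a rough illustration of the check proposed above, here is a minimal Go sketch. It assumes the server exposes the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The helper names `checkPrometheus` and `validateStartTime` are hypothetical and not part of Pelican, and the rule that "valid data" means at least one positive sample is also an assumption. The `main` function uses a local stand-in server so the sketch runs without a real Prometheus instance.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strconv"
	"time"
)

// queryResponse mirrors only the parts of the Prometheus instant-query
// response that this check reads.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Value []any `json:"value"` // [unix timestamp, "sample value"]
		} `json:"result"`
	} `json:"data"`
}

// validateStartTime (hypothetical helper) returns nil when the response body
// contains at least one positive process_start_time_seconds sample.
func validateStartTime(body []byte) error {
	var qr queryResponse
	if err := json.Unmarshal(body, &qr); err != nil {
		return fmt.Errorf("invalid JSON: %w", err)
	}
	if qr.Status != "success" {
		return fmt.Errorf("query status %q", qr.Status)
	}
	for _, r := range qr.Data.Result {
		if len(r.Value) != 2 {
			continue
		}
		s, ok := r.Value[1].(string)
		if !ok {
			continue
		}
		// NaN fails the v > 0 comparison, so it is rejected as well.
		if v, err := strconv.ParseFloat(s, 64); err == nil && v > 0 {
			return nil
		}
	}
	return errors.New("no valid process_start_time_seconds sample")
}

// checkPrometheus (hypothetical helper) runs the instant query against
// baseURL and validates the response.
func checkPrometheus(client *http.Client, baseURL string) error {
	u := baseURL + "/api/v1/query?query=" + url.QueryEscape("process_start_time_seconds")
	resp, err := client.Get(u)
	if err != nil {
		return fmt.Errorf("query failed: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected HTTP status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return fmt.Errorf("reading response: %w", err)
	}
	return validateStartTime(body)
}

func main() {
	// Stand-in for a real Prometheus server so the sketch runs offline.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1700000000,"1699990000"]}]}}`)
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 5 * time.Second}
	if err := checkPrometheus(client, srv.URL); err != nil {
		fmt.Println("prometheus unhealthy:", err)
		return
	}
	fmt.Println("prometheus healthy")
}
```

A periodic caller could map a nil error to a healthy status and any error to an unhealthy one. The client timeout matters here, since a hung Prometheus server should count as a failed check rather than block the caller.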
Currently, the `pelican_component_health_status` metric tracks the health of various Pelican services using a `component` label. Adding `prometheus` as a value of that label would be the most effective way to integrate Prometheus health monitoring.