Open
Labels
bug (Something isn't working)
Description
Prerequisites
- I searched existing issues
- I can reproduce this issue
Bug Description
I upgraded nvsentinel on a cluster with 250 nodes, which rolled out all the DaemonSets.
50 syslog health monitor pods were crashing with the following error:
{"time":"2025-12-17T10:11:20.435006509Z","level":"ERROR","msg":"Fatal error","module":"syslog-health-monitor","version":"dev","error":"failed to create gRPC client after retries: platform connector socket file not found after retries: stat /var/run/nvsentinel.sock: no such file or directory"}
Since the platform connector is responsible for creating the socket, I had to restart the platform connector on the node and then restart the syslog health monitor, after which it worked.
So there seems to be a race condition: we must ensure that the platform connector is ready before the syslog health monitor starts.
Component
Health Monitor
Steps to Reproduce
Cannot reproduce
Environment
- NVSentinel version: v0.3.0
- Kubernetes version: 1.29
- Deployment method: argocd/kustomize
Logs/Output
{"time":"2025-12-17T10:11:20.435006509Z","level":"ERROR","msg":"Fatal error","module":"syslog-health-monitor","version":"dev","error":"failed to create gRPC client after retries: platform connector socket file not found after retries: stat /var/run/nvsentinel.sock: no such file or directory"}