Description
When a Nomad client starts, it fingerprints its environment with a collection of “fingerprinters”. If these fingerprinters fail or time out, the fingerprint will be incomplete.
A very small set of fingerprinters implement the Periodic method to re-run every 30s (Consul, Vault, secrets plugins, and dynamic host volume plugins). These have to be carefully written so that a cluster-wide event, such as a brief Vault unavailability on a large cluster, doesn't cause a widespread fingerprint update and millions of subsequent evaluations. (Ask me how I know about this 😁 )
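To make the "carefully written" part concrete, here is a minimal sketch of one possible guard: keep the last known attributes when a periodic run fails, and only report an update when the attributes actually changed. The names `attributeGate`, `Apply`, and `errUnavailable` are hypothetical, and this is an assumption about one way to do it, not how Nomad's periodic fingerprinters are implemented.

```go
package main

import "errors"

// errUnavailable is a hypothetical error standing in for a transient
// backend failure such as a brief Vault outage.
var errUnavailable = errors.New("backend unavailable")

// attributeGate is a hypothetical helper (not part of Nomad) that remembers
// the last attribute set that was reported for a fingerprinter.
type attributeGate struct {
	last map[string]string
}

// Apply returns true only when a node update should be sent. A failed run
// keeps the previously reported attributes, and an unchanged result is
// suppressed, so a short outage does not fan out into node updates.
func (g *attributeGate) Apply(next map[string]string, err error) bool {
	if err != nil {
		return false // transient failure: keep last known attributes
	}
	if equalAttrs(g.last, next) {
		return false // nothing changed: no update needed
	}
	// Copy so later mutation of next cannot alter the recorded state.
	g.last = make(map[string]string, len(next))
	for k, v := range next {
		g.last[k] = v
	}
	return true
}

// equalAttrs reports whether two attribute maps hold the same key/value pairs.
func equalAttrs(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}
```

The trade-off in this sketch is that a real, lasting outage is never reflected in the fingerprint; a production version would likely need a threshold or grace period before reporting the failure.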
The periodic fingerprinters also all implement Reload, which re-runs the fingerprint on SIGHUP, the same signal that reloads the agent configuration file and applies a limited subset of the config. The CNI fingerprinter is also reloadable.
But that leaves the fingerprinters for CPU, memory, networking, storage, and cloud metadata (AWS, GCP, Azure, and DigitalOcean), which cannot be reloaded without restarting the Nomad agent, even though it's possible for these values to change after client start without rebooting the whole host.
We've had reports of the AWS metadata endpoint not coming up within the timeout, leaving an agent without its correct AWS fingerprint, and this meant the user had to script a check for that metadata and then restart the agent. There are also other open issues asking for more fingerprints to be reloadable:
- network fingerprinting can't detect changes without restart #23526
- Refingerprint available memory on HUP/reload #18327
 
So it seems like it might be a good idea to run most of the fingerprinters again on SIGHUP. Alternatively, we could use a separate user signal to reload the fingerprints, distinct from the one that reloads the agent configuration.