Description
Hi! The investigation into this issue is still ongoing, but I wanted to report problems with the latest release as soon as I ran into them.
Nomad version
Nomad v1.10.0
BuildDate 2025-04-09T16:40:54Z
Revision e26a2bd
Operating system and Environment details
Ubuntu 24.04.2 LTS
Issue
Memory usage steadily increasing for Nomad Server instances.
Context to this screenshot:
- The upgrade to v1.10.0 was yesterday between 12:00 and 16:00.
- Regular SIGHUP-reloads are happening due to certificate renewals by consul-template.
- The cluster has been electing a new leader very frequently since the upgrade, easily once every 10 minutes.
All three server nodes seem to be affected. For example, memory usage regularly spikes to 3 GB and then drops back to 300 MB.
Reproduction steps
- Have a 3-node Server cluster
- Probably unrelated: we have 3 clients, plus another 3 server nodes in a second, federated region. The problematic region is the authoritative one.
- Memory usage in the federated (non-authoritative) region seems unaffected: stable and low.
- Wait a day
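To put numbers on the growth while waiting, the resident memory of the server process could be sampled periodically. The following is only a minimal sketch for Linux hosts such as the Ubuntu servers above; the helper names `rss_kib` and `read_rss_kib` are hypothetical and not part of Nomad.

```python
import re


def rss_kib(status_text):
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status.

    Returns None if the field is missing (for example for kernel threads).
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return rss_kib(status_file.read())
```

Calling `read_rss_kib` with the PID of the Nomad server process once a minute and logging the result with a timestamp would give a series that can be lined up against leader elections and SIGHUP reloads.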
Expected Result
Memory usage to stay roughly the same.
Actual Result
Memory usage increasing.
Job file (if appropriate)
Not applicable.
Nomad Server logs (if appropriate)
The logs show that my cluster has become very unstable since the 1.10 upgrade.
Also worth noting: network latency between nodes is less than 1 ms (same rack), but the nomad.raft.leader.lastContact metric showed values between 200 and 600 ms (± 200 ms). Not great. In the non-authoritative region the same metric is about 1-100 ms (± 26 ms), which is great (despite slower hardware). Round-trip time between nodes, according to Consul:
# roundtrip time
Minimum 0.43ms
Median 0.7ms
Maximum 0.8ms
{"@level":"info","@message":"added peer, starting replication","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:05.332364Z","peer":"69c7c3cc-256f-e12e-e82c-68bb3f29e6a3"}
{"@level":"warn","@message":"appendEntries rejected, sending older logs","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:05.338291Z","next":154265,"peer":{"Suffrage":1,"ID":"69c7c3cc-256f-e12e-e82c-68bb3f29e6a3","Address":"172.16.1.236:4647"}}
{"@level":"info","@message":"pipelining replication","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:05.400207Z","peer":{"Suffrage":1,"ID":"69c7c3cc-256f-e12e-e82c-68bb3f29e6a3","Address":"172.16.1.236:4647"}}
{"@level":"warn","@message":"failed to contact","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:09.187027Z","server-id":"607347df-65b5-95b8-2ab3-6e0138a30db7","time":508388073}
{"@level":"warn","@message":"failed to contact quorum of nodes, stepping down","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:09.187102Z"}
{"@level":"info","@message":"entering follower state","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:09.187137Z","follower":{},"leader-address":"","leader-id":""}
{"@level":"info","@message":"aborting pipeline replication","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:09.249755Z","peer":{"Suffrage":1,"ID":"607347df-65b5-95b8-2ab3-6e0138a30db7","Address":"172.16.1.235:4647"}}
{"@level":"info","@message":"cluster leadership lost","@module":"nomad","@timestamp":"2025-04-11T13:19:09.323127Z"}
{"@level":"info","@message":"aborting pipeline replication","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:09.560093Z","peer":{"Suffrage":1,"ID":"69c7c3cc-256f-e12e-e82c-68bb3f29e6a3","Address":"172.16.1.236:4647"}}
{"@level":"warn","@message":"heartbeat timeout reached, starting election","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:10.787542Z","last-leader-addr":"","last-leader-id":""}
{"@level":"info","@message":"entering candidate state","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:10.787590Z","node":{},"term":495}
{"@level":"info","@message":"pre-vote successful, starting election","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:10.852815Z","refused":0,"tally":2,"term":495,"votesNeeded":2}
{"@level":"info","@message":"election won","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:10.906849Z","tally":2,"term":495}
{"@level":"info","@message":"entering leader state","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:10.906884Z","leader":{}}
{"@level":"info","@message":"added peer, starting replication","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:10.906904Z","peer":"607347df-65b5-95b8-2ab3-6e0138a30db7"}
{"@level":"info","@message":"added peer, starting replication","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:10.906916Z","peer":"69c7c3cc-256f-e12e-e82c-68bb3f29e6a3"}
{"@level":"info","@message":"cluster leadership acquired","@module":"nomad","@timestamp":"2025-04-11T13:19:10.907793Z"}
{"@level":"info","@message":"pipelining replication","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:10.933525Z","peer":{"Suffrage":0,"ID":"607347df-65b5-95b8-2ab3-6e0138a30db7","Address":"172.16.1.235:4647"}}
{"@level":"info","@message":"pipelining replication","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:10.933853Z","peer":{"Suffrage":1,"ID":"69c7c3cc-256f-e12e-e82c-68bb3f29e6a3","Address":"172.16.1.236:4647"}}
{"@level":"info","@message":"eval broker status modified","@module":"nomad","@timestamp":"2025-04-11T13:19:10.956148Z","paused":false}
{"@level":"info","@message":"blocked evals status modified","@module":"nomad","@timestamp":"2025-04-11T13:19:10.956327Z","paused":false}
{"@level":"info","@message":"Promoting server","@module":"nomad.autopilot","@timestamp":"2025-04-11T13:19:20.958465Z","address":"172.16.1.236:4647","id":"69c7c3cc-256f-e12e-e82c-68bb3f29e6a3","name":"nomad03-bre.q-mex.net.bremen"}
{"@level":"info","@message":"updating configuration","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:20.958561Z","command":0,"server-addr":"172.16.1.236:4647","server-id":"69c7c3cc-256f-e12e-e82c-68bb3f29e6a3","servers":"[{Suffrage:Voter ID:84f3b5a8-d979-5d41-c659-badc9ecce162 Address:172.16.1.234:4647} {Suffrage:Voter ID:607347df-65b5-95b8-2ab3-6e0138a30db7 Address:172.16.1.235:4647} {Suffrage:Voter ID:69c7c3cc-256f-e12e-e82c-68bb3f29e6a3 Address:172.16.1.236:4647}]"}
{"@level":"warn","@message":"failed to contact","@module":"nomad.raft","@timestamp":"2025-04-11T13:20:05.373354Z","server-id":"69c7c3cc-256f-e12e-e82c-68bb3f29e6a3","time":505378093}
{"@level":"warn","@message":"failed retrieving server health","@module":"nomad.stats_fetcher","@timestamp":"2025-04-11T13:20:05.963132Z","error":"context deadline exceeded","server":"nomad01-bre.q-mex.net.bremen"}
{"@level":"warn","@message":"failed to contact","@module":"nomad.raft","@timestamp":"2025-04-11T13:21:40.059996Z","server-id":"69c7c3cc-256f-e12e-e82c-68bb3f29e6a3","time":501376848}
{"@level":"warn","@message":"failed to contact","@module":"nomad.raft","@timestamp":"2025-04-11T13:22:09.619104Z","server-id":"69c7c3cc-256f-e12e-e82c-68bb3f29e6a3","time":519836348}
{"@level":"warn","@message":"failed to contact","@module":"nomad.raft","@timestamp":"2025-04-11T13:22:10.196529Z","server-id":"69c7c3cc-256f-e12e-e82c-68bb3f29e6a3","time":506603638}
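For anyone who wants to summarize a longer excerpt of these logs, here is a minimal sketch that counts messages and converts the `time` field of the "failed to contact" warnings to milliseconds. It assumes that `time` is a duration in nanoseconds, which is how the values above read (508388073 would be about 508 ms) but is not confirmed here. The function name `summarize` is hypothetical.

```python
import json
from collections import Counter

# A few lines copied verbatim from the server log above.
SAMPLE_LOG = """\
{"@level":"warn","@message":"failed to contact","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:09.187027Z","server-id":"607347df-65b5-95b8-2ab3-6e0138a30db7","time":508388073}
{"@level":"warn","@message":"failed to contact quorum of nodes, stepping down","@module":"nomad.raft","@timestamp":"2025-04-11T13:19:09.187102Z"}
{"@level":"info","@message":"cluster leadership lost","@module":"nomad","@timestamp":"2025-04-11T13:19:09.323127Z"}
{"@level":"info","@message":"cluster leadership acquired","@module":"nomad","@timestamp":"2025-04-11T13:19:10.907793Z"}
{"@level":"warn","@message":"failed to contact","@module":"nomad.raft","@timestamp":"2025-04-11T13:20:05.373354Z","server-id":"69c7c3cc-256f-e12e-e82c-68bb3f29e6a3","time":505378093}
"""


def summarize(log_text):
    """Count log messages and collect 'failed to contact' durations in ms."""
    messages = Counter()
    contact_ms = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        message = entry.get("@message", "")
        messages[message] += 1
        if message == "failed to contact" and "time" in entry:
            # Assumption: "time" is a duration in nanoseconds.
            contact_ms.append(entry["time"] / 1e6)
    return messages, contact_ms


if __name__ == "__main__":
    counts, durations = summarize(SAMPLE_LOG)
    print("leadership lost:", counts["cluster leadership lost"])
    print("failed to contact (ms):", [round(d, 1) for d in durations])
```

Under that assumption, every "failed to contact" warning in the excerpt above reports a value slightly over 500 ms, even though the measured round-trip time between the nodes is below 1 ms.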
Nomad Client logs (if appropriate)
Not applicable.