Description
Nomad version
Output from nomad version:
Nomad v1.9.7
BuildDate 2025-03-11T09:07:15Z
Revision f869597+CHANGES
Operating system and Environment details
Debian 12
Issue
A task with a Consul WI and env = true can successfully retrieve a Consul token that is kept alive and functional seemingly indefinitely. However, every time the JWT is renewed according to the identity's ttl config, change_mode is triggered even though the provided Consul token is still fully valid and functional.
As the guidance and warnings for env = true suggest, if the token were actually invalidated then only change_mode = "restart" would get the new token into the environment. This bug therefore means the task is restarted after every ttl renewal, even though that is not necessary. As a workaround one can set change_mode to noop and hope for the best, but if a renewal ever does issue a new Consul token, the task is stuck with a stale one (a sketch of this workaround is shown below).
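For illustration only, a minimal sketch of that noop workaround, assuming the same identity block as in the sample job further down (the ttl value here is just an example):

identity {
  name = "consul_default"
  aud  = ["consul.io"]
  ttl  = "1h"
  env  = true

  # Workaround: do not react to JWT renewals. If Consul ever issues a
  # genuinely new token on renewal, the task keeps using the stale one.
  change_mode = "noop"
}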
I am unable to easily test this against v1.10.0 as the issue is preventing full migration to WI, which is mandatory from that version onwards.
Reproduction steps
- Configure the Nomad Consul WI integration as per the tutorial, except that in my case I've simplified the relevant task ACL binding rule to:
6acbee09-bcb1-a92b-d257-79f388eafeb7:
AuthMethod: nomad-workloads
Description:
BindType: policy
BindName: task_${value.nomad_task}
Selector: "nomad_task" in value
- Run the provided sample job & configure a matching task_repro-task policy with any sane ACL rules (see the CLI sketch after this list for an illustrative setup).
- Observe the provisioned token (lightly censored in sample output):
%> nomad alloc exec 24d69997 sh -c 'cat /secrets/consul_token && echo && env | grep CONSUL'
addd87ab-xxx
CONSUL_HTTP_TOKEN=addd87ab-xxx
CONSUL_TOKEN=addd87ab-xxx
- Verify the token is functional, for example:
CONSUL_HTTP_TOKEN=addd87ab-xxx consul kv get templates/foo
- Continue running the same test every ~1 minute. The sample job purposefully uses a very low TTL, but the same phenomenon happens with a longer TTL such as 1h.
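For completeness, a rough sketch of how the Consul side of this reproduction could be set up and exercised from the CLI. This is illustrative only: the rules file name task_repro-task.hcl is hypothetical, the alloc ID is the one from my run above, and the exact flags may differ depending on namespaces etc.:

# Binding rule mapping each Nomad task to a task_<task-name> policy
consul acl binding-rule create \
  -method=nomad-workloads \
  -bind-type=policy \
  -bind-name='task_${value.nomad_task}' \
  -selector='"nomad_task" in value'

# Policy for the sample task (rules as listed under "Other artifacts")
consul acl policy create \
  -name=task_repro-task \
  -rules=@task_repro-task.hcl

# Repeat the token check roughly once a minute
while true; do
  nomad alloc exec 24d69997 sh -c 'cat /secrets/consul_token && echo && env | grep CONSUL'
  sleep 60
done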
Expected Result
The token remains static and consistent (as per the above test) as long as Nomad continues renewing the JWT side of things and no external force revokes the generated token. change_mode is not triggered because everything rendered into the task is still the same, i.e. the Consul token SecretID has not changed.
Actual Result
The test results remain consistent: the token rendered to the env & /secrets/consul_token remains stable and the token is fully functional.
However, in the task log, every minute we see:
Apr 25, '25 16:40:10 +0300 Signaling Identity[consul_default]: new Identity token acquired
Apr 25, '25 16:39:07 +0300 Signaling Identity[consul_default]: new Identity token acquired
Apr 25, '25 16:38:07 +0300 Signaling Identity[consul_default]: new Identity token acquired
Apr 25, '25 16:37:04 +0300 Signaling Identity[consul_default]: new Identity token acquired
Apr 25, '25 16:36:03 +0300 Signaling Identity[consul_default]: new Identity token acquired
If I had not downgraded the change_mode to signal, every single one of these would have been a task restart. However, no signal or restart should be triggered at all, because there is no new token to load into the env.
Job file (if appropriate)
job "consul-wi-repro" {
datacenters = ["dc1"]
type = "service"
group "main" {
count = 1
task "repro-task" {
driver = "docker"
consul {}
identity {
name = "consul_default"
aud = ["consul.io"]
ttl = "2m"
env = true
# Normally restart, set to signal for easier observation
change_mode = "signal"
change_signal = "SIGHUP"
}
config {
image = "alpine"
args = [
"tail", "-f", "/dev/null"
]
}
}
}
}
Nomad Server logs (if appropriate)
Nomad Client logs (if appropriate)
At INFO level, only the task event is logged:
message="2025-04-25T16:41:15.370+0300 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=24d69997-c60c-05d9-6165-919aff092862 task=repro-task type=Signaling msg="Identity[consul_default]: new Identity token acquired" failed=false"
Other artifacts
For posterity, here's the censored Consul token as pulled from Consul:
%> consul acl token read -meta -accessor-id b6506b8e-61cc-a2b9-ec48-47e89db80fd0
AccessorID: b6506b8e-61cc-a2b9-ec48-47e89db80fd0
SecretID: addd87ab-xxx
Description: token created via login: {"requested_by":"nomad_task_repro-task"}
Local: true
Auth Method: nomad-workloads (Namespace: )
Create Time: 2025-04-25 16:20:21.727917465 +0300 EEST
Hash: fc52e95f3e3210ddf6abbe08b8f4be237bd11314587cxxx
Create Index: 36397832
Modify Index: 36397832
Policies:
4126e063-80b3-28ad-387b-ac0f61879437 - task_repro-task
And the slightly censored Consul nomad-workloads auth method config:
Name: nomad-workloads
Type: jwt
Description:
Config:
{
"BoundAudiences": [
"consul.io"
],
"ClaimMappings": {
"nomad_job_id": "nomad_job_id",
"nomad_namespace": "nomad_namespace",
"nomad_service": "nomad_service",
"nomad_task": "nomad_task"
},
"JWKSURL": "https://nomad.example.com/.well-known/jwks.json",
"JWTSupportedAlgs": [
"RS256"
]
}
And example Consul policy used for the test:
ID: 4126e063-80b3-28ad-387b-ac0f61879437
Name: task_repro-task
Description:
Datacenters:
Rules:
# policy-name: task_repro-task
# General access
service_prefix "" {
policy = "read"
}
node_prefix "" {
policy = "read"
}
# KV access
key_prefix "templates/" {
policy = "read"
}
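(For reference, I believe the auth method and policy dumps above can be reproduced with consul acl auth-method read -name nomad-workloads and consul acl policy read -name task_repro-task, respectively.)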