
Identity[consul_default]: new Identity token acquired: triggers per JWT TTL, not Consul token validity #25761

Open
@jinnatar

Description


Nomad version

Output from nomad version:

Nomad v1.9.7
BuildDate 2025-03-11T09:07:15Z
Revision f869597+CHANGES

Operating system and Environment details

Debian 12

Issue

A task with a Consul WI & env = true can successfully retrieve a Consul token that is kept alive and functional seemingly indefinitely. However, each time the JWT is renewed (per the identity's ttl setting), change_mode is triggered even though the provisioned Consul token is still fully valid and functional.

As the guidance and warnings for env = true suggest, if the token is actually invalidated then only change_mode = "restart" will get the new value into the environment. This bug, however, means the task is restarted after every TTL renewal even when that is unnecessary. As a workaround one can set change_mode to noop and hope for the best, but if a renewal ever does issue a new Consul token, the task is stuck with a stale one.
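For reference, the workaround amounts to a one-line change in the identity block of the sample job further below. This is only a sketch of the trade-off: it suppresses the spurious restarts at the cost of missing a real rotation, should one ever happen:

```hcl
identity {
  name = "consul_default"
  aud  = ["consul.io"]
  ttl  = "2m"
  env  = true
  # Workaround: suppress the spurious signal/restart on every JWT renewal.
  # Risk: if a renewal ever issues a new Consul token, the task keeps the stale one.
  change_mode = "noop"
}
```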

I am unable to easily test this against v1.10.0 as the issue is preventing full migration to WI, which is mandatory from that version onwards.

Reproduction steps

  1. Configure the Nomad Consul WI integration as per the tutorial, except that in my case I've simplified the relevant task acl-binding to:
6acbee09-bcb1-a92b-d257-79f388eafeb7:
   AuthMethod:   nomad-workloads
   Description:  
   BindType:     policy
   BindName:     task_${value.nomad_task}
   Selector:     "nomad_task" in value
  2. Run the provided sample job and configure a matching task_repro-task policy with any sane ACL rules.
  3. Observe the provisioned token (lightly censored in the sample output):
%> nomad alloc exec 24d69997 sh -c 'cat /secrets/consul_token && echo && env | grep CONSUL'
addd87ab-xxx
CONSUL_HTTP_TOKEN=addd87ab-xxx
CONSUL_TOKEN=addd87ab-xxx
  4. Verify the token is functional, for example:
CONSUL_HTTP_TOKEN=addd87ab-xxx consul kv get templates/foo
  5. Continue running the same test every ~1 minute. The sample job purposefully uses a very low TTL, but the same phenomenon occurs with longer TTLs such as 1h.

Expected Result

The token remains static and consistent (per the test above) as long as Nomad keeps renewing the JWT side of things and no external force revokes the generated token. change_mode is not triggered, because everything rendered into the task is still the same, i.e. the Consul token SecretID has not changed.

Actual Result

The test results remain consistent: the token rendered to the environment and to /secrets/consul_token stays stable, and the token is fully functional.
However, in the task log, every minute we see:

Apr 25, '25 16:40:10 +0300 Signaling Identity[consul_default]: new Identity token acquired
Apr 25, '25 16:39:07 +0300 Signaling Identity[consul_default]: new Identity token acquired
Apr 25, '25 16:38:07 +0300 Signaling Identity[consul_default]: new Identity token acquired
Apr 25, '25 16:37:04 +0300 Signaling Identity[consul_default]: new Identity token acquired
Apr 25, '25 16:36:03 +0300 Signaling Identity[consul_default]: new Identity token acquired

If I had not downgraded change_mode to signal, every single one of these would have been a task restart. However, no signal or restart should be triggered, because there is no new token to load into the environment.
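In other words, the expected trigger rule is a comparison of the rendered value, not a reaction to JWT renewal events. A minimal sketch of that rule (the token values are placeholders taken from the sample output above):

```shell
# Sketch: fire change_mode only when the rendered Consul token SecretID
# actually changes. Placeholder values stand in for the real SecretIDs.
old_token="addd87ab-xxx"
new_token="addd87ab-xxx"   # same SecretID after a JWT renewal
if [ "$old_token" = "$new_token" ]; then
  echo "no-op"    # nothing changed: no signal or restart expected
else
  echo "trigger"  # token rotated: change_mode should fire
fi
```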

Job file (if appropriate)

job "consul-wi-repro" {
  datacenters = ["dc1"]
  type        = "service"
  group "main" {
    count = 1
    task "repro-task" {
      driver = "docker"
      consul {}
      identity {
        name = "consul_default"
        aud  = ["consul.io"]
        ttl  = "2m"
        env  = true
        # Normally restart, set to signal for easier observation
        change_mode   = "signal"
        change_signal = "SIGHUP"
      }
      config {
        image = "alpine"
        args = [
          "tail", "-f", "/dev/null"
        ]
      }
    }
  }
}

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

At INFO level, only the task event is logged:

message="2025-04-25T16:41:15.370+0300 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=24d69997-c60c-05d9-6165-919aff092862 task=repro-task type=Signaling msg="Identity[consul_default]: new Identity token acquired" failed=false"

Other artifacts

For posterity, here's the censored Consul token as pulled from Consul:

%> consul acl token read -meta -accessor-id b6506b8e-61cc-a2b9-ec48-47e89db80fd0
AccessorID:       b6506b8e-61cc-a2b9-ec48-47e89db80fd0
SecretID:         addd87ab-xxx
Description:      token created via login: {"requested_by":"nomad_task_repro-task"}
Local:            true
Auth Method:      nomad-workloads (Namespace: )
Create Time:      2025-04-25 16:20:21.727917465 +0300 EEST
Hash:             fc52e95f3e3210ddf6abbe08b8f4be237bd11314587cxxx
Create Index:     36397832
Modify Index:     36397832
Policies:
   4126e063-80b3-28ad-387b-ac0f61879437 - task_repro-task

And the slightly censored Consul nomad-workloads config:

Name:          nomad-workloads
Type:          jwt
Description:   
Config:
{
  "BoundAudiences": [
    "consul.io"
  ],
  "ClaimMappings": {
    "nomad_job_id": "nomad_job_id",
    "nomad_namespace": "nomad_namespace",
    "nomad_service": "nomad_service",
    "nomad_task": "nomad_task"
  },
  "JWKSURL": "https://nomad.example.com/.well-known/jwks.json",
  "JWTSupportedAlgs": [
    "RS256"
  ]
}

And the example Consul policy used for the test:

ID:           4126e063-80b3-28ad-387b-ac0f61879437
Name:         task_repro-task
Description:  
Datacenters:  
Rules:
# policy-name: task_repro-task

# General access
service_prefix "" {
        policy = "read"
}
node_prefix "" {
        policy = "read"
}

# KV access
key_prefix "templates/" {
        policy = "read"
}
