
Nomad CLI Monitor Dispatched Jobs #27541

Open
Juanadelacuesta wants to merge 8 commits into main from NMD-396-job-dispatch

Conversation

Member

@Juanadelacuesta commented Feb 19, 2026

Description

This PR adds more complete monitoring and follow-up to the 'nomad job dispatch' command, making its output more similar to that of the 'nomad job run' command.

Links

This request comes from the community:
NMD-396

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad product documentation, which is stored in the
    web-unified-docs repo. Refer to the web-unified-docs contributor guide for docs guidelines.
    Please also consider whether the change requires notes within the upgrade guide.
    If you would like help with the docs, tag the nomad-docs team in this PR.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.
  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

Member

@gulducat gulducat left a comment


This looks really great! I do have one sticky issue for us to iron out though. Maybe worth discussing a bit as a team?

Comment thread command/job_dispatch.go Outdated
Comment on lines +280 to +284
// Running with desired=run is stable
if alloc.DesiredStatus == api.AllocDesiredStatusRun &&
alloc.ClientStatus == api.AllocClientStatusRunning {
return true
}
Member


This behaves differently than a deployment, which has an update.min_healthy_time parameter. Here, if a task doesn't fail immediately, it passes through running until it exits, before getting restarted and going back to pending, then running, and so on.

Example job sleepy-fail.nomad.hcl:

job "sleepy-fail" {
  type = "batch"
  parameterized {}
  group "g" {
    task "t" {
      driver = "raw_exec"
      config {
        command = "bash"
        args    = ["-xc", "sleep 5; exit 1"]
      }
    }
    restart {
      attempts = 2
      delay    = "2s"
    }
    reschedule {
      attempts = 0
    }
  }
}

and this script dispatch.sh watches the status change over time:

#!/usr/bin/env bash

job="${1:-sleepy-fail}"

eval="$(nomad operator api -X PUT /v1/job/$job/dispatch/payload | jq -r '.EvalID')"
[ -z "$eval" ] && { echo 'no eval...'; exit 1; }
echo "eval: $eval"

idx=0
s='na'
while true; do
  eval "$(
    nomad operator api "/v1/evaluation/$eval/allocations?index=$idx" \
    | jq -r '.[0] | "idx=\(.ModifyIndex); s=\(.ClientStatus)"'
  )"
  echo "$(date +'%H:%M:%S.%N') idx=$idx status=$s"
  [ "$s" == 'failed' ] && break
  [ -z "$s" ] && { echo 'no status...'; exit 1; }
done

Here you can see it show running for the 5 seconds that the task sleeps without exiting, then get restarted a couple times before finally failing.

$ nomad run sleepy-fail.nomad.hcl
Job registration successful

$ ./dispatch.sh
eval: 601f0f65-4293-1157-f567-ffc2ee8a9ef0
13:33:40.830825835 idx=838 status=pending
13:33:41.701519927 idx=840 status=running
13:33:45.968495758 idx=841 status=pending
13:33:48.630084597 idx=842 status=running
13:33:53.497547555 idx=843 status=pending
13:33:55.955814207 idx=844 status=running
13:34:01.027398738 idx=845 status=failed

Member


I'm not sure what to do about it... Maybe since these are batch jobs, we should expect that the alloc must be complete, failed, lost, stopped, or evicted, and ignore running altogether?

If we do that, then I think this new behavior should be gated behind a flag, because people may be dispatching stuff that takes a long time (or possibly even never exits, like a service job!), so we can't really default to waiting potentially forever...
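A minimal sketch of the terminal-state check being suggested here. The status strings are redeclared locally so the example is self-contained (in the real code these would be the api.AllocClientStatus* constants), and the set of statuses treated as terminal is illustrative, not the PR's final list:

```go
package main

import "fmt"

// Local stand-ins for Nomad's allocation client status values,
// redeclared here so this sketch compiles on its own.
const (
	statusComplete = "complete"
	statusFailed   = "failed"
	statusLost     = "lost"
)

// isTerminal reports whether a batch allocation's client status is
// final, ignoring "running" entirely as suggested above.
func isTerminal(clientStatus string) bool {
	switch clientStatus {
	case statusComplete, statusFailed, statusLost:
		return true
	}
	return false
}

func main() {
	fmt.Println(isTerminal("running")) // false
	fmt.Println(isTerminal("failed"))  // true
}
```

With a check like this, a sleeping task that later fails would never be counted as done just because it is currently running.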

Member Author


After talking to some people, the idea was to monitor the job, not the "deployment". In that case, as Daniel mentioned, the change is too big to leave as the default, so it is placed behind the '-wait' flag: it will wait until all the allocs in all the task groups are in a final state.

Member

@gulducat gulducat left a comment


-wait is exactly what I was hoping for, thanks!

I have a couple comments being picky about words, but also another potential problem. Do we also want to monitor reschedule attempts? At present, it exits after the first alloc fails.

Comment thread command/job_dispatch.go
Comment on lines +309 to +315
// Count healthy (running) and unhealthy (failed) allocations
switch alloc.ClientStatus {
case api.AllocClientStatusRunning:
state.HealthyAllocs++
case api.AllocClientStatusFailed:
state.UnhealthyAllocs++
}
Member


Most of my problems are caused by me fixating on words 😅

Here, while my job's task is running but before it exits, it shows as "Healthy", even though Nomad doesn't actually know anything about its health. Service deployments are different because they actually inspect health checks.

With my sleepy-fail job, it's really not healthy, it just hasn't failed yet. This is what it shows until the task completes (fails), then it switches to Unhealthy=1 and exits.

⠴ Monitoring allocations for job "sleepy-f"...

    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy
    g           1        1       1        0

Personally I would prefer not to translate between concepts, and instead just say "Running" or "Failed", because that's all we actually know about the alloc.
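A minimal sketch of counting by raw client status instead of translating to health, using a hypothetical groupState struct in place of the PR's actual state type:

```go
package main

import "fmt"

// groupState is a hypothetical stand-in for the PR's per-task-group
// counters, labeled by what Nomad actually knows: the client status.
type groupState struct {
	Running int
	Failed  int
}

// tally counts allocations by client status, with no translation
// into "healthy"/"unhealthy".
func tally(statuses []string) groupState {
	var g groupState
	for _, s := range statuses {
		switch s {
		case "running":
			g.Running++
		case "failed":
			g.Failed++
		}
	}
	return g
}

func main() {
	g := tally([]string{"running", "failed", "pending"})
	fmt.Printf("Running=%d Failed=%d\n", g.Running, g.Failed)
}
```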

Comment thread command/job_dispatch.go Outdated
Comment on lines +442 to +445
// Task group is complete when all desired allocations are terminal
if terminalCount < state.DesiredTotal {
allTaskGroupsComplete = false
}
Member


I noticed something about my test job, not caused by only these lines, but related.

If I set reschedule.attempts = 2, then the command exits after the first allocation fails. Subsequent allocs are not monitored.
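One possible way to keep monitoring across reschedules is to follow each allocation's NextAllocation pointer to its replacement. A self-contained sketch, with the chain modeled as a plain map of IDs rather than real api.Allocation values:

```go
package main

import "fmt"

// latestAlloc follows a chain of replacement allocation IDs to the
// most recent one. In the real API, each allocation carries a
// NextAllocation field naming the alloc that replaced it after a
// reschedule; here that chain is a plain map for illustration.
func latestAlloc(start string, next map[string]string) string {
	id := start
	for next[id] != "" {
		id = next[id]
	}
	return id
}

func main() {
	// Hypothetical chain: a1 was rescheduled as a2, then a2 as a3.
	chain := map[string]string{"a1": "a2", "a2": "a3"}
	fmt.Println(latestAlloc("a1", chain)) // a3
}
```

The monitor could then keep watching the latest allocation instead of exiting when the first one fails.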

Comment thread command/job_dispatch.go Outdated
if allTaskGroupsComplete {
d.Close()
if hasFailures {
return 2 // Scheduling failure
Member


nit about the comment: my test job is scheduled successfully, but then I get this if my task exits non-0. I don't really think that's a "scheduling failure"

Member Author


It is not; it's more of a "we don't know what happened, but your job didn't finish".

Member Author

This PR fails because of a bug addressed in #27852; once that one is merged, this will work.


Labels

backport/1.11.x (backport to 1.11.x release line), theme/cli
