Skip to content

Allow Agent Reload with API - /v1/agent/reload#27106

Open
benjamin-lykins wants to merge 28 commits intohashicorp:mainfrom
benjamin-lykins:feat/config-reload-api
Open

Allow Agent Reload with API - /v1/agent/reload#27106
benjamin-lykins wants to merge 28 commits intohashicorp:mainfrom
benjamin-lykins:feat/config-reload-api

Conversation

@benjamin-lykins
Copy link
Copy Markdown
Contributor

@benjamin-lykins benjamin-lykins commented Nov 14, 2025

Description

Provide an API endpoint to reload agent config without elevated permissions on the infrastructure running Nomad.

  • Provides ability to reload config remotely.
  • Leverage Nomad Token + ACL to reload.

Testing

Reloading log level without ACL

curl -k  -X PUT \
  https://localhost:4646/v1/agent/reload
{"message":"agent configuration reloaded"}

Attempting without token (ACL Enabled):

curl -k  -X PUT \  
https://localhost:4646/v1/agent/reload
Permission denied

Attempting with token (ACL Enabled; Agent = Write):

curl -k -X PUT \                                                    
  --header "X-Nomad-Token: 79d77f6a-e168-3bbb-8fa7-e668b2b09346" \
  https://localhost:4646/v1/agent/reload 
{"message":"agent configuration reloaded"}

Attempting with token (ACL Enabled; Agent = Read):

curl -k -X PUT \
  --header "X-Nomad-Token: 2cde0074-d0ac-53e1-a6dc-3f416a8e1737" \
  https://localhost:4646/v1/agent/reload
Permission denied

Invalid Config File:

curl -k -X PUT \
  --header "X-Nomad-Token: 79d77f6a-e168-3bbb-8fa7-e668b2b09346" \
  https://localhost:4646/v1/agent/reload
Error loading /Users/benjamin.lykins/git/nomad/nomad/bin/config/general.hcl: failed to decode HCL file /Users/benjamin.lykins/git/nomad/nomad/bin/config/general.hcl: At 3:16: literal not terminated

Links

Fixes: #24048

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.
  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

@benjamin-lykins benjamin-lykins requested review from a team as code owners November 14, 2025 19:19
@tgross tgross self-requested a review November 17, 2025 15:09
Copy link
Copy Markdown
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjamin-lykins there's a failing test not touched here but clearly related to this work. Let's get that resolved before we review:

=== FAIL: command/agent TestServer_Reload_TLS_Certificate (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered, repanicked]
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x30fefa3]

goroutine 87962 [running]:
testing.tRunner.func1.2({0x3898300, 0x69a5e70})
	/opt/hostedtoolcache/go/1.25.3/x64/src/testing/testing.go:1872 +0x237
testing.tRunner.func1()
	/opt/hostedtoolcache/go/1.25.3/x64/src/testing/testing.go:1875 +0x35b
panic({0x3898300?, 0x69a5e70?})
	/opt/hostedtoolcache/go/1.25.3/x64/src/runtime/panic.go:783 +0x132
github.com/hashicorp/nomad/command/agent.(*Agent).Reload(0xc00c83c700, 0xc00d476d08)
	/home/runner/work/nomad/nomad/command/agent/agent.go:1533 +0x63
github.com/hashicorp/nomad/command/agent.TestServer_Reload_TLS_Certificate(0xc00a9f8540)
	/home/runner/work/nomad/nomad/command/agent/agent_test.go:1197 +0x1d7
testing.tRunner(0xc00a9f8540, 0x42c8720)
	/opt/hostedtoolcache/go/1.25.3/x64/src/testing/testing.go:1934 +0xea
created by testing.(*T).Run in goroutine 1
	/opt/hostedtoolcache/go/1.25.3/x64/src/testing/testing.go:1997 +0x465

DONE 5542 tests, 12 skipped, 4 failures in 182.433s

@benjamin-lykins
Copy link
Copy Markdown
Contributor Author

@benjamin-lykins there's a failing test not touched here but clearly related to this work. Let's get that resolved before we review:

=== FAIL: command/agent TestServer_Reload_TLS_Certificate (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered, repanicked]
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x30fefa3]

goroutine 87962 [running]:
testing.tRunner.func1.2({0x3898300, 0x69a5e70})
	/opt/hostedtoolcache/go/1.25.3/x64/src/testing/testing.go:1872 +0x237
testing.tRunner.func1()
	/opt/hostedtoolcache/go/1.25.3/x64/src/testing/testing.go:1875 +0x35b
panic({0x3898300?, 0x69a5e70?})
	/opt/hostedtoolcache/go/1.25.3/x64/src/runtime/panic.go:783 +0x132
github.com/hashicorp/nomad/command/agent.(*Agent).Reload(0xc00c83c700, 0xc00d476d08)
	/home/runner/work/nomad/nomad/command/agent/agent.go:1533 +0x63
github.com/hashicorp/nomad/command/agent.TestServer_Reload_TLS_Certificate(0xc00a9f8540)
	/home/runner/work/nomad/nomad/command/agent/agent_test.go:1197 +0x1d7
testing.tRunner(0xc00a9f8540, 0x42c8720)
	/opt/hostedtoolcache/go/1.25.3/x64/src/testing/testing.go:1934 +0xea
created by testing.(*T).Run in goroutine 1
	/opt/hostedtoolcache/go/1.25.3/x64/src/testing/testing.go:1997 +0x465

DONE 5542 tests, 12 skipped, 4 failures in 182.433s

I think I know what is causing this. I'll get it fixed.

@benjamin-lykins
Copy link
Copy Markdown
Contributor Author

@tgross fixed the test, which then uncovered another broken test after another I changed the return response. both of those were cleaned up and looks happier now here.

Copy link
Copy Markdown
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start @benjamin-lykins!

In addition to the comments I've left, make sure you've added a changelog entry file. You can do that via make cl. Once this is getting closer to being ready, we'll need a docs PR to go with this, which is now going to be over in the web-unified-docs repo.

Comment thread api/agent.go Outdated
Comment thread command/agent/agent_endpoint_test.go Outdated
Comment thread command/agent/agent_endpoint_test.go
Comment thread command/agent/agent_endpoint.go Outdated

newConf := DefaultConfig()

for _, path := range currConf.Files {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the -config flag in the command line arguments for this agent points to a directory, this Files field will contain only the files that were in that directory at the time we started the agent. If you add a new file, it won't be present on reload. We may need to figure out a way to get the original -config flag attached to the config for this to work.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this one, wasn't thinking of it or realized when I originally ran through it. Made some modifications to save the -config flag values and tested locally without issue. Tested if I passed over two -config as well and worked as expected. I was also able to reuse the logic in LoadConfig to check each path, which seems to work out well. Lastly, I tested adding files with bad configuration and got a decent response back, but I could also just send a canned message back

curl -k  -X PUT \
  https://localhost:4646/v1/agent/reload
Error loading /Users/benjamin.lykins/git/nomad/nomad/bin/config1/badfile.hcl: failed to decode HCL file /Users/benjamin.lykins/git/nomad/nomad/bin/config1/badfile.hcl: At 1:6: key 'asdf' expected start of object ('{') or assignment ('=')

Comment thread command/agent/agent_endpoint.go Outdated

newConf.Files = append([]string(nil), currConf.Files...)

if err := s.agent.Reload(newConf); err != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you take a look at the signal handler for reloading, you'll see that this step only reloads the agent configuration and won't push any changes to the server, client, or HTTP server configuration. We should at least have parity with the signal handler here, which calls (*Command) handleReload. We may need to figure out a way to extract that function so that it's reachable here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be done, and I'll start looking ahead at how to get this going. I saw this when I was mucking around with TLS rotations with server/client, and it was only updating the more general agent config.

@benjamin-lykins
Copy link
Copy Markdown
Contributor Author

@tgross

  • Testing : Made the updates with the testing, I realized I did see that in the contributor guides. I just realized I have some lines I need to clean up from when I was moving them from testify to test.
  • -config : Updated to store the config values which can be passed when starting up an agent. Tested on my end without issue, and made additional notes under your comment.

@tgross tgross added theme/api HTTP API and SDK issues type/enhancement labels Dec 11, 2025
Comment thread command/agent/config.go
Comment thread command/agent/agent.go Outdated
Copy link
Copy Markdown
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see where you're going with this, but there's an architectural issue. See my comment for more details.

Comment thread command/agent/command.go
Comment thread command/agent/agent.go
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

theme/api HTTP API and SDK issues type/enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

agent configuration reload API

3 participants