Conversation

@michalpristas
Contributor

This pull request introduces support for a new "Fleet override" capability, allowing persisted configuration from a file to override configuration received from Fleet, but only if explicitly enabled by capabilities. The changes include the implementation of the new capability type, its integration into the configuration processing flow, and comprehensive unit tests to ensure correct behavior.

Opted for capabilities rather than a feature flag or an agent config option, as it makes clear this is set on purpose. It does not come from Fleet and so cannot be overridden from Fleet itself.

Fleet override capability implementation and integration:

  • Added a new capability type, fleet_override, to the capabilities system, including parsing from YAML, internal representation, and logic to determine if Fleet override is allowed (AllowFleetOverride).
  • Updated the capabilitiesManager to support the new fleetOverrideCaps and wire it into the capability loading process.
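As a rough sketch of the shape described above (the type name, field name, and method name here are assumptions based on this description, not the exact elastic-agent code):

```go
package main

import "fmt"

// fleetOverrideCapability is a minimal stand-in for the new fleet_override
// capability: it is parsed from the capabilities file and reports whether
// persisted config may override config received from Fleet.
type fleetOverrideCapability struct {
	Rule string `yaml:"rule"` // "allow" enables the override
}

// allowFleetOverride mirrors the AllowFleetOverride check described above.
func (c fleetOverrideCapability) allowFleetOverride() bool {
	return c.Rule == "allow"
}

func main() {
	fmt.Println(fleetOverrideCapability{Rule: "allow"}.allowFleetOverride()) // true
	fmt.Println(fleetOverrideCapability{Rule: "deny"}.allowFleetOverride())  // false
}
```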

Configuration override logic:

  • Implemented the applyPersistedConfig function, which applies persisted configuration from a file on disk if the Fleet override capability is enabled. This function is now called during configuration processing in the coordinator.
  • Added the necessary import for os to support file operations in the coordinator.
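The flow in applyPersistedConfig can be sketched roughly as follows. This is a simplified illustration: plain nested maps stand in for the coordinator's real config types, and only the function name is taken from the PR.

```go
package main

import "fmt"

// applyPersistedConfig merges persisted on-disk configuration over the
// configuration received from Fleet, but only when the Fleet override
// capability allows it.
func applyPersistedConfig(fleetCfg, persisted map[string]any, overrideAllowed bool) map[string]any {
	if !overrideAllowed || persisted == nil {
		return fleetCfg // capability disabled or nothing persisted: Fleet config wins
	}
	merged := make(map[string]any, len(fleetCfg))
	for k, v := range fleetCfg {
		merged[k] = v
	}
	for k, v := range persisted {
		// Recurse into nested sections so a persisted key overrides its Fleet
		// counterpart without discarding sibling Fleet keys.
		if pm, ok := v.(map[string]any); ok {
			if fm, ok := merged[k].(map[string]any); ok {
				merged[k] = applyPersistedConfig(fm, pm, true)
				continue
			}
		}
		merged[k] = v
	}
	return merged
}

func main() {
	fleet := map[string]any{"agent": map[string]any{"monitoring": map[string]any{"enabled": false}}}
	disk := map[string]any{"agent": map[string]any{"monitoring": map[string]any{"enabled": true}}}
	merged := applyPersistedConfig(fleet, disk, true)
	fmt.Println(merged["agent"].(map[string]any)["monitoring"].(map[string]any)["enabled"]) // true
}
```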

Testing:

  • Added unit tests for applyPersistedConfig to verify correct merging of persisted configuration based on the Fleet override capability, including both enabled and disabled scenarios.

Manual testing:

  • For manual testing, build the agent and enroll it into Fleet with a policy that has monitoring disabled.

  • Verify the HTTP server is not running: `curl http://localhost:6774/liveness -I`

  • Update elastic-agent.yml with the following content:

```yaml
agent:
  monitoring:
    enabled: true
    use_output: default
    logs: false
    metrics: true
    http:
      port: 6774
```

  • Create a capabilities file like so:

```yaml
capabilities:
- fleet_override:
  rule: allow
```

  • Restart the agent and perform the curl again: `curl http://localhost:6774/liveness -I`

@michalpristas michalpristas self-assigned this Dec 2, 2025
@michalpristas michalpristas requested a review from a team as a code owner December 2, 2025 13:49
@michalpristas michalpristas added the enhancement (New feature or request) label Dec 2, 2025
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Contributor

@swiatekm swiatekm left a comment

The logic looks fine to me but I don't like where it's located in the codebase. Ideally, this change should try to make reasoning about config loading easier, not harder.

swiatekm previously approved these changes Dec 3, 2025
Contributor

@swiatekm swiatekm left a comment

LGTM

```go
return func(change coordinator.ConfigChange) coordinator.ConfigChange {
	newConfig := change.Config()
	if err := newConfig.Merge(rawConfig); err != nil {
		log.Errorf("error merging fleet config into config change: %v", err)
```
Member

Are you sure you just want to log on failure? In the context of agentless, this would leave the pod running without valid configuration, wouldn't it?

How would you alert on that or gate promotion on this?

Contributor Author

I see this as non-fatal for the use cases we have today. We want to allow monitoring of the agent, liveness checks... What's more important: getting data in, or being observable? My take is the former, but if you think this should be fatal I can fail hard.

Member

Looking forward a bit, I think we can likely use this to inject a rate limit processor required by all agentless integrations, using the OTel processors section that can be used to add global processors, together with some other not-quite-done work.

I think if you only look at the liveness issue it isn't fatal, but I can see this being used for things in the future that should be fatal.

In the context of agentless I would rather have incomplete configs be fatal errors, so we always know there is a problem, than silently continue to run with a configuration we don't want (e.g. a missing rate limit).

@cmacknz
Member

cmacknz commented Dec 3, 2025

Capabilities is a nice way to control this, good idea 👍

The capabilities.yml file has to be read from the config path; do we do permission checks on ownership in this directory? This new capability is quite powerful, so I think we want to do some extra work to make sure the file was not created by an arbitrary user. For example, if the agent is root, the config path and the capabilities.yml file need to be owned by root too.

I don't see any on the capabilities.yml file itself:

```go
func LoadFile(capsFile string, log *logger.Logger) (Capabilities, error) {
	// load capabilities from file
	fd, err := os.Open(capsFile)
	if errors.Is(err, fs.ErrNotExist) {
		// No file, return an empty capabilities manager
		log.Infof("Capabilities file not found in %s", capsFile)
		return &capabilitiesManager{}, nil
	}
	if err != nil {
		return nil, err
	}
	// We successfully opened the file, pass it through to Load
	defer fd.Close()
	return Load(fd, log)
}
```

@michalpristas
Contributor Author

michalpristas commented Dec 4, 2025

@swiatekm I'm reverting to the previous implementation.
With this one it becomes a mess: we have one enriched config (with overrides) that comes from Fleet, and then we have the stored one that is applied on start without enrichments. So the server first starts with the persisted config and, as Fleet does not send another update, we stick with it until the next update, when suddenly we have updated values...

The whole patchers mechanism is a bit cumbersome and needs more thorough refactoring that I don't want to do in this PR.
To make this work I would need patchers applied at loadConfig (the very first thing the agent does), then another set of patchers that works with config.Config on cfgManager. Different patchers may require different things (such as my required capabilities, rootChecks, and fleetManaged checks), and these things are done again later on. All in all, it was not very nice.

@michalpristas
Contributor Author

@cmacknz I added a check for the capabilities file permissions.

@cmacknz
Member

cmacknz commented Dec 4, 2025

Maybe I have code blindness at the end of the day, but I don't see the permissions checks. Maybe a forgotten push?

@michalpristas
Contributor Author

michalpristas commented Dec 5, 2025

No code blindness, I lost the work.
Just a note: due to the Fleet-initiated privilege level change, we fix permissions to the proper user at startup, so I need to do the check before this step. This means we could potentially have a very tiny window where the capabilities file ownership is wrong.
Are we worried about the elastic-agent config permissions not being checked at all?

@elasticmachine
Contributor

💛 Build succeeded, but was flaky


cc @michalpristas

Member

@cmacknz cmacknz left a comment

Main concern is limiting the security implications, because this new capability is pretty powerful for a Fleet-managed agent.

If we can restrict this to only working for containers, that would help a lot IMO: it would be scoped to only where we need it, and we wouldn't have to worry about arbitrary users on a machine disabling the policy output and similar things via the new override capability.

Comment on lines +1613 to +1614

```go
// override retrieved config from Fleet with persisted config from AgentConfig file
```

Member

Suggested change:

```diff
-// override retrieved config from Fleet with persisted config from AgentConfig file
+// override retrieved config from Fleet with persisted config from AgentConfig file
```
```diff
 // owner of the file and that the owner of the file is the same as the UID or root.
-func HasStrictExecPerms(path string, uid int) error {
+func HasStrictExecPerms(path string, uid int, checkUID bool) error {
 	info, err := os.Stat(path)
```
Member

This godoc reads like it was supposed to be used with checkUID set to true all the time, but instead it ignored the UID argument completely, which is... interesting.

I almost think we'd be better off with two functions so it's clearer what happens.

HasStrictExecPerms with just the path argument, and a new HasStrictUIDExecPerms or similar with the uid argument that always does the UID check and calls into HasStrictExecPerms.

Member

Also, the new HasStrictUIDExecPerms should have tests (or, if you disagree entirely with this suggestion, then there should be tests for the checkUID argument).

```diff
 // HasStrictExecPerms ensures that the path is executable by the owner and that the owner of the file
 // is the same as the UID or root.
-func HasStrictExecPerms(path string, uid int) error {
+func HasStrictExecPerms(path string, uid int, _ bool) error {
```
Member

We should fix this: an arbitrary user being able to control capabilities.yml now has the ability to override agent configurations from Fleet (preventing upgrades would also not be very nice).

Is there a way to scope the capability to only work in containers instead? Then you would have a way to avoid impact on endpoint use cases, for example, and avoid Windows completely.

Member

My main concern right now are the security implications of this.

Contributor Author

Capabilities will check the user on Linux/Darwin. On Windows we don't check; we could check permissions like osquery does.
I'm fine with limiting this to containers with the UID check kept in place.


Labels

backport-skip, enhancement (New feature or request), skip-changelog, Team:Elastic-Agent-Control-Plane (label for the Agent Control Plane team)
