Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always launch more recent watcher #4491

Merged
merged 7 commits into from
Apr 17, 2024

Conversation

pchila
Copy link
Member

@pchila pchila commented Mar 27, 2024

What does this PR do?

This PR changes the way elastic-agent launches the upgrade watcher in the final phases of the upgrade process.
Before this change, the elastic-agent binary from the incoming version (the one specified in the upgrade command) was always used but this meant that during a downgrade an older version of elastic agent would be launched, potentially not supporting new fields in the upgrade marker generated during the upgrade.

Now the agent will check the current and incoming version and always select the newer one to launch the watcher process: during a downgrade this means that the current agent binary will function as a watcher to check that the incoming version functions correctly after the restart.

Why is it important?

We want to avoid issues during watch/cleanup/rollback phases of upgrades even if the incoming version is older than the current install.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@pchila pchila added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent Label for the Agent team labels Mar 27, 2024
@pchila pchila self-assigned this Mar 27, 2024
Copy link
Contributor

mergify bot commented Mar 27, 2024

This pull request does not have a backport label. Could you fix it @pchila? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@pchila pchila force-pushed the always-launch-more-recent-watcher branch 8 times, most recently from 7de5860 to b7bfbb2 Compare April 10, 2024 08:24
@pchila pchila marked this pull request as ready for review April 10, 2024 08:26
@pchila pchila requested a review from a team as a code owner April 10, 2024 08:26
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

internal/pkg/agent/application/upgrade/upgrade.go Outdated Show resolved Hide resolved
switch major {
case 8:
return NewParsedSemVer(7, 17, 10, psv.Prerelease(), psv.BuildMetadata()), nil
for i := 0; i < minPrereleaseTokens; i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What problem is this solving? Do we need this code to handle something in our current or planned release versioning scheme or is it just in case?

Copy link
Member Author

@pchila pchila Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the correct implementation for comparing 2 semver version strings. Prerelease strings should be compared token by token as strings or numbers depending on their format.

Right now we only use "SNAPSHOT" prerelease string and the previous impl would just compare 2 versions based on the presence of prerelease string without comparing those, leading to versions like
1.2.3-SNAPSHOT , 1.2.3-alpha, 1.2.3-SNAPHOT.extraprereleasetoken not to have an definite order.

This new implementation is able to parse the tokens and correctly determine precedence based on the prerelease token: an example when this is gonna come into play is of we decide to use repackaged agents in upgrade tests as 1.2.3-SNAPSHOT.repackaged now has a defined order when compared with 1.2.3-SNAPSHOT

This is a part of a general overhaul of agent version handling correct versioning both at packaging and runtime which I then extracted to put in a separate PR as it would make this change too big.

Edit: this is the other PR's commit containing the rest of changes 863a638

@pchila pchila mentioned this pull request Apr 15, 2024
7 tasks
internal/pkg/agent/application/upgrade/upgrade.go Outdated Show resolved Hide resolved
func selectWatcherExecutable(topDir string, previous agentInstall, current agentInstall) string {
// check if the upgraded version is less than the previous (currently installed) version
if current.parsedVersion.Less(*previous.parsedVersion) {
// use the current agent executable for watch, if downgrading the old agent doesn't understand the current agent's path structure.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment says use the current as currently installed but it's not current as a variable name. it gets confusing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed but this is the existing names that follow where the symlink points.
So after rotating the symlink the currently running agent is named previous and the new one is named current.
I don't know if this is more confusing or swapping the variable names so when we look at it in different places in the code it gets confusing...

Copy link
Contributor

@michalpristas michalpristas Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we maybe rephrase
use the current agent executable for watch, if downgrading the old agent doesn't understand the current agent's path structure.
to
if downgrading the target agent may not understand the current agent's path structure.

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really don't like the hard-coding of 8.13.0.

@pchila
Copy link
Member Author

pchila commented Apr 16, 2024

I don't really like that this is hard-coded. Will this need to be updated when we create a new release? Why not use version.Agent?

@blakerouse
we need to be sure we have a parseable version for the agent that is orchestrating the upgrade. We store that as a string in the package.version file and today it can be any string. I agree it's a pretty slim chance that we have a wrongly formatted version string but we have to handle the possibility. The alternative here is to perform a rollback of the already installed agent or figure out some kind of fallback behaviour (which where the 8.13.0 comes in because it's the version where the new update marker has been introduced).

If you want I can change that fallback to always run the newly installed agent watcher (dangerous for downgrade) or stop the upgrade process at this point and roll back everything...

@pchila pchila force-pushed the always-launch-more-recent-watcher branch from ece5816 to c3821e7 Compare April 16, 2024 16:50
@pchila
Copy link
Member Author

pchila commented Apr 16, 2024

Removed getParsedVersionWithFallback function in favor of reading the current parsed version at startup from the package.version file

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes.

Like this implement of the pressed version string much better.

@pchila
Copy link
Member Author

pchila commented Apr 17, 2024

buildkite test this

Copy link

Quality Gate passed Quality Gate passed

The SonarQube Quality Gate passed, but some issues were introduced.

1 New issue
0 Security Hotspots
84.0% 84.0% Coverage on New Code
0.0% 0.0% Duplication on New Code

See analysis details on SonarQube

@pchila pchila enabled auto-merge (squash) April 17, 2024 19:09
@pchila pchila merged commit 761f4b4 into elastic:main Apr 17, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip enhancement New feature or request Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Agent should always launch the more recent watcher version when downgrading
5 participants