
Orchestrator rolling updates with job definition #564


Open: wants to merge 6 commits into main

jakubno (Member) commented Apr 11, 2025

Description

  • This will allow us to update the orchestrator job definition without causing downtime
  • It also improves observability of how many orchestrators are running the current version

How it works

We generate a new unique ID whenever the job definition, the secret, or the orchestrator binary changes.
We use this ID to create a new job that has the new job definition.
We save the ID to Nomad as a variable, and a prestart check compares the job's ID with the latest ID (the one saved in Nomad); if they don't match, the orchestrator is not started.
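The PR doesn't say how the ID is produced. One option, sketched here under that assumption, is a content hash over the three inputs: identical inputs always yield the same ID, so redeploying an unchanged configuration would not create a new job, while any change to the definition, secret, or binary would. `orchestrator_job_id`, `job_name`, and the `orchestrator-` prefix are hypothetical names, not taken from the PR.

```python
import hashlib


def orchestrator_job_id(job_definition: str, secret: str, binary: bytes) -> str:
    """Derive a deterministic ID that changes whenever any of the inputs changes."""
    h = hashlib.sha256()
    for part in (job_definition.encode(), secret.encode(), binary):
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    # A short prefix of the digest is enough to tell versions apart in job names.
    return h.hexdigest()[:12]


def job_name(job_id: str) -> str:
    """Hypothetical naming scheme: one Nomad job per ID."""
    return f"orchestrator-{job_id}"


# Only the binary differs, so the two versions get different jobs.
job_v1 = job_name(orchestrator_job_id('job "orchestrator" {}', "secret-v1", b"binary-v1"))
job_v2 = job_name(orchestrator_job_id('job "orchestrator" {}', "secret-v1", b"binary-v2"))
```

A random ID would also be unique, but it would create a new job on every deploy even when nothing changed.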

This means there will be multiple orchestrator jobs, but new orchestrators will start only for the latest job.
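The gate that lets several orchestrator jobs coexist can be sketched as the logic a prestart task might run. In Nomad, a non-sidecar `prestart` lifecycle task that fails keeps the main task from starting, so returning a non-zero exit code is enough to block an outdated job. The variable path `orchestrator/latest`, the `id` key, and the function names are assumptions for illustration; the Nomad variable is modeled as a plain dictionary instead of being read from the API.

```python
import sys
from typing import Mapping, Optional

# Hypothetical variable path and key; the PR only says the ID is a Nomad variable.
LATEST_ID_PATH = "orchestrator/latest"


def is_latest(own_job_id: str, variables: Mapping[str, Mapping[str, str]]) -> bool:
    """True only if this job's ID equals the ID saved as the latest one."""
    latest: Optional[str] = variables.get(LATEST_ID_PATH, {}).get("id")
    # Fail closed: if the variable is missing, nothing is allowed to start.
    return latest is not None and own_job_id == latest


def prestart_exit_code(own_job_id: str, variables: Mapping[str, Mapping[str, str]]) -> int:
    """Exit code the prestart task would return; non-zero blocks the orchestrator."""
    if is_latest(own_job_id, variables):
        return 0
    print(f"job ID {own_job_id} is not the latest; not starting orchestrator", file=sys.stderr)
    return 1


# Two jobs exist, but only the one matching the stored ID may start new orchestrators.
variables = {LATEST_ID_PATH: {"id": "new123"}}
codes = {job_id: prestart_exit_code(job_id, variables) for job_id in ("old456", "new123")}
```

Failing closed when the variable is missing is a judgment call: it avoids starting a stale version, at the cost of blocking every start until the variable has been written.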

jakubno added the improvement (Improvement for current functionality) label Apr 11, 2025
jakubno self-assigned this Apr 11, 2025
dobrac (Contributor) left a comment

Nice!

Could you please describe what a migration from the current solution would look like? Is any migration necessary?

ValentaTomas (Member) commented Apr 12, 2025

I think the migration could be the following:

  1. Adjust the priority of the new job so it is evaluated before the old orchestrator job; this will block the old orchestrator job from being deployed on new nodes
  2. Deploy the new job once and roll all orchestrators
  3. Remove the old orchestrator job
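The three migration steps can be written out as a toy simulation. The model assumes, without verifying it against Nomad's scheduler, that a fresh node is claimed by the highest-priority orchestrator job and that a node can host only one orchestrator, so the lower-priority old job is blocked. The priorities 50 and 60 are illustrative (50 is Nomad's default job priority), and `Job`, `claim_node`, and `migrate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Job:
    name: str
    priority: int  # higher value is evaluated first (toy-model assumption)


def claim_node(jobs: List[Job]) -> str:
    """Toy model: a fresh node is claimed by the highest-priority orchestrator job.

    Assumes a node runs at most one orchestrator, so once the first job claims
    the node, lower-priority jobs are blocked from it.
    """
    return max(jobs, key=lambda job: job.priority).name


def migrate(nodes: List[str], old: Job, new: Job) -> Tuple[Dict[str, str], List[str]]:
    # Step 1: register the new job with a higher priority than the old one.
    if new.priority <= old.priority:
        raise ValueError("the new job must have a higher priority than the old job")
    jobs = [old, new]
    # Step 2: roll all orchestrators; every replacement node is claimed by the new job.
    placement = {node: claim_node(jobs) for node in nodes}
    # Step 3: remove the old job once no node runs it any more.
    if old.name not in placement.values():
        jobs.remove(old)
    return placement, [job.name for job in jobs]


placement, remaining = migrate(
    ["node-1", "node-2"], Job("orchestrator", 50), Job("orchestrator-3f2a9c", 60)
)
```

Whether real Nomad evaluation order gives this guarantee is the open part of the proposal; the sketch only shows the intended end state.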

The only question left is how to delete the old jobs that are unused.

ValentaTomas (Member) commented Apr 22, 2025

@jakubno Can't the priority on the new job make it get evaluated before the currently running orchestrator? I'm wondering whether we need the wait at all.

ValentaTomas (Member) commented

The only question left is how to delete the old jobs that are unused.

Also we need to solve this before merging.

jakubno force-pushed the rolling-new-job branch from 0ab9633 to 040fd6f May 7, 2025 16:52
jakubno (Member, Author) commented May 7, 2025

The only question left is how to delete the old jobs that are unused.

Also we need to solve this before merging

There won't be that many of them; deploying a new orchestrator version is now a rather slow process.
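If the old jobs do need cleaning up later, one possible rule is to purge a job only when it is not the latest one and has nothing running, since stopping a job that still has live orchestrators would cause exactly the downtime this PR avoids. The `OrchestratorJob` type and its `running_allocs` field are hypothetical stand-ins for data that would come from Nomad; each selected name could then be passed to `nomad job stop -purge`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OrchestratorJob:
    name: str
    running_allocs: int  # hypothetical count of allocations still running


def stale_jobs(jobs: List[OrchestratorJob], latest_name: str) -> List[str]:
    """Jobs that are safe to purge: not the latest, and nothing running on them."""
    return [job.name for job in jobs if job.name != latest_name and job.running_allocs == 0]


jobs = [
    OrchestratorJob("orchestrator-a1", 0),  # fully rolled off: purge candidate
    OrchestratorJob("orchestrator-b2", 3),  # still serving, keep for now
    OrchestratorJob("orchestrator-c3", 5),  # latest job, never purged
]
to_purge = stale_jobs(jobs, latest_name="orchestrator-c3")
```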

jakubno enabled auto-merge (squash) May 7, 2025 17:13
jakubno force-pushed the rolling-new-job branch from eb393e9 to 87db388 May 15, 2025 12:13
jakubno (Member, Author) commented May 15, 2025

Wait until #647 is deployed.

Labels: improvement (Improvement for current functionality)
Projects: None yet

3 participants