@jayanth-tjvrr commented Jun 11, 2025

This resolves KAAP-634

This PR adds an upgrade feature for the pf9-byohost-agent service. We have an install.sh script that runs on the node every time the cluster MD changes, and we were already using it for BYOH cluster upgrades. We have now added logic to install.sh itself that creates a systemd service to upgrade the agent. The process is:

  1. A change in the MD invokes install.sh on the node.
  2. install.sh pulls the latest .deb package and invokes the systemd service for the agent upgrade.
  3. It downloads the required k8s bundle (existing install.sh flow).
  4. It installs the k8s bundle and joins the cluster (existing install.sh flow).
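Step 2 hands the upgrade off to a separate systemd unit. The install script is executed by the agent itself (see the host_reconciler "executing install script" log lines in the testing section), so a plain child process would live in the agent's cgroup and, under systemd's default KillMode=control-group, would be killed when the upgrade stops the agent. A minimal sketch of one way to start the upgrade in its own unit, assuming systemd-run is acceptable; the PR may instead write and start a unit file, and the unit name and function name are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: start the upgrade script as its own transient systemd service.
# systemd-run is an assumption; the PR may write a unit file instead.
# SYSTEMD_RUN is overridable so the call can be dry-run with a stub.
SYSTEMD_RUN="${SYSTEMD_RUN:-systemd-run}"

start_agent_upgrade_unit() {
  local upgrade_script="$1" deb="$2" unit
  # Hypothetical unit name; a timestamp keeps repeated runs distinct.
  unit="pf9-agent-upgrade-$(date +%s)"
  # The upgrade gets its own unit (and cgroup), so stopping the agent
  # service does not kill it; --collect unloads the unit even if it fails.
  "$SYSTEMD_RUN" --unit="$unit" \
    --description="pf9-byohost-agent upgrade" \
    --collect \
    /bin/bash "$upgrade_script" "$deb"
}
```

Without `--wait`, systemd-run returns as soon as the unit has started, so the caller is not tied to the upgrade's lifetime.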

Agent upgrade systemd service logic:

We keep an UPGRADE_MARKER file in ~/.byoh/ (upgrade_complete in the test run below); it is created when the upgrade completes for the first time.

When the upgrade service is called, it runs independently of the BYOH agent service and logs to /var/log/pf9/agent-upgrade-*.log. It backs up ~/.byoh/config, stops and removes the existing agent package, restores the ~/.byoh directory, installs the new package, creates the UPGRADE_MARKER file, and then checks whether the BYOH agent service is up and running.
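The upgrade sequence described above (and visible in the upgrade service logs below) can be sketched as a single function. This is a minimal sketch, not the PR's script: it assumes the whole ~/.byoh directory is backed up (the log restores ".byoh directory"), that `dpkg --purge` is used for removal (the log shows "Purging configuration files"), and that the service is named pf9-byohost-agent as in the systemd symlink log line. The variable and function names are hypothetical, and commands are overridable so the flow can be exercised with stubs.

```bash
#!/usr/bin/env bash
# Sketch of the agent upgrade sequence (not the PR's actual script).
DPKG="${DPKG:-dpkg}"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
AGENT_PKG="${AGENT_PKG:-pf9-byohost-agent}"
AGENT_SERVICE="${AGENT_SERVICE:-pf9-byohost-agent}"
BYOH_DIR="${BYOH_DIR:-$HOME/.byoh}"

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] [UPGRADE] $*"; }

upgrade_agent() {
  local deb="$1" backup
  backup="$(mktemp -d "${TMPDIR:-/tmp}/byoh-config-backup-XXXXXX")" || return 1

  log "Backing up existing configuration..."
  cp -a "$BYOH_DIR/." "$backup/" || return 1
  log "Configuration backed up to $backup"   # left in place for manual recovery

  log "Stopping and removing existing package..."
  # The package's removal script stops the agent and deletes ~/.byoh.
  "$DPKG" --purge "$AGENT_PKG" || return 1

  log "Restoring .byoh directory..."
  mkdir -p "$BYOH_DIR" && cp -a "$backup/." "$BYOH_DIR/" || return 1

  log "Installing new package..."
  "$DPKG" -i "$deb" || return 1

  # Marker that tells later install.sh runs to skip the upgrade.
  touch "$BYOH_DIR/upgrade_complete" || return 1
  log "Created upgrade marker: $BYOH_DIR/upgrade_complete"

  if ! "$SYSTEMCTL" is-active --quiet "$AGENT_SERVICE"; then
    log "Agent service $AGENT_SERVICE is not active"
    return 1
  fi
  log "Agent upgrade completed"
}
```

The unit could then run, for example, `upgrade_agent "$deb" >>"/var/log/pf9/agent-upgrade-$(date +%s).log" 2>&1`; the timestamp suffix is an assumption standing in for the `*` in the described log path.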

Once this upgrade service has run successfully, the BYOH agent service restarts with the new package. On the next iteration the upgrade service is not invoked, because we invoke it only when UPGRADE_MARKER is absent; if the marker is present, the script continues with the installation only.
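The marker gate described above can be sketched as follows. The helper functions are placeholders, and returning right after starting the upgrade is an assumption based on the test logs below, where the agent (and with it the running install script) shuts down right after the upgrade starts; the marker path is the one shown in the upgrade log.

```bash
#!/usr/bin/env bash
# Sketch of the UPGRADE_MARKER gate in install.sh; helper names are hypothetical.
UPGRADE_MARKER="${UPGRADE_MARKER:-$HOME/.byoh/upgrade_complete}"

start_agent_upgrade() { echo "[INFO] starting agent upgrade"; }                 # placeholder
download_and_install_bundle() { echo "[INFO] bundle steps (existing flow)"; }   # placeholder

run_install() {
  if [ ! -f "$UPGRADE_MARKER" ]; then
    # First invocation: hand off to the upgrade service. It restarts the
    # agent, so this run is not expected to reach the bundle steps.
    start_agent_upgrade
    return 0
  fi
  # Later invocations: marker present, continue with the existing flow.
  echo "[INFO] Proceeding with bundle download"
  download_and_install_bundle
}
```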

Testing:

Invoking the upgrade service in the foreground:

I0610 18:40:16.652929  338453 host_reconciler.go:169]  "msg"="executing install script" "ByoHost"={"name":"jayanth-byo-upg-2","namespace":"byo-upg-3-default-service"} "controller"="byohost" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="ByoHost" "name"="jayanth-byo-upg-2" "namespace"="byo-upg-3-default-service" "reconcileID"="61957579-d479-4667-b905-24beee28c25f"
[2025-06-10 18:40:16] Starting installation/upgrade
[2025-06-10 18:40:16] Agent upgrade check...
[2025-06-10 18:40:16] Starting agent upgrade check
[INFO] Using temporary directory: /tmp/tmp.jnyftVF6e9
[INFO] Downloading latest agent package...
[INFO] Installing agent package version: 1.0
[DEBUG] Starting upgrade process in the foreground...
[DEBUG] Upgrade script path: /tmp/tmp.IgzTdx9cTP
[DEBUG] DEB file: /tmp/tmp.jnyftVF6e9/pf9-byohost-agent.deb
I0610 18:40:18.787410  338453 internal.go:581]  "msg"="Stopping and waiting for non leader election runnables" 
I0610 18:40:18.787475  338453 internal.go:585]  "msg"="Stopping and waiting for leader election runnables" 
I0610 18:40:18.787707  338453 controller.go:248]  "msg"="Shutdown signal received, waiting for all workers to finish" "controller"="byohost" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="ByoHost"

Upgrade service logs:

[2025-06-10 18:40:18] [UPGRADE] === Starting agent upgrade process ===
[2025-06-10 18:40:18] [UPGRADE] Starting agent upgrade
Hostname: jayanth-byo-upg-2
Current user: root
Home directory: /root
[2025-06-10 18:40:18] [UPGRADE] Backing up existing configuration...
[2025-06-10 18:40:18] [UPGRADE] Configuration backed up to /tmp/byoh-config-backup-1749580818
[2025-06-10 18:40:18] [UPGRADE] Performing clean reinstall...
[2025-06-10 18:40:18] [UPGRADE] Stopping and removing existing package...
(Reading database ... 64089 files and directories currently installed.)
Removing pf9-byohost-agent (1.0) ...
Starting uninstallation of pf9-byoh-hostagent...
WARNING: pf9-byoh-hostagent could not be stopped before uninstallation
Stopping and disabling pf9-byoh-hostagent service...
Service stopped successfully
Systemd daemon reloaded
Removing binary...
Binary removed successfully
Removing service file...
Service file removed successfully
Removing log files...
Log files removed successfully
Conf files already removed or not found 
Removing Config File
Config file removed successfully
Removing packages directory...
packages dir removed successfully
Removing .byoh directory...
.byoh dir removed successfully
 | not removed dependencies              |
 | socat conntrack ebtables and ethtools |
 | remove it according to your need      |
byohctl already removed or not found
Uninstallation of pf9-byoh-hostagent completed successfully
Post removal script of pf9-BYOHOST-agent package
Purging configuration files for pf9-byohost-agent (1.0) ...
Post removal script of pf9-BYOHOST-agent package
[2025-06-10 18:40:19] [UPGRADE] Restoring .byoh directory...
[2025-06-10 18:40:19] [UPGRADE] Installing new package...
Selecting previously unselected package pf9-byohost-agent.
(Reading database ... 64084 files and directories currently installed.)
Preparing to unpack .../pf9-byohost-agent.deb ...
Unpacking pf9-byohost-agent (1.0) ...
Setting up pf9-byohost-agent (1.0) ...
after pf9-byohost-agent installation
Created symlink /etc/systemd/system/multi-user.target.wants/pf9-byohost-agent.service → /etc/systemd/system/pf9-byohost-agent.service.
[2025-06-10 18:40:20] [UPGRADE] Agent upgrade completed
[2025-06-10 18:40:20] [UPGRADE] Created upgrade marker: /root/.byoh/upgrade_complete
[2025-06-10 18:40:20] [UPGRADE] See  for details

Once the upgrade is done, the installation script continues on the next call:

I0610 19:13:13.432656  340876 host_reconciler.go:169]  "msg"="executing install script" "ByoHost"={"name":"jayanth-byo-upg-2","namespace":"byo-upg-3-default-service"} "controller"="byohost" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="ByoHost" "name"="jayanth-byo-upg-2" "namespace"="byo-upg-3-default-service" "reconcileID"="3f9f494d-d5b6-4368-8739-64f055e7f7d1"
[2025-06-10 19:13:13] Starting installation/upgrade
[INFO] Proceeding with bundle download
Firewall stopped and disabled on system startup
etc/
etc/modules-load.d/
etc/modules-load.d/containerd.conf
etc/sysctl.d/
etc/sysctl.d/99-kubernetes-cri.conf
* Applying /etc/sysctl.d/10-console-messages.conf ...
kernel.printk = 4 4 1 7
* Applying /etc/sysctl.d/10-ipv6-privacy.conf ...
net.ipv6.conf.all.use_tempaddr = 2
net.ipv6.conf.default.use_tempaddr = 2
* Applying /etc/sysctl.d/10-kernel-hardening.conf ...
kernel.kptr_restrict = 1
* Applying /etc/sysctl.d/10-link-restrictions.conf ...
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
* Applying /etc/sysctl.d/10-magic-sysrq.conf ...
kernel.sysrq = 176
* Applying /etc/sysctl.d/10-network-security.conf ...
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.all.rp_filter = 2
* Applying /etc/sysctl.d/10-ptrace.conf ...
kernel.yama.ptrace_scope = 1
* Applying /etc/sysctl.d/10-zeropage.conf ...
vm.mmap_min_addr = 65536
* Applying /usr/lib/sysctl.d/50-default.conf ...

Summary by Bito

This pull request enhances the BYOH agent service upgrade process and containerd installation flow by implementing a systemd service for asynchronous upgrades with improved logging and error handling. It also refactors containerd installation to use tar extraction and dynamic configuration generation, resulting in more streamlined installation and upgrade processes on target nodes.
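The containerd change is described only at a high level here. A minimal sketch of the "tar extraction plus generated configuration" pattern, assuming the tarball unpacks relative to / (the `etc/modules-load.d/containerd.conf` entries in the test log suggest this) and that the configuration comes from `containerd config default`, which prints containerd's built-in defaults; the template's actual commands, paths, and any edits to the generated file may differ. The function and variable names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: unpack a containerd tarball over a root directory, then write
# config.toml from containerd's built-in defaults. Names are hypothetical.
CONTAINERD_BIN="${CONTAINERD_BIN:-containerd}"

install_containerd_from_tar() {
  local tarball="$1" root="${2:-/}"
  # -v prints each extracted path.
  tar -xzvf "$tarball" -C "$root" || return 1
  mkdir -p "$root/etc/containerd" || return 1
  # Generate the config from the installed binary instead of shipping a static file.
  "$CONTAINERD_BIN" config default > "$root/etc/containerd/config.toml"
}
```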

@bito-code-review

Changelist by Bito

This pull request implements the following key changes.

Key Change Files Impacted
New Feature - Agent Upgrade Service Integration

install.sh.tmpl - Introduced a new upgrade mechanism that creates a systemd service to handle the BYOH agent upgrade process with proper lock acquisition, cleanup routines, and logging.

Feature Improvement - Containerd Installation Flow Update

install.sh.tmpl - Updated the containerd installation process by replacing the dpkg install with tar extraction and configuration generation, ensuring a smoother setup.
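The changelist mentions lock acquisition for the upgrade service, but the lock code itself is not shown on this page. One common way to serialize such runs is flock(1) from util-linux; the sketch below assumes that approach, and the lock path, file descriptor number, and function name are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: allow only one agent upgrade at a time using flock(1).
UPGRADE_LOCK_FILE="${UPGRADE_LOCK_FILE:-/var/lock/pf9-agent-upgrade.lock}"

acquire_upgrade_lock() {
  # Open fd 9 on the lock file for the rest of the script; the lock is
  # released automatically when the process exits and fd 9 is closed.
  exec 9>"$UPGRADE_LOCK_FILE" || return 1
  if ! flock -n 9; then
    echo "[INFO] another agent upgrade is already running" >&2
    return 1
  fi
}
```

Using `-n` makes a second run fail fast instead of queueing behind the first, which suits a script that the agent re-invokes on every MD change.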

@bito-code-review bot left a comment

Code Review Agent Run #79fd0b

Actionable Suggestions - 1
  • installer/internal/algo/ubuntu-templates/install.sh.tmpl - 1
    • Inner trap handler overwrites outer handler · Line 89-100
Additional Suggestions - 2
  • installer/internal/algo/ubuntu-templates/install.sh.tmpl - 2
    • Hardcoded image URL should be parameterized · Line 104-104
    • Undefined variable reference in upgrade script · Line 189-189
Review Details
  • Files reviewed - 1 · Commit Range: d96f12e..4bc4875
    • installer/internal/algo/ubuntu-templates/install.sh.tmpl
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
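The two additional suggestions are terse. A sketch of both fixes, with hypothetical variable names (AGENT_DEB_URL, DEFAULT_AGENT_DEB_URL, UPGRADE_LOG_FILE) and a placeholder URL, since the template's real values are not shown here: an override with a default for the download URL, and an explicit check before printing the log path. The empty path in the upgrade log line "See  for details" looks like the kind of symptom an undefined variable would produce.

```bash
#!/usr/bin/env bash
# Sketch for the two additional suggestions; names and URL are placeholders.
DEFAULT_AGENT_DEB_URL="https://example.invalid/pf9-byohost-agent.deb"

agent_deb_url() {
  # An explicitly set AGENT_DEB_URL (e.g. rendered into the template) wins.
  printf '%s\n' "${AGENT_DEB_URL:-$DEFAULT_AGENT_DEB_URL}"
}

upgrade_log_hint() {
  # Report a missing log path instead of printing "See  for details".
  if [ -z "${UPGRADE_LOG_FILE:-}" ]; then
    echo "[UPGRADE] log file path is not set" >&2
    return 1
  fi
  echo "See $UPGRADE_LOG_FILE for details"
}
```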

Comment on lines +89 to +100
# Cleanup function for this scope
cleanup() {
    local exit_code=$?
    echo "[INFO] Starting cleanup of temporary files"
    if [ -n "$AGENT_TEMP_DIR" ] && [ -d "$AGENT_TEMP_DIR" ]; then
        rm -rf "$AGENT_TEMP_DIR"
        echo "[INFO] Removed temporary directory: $AGENT_TEMP_DIR"
        AGENT_TEMP_DIR=""
    fi
    exit $exit_code
}
trap cleanup EXIT

Inner trap handler overwrites outer handler

The inner cleanup function in upgrade_agent_if_needed overwrites the outer trap handler but doesn't restore it, potentially causing resource leaks if the outer handler isn't executed.

Code suggestion
Check the AI-generated fix before applying
Suggested change
# Save the previous trap handler
local previous_trap=$(trap -p EXIT)
# Cleanup function for this scope
cleanup() {
    local exit_code=$?
    echo "[INFO] Starting cleanup of temporary files"
    if [ -n "$AGENT_TEMP_DIR" ] && [ -d "$AGENT_TEMP_DIR" ]; then
        rm -rf "$AGENT_TEMP_DIR"
        echo "[INFO] Removed temporary directory: $AGENT_TEMP_DIR"
        AGENT_TEMP_DIR=""
    fi
    # Restore the previous trap handler if it exists
    if [ -n "$previous_trap" ]; then
        eval "$previous_trap"
    fi
}
trap cleanup EXIT
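The suggested fix has gaps of its own. An EXIT trap fires only when the shell exits, so the outer handler is never put back when upgrade_agent_if_needed returns normally, and evaluating the saved `trap -p` string inside the handler re-registers the outer trap rather than running it. Bash function definitions are also global, so defining cleanup() inside the function replaces any top-level function of the same name. A common alternative, swapped in here rather than taken from the PR, is one EXIT trap for the whole script plus a registry of paths for it to delete; all names below are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: a single EXIT trap plus a registry of paths to delete, so nested
# helpers never overwrite each other's trap. All names are hypothetical.
CLEANUP_PATHS=()

run_cleanup() {
  local exit_code=$?
  local path
  for path in "${CLEANUP_PATHS[@]}"; do
    rm -rf -- "$path"
  done
  exit "$exit_code"
}

# Called once near the top of install.sh, before any temp dirs are created.
install_cleanup_trap() {
  trap run_cleanup EXIT
}

# Replaces the per-function trap in upgrade_agent_if_needed: register the
# temp dir instead of installing a second EXIT trap.
make_agent_temp_dir() {
  AGENT_TEMP_DIR="$(mktemp -d)" || return 1
  CLEANUP_PATHS+=("$AGENT_TEMP_DIR")
}
```

The handler preserves the original exit status, so a failing step still reports failure to the caller after the temporary directories are removed.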

@cruizen cruizen requested review from mithilarun and psarwate June 11, 2025 05:41
@mithilarun mithilarun removed their request for review September 9, 2025 20:33