Skip to content

Conversation

@vaibhavd21
Copy link

@vaibhavd21 vaibhavd21 commented Jul 15, 2025

kaap-834

image

byohost-controller reconciles after every ByohHostReconcilePeriod which is 60 seconds - validates if lastHeartbeatTime is within configurable HeartbeatTimeoutPeriod ( defaults to 120 seconds ) and accordingly updates byohost status conditions, connected field and lastHeartbeatCheckTime.

Updated agent side host_reconciler to reconcile every time with the change in the status field of respective byohost by adding ResourceVersionChangedPredicate predicate. At every reconciliation, host_reconciler updates lastHeartbeatTime.

Testing:

root@testbyoh:~# systemctl status pf9-byohost-agent.service
● pf9-byohost-agent.service - Platform9 Kubernetes Management Agent Service
     Loaded: loaded (/etc/systemd/system/pf9-byohost-agent.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2025-07-15 15:42:53 UTC; 5min ago
   Main PID: 2684 (bash)
      Tasks: 9 (limit: 4647)
     Memory: 12.4M
        CPU: 301ms
     CGroup: /system.slice/pf9-byohost-agent.service
             ├─2684 /bin/bash -c "/binary/pf9-byoh-hostagent-linux-amd64 --bootstrap-kubeconfig \"\$BOOTSTRAP_KUBECONFIG\" --namespace \"\$NAMESPACE\" --label \"\$REGION\" >> /var/log/pf9/byoh/byoh-agent.log 2>&1"
             └─2685 /binary/pf9-byoh-hostagent-linux-amd64 --bootstrap-kubeconfig /etc/pf9-byohost-agent.service.d/bootstrap-kubeconfig.yaml --namespace du-on-devdp4-default-service --label pcd-kaapi.pf9.io/region=regionone

Jul 15 15:42:53 testbyoh systemd[1]: Started Platform9 Kubernetes Management Agent Service.

status:
  conditions:
  - lastTransitionTime: "2025-07-15T15:42:54Z"
    status: "True"
    type: AgentConnected
  - lastTransitionTime: "2025-07-15T15:42:54Z"
    reason: WaitingForMachineRefToBeAssigned
    severity: Info
    status: "False"
    type: K8sNodeBootstrapSucceeded
  connected: true
  hostinfo:
    architecture: amd64
    osimage: Ubuntu 22.04.2 LTS
    osname: linux
  lastHeartbeatCheckTime: "2025-07-15T15:42:54Z"
  lastHeartbeatTime: "2025-07-15T15:42:54Z"
  network:
  - connected: true
    ipAddrs:
    - 127.0.0.1/8
    - ::1/128
    macAddr: ""
    networkInterfaceName: lo
  - connected: true
    ipAddrs:
    - 10.149.101.101/24
    - fe80::f816:3eff:fe64:dad2/64
    isDefault: true
    macAddr: fa:16:3e:64:da:d2
    networkInterfaceName: ens3

If either the agent service is stopped or the machine is down ( which will anyway stop the service )

root@testbyoh:~# systemctl stop pf9-byohost-agent.service
root@testbyoh:~# systemctl status pf9-byohost-agent.service
○ pf9-byohost-agent.service - Platform9 Kubernetes Management Agent Service
     Loaded: loaded (/etc/systemd/system/pf9-byohost-agent.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Tue 2025-07-15 17:12:24 UTC; 2s ago
    Process: 2684 ExecStart=/bin/bash -c /binary/pf9-byoh-hostagent-linux-amd64 --bootstrap-kubeconfig "$BOOTSTRAP_KUBECONFIG" --namespace "$NAMESPACE" --label "$REGION" >> /var/log/pf9/byoh/byoh-agent.log 2>&1 (code=killed, signal=TERM)
   Main PID: 2684 (code=killed, signal=TERM)
        CPU: 2.652s

Jul 15 15:42:53 testbyoh systemd[1]: Started Platform9 Kubernetes Management Agent Service.
Jul 15 17:12:24 testbyoh systemd[1]: Stopping Platform9 Kubernetes Management Agent Service...
Jul 15 17:12:24 testbyoh systemd[1]: pf9-byohost-agent.service: Deactivated successfully.
Jul 15 17:12:24 testbyoh systemd[1]: Stopped Platform9 Kubernetes Management Agent Service.
Jul 15 17:12:24 testbyoh systemd[1]: pf9-byohost-agent.service: Consumed 2.652s CPU time.


status:
  conditions:
  - lastTransitionTime: "2025-07-15T17:15:55Z"
    message: Heartbeat timeout detected
    reason: HeartbeatTimeout
    severity: Warning
    status: "False"
    type: AgentConnected
  - lastTransitionTime: "2025-07-15T15:42:54Z"
    reason: WaitingForMachineRefToBeAssigned
    severity: Info
    status: "False"
    type: K8sNodeBootstrapSucceeded
  connected: false
  hostinfo:
    architecture: amd64
    osimage: Ubuntu 22.04.2 LTS
    osname: linux
  lastHeartbeatCheckTime: "2025-07-15T17:15:55Z"
  lastHeartbeatTime: "2025-07-15T17:11:55Z"
  network:
  - connected: true
    ipAddrs:
    - 127.0.0.1/8
    - ::1/128
    macAddr: ""
    networkInterfaceName: lo
  - connected: true
    ipAddrs:
    - 10.149.101.101/24
    - fe80::f816:3eff:fe64:dad2/64
    isDefault: true
    macAddr: fa:16:3e:64:da:d2
    networkInterfaceName: ens3

Starting service again -

root@testbyoh:~# systemctl start pf9-byohost-agent.service
root@testbyoh:~# systemctl status pf9-byohost-agent.service
● pf9-byohost-agent.service - Platform9 Kubernetes Management Agent Service
     Loaded: loaded (/etc/systemd/system/pf9-byohost-agent.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2025-07-15 17:16:58 UTC; 2s ago
   Main PID: 2767 (bash)
      Tasks: 9 (limit: 4647)
     Memory: 12.2M
        CPU: 120ms
     CGroup: /system.slice/pf9-byohost-agent.service
             ├─2767 /bin/bash -c "/binary/pf9-byoh-hostagent-linux-amd64 --bootstrap-kubeconfig \"\$BOOTSTRAP_KUBECONFIG\" --namespace \"\$NAMESPACE\" --label \"\$REGION\" >> /var/log/pf9/byoh/byoh-agent.log 2>&1"
             └─2768 /binary/pf9-byoh-hostagent-linux-amd64 --bootstrap-kubeconfig /etc/pf9-byohost-agent.service.d/bootstrap-kubeconfig.yaml --namespace du-on-devdp4-default-service --label pcd-kaapi.pf9.io/region=regionone

Jul 15 17:16:58 testbyoh systemd[1]: Started Platform9 Kubernetes Management Agent Service.


status:
  conditions:
  - lastTransitionTime: "2025-07-15T17:16:59Z"
    status: "True"
    type: AgentConnected
  - lastTransitionTime: "2025-07-15T15:42:54Z"
    reason: WaitingForMachineRefToBeAssigned
    severity: Info
    status: "False"
    type: K8sNodeBootstrapSucceeded
  connected: true
  hostinfo:
    architecture: amd64
    osimage: Ubuntu 22.04.2 LTS
    osname: linux
  lastHeartbeatCheckTime: "2025-07-15T17:16:59Z"
  lastHeartbeatTime: "2025-07-15T17:16:59Z"
  network:
  - connected: true
    ipAddrs:
    - 127.0.0.1/8
    - ::1/128
    macAddr: ""
    networkInterfaceName: lo
  - connected: true
    ipAddrs:
    - 10.149.101.101/24
    - fe80::f816:3eff:fe64:dad2/64
    isDefault: true
    macAddr: fa:16:3e:64:da:d2
    networkInterfaceName: ens3

Summary by Bito

This PR implements and enhances a heartbeat mechanism for monitoring byohost connectivity with a configurable timeout period and refined validation logic. The changes include CRD schema updates, configuration improvements through chart updates, improved logging, and condition checks for better host connectivity monitoring. Updates to controller and main application components ensure proper timeout value usage and system reliability.

@bito-code-review
Copy link

bito-code-review bot commented Jul 15, 2025

Changelist by Bito

This pull request implements the following key changes.

Key Change Files Impacted
New Feature - Heartbeat Mechanism Integration

host_reconciler.go - Introduced heartbeat update logic and added ResourceVersionChangedPredicate to trigger reconciliation based on status changes.

byohost_types.go - Added new fields (LastHeartbeatTime, Connected, LastHeartbeatCheckTime) to support heartbeat monitoring in the CRD.

condition_consts.go - Defined new constants for heartbeat-related conditions to standardize agent connectivity statuses.

manager_extraconfig_patch.yaml - Included a heartbeat timeout argument in the deployment patch configuration to control agent behavior.

sample-values.yaml - Provided a default value for the byohost agent heartbeat timeout to ensure consistent configuration.

infrastructure.cluster.x-k8s.io_byohosts.yaml - Extended the CRD schema with heartbeat fields for detailed status reporting and monitoring.

byohost_controller.go - Enhanced the reconciliation logic to validate heartbeat timestamps and update connectivity conditions based on agent reports.

main.go - Integrated a new flag for heartbeat timeout and wired it into the controller setup to control the heartbeat validation period.

Other Improvements - Agent Package Update

constants.go - Updated the byoh agent package URL to a newer version, ensuring stability and compatibility.

Copy link

@bito-code-review bito-code-review bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Agent Run #443e41

Actionable Suggestions - 3
  • main.go - 1
    • Incorrect initialization order for HeartbeatTimeoutPeriod variable · Line 116-116
  • agent/reconciler/host_reconciler.go - 2
    • Missing update to Connected status field · Line 71-73
    • Overly restrictive controller reconciliation predicate · Line 226-226
Additional Suggestions - 1
  • apis/infrastructure/v1beta1/condition_consts.go - 1
    • Inconsistent constant type declaration pattern · Line 92-98
      The new constants `HeartbeatReceivedReason` and `HeartbeatTimeoutReason` are defined as string type, but other similar reason constants in the file are untyped. For consistency, consider removing the explicit string type.
      code suggestion
       @@ -94,6 +94,6 @@
      -	// HeartbeatReceivedReason is used when a heartbeat is received within the timeout.
      -	HeartbeatReceivedReason string = "HeartbeatReceived"
      -
      -	// HeartbeatTimeoutReason is used when a heartbeat is not received within the timeout.
      -	HeartbeatTimeoutReason string = "HeartbeatTimeout"
      +	// HeartbeatReceivedReason is used when a heartbeat is received within the timeout.
      +	HeartbeatReceivedReason = "HeartbeatReceived"
      +
      +	// HeartbeatTimeoutReason is used when a heartbeat is not received within the timeout.
      +	HeartbeatTimeoutReason = "HeartbeatTimeout"
Review Details
  • Files reviewed - 9 · Commit Range: 3ca6981..3ca6981
    • agent/reconciler/host_reconciler.go
    • apis/infrastructure/v1beta1/byohost_types.go
    • apis/infrastructure/v1beta1/condition_consts.go
    • chart-generator/byoh-chart/manager_extraconfig_patch.yaml
    • chart-generator/sample-values.yaml
    • cmd/byohctl/service/constants.go
    • config/crd/bases/infrastructure.cluster.x-k8s.io_byohosts.yaml
    • controllers/infrastructure/byohost_controller.go
    • main.go
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Default Agent You can customize the agent settings here or contact your Bito workspace admin at [email protected].

Documentation & Help

AI Code Review powered by Bito Logo

return ctrl.NewControllerManagedBy(mgr).
For(&infrastructurev1beta1.ByoHost{}).
WithEventFilter(predicates.ResourceNotPaused(ctrl.LoggerFrom(ctx))).
WithEventFilter(predicate.ResourceVersionChangedPredicate{}).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overly restrictive controller reconciliation predicate

Adding ResourceVersionChangedPredicate will cause the controller to only reconcile when the resource version changes, potentially missing important reconciliation events. This predicate should be used with caution as it filters out many reconciliation triggers.

Code suggestion
Check the AI-generated fix before applying
Suggested change
WithEventFilter(predicate.ResourceVersionChangedPredicate{}).

Code Review Run #443e41


Should Bito avoid suggestions like this for future reviews? (Manage Rules)

  • Yes, avoid them

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever byohost_controller makes changes in the status of byohost, there is version change and we want it to be considered as a reconcile even for host_reconciler.

Copy link

@cruizen cruizen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you resolve the feedback from Bito with appropriate comments?


// ByohAgentDebPackageURL is the URL to download the agent package
ByohAgentDebPackageURL = "quay.io/platform9/byoh-agent-deb:0.1.200"
ByohAgentDebPackageURL = "quay.io/platform9/byoh-agent-deb:0.1.276"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we read this tag from a config file / parameter so that we can update it as metadata without changing code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, We should do this through the byohctl.
Which version of agent-service this host should run.

@bito-code-review
Copy link

bito-code-review bot commented Jul 23, 2025

Code Review Agent Run #6ca147

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 3ca6981..d94e7bf
    • controllers/infrastructure/byohost_controller.go
    • main.go
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Default Agent You can customize the agent settings here or contact your Bito workspace admin at [email protected].

Documentation & Help

AI Code Review powered by Bito Logo

@mithilarun mithilarun removed their request for review September 9, 2025 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants