
[question] How to handle frequent restarts of bluechi-controller and bluechi-agent? #427

Open
@dougsland

Description


Describe the bug

Let's imagine we had a crash in bluechi-controller or bluechi-agent. For example: https://github.com/containers/eclipse-bluechi/issues/425

  • Should the bluechi service (the same applies to the agent) stay down because of "bluechi.service: Start request repeated too quickly."?
  • Or should it keep trying until it can recover (e.g., once a new configuration has been sent over the network)? If so, how long should it wait before trying to restart? What is the minimum wait before restarting the node or redeploying? (Agents depend on the manager node to report.)
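One possible policy for the "keep trying" option is capped exponential backoff: retry quickly at first, wait longer after each failure, and never wait more than a fixed cap. The sketch below is a generic illustration; `backoff_delays` and its parameters are hypothetical names, and this is not how systemd or BlueChi compute restart delays. Newer systemd releases (v254 and later) offer a comparable mechanism through RestartSteps= and RestartMaxDelaySec=; check systemd.service(5) for the exact semantics in your version.

```python
from typing import List


def backoff_delays(base_sec: float, factor: float, max_sec: float,
                   attempts: int) -> List[float]:
    """Hypothetical capped exponential backoff.

    The delay before retry k is base_sec * factor**k, but never more
    than max_sec, so a long outage does not push retries arbitrarily far apart.
    """
    delays = []
    delay = base_sec
    for _ in range(attempts):
        delays.append(min(delay, max_sec))  # apply the cap
        delay *= factor                     # grow for the next attempt
    return delays


# Example: start at 1 s, double each time, cap at 30 s.
print(backoff_delays(1.0, 2.0, 30.0, 7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

Once the cap is reached, a component that has recovered (for example, a controller that comes back after a new configuration is deployed) is retried within at most `max_sec`, which bounds the answer to "how long to wait" without giving up entirely.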

systemd has unit settings that might help control this behavior: StartLimitIntervalSec= (formerly StartLimitInterval=) and StartLimitBurst=.
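To see how these two settings interact with automatic restarts, here is a simplified Python model of systemd's start rate limiting, not systemd's actual code: a window opens at the first start, and once more than StartLimitBurst starts fall within StartLimitIntervalSec of that window's opening, further starts are refused. `StartLimiter` and `starts_before_refusal` are hypothetical names, and the defaults (10 s window, burst of 5) are assumed to match systemd's DefaultStartLimitIntervalSec= and DefaultStartLimitBurst=.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartLimiter:
    """Simplified model of a fixed-window start rate limit (assumption)."""
    interval_sec: float                   # models StartLimitIntervalSec=
    burst: int                            # models StartLimitBurst=
    window_start: Optional[float] = None  # when the current window opened
    starts: int = 0                       # starts counted in the current window

    def allow(self, now: float) -> bool:
        # Open a fresh window on the first start or after the old one expired.
        if self.window_start is None or now - self.window_start > self.interval_sec:
            self.window_start = now
            self.starts = 1
            return True
        # Otherwise count this start and refuse once the burst is exceeded.
        self.starts += 1
        return self.starts <= self.burst


def starts_before_refusal(restart_sec: float, run_sec: float,
                          interval_sec: float = 10.0, burst: int = 5,
                          max_starts: int = 1000) -> Optional[int]:
    """Simulate a service that crashes run_sec after each start and is
    restarted restart_sec later. Return how many starts succeed before the
    limiter refuses one, or None if none is refused within max_starts."""
    limiter = StartLimiter(interval_sec, burst)
    now = 0.0
    for n in range(max_starts):
        if not limiter.allow(now):
            return n
        now += run_sec + restart_sec  # time until the next start attempt
    return None
```

With a 100 ms restart delay and a process that dies after 3 ms, `starts_before_refusal(0.1, 0.003)` returns 5: the initial start and four restarts succeed and the fifth restart is refused, which lines up with the "restart counter is at 5" line in the log in this issue. With a 3 s delay, `starts_before_refusal(3.0, 0.003)` returns None, because no window ever holds more than four starts.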

Output of systemctl status bluechi-controller (the project was still named hirte when this was captured):

× hirte.service - Hirte systemd service controller manager daemon
     Loaded: loaded (/usr/local/lib/systemd/system/hirte.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Wed 2023-08-02 05:45:53 UTC; 1s ago
   Duration: 3ms
       Docs: man:hirte(1)
             man:hirte.conf(5)
    Process: 214542 ExecStart=/usr/bin/hirte -c /etc/hirte/hirte.conf (code=exited, status=1/FAILURE)
   Main PID: 214542 (code=exited, status=1/FAILURE)
        CPU: 3ms

Aug 02 05:45:53 control systemd[1]: hirte.service: Scheduled restart job, restart counter is at 5.
Aug 02 05:45:53 control systemd[1]: Stopped Hirte systemd service controller manager daemon.
Aug 02 05:45:53 control systemd[1]: hirte.service: Start request repeated too quickly.
Aug 02 05:45:53 control systemd[1]: hirte.service: Failed with result 'exit-code'.
Aug 02 05:45:53 control systemd[1]: Failed to start Hirte systemd service controller manager daemon.
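The log above can also be checked with a little arithmetic. The sketch below assumes the usual systemd defaults (DefaultStartLimitIntervalSec=10s, DefaultStartLimitBurst=5, RestartSec=100ms), which a distribution or unit file may override, and uses the same simplified fixed-window model as before; `starts_per_window` and `min_restart_period` are hypothetical helper names.

```python
import math

# Assumed systemd defaults; a distribution or unit file may override them.
INTERVAL_SEC = 10.0  # DefaultStartLimitIntervalSec=
BURST = 5            # DefaultStartLimitBurst=
RESTART_SEC = 0.1    # RestartSec= default (100 ms)
RUN_SEC = 0.003      # "Duration: 3ms" in the status output


def starts_per_window(interval_sec: float, run_sec: float, restart_sec: float) -> int:
    """Starts that fit in one rate-limit window for a service that crashes
    run_sec after starting and is restarted restart_sec later: the start that
    opens the window plus every later start no more than interval_sec after it."""
    period = run_sec + restart_sec
    return math.floor(interval_sec / period) + 1


def min_restart_period(interval_sec: float, burst: int) -> float:
    """In this model, floor(I/P) + 1 <= burst holds exactly when P > I/burst,
    so the crash-to-crash period P must be strictly greater than this value."""
    return interval_sec / burst


print(starts_per_window(INTERVAL_SEC, RUN_SEC, RESTART_SEC))  # 98, far above BURST
print(min_restart_period(INTERVAL_SEC, BURST))                # 2.0
```

Under these assumptions, a controller that exits within milliseconds would need about 98 starts per 10 s window to keep restarting, so systemd gives up almost immediately. Keeping the period between crashes above 2 s (or raising StartLimitBurst=, or shortening StartLimitIntervalSec=) would keep it retrying.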

Metadata


Assignees

No one assigned

    Labels

    backlog (This is next up in priority), bug (Something isn't working), good first issue (Good for newcomers)

