Fix issue: pmon service's restart count is not cleared during config reload #4314

Open
stephenxs wants to merge 1 commit into sonic-net:master from stephenxs:fix-pmon-restart-count-not-clear

Conversation

@stephenxs
Collaborator

What I did

Currently, when "config reload" is executed, the restart counts of services are cleared so that the services do not reach the restart limit. The services are found by listing them with the command systemctl list-dependencies --plain .target. However, this list does not include the pmon service, or any other service that does not have WantedBy=sonic.target, which means pmon's start count is not cleared.
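The list-and-clear step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `parse_plain_dependencies` and `reset_failed` are hypothetical, and the parser assumes the first line of the `--plain` output is the target itself, followed by one indented unit name per line. `systemctl reset-failed` is the systemctl verb that also resets a unit's start rate limit counter.

```python
import subprocess


def parse_plain_dependencies(output):
    """Return unit names from `systemctl list-dependencies --plain` output.

    Assumption: the first non-empty line names the target itself and every
    following line holds one (indented) dependent unit.
    """
    names = [line.strip() for line in output.splitlines() if line.strip()]
    return names[1:]


def reset_failed(units, run=subprocess.run):
    """Clear the failed state and start counters of each unit.

    `run` is injectable so the function can be exercised without systemd.
    """
    for unit in units:
        run(["systemctl", "reset-failed", unit], check=False)
```

On a real system the output would come from something like `subprocess.run(["systemctl", "list-dependencies", "--plain", target], capture_output=True, text=True).stdout`, with `target` supplied by the caller. Any unit that is missing from that output, pmon in this case, is never passed to `reset_failed`.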

Sometimes pmon fails to restart because it has reached the start limit (3 times in 1200 seconds). During config reload, the pmon service can be started by featured or by syncd. Before multi-ASIC, pmon depended on syncd. That dependency was removed after multi-ASIC, so pmon can be restarted immediately, triggered by sonic.target, which is itself restarting again. As a result, the pmon service is more likely to reach the restart limit.

How I did it

Also clear the restart count for services that have a reverse dependency on sonic.target.

How to verify it

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

… reload

What I did
Currently, when "config reload" is executed, the restart counts of services are cleared so that the services do not reach the restart limit. The services are found by listing them with the command systemctl list-dependencies --plain .target.
However, this list does not include the pmon service, or any other service that does not have WantedBy=sonic.target, which means pmon's start count is not cleared.

Since the multi-ASIC PRs were merged, there is a high probability that pmon fails to restart because it reaches the start limit (3 times in 1200 seconds).
During config reload, the pmon service can be started by featured or by syncd.
Before multi-ASIC, pmon depended on syncd. That dependency was removed after multi-ASIC, so pmon can be restarted immediately, triggered by sonic.target, which is itself restarting again. As a result, the pmon service is more likely to reach the restart limit.
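To make the start limit concrete, here is a simplified model of a "3 times in 1200 seconds" limit. It is only an illustration under stated assumptions: the class `StartLimit` is hypothetical, it is not systemd's implementation, and it simply opens a counting window at the first start and refuses further starts once the burst is used up within that window. The values 3 and 1200 are the ones quoted above.

```python
class StartLimit:
    """Toy model of a start rate limit: `burst` starts per `interval` seconds."""

    def __init__(self, burst=3, interval=1200.0):
        self.burst = burst
        self.interval = interval
        self.window_start = None  # time of the first start in the current window
        self.count = 0            # starts counted in the current window

    def try_start(self, now):
        """Return True if a start at time `now` (seconds) is allowed."""
        # Open a fresh window if none exists or the previous one has expired.
        if self.window_start is None or now - self.window_start > self.interval:
            self.window_start = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        return False

    def reset(self):
        """Model of clearing the counter, as a reset during config reload would."""
        self.window_start = None
        self.count = 0
```

In this model, a service that is started several times in quick succession during one config reload uses up its budget unless `reset()` is called first, which is the situation described for pmon.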

How I did it
Also clear the restart count for services that have a reverse dependency on sonic.target.
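One way such a combined list could be built is sketched below. It is an assumption, not the change made in this PR: it uses systemctl's `--reverse` option of `list-dependencies` to obtain the reverse dependencies, the function names `list_units` and `services_to_reset` are hypothetical, and keeping only `.service` units is an illustrative choice.

```python
import subprocess


def list_units(target, reverse=False, run=None):
    """List units reported by `systemctl list-dependencies --plain [--reverse]`.

    `run` may be a callable taking the command list and returning its text
    output; it is injectable so the logic can be tried without systemd.
    """
    cmd = ["systemctl", "list-dependencies", "--plain"]
    if reverse:
        cmd.append("--reverse")
    cmd.append(target)
    if run is None:
        output = subprocess.run(
            cmd, capture_output=True, text=True, check=False
        ).stdout
    else:
        output = run(cmd)
    names = [line.strip() for line in output.splitlines() if line.strip()]
    # Assumption: the first line is the target itself.
    return names[1:]


def services_to_reset(target, run=None):
    """Merge forward and reverse dependencies, keeping order, without duplicates."""
    merged = {}
    for reverse in (False, True):
        for unit in list_units(target, reverse=reverse, run=run):
            if unit.endswith(".service"):
                merged.setdefault(unit, None)
    return list(merged)
```

The merged list can then be handed to whatever routine clears the counters, so that a service reachable only through a reverse dependency on the target is included as well.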

How to verify it
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs marked this pull request as draft February 27, 2026 06:44
@stephenxs marked this pull request as ready for review March 2, 2026 07:16
@stephenxs
Collaborator Author

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
