Fix rsyslogd memory growth in syncd swss containers over long term #25874
tirupatihemanth wants to merge 1 commit into sonic-net:master
Signed-off-by: Hemanth Kumar Tirupati <tirupatihemanthkumar@gmail.com>
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
This PR addresses rsyslogd memory growth in the syncd and swss containers by reducing PID churn that was causing rsyslog's imuxsock ratelimiter to accumulate entries for short-lived senders. Two strategies are applied: suppressing unnecessary output from phc_ctl in phcsync.sh, and anchoring syslog messages to a stable PID ($$) in syncd_common.sh and swss.sh.
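The PID churn described above can be illustrated with a minimal sketch. The helper names `child_pid` and `stable_pid` are hypothetical and do not come from the PR; the sketch only assumes that every separately spawned command (such as a plain `logger` call) runs as its own process with its own PID, while `$$` stays fixed for the lifetime of the script.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of the PR.

# Print the PID of a short-lived child process. Every call spawns a new
# process, much like each plain `logger` invocation would look like a new
# sender to rsyslog.
child_pid() {
    sh -c 'echo $$'
}

# Print the PID of the running script. This value does not change between
# calls, which is what makes it usable as a stable sender identity.
stable_pid() {
    echo "$$"
}
```

Calling `child_pid` repeatedly yields PIDs that differ from the script's own, whereas `stable_pid` returns the same value every time.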
Changes:
- `phcsync.sh` now runs `phc_ctl` with `-q -Q` flags and redirects stdout to `/dev/null` to suppress normal output, with explicit error logging on non-zero exit.
- `syncd_common.sh` and `swss.sh` `debug()` functions use `logger --id=$$` to emit all messages under the parent shell's PID, preventing a new ratelimiter entry per `logger` invocation.
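A `debug()` helper of the kind described above might look roughly like the following. This is a sketch under stated assumptions, not the code from the PR: `LOGGER_CMD` and `SERVICE` are hypothetical names introduced here so the example is self-contained, and the `-t` tag is an illustrative choice. It relies on the `--id[=id]` option of the util-linux `logger`.

```shell
#!/bin/sh
# Hypothetical names for illustration only (LOGGER_CMD, SERVICE are not from the PR).
LOGGER_CMD="${LOGGER_CMD:-logger}"
SERVICE="${SERVICE:-swss}"

debug() {
    # --id=$$ asks logger to report the script's PID instead of the PID of
    # the short-lived logger process itself, so all messages from this
    # script appear to come from a single sender.
    # `--` ends option parsing so a message starting with "-" is safe.
    "$LOGGER_CMD" --id=$$ -t "$SERVICE" -- "$1"
}
```

A call such as `debug "Starting ${SERVICE} service..."` would then be logged under the script's PID. Note that, per the util-linux documentation, the logging infrastructure may still override the PID from socket credentials unless `logger` runs with root permissions.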
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| platform/mellanox/docker-syncd-mlnx/phcsync.sh | Adds -q -Q flags to silence normal phc_ctl output; redirects only stdout to /dev/null, removing the previous 2>/dev/null stderr suppression |
| files/scripts/syncd_common.sh | Adds --id=$$ to logger in the debug() function to anchor all log messages to the parent shell's PID |
| files/scripts/swss.sh | Same --id=$$ fix as syncd_common.sh for the debug() function in the swss service script |

      # set CLOCK_REALTIME
    - "$PHC_CTL" "$dev" set 2>/dev/null
    + # Keep successful syncs silent to avoid rsyslogd ratelimit memory issue due to PID churn.
    + "$PHC_CTL" -q -Q "$dev" set >/dev/null

The removal of 2>/dev/null from the phc_ctl invocation means that when phc_ctl fails, its own error output on stderr will be forwarded to syslog via supervisord (which has stderr_logfile=syslog in supervisord.conf.j2). Combined with the custom error message echoed to stderr at line 67, this will produce duplicate error entries in syslog on failure: one from phc_ctl's own stderr output and one from the explicit echo "Error: Failed to sync clock for $dev..." at line 67.
To avoid this double-logging on failure, the 2>/dev/null should be added back (or replaced with 2>&1) so that phc_ctl's own stderr is still suppressed while the explicit error message at line 67 provides a cleaner, controlled error notification. Alternatively, phc_ctl's stderr could be captured and included in the custom error message.

    - "$PHC_CTL" -q -Q "$dev" set >/dev/null
    + "$PHC_CTL" -q -Q "$dev" set >/dev/null 2>&1

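The alternative mentioned in the review, capturing `phc_ctl`'s stderr and folding it into a single custom error message, could be sketched as below. This is an illustration under assumptions rather than the script's actual code: the function name `sync_clock`, the default `PHC_CTL` path, and the exact error wording are hypothetical.

```shell
#!/bin/sh
# Hypothetical default path; the real script defines PHC_CTL itself.
PHC_CTL="${PHC_CTL:-/usr/sbin/phc_ctl}"

# sync_clock is an illustrative name, not a function from phcsync.sh.
sync_clock() {
    dev="$1"
    # Redirection order matters: 2>&1 first points stderr at the capture
    # pipe, then >/dev/null discards stdout. Only stderr ends up in $err.
    if ! err=$("$PHC_CTL" -q -Q "$dev" set 2>&1 >/dev/null); then
        # One controlled error line instead of two separate syslog entries.
        echo "Error: Failed to sync clock for $dev: $err" >&2
        return 1
    fi
}
```

With this shape, a successful sync stays silent and a failure emits exactly one line on stderr, which avoids the duplicate entries the review describes.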
Why I did it
Work item tracking
How I did it
- `phc_ctl -q -Q ... >/dev/null 2>&1`
- `logger -i "$$" -- "$1"` in syncd_common.sh and swss.sh. This reduces per-call sender churn during script execution phases (start/wait/stop).
- logger commands
Before
After
How to verify it
Which release branch to backport (provide reason below if selected)