Description
We recently observed 'dbmemory' errors in sonic-mgmt tests. These errors are triggered when a Redis client queue consumes more than 10MB of memory. The failures were traced to route_check.py, which monit executes every 5 minutes.
Root cause
When route_check.py runs after a config reload, the check_frr_pending_routes() routine can take 30 seconds or more to execute. This delay occurs either because the 'show ip route json' command is slow when there are many routes, or because it times out and is retried multiple times (FRR_CHECK_RETRIES=3, FRR_WAIT_TIME=15 seconds).
During this delay, the ASIC_DB subscriber remains open and route updates continue to accumulate in the subscriber queue, causing the Redis client's memory usage to grow beyond the 10MB threshold.
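As a rough illustration of how long the subscriber can stay open, the sketch below estimates the time spent waiting across retries. It assumes each attempt is followed by one wait of FRR_WAIT_TIME seconds. That is an assumption about the retry loop, not a description of the actual code in route_check.py, and the helper name is hypothetical.

```python
# Constants as quoted in the issue text.
FRR_CHECK_RETRIES = 3
FRR_WAIT_TIME = 15  # seconds


def worst_case_wait_seconds(retries=FRR_CHECK_RETRIES, wait=FRR_WAIT_TIME):
    """Hypothetical helper: upper bound on waiting time, assuming every
    attempt is followed by one wait period. Time spent running the FRR
    command itself is not included."""
    return retries * wait


if __name__ == "__main__":
    print(worst_case_wait_seconds())
```

Under that assumption, the waits alone can add up to 45 seconds, which is consistent with the 30+ seconds reported above.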
Solution
Close the subscriber immediately after reading ASIC_DB updates and before calling check_frr_pending_routes(). This prevents queue accumulation during the slow FRR check.
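The ordering described here can be sketched as follows. This is a minimal illustration, not the actual patch: FakeSubscriber, slow_frr_check, and run_route_check are hypothetical stand-ins, and the real ASIC_DB subscriber API in route_check.py may look different.

```python
from contextlib import closing


class FakeSubscriber:
    """Hypothetical stand-in for the ASIC_DB subscriber (not the real API)."""

    def __init__(self, pending):
        self.pending = list(pending)
        self.closed = False

    def read_updates(self):
        # Drain whatever has been queued so far.
        updates, self.pending = self.pending, []
        return updates

    def close(self):
        self.closed = True


def slow_frr_check(subscriber, events):
    """Stand-in for check_frr_pending_routes(); records whether the
    subscriber had already been closed when the slow check started."""
    events.append(("frr_check", subscriber.closed))
    return []


def run_route_check(subscriber):
    events = []
    # closing() calls subscriber.close() on leaving the block, so the
    # subscription is released before the slow FRR check begins.
    with closing(subscriber):
        asic_updates = subscriber.read_updates()
        events.append(("read", subscriber.closed))
    missed = slow_frr_check(subscriber, events)
    return asic_updates, missed, events
```

Using a context manager (or an equivalent try/finally) also closes the subscriber if reading the updates raises an exception, so the queue cannot keep growing on an error path either.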