route_check.py redis client memory usage causing dbmemory errors #4216

@roeebar-arista

Description

Recently we observed 'dbmemory' errors in sonic-mgmt tests. These errors are raised when a redis client's queue consumes more than 10MB of memory. The failures were traced to route_check.py, which monit runs every 5 minutes.

Root cause

When route_check.py runs after a config reload, the check_frr_pending_routes() routine can take a long time to complete (30+ seconds). The delay occurs either because the show ip route json command is slow when there are many routes, or because the command times out and is retried several times (FRR_CHECK_RETRIES=3, FRR_WAIT_TIME=15 seconds).
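To see how the retry settings alone can produce a delay of this size, here is a minimal sketch of a generic retry loop. It assumes, for illustration only, that FRR_WAIT_TIME is slept between failed attempts and not after the last one; the real route_check.py may structure its retries differently, and check_with_retries is a hypothetical helper, not a function from route_check.py.

```python
import time

FRR_CHECK_RETRIES = 3  # value quoted in the issue
FRR_WAIT_TIME = 15     # seconds, value quoted in the issue


def check_with_retries(check, retries=FRR_CHECK_RETRIES,
                       wait_time=FRR_WAIT_TIME, sleep=time.sleep):
    """Hypothetical retry loop: call check() until it returns True or the
    attempts run out, sleeping wait_time seconds between failed attempts.

    Returns (succeeded, total_seconds_slept)."""
    slept = 0
    for attempt in range(retries):
        if check():
            return True, slept
        if attempt < retries - 1:  # no sleep after the final attempt
            sleep(wait_time)
            slept += wait_time
    return False, slept


# Simulate a check that always times out, without actually sleeping.
ok, slept = check_with_retries(lambda: False, sleep=lambda s: None)
print(ok, slept)
```

Under this assumption, three failing attempts sleep twice, i.e. 2 × 15 = 30 seconds, before counting the time the command itself takes. That is consistent with the 30+ seconds reported above.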

During this delay, the ASIC_DB subscriber remains open, so route updates keep accumulating in its queue and the redis client's memory grows beyond the 10MB threshold.
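The backlog of an undrained subscriber grows roughly linearly with how long it stays open, which is why a 30+ second stall matters. The sketch below does that arithmetic. The update rate and message size used in the example are invented for illustration (the issue does not report them), and treating "10MB" as 10 * 1024 * 1024 bytes is an assumption.

```python
REDIS_CLIENT_LIMIT_BYTES = 10 * 1024 * 1024  # "10MB"; exact unit is an assumption


def backlog_bytes(updates_per_second: float, bytes_per_update: int,
                  seconds: float) -> float:
    """Bytes queued for a subscriber that nobody reads during `seconds`."""
    return updates_per_second * bytes_per_update * seconds


def exceeds_limit(updates_per_second: float, bytes_per_update: int,
                  seconds: float, limit: int = REDIS_CLIENT_LIMIT_BYTES) -> bool:
    """True if the undrained backlog would cross the client memory limit."""
    return backlog_bytes(updates_per_second, bytes_per_update, seconds) > limit


# Hypothetical load: 2000 updates/s of ~200 bytes each.
print(exceeds_limit(2000, 200, 5))   # short stall
print(exceeds_limit(2000, 200, 30))  # stall of the length described above
```

With these made-up numbers, a 5 second stall queues 2,000,000 bytes (under the limit), while a 30 second stall queues 12,000,000 bytes (over it). The same load only becomes a problem because the subscriber is left open for the full duration of the FRR check.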

Solution

Close the subscriber immediately after reading the ASIC_DB updates and before calling check_frr_pending_routes(). This prevents updates from accumulating in the queue during the slow FRR check.
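The effect of the reordering can be sketched with a toy model. FakeSubscriber is a hypothetical in-memory stand-in for the ASIC_DB subscriber (it does not use the real swsscommon or redis APIs), and run_check only mimics the order of operations described in this issue, with slow_frr_check standing in for check_frr_pending_routes().

```python
class FakeSubscriber:
    """Hypothetical stand-in for an ASIC_DB subscriber: while it is open,
    every published update is queued until someone reads it."""

    def __init__(self):
        self.queue = []
        self.closed = False

    def publish(self, update: bytes) -> None:
        # Updates are only buffered for an open subscription.
        if not self.closed:
            self.queue.append(update)

    def read_updates(self) -> list:
        # Drain everything queued so far.
        updates, self.queue = self.queue, []
        return updates

    def close(self) -> None:
        self.closed = True
        self.queue.clear()

    def backlog_bytes(self) -> int:
        return sum(len(u) for u in self.queue)


def run_check(sub, slow_frr_check, close_before_frr_check):
    """Read updates, run the slow check, and report the backlog that built
    up while the check was running."""
    updates = sub.read_updates()
    if close_before_frr_check:
        sub.close()                 # the ordering proposed in this issue
    slow_frr_check()                # stands in for check_frr_pending_routes()
    backlog = sub.backlog_bytes()   # memory held for the client at this point
    if not close_before_frr_check:
        sub.close()                 # original ordering: close only at the end
    return updates, backlog


def demo(close_first):
    sub = FakeSubscriber()
    sub.publish(b"x" * 10)

    def slow():
        # Route updates keep arriving while the FRR check runs.
        for _ in range(1000):
            sub.publish(b"y" * 100)

    return run_check(sub, slow, close_first)


print(demo(close_first=False)[1])  # backlog with the original ordering
print(demo(close_first=True)[1])   # backlog with the subscriber closed first
```

In both orderings the updates read before the FRR check are identical. Only the original ordering leaves 1000 × 100 = 100,000 bytes queued by the end of the check, while closing first leaves nothing queued.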
