Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Jan 9, 2026

Description

Refactored the stormond daemon to establish a single connection to CONFIG_DB at initialization instead of reconnecting every time get_configdb_intervals() is called. This prevents resource leaks and excessive open connections.

Changes made:

  • Moved CONFIG_DB connection establishment from get_configdb_intervals() to __init__() method
  • Added try-except error handling for CONFIG_DB connection with graceful degradation
  • Updated get_configdb_intervals() to check for and reuse the shared connection
  • Added logging for connection success/failure and configuration retrieval errors
  • Added comprehensive comments explaining the design rationale
  • Updated existing test testget_configdb_intervals to verify no redundant connections
  • Added 3 new unit tests:
    • test_configdb_connection_established_in_init: Validates connection at initialization
    • test_configdb_connection_failure_in_init: Tests error handling for connection failures
    • test_get_configdb_intervals_with_no_connection: Tests graceful degradation when CONFIG_DB unavailable

Files changed:

  • sonic-stormond/scripts/stormond (+32 lines, -7 lines)
  • sonic-stormond/tests/test_DaemonStorage.py (+77 lines, -4 lines)

Motivation and Context

Previously, get_configdb_intervals() created a new Redis connection to CONFIG_DB on every invocation. In long-running daemon processes, this pattern causes:

  • Resource leaks from accumulated connections
  • Potential connection exhaustion
  • Unnecessary overhead from repeated connection establishment

The single connection pattern is the standard approach for long-running processes and prevents these issues.

How Has This Been Tested?

  • Unit Tests: All existing tests pass with updated connection model
  • New Tests: Added 3 comprehensive tests covering connection establishment, failure handling, and graceful degradation
  • Code Review: Automated review completed with no issues found
  • Security Scan: CodeQL analysis passed with 0 vulnerabilities
  • Syntax Validation: Python syntax validation passed for all modified files
  • Manual Validation: Verified that daemon continues with default intervals when CONFIG_DB unavailable

Additional Information (Optional)

Backward Compatibility: Fully backward compatible. When CONFIG_DB is available, daemon behavior is identical to previous implementation. When CONFIG_DB is unavailable, the daemon now gracefully degrades to use default intervals instead of failing.

Key Improvements:

  • Resource efficiency: Single connection vs. N connections over daemon lifetime
  • Stability: Eliminates connection leak risk
  • Reliability: Better error handling with fallback to defaults
  • Maintainability: Clear documentation of connection lifecycle
Original prompt

Refactor the stormond daemon to establish a single connection to CONFIG_DB at initialization instead of reconnecting every time get_configdb_intervals() is called. The CONFIG_DB connection should be reused for all operations during the daemon's lifetime. Previously, get_configdb_intervals() would re-establish a new Redis connection each time it was called, which could cause resource leaks and excessive open connections.

Steps:

  • Move self.config_db = daemon_base.db_connect("CONFIG_DB") from get_configdb_intervals() to the init() method of DaemonStorage.
  • Remove redundant connection code from get_configdb_intervals().
  • Ensure get_configdb_intervals() uses the class member self.config_db.
  • Add comments explaining the design change.
  • Verify that the rest of the code uses the shared connection correctly.
  • Add a log message in init() indicating that the CONFIG_DB connection is established once.
  • Test for error handling in case the connection fails at init.

This change will prevent resource exhaustion and ensure stable Redis connectivity for the daemon.

This pull request was created from Copilot chat.


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jan 9, 2026

CLA Not Signed

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@gechiang
Copy link
Contributor

gechiang commented Jan 9, 2026

@copilot can you share the local commits so I can see the changed files?

Copy link
Contributor Author

Copilot AI commented Jan 9, 2026

@copilot can you share the local commits so I can see the changed files?

The local commit is e11447f. Here are the changed files:

Files Changed:

  • sonic-stormond/scripts/stormond (+32, -7 lines)
  • sonic-stormond/tests/test_DaemonStorage.py (+77, -4 lines)

Summary:

  • Moved CONFIG_DB connection from get_configdb_intervals() to __init__()
  • Added error handling for connection failures with graceful degradation
  • Updated get_configdb_intervals() to reuse the shared connection
  • Added 3 new unit tests for connection behavior

You can view the full diff with:

git show e11447f

@gechiang
Copy link
Contributor

gechiang commented Jan 9, 2026

@copilot just show me the changes needed for sonic-stormond/scripts/stormond and sonic-stormond/tests/test_DaemonStorage.py
I will raise the PR myself with your changes

Copy link
Contributor Author

Copilot AI commented Jan 9, 2026

@copilot just show me the changes needed for sonic-stormond/scripts/stormond and sonic-stormond/tests/test_DaemonStorage.py
I will raise the PR myself with your changes

Here are the complete changes for both files:

1. sonic-stormond/scripts/stormond

Changes in __init__() method (around line 56):

# BEFORE:
        self.state_db = None
        self.config_db = None
        self.device_table = None
        self.storage = StorageDevices()

# AFTER:
        self.state_db = None
        self.device_table = None
        self.storage = StorageDevices()
        
        # Establish a single CONFIG_DB connection at initialization to prevent
        # resource exhaustion from repeated reconnections during daemon lifetime.
        # This connection will be reused by get_configdb_intervals() and any other
        # methods that need to access CONFIG_DB.
        self.config_db = None
        try:
            self.config_db = daemon_base.db_connect("CONFIG_DB")
            self.log.log_notice("CONFIG_DB connection established successfully")
        except Exception as e:
            self.log.log_error("Failed to establish CONFIG_DB connection: {}".format(str(e)))
            # Continue initialization even if CONFIG_DB connection fails
            # The daemon will use default intervals if CONFIG_DB is unavailable

Changes in get_configdb_intervals() method (around line 111):

# BEFORE:
    def get_configdb_intervals(self):
        self.config_db = daemon_base.db_connect("CONFIG_DB")
        config_info = dict(self.config_db.hgetall('STORMOND_CONFIG|INTERVALS'))
        self.timeout = int(config_info.get('daemon_polling_interval', STORMOND_PERIODIC_STATEDB_SYNC_SECS))
        self.fsstats_sync_interval = int(config_info.get('fsstats_sync_interval', STORMOND_SYNC_TO_DISK_SECS))

        self.log.log_notice("Polling Interval set to {} seconds".format(self.timeout))
        self.log.log_notice("FSIO JSON file Interval set to {} seconds".format(self.fsstats_sync_interval))

# AFTER:
    # This function is used to configure the polling and sync intervals for the daemon.
    # Uses the CONFIG_DB connection established once during initialization to avoid
    # creating redundant connections that could lead to resource leaks.
    def get_configdb_intervals(self):
        # Use the shared CONFIG_DB connection established in __init__
        if self.config_db is None:
            self.log.log_warning("CONFIG_DB connection not available, using default intervals")
            return
            
        try:
            config_info = dict(self.config_db.hgetall('STORMOND_CONFIG|INTERVALS'))
            self.timeout = int(config_info.get('daemon_polling_interval', STORMOND_PERIODIC_STATEDB_SYNC_SECS))
            self.fsstats_sync_interval = int(config_info.get('fsstats_sync_interval', STORMOND_SYNC_TO_DISK_SECS))

            self.log.log_notice("Polling Interval set to {} seconds".format(self.timeout))
            self.log.log_notice("FSIO JSON file Interval set to {} seconds".format(self.fsstats_sync_interval))
        except Exception as e:
            self.log.log_error("Failed to retrieve CONFIG_DB intervals: {}".format(str(e)))
            self.log.log_notice("Using default intervals: polling={}, fsstats_sync={}".format(
                STORMOND_PERIODIC_STATEDB_SYNC_SECS, STORMOND_SYNC_TO_DISK_SECS))

2. sonic-stormond/tests/test_DaemonStorage.py

Replace the existing testget_configdb_intervals test and add 3 new tests (around line 109):

    @patch('sonic_py_common.daemon_base.db_connect')
    def testget_configdb_intervals(self, mock_db_connect):
        # Test that get_configdb_intervals() reuses the connection from __init__
        # and does not create a new connection
        mock_db = MagicMock()
        mock_db.hgetall = MagicMock(return_value={})
        mock_db_connect.return_value = mock_db

        # Connection should be made once in __init__
        stormon_daemon = stormond.DaemonStorage(log_identifier)
        initial_call_count = mock_db_connect.call_count
        
        # get_configdb_intervals() should NOT create a new connection
        stormon_daemon.get_configdb_intervals()
        
        # Verify no additional connection was made
        assert mock_db_connect.call_count == initial_call_count

    @patch('sonic_py_common.daemon_base.db_connect')
    def test_configdb_connection_established_in_init(self, mock_db_connect):
        # Test that CONFIG_DB connection is established during initialization
        mock_config_db = MagicMock()
        mock_state_db = MagicMock()
        
        # First call returns STATE_DB, second call returns CONFIG_DB
        mock_db_connect.side_effect = [mock_state_db, mock_config_db]
        
        stormon_daemon = stormond.DaemonStorage(log_identifier)
        
        # Verify db_connect was called twice (once for STATE_DB, once for CONFIG_DB)
        assert mock_db_connect.call_count == 2
        # Verify CONFIG_DB connection is stored
        assert stormon_daemon.config_db is not None

    @patch('sonic_py_common.daemon_base.db_connect')
    def test_configdb_connection_failure_in_init(self, mock_db_connect):
        # Test that daemon continues even if CONFIG_DB connection fails
        mock_state_db = MagicMock()
        
        # First call returns STATE_DB successfully, second call fails for CONFIG_DB
        def side_effect_fn(db_name):
            if db_name == "STATE_DB":
                return mock_state_db
            elif db_name == "CONFIG_DB":
                raise Exception("Connection failed")
        
        mock_db_connect.side_effect = side_effect_fn
        
        # Daemon should initialize successfully even if CONFIG_DB fails
        stormon_daemon = stormond.DaemonStorage(log_identifier)
        
        # Verify CONFIG_DB connection is None after failure
        assert stormon_daemon.config_db is None
        # Verify daemon still has STATE_DB connection
        assert stormon_daemon.state_db is not None

    @patch('sonic_py_common.daemon_base.db_connect')
    def test_get_configdb_intervals_with_no_connection(self, mock_db_connect):
        # Test that get_configdb_intervals() handles missing CONFIG_DB connection gracefully
        mock_state_db = MagicMock()
        
        # Only STATE_DB connection succeeds, CONFIG_DB fails
        def side_effect_fn(db_name):
            if db_name == "STATE_DB":
                return mock_state_db
            elif db_name == "CONFIG_DB":
                raise Exception("Connection failed")
        
        mock_db_connect.side_effect = side_effect_fn
        
        stormon_daemon = stormond.DaemonStorage(log_identifier)
        
        # Store original timeout values
        original_timeout = stormon_daemon.timeout
        original_sync_interval = stormon_daemon.fsstats_sync_interval
        
        # Call get_configdb_intervals() with no CONFIG_DB connection
        stormon_daemon.get_configdb_intervals()
        
        # Verify default values are retained
        assert stormon_daemon.timeout == original_timeout
        assert stormon_daemon.fsstats_sync_interval == original_sync_interval

Summary:

  • stormond: +32 lines, -7 lines (moved CONFIG_DB connection to init with error handling)
  • test_DaemonStorage.py: +77 lines, -4 lines (updated 1 test, added 3 new tests)

Copilot AI changed the title [WIP] Refactor daemon to reuse CONFIG_DB connection [stormond] Refactor daemon to reuse CONFIG_DB connection Jan 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants