DomInfoUpdateTask polling can take longer than DOM_INFO_UPDATE_PERIOD_SECS due to 1 sec wait after every iteration #759

@aditya-nexthop

The DomInfoUpdateTask polling loop is structured as follows:

            for physical_port, logical_ports in self.port_mapping.physical_to_logical.items():
                # Process pending link change events and update diagnostic
                # information in the database. Ensures timely handling of link
                # change events and avoids duplicate updates in case of breakout ports.
                port_change_observer.handle_port_update_event()  # <<== can block waiting for an event
                # Process each port in the pending link change set based on the
                # corresponding time to update the DB after the link change.
                for link_changed_port in list(self.link_change_affected_ports.keys()):
                    if self.task_stopping_event.is_set():
                        self.log_notice("Stop event generated during DOM link change event processing")
                        break
                    if self.link_change_affected_ports[link_changed_port] <= datetime.datetime.now():
                        self.log_notice(f"Updating port db diagnostics post link change for port {link_changed_port}")
                        self.update_port_db_diagnostics_on_link_change(link_changed_port)
                        del self.link_change_affected_ports[link_changed_port]

                if self.task_stopping_event.is_set():
                    self.log_notice("Stop event generated during DOM monitoring loop")
                    break

                if not is_periodic_db_update_needed:
                    # If periodic db update is not needed, skip the rest of the loop
                    continue

                # Get the first logical port name since it corresponds to the first subport
                # of the breakout group
                logical_port_name = logical_ports[0]
...

The port_change_observer.handle_port_update_event() call goes to handle_port_update_event in xcvrd/xcvrd_utilities/port_event_helper.py:

    def handle_port_update_event(self):
        """
        Select PORT update events, notify the observers upon a port update in CONFIG_DB
        or a XCVR insertion/removal in STATE_DB

        Returns:
            bool: True if there's at least one update event; False if there's no update event.
        """
        has_event = False
        if not self.stop_event.is_set():
            (state, _) = self.sel.select(SELECT_TIMEOUT_MSECS)
            if state == swsscommon.Select.TIMEOUT:
                return has_event
            if state != swsscommon.Select.OBJECT:
                self.logger.log_warning('sel.select() did not return swsscommon.Select.OBJECT')
                return has_event
...

Here, SELECT_TIMEOUT_MSECS = 1000

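So when there are no port change events, each call to handle_port_update_event blocks for the full 1 sec select timeout before returning. Because the call sits inside the per-port loop, one pass over N physical ports can spend up to N seconds just waiting, on top of the actual polling time.
On systems with a large number of transceivers, or with many VDM-enabled transceivers (where polling takes longer), we are unable to poll information from all transceivers within DOM_INFO_UPDATE_PERIOD_SECS (currently 60 secs).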
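The blocking behaviour can be reproduced outside xcvrd with a small stand-in. This is only a sketch: `queue.Queue.get(timeout=...)` plays the role of `self.sel.select(SELECT_TIMEOUT_MSECS)`, the timeout is scaled down to 50 ms so the script finishes quickly, and `run_one_pass` and the port count of 8 are hypothetical names and values, not part of xcvrd.

```python
import queue
import time

# Scaled-down stand-in for SELECT_TIMEOUT_MSECS = 1000 (assumption, for a fast demo).
SELECT_TIMEOUT_SECS = 0.05


def handle_port_update_event(events):
    """Stand-in for the real method: block until an event arrives or the timeout expires."""
    try:
        events.get(timeout=SELECT_TIMEOUT_SECS)
        return True
    except queue.Empty:
        return False


def run_one_pass(num_ports, events):
    """Mimic one pass of the per-port loop and return the elapsed wall-clock time."""
    start = time.monotonic()
    for _ in range(num_ports):
        handle_port_update_event(events)
        # ... per-port DOM polling would happen here ...
    return time.monotonic() - start


# With an empty queue (no port change events), every iteration waits out the timeout.
idle_elapsed = run_one_pass(num_ports=8, events=queue.Queue())
print(f"idle pass over 8 ports took {idle_elapsed:.2f}s")
```

With no events queued, the elapsed time is at least the port count times the timeout, which is the same effect as the 1 sec wait per physical port in the real loop.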

As a result, flags and DOM information are refreshed less often than once every 60 seconds.
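A rough back-of-the-envelope model makes the overrun concrete. It assumes the worst case described above (no port change events, so every iteration waits the full select timeout). The function name, the port counts, and the per-port poll time are hypothetical values chosen for illustration, not measurements.

```python
DOM_INFO_UPDATE_PERIOD_SECS = 60
SELECT_TIMEOUT_SECS = 1.0  # SELECT_TIMEOUT_MSECS = 1000


def estimated_pass_secs(num_ports, select_timeout_secs=SELECT_TIMEOUT_SECS,
                        poll_secs_per_port=0.0):
    """Worst-case time for one pass: each port pays the select wait plus its poll time."""
    return num_ports * (select_timeout_secs + poll_secs_per_port)


# Hypothetical port counts; poll time left at 0 to isolate the select wait alone.
for ports in (32, 64, 128):
    est = estimated_pass_secs(ports)
    verdict = "exceeds" if est > DOM_INFO_UPDATE_PERIOD_SECS else "fits within"
    print(f"{ports} ports: ~{est:.0f}s per pass, {verdict} the {DOM_INFO_UPDATE_PERIOD_SECS}s period")
```

Under this model the select wait alone uses up the whole period once there are more than 60 physical ports, and any real per-port polling time lowers that threshold further.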
