
fix the abnormal error of port breakout #3876


Open: burnCalories wants to merge 4 commits into master


@burnCalories burnCalories commented May 8, 2025

What I did

When I ran a port breakout on my SONiC device, I hit errors that caused the Docker containers to restart unexpectedly. After repeated testing, I found that the failing cases were breakouts whose port numbers cross a digit boundary (a carry), such as Ethernet8 to Ethernet12 and Ethernet96 to Ethernet100. Monitoring sairedis.rec shows that the ports re-created after the breakout are issued in the wrong order. For example, after breaking out Ethernet8 into 4x100G, the ports should be issued as Ethernet8, Ethernet10, Ethernet12, Ethernet14, and syncd checks this order, but they are actually issued as Ethernet10, Ethernet12, Ethernet14, Ethernet8. Because the port names are strings, I suspect that configdb.mod_config uses the default sort when it writes the new ports. Default string sorting is lexicographic: strings are compared character by character, by ASCII value, starting from the first character until the first difference is found. As a result, "Ethernet10" sorts before "Ethernet8" because '1' is less than '8'.
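The ordering described above can be reproduced in plain Python. This is a minimal sketch, not SONiC code: `port_number` is a hypothetical helper introduced only for illustration, and it assumes every port name ends in its numeric index.

```python
import re

# Port names created by breaking out Ethernet8 into 4x100G.
ports = ["Ethernet8", "Ethernet10", "Ethernet12", "Ethernet14"]


def port_number(name):
    """Return the trailing integer of a port name (hypothetical helper)."""
    match = re.search(r"(\d+)$", name)
    if match is None:
        raise ValueError(f"no trailing number in {name!r}")
    return int(match.group(1))


# Default string sort: compared character by character, so '1' < '8'.
lexicographic = sorted(ports)

# Sort on the numeric suffix instead of the raw string.
numeric = sorted(ports, key=port_number)

print(lexicographic)  # ['Ethernet10', 'Ethernet12', 'Ethernet14', 'Ethernet8']
print(numeric)        # ['Ethernet8', 'Ethernet10', 'Ethernet12', 'Ethernet14']
```

The lexicographic result matches the Ethernet10, 12, 14, 8 sequence seen in sairedis.rec, which is consistent with the suspicion that a default string sort is applied somewhere on the write path.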

How I did it

I split the single call configdb.mod_config(sonic_cfggen.FormatConverter.output_to_db(data)) into one call per port. For this breakout, sonic_cfggen.FormatConverter.output_to_db(data) returns:

{'PORT': {'Ethernet96': {'alias': 'Eth13/1(Port13)', 'lanes': '129,130', 'speed': '100000', 'index': '13'}, 'Ethernet98': {'alias': 'Eth13/2(Port13)', 'lanes': '131,132', 'speed': '100000', 'index': '13'}, 'Ethernet100': {'alias': 'Eth13/3(Port13)', 'lanes': '133,134', 'speed': '100000', 'index': '13'}, 'Ethernet102': {'alias': 'Eth13/4(Port13)', 'lanes': '135,136', 'speed': '100000', 'index': '13'}}}

Previously, configdb.mod_config processed and sorted all of these ports in a single call before writing them to the Config DB (Redis database 4). I changed it to:

db = sonic_cfggen.FormatConverter.output_to_db(data)
# Write each port with its own mod_config call instead of one bulk call.
for port, config in db['PORT'].items():
    single_port_config = {'PORT': {port: config}}
    configdb.mod_config(single_port_config)

This writes the ports one at a time, in the order they appear in the dictionary.
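The effect of the per-port loop can be illustrated with a stand-in object. This is only a sketch under a stated assumption: `FakeConfigDB` is hypothetical and is not the real connector, and its `mod_config` sorts keys within one call purely to mimic the behavior suspected in the description above. The port entries come from the description, with aliases omitted for brevity.

```python
class FakeConfigDB:
    """Hypothetical stand-in for the Config DB connector (not the real class)."""

    def __init__(self):
        self.written = []  # port names, in the order they were written

    def mod_config(self, data):
        # ASSUMPTION: keys within a single call are written in sorted
        # (lexicographic) order, as the PR description suspects.
        for entries in data.values():
            for key in sorted(entries):
                self.written.append(key)


db = {'PORT': {
    'Ethernet96': {'lanes': '129,130', 'speed': '100000', 'index': '13'},
    'Ethernet98': {'lanes': '131,132', 'speed': '100000', 'index': '13'},
    'Ethernet100': {'lanes': '133,134', 'speed': '100000', 'index': '13'},
    'Ethernet102': {'lanes': '135,136', 'speed': '100000', 'index': '13'},
}}

# One bulk call: the (assumed) sort reorders the ports.
bulk = FakeConfigDB()
bulk.mod_config(db)

# One call per port: each call holds a single key, so nothing can be reordered.
per_port = FakeConfigDB()
for port, config in db['PORT'].items():
    per_port.mod_config({'PORT': {port: config}})

print(bulk.written)      # ['Ethernet100', 'Ethernet102', 'Ethernet96', 'Ethernet98']
print(per_port.written)  # ['Ethernet96', 'Ethernet98', 'Ethernet100', 'Ethernet102']
```

Because Python dictionaries preserve insertion order, the per-port loop issues the ports in whatever order output_to_db produced them, independent of any sorting inside a single call.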

How to verify it

I ran the port breakout again with this change. It completed successfully and the problem no longer occurs.

Previous command output (if the output of a command-line utility has changed)

Syslog output from the failure before this change:

Mar 19 15:28:16.288205 9716-203 INFO ConfigMgmt: shutdown Interfaces: {'PORT': {'Ethernet8': {'admin_status': 'down'}}}
Mar 19 15:28:16.288383 9716-203 INFO ConfigMgmt: Writing in Config DB
Mar 19 15:28:16.288576 9716-203 INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet8': {'admin_status': 'down'}}}
Mar 19 15:28:16.289385 9716-203 INFO ConfigMgmt: Writing in Config DB
Mar 19 15:28:16.289825 9716-203 WARNING pmon#xcvrd: $$$ Ethernet8 handle_port_update_event() : op=SET DB:CONFIG_DB Table:PORT fvp {'alias': 'Eth2(Port2)', 'index': '2', 'lanes': '65,66,67,68,69,70,71,72', 'speed': '400000', 'admin_status': 'down'}
Mar 19 15:28:16.289825 9716-203 WARNING pmon#xcvrd: *** Ethernet8CONFIG_DBPORT handle_port_update_event() fvp {'alias': 'Eth2(Port2)', 'index': '2', 'lanes': '65,66,67,68,69,70,71,72', 'speed': '400000', 'admin_status': 'down', 'key': 'Ethernet8', 'asic_id': 0, 'op': 'SET'}
Mar 19 15:28:16.289899 9716-203 INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet8': None}}
Mar 19 15:28:16.290607 9716-203 INFO ConfigMgmt: Verify Port Deletion from Asic DB, Wait...
Mar 19 15:28:16.291322 9716-203 INFO ConfigMgmt: Check Key in Asic DB: ASIC_STATE:SAI_OBJECT_TYPE_PORT:oid:0x1000000000975
Mar 19 15:28:16.292779 9716-203 NOTICE swss#portmgrd: :- doTask: Configure Ethernet8 admin status to down
Mar 19 15:28:16.292779 9716-203 NOTICE swss#portmgrd: :- doTask: Delete Port: Ethernet8
Mar 19 15:28:16.293538 9716-203 NOTICE lldp#lldpmgrd[39]: :- pops: Miss table key PORT_TABLE:Ethernet8, possibly outdated
Mar 19 15:28:16.293664 9716-203 NOTICE swss#orchagent: :- doPortTask: Deleting Port Ethernet8
Mar 19 15:28:16.294018 9716-203 WARNING pmon#xcvrd: $$$ Ethernet8 handle_port_update_event() : op=DEL DB:CONFIG_DB Table:PORT fvp {}
Mar 19 15:28:16.294018 9716-203 WARNING pmon#xcvrd: *** Ethernet8CONFIG_DBPORT handle_port_update_event() fvp {'index': '-1', 'key': 'Ethernet8', 'asic_id': 0, 'op': 'DEL'}
Mar 19 15:28:16.294628 9716-203 ERR syncd#syncd: [none] brcm_sai_get_port_attribute:2568 Error processing port attribute 108
Mar 19 15:28:16.295029 9716-203 NOTICE swss#orchagent: :- deInitPort: De-Initialized port Ethernet8
Mar 19 15:28:16.295045 9716-203 NOTICE swss#orchagent: :- doPortTask: Removing hostif d000000000997 for Port Ethernet8
Mar 19 15:28:16.297159 9716-203 NOTICE swss#portsyncd: :- onMsg: nlmsg type:17 key:Ethernet8 admin:0 oper:0 addr:90:2d:77:d0:08:00 ifindex:147 master:0
Mar 19 15:28:16.297159 9716-203 WARNING pmon#xcvrd: $$$ Ethernet8 handle_port_update_event() : op=DEL DB:STATE_DB Table:PORT_TABLE fvp {}
Mar 19 15:28:16.297159 9716-203 WARNING pmon#xcvrd: *** Ethernet8STATE_DBPORT_TABLE handle_port_update_event() fvp {'index': '-1', 'key': 'Ethernet8', 'asic_id': 0, 'op': 'DEL'}
Mar 19 15:28:16.297285 9716-203 NOTICE swss#portsyncd: :- onMsg: Delete Ethernet8(ok) from state db
Mar 19 15:28:16.335493 9716-203 NOTICE swss#orchagent: :- setPortAdminStatus: Set admin status DOWN host_tx_ready to false for port Ethernet8
Mar 19 15:28:16.335678 9716-203 WARNING pmon#xcvrd: $$$ Ethernet8 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'host_tx_ready': 'false'}
Mar 19 15:28:16.335761 9716-203 WARNING pmon#xcvrd: *** Ethernet8STATE_DBPORT_TABLE handle_port_update_event() fvp {'host_tx_ready': 'false', 'index': '-1', 'key': 'Ethernet8', 'asic_id': 0, 'op': 'SET'}
Mar 19 15:28:16.335994 9716-203 INFO syncd#syncd: [none] brcm_sai_set_port_attribute:543 Set bcm port(44) admin(0)
Mar 19 15:28:16.337350 9716-203 ERR syncd#syncd: [none] brcm_sai_get_port_attribute:2568 Error processing port attribute 108
Mar 19 15:28:16.337778 9716-203 NOTICE swss#orchagent: :- meta_port_remove_validation: all objects related to port oid:0x1000000000975 are in default state, can be remove
Mar 19 15:28:16.338247 9716-203 ERR syncd#syncd: [none] brcm_sai_get_port_attribute:2568 Error processing port attribute 108
Mar 19 15:28:16.338247 9716-203 NOTICE syncd#syncd: :- collectPortRelatedObjects: obtained 33 port oid:0x10000002c related RIDs
Mar 19 15:28:16.346773 9716-203 NOTICE syncd#syncd: :- postPortRemove: removed 8 lanes from redis lane map for port RID oid:0x10000002c
Mar 19 15:28:16.346773 9716-203 NOTICE syncd#syncd: :- postPortRemove: post port remove actions succeeded
Mar 19 15:28:16.347341 9716-203 NOTICE swss#orchagent: :- post_port_remove: success executing post port remove actions: oid:0x1000000000975
Mar 19 15:28:16.347365 9716-203 NOTICE swss#orchagent: :- removePort: Remove port 1000000000975
Mar 19 15:28:16.347365 9716-203 NOTICE swss#orchagent: :- removePortFromLanesMap: Removing port Ethernet8 from lanes map
Mar 19 15:28:16.347376 9716-203 NOTICE swss#orchagent: :- removePortFromPortListMap: Removing port-id 1000000000975 from port list map
Mar 19 15:28:16.347376 9716-203 NOTICE swss#orchagent: :- doPortTask: Removed port Ethernet8
Mar 19 15:28:17.295946 9716-203 INFO ConfigMgmt: Check Key in Asic DB: ASIC_STATE:SAI_OBJECT_TYPE_PORT:oid:0x1000000000975
Mar 19 15:28:17.296191 9716-203 INFO ConfigMgmt: Writing in Config DB
Mar 19 15:28:17.296372 9716-203 INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet8': {'alias': 'Eth2/1(Port2)', 'lanes': '65,66', 'speed': '100000', 'index': '2'}, 'Ethernet10': {'alias': 'Eth2/2(Port2)', 'lanes': '67,68', 'speed': '100000', 'index': '2'}, 'Ethernet12': {'alias': 'Eth2/3(Port2)', 'lanes': '69,70', 'speed': '100000', 'index': '2'}, 'Ethernet14': {'alias': 'Eth2/4(Port2)', 'lanes': '71,72', 'speed': '100000', 'index': '2'}}}
Mar 19 15:28:17.297898 9716-203 WARNING pmon#xcvrd: $$$ Ethernet10 handle_port_update_event() : op=SET DB:CONFIG_DB Table:PORT fvp {'alias': 'Eth2/2(Port2)', 'index': '2', 'lanes': '67,68', 'speed': '100000'}
Mar 19 15:28:17.298113 9716-203 WARNING pmon#xcvrd: *** Ethernet10CONFIG_DBPORT handle_port_update_event() fvp {'alias': 'Eth2/2(Port2)', 'index': '2', 'lanes': '67,68', 'speed': '100000', 'key': 'Ethernet10', 'asic_id': 0, 'op': 'SET'}
Mar 19 15:28:17.298737 9716-203 ERR pmon#xcvrd: Exception occured at CmisManagerTask thread due to KeyError(None)
Mar 19 15:28:17.300058 9716-203 ERR pmon#xcvrd: Traceback (most recent call last):
Mar 19 15:28:17.300119 9716-203 ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1717, in run
Mar 19 15:28:17.300119 9716-203 ERR pmon#xcvrd: self.task_worker()
Mar 19 15:28:17.300379 9716-203 ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1427, in task_worker
Mar 19 15:28:17.300379 9716-203 ERR pmon#xcvrd: self.port_dict[lport]['host_tx_ready'] = self.get_host_tx_status(lport)
Mar 19 15:28:17.300379 9716-203 ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1299, in get_host_tx_status
Mar 19 15:28:17.300379 9716-203 ERR pmon#xcvrd: state_port_tbl = self.xcvr_table_helper.get_state_port_tbl(asic_index)
Mar 19 15:28:17.300379 9716-203 ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 2631, in get_state_port_tbl
Mar 19 15:28:17.300471 9716-203 ERR pmon#xcvrd: return self.state_port_tbl[asic_id]
Mar 19 15:28:17.300471 9716-203 ERR pmon#xcvrd: KeyError: None
Mar 19 15:28:17.300509 9716-203 ERR pmon#xcvrd: Xcvrd: exception found at child thread CmisManagerTask due to KeyError(None)
Mar 19 15:28:17.301906 9716-203 ERR pmon#xcvrd: Exiting main loop as child thread raised exception!
Mar 19 15:28:17.303318 9716-203 INFO snmp#supervisord: portmonitor_trap Ethernet10 status change admin_status oper_status
Mar 19 15:28:17.305351 9716-203 INFO pmon#supervisord 2024-03-19 15:28:17,304 INFO exited: xcvrd (terminated by SIGKILL; not expected)
Mar 19 15:28:17.306615 9716-203 INFO snmp#supervisord: portmonitor_trap will trap ip: 10.1.2.133:0
Mar 19 15:28:17.306680 9716-203 INFO snmp#supervisord: portmonitor_trap tarp to: 10.1.2.133:0
Mar 19 15:28:17.306680 9716-203 INFO snmp#supervisord: portmonitor_trap will trap ip: 10.1.2.133:162
Mar 19 15:28:17.306680 9716-203 INFO snmp#supervisord: portmonitor_trap dial udp: address 65536: invalid port
Mar 19 15:28:17.306680 9716-203 INFO snmp#supervisord: portmonitor_trap tarp to: 10.1.2.133:162
Mar 19 15:28:17.306680 9716-203 INFO snmp#supervisord: portmonitor_trap will trap ip: 10.1.2.133:65536
Mar 19 15:28:17.309255 9716-203 ERR syncd#syncd: [none] brcm_sai_create_port:3232 Invalid lane list passed
Mar 19 15:28:17.309255 9716-203 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_BULK_CREATE failed in syncd mode: SAI_STATUS_FAILURE
Mar 19 15:28:17.310466 9716-203 ERR swss#orchagent: :- addPortBulk: Failed to create ports with bulk operation, rv:-1
Mar 19 15:28:17.310491 9716-203 ERR swss#orchagent: :- handleSaiCreateStatus: Encountered failure in create operation, exiting orchagent, SAI API: SAI_API_PORT, status: SAI_STATUS_FAILURE
Mar 19 15:28:17.310491 9716-203 NOTICE swss#orchagent: :- notifySyncd: sending syncd: SYNCD_INVOKE_DUMP
Mar 19 15:28:17.311398 9716-203 NOTICE syncd#syncd: :- processNotifySyncd: Invoking SAI failure dump
Mar 19 15:28:17.319012 9716-203 NOTICE swss#orchagent: :- sai_redis_notify_syncd: invoked DUMP succeeded
Mar 19 15:28:17.319012 9716-203 ERR swss#orchagent: :- handleSaiFailure: MercuryNos encountered an unrecoverable exception 0.
Mar 19 15:28:17.319012 9716-203 ERR swss#orchagent: :- addPortBulk: PortsOrch bulk create failure
Mar 19 15:28:17.319301 9716-203 INFO swss#supervisord: message repeated 4 times: [ orchagent ]
Mar 19 15:28:17.319301 9716-203 INFO swss#supervisord: orchagent terminate called after throwing an instance of 'std::runtime_error'
Mar 19 15:28:17.319328 9716-203 INFO swss#supervisord: orchagent what(): :- addPortBulk: PortsOrch bulk create failure
Mar 19 15:28:17.900866 9716-203 INFO swss#supervisord 2024-03-19 15:28:17,900 INFO exited: orchagent (terminated by SIGABRT (core dumped); not expected)
Mar 19 15:28:18.000783 9716-203 INFO coredump_gen_handler.py[671145]: Global rate_limit_interval period has not passed. Techsupport Invocation is skipped
Mar 19 15:28:18.309159 9716-203 INFO pmon#supervisord 2024-03-19 15:28:18,308 INFO spawned: 'xcvrd' with pid 128875
Mar 19 15:28:18.417562 9716-203 INFO swss#supervisord 2024-03-19 15:28:18,417 INFO exited: enable_counters (exit status 0; expected)
Mar 19 15:28:18.421262 9716-203 INFO swss#supervisor-proc-exit-listener: Process 'orchagent' exited unexpectedly. Terminating supervisor 'swss'
Mar 19 15:28:18.421542 9716-203 NOTICE swss#supervisor-proc-exit-listener: :- publish: EVENT_PUBLISHED: {"sonic-events-host:process-exited-unexpectedly":{"ctr_name":"swss","process_name":"orchagent","timestamp":"2024-03-19T15:28:18.421136Z"}}
Mar 19 15:28:18.426676 9716-203 INFO swss#supervisord 2024-03-19 15:28:18,422 WARN received SIGTERM indicating exit request
Mar 19 15:28:18.426676 9716-203 INFO swss#supervisord 2024-03-19 15:28:18,422 INFO waiting for supervisor-proc-exit-listener, rsyslogd, portsyncd, coppmgrd, arp_update, ndppd, neighsyncd, vlanmgrd, intfmgrd, portmgrd, buffermgrd, vrfmgrd, nbrmgrd, vxlanmgrd, fdbsyncd, tunnelmgrd, containercfgd to die
Mar 19 15:28:18.434706 9716-203 INFO swss#supervisord 2024-03-19 15:28:18,434 INFO stopped: containercfgd (exit status 143)
Mar 19 15:28:18.435698 9716-203 INFO swss#supervisord 2024-03-19 15:28:18,435 INFO stopped: tunnelmgrd (terminated by SIGTERM)
Mar 19 15:28:18.436635 9716-203 INFO swss#supervisord 2024-03-19 15:28:18,436 INFO stopped: fdbsyncd (terminated by SIGTERM)
Mar 19 15:28:18.437758 9716-203 INFO swss#supervisord 2024-03-19 15:28:18,437 INFO stopped: vxlanmgrd (terminated by SIGTERM)
Mar 19 15:28:18.438987 9716-203 INFO swss#supervisord 2024-03-19 15:28:18,438 INFO stopped: nbrmgrd (terminated by SIGTERM)
Mar 19 15:28:18.440591 9716-203 INFO swss#supervisord 2024-03-19 15:28:18,440 INFO stopped: vrfmgrd (terminated by SIGTERM)

New command output (if the output of a command-line utility has changed)


linux-foundation-easycla bot commented May 8, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@mssonicbld
Collaborator

/azp run


Azure Pipelines successfully started running 1 pipeline(s).



2 participants