Issue #266: WR node hangs in endless BOOTP loop, when WR PTP link is not established
dietrichb commented on Feb 2, 2021
In rare cases a WR node is unable to establish a WR PTP link. In this case the following symptoms are commonly observed:
node issues a BOOTP request via the network about once every second (BOOTP server replies with IP)
when reading the relevant register of the WR core, the IP matches the one from the BOOTP server reply: it seems the reply from the BOOTP server has been received by the node (I believe the console claims 'in training' for IP)
the node continues issuing BOOTP requests until the WR PTP link is successfully established
I am not sure if this is an issue with the way the WR core is instantiated on Arria II devices or if this is an issue of the WR core itself. Should we file this issue on OHWR?
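For spotting affected nodes from the BOOTP server side, the "about once every second, without stopping" pattern could be checked roughly as follows. This is only a sketch: the function name, the thresholds (`min_requests`, `max_gap_s`) and the idea of collecting per-node request timestamps from the server log are assumptions, not part of any existing tool.

```python
from typing import Sequence


def looks_like_bootp_loop(
    request_times: Sequence[float],
    min_requests: int = 10,   # assumed threshold, not taken from the issue
    max_gap_s: float = 2.0,   # assumed tolerance around "~once every second"
) -> bool:
    """Return True if the most recent requests of one node arrive back to back.

    request_times: timestamps (in seconds) of BOOTP requests from a single MAC.
    """
    if len(request_times) < min_requests:
        return False
    recent = sorted(request_times)[-min_requests:]
    # Every gap between consecutive requests must be short.
    return all(later - earlier <= max_gap_s
               for earlier, later in zip(recent, recent[1:]))
```

A node that requests an IP once at boot and then goes quiet would not be flagged; only a sustained run of closely spaced requests would.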
Issue #256: connection between White Rabbit node and switch unreliable after reboot of WRS
dietrichb commented on Feb 2, 2021
A variety of symptoms is observed when rebooting a WRS to which a WR node is connected. This can happen during maintenance, when the WRS is rebooted on purpose, or when recovering from a power cut.
1. no White Rabbit lock; occasionally; WRS port claims WA_MSG (waiting for message); node is accessible via the network
2. no Ethernet link; rarely; WRS port claims 'link down'; node inaccessible
3. 'hang up'; WRS port claims 'WA_MSG' and node MAC is detected by the WRS; node inaccessible via the network
In all cases, power-cycling the WR node helps.
In cases '1' and '2' it is usually possible to recover by 'eb-reset' of the node.
In case '1', forcing a sequence port up->down->up on the WRS helps in some cases.
In case '2', forcing a sequence port up->down->up on the WRS does not help.
In case '3', the node seems to be almost dead. Access to the node is possible neither from the timing network nor from the host system (no chance for eb-reset). Forcing a sequence port up->down->up on the WRS does not help. Autorecovery of the WR node via the 'watchdog' implemented on the SCU does not work. A power cycle helps.
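The three cases and their recovery options can be summarized as a small lookup, for example as a starting point for an operator checklist. This is a sketch only: the names `Case`, `classify` and `RECOVERY` are hypothetical, the port state strings are taken from the description above, and the order of the actions is illustrative.

```python
from enum import Enum
from typing import Optional


class Case(Enum):
    NO_WR_LOCK = 1        # case '1'
    NO_ETHERNET_LINK = 2  # case '2'
    HANG_UP = 3           # case '3'


# Recovery actions reported in this issue (order is an assumption).
RECOVERY = {
    Case.NO_WR_LOCK: [
        "eb-reset of the node (usually works)",
        "WRS port up->down->up (helps in some cases)",
        "power-cycle the node",
    ],
    Case.NO_ETHERNET_LINK: [
        "eb-reset of the node (usually works)",
        "power-cycle the node",
    ],
    Case.HANG_UP: [
        "power-cycle the node",
    ],
}


def classify(wrs_port_state: str, node_reachable: bool) -> Optional[Case]:
    """Map the observed WRS port state and node reachability to a case."""
    if wrs_port_state == "WA_MSG":
        return Case.NO_WR_LOCK if node_reachable else Case.HANG_UP
    if wrs_port_state == "link down" and not node_reachable:
        return Case.NO_ETHERNET_LINK
    return None  # not one of the three cases described here
```

For example, `RECOVERY[classify("WA_MSG", False)]` yields only the power cycle, matching the observation that neither eb-reset nor a port toggle is available in case '3'.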
Issue #111: WR port not reachable after power cycle of WR switch
dietrichb commented on Dec 15, 2018
symptoms
WRS port shows MAC and ptp state 6 (looks good)
node eb-mon shows LINK_UP and TRACKING (looks good)
node not reachable via timing network (all EB requests time out)
when
after reboot power cycle of WRS
it may take a few power cycles of the WRS to trigger the bug
workaround
power cycle or restart FPGA using eb-reset
dietrichb commented on Aug 20, 2019
solved for Arria5 based platforms
requires major work (PHY control update) for Arria II based devices (SCU and VETAR)
Issue #51: WR port of node remains down after power cycle of node AND WR switch
dietrichb commented on Oct 23, 2017
There seems to be an annoying bug that occurs when a node (SCU) and WRS are switched on simultaneously after a power cut.
The symptoms are the following
PPS LED not blinking, activity LED not blinking, link LED off
eb-mon -v dev/wbm0 shows "LINK_DOWN" and "NO_SYNC"
eb-console dev/wbm0 causes freezing of the ssh shell
node fails to get an IP via BOOTP
(but the WRS shows both "link up" and "activity" LEDs)
node is not accessible via the WR network
resetting the FPGA of the node via its Reset controller is possible and cures the symptom.
Suspicion: The FPGA of the node "boots" much faster than the WRS. It somehow fails to detect "link up" after the WRS starts and remains trapped in the "link down" state.
This issue causes real annoyance in cases where major parts of the facility need to be recovered after a major power cut.
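Since resetting the FPGA cures the symptom, the check for the stuck state could in principle be scripted on the host. The following is a minimal sketch under several assumptions: it only looks for the two strings quoted above ("LINK_DOWN" and "NO_SYNC") anywhere in the `eb-mon -v` output, whose exact format is not reproduced here, and the helper names and the timeout are hypothetical. The timeout is there because other tools were seen to freeze in this state.

```python
import subprocess


def link_is_stuck(eb_mon_output: str) -> bool:
    """True if the eb-mon text reports both LINK_DOWN and NO_SYNC."""
    return "LINK_DOWN" in eb_mon_output and "NO_SYNC" in eb_mon_output


def read_eb_mon(device: str = "dev/wbm0", timeout_s: float = 5.0) -> str:
    """Run 'eb-mon -v <device>' and return its stdout.

    Raises subprocess.TimeoutExpired if the tool does not return in time.
    """
    result = subprocess.run(
        ["eb-mon", "-v", device],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.stdout
```

A recovery script would call `link_is_stuck(read_eb_mon())` and, if it returns True, trigger the FPGA reset via the reset controller; that last step is left out here because its invocation is not given in this issue.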
Maybe this is linked to another issue:
dietrichb commented on Aug 20, 2019
solved for Arria 5
not solved for Arria II (SCU and Vetar)
a fix for Arria II would require a major effort
dietrichb commented on Feb 2, 2021
update (January 2021): in rare cases this is also observed with fallout gateware