ethernet: stm32: Data reception not working until register rewrite #107024

@icsys-omh

Describe the bug

On a custom board using an STM32F767ZI microcontroller together with a TI DP83826E PHY, we let the user control the Ethernet speed and duplex. By default, the board uses auto-negotiation, but some users need to set a fixed speed. We therefore store the desired Ethernet speed/duplex as a setting and apply it at boot using the phy_configure_link() function, potentially with the PHY_FLAG_AUTO_NEGOTIATION_DISABLED flag set. When doing this, we get a callback from the PHY that the link state has changed, which is handled in phy_link_state_changed() in eth_stm32_hal_common.c.

In most cases this leaves the board in a very strange state: the link appears up (link LEDs are lit and the PHY reports that all is fine) and Ethernet transmission works, but reception is broken. For example, broadcast packets flow out of the board, but it does not respond to pings or any other data sent to it.

Regression

  • This is a regression.

Steps to reproduce

I don't currently have access to a devkit, so I haven't attempted to reproduce it on one, but I believe these would be the steps required:

  1. Enable CONFIG_NET_SHELL and compile a networking sample that would normally respond to ping
  2. Boot the board connected to a device with auto-negotiate enabled and start a continuous ping
  3. Use the shell to force link speed to 10 Mbps half-duplex:
    net iface set_link 1 no-autoneg 10 h
  4. Observe that the link comes back up, but that the board stops responding to ping.
  5. Somehow rewrite ETH->MACCR with the same value it already has, using a debugger, a custom shell command or similar:
    volatile uint32_t *r = (volatile uint32_t *) 0x40028000;
    *r = *r;
  6. See that the ping resumes.

Replugging the network cable is occasionally enough to get ping responses going again, but in most cases the register rewrite seems to be required. I'm also not sure whether this is an issue only with the TI DP83826 PHY, or whether it also happens with other PHYs.

Relevant log output

// Modified debug output to dump MACCR in eth_stm32_hal_start() and _stop()
[00:00:04.051,000] <dbg> phy_mii: update_link_state: PHY (1) Starting MII PHY auto-negotiate sequence
[00:00:04.151,000] <dbg> phy_mii: check_autonegotiation_completion: PHY (1) auto-negotiate sequence completed
[00:00:04.151,000] <inf> phy_mii: PHY (1) Link speed 100 Mb, full duplex
[00:00:04.151,000] <inf> eth_stm32_hal: Stopping ETH HAL driver, 0200ce00
[00:00:04.151,000] <inf> eth_stm32_hal: Stopped ETH HAL driver, 0200ce00
[00:00:04.151,000] <inf> eth_stm32_hal: Set MAC config 16384, 2048
[00:00:04.154,000] <inf> eth_stm32_hal: Starting ETH HAL driver, 0200ce00
[00:00:04.157,000] <inf> eth_stm32_hal: Started ETH HAL driver, 0200ce0c
// After boot, ping works
uart:~$ net iface set_link 1 no-autoneg 10 h
[00:00:16.025,000] <inf> phy_mii: PHY (1) is down
[00:00:16.025,000] <inf> eth_stm32_hal: Stopping ETH HAL driver, 0200ce0c
[00:00:16.028,000] <inf> eth_stm32_hal: Stopped ETH HAL driver, 0200ce00
[00:00:41.033,000] <inf> phy_mii: PHY (1) Link speed 10 Mb, half duplex
[00:00:41.033,000] <inf> eth_stm32_hal: Stopping ETH HAL driver, 0200ce00
[00:00:41.033,000] <inf> eth_stm32_hal: Stopped ETH HAL driver, 0200ce00
[00:00:41.033,000] <inf> eth_stm32_hal: Set MAC config 0, 0
[00:00:41.035,000] <inf> eth_stm32_hal: Starting ETH HAL driver, 02008600
[00:00:41.039,000] <inf> eth_stm32_hal: Started ETH HAL driver, 0200860c
// Link comes up again, UDP broadcast works, but no ping responses
[00:00:57.042,000] <inf> phy_mii: PHY (1) is down
[00:00:57.042,000] <inf> eth_stm32_hal: Stopping ETH HAL driver, 0200860c
[00:00:57.045,000] <inf> eth_stm32_hal: Stopped ETH HAL driver, 02008600
[00:01:10.048,000] <inf> phy_mii: PHY (1) Link speed 10 Mb, half duplex
[00:01:10.048,000] <inf> eth_stm32_hal: Stopping ETH HAL driver, 02008600
[00:01:10.048,000] <inf> eth_stm32_hal: Stopped ETH HAL driver, 02008600
[00:01:10.048,000] <inf> eth_stm32_hal: Set MAC config 0, 0
[00:01:10.050,000] <inf> eth_stm32_hal: Starting ETH HAL driver, 02008600
[00:01:10.053,000] <inf> eth_stm32_hal: Started ETH HAL driver, 0200860c
// Replugging network cable - didn't help
[00:01:28.057,000] <inf> phy_mii: PHY (1) is down
[00:01:28.057,000] <inf> eth_stm32_hal: Stopping ETH HAL driver, 0200860c
[00:01:28.060,000] <inf> eth_stm32_hal: Stopped ETH HAL driver, 02008600
[00:01:36.562,000] <inf> phy_mii: PHY (1) Link speed 10 Mb, half duplex
[00:01:36.562,000] <inf> eth_stm32_hal: Stopping ETH HAL driver, 02008600
[00:01:36.562,000] <inf> eth_stm32_hal: Stopped ETH HAL driver, 02008600
[00:01:36.562,000] <inf> eth_stm32_hal: Set MAC config 0, 0
[00:01:36.564,000] <inf> eth_stm32_hal: Starting ETH HAL driver, 02008600
[00:01:36.568,000] <inf> eth_stm32_hal: Started ETH HAL driver, 0200860c
// Replugging network cable again - didn't help
[00:01:59.072,000] <inf> phy_mii: PHY (1) is down
[00:01:59.072,000] <inf> eth_stm32_hal: Stopping ETH HAL driver, 0200860c
[00:01:59.076,000] <inf> eth_stm32_hal: Stopped ETH HAL driver, 02008600
[00:02:13.078,000] <inf> phy_mii: PHY (1) Link speed 10 Mb, half duplex
[00:02:13.078,000] <inf> eth_stm32_hal: Stopping ETH HAL driver, 02008600
[00:02:13.078,000] <inf> eth_stm32_hal: Stopped ETH HAL driver, 02008600
[00:02:13.078,000] <inf> eth_stm32_hal: Set MAC config 0, 0
[00:02:13.081,000] <inf> eth_stm32_hal: Starting ETH HAL driver, 02008600
[00:02:13.084,000] <inf> eth_stm32_hal: Started ETH HAL driver, 0200860c
// Reading and rewriting MACCR with a custom shell command
uart:~$ mem read 32 0x40028000
0x40028000: 0x0200860c
uart:~$ mem write 32 0x40028000 0x200860c
Wrote 0x200860c to 0x40028000
// At this point ping resumes
uart:~$
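
To make the MACCR values in the log easier to read, here is a minimal host-side sketch that decodes the four bits relevant to this issue. The bit positions are the STM32F4/F7 ETH_MACCR ones (FES and DM correspond to the 16384 and 2048 printed by the "Set MAC config" lines). The `maccr_fields` struct and the `maccr_decode()` / `maccr_print()` helpers are hypothetical names used for illustration only; they are not part of the driver or the HAL.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* ETH_MACCR bits on STM32F4/F7. FES and DM match the 16384 and 2048
 * printed by the "Set MAC config" log lines. */
#define MACCR_RE  (1u << 2)   /* receiver enable */
#define MACCR_TE  (1u << 3)   /* transmitter enable */
#define MACCR_DM  (1u << 11)  /* duplex mode: 1 = full duplex */
#define MACCR_FES (1u << 14)  /* fast Ethernet speed: 1 = 100 Mb/s */

/* Hypothetical helper type, not part of the driver or the HAL. */
struct maccr_fields {
    bool re;
    bool te;
    bool full_duplex;
    bool speed_100;
};

/* Extract the fields of interest from a raw MACCR value. */
static inline struct maccr_fields maccr_decode(uint32_t maccr)
{
    struct maccr_fields f = {
        .re = (maccr & MACCR_RE) != 0,
        .te = (maccr & MACCR_TE) != 0,
        .full_duplex = (maccr & MACCR_DM) != 0,
        .speed_100 = (maccr & MACCR_FES) != 0,
    };
    return f;
}

/* Print one decoded line per raw value, e.g. for values copied from the log. */
static inline void maccr_print(uint32_t maccr)
{
    struct maccr_fields f = maccr_decode(maccr);

    printf("0x%08lx: RE=%d TE=%d %s Mb/s %s duplex\n",
           (unsigned long)maccr, f.re, f.te,
           f.speed_100 ? "100" : "10",
           f.full_duplex ? "full" : "half");
}
```

Decoding `0x0200860c` this way shows RE and TE both set at 10 Mb/s half duplex, so the register readback looks correct even while reception is broken. That is consistent with the rewrite of an unchanged value being what restores reception.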

Impact

Major – Severely degrades functionality; workaround is difficult or unavailable.

Environment

  • OS: Windows (but likely irrelevant)
  • Toolchain: Zephyr SDK 0.16.5 (but likely irrelevant)
  • Zephyr commit: b08ea66 (v4.3-branch)

Additional Context

After lots of debugging and digging, it seems this is caused by a bug in the HAL. The errata for the STM32F427/429 series describes the following issue:
[Image: screenshot of the erratum text]
(from this)

This is not included in the errata for F7, but both the F4 and F7 HALs have workarounds for this in their HAL_ETH_Start() functions.

Unfortunately, that same workaround is missing after the write to the RE bit in HAL_ETH_Start_IT(), which is what is used as long as CONFIG_ETH_STM32_HAL_API_V2 is enabled (which is the default):
[Image: screenshot of HAL_ETH_Start_IT()]
(from this file)

Looking at the upstream HAL, it seems that this was correct at one point:
[Image: screenshot of the earlier upstream code]
(from here)

But upstream master is currently wrong.

For now, we work around this issue with a west patch like this (changed both F4 and F7, even though we've only tested on F7):

diff --git a/stm32cube/stm32f4xx/drivers/src/stm32f4xx_hal_eth.c b/stm32cube/stm32f4xx/drivers/src/stm32f4xx_hal_eth.c
index 8530e810..090f6336 100644
--- a/stm32cube/stm32f4xx/drivers/src/stm32f4xx_hal_eth.c
+++ b/stm32cube/stm32f4xx/drivers/src/stm32f4xx_hal_eth.c
@@ -804,15 +804,15 @@ HAL_StatusTypeDef HAL_ETH_Start_IT(ETH_HandleTypeDef *heth)
     /* Enable the MAC transmission */
     SET_BIT(heth->Instance->MACCR, ETH_MACCR_TE);
 
+    /* Enable the MAC reception */
+    SET_BIT(heth->Instance->MACCR, ETH_MACCR_RE);
+
     /* Wait until the write operation will be taken into account :
     at least four TX_CLK/RX_CLK clock cycles */
     tmpreg1 = (heth->Instance)->MACCR;
     HAL_Delay(ETH_REG_WRITE_DELAY);
     (heth->Instance)->MACCR = tmpreg1;
 
-    /* Enable the MAC reception */
-    SET_BIT(heth->Instance->MACCR, ETH_MACCR_RE);
-
     /* Enable ETH DMA interrupts:
     - Tx complete interrupt
     - Rx complete interrupt
diff --git a/stm32cube/stm32f7xx/drivers/src/stm32f7xx_hal_eth.c b/stm32cube/stm32f7xx/drivers/src/stm32f7xx_hal_eth.c
index 73f72d1e..808143b5 100644
--- a/stm32cube/stm32f7xx/drivers/src/stm32f7xx_hal_eth.c
+++ b/stm32cube/stm32f7xx/drivers/src/stm32f7xx_hal_eth.c
@@ -804,15 +804,15 @@ HAL_StatusTypeDef HAL_ETH_Start_IT(ETH_HandleTypeDef *heth)
     /* Enable the MAC transmission */
     SET_BIT(heth->Instance->MACCR, ETH_MACCR_TE);
 
+    /* Enable the MAC reception */
+    SET_BIT(heth->Instance->MACCR, ETH_MACCR_RE);
+
     /* Wait until the write operation will be taken into account :
     at least four TX_CLK/RX_CLK clock cycles */
     tmpreg1 = (heth->Instance)->MACCR;
     HAL_Delay(ETH_REG_WRITE_DELAY);
     (heth->Instance)->MACCR = tmpreg1;
 
-    /* Enable the MAC reception */
-    SET_BIT(heth->Instance->MACCR, ETH_MACCR_RE);
-
     /* Enable ETH DMA interrupts:
     - Tx complete interrupt
     - Rx complete interrupt

(According to the suggested workaround in the errata, there should be no need to wait twice, and waiting just once seems to work in practice, but if some ST employee knows otherwise, I'm all ears.)
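
The effect of the reordering in the patch can be illustrated with a small host-side toy model. This is only a sketch under an assumption drawn from the observations in this issue: a write that follows another write to the same register too closely becomes visible on readback, but is not picked up by the MAC. The `toy_mac` struct, its helpers and the one-cycle-per-access timing are hypothetical and do not model the real peripheral or the HAL.

```c
#include <stdbool.h>
#include <stdint.h>

#define MACCR_RE      (1u << 2)  /* receiver enable */
#define MACCR_TE      (1u << 3)  /* transmitter enable */
#define MIN_WRITE_GAP 4u         /* TX_CLK/RX_CLK cycles, per the HAL comment */

/* Hypothetical toy model of one register, not the real peripheral. */
struct toy_mac {
    uint32_t visible;     /* value returned by a register read */
    uint32_t effective;   /* value the MAC logic actually uses */
    uint32_t now;         /* current time in clock cycles */
    uint32_t last_write;  /* time of the most recent write */
    bool written;         /* true once the register has been written */
};

static inline uint32_t toy_read(struct toy_mac *m)
{
    m->now++;  /* assumption: every access costs one cycle */
    return m->visible;
}

static inline void toy_write(struct toy_mac *m, uint32_t value)
{
    bool too_soon = m->written && (m->now - m->last_write) < MIN_WRITE_GAP;

    m->visible = value;          /* readback always shows the new value */
    if (!too_soon) {
        m->effective = value;    /* only a well-spaced write takes effect */
    }
    m->last_write = m->now;
    m->written = true;
    m->now++;
}

static inline void toy_delay(struct toy_mac *m, uint32_t cycles)
{
    m->now += cycles;
}

/* Write order before the patch: TE, readback/delay/rewrite, then RE. */
static inline bool start_unpatched(struct toy_mac *m)
{
    toy_write(m, toy_read(m) | MACCR_TE);
    uint32_t tmp = toy_read(m);
    toy_delay(m, MIN_WRITE_GAP);
    toy_write(m, tmp);
    toy_write(m, toy_read(m) | MACCR_RE);  /* nothing re-applies this write */
    return (m->effective & MACCR_RE) != 0;
}

/* Write order after the patch: TE, RE, then readback/delay/rewrite. */
static inline bool start_patched(struct toy_mac *m)
{
    toy_write(m, toy_read(m) | MACCR_TE);
    toy_write(m, toy_read(m) | MACCR_RE);
    uint32_t tmp = toy_read(m);
    toy_delay(m, MIN_WRITE_GAP);
    toy_write(m, tmp);                     /* re-applies both TE and RE */
    return (m->effective & MACCR_RE) != 0;
}
```

In this model, `start_unpatched()` on a zero-initialized `toy_mac` leaves RE visible but not effective, which matches the `0x0200860c` readback with broken reception, while `start_patched()` ends with RE effective after a single delay.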

It would be good to have this fixed properly, but I'm a bit unsure what steps I should take. Should I create a PR against zephyrproject-rtos/hal_stm32 or against STMicroelectronics/stm32f7xx-hal-driver?

(It also seems as if this should have been mentioned in the F7 errata, but I suppose this needs to be reported through an ST support request.)
