[BUG] RP2350 ostest stuck when SMP enabled #16133

Open
@avgoor

Description / Steps to reproduce the issue

A Raspberry Pi Pico 2 W board (RP2350 MCU) hangs in ostest during the nested signal handler test when SMP is enabled. According to the traces, both CPUs loop forever inside the spin_lock_notrace function, waiting on g_cpu_irqlock.
The traces I managed to collect:

info threads
Index Tid  Pid  Cpu  Thread                Info                                                                             Frame
 0    0    0    0 '\000' Thread 0x20003d08     (Name: CPU0 IDLE, State: Assigned, Priority: 0, Stack: 1008) 0x100105b2  up_idle() at chip/rp23xx_idle.c:94
 1    1    0    1 '\001' Thread 0x20003dd8     (Name: CPU1 IDLE, State: Assigned, Priority: 0, Stack: 1008) 0x100105b2  up_idle() at chip/rp23xx_idle.c:94
 2    2    2    0 '\000' Thread 0x20006698     (Name: nsh_main, State: Waiting,Semaphore, Priority: 100, Stack: 2008) 0x10005086        nxsem_wait_slow() at semaphore/sem_wait.c:207
 12   12   12   0 '\000' Thread 0x20007880     (Name: ostest, State: Waiting,Semaphore, Priority: 100, Stack: 2016) 0x10005086  nxsem_wait_slow() at semaphore/sem_wait.c:207
 13   13   13   1 '\001' Thread 0x200084e8     (Name: ostest, State: Assigned, Priority: 100, Stack: 8120)  No symbol with pc
*14   62   13   1 '\001' Thread 0x2000a9d8     (Name: ostest, State: Running, Priority: 101, Stack: 8176)   0x1000290c  enter_critical_section_wo_note() at include/nuttx/spinlock.h:199
*15   63   13   0 '\000' Thread 0x2000aab8     (Name: ostest, State: Running, Priority: 102, Stack: 8176)   0x1000290c  enter_critical_section_wo_note() at include/nuttx/spinlock.h:199

bt
#0  0x1000290c in spin_lock_notrace (lock=0x200040c8 <g_cpu_irqlock> "\001") at include/nuttx/spinlock.h:199
#1  enter_critical_section_wo_note () at irq/irq_csection.c:183
#2  0x1000c754 in uart_xmitchars (dev=0x2000121c <g_uart0port>) at serial/serial_io.c:62
#3  0x10000e54 in up_interrupt (irq=49, context=0x0, arg=0x2000121c <g_uart0port>) at chip/rp23xx_serial.c:617
#4  0x10002836 in irq_dispatch (irq=49, context=0x0) at irq/irq_dispatch.c:144
#5  0x10001b64 in exception_direct () at armv8-m/arm_doirq.c:62
#6  <signal handler called>
#7  spin_lock_notrace (lock=0x200040c8 <g_cpu_irqlock> "\001") at include/nuttx/spinlock.h:199
#8  enter_critical_section_wo_note () at irq/irq_csection.c:234
#9  0x10005376 in nxsig_deliver (stcb=0x2000aab8) at signal/sig_deliver.c:178
#10 0x10001e9e in arm_sigdeliver () at armv8-m/arm_sigdeliver.c:107
#11 0x10005fb8 in nxsched_remove_self (tcb=0x40) at sched/sched_removereadytorun.c:280
#12 0x00000000 in ?? ()

list
194     {
195     #ifdef CONFIG_TICKET_SPINLOCK
196       int ticket = atomic_fetch_add(&lock->next, 1);
197       while (atomic_read(&lock->owner) != ticket)
198     #else /* CONFIG_TICKET_SPINLOCK */
199       while (up_testset(lock) == SP_LOCKED)
200     #endif
201         {
202           UP_DSB();
203           UP_WFE();

info args
lock = 0x200040c8 <g_cpu_irqlock> "\001"
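
For readers unfamiliar with the loop at spinlock.h:199, here is a minimal C11 sketch of a test-and-set spin loop. It is not the NuttX implementation: every `sketch_` name is hypothetical, `atomic_exchange` stands in for `up_testset`, and the spin bound exists only so the example terminates (the real loop has none). It does not identify the root cause of this issue. It only illustrates why the loop never exits while the lock value stays "\001" and nobody releases it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the names seen in the listing above. */
#define SKETCH_SP_UNLOCKED 0
#define SKETCH_SP_LOCKED   1

typedef atomic_int sketch_spinlock_t;

/* Atomically store LOCKED and return the previous value. The loop at
 * spinlock.h:199 relies on this kind of primitive (assumption: the real
 * up_testset is architecture specific and differs in detail).
 */
static int sketch_testset(sketch_spinlock_t *lock)
{
  return atomic_exchange(lock, SKETCH_SP_LOCKED);
}

/* Bounded variant of the spin loop: returns true once the lock is taken,
 * false after max_spins failed attempts.
 */
static bool sketch_spin_lock_bounded(sketch_spinlock_t *lock,
                                     unsigned max_spins)
{
  for (unsigned i = 0; i < max_spins; i++)
    {
      if (sketch_testset(lock) != SKETCH_SP_LOCKED)
        {
          return true;
        }
    }

  return false;
}

static void sketch_spin_unlock(sketch_spinlock_t *lock)
{
  atomic_store(lock, SKETCH_SP_UNLOCKED);
}

/* Take the lock once, then try again without releasing it. Returns true
 * when the first attempt succeeds and the second one gives up, which is
 * the "spins forever" case in bounded form.
 */
static bool sketch_second_acquire_spins(void)
{
  sketch_spinlock_t lock;
  atomic_init(&lock, SKETCH_SP_UNLOCKED);

  bool first  = sketch_spin_lock_bounded(&lock, 1000);
  bool second = sketch_spin_lock_bounded(&lock, 1000);

  sketch_spin_unlock(&lock);
  return first && !second;
}
```

Calling `sketch_second_acquire_spins()` from a small `main` shows the second attempt giving up while the lock is still held. In the backtrace above, frames #0 and #7 are both at this kind of loop on the same lock address.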

Additional facts:

  • console to the board is connected via UART0
  • the issue reproduces 100% of the time when running the ostest utility
  • the smp utility runs without problems, no issues found
  • the issue does not reproduce on the older RP2040 MCU (different ARM cores)
  • the issue does not reproduce with CONFIG_SMP_NCPUS=1, even though SMP is enabled
  • the issue reproduces even when RP23XX_TESTSET_SPINLOCK is changed from 0 to 31 (see the RP2350-E2 erratum)
  • the issue reproduces with today's master

The output of the ostest utility is often partially cut off:

...
user_main: nested signal handler test
signest_test: Starting signal waiter task at priority 101
signest_test: Started waiter_main pid=62
waiter_main: Waiter started
signest_test: Starting interfering task at priority 102
waiter_main: Setting signal mask
interfere_main: Waiting on semaphore
waiter_main: Registering signal handler
signest_test: Started interfere_main pid=63
waiter_main: Waiting on semaphore
signest_test: Simple case:
  Total signalled

On which OS does this issue occur?

[OS: Linux]

What is the version of your OS?

ArchLinux, Debian

NuttX Version

master

Issue Architecture

[Arch: arm]

Issue Area

[Area: Kernel]

Host information

I use two build environments; the issue reproduces 100% of the time in both:
1: x86_64 PC with ArchLinux and the arm-none-eabi-* embedded toolchain
2: aarch64 VM with Debian and the arm-none-eabi-* embedded toolchain

Verification

  • I have verified before submitting the report.

Labels: Arch: arm, Area: Kernel, OS: Linux, Type: Bug
