[Bug] Crash of client during clean-up in the reconnection mechanism

### Describe the bug

We see two types of crashes (hardfaults) during reconnection.
1) Hardfault in the publishing thread.
```
>>> bt
#0  exception_common () at NuttX/nuttx/arch/arm/src/armv7-m/gnu/arm_exception.S:144
#1  <signal handler called>
#2  0x08121bdc in _z_vec_get (v=0x30002798, i=0) at zenoh-pico/src/collections/vec.c:114
#3  0x081370b4 in _z_iosli_vec_get (v=0x30002798, pos=0) at zenoh-pico/include/zenoh-pico/protocol/iobuf.h:68
#4  0x0813781e in _z_wbuf_get_iosli (wbf=0x30002798, idx=0) at zenoh-pico/src/protocol/iobuf.c:287
#5  0x08137978 in _z_wbuf_write (wbf=0x30002798, b=37 '%') at zenoh-pico/src/protocol/iobuf.c:384
#6  0x08135fbc in _z_transport_message_encode (wbf=0x30002798, msg=0x30010b7c) at zenoh-pico/src/protocol/codec/transport.c:587
#7  0x0812ac90 in _z_transport_tx_send_n_msg_inner (ztc=0x30002728, n_msg=0x30010d38, reliability=Z_RELIABILITY_RELIABLE, peers=0x0) at zenoh-pico/src/transport/common/tx.c:227
#8  0x0812ae6c in _z_transport_tx_send_n_msg (ztc=0x30002728, n_msg=0x30010d38, reliability=Z_RELIABILITY_RELIABLE, cong_ctrl=Z_CONGESTION_CONTROL_BLOCK, peers=0x0) at zenoh-pico/src/transport/common/tx.c:294
#9  0x0812b168 in _z_send_n_msg (zn=0x30002710, z_msg=0x30010d38, reliability=Z_RELIABILITY_RELIABLE, cong_ctrl=Z_CONGESTION_CONTROL_BLOCK, peer=0x0) at zenoh-pico/src/transport/common/tx.c:473
#10 0x08122c58 in _z_write (zn=0x30002710, keyexpr=..., payload=..., encoding=0x30010fd4, kind=Z_SAMPLE_KIND_PUT, cong_ctrl=Z_CONGESTION_CONTROL_BLOCK, priority=Z_PRIORITY_INTERACTIVE_HIGH, is_express=false, timestamp=0x0, attachment=..., reliability=Z_RELIABILITY_RELIABLE, source_info=0x0) at zenoh-pico/src/net/primitives.c:246
#11 0x0811fbb2 in z_publisher_put (pub=0x2407c480, payload=0x30011028, options=0x0) at zenoh-pico/src/api/api.c:1135
```
The hardfault occurs because `v->_val` is `0x0` when `_z_vec_get` evaluates `v->_val[i]`.

2) Hardfault in the read task.
```
>>> bt
exception_common@0x080202a8 (nuttx/arch/arm/src/armv7-m/gnu/arm_exception.S:144)
<signal handler called>@0xffffffe9 (Unknown Source:0)
file_socket@0x08036f14 (nuttx/fs/socket/socket.c:192)
sockfd_socket@0x08036f38 (nuttx/fs/socket/socket.c:209)
recvfrom@0x0803dc10 (nuttx/net/socket/recvfrom.c:207)
_z_read_udp_unicast@0x081315ac (zenoh-pico/src/system/unix/network.c:386)
_z_f_link_udp_read_socket@0x08133cf6 (zenoh-pico/src/link/unicast/udp.c:175)
_z_link_socket_recv_zbuf@0x08132e62 (zenoh-pico/src/link/link.c:169)
_z_unicast_client_read@0x0812f90e (zenoh-pico/src/transport/unicast/read.c:128)
_zp_unicast_read_task@0x0812f98a (zenoh-pico/src/transport/unicast/read.c:353)
pthread_startup@0x08030d8e (nuttx/libs/libc/pthread/pthread_create.c:59)
```
The hardfault occurs because the inode has already been cleared (`f_inode == 0x0`) when `file_socket` accesses `f_inode->i_flags`.

### Workaround
For now, we avoid hardfault 1) by synchronizing our publisher threads with `_z_common_transport_clear`,
and hardfault 2) by using `pthread_join` on the read task instead of `pthread_detach` during reconnection clean-up.
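
To illustrate the idea behind both workarounds, here is a minimal, self-contained sketch using plain pthreads. It is not our actual patch and does not use any zenoh-pico code: every `demo_*` name is hypothetical, the `buf` pointer only stands in for the write buffer that ends up as `0x0`, and `fd` only stands in for the socket. It assumes the publisher and the clean-up path can share one mutex, and that the read task can be asked to stop via a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are zenoh-pico types or functions. */
typedef struct {
    pthread_mutex_t tx_mutex; /* guards buf and len */
    uint8_t *buf;             /* NULL once the transport has been cleared */
    size_t len, cap;
} demo_transport_t;

typedef struct {
    atomic_bool running; /* cleared to ask the read task to stop */
    int fd;              /* stand-in for the socket; -1 means closed */
    bool saw_closed;     /* set if the task ever observed a closed socket */
} demo_reader_t;

static void demo_transport_init(demo_transport_t *t, size_t cap) {
    assert(cap > 0);
    pthread_mutex_init(&t->tx_mutex, NULL);
    t->buf = malloc(cap);
    t->len = 0;
    t->cap = cap;
}

/* Workaround 1, publisher side: check and write under the same lock. */
static bool demo_publish(demo_transport_t *t, uint8_t b) {
    bool ok = false;
    pthread_mutex_lock(&t->tx_mutex);
    if (t->buf != NULL) {
        if (t->len == t->cap) t->len = 0; /* pretend the batch was flushed */
        t->buf[t->len++] = b;
        ok = true;
    }
    pthread_mutex_unlock(&t->tx_mutex);
    return ok;
}

/* Workaround 1, clean-up side: release the buffer under the same lock. */
static void demo_transport_clear(demo_transport_t *t) {
    pthread_mutex_lock(&t->tx_mutex);
    free(t->buf);
    t->buf = NULL;
    t->len = 0;
    pthread_mutex_unlock(&t->tx_mutex);
}

static void *demo_publisher_task(void *arg) {
    for (int i = 0; i < 10000; i++) (void)demo_publish(arg, (uint8_t)i);
    return NULL;
}

static void *demo_read_task(void *arg) {
    demo_reader_t *r = arg;
    while (atomic_load(&r->running)) {
        if (r->fd < 0) r->saw_closed = true; /* would be the faulting access */
        sched_yield();
    }
    return NULL;
}

/* Runs one simulated reconnection clean-up; returns 0 if nothing misbehaved. */
int demo_run(void) {
    demo_transport_t t;
    demo_reader_t r;
    pthread_t pub, rd;

    demo_transport_init(&t, 64);
    atomic_init(&r.running, true);
    r.fd = 3;
    r.saw_closed = false;
    if (pthread_create(&pub, NULL, demo_publisher_task, &t) != 0) return -1;
    if (pthread_create(&rd, NULL, demo_read_task, &r) != 0) {
        pthread_join(pub, NULL);
        return -1;
    }

    demo_transport_clear(&t);        /* publishers may still be running */
    atomic_store(&r.running, false); /* ask the read task to stop */
    pthread_join(rd, NULL);          /* workaround 2: join, do not detach */
    r.fd = -1;                       /* only now release the socket */

    pthread_join(pub, NULL);
    bool wrote_after_clear = demo_publish(&t, 0);
    pthread_mutex_destroy(&t.tx_mutex);
    return (wrote_after_clear ? 1 : 0) + (r.saw_closed ? 1 : 0);
}
```

Calling `demo_run()` from a `main` and checking for a return value of `0` exercises both paths. The important ordering is that the socket is released only after `pthread_join` returns; with `pthread_detach`, nothing guarantees that the read task has left its receive call before the clean-up continues.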



### To reproduce

1. Publish and receive data on a Zenoh-Pico client with `Z_FEATURE_AUTO_RECONNECT` enabled.
2. Disconnect from `zenohd` (e.g. by restarting the router process or unplugging the Ethernet cable).
3. Repeat step 2 until a hardfault is triggered.

**Note:**
This is not always reproducible; a hardfault occurs in roughly 1 in 15 attempts.
We run several publishing threads.
The problem is easier to reproduce if the publishing threads have higher priority than the lease task.


### System info

- STM32H7
- Zenoh-Pico (1.4.0) on NuttX (Unix)
- Configuration: Client mode over UDP unicast with `Z_FEATURE_AUTO_RECONNECT` enabled.
