Skip to content

Conversation

@vortigont
Copy link

use of lwip core locking is preferable (if enabled globally) then using context switching callbacks

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes TCP operations for systems with CONFIG_LWIP_TCPIP_CORE_LOCKING enabled by implementing direct LwIP core locking instead of context switching through tcpip_api_call. The change aims to improve performance by reducing overhead when core locking is available.

Key changes:

  • Updated the CONFIG_LWIP_TCPIP_CORE_LOCKING macro check to verify both definition and value
  • Added direct implementations for TCP operations (_tcp_output, _tcp_write, _tcp_recved, _tcp_close, _tcp_abort, _tcp_connect, _tcp_bind, _tcp_listen_with_backlog) using tcp_core_guard when core locking is enabled
  • Changed function linkage from static to inline for wrapper functions to enable inlining optimizations
  • Marked context-switching callback functions with [[maybe_unused]] since they're only called conditionally
  • Minor formatting improvements to comment blocks

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// https://github.com/espressif/arduino-esp32/issues/10526
namespace {
#ifdef CONFIG_LWIP_TCPIP_CORE_LOCKING
#if defined CONFIG_LWIP_TCPIP_CORE_LOCKING && CONFIG_LWIP_TCPIP_CORE_LOCKING
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it to avoid using that if -DCONFIG_LWIP_TCPIP_CORE_LOCKING=0 is set ?
If so, should it be instead more clear and be: #if CONFIG_LWIP_TCPIP_CORE_LOCKING == 1 ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not found any guidance if such a case could be used when CONFIG_LWIP_TCPIP_CORE_LOCKING=0,
in IDF examples I found only when it's defined as 1, but checks are exist to ensure it's true, quite confusing.
So, yes, it's to check it that it's not just defined, but defined as true. You think == 1 would be more appropriate here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I would stick with what Arduino Core and ESP-IDF do. They would fail way before than us. And in their code they only use:

#ifdef CONFIG_LWIP_TCPIP_CORE_LOCKING and in esp-idf: #if CONFIG_LWIP_TCPIP_CORE_LOCKING

So maybe just #if CONFIG_LWIP_TCPIP_CORE_LOCKING ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think #if CONFIG_LWIP_TCPIP_CORE_LOCKING will fail on non esp32 platforms, this CONFIG_ part belongs to IDF.
Maybe check LWIP_TCPIP_CORE_LOCKING itself? I do not have Libretiny's at hand, but will try to check what's there in those SDK's. Any clue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have Libretiny's at hand,

neither I...

So maybe it is better to leave it like you did to handle all the combinations...

Or we do like arduino: ifdef CONFIG_LWIP_TCPIP_CORE_LOCKING, but this opens the door to unexpected behavior like you mentioned with CONFIG_LWIP_TCPIP_CORE_LOCKING=0

Probably better to stick with what you did...

@mathieucarbou
Copy link
Member

@vortigont : I think this change is a good idea.

Just adding that as a ref:

cpip_api_call()

Synchronously calls function in TCPIP thread and waits for its completion. It is recommended to use LWIP_TCPIP_CORE_LOCKING (preferred) or LWIP_NETCONN_SEM_PER_THREAD. If not, a semaphore is created and destroyed on every call which is usually an expensive/slow operation.

Comment on lines +581 to +589
#if defined CONFIG_LWIP_TCPIP_CORE_LOCKING && CONFIG_LWIP_TCPIP_CORE_LOCKING
tcp_core_guard tcg;
return tcp_output(*pcb);
#else // CONFIG_LWIP_TCPIP_CORE_LOCKING
tcp_api_call_t msg;
msg.pcb = pcb;
tcpip_api_call(_tcp_output_api, (struct tcpip_api_call_data *)&msg);
return msg.err;
#endif // CONFIG_LWIP_TCPIP_CORE_LOCKING
Copy link
Member

@mathieucarbou mathieucarbou Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vortigont @willmmiles : do you think we could go even further with this cleanup and only use LWIP_TCPIP_CORE_LOCKING and force it if not defined, which means we even remove the macros and always use the guard ? and remove the usage of tcpip_api_call ?

According to lwip doc, my understanding is that we have the choice...

What I do not know if what is happening if we force locking and do not go through the context switching API, and another lib which is using lwip also does not use global locking but instead go through tcpip_api_call.

Locking would probably still work on Arduino 2 I suppose (I could test) ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for Arduino lwip is pre-build, so I do not thinks we could enforce it for older core, but need to check, yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for Arduino lwip is pre-build, so I do not thinks we could enforce it for older core, but need to check, yes.

yes that was my point (pre-build)... LWIP_TCPIP_CORE_LOCKING was not used before so wondering what is happening if we force it whatever the platform, but we have arduino 2 internals (like sntp for ntp time or asyncudp lib) using tcpip_api_call.

AsyncTCP is also used with Libretiny also.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining LWIP_TCPIP_CORE_LOCKING affects compiling LwIP's own sources. We have to use whatever setting it was compiled with -- it's not a decision made on the outside.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying that up for me :-)

use of lwip core locking is preferable (if enabled globaly) then using context switching callbacks
Copy link

@willmmiles willmmiles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this adds a pile of extra #ifdefs and duplication for no advantages. When LWIP_TCP_CORE_LOCKING is enabled, tcp_api_call() uses the lock and does not switch contexts. See https://github.com/lwip-tcpip/lwip/blob/4599f551dead9eac233b91c0b9ee5879f5d0620a/src/api/tcpip.c#L489

I would vote against this change.

@mathieucarbou
Copy link
Member

I think this adds a pile of extra #ifdefs and duplication for no advantages. When LWIP_TCP_CORE_LOCKING is enabled, tcp_api_call() uses the lock and does not switch contexts. See https://github.com/lwip-tcpip/lwip/blob/4599f551dead9eac233b91c0b9ee5879f5d0620a/src/api/tcpip.c#L489

I would vote against this change.

Thanks Will for pointing at the code. Like we say, the code never lies ;-)

I had a wrong understanding with the lwip API I posted in my comment above (#99 (comment)). My understanding was that if we do not use LWIP_TCPIP_CORE_LOCKING, a semaphore is used.

My understanding was that we had the choice, but as you pointed out, we have to stick with how lwip was compiled so we cannot forcibly use LWIP_TCPIP_CORE_LOCKING (which was my thinking at first - that we had a way to bypass this semaphore cost).

Thanks for the clarification!

@vortigont : so maybe we can close your PR ?

@willmmiles
Copy link

I had a wrong understanding with the lwip API I posted in my comment above (#99 (comment)). My understanding was that if we do not use LWIP_TCPIP_CORE_LOCKING, a semaphore is used.

Well, that is more-or-less correct -- if LWIP_TCPIP_CORE_LOCKING is NOT defined, tcp_api_call() uses a mailbox semaphore to pass the callback to LwIP's own execution context (on FreeRTOS, its task), and wait for the result. The design intent is that no other task is permitted to call LwIP functions, so the core itself does not have to deal with any internal locking. This can be a performance improvement on non-preemptive (or just non-) multitasking systems -- or also if a given program doesn't expect to do many calls in other contexts.

If LWIP_TCPIP_CORE_LOCKING IS defined, then a mutex is available for protecting LwIP's internals; so tcp_api_call() becomes a wrapper that acquires the mutex and immediately calls your function without a context switch. (This is also a particularly helpful API for C code, where there's no easy way to have a scoped mutex guard; it's up to the programmer to ensure that every exit path releases the lock.)

I do agree that - given that one of AsyncTCP's key purposes is to isolate user callbacks from the LwIP task - from a practical perspective, LWIP_TCPIP_CORE_LOCKING will almost always be a performance improvement for us. We can definitely recommend it in the documentation.

Lastly: it is technically true that marshalling the call state in to a struct on the stack so as to make a tcp_api_call() is extra work for the CPU given LWIP_TCPIP_CORE_LOCKING; there's potentially a tiny performance advantage to managing the mutex ourselves. From a practical perspective, I think the maintainence burden of duplicating the call code is worse than that cost. With that in mind, given that LWIP_TCPIP_CORE_LOCKING should generally offer better performance, we could organize the code to go the other way around: let the LWIP_TCPIP_CORE_LOCKING implementation be the "main" implementation, and pass through tcp_api_call() only when required. This could maybe be done with some macros and lambda magic.

#ifdef LWIP_TCPIP_CORE_LOCKING

#define LWIP_LOCKED_SECTION_BEGIN esp_err_t err = ERR_OK;  tcp_core_guard tcg; {
#define LWIP_LOCKED_SECTION_END }; return err;

#else

template<typename T> void call_through_pointer(T* p) { return p->operator()(); };

#define LWIP_LOCKED_SECTION_BEGIN esp_err_t err  = ERR_OK; auto f_lambda = [=,&err](){
#define LWIP_LOCKED_SECTION_END }; tcpip_api_call(&call_through_pointer<decltype(f_lambda)>, (void*) &f_lambda ); return err;

#endif

esp_err_t _tcp_close(tcp_pcb **pcb, AsyncClient *client) {
    LWIP_LOCKED_SECTION_BEGIN 
    if (pcb && *pcb) {
      _reset_tcp_callbacks(*pcb, client);
      if (tcp_close(*pcb) != ERR_OK) {
        // We do not permit failure here: abandon the pcb anyways.
        tcp_abort(*pcb);
        err = ERR_ABRT;   // if we called tcp_abort here, then we must return ERR_ABRT
      }
      *pcb = nullptr;     // PCB is now the property of LwIP
    } else {
      // Ensure there is not an error event queued for this client
      err = _remove_events_for_client(client) ? ERR_OK : ERR_CONN;
    }
    LWIP_LOCKED_SECTION_END
}

(Note: totally untested and the 'err' logic is messy...)

@vortigont
Copy link
Author

thanks @mathieucarbou @willmmiles for your feedback, I missed that shortcut in LWIP. I think that's why it is advised to use locks directly. In that case let's stick to the old callback API, though it looks very confusing and cumbersome to prepare for a callback that is in fact executed immediately.
Let's leave those macro tricks for later exercise. Personally I do not like it much, it looks cool but it drives intellisense and syntax highlighter crazy and are more way harder to read then #ifdefs.

@vortigont vortigont closed this Nov 6, 2025
@willmmiles
Copy link

though it looks very confusing and cumbersome to prepare for a callback that is in fact executed immediately.
Let's leave those macro tricks for later exercise. Personally I do not like it much, it looks cool but it drives intellisense and syntax highlighter crazy and are more way harder to read then #ifdefs.

Totally agree that the extra marshalling for the callback API is cumbersome and a waste. That said, personally I'll take macro tricks over code copy-and-paste 9 times out of 10, especially when there's conditional compilation involved -- I've been burned too many times by forgetting to make some change the other copy that's not my active test case. Double especially for a concurrency correctness cases like this code, where being wrong will still "work" 99.9% of the time and then randomly explode after hours of uptime.

That said: I'm sure there's a cleaner way to wrap up that code so that it doesn't confuse the analysis engines quite as much; I only spent a few minutes trying to concoct something. One approach would be to always wrap the lock scope portion in a lambda: a sufficiently smart compiler could unwrap and inline it for the lock case. (Though that's definitely a check-with-godbolt problem...)

#ifdef LWIP_TCPIP_CORE_LOCKING

template<typename T>
static inline esp_err_t call_with_lwip_lock(T& func) {  tcp_core_guard tcg; return func(); }

#else

template<typename T> void call_through_pointer(T* p) { return p->operator()(); };
template<typename T>
static inline esp_err_t call_with_lwip_lock(T& func) {  
    esp_err_t err = ERR_OK;
    auto outer_lambda = [&]() { err = func(); };
    tcp_api_call(&call_through_pointer<decltype(outer_lambda)>, &outer_lambda);
    return err;
}

#endif

esp_err_t _tcp_close(tcp_pcb **pcb, AsyncClient *client) {
    auto do_close = [=]() {
        if (pcb && *pcb) {
            esp_err_t err = ERR_OK;
          _reset_tcp_callbacks(*pcb, client);
          if (tcp_close(*pcb) != ERR_OK) {
            // We do not permit failure here: abandon the pcb anyways.
            tcp_abort(*pcb);
            err = ERR_ABRT;   // if we called tcp_abort here, then we must return ERR_ABRT
          }
          *pcb = nullptr;     // PCB is now the property of LwIP
          return err;
        } else {
          // Ensure there is not an error event queued for this client
          return _remove_events_for_client(client) ? ERR_OK : ERR_CONN;
        }
    };

    return call_with_lwip_lock(do_close);
}

@vortigont
Copy link
Author

vortigont commented Nov 7, 2025

One approach would be to always wrap the lock scope portion in a lambda: a sufficiently smart compiler could unwrap and inline it for the lock case. (Though that's definitely a check-with-godbolt problem...)

yeah, sounds reasonable! I'll try to invest some time on this a bit later, we'll definitely find some feasible way!
Now I'm working on half-closed connections so will do this "old way" first then get back to this locking enhancement.
Thanks for your valuable hints, @willmmiles!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants