fix: refuse to start a second lemond when the port is already in use (#2255) by siavashhub · Pull Request #2258 · lemonade-sdk/lemonade

siavashhub · 2026-06-15T21:21:30Z

Summary

Starting a second lemond on a port that's already in use (e.g. 13305) used to silently succeed. The second instance now detects the conflict, prints an error, and exits with a non-zero code.

Root cause

The front HTTP listeners inherited cpp-httplib's default_socket_options, which enable address/port reuse:

Linux/macOS → SO_REUSEPORT: multiple sockets may bind the same host:port and the kernel load-balances connections across them.
Windows → SO_REUSEADDR: the second socket can hijack the port.

Because the bind appeared to succeed, the existing duplicate-instance guard in the run loop never fired and the process kept running as a confusing second listener.
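The root cause can be reproduced with a minimal POSIX-only C++ sketch. This is not lemond code: bind_loopback() and bound_port() are hypothetical helpers, and the Windows SO_REUSEADDR case is not covered. The sketch assumes Linux or macOS semantics, under which two listening sockets that both set SO_REUSEPORT can bind the same 127.0.0.1 port. That is exactly the "second instance silently succeeds" symptom. A socket without SO_REUSEPORT gets EADDRINUSE instead.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstdint>

// Hypothetical helper: create a TCP socket, optionally set SO_REUSEPORT (as the
// PR describes cpp-httplib's defaults doing on Linux/macOS), then bind and
// listen on 127.0.0.1:port. Returns the fd, or -1 with errno set.
int bind_loopback(uint16_t port, bool reuseport) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  int one = 1;
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  if ((reuseport &&
       setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) != 0) ||
      bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      listen(fd, 16) != 0) {
    int saved = errno;  // close() may overwrite errno
    close(fd);
    errno = saved;
    return -1;
  }
  return fd;
}

// Port actually assigned to a bound socket (useful after binding port 0).
uint16_t bound_port(int fd) {
  sockaddr_in addr{};
  socklen_t len = sizeof(addr);
  if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) return 0;
  return ntohs(addr.sin_port);
}
```

On Linux, this port sharing is only permitted between sockets created under the same effective user ID. Two lemond processes started by the same user are therefore exactly the case that slipped through.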

Changes

Server (core fix)

Exclusive bind: A duplicate bind now fails with EADDRINUSE.
Early, readable failure: Added a preflight bind probe at the top of Server::run() that detects an in-use port before the WebSocket server starts.
Fast exit: On a port conflict, lemond writes the error to stderr and exits immediately via std::_Exit(1) (gated by a new startup_failed() flag). This skips the multi-second destructor teardown, so the error stays the last visible line and the process returns a non-zero exit code for scripting.

Files: src/cpp/server/server.cpp, src/cpp/include/lemon/server.h, src/cpp/server/main.cpp.
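A preflight probe of the kind described for Server::run() could look roughly like the sketch below. It is hypothetical POSIX code, not the contents of server.cpp, and port_is_available() here accepts only an IPv4 literal. Setting SO_REUSEADDR on the probe is an assumption. It is meant to tolerate connections from a just-stopped instance that are still lingering in TIME_WAIT. On Linux, SO_REUSEADDR does not let a bind succeed while another socket is listening on that port, so a live lemond still makes bind() fail with EADDRINUSE. The probe must not set SO_REUSEPORT, because it would then share the port with the running instance instead of detecting it. exit_port_in_use() shows the fast-exit path with std::_Exit(1).

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical preflight probe: true if host (an IPv4 literal) and port can be
// bound right now. The probe socket is closed immediately afterwards.
bool port_is_available(const std::string& host, uint16_t port) {
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  if (inet_pton(AF_INET, host.c_str(), &addr.sin_addr) != 1) return false;

  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return false;
  int one = 1;
  // Assumption: SO_REUSEADDR so TIME_WAIT leftovers do not count as "in use".
  // Deliberately no SO_REUSEPORT, which would let the probe share a live port.
  setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
  bool ok = bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;
  close(fd);
  return ok;
}

// Fast-exit path: report the conflict, then terminate without running
// destructors or atexit handlers, so this stays the last line printed.
[[noreturn]] void exit_port_in_use(const std::string& host, uint16_t port) {
  std::fprintf(stderr,
               "[Server] ERROR: Port %u on %s is already in use. Another "
               "Lemonade server (lemond) is likely already running on this "
               "port. This instance will now exit.\n",
               static_cast<unsigned>(port), host.c_str());
  std::fflush(stderr);  // _Exit is not guaranteed to flush stdio streams
  std::_Exit(1);
}
```

Because the probe socket is closed before the real listeners bind, this check on its own still leaves a small window in which two simultaneous starts can both pass.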

CI pipeline (required by the stricter bind)

Making lemond refuse an in-use port means the non-containerized self-hosted jobs could now hard-fail: they share the host network, so a concurrent job may already hold the default port 13305.

.github/actions/install-lemonade-deb: new optional port input (default 13305, or auto). In auto mode it picks a free loopback port, retries on a fresh port if it loses the pick→bind race, and exports LEMONADE_PORT / LEMONADE_TEST_PORT so the downstream test steps and the lemonade CLI target the resolved port.
.github/workflows/cpp_server_build_test_release.yml: the self-hosted jobs now pass port: auto to the action. (The containerized test-cli-endpoints-linux job keeps the default port — it has network isolation.)
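The action's auto mode is only described at a high level here, and its actual implementation is not shown. The hedged C++ illustration below follows the same idea. pick_free_loopback_port() and start_on_free_port() are hypothetical names. The first asks the kernel for an unused loopback port by binding to port 0 and then releases it. The second retries with a fresh port if the launch loses the race for that port. try_start stands in for "launch lemond on this port and wait until it is ready".

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <functional>
#include <optional>

// Ask the kernel for a currently free loopback port. Returns 0 on failure.
uint16_t pick_free_loopback_port() {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return 0;
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(0);  // 0 = let the kernel choose
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  socklen_t len = sizeof(addr);
  uint16_t port = 0;
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
      getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) == 0) {
    port = ntohs(addr.sin_port);
  }
  close(fd);  // the port is released here; another process may now take it
  return port;
}

// Retry on a fresh port whenever the launch loses the pick-to-bind race.
std::optional<uint16_t> start_on_free_port(
    const std::function<bool(uint16_t)>& try_start, int max_attempts) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    uint16_t port = pick_free_loopback_port();
    if (port != 0 && try_start(port)) return port;
  }
  return std::nullopt;
}
```

In the action, the resolved port would then be exported as LEMONADE_PORT / LEMONADE_TEST_PORT for the later steps. A port picked this way is only a hint until lemond actually binds it, which is why the retry is needed.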

Tests (required by the stricter bind)

test/test_llamacpp_system_backend.py repeatedly stops and restarts lemond. With the stricter bind, two latent problems became real failures:

Fresh port per instance: Each relaunched lemond gets its own port, so a restart never has to wait until the previous instance's port can be rebound after shutdown.
Mock llama-server handles --version: Setting the system backend makes lemond run llama-server --version to read its version. The test's fake llama-server didn't handle --version and instead ran forever, so that call hung. It now prints a version and exits, like the real binary. (This only started failing once the fresh-port change above stopped accidentally hiding it.)
Pull targets the live port: One test helper was still pointing at the old fixed port; it now uses the current instance's port.
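The mock fix follows a general rule for fake binaries: any informational flag the caller relies on must make the fake exit promptly. The PR's actual mock belongs to the Python test and is not shown here. The sketch below is a C++ illustration only. fake_llama_server_main() is a hypothetical name, serve stands in for the fake's long-running server loop, and the version string is a placeholder that does not reproduce llama-server's real output format.

```cpp
#include <functional>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical entry point of a fake llama-server used by a test.
int fake_llama_server_main(const std::vector<std::string>& args,
                           std::ostream& out,
                           const std::function<int()>& serve) {
  for (const std::string& arg : args) {
    if (arg == "--version") {
      // Placeholder text: NOT the real llama-server version format.
      out << "version: 0 (fake-llama-server)\n";
      return 0;  // exit immediately, like the real binary
    }
  }
  // Any other invocation: behave like a long-running server.
  return serve();
}
```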

Behavior

Before:
The second instance kept running and competed with the first for incoming requests.

After:

[Server] ERROR: Port 13305 on 127.0.0.1 is already in use. Another Lemonade server (lemond) is likely already running on this port. This instance will now exit.

(The reported IP is whichever of the resolved IPv4/IPv6 loopback addresses is in use.)

Testing

Built lemond on Windows (MSVC).
Started instance 1 on port 13305 (confirmed listening).
Started instance 2 on the same port → prints the error above as the final line and exits with code 1; instance 1 keeps serving uninterrupted.

The exclusive socket-options override and the over-aggressive bind-failure handling broke startup on dual-stack hosts (e.g. macOS CI). This surfaced as backend subprocesses such as whisper-server "failing to start or become ready", even though the main server briefly reported itself reachable. There were two issues:

1. The real listeners dropped SO_REUSEPORT via exclusive_socket_options. Because httplib binds IPv6 with IPV6_V6ONLY=0, the IPv6 wildcard "::" overlaps the already-bound IPv4 wildcard "0.0.0.0", and only SO_REUSEPORT lets the two coexist. Fix: restore httplib's default socket options on the real listeners and rely solely on the preflight port_is_available() probe for duplicate-instance detection.

2. The run loop treated ANY listener bind failure as fatal and called stop(). On a dual-stack host, an IPv6 bind failure would therefore tear down a working IPv4 listener right after the readiness check passed. Fix: only a TOTAL failure (no address family bound) is fatal; a partial failure keeps serving on the family that bound successfully, matching the original behavior.
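The second issue above comes down to a small decision rule. The sketch below is a hypothetical C++ illustration: BindOutcome, classify_binds() and should_stop_server() are invented names, not taken from server.cpp. It takes one flag per address family that lemond attempted to bind, and only the case where nothing bound is treated as fatal.

```cpp
#include <cstddef>
#include <vector>

enum class BindOutcome { AllBound, Partial, TotalFailure };

// One entry per address family attempted (e.g. IPv4 and IPv6).
BindOutcome classify_binds(const std::vector<bool>& bound) {
  std::size_t ok = 0;
  for (bool b : bound) {
    if (b) ++ok;
  }
  if (ok == 0) return BindOutcome::TotalFailure;       // nothing is listening
  if (ok < bound.size()) return BindOutcome::Partial;  // keep serving the rest
  return BindOutcome::AllBound;
}

// Only a total failure should make the run loop call stop().
bool should_stop_server(const std::vector<bool>& bound) {
  return classify_binds(bound) == BindOutcome::TotalFailure;
}
```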


fl0rianr

Thanks, this looks like the right direction for the sequential duplicate-start case from #2255.

One concern: the new check is a preflight bind probe, but the probe socket is closed before the real httplib listeners bind, and the real listeners intentionally keep the default reuse options. That means two lemond processes started at nearly the same time can both pass the preflight and then both bind via the real listener path. If the intended guarantee is only “second process started after the first is already listening”, this is probably fine, but it may be worth documenting the remaining TOCTOU window or adding a Lemonade-specific per-host/port lock.

I’d also strongly suggest adding a direct regression test for #2255: start one lemond on a free port, attempt a second lemond on the same port, assert non-zero exit + expected stderr, and assert the first server is still healthy.

Thanks for the hard work iterating with CI, the remaining error is unrelated and will be fixed once #2303 is merged.
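One possible shape for the per-host/port lock floated in the review is an advisory file lock taken before the preflight probe. This is a hypothetical POSIX sketch and not something the PR implements. try_acquire_port_lock() and the lock-file naming scheme are invented. flock() is advisory, so it only serializes lemond processes that all take the same lock. Opening with O_CLOEXEC keeps spawned backend processes (such as llama-server) from inheriting the descriptor and holding the lock after lemond exits.

```cpp
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

#include <cstdint>
#include <string>

// Hypothetical per-host/port lock. Returns an fd that holds the lock (keep it
// open for the life of the process), or -1 if another process holds it.
int try_acquire_port_lock(const std::string& dir, const std::string& host,
                          uint16_t port) {
  const std::string path =
      dir + "/lemond-" + host + "-" + std::to_string(port) + ".lock";
  int fd = open(path.c_str(), O_RDWR | O_CREAT | O_CLOEXEC, 0644);
  if (fd < 0) return -1;
  if (flock(fd, LOCK_EX | LOCK_NB) != 0) {  // EWOULDBLOCK: already held
    close(fd);
    return -1;
  }
  return fd;  // released automatically when the fd is closed or we crash
}
```

The kernel drops an flock() lock once the last descriptor for that open file is closed, including on a crash, so a leftover lock file on disk does not block the next start. Keying the lock on the literal host string means that overlapping binds (for example 0.0.0.0 and 127.0.0.1) would not be serialized; keying on the port alone is the more conservative option.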

siavashhub · 2026-06-18T16:13:43Z

No problem.
I don't have access to my PC today, so I will add the test tomorrow or over the weekend.


siavashhub · 2026-06-20T01:26:01Z

Added the regression test as well.

fl0rianr

LGTM — thanks for adding the direct duplicate-port regression test.

The implementation now covers the reported #2255 scenario: an existing lemond is already listening, a second lemond on the same port exits non-zero with a clear error, and the original server remains healthy. The added endpoint test exercises exactly that behavior.

Assuming the remaining CI failures are unrelated, this looks good to merge.

fix: refuse to start a second lemond on an in-use port · 9df338a

github-actions bot added the bug (Something isn't working) and cpp labels Jun 15, 2026

siavashhub added 15 commits June 16, 2026 11:44

Merge branch 'main' into fix/2255 · 8ab2c45

fix: make port preflight probe match the real listener's socket options, allows TIME_WAIT rebinds for faster restart which will fix the ci tests · 429e36d

Merge branch 'fix/2255' of https://github.com/siavashhub/lemonade into fix/2255 · 9010753

Merge branch 'main' into fix/2255 · 1eaffda

refactor: simplify in-use-port detection in server startup · 8d5b385

Merge branch 'fix/2255' of https://github.com/siavashhub/lemonade into fix/2255 · 79cc8c6

Merge branch 'main' into fix/2255 · 853a139

Merge branch 'main' into fix/2255 · 731c980

fix: updating ci to free the port before starting to remove duplicates · b991626

fix: updating ci to free the port more authoratively · 81cc719

ci: start lemond on a job-unique port for non-containerized .deb tests · c156695

Merge branch 'lemonade-sdk:main' into fix/2255 · 64d5c91

test: make llamacpp-system server start robust to stop/start race · 75be6af

Merge branch 'fix/2255' of https://github.com/siavashhub/lemonade into fix/2255 · f2ee3c6

siavashhub self-assigned this Jun 17, 2026

siavashhub added 5 commits June 17, 2026 14:35

test(llamacpp-system): wait for IPv6 port release before relaunching lemond · 0553c07

test: use fresh port per lemond instance to fix flaky system-backend test · ed14ea4

Merge branch 'main' into fix/2255 · f9f52a8

fix(test): stop llamacpp-system test_006-008 hanging on /internal/set · 19c0d84

Merge branch 'main' into fix/2255 · 75c9836

fl0rianr reviewed Jun 18, 2026

siavashhub requested a review from jeremyfowers June 18, 2026 16:10

siavashhub and others added 3 commits June 19, 2026 19:46

Merge branch 'main' into fix/2255 · 5e2965b

test(server_endpoints): reformat · 40319e4

test(server_endpoints): add duplicate-port guard regression test for lemond · d7b247d

Merge branch 'fix/2255' of https://github.com/siavashhub/lemonade into fix/2255 · 8501655

fl0rianr approved these changes Jun 20, 2026

siavashhub wants to merge 25 commits into lemonade-sdk:main from siavashhub:fix/2255
