
iris: pin tunnel port scan to start at 10000 #3679

Merged

rjpower merged 2 commits into main from will/dev-ex-iris on Mar 14, 2026

Conversation


@Helw150 Helw150 commented Mar 14, 2026

Makes the dashboard port stable across restarts so developers can bookmark or refresh the page without hunting for a new port each time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Helw150 Helw150 requested review from ravwojdyla and rjpower March 14, 2026 21:49

claude bot commented Mar 14, 2026

Claude finished @Helw150's task in 40s


Review of PR #3679

  • Read changed files and understand the diff
  • Review for correctness and issues
  • Post review feedback

LGTM — clean, minimal change.

The diff passes start=10000 to find_free_port() in both _coreweave_tunnel and _gcp_tunnel, which switches from kernel-assigned ephemeral ports (typically 32768–60999) to a sequential scan starting at 10000. The find_free_port implementation in base.py:97 scans range(10000, 11000), so this gives a stable, predictable port range that developers can bookmark.

No issues found. The change is consistent across both platforms and the start parameter is already well-supported by the existing find_free_port API.
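For context, a minimal sketch of what a find_free_port with a start parameter typically looks like (an illustration only, not the actual base.py implementation):

```python
import socket

def find_free_port(start: int = 10000, limit: int = 1000) -> int:
    """Return the first port in [start, start + limit) that we can bind."""
    for port in range(start, start + limit):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
            except OSError:
                continue  # port in use, try the next one
            return port
    raise RuntimeError(f"no free port in [{start}, {start + limit})")
```

Compared to binding port 0 and letting the kernel pick an ephemeral port, the sequential scan makes restarts land on the same port whenever nothing else holds it.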


rjpower commented Mar 14, 2026

I like this (I had it originally) but @dlwh's bots would try to open multiple tunnels in parallel and then stomp on each other. Maybe someone has a suggestion for how to avoid the stomping? We could fcntl a lock file while the tunnel is being set up?

@claude what do you think, could you make a lock around the port scan, maybe /tmp/iris/port_NNNNN with the pid? could you make that less than 5 lines?
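The fcntl idea could be sketched like this (hypothetical, assuming a single advisory lock file serializes the whole scan-and-setup section):

```python
import fcntl
import os

os.makedirs("/tmp/iris", exist_ok=True)
lock_fd = os.open("/tmp/iris/scan.lock", os.O_CREAT | os.O_RDWR, 0o644)
fcntl.flock(lock_fd, fcntl.LOCK_EX)  # blocks until any other scanner releases it
try:
    # Scan for a free port and start the tunnel subprocess here; no other
    # process holding this lock can race the scan in the meantime.
    pass
finally:
    fcntl.flock(lock_fd, fcntl.LOCK_UN)
    os.close(lock_fd)
```

One caveat: a single flock serializes all tunnel setups, whereas per-port PID files would let parallel tunnels proceed on different ports at once.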


claude bot commented Mar 14, 2026

Claude finished @rjpower's task in 2m 56s


Implementing port lock mechanism

  • Read find_free_port implementation
  • Read changed files for context
  • Implement PID-based lock file mechanism
  • Run pre-commit and tests
  • Push changes

Added PID-based lock files to find_free_port() in ~5 new lines. The mechanism:

  1. Before trying to bind a port, check if /tmp/iris/port_NNNNN exists with a live PID → skip it
  2. After successfully binding, write the lock file with os.getpid()
  3. Stale lock files (dead PIDs) are automatically ignored via ProcessLookupError
# Inside find_free_port's scan loop; uses os and pathlib.Path.
lock = Path(f"/tmp/iris/port_{port}")
try:
    os.kill(int(lock.read_text()), 0)  # signal 0 only probes whether the PID is alive
    continue  # port locked by a live process
except (FileNotFoundError, ValueError, ProcessLookupError, PermissionError):
    pass  # missing or stale lock: treat the port as claimable
# ... after successful bind:
lock.parent.mkdir(parents=True, exist_ok=True)
lock.write_text(str(os.getpid()))

Testing: ./infra/pre-commit.py --all-files --fix ✅ | 24 port-related iris tests passed ✅

When multiple tunnels are opened in parallel, they can race on
find_free_port and grab the same port. Write a lock file at
/tmp/iris/port_NNNNN with the owning PID and skip ports held by
live processes on subsequent scans.

Co-authored-by: Russell Power <rjpower@users.noreply.github.com>
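Putting the pieces together, the lock-aware scan could read roughly like this (a sketch consistent with the snippet above; the real base.py code may differ in details):

```python
import os
import socket
from pathlib import Path

LOCK_DIR = Path("/tmp/iris")  # assumption: same lock directory as the PR

def find_free_port(start: int = 10000, limit: int = 1000) -> int:
    """Scan from `start`, skipping ports whose lock file names a live PID."""
    for port in range(start, start + limit):
        lock = LOCK_DIR / f"port_{port}"
        try:
            os.kill(int(lock.read_text()), 0)  # signal 0: liveness probe only
            continue  # lock held by a live process, skip this port
        except (FileNotFoundError, ValueError, ProcessLookupError, PermissionError):
            pass  # no lock, garbage contents, or dead owner
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                s.bind(("127.0.0.1", port))
        except OSError:
            continue  # something else is already listening here
        LOCK_DIR.mkdir(parents=True, exist_ok=True)
        lock.write_text(str(os.getpid()))
        return port
    raise RuntimeError(f"no free port in [{start}, {start + limit})")
```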

Helw150 commented Mar 14, 2026

> I like this (I had it originally) but @dlwh's bots would try to open multiple tunnels in parallel and then stomp on each other. Maybe someone has a suggestion for how to avoid the stomping? We could fcntl a lock file while the tunnel is being set up?
>
> @claude what do you think, could you make a lock around the port scan, maybe /tmp/iris/port_NNNNN with the pid? could you make that less than 5 lines?

Hmmm, I need to think through why the agents were setting up multiple dashboards. My agent tends to just query Iris directly (though on second reflection, maybe I'm being dumb and this code path affects commands like job logs too).


Helw150 commented Mar 14, 2026

Well, the fix is fine anyhow!!


claude bot commented Mar 14, 2026

Claude finished @Helw150's task in 58s


Analysis: tunnel usage and port contention

  • Read current code to understand where find_free_port is used
  • Analyze parallel tunnel scenarios

You're right to think this through — the tunnel code path is used by every CLI command that talks to the controller, not just the dashboard. Looking at cli/main.py:130, every iris CLI invocation (job logs, status, submit, etc.) calls platform.tunnel() which calls find_free_port(start=10000).

So yes, @rjpower's concern is valid: if you run iris logs ... and iris status ... concurrently (or if an agent fires off multiple CLI calls in parallel), they'd both try to grab port 10000 via the sequential scan. Without the lock, the first process binds 10000, then releases it when the socket closes, and the second process also grabs 10000 — but by then the first tunnel is already using it → conflict.

The PID lock file mechanism that was already pushed handles this correctly:

  1. Process A scans, finds 10000 free, writes /tmp/iris/port_10000 with its PID, starts tunnel on 10000
  2. Process B scans, sees /tmp/iris/port_10000 with a live PID → skips, moves to 10001

The lock files are self-cleaning (stale PIDs from dead processes are ignored via ProcessLookupError), so restarts still land on 10000 when no other tunnels are active. This gives the best of both worlds: stable ports for bookmarking when you have a single tunnel, and no stomping when multiple tunnels run in parallel.

One minor note: the lock is written after the socket is closed (the with block exits before lock.write_text), so there's a tiny TOCTOU (time-of-check to time-of-use) window. In practice this is fine since the tunnel subprocess immediately re-binds the port, but if you wanted belt-and-suspenders you could write the lock while the socket is still held. Not worth changing for this use case though.
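The belt-and-suspenders variant would write the lock before the probe socket is released; a hypothetical claim_port helper (names and structure are illustrative, not from the PR) might look like:

```python
import os
import socket
from pathlib import Path

def claim_port(port: int) -> bool:
    """Hypothetical variant: record our PID while the probe socket is still
    bound, so no concurrent scanner can slip in between bind and lock."""
    lock = Path(f"/tmp/iris/port_{port}")
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(("127.0.0.1", port))
        except OSError:
            return False  # port already taken
        # Write the lock before the `with` block releases the socket.
        lock.parent.mkdir(parents=True, exist_ok=True)
        lock.write_text(str(os.getpid()))
    return True
```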



rjpower commented Mar 14, 2026

Yeah, for whatever reason codex seemed to like to list tasks, log, and list jobs all at once...

@rjpower rjpower enabled auto-merge (squash) March 14, 2026 22:00
@rjpower rjpower merged commit a2d6a05 into main Mar 14, 2026
19 checks passed
@rjpower rjpower deleted the will/dev-ex-iris branch March 14, 2026 22:07
