Tep Scheduled server: replace poll(2) O(N)/tick with epoll/kqueue + persistent registration

The live-updates work (#41 / #44) needs a worker to hold a large number of long-lived WebSocket connections (each `turbo_stream_from` opens one). `Tep::Server::Scheduled` already has the **right architecture** for this — fiber-per-connection over a cooperative scheduler, prefork + `SO_REUSEPORT` for multicore — and it's the server the blog runs. The blocker to scale is the I/O multiplexer underneath it.

### The bottleneck

`Tep::Scheduler.poll_round` (`runtime/spinel/tep/scheduler.rb:123`) is poll(2)-shaped: **every tick** it calls `sphttp_poll_reset`, loops over *all* parked fibers re-adding each fd (`sphttp_poll_add` per fiber, :135), then `sphttp_poll_run`. That's **O(total connections) per scheduler pass**, regardless of how many are actually readable. This is the classic c10k wall: poll/select are O(N); epoll/kqueue are O(ready). Phoenix/BEAM, Go's netpoller, and AnyCable-Go all use epoll/kqueue with *persistent* registration.

As written, the per-tick pollset rebuild starts to dominate at somewhere in the low thousands of connections per worker, well short of the tens to hundreds of thousands ("AnyCable-Go class") this work needs.
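To make the asymmetry concrete, here is a toy cost model in plain Ruby. It only counts how many fd entries each strategy touches; it is not a benchmark and does not call into `sp_net`. `RebuildPoller`, `PersistentPoller`, and `simulate` are hypothetical names invented for this sketch, and the "ready" set is supplied by hand where a real kernel would report it.

```ruby
require "set"

# Toy cost model: counts fd entries touched per strategy.
# RebuildPoller and PersistentPoller are hypothetical, not Tep classes.

class RebuildPoller
  attr_reader :touched

  def initialize
    @touched = 0
  end

  # poll(2)-shaped: the whole interest set is rebuilt on every tick.
  def tick(parked_fds, ready_fds)
    pollset = []
    parked_fds.each do |fd|
      pollset << fd      # stands in for one sphttp_poll_add call
      @touched += 1
    end
    pollset.select { |fd| ready_fds.include?(fd) }
  end
end

class PersistentPoller
  attr_reader :touched

  def initialize
    @registered = Set.new
    @touched = 0
  end

  # Paid once per park, analogous to EPOLL_CTL_ADD.
  def register(fd)
    @registered << fd
    @touched += 1
  end

  # epoll/kqueue-shaped: only ready fds are handed back and visited.
  def tick(ready_fds)
    ready = ready_fds.select { |fd| @registered.include?(fd) }
    @touched += ready.size
    ready
  end
end

def simulate(connections:, ticks:, ready_per_tick:)
  fds = (0...connections).to_a
  ready = Set.new(fds.first(ready_per_tick))

  rebuild = RebuildPoller.new
  ticks.times { rebuild.tick(fds, ready) }

  persistent = PersistentPoller.new
  fds.each { |fd| persistent.register(fd) }
  ticks.times { persistent.tick(ready) }

  { rebuild: rebuild.touched, persistent: persistent.touched }
end

result = simulate(connections: 10_000, ticks: 100, ready_per_tick: 10)
puts "rebuild touched:    #{result[:rebuild]}"
puts "persistent touched: #{result[:persistent]}"
```

With 10,000 parked connections, 100 ticks, and 10 ready fds per tick, the rebuild model touches 1,000,000 entries, while the persistent model touches 11,000 (10,000 one-time registrations plus 1,000 ready events). Real costs depend on syscall overhead and kernel behaviour, which this model ignores.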

### Two secondary issues in the same path

- **Tail-only dead-slot reclamation** (`scheduler.rb:71-89`) is tuned for FIFO request lifecycles. A large WebSocket population closes in arbitrary order, leaving dead holes that every O(N) scan (`tick`, `poll_round`, `any_io_waiter`) still walks.
- **No preemption** (`scheduler.rb:32-35`: Spinel has no implicit-yield `Fiber::SchedulerInterface` hook, so yields are explicit). A long synchronous handler, e.g. an in-process live re-render/diff, stalls every other connection on that worker until it yields. This is the price of running the render *in-process* (the win from collapsing the AnyCable split), and it calls for bounded work and yield points in heavy handlers.
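The fairness point can be illustrated with plain Ruby fibers. In this sketch, `render_in_chunks` and `run_all` are hypothetical helpers, not Tep APIs: the first wraps a handler that yields after every `budget` units of work, and the second is a minimal round-robin loop standing in for the scheduler.

```ruby
# Sketch: a heavy handler that yields after every `budget` units of work.
# render_in_chunks and run_all are hypothetical helpers, not Tep APIs.

def render_in_chunks(name, items, budget:, log:)
  Fiber.new do
    items.each_slice(budget) do |slice|
      slice.each { |item| log << "#{name}:#{item}" } # stands in for real work
      Fiber.yield                                     # explicit yield point
    end
    :done
  end
end

# Minimal round-robin driver standing in for the scheduler's tick loop.
def run_all(fibers)
  until fibers.empty?
    fibers.reject! do |fiber|
      fiber.resume
      !fiber.alive?
    end
  end
end

log = []
heavy = render_in_chunks("heavy", (1..6).to_a, budget: 2, log: log)
light = render_in_chunks("light", [1], budget: 2, log: log)
run_all([heavy, light])
puts log.inspect
```

Here `light:1` is logged after only two units of heavy work. Without the yield point it would wait for all six. Choosing the budget is a trade-off between per-yield overhead and worst-case latency for the other connections on the worker.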

### Proposed direction

- [ ] Add epoll (Linux) / kqueue (macOS/BSD) primitives to `sp_net`, exposed behind the existing `Sock.sphttp_poll_*` façade (`runtime/spinel/tep/net.rb:35-38`) so call sites are unchanged.
- [ ] Make registration **persistent**: register an fd when its fiber parks and deregister it when the fiber unparks (`EPOLL_CTL_ADD`/`EPOLL_CTL_DEL` on Linux, `EV_ADD`/`EV_DELETE` on kqueue), rather than resetting and rebuilding the set every tick. `poll_round` then only waits for events, and per-pass cost drops from O(total) to O(ready).
- [ ] Replace tail-only reclamation with a stable-slot/free-list allocator so non-FIFO WS closes don't leave holes that every O(N) scan must walk. Note that `scheduler.rb` deliberately keeps slot indices stable for captures held across `Fiber.yield`; the replacement must preserve that.
- [ ] (Separate, smaller) bound per-fiber work / add yield points in heavy handlers for fairness.
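
As one possible shape for the stable-slot/free-list item above, here is a minimal sketch. `SlotTable` is a hypothetical name and is not the allocator in `scheduler.rb`. It assumes `nil` is never stored as an entry, since `nil` marks a free slot.

```ruby
# Sketch of a stable-slot table with a free list.
# SlotTable is a hypothetical name, not the allocator in scheduler.rb.
# Assumes entries are never nil (nil marks a free slot).

class SlotTable
  attr_reader :live

  def initialize
    @slots = [] # index -> entry, or nil for a free slot
    @free = []  # stack of reusable indices
    @live = 0
  end

  # Returns an index that stays valid until release(index) is called.
  def allocate(entry)
    index = @free.pop || @slots.size
    @slots[index] = entry
    @live += 1
    index
  end

  # Frees a slot in O(1) without moving any other entry.
  def release(index)
    return if @slots[index].nil?

    @slots[index] = nil
    @free.push(index)
    @live -= 1
  end

  def [](index)
    @slots[index]
  end

  def capacity
    @slots.size
  end
end

table = SlotTable.new
a = table.allocate(:conn_a)
b = table.allocate(:conn_b)
c = table.allocate(:conn_c)

table.release(b)            # close in the middle, non-FIFO
d = table.allocate(:conn_d) # reuses the freed index; a and c are unchanged

puts [a, b, c, d].inspect
puts table.capacity
```

The indices come out as `[0, 1, 2, 1]` with a capacity of 3: the middle slot is reused rather than left as a hole, and live entries never move. One caveat to weigh: a stale capture still holding index `b` would now see `:conn_d`, so a real design may want a per-slot generation counter or a similar guard. Scans over the table would also still visit free slots until they are reused.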

### Scope note

The realistic target is **AnyCable-Go class** (tens to hundreds of thousands of connections per node). BEAM-class millions-on-one-node is out of reach without per-process-heap GC isolation, which Ruby semantics don't provide, and it isn't needed here. The model is already Phoenix-shaped; only the I/O multiplexer is c10k-era.

Refs: `runtime/spinel/tep/scheduler.rb`, `server_scheduled.rb`, `websocket/connection.rb` (one fiber/conn recv loop), `net.rb` (`sp_net_poll_*`). Background: the live-updates transport discussion on #44.

🤖 Generated with [Claude Code](https://claude.com/claude-code)


Tep Scheduled server: replace poll(2) O(N)/tick with epoll/kqueue + persistent registration #52