
websocket proxy: ping/pong downstream clients and proactively disconnect #264


Closed

Conversation

@haardikk21 (Contributor) commented May 28, 2025

Has the ws proxy send ping frames to downstream clients periodically. If a pong isn't received back within a specified timeout (10s), it disconnects the client.

Previously, we only disconnected clients when message sending failed, but that only catches one-way networking issues.

There are a variety of cases where clients continue to receive packets but aren't actually (able to) process them, adding unnecessary overhead to this service. Devices going to sleep, battery-saving/low-priority mode in browsers, client-side processing blocked by slow code, etc. can all leave the TCP connection open and receiving packets while the client is unable to process the data.

By pinging, we ensure that clients are not just receiving messages but also actively processing them, since they need to respond with a pong.
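A rough sketch of the loop this describes (not the PR's actual code), assuming tokio and tokio-tungstenite; the constant names, values, and `serve_client` function are illustrative only:

```rust
use std::time::{Duration, Instant};

use futures_util::{SinkExt, StreamExt};
use tokio::net::TcpStream;
use tokio_tungstenite::{tungstenite::Message, WebSocketStream};

// Placeholder values; the PR hardcodes a 10s pong timeout.
const PING_INTERVAL: Duration = Duration::from_secs(10);
const PONG_TIMEOUT: Duration = Duration::from_secs(10);

async fn serve_client(ws: WebSocketStream<TcpStream>) {
    let (mut sink, mut stream) = ws.split();
    let mut ping_timer = tokio::time::interval(PING_INTERVAL);
    let mut last_pong = Instant::now();

    loop {
        tokio::select! {
            _ = ping_timer.tick() => {
                // Simplified: the PR uses a separate periodic health check;
                // here we check at each ping tick instead.
                if last_pong.elapsed() > PING_INTERVAL + PONG_TIMEOUT {
                    let _ = sink.close().await; // proactively disconnect
                    break;
                }
                if sink.send(Message::Ping(Vec::<u8>::new().into())).await.is_err() {
                    break; // send failure: the connection is already gone
                }
            }
            msg = stream.next() => {
                match msg {
                    Some(Ok(Message::Pong(_))) => last_pong = Instant::now(),
                    Some(Ok(_)) => { /* regular client traffic */ }
                    _ => break, // read error or stream closed
                }
            }
        }
    }
}
```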


@angel-ding-cb (Contributor) commented May 28, 2025

I am not a Rust expert and I am not familiar with the ping/pong conventions in Rust, but I think the code can be improved in a couple of ways.

1. The health check logic is redundant in this implementation. The only scenario where `last_pong` wouldn't be updated is if the client stops responding to pings. But here's what happens in that case:
   * Ping is sent → no pong response from the client
   * `client.recv()` continues waiting for the pong (or any message)
   * Next ping timer fires → `client.ping().await` is called
   * Ping will fail because the connection is broken/unresponsive
   * Loop breaks immediately on the ping error

Meanwhile, in your implementation, even if the pong never arrives, `client.ping().await` might not fail immediately. The client could enter a zombie state where it continues to receive pings but never responds with a pong. I suggest simplifying this logic by making `client.ping()` wait for the pong response. The overall flow should look something like this:

1. Ping timer fires → calls `client.ping_and_wait(PONG_TIMEOUT)`
2. Ping sent → creates a oneshot channel and sends the ping frame
3. Waits for pong → times out after x seconds if no pong is received
4. Pong received → `handle_pong()` notifies the waiting ping
5. Success or failure → ping returns `Ok(())` or an error, breaking the loop on failure

In this case, we could have cleaner code with no periodic health checks every 2 seconds.
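A minimal sketch of how the suggested `ping_and_wait` / `handle_pong` pairing could be wired together with a oneshot channel and `tokio::time::timeout`; everything besides those two names is a placeholder, not rollup-boost's actual API:

```rust
use std::time::Duration;
use tokio::sync::oneshot;

#[derive(Default)]
struct PongWaiter {
    // Filled when a ping is in flight; consumed when the pong arrives.
    pending: Option<oneshot::Sender<()>>,
}

impl PongWaiter {
    /// Arm the waiter just before sending a ping frame.
    fn arm(&mut self) -> oneshot::Receiver<()> {
        let (tx, rx) = oneshot::channel();
        self.pending = Some(tx);
        rx
    }

    /// Called from the receive path whenever a Pong frame arrives.
    fn handle_pong(&mut self) {
        if let Some(tx) = self.pending.take() {
            let _ = tx.send(());
        }
    }
}

/// Ping-and-wait as the caller would use it: arm the waiter, send the ping,
/// then wait for the pong with a deadline; on error the caller breaks the loop.
async fn ping_and_wait(
    waiter: &mut PongWaiter,
    pong_timeout: Duration,
) -> Result<(), &'static str> {
    let pong = waiter.arm();
    // ... send the actual Ping frame on the socket here ...
    match tokio::time::timeout(pong_timeout, pong).await {
        Ok(Ok(())) => Ok(()),                                   // pong received in time
        Ok(Err(_)) => Err("connection task dropped the pong sender"),
        Err(_) => Err("no pong within the timeout"),
    }
}
```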

@angel-ding-cb (Contributor)

2. I have some personal opinions (not strongly held) on the ping/pong mechanism, as I am basically implementing the same thing in the conductors. I think the ping/pong mechanism should be observational (monitoring health) rather than operational (managing connections). We can afford to keep all the connections open even if they don't respond with a pong within 12 seconds (in your health check case). When we actually deploy this to mainnet, the client of rollup-boost will be one conductor, so even if it is not responsive, we can still keep the connection open. We just need to log error messages instead of closing the connections, which might result in deadlocks.

For more context on the architecture: we will have 5 conductors and 5 rollup-boosts (1:1 mapping), and conductor health is managed separately by the conductor health-check logic, which means unhealthy conductors won't become leaders anyway.

If we only log warnings, we can avoid false positives: network hiccup → missed pong → connection killed → healthy conductor unnecessarily disconnected.

@angel-ding-cb (Contributor)

3. This is more of a question than a comment, since I am not familiar with the Rust conventions for ping/pong mechanisms. In my Go implementation in the conductors, I wanted to make sure it can send 2–3 pings before timing the client out. My pong read timeout is 60s and my ping interval is 30 seconds. I am worried that a 10s ping might be too frequent, but I will defer to you!

@danyalprout (Contributor)

I agree with the `client.ping_and_wait(PONG_TIMEOUT)` approach.

> We can afford to keep all the connections open even if they don't respond with a pong within 12 seconds (in your health check case).

@angel-ding-cb I don't think that is true for the websocket proxy. We could have a huge number of clients.

@haardikk21 (Contributor, Author)

> I am not a Rust expert and I am not familiar with the ping/pong conventions in Rust, but I think the code can be improved in a couple of ways.
>
> 1. The health check logic is redundant in this implementation. The only scenario where `last_pong` wouldn't be updated is if the client stops responding to pings. But here's what happens in that case:
>
>    * Ping is sent → no pong response from the client
>    * `client.recv()` continues waiting for the pong (or any message)
>    * Next ping timer fires → `client.ping().await` is called
>    * Ping will fail because the connection is broken/unresponsive
>    * Loop breaks immediately on the ping error
>
> Meanwhile, in your implementation, even if the pong never arrives, `client.ping().await` might not fail immediately. The client could enter a zombie state where it continues to receive pings but never responds with a pong. I suggest simplifying this logic by making `client.ping()` wait for the pong response. The overall flow should look something like this:
>
> 1. Ping timer fires → calls `client.ping_and_wait(PONG_TIMEOUT)`
> 2. Ping sent → creates a oneshot channel and sends the ping frame
> 3. Waits for pong → times out after x seconds if no pong is received
> 4. Pong received → `handle_pong()` notifies the waiting ping
> 5. Success or failure → ping returns `Ok(())` or an error, breaking the loop on failure
>
> In this case, we could have cleaner code with no periodic health checks every 2 seconds.

It's important to note here that pings can definitely continue to be sent without the client ponging back. It's the same as our flashblocks messages continuing to be sent to them even if they're not actually processing/active. The cases in which this happens are the same: device put to sleep, CPU throttling processes, browsers sleeping their tabs, etc.

So it's not good enough to just assume that the next ping will fail; the health check therefore expects a pong or forces a disconnect.

The `ping_and_wait` approach works though; I can switch to that instead.


> 3. This is more of a question than a comment, since I am not familiar with the Rust conventions for ping/pong mechanisms. In my Go implementation in the conductors, I wanted to make sure it can send 2–3 pings before timing the client out. My pong read timeout is 60s and my ping interval is 30 seconds. I am worried that a 10s ping might be too frequent, but I will defer to you!

I'm considering making these part of the config/env instead of hardcoding them so we can play with the values. I'm not too sure what the right values should be atm either.
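For illustration, something like the following could make the timings configurable; the env var names and defaults here are made up, not actual rollup-boost config:

```rust
use std::time::Duration;

/// Read a duration (in seconds) from an env var, falling back to a default.
fn duration_from_env(var: &str, default_secs: u64) -> Duration {
    std::env::var(var)
        .ok()
        .and_then(|v| v.parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(Duration::from_secs(default_secs))
}

/// Hypothetical ping/pong timing config; defaults mirror the values discussed
/// in this PR (10s ping interval, 10s pong timeout).
fn ping_config() -> (Duration, Duration) {
    let ping_interval = duration_from_env("WS_PING_INTERVAL_SECS", 10);
    let pong_timeout = duration_from_env("WS_PONG_TIMEOUT_SECS", 10);
    (ping_interval, pong_timeout)
}
```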

@angel-ding-cb (Contributor)

@danyalprout I agree that for the websocket proxy we could have a huge number of clients; that's why the conductor will close out stale/unresponsive connections gracefully. Since, in the final state, rollup-boost will only stream messages from the builder to the conductor, I am worried that closing the connections will interrupt the flow when there is a false positive. But this is not a huge issue, since the conductor will retry connecting to rollup-boost indefinitely if it ever loses the connection. So I will defer to you @haardikk21.

@danyalprout (Contributor)

@angel-ding-cb I believe this code is for managing connections to downstream clients from the websocket proxy (i.e. the nodes).

Unless I'm missing something, I don't think it'll impact the conductor setup at all

@haardikk21 closed this May 28, 2025