[DashboardClientX] Apply httplib timeouts; wire setReceiveTimeout()#518
Conversation
cpp-httplib's default read_timeout is 300 seconds. Because the PolyScope X
DashboardClientImplX constructor never overrides any timeout, every dashboard
call (commandPowerOff, commandStop, commandPlay, etc.) blocks for the full
300 s when the controller becomes unresponsive — e.g. a network partition
or paused container in tests. Symptoms include thread-pool exhaustion in
callers that dispatch dashboard commands from a bounded executor pool.
Two changes:
1. Apply explicit set_connection_timeout(5s), set_read_timeout(1s), and
set_write_timeout(5s) in the constructor. The 1 s read default matches
the contract already documented on this impl: getConfiguredReceiveTimeout()
returns {tv_sec = 1, tv_usec = 0}.
2. Wire setReceiveTimeout(const timeval&) into the underlying httplib::Client
instead of being a [[maybe_unused]] no-op. This makes the base-class API
work as documented on the X path; callers can now tune the read timeout
per their needs.
Both changes are backward-compatible: there is no public API change, and the
new defaults align with what the impl already advertised via
getConfiguredReceiveTimeout().
|
Hi @urfeex — would you be willing to take a look at this when you have a moment? Small fix for the dashboard 300 s hang on PolyScope X. Thanks! |
Address review feedback: the new 1 s default applied in the constructor was in effect during the initial openapi.json GET in connect(), unlike the prior 300 s default which gave that first read plenty of headroom. Mirror the G5 pattern (dashboard_client_implementation_g5.cpp connect()): save the configured timeout, temporarily bump the read timeout to 10 s for the setup GET, then restore the configured value before returning. Also consolidate the exit paths so the restore runs on every outcome.
Address review feedback: json::parse and VersionInformation::fromString can throw on malformed or unexpected openapi.json payloads. Without exception handling, the restore call at the end of connect() would be skipped and the client would be left on the 10 s setup read timeout. Wrap the body in try / catch(...) and restore the configured timeout before rethrowing, so the configured value is honored on every exit path.
Address review feedback: setReceiveTimeout previously only updated the httplib client, while getConfiguredReceiveTimeout always returned the hardcoded 1 s default. connect() saves and restores via the latter, so any custom read timeout the caller set before (or between) connects was silently overwritten with 1 s. Mirror the G5 pattern: store the caller-provided value in a member recv_timeout_ (null until explicitly set), and have getConfiguredReceiveTimeout return that value when present, falling back to the 1 s documented default otherwise. Now connect()'s save/restore round-trips the caller's value correctly.
There was a problem hiding this comment.
Thanks a lot for this. However, I think to get this to a mergeable state, we would have to properly implement the concept of timeouts. The current getConfiguredReceiveTimeout is only a placeholder. We should fill that with life and probably also add a mechanism to alter the write timeout.
power_on would have to be updated to actually use the timeout parameter and reset to the default timeout afterwards, as done in G5.
@Vinicius-T-Robotics would you be up for extending this PR accordingly?
Edit: I see that while I was writing this, you have been addressing some of the points already :-)
| // partition, paused container in tests). Align with the contract documented on the base | ||
| // class: getConfiguredReceiveTimeout() returns {tv_sec = 1, tv_usec = 0} on this impl. | ||
| cli_->set_connection_timeout(std::chrono::seconds(5)); | ||
| cli_->set_read_timeout(std::chrono::seconds(1)); |
There was a problem hiding this comment.
At this point we should probably make the read timeout available as in the G5 implementation. the GetConfiguredReceiveTimeout method was so far only a dummy implementation. With these changes, I think it makes sense to keep a timeout member around. As far as I can see, cpp-httplib doesn't allow accessing the timeout, so we would have to keep track here.
We would also have to be sensitive about blocking calls such as power_on and brake_release that may take quite some time on real hardware. For that reason power_on has a timeout parameter already, but brake_release doesn't.
Other calls that might hit a timeout are
- loading programs (read timeout)
- importing (uploading) programs (write timeout)
- Updating programs (write timeout)
- Downloading programs (read timeout)
Using a higher default timeout (e.g. 10 seconds) might cover most of those cases already.
Addresses upstream review on the httplib timeout PR:
- Add setSendTimeout / getConfiguredSendTimeout to the base class
(DashboardClientImpl), mirroring the existing read-timeout API.
getConfiguredSendTimeout has a non-pure default that returns the
documented 1 s value, so DashboardClientImplG5 needs no change.
- Override both in DashboardClientImplX with a send_timeout_ member
parallel to recv_timeout_. The setter updates the underlying
httplib::Client; the getter falls back to the 10 s constructor
default until a caller configures it explicitly.
- Bump the constructor's read/write timeouts from 1 s / 5 s to
10 s / 10 s. The 1 s read default was too tight for legitimate
blocking calls (brake_release, commandLoadProgram, program
upload/update/download). The 10 s figure covers all non-power-on
commands without per-method timeouts; callers can override via
setReceiveTimeout / setSendTimeout.
- Honor the timeout parameter on commandPowerOn using the same
save-bump-restore pattern as connect(): cache the configured
read timeout, set it to the caller-supplied value for the PUT,
restore on both success and exception (catch(...) rethrow).
Address review feedback: connect() unconditionally set the setup-phase read timeout to 10 s, even when getConfiguredReceiveTimeout() already returned a larger value. That temporarily shrank the in-flight read timeout during the openapi.json GET. The worst case is the lazy-connect path inside commandPowerOn: power on calls setReceiveTimeout(timeout) (typically 300 s by default), then issues a PUT. If robot_api_version_ is still empty, put() lazy-calls connect(), which would previously drop the 300 s back to 10 s for the setup GET — exactly when the controller is booting and the GET most needs the larger budget. Treat the 10 s as a minimum-headroom for the setup GET, not a cap on the caller's preferences: start from the configured value and only bump up to 10 s if it is smaller. The save/restore around the GET still honors whatever the caller had configured before connect() was entered.
Address review feedback: commandPowerOn previously mapped its std::chrono::duration<double> argument to timeval via duration_cast<seconds>, which truncates toward zero, and hardcoded tv_usec = 0. A caller passing e.g. 2.5 s would silently get a 2 s read timeout. Cast to microseconds (the smallest unit timeval can represent) and split into tv_sec + tv_usec so the fractional portion survives.
Hello, have added the write timeout as well in 241c1a2 |
The std::unique_ptr<timeval> indirection was carried over from
DashboardClientImplG5, where it composes with TCPSocket's existing
unique_ptr<timeval> member. In DashboardClientImplX it bought nothing:
no nullable semantics are needed because the constructor now installs
a 10 s default that every getter falls back to anyway.
Replace recv_timeout_ and send_timeout_ with value-typed timeval
members defaulted to {10, 0}. setReceiveTimeout / setSendTimeout
assign directly; the getters return the member. Behavior is
preserved (verified end-to-end against a stalling-server test that
times connect() at the configured value).
Reviewer feedback: the save-bump-restore in connect() mimics G5's
pattern but for the wrong reason. G5 bumps because it does an
unsolicited blocking read for the dashboard server's boot-up
message after TCPSocket::setup() — explicitly called out in the G5
source ("The first read after connection can take more time").
PolyScope X's connect() just issues an HTTP GET for /openapi.json
(~200 ms over VPN per maintainer measurement); no analogous
unsolicited read exists.
The original cursor-bot concern that motivated b1f900d was that the
1 s constructor read_timeout left openapi.json under-resourced
while in-tree callers raised the timeout only post-connect. That
concern was resolved by bumping the constructor default from 1 s to
10 s, which gives openapi.json the same headroom the example wrapper
applies manually. The save-bump-restore became dead weight: a no-op
for default callers, and it overrides explicit caller values shorter
than 10 s — paternalistic behavior the reviewer is rightly pushing
back on.
Drop the bump, the try/catch(...) rethrow guard, and the restore.
connect() returns to its pre-b1f900d shape (early returns from the
GET-and-parse pipeline). The recv_timeout_ storage stays because
commandPowerOn still uses it for its own per-call save-bump-restore,
which is a different and legitimate case (caller-supplied longer
deadline for an operation known to take longer).
| // time on real hardware — brake_release, commandLoadProgram (read), commandUploadProgram | ||
| // (write), commandUpdateProgram (write), commandDownloadProgram (read) — without forcing | ||
| // each one to plumb its own timeout. commandPowerOn already accepts its own (longer) | ||
| // timeout parameter. Callers needing different limits can override via setReceiveTimeout. |
There was a problem hiding this comment.
Receive timeout ignores write uploads
Medium Severity
The new constructor comment tells callers to raise limits via setReceiveTimeout, but commandUploadProgram and commandUpdateProgram are bounded by httplib’s write timeout. This change adds setSendTimeout on the X implementation and sets a 10s write default, while DashboardClient still only forwards receive timeout, so tuning read timeout will not help upload/write failures.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit e84c769. Configure here.
Co-authored-by: Felix Exner <feex@universal-robots.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #518 +/- ##
==========================================
- Coverage 78.85% 78.36% -0.50%
==========================================
Files 116 115 -1
Lines 6612 6637 +25
Branches 2920 2927 +7
==========================================
- Hits 5214 5201 -13
- Misses 1031 1068 +37
- Partials 367 368 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit 8b9e0f4. Configure here.
urfeex
left a comment
There was a problem hiding this comment.
Adding a test setting timeouts to a couple of microseconds and validating that the calls then fail would be good. Apart from that, I think we're good to go.
| // If the caller has explicitly configured a receive timeout via setReceiveTimeout, | ||
| // return that. Otherwise fall back to the constructor default. See the constructor | ||
| // comment for the rationale on the 10 s default (covers brake_release / program | ||
| // load/upload/download without per-method timeouts). |
There was a problem hiding this comment.
| // If the caller has explicitly configured a receive timeout via setReceiveTimeout, | |
| // return that. Otherwise fall back to the constructor default. See the constructor | |
| // comment for the rationale on the 10 s default (covers brake_release / program | |
| // load/upload/download without per-method timeouts). |
|
|
||
| timeval DashboardClientImplX::getConfiguredSendTimeout() const | ||
| { | ||
| // Mirrors getConfiguredReceiveTimeout. Default of 10 s matches the constructor. |
There was a problem hiding this comment.
| // Mirrors getConfiguredReceiveTimeout. Default of 10 s matches the constructor. |


cpp-httplib's default read_timeout is 300 seconds. Because the PolyScope X DashboardClientImplX constructor never overrides any timeout, every dashboard call (commandPowerOff, commandStop, commandPlay, etc.) blocks for the full 300 s when the controller becomes unresponsive — e.g. a network partition or paused container in tests. Symptoms include thread-pool exhaustion in callers that dispatch dashboard commands from a bounded executor pool.
Two changes:
Apply explicit set_connection_timeout(5s), set_read_timeout(1s), and set_write_timeout(5s) in the constructor. The 1 s read default matches the contract already documented on this impl: getConfiguredReceiveTimeout() returns {tv_sec = 1, tv_usec = 0}.
Wire setReceiveTimeout(const timeval&) into the underlying httplib::Client instead of being a [[maybe_unused]] no-op. This makes the base-class API work as documented on the X path; callers can now tune the read timeout per their needs.
Both changes are backward-compatible: there is no public API change, and the new defaults align with what the impl already advertised via getConfiguredReceiveTimeout().
Note
Medium Risk
Changes default blocking behavior for all PolyScope X dashboard HTTP calls and power-on timing; mis-tuned timeouts could cause new failures on slow robot operations.
Overview
PolyScope X dashboard HTTP calls no longer inherit cpp-httplib’s 300 s read/write defaults.
DashboardClientImplXnow applies 5 s connection and 10 s read/write timeouts at construction, keeps them inrecv_timeout_/send_timeout_, andsetReceiveTimeout/setSendTimeoutupdate the underlyinghttplib::Client.The base
DashboardClientImplAPI gainsgetConfiguredSendTimeout/setSendTimeout(default getter still 1 s on non-overriding impls).getConfiguredReceiveTimeouton the X path returns the stored value (10 s default) instead of a fixed 1 s.commandPowerOntemporarily raises the read timeout to the caller’stimeoutargument for the power-on PUT, then restores the previous setting (including on exceptions).Reviewed by Cursor Bugbot for commit 8b9e0f4. Bugbot is set up for automated code reviews on this repo. Configure here.