Skip to content

Commit 5cd50a3

Browse files
78cursoragent
andcommitted
remote_session: keep guest content off the cloud relay; soften ICE setup
The cloud websocket is treated as untrusted in our threat model, but remote_session.created was leaking the cursor bitmap (and clipboard / audio metadata) inside its runtime snapshot. Add a SnapshotScope to RuntimeManager::RemoteRuntimeSnapshot and have the cloud-facing path use kPublic so only running / display / frame / framebuffer_ready travel over WSS; cursor continues to seed over the WebRTC control DC (SetDataChannelOpenHandler already does this). Also fix the WebRTC handshake stalls we were seeing on CN networks: - AcceptOffer waited up to 5s for both local SDP *and* full ICE gathering; if STUN was UDP-blackholed (very common with stun.l.google on mainland ISPs) gathering never completed and the answer came back empty with "timed out creating WebRTC answer". Make gathering soft - return whatever host candidates we have once SDP is ready, log a WARN, and trickle later candidates via a new LocalIceCandidateHandler. - Stretch the SDP wait to 10s as a safety margin. - Default STUN list now skews toward CN-reachable hosts (stun.qq.com, stun.miwifi.com, stun.cloudflare.com); operators can still override the whole list with TENBOX_STUN_SERVERS. The same list is advertised to the browser via ice_servers so both peers probe a consistent set. - cloud_tunnel forwards every host-side candidate as a remote_signal.candidate push, reusing the existing cloud relay forward path for browser-side trickle. Bumps to 0.7.20. Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 00f72dc commit 5cd50a3

6 files changed

Lines changed: 217 additions & 15 deletions

File tree

VERSION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
0.7.19
1+
0.7.20

src/daemon/cloud_tunnel.cpp

Lines changed: 38 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1625,6 +1625,32 @@ std::shared_ptr<WebRtcPeer> CloudTunnel::CreateRemotePeer(
16251625
peer->SetDataChannelHandler([this, vm_id](const nlohmann::json& message) {
16261626
HandleDataChannelMessage(vm_id, message);
16271627
});
1628+
// Trickle host-side ICE candidates back to the browser through the
1629+
// existing `remote_signal.candidate` cloud relay envelope. The
1630+
// browser-side hostBus router (`remote_signal.candidate` listener
1631+
// in RemoteDesktopPanel) feeds them to RTCPeerConnection.addIceCandidate.
1632+
// Without this, AcceptOffer's softened gathering window would only
1633+
// ship whatever candidates were collected within ~10s; on STUN-blocked
1634+
// networks the answer sometimes returns with zero candidates and
1635+
// the browser would have nothing to connect to.
1636+
peer->SetLocalIceCandidateHandler(
1637+
[this, session_id, vm_id](nlohmann::json candidate) {
1638+
if (fd_ < 0) return;
1639+
nlohmann::json payload = {
1640+
{"session_id", session_id},
1641+
};
1642+
for (auto it = candidate.begin(); it != candidate.end(); ++it) {
1643+
payload[it.key()] = it.value();
1644+
}
1645+
(void)SendJson({
1646+
{"id", GenerateUuid()},
1647+
{"type", "remote_signal.candidate"},
1648+
{"host_id", host_id_},
1649+
{"vm_id", vm_id},
1650+
{"session_id", session_id},
1651+
{"payload", std::move(payload)},
1652+
});
1653+
});
16281654
// When the control DC opens, push the latest cached cursor so the
16291655
// browser is in sync even if MOVE_CURSOR events fired (and were
16301656
// dropped) while the channel was still in DTLS/SCTP setup.
@@ -1802,10 +1828,20 @@ nlohmann::json CloudTunnel::CreateRemoteSession(const std::string& vm_id, const
18021828
auto json = ToJson(*session);
18031829
json["video_bitrate_bps"] = video_bitrate_bps;
18041830
json["video_pixel_format"] = RemoteVideoPixelFormatName(video_pixel_format);
1831+
// Advertise the same STUN list the daemon itself is using so both
1832+
// peers agree on the candidate-gathering servers (see ResolvedStunServers
1833+
// for why the defaults skew toward CN-reachable hosts; operators can
1834+
// override via TENBOX_STUN_SERVERS).
1835+
nlohmann::json ice_urls = nlohmann::json::array();
1836+
for (const auto& url : ResolvedStunServers()) ice_urls.push_back(url);
18051837
json["ice_servers"] = nlohmann::json::array({
1806-
{{"urls", nlohmann::json::array({"stun:stun.l.google.com:19302"})}},
1838+
{{"urls", std::move(ice_urls)}},
18071839
});
1808-
json["runtime"] = runtime_manager_.RemoteRuntimeSnapshot(vm_id);
1840+
// Public snapshot: scrub cursor pixels / clipboard / audio so the cloud
1841+
// relay never observes guest-visible content. Browser receives those
1842+
// exclusively through the WebRTC control DataChannel.
1843+
json["runtime"] = runtime_manager_.RemoteRuntimeSnapshot(
1844+
vm_id, RuntimeManager::SnapshotScope::kPublic);
18091845
json["webrtc_ready"] = NativeWebRtcAvailable();
18101846
json["message"] = NativeWebRtcAvailable()
18111847
? "remote session signaling is ready"

src/daemon/remote_webrtc.cpp

Lines changed: 138 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
#include <rtc/plihandler.hpp>
66

77
#include <atomic>
8+
#include <cctype>
89
#include <chrono>
910
#include <condition_variable>
1011
#include <cstdlib>
@@ -15,9 +16,11 @@
1516
#include <inttypes.h>
1617
#include <mutex>
1718
#include <optional>
19+
#include <string>
1820
#include <string_view>
1921
#include <thread>
2022
#include <utility>
23+
#include <vector>
2124

2225
namespace tenbox::daemon {
2326

@@ -70,6 +73,50 @@ const char* PeerStateName(int state) {
7073
}
7174
}
7275

76+
// Returns the configured STUN URL list. `TENBOX_STUN_SERVERS` is a
77+
// comma-separated list (whitespace tolerated) of full URLs such as
78+
// "stun:stun.qq.com:3478,stun:stun.miwifi.com:3478". Empty entries
79+
// are skipped.
80+
//
81+
// Defaults are skewed for mainland China deployments because that's where
82+
// most of the production fleet lives today and Google STUN
83+
// (stun.l.google.com:19302) is regularly UDP-blackholed by CN ISPs,
84+
// which previously made WebRTC setup time out at "creating answer".
85+
// Order matters - libdatachannel probes them in sequence:
86+
// 1. Tencent - rock solid in CN, used by every domestic Live SDK
87+
// 2. Xiaomi - secondary domestic option, embedded in MiWiFi routers
88+
// 3. Cloudflare - global fallback for users outside CN; CF anycast is
89+
// reachable with low latency from the mainland too,
90+
// so it doubles as a third tier when the first two are
91+
// busy or filtered.
92+
// Operators can override the whole list with TENBOX_STUN_SERVERS, e.g.
93+
// to point at a self-hosted coturn or to add TURN once we have one.
94+
std::vector<std::string> ConfiguredStunServers() {
95+
const char* raw = std::getenv("TENBOX_STUN_SERVERS");
96+
std::vector<std::string> out;
97+
if (raw && raw[0] != '\0') {
98+
std::string_view view(raw);
99+
size_t start = 0;
100+
while (start <= view.size()) {
101+
size_t comma = view.find(',', start);
102+
size_t end = comma == std::string_view::npos ? view.size() : comma;
103+
size_t s = start;
104+
size_t e = end;
105+
while (s < e && std::isspace(static_cast<unsigned char>(view[s]))) ++s;
106+
while (e > s && std::isspace(static_cast<unsigned char>(view[e - 1]))) --e;
107+
if (e > s) out.emplace_back(view.substr(s, e - s));
108+
if (comma == std::string_view::npos) break;
109+
start = comma + 1;
110+
}
111+
}
112+
if (out.empty()) {
113+
out.emplace_back("stun:stun.qq.com:3478");
114+
out.emplace_back("stun:stun.miwifi.com:3478");
115+
out.emplace_back("stun:stun.cloudflare.com:3478");
116+
}
117+
return out;
118+
}
119+
73120
std::string StripLipSyncGroups(std::string sdp) {
74121
std::string filtered;
75122
filtered.reserve(sdp.size());
@@ -101,7 +148,16 @@ class NativeWebRtcPeer final : public WebRtcPeer {
101148
: frame_reader_(std::move(frame_reader)),
102149
preferred_video_format_(preferred_video_format) {
103150
rtc::Configuration config;
104-
config.iceServers.emplace_back("stun:stun.l.google.com:19302");
151+
for (const auto& url : ConfiguredStunServers()) {
152+
try {
153+
config.iceServers.emplace_back(url);
154+
} catch (const std::exception& e) {
155+
std::fprintf(stdout,
156+
"[WARN] remote_webrtc: ignoring invalid STUN url %s: %s\n",
157+
url.c_str(), e.what());
158+
std::fflush(stdout);
159+
}
160+
}
105161
config.disableAutoNegotiation = true;
106162
peer_ = std::make_shared<rtc::PeerConnection>(std::move(config));
107163
peer_->onStateChange([this](rtc::PeerConnection::State state) {
@@ -131,11 +187,32 @@ class NativeWebRtcPeer final : public WebRtcPeer {
131187
cv_.notify_all();
132188
});
133189
peer_->onLocalCandidate([this](rtc::Candidate candidate) {
134-
std::lock_guard<std::mutex> lock(mu_);
135-
candidates_.push_back({
190+
nlohmann::json entry = {
136191
{"candidate", std::string(candidate)},
137192
{"sdpMid", candidate.mid()},
138-
});
193+
};
194+
LocalIceCandidateHandler handler;
195+
{
196+
std::lock_guard<std::mutex> lock(mu_);
197+
candidates_.push_back(entry);
198+
handler = local_ice_handler_;
199+
}
200+
// Trickle every host-side candidate to the embedder so the
201+
// browser can start probing as soon as gathering produces
202+
// host / srflx entries, instead of having to wait for the
203+
// initial answer's `candidates[]` (which we cap at a short
204+
// gathering window so STUN-blackholed networks don't stall
205+
// session creation).
206+
if (handler) {
207+
try {
208+
handler(std::move(entry));
209+
} catch (const std::exception& e) {
210+
std::fprintf(stdout,
211+
"[ERROR] remote_webrtc: local ice handler threw: %s\n",
212+
e.what());
213+
std::fflush(stdout);
214+
}
215+
}
139216
});
140217
peer_->onGatheringStateChange([this](rtc::PeerConnection::GatheringState state) {
141218
if (state == rtc::PeerConnection::GatheringState::Complete) {
@@ -183,13 +260,36 @@ class NativeWebRtcPeer final : public WebRtcPeer {
183260
peer_->setRemoteDescription(rtc::Description(sdp, "offer"));
184261
peer_->setLocalDescription(rtc::Description::Type::Answer);
185262

263+
// Wait long enough to (a) absolutely require the answer SDP and
264+
// (b) opportunistically pick up gathered candidates so the
265+
// initial answer carries something useful.
266+
//
267+
// `description_ready_` is hard: without local SDP we have no
268+
// answer to hand back, so we error out. `gathering_complete_`
269+
// is intentionally soft - if STUN servers are unreachable
270+
// (common on locked-down networks: 19302/UDP black-holed by
271+
// ISP, no TURN configured), libdatachannel keeps waiting on
272+
// the binding response indefinitely. Returning the SDP with
273+
// whatever host candidates we already have lets LAN peers
274+
// connect, and the `LocalIceCandidateHandler` continues to
275+
// trickle later srflx/relay candidates as they arrive.
186276
std::unique_lock<std::mutex> lock(mu_);
187-
const bool ready = cv_.wait_for(lock, std::chrono::seconds(5), [this] {
277+
cv_.wait_for(lock, std::chrono::seconds(10), [this] {
188278
return description_ready_ && gathering_complete_;
189279
});
190-
if (!ready || local_sdp_.empty()) {
280+
if (local_sdp_.empty()) {
191281
return WebRtcAnswer{.ok = false, .error = "timed out creating WebRTC answer"};
192282
}
283+
if (!gathering_complete_) {
284+
std::fprintf(stdout,
285+
"[WARN] remote_webrtc: returning answer before ICE "
286+
"gathering finished (%zu host candidate(s) so far); "
287+
"remaining candidates will trickle. Check STUN "
288+
"reachability or set TENBOX_STUN_SERVERS to a "
289+
"reachable server list.\n",
290+
candidates_.size());
291+
std::fflush(stdout);
292+
}
193293
return WebRtcAnswer{
194294
.ok = true,
195295
.sdp = local_sdp_,
@@ -242,6 +342,33 @@ class NativeWebRtcPeer final : public WebRtcPeer {
242342
dc_handler_ = std::move(handler);
243343
}
244344

345+
void SetLocalIceCandidateHandler(LocalIceCandidateHandler handler) override {
346+
// Snapshot any candidates already gathered before the embedder
347+
// installed the trickle handler (typical race: handler is
348+
// attached right after CreateWebRtcPeer but onLocalCandidate
349+
// fires from the libdatachannel worker as soon as
350+
// setLocalDescription is called). Replaying ensures the browser
351+
// ends up with the same candidate set regardless of timing.
352+
std::vector<nlohmann::json> backlog;
353+
{
354+
std::lock_guard<std::mutex> lock(mu_);
355+
local_ice_handler_ = handler;
356+
if (handler && !candidates_.empty()) {
357+
for (const auto& c : candidates_) backlog.push_back(c);
358+
}
359+
}
360+
for (auto& c : backlog) {
361+
try {
362+
handler(std::move(c));
363+
} catch (const std::exception& e) {
364+
std::fprintf(stdout,
365+
"[ERROR] remote_webrtc: local ice handler threw on backlog: %s\n",
366+
e.what());
367+
std::fflush(stdout);
368+
}
369+
}
370+
}
371+
245372
void SetDataChannelOpenHandler(DataChannelOpenHandler handler) override {
246373
// Snapshot any already-open channels so a handler installed after
247374
// open() still gets a chance to seed initial state. Channels we
@@ -1153,6 +1280,7 @@ class NativeWebRtcPeer final : public WebRtcPeer {
11531280
std::vector<std::shared_ptr<rtc::DataChannel>> data_channels_;
11541281
DataChannelMessageHandler dc_handler_;
11551282
DataChannelOpenHandler dc_open_handler_;
1283+
LocalIceCandidateHandler local_ice_handler_;
11561284
std::shared_ptr<rtc::Track> video_track_;
11571285
std::shared_ptr<rtc::Track> audio_track_;
11581286
std::thread video_thread_;
@@ -1187,4 +1315,8 @@ bool NativeWebRtcAvailable() {
11871315
return true;
11881316
}
11891317

1318+
std::vector<std::string> ResolvedStunServers() {
1319+
return ConfiguredStunServers();
1320+
}
1321+
11901322
} // namespace tenbox::daemon

src/daemon/remote_webrtc.h

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -71,6 +71,14 @@ using DataChannelMessageHandler = std::function<void(const nlohmann::json&)>;
7171
// silently dropped by SendOnDataChannel.
7272
using DataChannelOpenHandler = std::function<void(const std::string& label)>;
7373

74+
// Fired for every host-side ICE candidate as it is gathered. The
75+
// embedder is expected to forward the candidate to the remote peer over
76+
// whatever signaling channel is in use (cloud websocket relay in the
77+
// production setup). Candidates included in the initial WebRtcAnswer
78+
// are *also* delivered through this handler if it is installed before
79+
// AcceptOffer; deduplication is the embedder's responsibility.
80+
using LocalIceCandidateHandler = std::function<void(nlohmann::json candidate)>;
81+
7482
class WebRtcPeer {
7583
public:
7684
virtual ~WebRtcPeer() = default;
@@ -80,6 +88,7 @@ class WebRtcPeer {
8088
virtual void SetVideoBitrate(uint32_t bitrate_bps) = 0;
8189
virtual void SetDataChannelHandler(DataChannelMessageHandler handler) = 0;
8290
virtual void SetDataChannelOpenHandler(DataChannelOpenHandler handler) = 0;
91+
virtual void SetLocalIceCandidateHandler(LocalIceCandidateHandler handler) = 0;
8392
// Send a JSON text frame to a specific data channel (typically "control"
8493
// for clipboard / status events). Returns false if the channel doesn't
8594
// exist or isn't open yet; the caller is expected to drop the message.
@@ -91,4 +100,11 @@ std::shared_ptr<WebRtcPeer> CreateWebRtcPeer(
91100
PixelFormat preferred_video_format = PixelFormat::kYuv420p);
92101
bool NativeWebRtcAvailable();
93102

103+
// Returns the resolved STUN URL list (TENBOX_STUN_SERVERS override or the
104+
// CN-leaning defaults). Exposed so the cloud tunnel can advertise the
105+
// same servers to the browser-side RTCPeerConnection - keeping both sides
106+
// of the connection probing the same set avoids one peer giving up on
107+
// srflx because it picked a server the other peer cannot reach.
108+
std::vector<std::string> ResolvedStunServers();
109+
94110
} // namespace tenbox::daemon

src/daemon/runtime_manager.cpp

Lines changed: 16 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1079,7 +1079,8 @@ bool RuntimeManager::SetRemoteVideoPixelFormat(const std::string& vm_id, PixelFo
10791079
return true;
10801080
}
10811081

1082-
nlohmann::json RuntimeManager::RemoteRuntimeSnapshot(const std::string& vm_id) const {
1082+
nlohmann::json RuntimeManager::RemoteRuntimeSnapshot(const std::string& vm_id,
1083+
SnapshotScope scope) const {
10831084
auto session = FindSession(vm_id);
10841085
if (!session) return {{"running", false}};
10851086
bool framebuffer_ready = false;
@@ -1088,15 +1089,25 @@ nlohmann::json RuntimeManager::RemoteRuntimeSnapshot(const std::string& vm_id) c
10881089
framebuffer_ready = session->framebuffer && session->framebuffer->IsValid();
10891090
}
10901091
std::lock_guard<std::mutex> lock(session->console_mutex);
1091-
return {
1092+
nlohmann::json snapshot = {
10921093
{"running", true},
10931094
{"display", session->display_state},
10941095
{"frame", session->last_frame},
10951096
{"framebuffer_ready", framebuffer_ready},
1096-
{"cursor", session->cursor},
1097-
{"audio", session->last_audio},
1098-
{"clipboard", session->last_clipboard},
10991097
};
1098+
// Guest-visible state (cursor bitmap, clipboard activity metadata,
1099+
// audio activity metadata) MUST NOT appear in a Public snapshot — that
1100+
// payload travels over the cloud websocket relay, which our threat model
1101+
// treats as untrusted. Browsers receive these via the WebRTC `control`
1102+
// DataChannel after DTLS/SRTP is up; the daemon seeds the latest cursor
1103+
// on channel-open in cloud_tunnel so a late-attaching peer is not stuck
1104+
// waiting for the next MOVE_CURSOR (virtio_gpu source-side dedup).
1105+
if (scope == SnapshotScope::kInternal) {
1106+
snapshot["cursor"] = session->cursor;
1107+
snapshot["audio"] = session->last_audio;
1108+
snapshot["clipboard"] = session->last_clipboard;
1109+
}
1110+
return snapshot;
11001111
}
11011112

11021113
nlohmann::json RuntimeManager::CursorSnapshot(const std::string& vm_id) const {

src/daemon/runtime_manager.h

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -87,7 +87,14 @@ class RuntimeManager {
8787
RemoteVideoFrame* frame,
8888
bool need_full_frame,
8989
std::chrono::milliseconds wait_timeout = std::chrono::milliseconds(0));
90-
nlohmann::json RemoteRuntimeSnapshot(const std::string& vm_id) const;
90+
// Snapshot scope. `kPublic` strips every field that carries guest-visible
91+
// content (cursor pixels, clipboard metadata, audio metadata). It is the
92+
// only variant we are allowed to put on a wire that the cloud relay can
93+
// observe in plaintext, because our threat model treats the relay as
94+
// untrusted. `kInternal` keeps everything for in-process callers.
95+
enum class SnapshotScope { kPublic, kInternal };
96+
nlohmann::json RemoteRuntimeSnapshot(const std::string& vm_id,
97+
SnapshotScope scope = SnapshotScope::kInternal) const;
9198
// Returns the most recently observed cursor JSON for `vm_id`, or an empty
9299
// object if no cursor event has been seen yet (or the VM is gone). Used
93100
// by the cloud tunnel to seed a freshly opened control DataChannel,

0 commit comments

Comments
 (0)