
Remote rtc #3125

Open

grach0v wants to merge 7 commits into huggingface:main from grach0v:remote_rtc

Conversation

@grach0v

@grach0v grach0v commented Mar 10, 2026

feat(rtc): remote inference system + action queue delay fix

Type / Scope

  • Type: Feature + Bug
  • Scope: policies/rtc, policies/pi05, examples/remote_rtc

Summary / Motivation

Adds a client-server architecture for running RTC policy inference on a remote GPU server over gRPC while the robot client runs on lightweight hardware. Also fixes several bugs: pi05 RTC mode issues and torch.compile CUDA graph incompatibilities.

What changed

  • New: examples/remote_rtc/ — gRPC server (rtc_policy_server.py), robot client (eval_with_real_robot.py), dataset evaluator (eval_dataset.py)
  • New: src/lerobot/policies/rtc/remote.py — Shared data classes for client-server protocol (RTCObservationData, RTCActionData, RTCTimingData)
  • New: src/lerobot/policies/rtc/profiling.py — Per-request profiling with parquet export and matplotlib plots
  • Fix: src/lerobot/policies/rtc/action_queue.py — When no actions consumed during inference, return delay=0 instead of latency estimate (fixes first-movement jerk where 210/300 actions were skipped); merge() now returns int instead of None
  • Fix: src/lerobot/policies/pi05/modeling_pi05.py — Create inference_delay/execution_horizon tensors on model device (fixes CUDA graph warnings); normalize RTC inputs before torch.compile boundary to prevent recompilation
  • Fix: src/lerobot/policies/rtc/modeling_rtc.py — Rewrite guidance as pure tensor ops (no autograd, no .item()), compile-friendly get_prefix_weights, proper device/dtype propagation
  • Fix: pyproject.toml — RTX 5090 cu128 support, version caps for torch/torchcodec compatibility, uv resolver conflict reduction (19→4)
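The action-queue delay fix above can be illustrated with a minimal sketch. Names here (ActionQueue, merge, latency_estimate_steps) are illustrative stand-ins, not the actual lerobot API:

```python
# Hypothetical sketch of the action_queue.py delay fix: when no actions were
# consumed during inference (e.g. the very first chunk at startup), the merge
# delay should be 0 rather than the latency estimate, so no actions are skipped.
from collections import deque


class ActionQueue:
    def __init__(self, actions):
        self._queue = deque(actions)
        self._consumed_during_inference = 0

    def pop(self):
        # The robot control loop consumes one action per tick.
        self._consumed_during_inference += 1
        return self._queue.popleft()

    def merge(self, new_actions, latency_estimate_steps: int) -> int:
        # Before the fix: the latency estimate was always applied as the delay,
        # so the first chunk dropped actions even though none had been consumed
        # yet (the 210/300 skipped-actions jerk described above).
        if self._consumed_during_inference == 0:
            delay = 0
        else:
            delay = latency_estimate_steps
        self._queue = deque(list(new_actions)[delay:])
        self._consumed_during_inference = 0
        return delay  # now always an int, never None


q = ActionQueue(range(5))
first_delay = q.merge(list(range(300)), latency_estimate_steps=210)
print(first_delay, len(q._queue))  # 0 300 — no actions skipped at startup
```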

How was this tested

  • Real robot evaluation: Pi0.5 on Trossen Mobile (dual arm), 30fps, 40s
  • Smooth first movement confirmed
  • Profiling: 84 requests, server inference P50=184ms, client E2E P50=322ms, steady queue=35
  • torch.compile reduce-overhead mode: 309ms first inference → 184ms steady state
# Start server on GPU machine
python examples/remote_rtc/rtc_policy_server.py --host=0.0.0.0 --port=8080

# Run robot client
python examples/remote_rtc/eval_with_real_robot.py \
    --server_address=<GPU_IP>:8080 \
    --policy_type=pi05 \
    --pretrained_name_or_path=<checkpoint_path> \
    --robot.type=<robot_type> \
    --task="Your task" \
    --rtc.enabled=true \
    --rtc.execution_horizon=20 \
    --use_torch_compile=true \
    --enable_profiling=true

Copilot AI review requested due to automatic review settings March 10, 2026 17:46
@github-actions github-actions bot added policies Items related to robot policies examples Issues related to the examples labels Mar 10, 2026

Copilot AI left a comment


Pull request overview

Adds a gRPC-based client/server path for running RTC-enabled policy inference remotely (GPU server) while keeping robot control on lightweight hardware, plus several RTC/PI0.5 compile & action-queue fixes and dependency updates.

Changes:

  • Introduces remote RTC protocol dataclasses + profiling utilities for per-request timing artifacts.
  • Adds remote RTC example server/client/dataset evaluator scripts using the new protocol.
  • Fixes RTC action queue delay handling and PI0.5 RTC/torch.compile compatibility; updates torch/torchvision/torchcodec + uv CUDA index config.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/lerobot/policies/rtc/remote.py Defines pickled dataclasses used as the remote RTC request/response protocol.
src/lerobot/policies/rtc/profiling.py Adds profiling record storage plus parquet export and matplotlib plotting.
src/lerobot/policies/rtc/modeling_rtc.py Makes RTC guidance more torch.compile-friendly and adjusts device/dtype handling.
src/lerobot/policies/rtc/action_queue.py Adds clear() and changes merge() to return the applied delay; fixes delay selection logic.
src/lerobot/policies/pi05/modeling_pi05.py Normalizes RTC inputs pre-compile boundary and makes action queue thread-local.
pyproject.toml Adjusts torch/torchvision/torchcodec/transformers bounds and adds uv cu128 index configuration.
examples/rtc/eval_with_real_robot.py Adds compile warmup + compile caching toggles; refactors preprocessing in the RTC demo.
examples/remote_rtc/rtc_policy_server.py New RTC inference server (gRPC) supporting RTC parameters + optional torch.compile.
examples/remote_rtc/eval_with_real_robot.py New robot-side remote client with action queue management and optional profiling.
examples/remote_rtc/eval_dataset.py New dataset-based remote RTC evaluator + plotting/profiling hooks.

Comment on lines +216 to +221
client_id = context.peer()
policy_specs = pickle.loads(request.data)  # nosec

if not isinstance(policy_specs, RTCRemotePolicyConfig):
    raise TypeError(f"Expected RTCRemotePolicyConfig, got {type(policy_specs)}")


Copilot AI Mar 10, 2026


This server deserializes client-provided bytes with pickle.loads(). Pickle is unsafe against untrusted input (remote code execution), so running this on an open network is risky. If this is intended beyond trusted LAN/dev usage, consider switching the request/response payloads to protobuf (or another non-executable format) and/or adding explicit authentication + message validation before deserialization.
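One way to harden this boundary without leaving pickle entirely is a restricted unpickler, as a sketch. The module path and RTCRemotePolicyConfig allow-list entry below are assumptions about where the protocol types live:

```python
# Hedged sketch: restrict pickle deserialization to an allow-list of protocol
# types, refusing the globals that pickle RCE gadgets rely on (os.system,
# subprocess.Popen, ...). Not the server's actual implementation.
import io
import pickle

# Assumed module path for the protocol dataclass; adjust to the real one.
ALLOWED = {
    ("lerobot.policies.rtc.remote", "RTCRemotePolicyConfig"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the payload tries to resolve; anything
        # outside the expected protocol types is rejected.
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(f"forbidden global: {module}.{name}")
        return super().find_class(module, name)


def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers and primitives still round-trip (they never hit find_class), while a payload smuggling a callable like os.system is rejected at load time. Authentication and protobuf payloads, as the review suggests, remain the stronger fix.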

@grach0v (Author)

This is for trusted servers only.

Comment on lines +235 to +240
self.device = policy_specs.device
self.policy_type = policy_specs.policy_type
self.lerobot_features = policy_specs.lerobot_features

# Load policy
self._unload_policy(reason="replacing_existing_policy")

Copilot AI Mar 10, 2026


In SendPolicyInstructions, self.device/self.policy_type/self.lerobot_features are set before calling _unload_policy(), but _unload_policy() clears these attributes when a policy is already loaded. This will make get_policy_class(self.policy_type) receive None (and self.policy.to(self.device) use None) on subsequent client connections / reconfiguration. Set new values after unloading, or keep them in locals and only assign to self.* once the previous policy has been unloaded.

Suggested change

-self.device = policy_specs.device
-self.policy_type = policy_specs.policy_type
-self.lerobot_features = policy_specs.lerobot_features
-
-# Load policy
-self._unload_policy(reason="replacing_existing_policy")
+# Load policy
+self._unload_policy(reason="replacing_existing_policy")
+
+# Set new configuration after unloading any existing policy to avoid it being cleared.
+self.device = policy_specs.device
+self.policy_type = policy_specs.policy_type
+self.lerobot_features = policy_specs.lerobot_features

Comment on lines +535 to +538
except Exception as e:
    logger.error(f"[GET_ACTIONS] Fatal error: {e}")
    traceback.print_exc()
    sys.exit(1)

Copilot AI Mar 10, 2026


sys.exit(1) inside a worker thread only terminates that thread (raises SystemExit there) and typically does not stop the whole process. If this is intended to be a fatal error, signal the main loop via self.shutdown_event/ProcessSignalHandler, and let the main thread exit (or use os._exit(1) as a last resort).
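The shutdown-event pattern the review suggests can be sketched as follows. The names (worker, shutdown_event, exit_code) are illustrative, not the PR's actual symbols:

```python
# Sketch: instead of sys.exit(1) inside a worker thread (which only raises
# SystemExit in that thread and leaves the process running), the worker
# records an exit code and signals the main loop, which exits deterministically.
import threading

shutdown_event = threading.Event()
exit_code = 0


def worker():
    global exit_code
    try:
        raise RuntimeError("fatal error in actor thread")
    except Exception:
        exit_code = 1
        shutdown_event.set()  # signal the main thread instead of sys.exit(1)


t = threading.Thread(target=worker, daemon=True)
t.start()

# Main loop: block until some thread requests shutdown (or a timeout elapses),
# then run cleanup and exit from the main thread with the recorded code.
shutdown_event.wait(timeout=5.0)
t.join()
print("main thread exiting with code", exit_code)
```

The main thread would then call sys.exit(exit_code) after cleanup, so teardown of gRPC channels and robot connections happens in one place.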

Comment on lines +567 to +570
except Exception as e:
    logger.error(f"[ACTOR] Fatal error: {e}")
    traceback.print_exc()
    sys.exit(1)

Copilot AI Mar 10, 2026


Same issue here: sys.exit(1) in a non-main thread won't necessarily stop the program, so fatal errors in the actor thread may leave the process running in a bad state. Prefer setting a shutdown flag / raising to the main thread so cleanup and termination happen deterministically.

Comment on lines +789 to +791
inference_delay: int | None = None,
prev_chunk_left_over: Tensor | None = None,
execution_horizon: int | None = None,

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PI05Policy.predict_action_chunk converts inference_delay / execution_horizon to torch.Tensor for torch.compile stability, but PI05Pytorch.sample_actions still types these parameters as int | None. Update the annotations (e.g., int | Tensor | None) to reflect actual supported inputs and avoid type-checking / reader confusion.

Suggested change

-inference_delay: int | None = None,
+inference_delay: int | Tensor | None = None,
 prev_chunk_left_over: Tensor | None = None,
-execution_horizon: int | None = None,
+execution_horizon: int | Tensor | None = None,
