Newton: push_by_setting_velocity causes training divergence (angular velocity frame mismatch) #5064

@ax-anoop

Bug

push_by_setting_velocity has no effect on the Newton physics backend. The velocity write doesn't propagate to MuJoCo-Warp's simulation state.

Root Cause (confirmed)

Problem 1: write_root_velocity_to_sim_index only updates display buffers

The Newton warp kernel set_root_link_velocity_to_sim_index (in isaaclab_newton/assets/kernels.py) writes to root_link_velocity_w, root_com_velocity_w, and state vectors — but never writes to the solver's joint_qd. MuJoCo-Warp reads velocities from State.joint_qd on each step, so the buffered values are display-only.

Problem 2: model.joint_qd is not the live state

NewtonManager.get_model().joint_qd (ptr: 12998273536) holds the model defaults, not the live simulation state. The live state is held in NewtonManager.get_state_0() / get_state_1() (ptr: 12998294016), which are double-buffered and swapped each substep.

Writing to model.joint_qd has zero effect on the simulation. This was confirmed by:

  • Setting model.joint_qd[0] = 100.0 → robot didn't move
  • Setting get_state_0().joint_qd[0] = 10.0 → robot moved correctly (+0.05m at dt=0.005)
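The two observations above are what one would expect from a model/state split with double buffering. The toy below is only an illustration of that pattern, not Newton's implementation: `ToyModel`, `ToyState`, and `ToySim` are hypothetical names, and the integrator ignores forces entirely.

```python
import numpy as np


class ToyModel:
    """Stand-in for the model object: stores defaults only."""

    def __init__(self, num_dofs):
        self.joint_qd = np.zeros(num_dofs)


class ToyState:
    """Stand-in for one live state buffer, initialized from the model defaults."""

    def __init__(self, model):
        self.joint_q = np.zeros_like(model.joint_qd)
        # A copy, so later edits to the model defaults are invisible here.
        self.joint_qd = model.joint_qd.copy()


class ToySim:
    """Double-buffered integrator: reads state_0, writes state_1, then swaps."""

    def __init__(self, num_dofs=6, dt=0.005):
        self.dt = dt
        self.model = ToyModel(num_dofs)
        self.state_0 = ToyState(self.model)
        self.state_1 = ToyState(self.model)

    def step(self):
        src, dst = self.state_0, self.state_1
        dst.joint_qd[:] = src.joint_qd  # no forces in this toy, velocity carries over
        dst.joint_q[:] = src.joint_q + self.dt * src.joint_qd
        self.state_0, self.state_1 = dst, src  # swap buffers


sim = ToySim()

sim.model.joint_qd[0] = 100.0  # write to the model defaults
sim.step()
moved_after_model_write = sim.state_0.joint_q[0]

sim.state_0.joint_qd[0] = 10.0  # write to the live state
sim.step()
moved_after_state_write = sim.state_0.joint_q[0]

print(moved_after_model_write, moved_after_state_write)
```

In this toy, the model write leaves the position at zero, while the state write advances it by roughly 10.0 × 0.005 = 0.05, which matches the displacement reported above.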

Problem 3: Direct state writes cause GPU crashes

Writing to get_state_0().joint_qd during training (between simulation steps) causes CUDA context corruption after ~150-250 iterations, killing the GPU driver with "no CUDA-capable device" errors. This is likely due to the double-buffer state swap + CUDA graph capture making the memory unsafe to modify externally.

Verified Behavior

| Test | Result |
|---|---|
| Write to model.joint_qd | No effect (wrong buffer) |
| Write to state_0.joint_qd (single step) | Works correctly |
| Write to state_0.joint_qd (training loop) | Crashes GPU after ~150 iters |
| Stock write_root_velocity_to_sim_index | Writes to display buffers only |
| ArticulationView.set_root_velocities(state, values) | Untested (GPU crashed before we could try) |

Impact

All domain randomization events that modify velocities (push_by_setting_velocity) or body properties (randomize_rigid_body_material, randomize_rigid_body_com, randomize_rigid_body_mass) may be silently ineffective on Newton if they write to model arrays rather than live state.

In practice, training results with "all-DR" on Newton are identical to no-DR: rewards match and fall rates are 0% in both configurations.

Reproduction

import warp as wp

# NewtonManager and sim come from an already-running Isaac Lab Newton setup.

# This has NO effect:
model = NewtonManager.get_model()
wp.to_torch(model.joint_qd)[0] = 100.0  # writes to model defaults
sim.step()  # robot doesn't move

# This works for one step but crashes in training:
state = NewtonManager.get_state_0()
wp.to_torch(state.joint_qd)[0] = 10.0  # writes to live state
sim.step()  # robot moves

Suggested Fix

The velocity write kernels also need to write to State.joint_qd (the solver's velocity array). This requires items 1-3 below; item 4 is an alternative approach:

  1. Passing the live State object to the kernel
  2. Computing the body-frame angular velocity (quat_rotate_inv) for MuJoCo's qvel convention
  3. Properly handling the double-buffer swap to avoid race conditions
  4. Alternatively: provide a safe set_velocity API on NewtonManager that schedules the write for the correct point in the simulation step
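The frame conversion in item 2 can be sketched in plain NumPy. This is a minimal illustration, not the kernel code: it assumes quaternions are stored as (x, y, z, w), and the output layout `[linear (world), angular (body)]` is an assumption about how the solver's root velocity is packed. `root_velocity_for_solver` is a hypothetical helper name.

```python
import numpy as np


def quat_rotate_inv(q_xyzw, v):
    """Rotate v by the inverse of the unit quaternion q = (x, y, z, w)."""
    q = np.asarray(q_xyzw, dtype=float)
    v = np.asarray(v, dtype=float)
    # For a unit quaternion, the inverse rotation negates the vector part.
    u, w = -q[:3], q[3]
    t = 2.0 * np.cross(u, v)
    return v + w * t + np.cross(u, t)


def root_velocity_for_solver(root_quat_w, lin_vel_w, ang_vel_w):
    """Hypothetical helper: keep linear velocity in the world frame and
    express angular velocity in the body frame (assumed layout)."""
    ang_vel_b = quat_rotate_inv(root_quat_w, ang_vel_w)
    return np.concatenate([np.asarray(lin_vel_w, dtype=float), ang_vel_b])


# Root yawed 90 degrees about world z, spinning about the world x-axis.
half = np.sqrt(0.5)
yaw_90 = [0.0, 0.0, half, half]
qd = root_velocity_for_solver(yaw_90, [0.5, 0.0, 0.0], [1.0, 0.0, 0.0])
print(qd)
```

With the root yawed 90 degrees, the body x-axis points along world y, so a spin about world x shows up as a spin about the negative body y-axis. Skipping this rotation would give the solver an angular velocity about the wrong axis whenever the root is not aligned with the world frame.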

The ArticulationView.set_root_velocities(state, values) API exists and writes to joint_qd via _set_attribute_values, but needs the correct live state passed as the target.
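The scheduling idea from item 4 of the list above could look roughly like the sketch below: callers queue writes at any time, and the manager applies them at one well-defined point, just before the solver reads the current state. `DeferredVelocityWriter` and its methods are hypothetical and are not part of NewtonManager; whether such a write is safe under CUDA graph capture is not addressed here and would still need to be verified.

```python
import numpy as np


class DeferredVelocityWriter:
    """Hypothetical helper: queue velocity writes, apply them at one known point."""

    def __init__(self):
        self._pending = []

    def set_velocity(self, index, value):
        """Record a write; nothing touches the simulation buffers yet."""
        self._pending.append((index, float(value)))

    def flush(self, joint_qd):
        """Apply queued writes to the array the solver reads next, then clear.

        Returns the number of writes applied.
        """
        for index, value in self._pending:
            joint_qd[index] = value
        applied = len(self._pending)
        self._pending.clear()
        return applied


writer = DeferredVelocityWriter()
live_joint_qd = np.zeros(6)  # stand-in for the current state's joint_qd

writer.set_velocity(0, 10.0)          # requested mid-episode, e.g. by a push event
untouched = live_joint_qd[0]          # buffer is unchanged until the flush
applied = writer.flush(live_joint_qd) # called once, right before the step
```

The point of the design is that the buffer selection happens inside `flush`, at step time, so event code never has to know which of the two state buffers is current.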

Environment

  • IsaacLab v3.0.0-beta
  • Isaac Sim 6.0 (from source, v6.0.0-dev2)
  • Newton/MuJoCo-Warp backend
  • RTX 5090, Ubuntu 24.04
