Newton: push_by_setting_velocity causes training divergence (angular velocity frame mismatch) #5064

## Bug

`push_by_setting_velocity` has no effect on the Newton physics backend. The velocity write doesn't propagate to MuJoCo-Warp's simulation state.

## Root Cause (confirmed)

### Problem 1: write_root_velocity_to_sim_index only updates display buffers

The Newton Warp kernel `set_root_link_velocity_to_sim_index` (in `isaaclab_newton/assets/kernels.py`) writes to `root_link_velocity_w`, `root_com_velocity_w`, and the state vectors, but it **never writes to the solver's `joint_qd`**. MuJoCo-Warp reads velocities from `State.joint_qd` on every step, so the buffered values are display-only.

### Problem 2: model.joint_qd is not the live state

`NewtonManager.get_model().joint_qd` (ptr: 12998273536) holds the **model defaults**, not the live simulation state. The live state lives in `NewtonManager.get_state_0()` / `get_state_1()` (ptr: 12998294016), which are double-buffered and swapped every substep.

Writing to `model.joint_qd` has zero effect on the simulation. This was confirmed by:
- Setting `model.joint_qd[0] = 100.0` → robot didn't move
- Setting `get_state_0().joint_qd[0] = 10.0` → robot moved correctly (+0.05 m after one step at dt = 0.005 s)
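The single-step result matches constant-velocity kinematics: 10 m/s × 0.005 s = 0.05 m. A minimal sketch of that check, assuming no gravity, contacts, or damping act along the pushed axis during the step (the helper name is hypothetical):

```python
def expected_displacement(velocity: float, dt: float, steps: int = 1) -> float:
    """Displacement of a free body at constant velocity (no gravity, contacts, or damping)."""
    return velocity * dt * steps


# Live-state write: joint_qd[0] = 10.0, then one step at dt = 0.005 s.
observed = 0.05  # metres, as reported above
expected = expected_displacement(10.0, 0.005)
print(f"expected {expected:.4f} m, observed {observed:.4f} m")
```

A match here is what distinguishes a write that reached the solver from one that only landed in a display or default buffer.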

### Problem 3: Direct state writes cause GPU crashes

Writing to `get_state_0().joint_qd` during training (between simulation steps) corrupts the CUDA context after ~150-250 iterations and takes down the GPU driver, which then reports "no CUDA-capable device" errors. The likely cause is the double-buffer state swap combined with CUDA graph capture, which makes this memory unsafe to modify from outside the step.

## Verified Behavior

| Test | Result |
|------|--------|
| Write to `model.joint_qd` | No effect (wrong buffer) |
| Write to `state_0.joint_qd` (single step) | Works correctly |
| Write to `state_0.joint_qd` (training loop) | Crashes GPU after ~150-250 iters |
| Stock `write_root_velocity_to_sim_index` | Writes to display buffers only |
| `ArticulationView.set_root_velocities(state, values)` | Untested (GPU crashed before we could try) |

## Impact

All domain randomization events that modify velocities (`push_by_setting_velocity`) or body properties (`randomize_rigid_body_material`, `randomize_rigid_body_com`, `randomize_rigid_body_mass`) may be silently ineffective on Newton if they write to model arrays rather than live state.

In practice, **training results with "all-DR" on Newton are identical to no-DR**: rewards match and fall rates are 0% in both configurations.
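The buffer tests above suggest a cheap regression check for velocity DR events: snapshot the live buffer, apply the event, and verify the live buffer actually changed. A minimal sketch using NumPy arrays as stand-ins; `write_reaches_live_state` is a hypothetical helper, and in a real test `live` would be a host copy of the live `joint_qd` taken after the GPU work has finished.

```python
from typing import Callable

import numpy as np


def write_reaches_live_state(live: np.ndarray, apply_event: Callable[[], None]) -> bool:
    """Return True if apply_event() modified the live buffer."""
    before = live.copy()  # snapshot, because apply_event may mutate `live` in place
    apply_event()
    return not np.array_equal(before, live)


# Stand-ins for the two buffers described above (hypothetical 6-DoF root).
model_qd = np.zeros(6)  # model defaults: not read by the solver during stepping
state_qd = np.zeros(6)  # live state: read on every step


def push_via_model() -> None:
    model_qd[0] = 100.0


def push_via_state() -> None:
    state_qd[0] = 10.0


print(write_reaches_live_state(state_qd, push_via_model))  # False
print(write_reaches_live_state(state_qd, push_via_state))  # True
```

Running such a check once per DR event at startup would surface a silently ineffective event before any training time is spent.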

## Reproduction

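```python
import warp as wp

# Assumes `NewtonManager` is imported (the path depends on the IsaacLab version)
# and `sim` is the running simulation context.

# This has NO effect: model.joint_qd holds the model defaults, not the live state.
model = NewtonManager.get_model()
wp.to_torch(model.joint_qd)[0] = 100.0  # zero-copy view, but of the wrong buffer
sim.step()  # robot doesn't move

# This works for one step but crashes the GPU inside a training loop:
state = NewtonManager.get_state_0()
wp.to_torch(state.joint_qd)[0] = 10.0  # writes to the live state
sim.step()  # robot moves (+0.05 m at dt = 0.005 s)
```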

## Suggested Fix

The velocity write kernels need to also write to `State.joint_qd` (the solver's velocity array). This requires:

1. Passing the live `State` object to the kernel
2. Converting the world-frame angular velocity to the body frame (`quat_rotate_inv`), since MuJoCo's free-joint `qvel` stores angular velocity in body-local coordinates
3. Properly handling the double-buffer swap to avoid race conditions
4. Alternatively, provide a safe `set_velocity` API on `NewtonManager` that schedules the write for the correct point in the simulation step
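The frame conversion in step 2 can be sketched in NumPy. This assumes Warp's `(x, y, z, w)` quaternion layout for the root orientation and MuJoCo's free-joint `qvel` order (linear velocity in the world frame, then angular velocity in the body frame). The helper names are hypothetical; inside a Warp kernel the same rotation is `wp.quat_rotate_inv`.

```python
import numpy as np


def quat_rotate_inv(q: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Rotate vector v by the inverse of unit quaternion q = (x, y, z, w)."""
    u, w = q[:3], q[3]
    # Inverse rotation uses the conjugate (-u, w); with t = 2 (u x v):
    # v' = v - w * t + u x t
    t = 2.0 * np.cross(u, v)
    return v - w * t + np.cross(u, t)


def free_joint_qvel(quat_xyzw, lin_vel_w, ang_vel_w) -> np.ndarray:
    """Pack a world-frame root twist into free-joint qvel order [v_world, omega_body]."""
    q = np.asarray(quat_xyzw, dtype=float)
    ang_vel_b = quat_rotate_inv(q, np.asarray(ang_vel_w, dtype=float))
    return np.concatenate([np.asarray(lin_vel_w, dtype=float), ang_vel_b])


# Root yawed +90 degrees about z, spinning about world x.
s = np.sqrt(0.5)
q_yaw90 = np.array([0.0, 0.0, s, s])  # (x, y, z, w)
qvel = free_joint_qvel(q_yaw90, [1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
print(np.round(qvel, 6))
```

With the root yawed 90°, a spin about world x becomes a spin about the body's -y axis, so the angular part of `qvel` is (0, -1, 0) while the linear part stays in world coordinates. Writing the world-frame vector directly would apply the push about the wrong axis whenever the root is not aligned with the world frame.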

The `ArticulationView.set_root_velocities(state, values)` API exists and writes to `joint_qd` via `_set_attribute_values`, but needs the correct live state passed as the target.
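The double-buffer hazard in step 3 and the scheduled write in step 4 can be illustrated with a pure-Python toy. Everything here is hypothetical (`ToyDoubleBufferSim`, `request_velocity`): it models only a ping-pong state swap, not Newton's internals or CUDA graph capture. Queued writes are applied to whichever buffer is the solver input at the moment a step begins, so a stale handle to the previous buffer is never the target.

```python
import numpy as np


class ToyDoubleBufferSim:
    """Toy ping-pong integrator: reads state_0, writes state_1, then swaps."""

    def __init__(self, n_dof: int, dt: float):
        self.dt = dt
        self.state_0 = {"q": np.zeros(n_dof), "qd": np.zeros(n_dof)}  # next step's input
        self.state_1 = {"q": np.zeros(n_dof), "qd": np.zeros(n_dof)}  # next step's output
        self._pending = []  # queued (index, value) velocity writes

    def request_velocity(self, index: int, value: float) -> None:
        """Schedule a velocity write; it is applied at the start of the next step."""
        self._pending.append((index, value))

    def step(self) -> None:
        # Apply queued writes to the buffer the solver is about to read.
        for index, value in self._pending:
            self.state_0["qd"][index] = value
        self._pending.clear()
        # "Solver": constant-velocity integration from state_0 into state_1.
        self.state_1["qd"][:] = self.state_0["qd"]
        self.state_1["q"][:] = self.state_0["q"] + self.dt * self.state_0["qd"]
        # Swap input and output buffers for the next step.
        self.state_0, self.state_1 = self.state_1, self.state_0


sim = ToyDoubleBufferSim(n_dof=1, dt=0.005)
stale = sim.state_0  # handle grabbed before a step
sim.step()  # buffers swap: `stale` is now the output buffer
stale["qd"][0] = 10.0  # lands in the buffer the next step overwrites
sim.step()
print(sim.state_0["q"][0])  # 0.0 (the write was lost)

sim.request_velocity(0, 10.0)
sim.step()
print(sim.state_0["q"][0])  # 0.05 (the scheduled write took effect)
```

Deferring the write to the start of the step keeps the target selection inside the simulator, which is the property a `NewtonManager`-level API would need; how that interacts with a captured CUDA graph is not modeled here.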

## Environment

- IsaacLab v3.0.0-beta
- Isaac Sim 6.0 (from source, `v6.0.0-dev2`)
- Newton/MuJoCo-Warp backend
- RTX 5090, Ubuntu 24.04
