Basic RL workflow example #287

viiik-inside · 2025-12-12T15:42:54Z

Summary

One RL training environment.

NOTE: Moved up from draft because the RL pipeline is running. Some changes to the code are imminent at this stage, but it is a very good reference point.

## Summary

Add signoff rules to the contribution guide.

## Detailed description

- Require that external contributions are signed off.

## Summary

Add YAML anchors to remove repetition in the CI workflow.

## Detailed description

- We have several jobs in our workflow which share common steps.
- This MR eliminates the repetition of those steps by using YAML anchors.
- In some cases small differences in the steps were eliminated by unifying the steps.

## Summary

Generate the fine-tuned gr00t locomanip model on the fly in tests.

## Detailed description

- Mounting the models dir in CI is problematic because it doesn't work for public CI.
- Move from mounting a pre-fine-tuned models dir to generating a fine-tuned model before running the tests.
- This also has the effect that we test the fine-tuning pipeline.

## Summary

Update the G1 WBC embodiment file paths to use S3.

## Summary

Patch breakage to Isaac Lab 2.3 due to the `qpsolvers` upgrade.

## Detailed description

- Manually downgrade `qpsolvers` to `4.8.1` while we wait for a proper fix from Isaac Lab.

## Todo

- Back this change out when an official fix is available.

## Summary

Move the description of private omniverse access to a separate page.

## Detailed description

- By moving it off the installation page we make it clear that this is not normally required.
- Addresses nvbug: [5709098](https://nvbugspro.nvidia.com/bug/5709098)

## Summary

Remove comments which were apparently breaking the parallel eval run commands.

## Detailed description

- Fixes: nvbug: [5709051](https://nvbugspro.nvidia.com/bug/5709051)

## Summary

Closed-loop GN1.5 showed a low success rate (SR) in multi-episode rollouts, especially with parallel environments, where more contacts are introduced and more episodes are observed.

## Detailed description

### Static manip

- Issue: At the beginning of an episode, the hands show close-open motions in the recorded trajectories. Because the microwave joint is not stiff enough, small deviations during the first few inferences cause the door to be closed by mistake. The closed door is hard for the static GR1 to pull open, so the task fails.

  [Screencast from 12-02-2025 03:16:36 PM.webm](https://github.com/user-attachments/assets/da06de60-8f01-47e7-ae26-a48e08cb523f)

- Fix:

  a. Shorten `task_episode_length_s` so that resets happen more often once the door is closed. The tradeoff is that this introduces more episodes.

  b. A shorter `action_horizon` was also tried, but it gave a worse SR. My hypothesis is that it is hard for the VLA to tell from visuals/states whether the door is closed to 0.2 (success) or 0.21 (fail).

  > 16 -- Metrics: {'success_rate': 0.605, 'door_moved_rate': 0.955, 'num_episodes': 200}
  >
  > 8 -- Metrics: {'success_rate': 0.225, 'door_moved_rate': 0.615, 'num_episodes': 200}
  >
  > 1 -- Metrics: {'success_rate': 0.0, 'door_moved_rate': 0.985, 'num_episodes': 200}

  c. Switching to CPU PhysX does not solve the above issues, so keep it on GPU for faster parallelization (in theory).

### Loco manip

- Issue: After each reset, the left arm tends to move fast and the box is tilted. Significant penetration among the fingers was also observed. Compare 00:15 with 0:30 for the closed-loop run with 5 parallel envs in the video below.

  [Screencast from 11-25-2025 03:42:36 PM.webm](https://github.com/user-attachments/assets/c4934817-65fa-412f-a88c-af143d25d7c2)

- Fix: Switch to CPU PhysX and keep the policy on GPU. The arms open first, then G1 starts moving, and the box is placed with the expected pose.

  [Screencast from 12-02-2025 10:15:59 PM.webm](https://github.com/user-attachments/assets/4a02e6cd-7baf-441b-8c0f-7146051e5c9a)

### Minor fixes

Update the docs on commands and metrics.

## Summary

Modify docs to show that this is manual annotation.

alexmillane

I have a few comments, but it looks good in general.

Thanks for doing it!

alexmillane · 2025-12-18T09:15:46Z

isaaclab_arena/embodiments/franka/franka.py

+    def get_command_body_name(self) -> str:
+        return self.action_config.arm_action.body_name


Can we add a small docstring here? It's not clear what "command_body_frame" means.

isaaclab_arena/embodiments/franka/franka.py

isaaclab_arena/embodiments/gr1t2/gr1t2.py

isaaclab_arena/embodiments/franka/franka.py

isaaclab_arena/embodiments/embodiment_base.py

isaaclab_arena/tasks/rewards/rewards.py

isaaclab_arena/tasks/lift_object_task.py

isaaclab_arena/tasks/terminations.py

alexmillane · 2025-12-18T13:30:39Z

isaaclab_arena_environments/lift_object_environment.py

+        # NOTE(alexmillane, 2025.09.04): We need a teleop device argument in order
+        # to be used in the record_demos.py script.
+        parser.add_argument("--teleop_device", type=str, default=None)
+        # Note (xinjieyao, 2025.10.06): Add the embodiment argument for PINK IK EEF control or Joint positional control
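The effect of the `--teleop_device` argument in the diff above can be sketched in isolation. This is a minimal example assuming a plain `argparse.ArgumentParser`. The `build_parser` helper and the `"keyboard"` value are hypothetical, and the embodiment argument mentioned in the second note is not shown in the diff, so it is left out here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the argument visible in the diff (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Sketch of environment CLI arguments")
    # Mirrors the line from the diff: an optional string that defaults to None.
    parser.add_argument("--teleop_device", type=str, default=None)
    return parser


# When the flag is omitted, the attribute exists and is None.
default_args = build_parser().parse_args([])

# When the flag is given, its value is stored as a string ("keyboard" is a placeholder).
teleop_args = build_parser().parse_args(["--teleop_device", "keyboard"])
```

Because the default is `None`, a script that never passes the flag can still read `args.teleop_device` without an `AttributeError`.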


viiik-inside and others added 30 commits November 20, 2025 19:36

Add reward function for press button task

8179602

initial go at rl workflow

f5de4bf

merge with main

afc671a

RL task example

112bfaa

Create open door rl example

6617ea1

Bring in lift object environment

74e53f2

Partially working lift task

1cd406c

modified lift object environment

c9c548a

Add signoff rules to the contribution guide. (#238)

6b54c14


Update G1 WBC file paths to S3 (#251)

e3037ab


Add support for enabling ground plane and lights

17b317c

Add spawner cfg

ae12884

Add working ground plane and lights

8a628b5

Remove unintended letter

ffbc0eb

Change tag

8f15f47

Add lights and ground plane to test

a15cf47

Fix issues with qpsolvers. (#252)

58bc56d


Remove comment which was breaking the parallel eval command. (#262)

dc5e882


Add comment to show that this is manual annotation (#257)

e613ade


merge with main

6dbd31b

Merge branch 'main' into feature/rl

1042b93

RL basic training workflow

111be78

change object to background

53af546

remove succes term metrics for now

ad46de7

Use env spacing from arguments

048d5ec

Add object observation

7b2b625

viiik-inside and others added 16 commits December 4, 2025 16:12

Add observation function

0e56969

Change file name

55c434e

Add obs group to the policy

2412f40

Increase the num steps and save times

74c4e8b

merge with main

e337b8e

clean up hardcoded params

24d9200

Clean up extra changes

e70da73

make observation terms concat an option

d53bece

Make concat as true for rl robot

306512e

Merge branch 'main' into feature/rl

ee2c0de

Change funciton name

cd790ec

add metadata

d849b93

merge with main

28f3c06

pre commit update

c5b59eb

RL concept

8915c19

Merge branch 'main' into feature/rl

8c0aab3

yanchangNvidia self-assigned this Dec 16, 2025

viiik-inside and others added 7 commits December 17, 2025 10:36

update broken links

05beb6e

fixed training script

9468b3a

Merge branch 'main' into feature/rl

ffbe3bb

Add arguments of policy to cli

1d67c24

Merge branch 'main' into feature/rl

dd8f2de

remove get prompt method

18d21b4

Delete max iteration from train file

8cb80c7

alexmillane approved these changes Dec 18, 2025


viiik-inside added 3 commits December 18, 2025 14:42

merge with main

a514d7e

Tiny refactors with same concept

3a63497

bug fixes

8302680

viiik-inside merged commit e8e83e1 into main Dec 18, 2025
1 check passed

6 participants
