Basic RL workflow example #287
Conversation
## Summary Add signoff rules to the contribution guide. ## Detailed description - Require that external contributions are signed off.
## Summary Add YAML anchors to remove repetition in the CI workflow. ## Detailed description - Several jobs in our workflow share common steps. - This MR eliminates the repetition of those steps by using YAML anchors. - In some cases, small differences between the steps were eliminated by unifying them.
## Summary Generate finetuned gr00t locomanip model on the fly in tests. ## Detailed description - Mounting models dir in CI is problematic because it doesn't work for public CI. - Move from mounting pre-finetuned models dir, to generating a fine-tuned model before running the tests. - Also has the effect that we test the fine-tuning pipeline.
## Summary Updates the G1 WBC embodiment file paths to use S3
## Summary Patch breakage to Isaac Lab 2.3 due to `qpsolvers` upgrade. ## Detailed description - Manually downgrade `qpsolvers` to `4.8.1` while we wait for a proper fix from Isaac Lab. ## Todo - Back this change out when an official fix is available.
## Summary Move description of private omniverse access to a separate page. ## Detailed description - By moving it off the installation page, we make it clear that this is not normally required. - Addresses nvbug: [5709098](https://nvbugspro.nvidia.com/bug/5709098)
## Summary Remove comments which were apparently breaking the parallel eval run commands. ## Detailed description - Fixes: nvbug: [5709051](https://nvbugspro.nvidia.com/bug/5709051)
## Summary
Closed-loop GN1.5 showed a low success rate (SR) in multi-episode rollouts, especially with parallel environments, where more contacts are introduced and more episodes are observed.

## Detailed description

### Static manip
- Issue: At the beginning of an episode, the hands make close-open motions in the recorded trajectories. Because the microwave joint is not stiff enough, small deviations during the first few inferences cause the door to close by mistake. A closed door is hard for the static GR1 to pull open, so the task fails.
  [Screencast from 12-02-2025 03:16:36 PM.webm](https://github.com/user-attachments/assets/da06de60-8f01-47e7-ae26-a48e08cb523f)
- Fix:
  a. Shorten `task_episode_length_s` so that resets happen more frequently once the door is closed. The tradeoff is that more episodes are introduced.
  b. Also tried a shorter `action_horizon`, but ended up with a worse SR. My hypothesis is that it's hard for the VLA to tell from visuals/states whether the door is closed to 0.2 (success) vs 0.21 (fail).
     > 16 -- Metrics: {'success_rate': 0.605, 'door_moved_rate': 0.955, 'num_episodes': 200}
     > 8 -- Metrics: {'success_rate': 0.225, 'door_moved_rate': 0.615, 'num_episodes': 200}
     > 1 -- Metrics: {'success_rate': 0.0, 'door_moved_rate': 0.985, 'num_episodes': 200}
  c. Switching to CPU PhysX does not solve the above issues, so keep it on GPU for faster parallelization (in theory).

### Loco manip
- Issue: After each reset, the left arm tends to move fast and the box is tilted. Significant penetration among the fingers was also observed. Compare 00:15 vs 0:30 in the video below (closed-loop, 5 parallel envs).
  [Screencast from 11-25-2025 03:42:36 PM.webm](https://github.com/user-attachments/assets/c4934817-65fa-412f-a88c-af143d25d7c2)
- Fix: Switch to CPU PhysX and keep the policy on GPU. The arms open first, then G1 starts moving, and the box is placed with the expected pose.
  [Screencast from 12-02-2025 10:15:59 PM.webm](https://github.com/user-attachments/assets/4a02e6cd-7baf-441b-8c0f-7146051e5c9a)

### Minor fixes
Update doc on cmds & metrics.
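The three metric lines in the static manip section can be read side by side with a small script. This is only a sketch: the numbers are copied from the description, the keys 16, 8, and 1 are assumed to be the `action_horizon` values of each run, and the helper names `summarize` and `best_horizon` are made up for illustration.

```python
# Metrics copied from the commit description. The dict keys are ASSUMED to be
# the action_horizon value used for each run; the description does not say so
# explicitly.
sweep = {
    16: {"success_rate": 0.605, "door_moved_rate": 0.955, "num_episodes": 200},
    8: {"success_rate": 0.225, "door_moved_rate": 0.615, "num_episodes": 200},
    1: {"success_rate": 0.0, "door_moved_rate": 0.985, "num_episodes": 200},
}


def summarize(sweep):
    """Convert rates to episode counts, ordered from longest to shortest horizon."""
    rows = []
    for horizon in sorted(sweep, reverse=True):
        metrics = sweep[horizon]
        episodes = metrics["num_episodes"]
        # round() guards against floating-point error in rate * episodes.
        successes = round(metrics["success_rate"] * episodes)
        door_moved = round(metrics["door_moved_rate"] * episodes)
        rows.append((horizon, successes, door_moved))
    return rows


def best_horizon(sweep):
    """Return the horizon with the highest success rate."""
    return max(sweep, key=lambda horizon: sweep[horizon]["success_rate"])


if __name__ == "__main__":
    for horizon, successes, door_moved in summarize(sweep):
        print(f"horizon={horizon:>2}  successes={successes:>3}  door_moved={door_moved:>3}")
    print("best horizon by success rate:", best_horizon(sweep))
```

Read this way, the run labelled 1 moves the door in almost every episode yet never succeeds, which is in line with the hypothesis that the success threshold is hard for the policy to judge.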
## Summary Modify docs to show that this is manual annotation
alexmillane left a comment
I have a few comments, but it looks good in general.
Thanks for doing it!
    def get_command_body_name(self) -> str:
        return self.action_config.arm_action.body_name
Can we add a small docstring here? It's not clear what "command_body_frame" means.
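One possible shape for such a docstring is sketched below. The meaning given in the docstring is an assumption (that the "command body" is the body the arm action is commanded for), and the `ArmActionCfg`, `ActionCfg`, and `Embodiment` classes as well as the `right_wrist_link` value are hypothetical stand-ins so that the snippet runs on its own. Only the method body comes from the diff.

```python
from dataclasses import dataclass, field


@dataclass
class ArmActionCfg:
    """Hypothetical stand-in for the real arm action config."""

    body_name: str = "right_wrist_link"  # made-up example value


@dataclass
class ActionCfg:
    """Hypothetical stand-in for the embodiment's action config."""

    arm_action: ArmActionCfg = field(default_factory=ArmActionCfg)


class Embodiment:
    """Hypothetical host class for the method under review."""

    def __init__(self, action_config: ActionCfg):
        self.action_config = action_config

    def get_command_body_name(self) -> str:
        """Return the name of the body that arm commands are issued for.

        ASSUMPTION: this is the body configured on the arm action
        (``arm_action.body_name``), i.e. the body whose pose the arm
        controller is asked to track. The author should confirm or
        correct this wording.
        """
        return self.action_config.arm_action.body_name
```

A caller would then use `Embodiment(ActionCfg()).get_command_body_name()`, and the docstring would tell them which body the returned name refers to.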
    # NOTE(alexmillane, 2025.09.04): We need a teleop device argument in order
    # to be used in the record_demos.py script.
    parser.add_argument("--teleop_device", type=str, default=None)
    # Note (xinjieyao, 2025.10.06): Add the embodiment argument for PINK IK EEF control or Joint positional control
remove?
Summary
One RL training environment.
NOTE: Moved out of draft now that the RL pipeline is running. Some changes to the code are imminent at this stage, but it is a very good reference point.