Releases: NVIDIA-NeMo/Gym
v0.2.1
v0.2.0
Release Summary
NeMo Gym v0.2.0 ships alongside the NVIDIA Nemotron 3 Super model release, open sourcing the RL environments and corresponding datasets used during training. Highlights:
- 17 new training environments across coding, math, science, reasoning, agentic tasks, and safety.
- Integrations with Future House Aviary, Open-Thought Reasoning Gym, and Prime Intellect Verifiers let you use environments from these libraries directly within NeMo Gym
- End-to-end rollout collection with a locally managed vLLM server
- Install directly from PyPI with pip install nemo-gym
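After installing from PyPI, one quick way to confirm the package is present is to query the installed distribution metadata. The sketch below uses only the Python standard library. The distribution name nemo-gym comes from the install command above; the helper name installed_version is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "nemo-gym") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("nemo-gym is not installed in this environment")
    else:
        print(f"nemo-gym version: {found}")
```

Returning None instead of raising keeps the check usable in setup scripts that need to decide whether to install the package first.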
First-Time Contributors
We welcomed 15 new contributors to this release! Here are a few highlights:
- @sidnarayanan added the Aviary integration to enable training on any Aviary environment, a library of interactive RL environments spanning math, science, biology, and more
- @3mei added the text-to-SQL environment to generate SQL queries from natural language across multiple SQL dialects
- @Kelvin0110 added the NewtonBench environment to discover scientific laws through interactive experimentation
Thank you to all the new contributors for helping make NeMo Gym better!
Major Features & Improvements
New Environments
- Added 17 new resources servers spanning:
  - Coding: Text to SQL (#648), SWE RL Gen (#561), SWE RL LLM Judge (#561)
  - Math: Lean4 Mathematical Proofs (#563)
  - Science: Aviary (#55), NewtonBench (#650)
  - Reasoning: MultiChallenge (#654), ARC-AGI (#105), Reasoning Gym (#113)
  - Agent tasks: xLAM Function Calling (#262), Tavily Search (#825), Single Step Tool Use with Argument Comparison (#825), Terminus Judge (#594), NeMo Skills Tools (#571)
  - Safety: Jailbreak Detection (#825), Over Refusal Detection (#825)
  - RLHF: Generative Reward Model Compare (#674)
- Added 5 new agent servers: Aviary agent (#55), proof refinement agent (#563), SWE agents (#343), tool simulation agent (#826), and verifiers agent (#573)
Environment Library Integrations
Combine environments from other libraries with NeMo Gym environments
Model Serving
- Local vLLM model server with end-to-end rollout collection without an external API (#558, #762)
- vLLM 0.16+ support for the reasoning field in responses (#816)
- VLLMModel chat template kwargs support (#538, #636)
- Per-task chat template and extra body args, enabling per-task control of reasoning mode and thinking budget (#672)
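Per-task control amounts to overlaying a task's options on top of shared defaults before a request is sent. The sketch below shows one way to do that with a recursive merge. It is not NeMo Gym's actual config schema: merge_options, DEFAULTS, and TASK_OVERRIDE are hypothetical names, and enable_thinking is only an example of a chat template variable, since the variables a template honors depend on the model.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_options(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay per-task overrides on top of default request options."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_options(merged[key], value)
        else:
            # Scalars, lists, and new keys replace the default outright.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults applied to every task.
DEFAULTS = {
    "temperature": 1.0,
    "extra_body": {"chat_template_kwargs": {"enable_thinking": True}},
}

# Hypothetical per-task override: turn reasoning off for one task only.
TASK_OVERRIDE = {"extra_body": {"chat_template_kwargs": {"enable_thinking": False}}}

if __name__ == "__main__":
    print(merge_options(DEFAULTS, TASK_OVERRIDE))
```

Because the merge copies its inputs, the shared defaults stay untouched and each task gets its own independent options dictionary.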
Rollout Collection & Profiling
- New ng_reward_profile command to compute per-task pass rates and aggregate metrics (#83, #621)
- CPU profiling for rollout performance analysis (#763)
- Add option for seeding on num_repeats for rollouts (#740)
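To make "per-task pass rates and aggregate metrics" concrete, here is a minimal sketch of the arithmetic, independent of how ng_reward_profile is implemented. The record fields task_id and reward, the pass threshold of 1.0, and both function names are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Mapping


def per_task_pass_rates(
    rollouts: Iterable[Mapping[str, object]], threshold: float = 1.0
) -> Dict[str, float]:
    """Fraction of rollouts per task whose reward reaches the threshold."""
    outcomes: Dict[str, List[bool]] = defaultdict(list)
    for rollout in rollouts:
        passed = float(rollout["reward"]) >= threshold
        outcomes[str(rollout["task_id"])].append(passed)
    return {task: sum(results) / len(results) for task, results in outcomes.items()}


def aggregate(pass_rates: Mapping[str, float]) -> Dict[str, float]:
    """Summarize per-task pass rates into a few aggregate numbers."""
    return {
        "num_tasks": len(pass_rates),
        "mean_pass_rate": mean(pass_rates.values()),
    }


if __name__ == "__main__":
    # Two tasks, each repeated twice (hypothetical data).
    sample = [
        {"task_id": "a", "reward": 1.0},
        {"task_id": "a", "reward": 0.0},
        {"task_id": "b", "reward": 1.0},
        {"task_id": "b", "reward": 1.0},
    ]
    rates = per_task_pass_rates(sample)
    print(rates)
    print(aggregate(rates))
```

Averaging over tasks rather than over all rollouts keeps tasks with more repeats from dominating the aggregate.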
Infrastructure & Developer Experience
- PyPI compatibility: install via pip install nemo-gym (#649)
- Dry run mode: ng_run +dryrun=true to validate configs and install environments without starting servers (#743)
- ng_status command to list running servers and their health (#290)
- Server stdout/stderr redirection with server name prefixes (#703)
- FastAPI worker support for higher throughput across multiple workers (#566)
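Prefixing each server's output with its name makes interleaved logs from several processes readable. The sketch below illustrates the general technique with the standard library only. It is not NeMo Gym's implementation, and the function names and the bracketed prefix format are assumptions.

```python
import subprocess
import sys
from typing import Iterable, Iterator, List


def prefix_lines(server_name: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield each line with a "[server_name] " prefix and no trailing newline."""
    for line in lines:
        yield f"[{server_name}] " + line.rstrip("\r\n")


def run_with_prefix(server_name: str, command: List[str]) -> int:
    """Run a command, echoing its merged stdout/stderr with a name prefix."""
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same stream
        text=True,
    )
    assert proc.stdout is not None
    for out in prefix_lines(server_name, proc.stdout):
        print(out, flush=True)
    return proc.wait()


if __name__ == "__main__":
    # Stand-in child process; a real server command would go here.
    run_with_prefix("demo_server", [sys.executable, "-c", "print('ready')"])
```

Merging stderr into stdout preserves the relative order of messages from a single server, at the cost of no longer distinguishing the two streams.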
Model Recipes
Deprecation Notices
- Deprecated ng_viewer due to a Gradio security vulnerability. We plan to revisit rollout viewing with a more robust solution in a future release.
Bug Fixes
- Fixed 0.1.1 environments to work correctly with RL training pipelines (#768)
- Fixed crash when server receives malformed JSON during rollout collection (#770)
- Fixed dry run mode failing (#746)
- Fixed nested responses_create_params overrides not merging correctly from CLI (#827)
- Fixed ng_prepare_data failing when multiple environments define overlapping metrics (#738)
- Fixed reward profiling failing when model response doesn't include usage stats (#824)
- Fixed NeMo-Skills python tool to use HTTP calls instead of subprocess execution (#606)
- Bumped Pillow and other packages to address security vulnerabilities (#667, #739)
- ng_dump_config now redacts API key values from output (#567)
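Redacting secrets before printing a config usually means walking the nested structure and masking values under sensitive keys. The sketch below shows that idea in plain Python. It is not the ng_dump_config implementation: the function name redact, the "api_key" substring match, and the "****" placeholder are assumptions.

```python
from typing import Any

# Assumption: a key is sensitive if its name contains this marker.
SENSITIVE_MARKER = "api_key"


def redact(config: Any, placeholder: str = "****") -> Any:
    """Return a copy of a nested config with sensitive values masked."""
    if isinstance(config, dict):
        return {
            key: placeholder
            if SENSITIVE_MARKER in str(key).lower()
            else redact(value, placeholder)
            for key, value in config.items()
        }
    if isinstance(config, list):
        return [redact(item, placeholder) for item in config]
    return config


if __name__ == "__main__":
    # Hypothetical config with a secret nested two levels deep.
    example = {
        "policy_model": {"base_url": "http://localhost:8000/v1", "api_key": "sk-example"},
        "servers": [{"name": "judge", "judge_api_key": "sk-other"}],
    }
    print(redact(example))
```

Building a new structure instead of editing in place means the live config still holds the real values after the dump is printed.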
Documentation
- New training tutorials: Unsloth training with NeMo Gym, multi-environment training
- New environment tutorials: creating a training environment, custom data preparation, integrating external environment libraries, environment best practices
- Model recipes: reproduce the training for Nemotron 3 Nano and Nemotron 3 Super
- Concepts & architecture overhaul: rewrote concepts docs, added architecture diagrams, added agent server and resources server docs
- Training approaches: added training approaches docs page covering SFT, RL (GRPO), and RLVR
- Ecosystem page: revamped ecosystem page with training framework integrations and environment library integrations
- Infrastructure: added SWE RL infrastructure case study, deployment topology docs
- Quality pass: redirect sweep, style guide sweep, consistent naming, FAQ additions, broken link fixes
Looking Ahead
- VLM support: add support for VLM models and environments with images, e.g. browser environments and computer use agent (CUA) environments
- Benchmark environments: add popular OSS environments such as OSWorld, Tau Bench, BrowseComp
- Integrate existing agents: integrate popular existing agents, e.g. coding harnesses, as well as agents developed via popular agent frameworks, e.g. LangGraph
- Environment tutorials: incorporate more complex agentic loops during training such as multi-turn conversation and user modeling
Release Assets
GitHub Release: https://github.com/NVIDIA-NeMo/Gym/releases/tag/v0.2.0
Container: nvcr.io/nvidia/nemo-rl:v0.5.0.nemotron_3_super
What's Changed
- Bump to v0.2.0 by @bxyu-nvidia in #510
- reasoning-gym resource server by @cmunley1 in #113
- docs: redirect setup by @lbliii in #513
- docs: Miscellaneous GRPO tutorial fixes by @bxyu-nvidia in #512
- docs settings update by @lbliii in #525
- Debug server package versions by @fsiino-nvidia in #406
- List running server health and status by @fsiino-nvidia in #290
- VLLMModel supports chat template kwargs by @pjin-nvidia in #538
- Salesforce xlam-function-calling-60k resources server by @cmunley1 in #262
- python flag for colab venv installation by @cmunley1 in #526
- add unsloth and trl to docs by @cmunley1 in #536
- docs: remove trl docs by @cmunley1 in #543
- Remove PlainTextResponse response_class by @fsiino-nvidia in https://github.com/NVIDIA-N...
v0.1.1
What's Changed
- Bump package info for v0.2.0 by @bxyu-nvidia in #337
- fix: Update incorrect path in docs: library_judge_math -> math_with_j… by @shashank3959 in #355
- Update secret detector to work with forks by @chtruong814 in #358
- Removed reference to gitlab master by @hwolff99 in #377
- Mark experimental tutorials by @bxyu-nvidia in #386
- docs: experimental label by @lbliii in #391
- Fixed typos by @hwolff99 in #400
- Readme dataset discoverability cont by @fsiino-nvidia in #344
- Add absolute ip for multi node by @sdevare-nv in #286
- docs: removed "How to Navigate" section from concepts by @ahmadki in #414
- docs: Fixed image embedding in core abstractions page by @ahmadki in #410
- docs: Fixed Licensing information in structured outputs by @ahmadki in #412
- docs: Added hyperlinks to github repo in docs by @ahmadki in #413
- docs: Add software / hardware requirements to README and docs. by @ffrujeri in #401
- docs: Cleaned the "Quick Start" section in the README by @ahmadki in #411
- Display system and version info by @fsiino-nvidia in #347
- docs: Improve language around resources servers. by @ffrujeri in #408
- docs: Add Create Resource Server Tutorial by @ffrujeri in #407
- miniswe w/ offline uv by @sdevare-nv in #357
- update vllm model comments by @cmunley1 in #423
- docs: linked several terms to their definition in glossary by @ahmadki in #424
- docs: Explain why GPT-4 is used and clarify support for other models by @ahmadki in #425
- Removed internal section by @hwolff99 in #430
- docs: various improvements and fixes by @ahmadki in #415
- docs: Relate sections Get Started and Rollout Collection by @fsiino-nvidia in #426
- Guide user on next steps after finishing get started by @cwing-nvidia in #435
- Add placeholder author by @jkyi-nvidia in #440
- Clarify training environment framing and align docs messaging by @cwing-nvidia in #438
- docs: Added CLI documentation by @ahmadki in #444
- Change NeMo Gym from framework to library by @cwing-nvidia in #456
- Add Data Designer and links to ecosystem page by @cwing-nvidia in #462
- docs: Moved configuration system under about by @ahmadki in #420
- Add benefits to About page aligned with README by @cwing-nvidia in #452
- Explain where the name Gym comes from; Gym Key Terminology doc is missing some of the old material by @bxyu-nvidia in #470
- add calendar env for multi-turn IF by @sanjaykariyappa in #297
- docs(readme): fix Example Resource Servers table - correct Multi Step… by @lbliii in #464
- Remove penguin references by @ahmadki in #469
- docs: Training framework integration by @bxyu-nvidia in #439
- Bug: inconsistent documentation around servers running by @bxyu-nvidia in #472
- docs: Improve server reference info by @bxyu-nvidia in #474
- pyproject typos and grammar fixes by @ahmadki in #473
- Miscellaneous infra improvements/fixes by @pjin-nvidia in #317
- Expose server host and port in dataset viewer CLI by @ahmadki in #476
- Rename examples simple_weather and stateful_counter by @fsiino-nvidia in #479
- More single tool call filename updates by @fsiino-nvidia in #480
- docs: Fix wrong count vs actual by @fsiino-nvidia in #482
- Fix duplicate reference sections by @bxyu-nvidia in #483
- docs: home pg, quickstart move, gh icon by @lbliii in #463
- More single tool call filename updates cont by @fsiino-nvidia in #484
- Fix NeMo Gym Pyproject links by @bxyu-nvidia in #486
- docs: move FAQ by @lbliii in #489
- docs: contribute section by @lbliii in #490
- Misc rollout fixes by @pjin-nvidia in #447
- improve framing of training framework integration guide for contributing by @cwing-nvidia in #493
- Docs: Contribution Home & Dev Setup by @cwing-nvidia in #494
- Add environment contribution docs by @cwing-nvidia in #498
- FAQ cleanup by @cwing-nvidia in #499
- Simplify contributing.md by @cwing-nvidia in #500
- Reorder README structure by @cwing-nvidia in #501
- docs: End-to-end GRPO Training with NeMo RL tutorial [master branch] by @bxyu-nvidia in #481
- Update dataset configs with HuggingFace links by @bxyu-nvidia in #508
- Change to v0.1.1 release version by @bxyu-nvidia in #509
New Contributors
- @shashank3959 made their first contribution in #355
- @hwolff99 made their first contribution in #377
- @ahmadki made their first contribution in #414
- @ffrujeri made their first contribution in #401
- @sanjaykariyappa made their first contribution in #297
Full Changelog: v0.1.0...v0.1.1
v0.1.0
What's Changed
- Add copy-pr-bot by @chtruong814 in #1
- Add initial repo template by @chtruong814 in #2
- Update GitHub with Gitlab main by @bxyu-nvidia in #3
- Alias as Penguin by @bxyu-nvidia in #4
- Add Copyright docs README FAQ by @bxyu-nvidia in #7
- Dapo17k by @bxyu-nvidia in #6
- Fix docs build failures by @bxyu-nvidia in #8
- Fix docs by @bxyu-nvidia in #10
- Improve Github SSH Key setup docs by @bxyu-nvidia in #12
- Comp-Coding Verifier by @kbhardwaj-nvidia in #5
- Dataset viewer simple aggregations by @fsiino-nvidia in #9
- VLLMModel docs in main Readme by @bxyu-nvidia in #13
- Fix agent name in docs by @bxyu-nvidia in #15
- VLLMModel propagates token IDs by @bxyu-nvidia in #11
- VLLMModel tokenize params cleanup by @bxyu-nvidia in #21
- Update Comp-Coding README.md by @kbhardwaj-nvidia in #26
- Docs improvements - remove Why NeMo Gym section and add CI/CD tests info by @bxyu-nvidia in #27
- update server logging format to be more consistent by @cmunley1 in #22
- update readmes from ng_collect_traj to ng_collect_rollouts by @cmunley1 in #25
- Simple agent stop criteria requires no tool calls AND output message item to be present by @bxyu-nvidia in #19
- Server spinup polling by @bxyu-nvidia in #31
- Rename top-level config key 'openai_model' => 'policy_model' by @pjin-nvidia in #33
- Simple agent allows non-json tool responses by @bxyu-nvidia in #35
- Multi-verifier docs by @bxyu-nvidia in #36
- Servers have easy hooks into individual instances via session by @bxyu-nvidia in #24
- Add Math Stack Overflow dataset by @damon-mosk-aoyama-nvidia in #42
- Add Workbench validation dataset by @bxyu-nvidia in #46
- Docs update by @bxyu-nvidia in #47
- Implements LLM-as-Judge for Response Equivalence by @soares-f in #16
- Configure global httpx client by @pjin-nvidia in #50
- Fix OpenAI ResponseReasoningItem.status property by @bxyu-nvidia in #54
- VLLMModel data parallel; explicit RunHelper shutdown handle by @bxyu-nvidia in #52
- removed simple_agent_stateful, uses fastapi to keep track of session by @RahulSChand in #44
- Migrate text_based_game: sudoku and game agent features by @RahulSChand in #30
- Revert "Migrate text_based_game: sudoku and game agent features" by @bxyu-nvidia in #65
- Add data aggregations to data preparation by @fsiino-nvidia in #49
- Instantiate one httpx async client per unique connection / base url by @bxyu-nvidia in #75
- Swap async http backend from httpx to aiohttp; various server infra improvements by @bxyu-nvidia in #77
- Remove unnecessary GHA CI and add uv config to enable dependency scanning by @chtruong814 in #66
- VLLMModel fix whitespace stripping and unwarranted spaces by @bxyu-nvidia in #70
- Fix aggregation rounding in ng_prepare_data by @fsiino-nvidia in #76
- Add profiling; improve rollout collection usability and efficiency; add uvicorn logging filtering by @bxyu-nvidia in #79
- Delete .github/ISSUE_TEMPLATE directory by @pablo-garay in #87
- Add support for num_repeats by @MahanFathi in #99
- Comp coding fixes; lots of misc infra items by @bxyu-nvidia in #90
- chore: Update cherry-pick workflow to use v0.63.0 by @pablo-garay in #108
- Make Workbench stateful and sign commits by @abhibha-nvidia in #110
- Clean deprecated Comp coding by @bxyu-nvidia in #106
- Bxyu/misc infra 20251001 by @bxyu-nvidia in #116
- Resource Server Organization by @fsiino-nvidia in #80
- Add metrics conflict error FAQ to Readme by @fsiino-nvidia in #93
- Azure OpenAI model support by @bxyu-nvidia in #112
- Use python env for precommit hook; alter files trigger by @fsiino-nvidia in #125
- Update issue templates by @bxyu-nvidia in #152
- Add back Nemo Framework templates by @bxyu-nvidia in #153
- Fix Workbench invalid function name by @bxyu-nvidia in #167
- VLLMModel enable reasoning parsing by @bxyu-nvidia in #129
- Add Attributions for Third Party Softwares by @banghuaz-nvidia in #154
- Fix infinite OpenAI endpoint query; misc improvements by @bxyu-nvidia in #171
- docs: Add Tutorial 00 - Key Terminology by @cwing-nvidia in #180
- docs: Add tutorial README with learning path structure by @cwing-nvidia in #177
- Redirect main Gym readme to Tutorials by @bxyu-nvidia in #201
- docs: Add Tutorial 01 - Understanding Core Concepts by @cwing-nvidia in #181
- docs: Add Tutorial 09 - Configuration Management by @cwing-nvidia in #183
- Add CODE_OF_CONDUCT.md for community guidelines by @cwing-nvidia in #148
- Add SECURITY.md with NVIDIA security policy by @cwing-nvidia in #149
- Make metrics conflict criteria less strict by @fsiino-nvidia in #150
- Move tutorials to docs by @bxyu-nvidia in #205
- docs: Replace README with improved version by @cwing-nvidia in #192
- Large docs improvement PR from @cwing-nvidia by @bxyu-nvidia in #208
- Add back How-To's and FAQs by @bxyu-nvidia in #209
- Docs fixes by @bxyu-nvidia in #210
- Improve CONTRIBUTING.md by @cwing-nvidia in #151
- feat (OpenQA): Add OpenQA support with per-record regex and rescue features by @psgundecha-nv in #155
- feat(mcqa): Add custom answer extraction via template_metadata to support STEM MCQA dataset by @psgundecha-nv in #128
- Add README to docs folder by @bxyu-nvidia in #216
- Ray comp coding infra by @sdevare-nv in #195
- Misc docs fixes by @bxyu-nvidia in #218
- CLI help and command help; misc improvements by @bxyu-nvidia in #229
- Misc infra 20251024 by @bxyu-nvidia in #234
- Fix ray version mismatch by @sdevare-nv in #231
- Misc fixes 20251027 by @bxyu-nvidia in #243
- Validate server port selection by @fsiino-nvidia in #233
- bxyu/misc-infra-20251027-001 by @bxyu-nvidia in #247
- Fix input assistant messages by @bxyu-nvidia in #248
- Misc infra 20251028 002 by @bxyu-nvidia in #253
- Structured Outputs JSON Environment by @jkyi-nvidia in #251
- Bump OpenAI version to 2.6.1; improve dependency constrain resolution by @bxyu-nvidia in #255
- Update missing header and attributions by @banghuaz-nvidia in #237
- Misc infra 20251031 by @bxyu-nvidia in #263
- Update math dataset examples and metrics by @damon-mosk-aoyama-nvidia in #265
- Misc infra 20251101 by @bxyu-nvidia in #267
- Almost-server detection and reporting by @fsiino-nvidia in #249
- Miniswe env by @sdevare-nv in #241
- Differentiate Example-only and Training Resource Servers...