Use cases, pain points, and background
NeMo Gym has no formal abstraction for an "environment." Today, an environment is implicitly the combination of a dataset, an agent server, and a resources server, but this is never expressed as a first-class abstraction. This results in several points of friction:
- Users can't easily answer "what environments exist?" The closest thing to an environment catalog is a script that generates a README table of resources servers, but a resources server is not an environment. For example, workplace_assistant is a resources server that pairs with simple_agent; mini_swe_agent has a stub resources server (it returns reward=1.0) with all real logic in a custom agent; and harbor_agent and swe_agents have no resources server at all. There is no unified way to discover, inspect, or understand what environments are available.
- Users can't answer "how do I build a new environment?" We have a CLI command that initializes a new resources server, but an environment is broader than that: it also includes an agent server, and new users may not realize this.
Description:
Introduce a first-class Environment abstraction that:
- Defines what an environment is - which interaction pattern it uses (multi-step, multi-turn, custom), which verification strategy (programmatic, judge, reward model, composite), which datasets it has, and how these components wire together.
- Normalizes all existing environments - every benchmark (including agent-only ones like harbor and SWE) should be expressible through the abstraction.
- Provides composable building blocks - we should keep flexibility in how orchestration logic, verifier design, etc. are composed, but make the integration between them more explicit.
- Integrates with the environment registry so that all environments are discoverable through a single catalog.
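To make the proposal concrete, here is a minimal sketch of what such an abstraction and catalog could look like. It is illustrative only: the names `Environment`, `InteractionPattern`, `VerificationStrategy`, `EnvironmentRegistry`, and `catalog` are hypothetical and do not exist in NeMo Gym today. The enum members mirror the options listed above, and the two registered entries use only the pairings described in the background section; their interaction pattern, verification strategy, and datasets are left unset because they are not stated here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class InteractionPattern(Enum):
    """Hypothetical enum mirroring the interaction patterns named in this issue."""
    MULTI_STEP = "multi_step"
    MULTI_TURN = "multi_turn"
    CUSTOM = "custom"


class VerificationStrategy(Enum):
    """Hypothetical enum mirroring the verification strategies named in this issue."""
    PROGRAMMATIC = "programmatic"
    JUDGE = "judge"
    REWARD_MODEL = "reward_model"
    COMPOSITE = "composite"


@dataclass(frozen=True)
class Environment:
    """Hypothetical first-class description of an environment."""
    name: str
    agent_server: str
    # None models agent-only environments that have no resources server.
    resources_server: Optional[str] = None
    # None means "not yet classified" in this sketch.
    interaction: Optional[InteractionPattern] = None
    verification: Optional[VerificationStrategy] = None
    datasets: Tuple[str, ...] = ()


class EnvironmentRegistry:
    """Hypothetical in-memory catalog keyed by environment name."""

    def __init__(self) -> None:
        self._envs: Dict[str, Environment] = {}

    def register(self, env: Environment) -> None:
        if env.name in self._envs:
            raise ValueError(f"environment already registered: {env.name}")
        self._envs[env.name] = env

    def get(self, name: str) -> Environment:
        return self._envs[name]

    def catalog(self) -> List[Environment]:
        """Return all environments sorted by name."""
        return sorted(self._envs.values(), key=lambda e: e.name)


registry = EnvironmentRegistry()
# A resources server paired with a generic agent.
registry.register(
    Environment(
        name="workplace_assistant",
        agent_server="simple_agent",
        resources_server="workplace_assistant",
    )
)
# An agent-only environment with no resources server.
registry.register(Environment(name="harbor_agent", agent_server="harbor_agent"))

for env in registry.catalog():
    print(env.name, env.agent_server, env.resources_server)
```

Making `resources_server` optional is one possible way to let agent-only benchmarks fit the same shape as resources-server-based ones; the actual design may choose a different representation.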
Design:
What files should be touched? What logic should be written?
Out of scope:
What are some items that this issue could be mistaken to cover that this issue should explicitly NOT cover?
Acceptance Criteria: