feat: environment abstraction #1031

@cwing-nvidia

Description

Use cases, pain points, and background
NeMo Gym has no formal abstraction for an "environment." Today, an environment is implicitly the combination of a dataset, an agent server, and a resources server, but this is never expressed as a first-class abstraction. This results in several points of friction:

  • Users can't easily answer "what environments exist?" The closest thing to an environment catalog is a script that generates a README table of resources servers. But a resources server is not an environment. workplace_assistant is a resources server that pairs with simple_agent. mini_swe_agent has a stub resources server (returns reward=1.0) with all real logic in a custom agent. harbor_agent and swe_agents have no resources server at all. There is no unified way to discover, inspect, or understand what environments are available.
  • Users can't answer "how do I build a new environment?" We have a CLI command to initialize a new resources server, but an environment is broader than that: it also includes an agent server, and new users may not realize this gap.

Description:
Introduce a first-class Environment abstraction that:

  1. Defines what an environment is - which interaction pattern it uses (multi-step, multi-turn, custom), which verification strategy (programmatic, judge, reward model, composite), which datasets it has, and how these components wire together.
  2. Normalizes all existing environments - every benchmark (including agent-only ones like harbor and SWE) should be expressible through the abstraction.
  3. Provides composable building blocks - we should keep the flexibility to compose orchestration logic, verifier design, etc., but make the integration between them more explicit.
  4. Integrates with the environment registry so that all environments are discoverable through a single catalog.
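
To make the four points above concrete, here is a minimal sketch of what such an abstraction could look like. It is not a proposed API: every class, field, and method name below (Environment, EnvironmentRegistry, InteractionPattern, VerificationStrategy) is hypothetical, and the interaction and verification values assigned to the example environments are placeholders rather than facts taken from the repo. The only details drawn from this issue are that workplace_assistant pairs with simple_agent and that harbor_agent has no resources server.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InteractionPattern(Enum):
    MULTI_STEP = "multi-step"
    MULTI_TURN = "multi-turn"
    CUSTOM = "custom"


class VerificationStrategy(Enum):
    PROGRAMMATIC = "programmatic"
    JUDGE = "judge"
    REWARD_MODEL = "reward model"
    COMPOSITE = "composite"


@dataclass(frozen=True)
class Environment:
    """Hypothetical first-class description of one environment."""

    name: str
    agent_server: str
    interaction: InteractionPattern
    verification: VerificationStrategy
    # None models agent-only environments that have no resources server.
    resources_server: Optional[str] = None
    datasets: tuple[str, ...] = ()


class EnvironmentRegistry:
    """Hypothetical single catalog answering 'what environments exist?'."""

    def __init__(self) -> None:
        self._envs: dict[str, Environment] = {}

    def register(self, env: Environment) -> None:
        if env.name in self._envs:
            raise ValueError(f"environment {env.name!r} is already registered")
        self._envs[env.name] = env

    def get(self, name: str) -> Environment:
        return self._envs[name]

    def names(self) -> list[str]:
        return sorted(self._envs)


registry = EnvironmentRegistry()

# Resources server paired with a generic agent.
# The interaction/verification values are illustrative placeholders.
registry.register(
    Environment(
        name="workplace_assistant",
        agent_server="simple_agent",
        resources_server="workplace_assistant",
        interaction=InteractionPattern.MULTI_STEP,
        verification=VerificationStrategy.PROGRAMMATIC,
    )
)

# Agent-only environment: no resources server at all.
# The interaction/verification values are illustrative placeholders.
registry.register(
    Environment(
        name="harbor_agent",
        agent_server="harbor_agent",
        interaction=InteractionPattern.CUSTOM,
        verification=VerificationStrategy.COMPOSITE,
    )
)

print(registry.names())
```

With a structure like this, agent-only and resources-server-backed environments would share one shape, so a catalog or README table could be generated from the registry instead of from the list of resources servers.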

Design:
What files should be touched? What logic should be written?

Out of scope:
What are some items that this issue could be mistaken to cover that this issue should explicitly NOT cover?

Acceptance Criteria:

  • Individual items that need to be finished in order for this issue to be considered completed
