Option to remove navigation/embodiment requirements for a task #4

@PeterAJansen

We've had an external feature request asking whether the navigation/embodiment aspects of tasks (e.g. object manipulation) could be removed by setting a flag (e.g. embodiment = false), in order to isolate the scientific discovery aspects of the tasks from other skills (even more so than the unit tests do).

Adding this as a feature request so we can start a thread on how this might be accomplished, since there are a number of implementation routes/challenges:

  • Navigation: Currently we have 'teleport_to_location' and 'teleport_to_object' actions that greatly reduce the navigation needs of an agent. What might it look like to have zero navigation needs, i.e. not needing to move, search for items, etc? Would this look like, e.g., being able to see and manipulate all objects in the environment, regardless of the agent's current location (i.e. an omniscient agent)? If so, this changes the model (e.g. it's no longer a partially-observable environment).
  • Object manipulation: It's not immediately clear to me how to remove these requirements -- e.g. picking up objects that you need, etc. -- unless we also make the agent able to interact with every object regardless of its location (i.e. no checks for whether the objects that an action is performed on are accessible), which might be a simple change.
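To make the two routes above concrete, here is a minimal sketch of how a single flag could bypass both the visibility filter and the accessibility check. It is a toy model, not DiscoveryWorld code: `ToyWorld`, `Obj`, `is_accessible`, `observe`, `use`, and `view_radius` are all hypothetical names, and `embodiment` is simply the flag name floated in the request.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    x: int
    y: int


@dataclass
class ToyWorld:
    """Toy grid world; every name here is hypothetical, for illustration only."""
    objects: list
    agent_x: int = 0
    agent_y: int = 0
    embodiment: bool = True  # hypothetical flag from the feature request
    view_radius: int = 2     # assumed local visibility/reach radius

    def _in_range(self, obj: Obj) -> bool:
        # Chebyshev distance between the agent and the object.
        dx = abs(obj.x - self.agent_x)
        dy = abs(obj.y - self.agent_y)
        return max(dx, dy) <= self.view_radius

    def is_accessible(self, obj: Obj) -> bool:
        # With embodiment disabled, the location check is skipped entirely.
        return (not self.embodiment) or self._in_range(obj)

    def observe(self) -> list:
        # Partially observable when embodied, fully observable otherwise.
        return [o.name for o in self.objects if self.is_accessible(o)]

    def use(self, obj: Obj) -> str:
        if not self.is_accessible(obj):
            return f"You can't reach the {obj.name}."
        return f"You use the {obj.name}."


microscope = Obj("microscope", 1, 1)
seed = Obj("seed", 9, 9)

embodied = ToyWorld(objects=[microscope, seed])
disembodied = ToyWorld(objects=[microscope, seed], embodiment=False)

print(embodied.observe())     # only objects near the agent
print(disembodied.observe())  # every object in the world
print(embodied.use(seed))
print(disembodied.use(seed))
```

Routing both observation and action through one `is_accessible` check is a design choice in this sketch; it keeps the flag in a single place, but it also makes explicit that disabling it turns the environment into a fully observable one.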

Some of the challenges here are:

  • Would these be faithful tests of removing navigation/object manipulation requirements, while maintaining discovery task difficulty? If not, what might alternate modifications be?
  • Observation: If the agent is omniscient and able to view many more (or all) objects in the environment, the observation returned by the environment could become very large, since it would have to enumerate ~1000 objects instead of only the objects within a short distance of the agent. That would definitely increase the load on an LLM/other agent model.
  • There are tasks that have steps that are spatial in nature (e.g. rocket science), and it's not clear how these modifications would translate.
  • How do we maintain a faithful representation of the visual output, if the agent isn't moving/can manipulate everything? Possibly teleport it to the last object that it interacted with, and use that as the visual observation?
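As a rough illustration of the observation-size and visual-output challenges above, the sketch below compares a local view against an omniscient view over ~1000 randomly placed objects, and shows one way the "teleport to the last object interacted with" idea could pick a viewpoint. Everything here is assumed for illustration: the 32x32 grid, the radius, and the function names `local_view`, `omniscient_view`, and `camera_anchor` are hypothetical and not part of the existing codebase.

```python
import random


def local_view(objects, agent_pos, radius):
    """Names of objects within `radius` (Chebyshev distance) of the agent."""
    ax, ay = agent_pos
    return [
        name
        for name, (x, y) in objects.items()
        if max(abs(x - ax), abs(y - ay)) <= radius
    ]


def omniscient_view(objects):
    """Names of all objects, regardless of where the agent is."""
    return list(objects)


def camera_anchor(objects, agent_pos, last_interacted):
    """Position to render from: the last-interacted object if there is one,
    otherwise the agent's own position."""
    if last_interacted in objects:
        return objects[last_interacted]
    return agent_pos


# Hypothetical world: 1000 objects scattered on an assumed 32x32 grid.
rng = random.Random(0)
objects = {
    f"object_{i}": (rng.randrange(32), rng.randrange(32)) for i in range(1000)
}
agent_pos = (16, 16)

near = local_view(objects, agent_pos, radius=2)
everything = omniscient_view(objects)
print(f"local view: {len(near)} objects, omniscient view: {len(everything)} objects")

# Before any interaction the viewpoint stays with the agent; afterwards it
# follows the object that was last acted upon.
print(camera_anchor(objects, agent_pos, None))
print(camera_anchor(objects, agent_pos, "object_5"))
```

The object counts are only a proxy for observation size; the actual cost to an LLM agent would depend on how each object is serialized.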

Labels: enhancement
