Symbolic Embodied Reasoning Environments (SERE)

SERE is a lightweight framework for building symbolic planning environments for LLM training and evaluation. Agents solve PDDL planning problems using tool-based interaction in a sandboxed workspace.

Features

Native PDDL support -- load standard .pddl domain/problem files directly; 20 IPC benchmark domains included out of the box.
Agentic sandbox -- miniSWE-style environment where an LLM agent uses coding tools (bash, read_file, write_file, str_replace) plus PDDL tools (validate, simulate) to solve planning problems.
Pure plan validation -- standalone validator with step-by-step diagnostics, partial validation, and world state simulation.
Verifiers integration -- native MultiTurnEnv for use with the verifiers framework and prime eval.
YAML-defined domains -- for domains needing stochastic outcomes, energy management, or features beyond what PDDL expresses (kitchen, assembly).
World state engine -- maintains objects, facts, numeric fluents, and enforces invariants.
Derived predicates -- author higher-level semantics without extra code.
Numeric fluents, durations, energy -- model time, resources, and stochastic outcomes.
Reward shaping & termination rules -- milestones, potential-based shaping, and structured termination.
Multi-agent joint actions -- one action per robot, applied simultaneously.
Reference plans & regression tests -- validate domains and ensure backward compatibility.

Architecture

src/sere/
+-- core/
|   +-- world_state.py       # Facts, objects, fluents, invariants
|   +-- semantics.py         # Clause + numeric evaluation, traces
|   +-- invariants.py        # Generic + domain-specific plugins
|   +-- validator.py         # Pure plan validation engine
|   +-- agentic_env.py       # miniSWE-style sandbox (tools + workspace)
|   +-- pddl_env/            # RL-style environment
|       +-- env.py           # Env: reset/step/reward/done
|       +-- engine.py        # Action application, stochastic outcomes
|       +-- planning.py      # Parse/execute action blocks
|       +-- rendering.py     # Messages + obs stitching
|       +-- prompt_formatter.py
|       +-- run_mode.py      # interactive / batch / open_loop
|
+-- pddl/
|   +-- pddl_parser.py       # Native PDDL domain/problem parser
|   +-- domain_spec.py       # DomainSpec, ActionSpec, PredicateSpec
|   +-- grounding.py         # S-expression grounding engine
|   +-- sexpr.py             # S-expression tokenizer/parser
|
+-- io/
|   +-- task_loader.py       # Unified loader (YAML + PDDL dispatch)
|   +-- pddl_loader.py       # PDDL directory loader + agentic task factory
|
+-- assets/
    +-- domain/              # YAML domains (kitchen, assembly)
    +-- tasks/               # Task YAMLs (per domain)
    +-- pddl/                # Native PDDL domains (20 IPC benchmarks)
        +-- blocksworld/     #   domain.pddl + extensions.yaml + problems/
        +-- logistics/
        +-- sokoban/
        +-- ...

integrations/
+-- verifiers/               # Verifiers framework integration
    +-- vf_sere.py           # SereEnv + AgenticSereEnv (MultiTurnEnv)
    +-- dataset.py           # Task discovery and dataset building

Included Domains

PDDL domains (20)

Standard IPC benchmarks, loaded natively from .pddl files. Each has 5-20 graded problem instances.

Domain	Problems	Description
blocksworld	15	Stack blocks to match a goal configuration
miconic	15	Elevator: pick up and deliver passengers to floors
logistics	15	Move packages between cities via trucks and airplanes
gripper	20	Robot with two grippers moving balls between rooms
freecell	18	Card game: move cards to foundation piles
transport	15	Vehicles delivering packages across a road network
depots	22	Logistics with crates, hoists, and pallets
driverlog	20	Drivers walk/drive to deliver packages
zenotravel	20	Fly passengers between cities with fuel management
sokoban	18	Push boxes onto goal positions in a grid
barman	18	Bartender preparing cocktails with shakers
parking	20	Rearrange cars across curb/border positions
hiking	18	Couples hiking between waypoints with tents/cars
satellite	20	Point satellites and capture images
rovers	20	Mars rovers collecting and transmitting data
peg-solitaire	22	Jump pegs to clear the board
ferry	10	Ferry cars between locations
childsnack	15	Prepare and serve sandwiches to children
tpp	15	Travelling purchaser: buy and deliver goods
visitall	15	Visit every cell in a grid

YAML domains (2)

Extended domains with stochastic outcomes, energy management, and clutter generation.

Domain	Tasks	Features
kitchen	15	Stochastic actions, energy, numeric temps, derived predicates, multi-agent
assembly	7	Stochastic fastening, quality tracking, tool management, energy

Installation

git clone https://github.com/hallerite/SERE.git
cd SERE
uv sync

Requires Python 3.11+.

For the verifiers integration:

uv sync --extra verifiers

Agentic Sandbox (Primary Mode)

The agentic sandbox gives an LLM a real filesystem workspace with PDDL files and coding tools. The agent reads the domain/problem, writes a plan, and validates it -- just like a coding agent solving a programming task.

Workspace

workspace/
  domain.pddl   -- the planning domain (read-only)
  problem.pddl  -- the planning problem (read-only)
  plan.pddl     -- the agent writes its solution here

Tools

Tool	Description
`bash(command)`	Run shell commands in the workspace
`read_file(path)`	Read a file
`write_file(path, content)`	Create/overwrite a file
`str_replace(path, old_str, new_str)`	Targeted string replacement
`validate(up_to_step?)`	Validate plan.pddl (full plan = submission attempt)
`simulate(up_to_step?)`	Run plan.pddl and show resulting world state

Guards

domain.pddl and problem.pddl are read-only (restored if modified via bash)
Initial state is copied fresh for each validation (immutable)
Goal expression is immutable
Only full validate() counts as a submission attempt; partial validate and simulate are free

Usage

from sere.io.pddl_loader import load_agentic_task

env, meta = load_agentic_task("path/to/blocksworld", "path/to/instance-1.pddl")

# Get system prompt and tool schemas for the LLM
system_prompt = env.system_prompt()
tools = env.tool_schemas()

# Dispatch tool calls from the LLM
feedback, done = env.handle_tool_call("read_file", {"path": "domain.pddl"})
feedback, done = env.handle_tool_call("write_file", {
    "path": "plan.pddl",
    "content": "(pick-up A)\n(stack A B)\n",
})
feedback, done = env.handle_tool_call("validate", {})  # full submission

print(env.solved)    # True/False
print(env.attempts)  # number of full submissions
env.cleanup()        # remove temp workspace

Verifiers Integration

SERE provides two MultiTurnEnv implementations for the verifiers framework:

AgenticSereEnv -- agentic sandbox with tool use (recommended)
SereEnv -- pure PDDL prompting (single-turn plan generation)

With prime CLI

# Agentic (tool use)
prime eval run integrations.verifiers.vf_sere:load_agentic_environment \
  -m openai/gpt-4.1-mini -n 10 -r 1

# Pure PDDL prompting
prime eval run integrations.verifiers.vf_sere:load_environment \
  -m openai/gpt-4.1-mini -n 10

Programmatic

from integrations.verifiers import load_agentic_environment

env = load_agentic_environment(
    domains=["blocksworld", "gripper", "ferry"],
    num_tasks_per_domain=5,
    episodes_per_task=1,
    max_attempts=8,
)

See integrations/verifiers/README.md for full details.

Gym-Style API

For step-by-step RL-style interaction:

from sere.io.task_loader import load_task

env, meta = load_task(
    "src/sere/assets/pddl/blocksworld",
    "src/sere/assets/pddl/blocksworld/problems/instance-1.pddl",
)

obs, info = env.reset()
obs, reward, done, info = env.step("(pick-up A)")

YAML tasks (kitchen, assembly):

uv run python -m sere.cli.run_task kitchen/t01_one_step_steep.yaml

PDDL Domain Structure

Each PDDL domain lives in assets/pddl/<name>/:

blocksworld/
+-- domain.pddl         # Standard PDDL domain definition
+-- extensions.yaml     # Optional: reward shaping, metadata
+-- problems/
    +-- instance-1.pddl # Small (4 blocks)
    +-- instance-2.pddl
    +-- ...             # Graded by difficulty

Tests

uv run python -m pytest tests/ -q

438 tests, 0 failures.

Name		Name	Last commit message	Last commit date
Latest commit History 191 Commits
docs		docs
environments		environments
integrations		integrations
scripts		scripts
src/sere		src/sere
tests		tests
.gitignore		.gitignore
.python-version		.python-version
AGENTS.md		AGENTS.md
README.md		README.md
REVIEW.md		REVIEW.md
pyproject.toml		pyproject.toml
test_long_horizon.py		test_long_horizon.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Symbolic Embodied Reasoning Environments (SERE)

Features

Architecture

Included Domains

PDDL domains (20)

YAML domains (2)

Installation

Agentic Sandbox (Primary Mode)

Workspace

Tools

Guards

Usage

Verifiers Integration

With prime CLI

Programmatic

Gym-Style API

PDDL Domain Structure

Tests

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Symbolic Embodied Reasoning Environments (SERE)

Features

Architecture

Included Domains

PDDL domains (20)

YAML domains (2)

Installation

Agentic Sandbox (Primary Mode)

Workspace

Tools

Guards

Usage

Verifiers Integration

With prime CLI

Programmatic

Gym-Style API

PDDL Domain Structure

Tests

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages