feat: add Terminal-Bench environment #5

ZixuanZZhang · 2026-02-10T23:58:47Z

Summary

Adds environment support for Terminal-Bench 2.0, a benchmark for evaluating AI agents on real-world terminal tasks. Each task runs in an isolated Docker container, and the agent interacts with it via shell commands. The same environment can also be used for the Terminal-Bench benchmark.

Changes: new environment: `TerminalBenchEnv`

- Docker-based environment that uses Harbor for container management
- Provides an `execute_command` tool for shell interaction within containers
- Automatic container lifecycle management (build, start, cleanup)
- Binary reward function based on test script execution
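
To illustrate the binary reward idea, here is a minimal sketch, not the code in this PR. It assumes the reward is 1.0 when the task's test script exits with status 0 and 0.0 otherwise. The name `binary_reward` and its parameters are hypothetical, and the sketch runs the command locally with `subprocess` rather than inside a Harbor-managed container.

```python
import subprocess
import sys
from typing import Sequence


def binary_reward(test_command: Sequence[str], timeout_s: float = 60.0) -> float:
    """Hypothetical helper: 1.0 if the test command exits with status 0, else 0.0."""
    try:
        result = subprocess.run(
            list(test_command),
            capture_output=True,  # keep test output out of the caller's stdout
            timeout=timeout_s,
            check=False,          # a non-zero exit is a normal "fail", not an error
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or unlaunchable test script is treated as a failure.
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0


# Stand-ins for a passing and a failing test script.
passing = binary_reward([sys.executable, "-c", "raise SystemExit(0)"])
failing = binary_reward([sys.executable, "-c", "raise SystemExit(1)"])
```

Treating timeouts and launch errors as a reward of 0.0 keeps the signal strictly binary. Whether the actual environment handles those cases the same way is an assumption.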

Test results:

All existing unit tests pass (102 passed in 4.15s)

Additional dependencies:

Requires Harbor for Docker environment management.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Lawhy

LGTM!

feat: add terminal-bench environment

e1c987a

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Lawhy approved these changes Feb 11, 2026


Lawhy merged commit c637e83 into horizon-rl:main Feb 11, 2026
4 checks passed
