Shell Command Execution Tool for AI Agents

## Description

Enable AI agents to autonomously execute terminal/shell commands, similar to tools like Claude Code and Codex CLI. This would provide AI agents with an "ultimate" tool with near-limitless options.

Note: This issue is about giving agents access to the host machine's terminal/shell. It is *not* about giving agents access to Theia's Terminal widgets, which are driven by the user.

## Architecture Considerations

### Tool Levels

Terminal access can be separated into different levels of power:

1. **Level 1 - Read-only tools**: Specialized commands for read-only operations, e.g., `grep`, `find`, `cat`, which are directly executed without any shell
2. **Level 2 - Write-enabled tools**: Specialized commands that can modify local data or communicate with external services, e.g., `git`, `npm`, which are directly executed without any shell
3. **Level 3 - Generic exec**: A generic exec command, i.e., `exec <command> <args...>`, which spawns the respective command without any shell
4. **Level 4 - Shell execution**: Full shell execution, e.g., `bash -c "npm run build | tee log.txt"`, with full shell support including pipes, variable expansion, etc.
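
The practical difference between Levels 1-3 and Level 4 is whether a shell interprets the command line. The following is a minimal sketch of that distinction; `ToolLevel`, `Invocation`, `directInvocation` and `shellInvocation` are hypothetical names for illustration and not existing Theia API.

```typescript
// Hypothetical model of the four tool levels; none of these names exist in Theia.
enum ToolLevel { ReadOnly = 1, WriteEnabled = 2, GenericExec = 3, Shell = 4 }

interface Invocation {
    level: ToolLevel;
    file: string;       // executable to spawn
    args: string[];     // arguments handed over verbatim
    usesShell: boolean; // true only when a shell parses the command line
}

// Levels 1-3: command and arguments are passed to the OS as-is, no shell involved.
function directInvocation(level: ToolLevel, command: string, args: string[]): Invocation {
    return { level, file: command, args, usesShell: false };
}

// Level 4: the whole line is interpreted by bash, so pipes and expansion work.
function shellInvocation(commandLine: string): Invocation {
    return { level: ToolLevel.Shell, file: 'bash', args: ['-c', commandLine], usesShell: true };
}

const level3 = directInvocation(ToolLevel.GenericExec, 'echo', ['a | b']);
const level4 = shellInvocation('npm run build | tee log.txt');

console.log(level3.args); // the "|" is a literal character inside one argument
console.log(level4.args); // bash receives the line and treats "|" as a pipe
```

With a direct invocation, characters such as `|` or `&&` have no special meaning, which is why the lower levels are easier to reason about.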

### Existing Theia Tools (Level 1 Equivalent)

Theia AI already provides several read-only tools implemented in [`@theia/ai-ide`](https://github.com/eclipse-theia/theia/tree/master/packages/ai-ide), for example: `getFileContent`, `searchInWorkspace` and `findFilesByPattern`.

These tools are implemented against Theia's services (`FileService`, `SearchInWorkspaceService`), not terminal commands. This makes them much more portable than terminal-based equivalents, as they work in browser-only mode, remote container contexts, etc. and additionally respect workspace boundaries.

Therefore, we do **not** need special terminal-based implementations for basic file reading and searching in the short term.

However, terminal commands like `grep` and `find` can be used in a more flexible way (e.g., complex regex patterns, specific flags). These capabilities become available automatically when Level 4 (shell execution) is implemented, although specialized support might still be beneficial in the long term.

### Security Concept

There are multiple layers of security that can be applied for terminal access, balancing between strictness and user convenience.

#### Permissions

The permission system controls which commands can be executed and under what conditions. Permissions can be configured at different granularities.

##### Disapprove by Default

By default, every terminal interaction must be approved by the user, who is shown the exact command and all arguments to be executed.
In principle, this lets the user block any malicious terminal interaction before it runs, at the cost of convenience.

##### Level 1 Read-Only Tools

Level 1 tools wrap specific commands (like `grep`, `find`, `cat`) for which there is special support and knowledge within Theia AI. These are limited to read-only operations that do not modify local data or communicate with external services.

Level 1 "read-only" tools should be configurable to be always allowed, as this likely matches user intent, especially if access is limited to files in the current workspace.

##### Level 2 Write-Enabled Tools

Level 2 "write-enabled" tools, like `git` and `npm`, can:
- Query data (e.g., `git log`, `npm list`)
- Modify local data (e.g., `git commit`, `npm install`)
- Communicate with external services (e.g., `git push`, `npm publish`)

Such a tool could be split into separate "read-only" and "write-enabled" tools, or it could work with separate whitelists for each category.
Tools like Claude Code allow defining patterns like `git log *`, allowing all `git log` invocations without allowing `git push`.

These patterns/whitelists could be restricted to Level 1 and Level 2 tools to avoid Level 4 bypasses like [this one](https://github.com/anthropics/claude-code/issues/4956), e.g., the pattern `echo *` allowing `echo 'test' && rm -rf /`.
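
To make the bypass concrete, here is a deliberately naive sketch of pattern matching on a raw command line. The `matchesPattern` function and its trailing-` *` wildcard semantics are assumptions for illustration; they are not how Claude Code or Theia implement patterns.

```typescript
// Naive matcher (assumption): only a trailing " *" wildcard is supported.
function matchesPattern(pattern: string, commandLine: string): boolean {
    if (pattern.endsWith(' *')) {
        const prefix = pattern.slice(0, -2);
        return commandLine === prefix || commandLine.startsWith(prefix + ' ');
    }
    return commandLine === pattern;
}

const allowedLog = matchesPattern('git log *', 'git log --oneline');
const allowedPush = matchesPattern('git log *', 'git push origin main');
// The pattern only inspects the prefix, so a chained command slips through.
const allowedChain = matchesPattern('echo *', "echo 'test' && rm -rf /");

console.log(allowedLog);   // true
console.log(allowedPush);  // false
console.log(allowedChain); // true
```

The last result is harmless when the command is spawned directly (the `&&` is just an argument to `echo`), but dangerous when the same string is handed to a shell.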

##### Level 3 Tools

Level 3 tools execute arbitrary commands for which there is no special support in Theia AI, i.e., we don't know their behavior, whether they can modify local data, or communicate with external services.

These tools should either be handled in the same way as Level 4 tools, or users/adopters can configure "safe" usage of them, allowing these configured tools to be part of the "read-only" permission tier.

##### Level 4 Tools

Level 4 tools give full access to a shell, e.g., `bash`, which poses a large security risk.

This risk can be partially mitigated but, apart from requiring approval for every command, never fully eliminated.

Mitigation techniques include:
- Sandboxing shell invocation
- Analysis of the command via patterns or [tree-sitter](https://github.com/tree-sitter/tree-sitter-bash) to detect chaining issues
- Analysis of the command via LLMs

##### Workspace Access

It is good practice, even when Level 1 read-only tools are configured to be always allowed, to distinguish read access to the current workspace (expected by the user) from read access to the rest of the system.
Depending on the available security features, this can be enforced via sandboxing and containerization, or approximated by falling back to argument analysis.
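
As a sketch of the argument-analysis fallback, the check below decides whether a path argument stays inside the workspace. It assumes absolute POSIX paths and that relative paths are resolved against the workspace root; `normalizePosix` and `isInsideWorkspace` are hypothetical helpers. It does not resolve symlinks, so it is weaker than sandboxing.

```typescript
// Resolve "." and ".." segments of an absolute POSIX path without touching the filesystem.
function normalizePosix(path: string): string {
    const out: string[] = [];
    for (const segment of path.split('/')) {
        if (segment === '' || segment === '.') { continue; }
        if (segment === '..') { out.pop(); } else { out.push(segment); }
    }
    return '/' + out.join('/');
}

// Relative candidates are interpreted relative to the workspace root (assumption).
function isInsideWorkspace(workspaceRoot: string, candidate: string): boolean {
    const root = normalizePosix(workspaceRoot);
    const absolute = candidate.startsWith('/') ? candidate : `${root}/${candidate}`;
    const resolved = normalizePosix(absolute);
    const rootWithSlash = root === '/' ? '/' : root + '/';
    return resolved === root || resolved.startsWith(rootWithSlash);
}

const ws = '/home/user/project';
console.log(isInsideWorkspace(ws, 'src/index.ts'));            // true
console.log(isInsideWorkspace(ws, '../other/secret.txt'));     // false
console.log(isInsideWorkspace(ws, '/etc/passwd'));             // false
console.log(isInsideWorkspace(ws, '/home/user/project-evil')); // false
```

Comparing against the root plus a trailing slash avoids treating a sibling directory with the same name prefix as part of the workspace.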

#### Isolation

##### Sandboxing

Linux and macOS systems come with convenient sandboxing tools that could be leveraged by Theia AI, e.g., [Bubblewrap](https://github.com/containers/bubblewrap) on Linux and [Seatbelt](https://developer.apple.com/documentation/security/app_sandbox) on macOS.

These tools allow restricting filesystem and network access for commands of all levels, significantly reducing the attack surface of malicious commands.

**Windows limitation**: Comparable command-level sandboxing tools are not readily available on Windows. On Windows, the implementation will therefore fall back to the permission system only, meaning users must rely on approval of commands rather than sandboxed execution.

##### Containerization

By running the terminal execution (and possibly the whole Theia backend) in a container, the attack surface on the host can be greatly reduced, and issues like deleting the whole host filesystem can be avoided.

## Technical Considerations

### Terminal Infrastructure

We need to evaluate whether to reuse Theia's terminal infrastructure, implement our own integration with the Node backend, or both.

### Shell Support

Initial implementation should focus on `bash` as the primary shell for Level 4 execution due to its ubiquity and well-documented behavior.

Generalizing to multiple shells could be a follow-up enhancement.

### Long-Running Commands

Commands can run for extended periods (e.g., e2e tests taking minutes) or indefinitely (e.g., `watch` commands).

Aspects to consider:

- Configurable per-command timeout with sensible defaults (e.g., 2 minutes for generic commands, less/more time for known short/long-running commands)
- Allow commands to run in the background with output streaming
  - Needs some UI specialization
  - Needs a concept for how the LLM can check the output
- Provide a mechanism for the user or agent to cancel running commands (without canceling the chat request/session)
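
The per-command timeout could be resolved from a small configuration table. This is a sketch under assumptions: the `TimeoutConfig` shape, the prefix-based matching, and the concrete override values are all hypothetical; only the 2-minute default comes from the list above.

```typescript
interface TimeoutOverride { prefix: string; timeoutMs: number; }

interface TimeoutConfig {
    defaultMs: number;
    overrides: TimeoutOverride[];
}

// Example values only; the overrides are illustrative, not measured.
const exampleConfig: TimeoutConfig = {
    defaultMs: 2 * 60 * 1000,
    overrides: [
        { prefix: 'git status', timeoutMs: 10 * 1000 },
        { prefix: 'npm run', timeoutMs: 5 * 60 * 1000 },
        { prefix: 'npm run test:e2e', timeoutMs: 15 * 60 * 1000 },
    ],
};

// The longest matching prefix wins, so specific entries beat general ones.
function resolveTimeoutMs(commandLine: string, config: TimeoutConfig): number {
    let best: TimeoutOverride | undefined = undefined;
    for (const override of config.overrides) {
        const matches = commandLine === override.prefix
            || commandLine.startsWith(override.prefix + ' ');
        if (matches && (best === undefined || override.prefix.length > best.prefix.length)) {
            best = override;
        }
    }
    return best !== undefined ? best.timeoutMs : config.defaultMs;
}

console.log(resolveTimeoutMs('ls -la', exampleConfig));           // 120000
console.log(resolveTimeoutMs('git status', exampleConfig));       // 10000
console.log(resolveTimeoutMs('npm run test:e2e', exampleConfig)); // 900000
```

The resolved value would then be passed to whatever mechanism actually terminates the process; that part is intentionally left out here.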

### Output Handling

Commands can produce large amounts of output that would consume excessive context.
The LLM can be given options to limit the output size.

Mitigation strategies to consider:

- Let the LLM configure the maximum output size, with reasonable default limits
- Similarly, show only the first N and last M lines of large outputs
- For Level 4, the agent can use shell features like `| head -n 100` and `grep` to self-limit
- Optionally use another LLM to summarize large outputs
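
The "first N and last M lines" strategy can be sketched as follows. `truncateOutput` and the format of the marker line are hypothetical choices for illustration.

```typescript
// Keep the first `headLines` and last `tailLines` lines; replace the middle with a marker.
function truncateOutput(output: string, headLines: number, tailLines: number): string {
    const lines = output.split('\n');
    if (lines.length <= headLines + tailLines) {
        return output; // small enough, return unchanged
    }
    const omitted = lines.length - headLines - tailLines;
    return [
        ...lines.slice(0, headLines),
        `[... ${omitted} lines omitted ...]`,
        ...lines.slice(lines.length - tailLines),
    ].join('\n');
}

const sample = Array.from({ length: 10 }, (_, i) => `line ${i + 1}`).join('\n');
console.log(truncateOutput(sample, 2, 2));
// line 1
// line 2
// [... 6 lines omitted ...]
// line 9
// line 10
```

Keeping the tail matters because build and test tools usually print the summary and the error at the end of their output.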

### Sandboxing Integration

Sandboxing should be:
- Optional but recommended for all tool levels (Level 1-4)
- Configurable
- Gracefully degraded when sandboxing utilities are unavailable (fall back to permission-only security)
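
Graceful degradation could look roughly like the selection logic below. This is a sketch only: `selectIsolation` and its types are hypothetical, the platform strings follow the values of Node's `process.platform`, and the availability check is injected so that the logic stays independent of how the tools are detected.

```typescript
type SandboxTool = 'bubblewrap' | 'seatbelt';
type Isolation = SandboxTool | 'permission-only';

interface IsolationChoice {
    isolation: Isolation;
    warning?: string; // shown to the user when sandboxing was requested but is unavailable
}

function selectIsolation(
    platform: string,
    sandboxEnabled: boolean,
    isAvailable: (tool: SandboxTool) => boolean
): IsolationChoice {
    if (!sandboxEnabled) {
        return { isolation: 'permission-only' };
    }
    const candidate: SandboxTool | undefined =
        platform === 'linux' ? 'bubblewrap' :
        platform === 'darwin' ? 'seatbelt' :
        undefined;
    if (candidate !== undefined && isAvailable(candidate)) {
        return { isolation: candidate };
    }
    return {
        isolation: 'permission-only',
        warning: `No sandbox available on ${platform}; falling back to approval-based permissions only.`,
    };
}

console.log(selectIsolation('linux', true, () => true));
console.log(selectIsolation('win32', true, () => true));
```

Returning a warning instead of failing keeps the tool usable on every platform while making the reduced protection visible.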

## Other Considerations

- Persistence of command history
- Integration in remote development scenarios

## Implementation Plan

### MVP - Shell Command Execution Tool

**Goal**: Provide full shell access with essential security measures.

#### 1.1 Core Infrastructure

- Define base interfaces for terminal tools (e.g. `TerminalTool`, `TerminalToolResult`)
- Evaluate Terminal reuse or implement shell execution service
- Add output capture and truncation handling (configurable max characters, head/tail options)
- Implement timeout handling with configurable defaults
- Handle "endless" commands
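
One possible shape for the base interfaces is sketched below. Only the names `TerminalTool` and `TerminalToolResult` come from the list above; every field, the `TerminalToolRequest` type, the `toResult` helper, and the stub class are assumptions for discussion.

```typescript
// Hypothetical shapes; the fields are a proposal, not an agreed design.
interface TerminalToolRequest {
    commandLine: string;
    cwd: string;
    timeoutMs?: number;
    maxOutputChars?: number;
}

interface TerminalToolResult {
    exitCode: number | undefined; // undefined when the process was killed
    output: string;
    truncated: boolean;
    timedOut: boolean;
}

interface TerminalTool {
    readonly id: string;
    execute(request: TerminalToolRequest): Promise<TerminalToolResult>;
}

// Caps captured output at a configurable number of characters.
function toResult(
    raw: string,
    exitCode: number | undefined,
    timedOut: boolean,
    maxOutputChars: number = 10000
): TerminalToolResult {
    const truncated = raw.length > maxOutputChars;
    return { exitCode, output: truncated ? raw.slice(0, maxOutputChars) : raw, truncated, timedOut };
}

// Stub that runs nothing; it only shows how the pieces fit together.
class FakeTerminalTool implements TerminalTool {
    readonly id = 'shell';
    async execute(request: TerminalToolRequest): Promise<TerminalToolResult> {
        return toResult(`would run: ${request.commandLine}`, 0, false, request.maxOutputChars);
    }
}

const capped = toResult('abcdef', 0, false, 3);
console.log(capped); // output is "abc" and truncated is true
```

Reporting `truncated` and `timedOut` explicitly lets the LLM decide whether to rerun the command with different limits.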

#### 1.2 Security Foundation

- Basic permission system: all bash commands require approval by default. Reuse or enhance the current approval UI for tools.
- Implement workspace path detection in commands. Warn users about access outside the workspace bounds.
- Add logging for all executed commands

#### 1.3 Sandboxing Integration (Linux)

- Integrate with [Bubblewrap](https://github.com/containers/bubblewrap) for Linux
- Configure sandbox to restrict filesystem access to workspace by default
- Add option to enable/disable network access in sandbox
- Gracefully fall back to non-sandboxed execution when Bubblewrap is unavailable (with warning)
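
As a rough sketch of the Bubblewrap integration, the function below builds an argument list for `bwrap`. The `SandboxOptions` type and `buildBwrapArgs` are hypothetical, and the list of read-only system paths is an assumption that depends on the distribution (each listed path must exist on the host). The `bwrap` options used (`--ro-bind`, `--bind`, `--dev`, `--proc`, `--tmpfs`, `--chdir`, `--unshare-net`, `--die-with-parent`) are existing Bubblewrap options.

```typescript
interface SandboxOptions {
    workspaceRoot: string;
    allowNetwork: boolean;
    readOnlySystemPaths: string[]; // assumption: paths needed to find bash and its libraries
}

// Returns the argv for `bwrap`; the caller would spawn `bwrap` with these arguments.
function buildBwrapArgs(commandLine: string, options: SandboxOptions): string[] {
    const args: string[] = [];
    for (const path of options.readOnlySystemPaths) {
        args.push('--ro-bind', path, path); // system directories are visible but not writable
    }
    args.push('--dev', '/dev', '--proc', '/proc', '--tmpfs', '/tmp');
    // Only the workspace is mounted writable.
    args.push('--bind', options.workspaceRoot, options.workspaceRoot);
    args.push('--chdir', options.workspaceRoot);
    if (!options.allowNetwork) {
        args.push('--unshare-net'); // new, empty network namespace
    }
    args.push('--die-with-parent');
    args.push('bash', '-c', commandLine);
    return args;
}

const bwrapArgs = buildBwrapArgs('npm run build', {
    workspaceRoot: '/home/user/project',
    allowNetwork: false,
    readOnlySystemPaths: ['/usr', '/bin', '/lib', '/etc'],
});
console.log(['bwrap', ...bwrapArgs].join(' '));
```

Anything not explicitly bound, such as the rest of the home directory, is simply absent inside the sandbox.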

#### 1.4 Agent Integration

- Register `bash`/`shell` tool
- Integrate with Coder Agent via new prompt variant (or create dedicated agent)
- Specialize tool results rendering in chat UI

### Enhanced Security & Permissions

**Goal**: Add permission patterns and command analysis for safer autonomous operation.

This could be part of the MVP or delivered as a follow-up.

#### 2.1 Permission Patterns

- Implement pattern-based whitelisting (e.g., `git log *`, `npm test`)
- Add per-command approval mechanism ("always allow `make`")
- Add session-scoped permissions
- Create permission management UI in settings
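
Session-scoped and persistent "always allow" grants could be modeled as below. The `PermissionStore` class is a hypothetical sketch. It decides on the executable name only, so it is meant for commands that are spawned directly or that have already passed command analysis, not for raw shell lines.

```typescript
type Scope = 'session' | 'persistent';
type Decision = 'allow' | 'ask';

class PermissionStore {
    private readonly session = new Set<string>();
    private readonly persistent = new Set<string>();

    // Record an "always allow <executable>" choice made by the user.
    grant(executable: string, scope: Scope): void {
        (scope === 'session' ? this.session : this.persistent).add(executable);
    }

    // Default is 'ask': anything without an explicit grant needs approval.
    decide(executable: string): Decision {
        return this.session.has(executable) || this.persistent.has(executable) ? 'allow' : 'ask';
    }

    // Session grants disappear when the chat session ends; persistent ones remain.
    endSession(): void {
        this.session.clear();
    }
}

const store = new PermissionStore();
store.grant('make', 'session');
store.grant('ls', 'persistent');
console.log(store.decide('make')); // allow
console.log(store.decide('rm'));   // ask
store.endSession();
console.log(store.decide('make')); // ask
console.log(store.decide('ls'));   // allow
```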

#### 2.2 Command Analysis

- Integrate [tree-sitter-bash](https://github.com/tree-sitter/tree-sitter-bash) for command parsing
- Implement compound command detection (`&&`, `||`, `;`, pipes)
- Detect dangerous patterns (`rm -rf`, `eval`, command substitution)
- Warn the user, or block execution entirely, when a pattern whitelist could be bypassed via shell features
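
Until a real parser such as tree-sitter-bash is integrated, a conservative heuristic can illustrate what the analysis needs to find. The sketch below is not a bash parser: `analyzeCommandLine` is a hypothetical function that tracks quotes and backslash escapes, ignores here-documents and many other constructs, and uses simple regular expressions for dangerous commands that may produce false positives.

```typescript
type Finding = 'chaining' | 'pipe' | 'command-substitution' | 'dangerous-command';

// Heuristic patterns (assumption): only common spellings are covered.
const DANGEROUS_PATTERNS: RegExp[] = [/\beval\b/, /\brm\s+-(rf|fr)\b/];

function analyzeCommandLine(line: string): Set<Finding> {
    const findings = new Set<Finding>();
    let inSingle = false;
    let inDouble = false;
    for (let i = 0; i < line.length; i++) {
        const ch = line[i];
        const next = line[i + 1];
        if (inSingle) { if (ch === "'") { inSingle = false; } continue; }
        if (ch === '\\') { i++; continue; } // skip the escaped character
        // Substitution is still active inside double quotes.
        if (ch === '`' || (ch === '$' && next === '(')) { findings.add('command-substitution'); }
        if (inDouble) { if (ch === '"') { inDouble = false; } continue; }
        if (ch === "'") { inSingle = true; continue; }
        if (ch === '"') { inDouble = true; continue; }
        if (ch === ';' || ch === '\n') { findings.add('chaining'); continue; }
        if (ch === '|') {
            if (next === '|') { findings.add('chaining'); i++; } else { findings.add('pipe'); }
            continue;
        }
        if (ch === '&') {
            // "2>&1" and "&>file" are redirections, not chaining.
            if (line[i - 1] !== '>' && next !== '>') { findings.add('chaining'); }
            if (next === '&') { i++; }
        }
    }
    if (DANGEROUS_PATTERNS.some(pattern => pattern.test(line))) { findings.add('dangerous-command'); }
    return findings;
}

console.log([...analyzeCommandLine("echo 'a && b'")]);         // []
console.log([...analyzeCommandLine('echo test && rm -rf /')]); // chaining, dangerous-command
console.log([...analyzeCommandLine('echo "$(whoami)"')]);      // command-substitution
```

A command with any finding would not be eligible for pattern-based auto-approval and would fall back to explicit user approval.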

#### 2.3 Extended Sandboxing

- Implement [Seatbelt](https://developer.apple.com/documentation/security/app_sandbox) integration for macOS
- Expose the sandbox configuration to users

### Potential followup: Level 3 Generic Exec

**Goal**: Allow execution of arbitrary commands without shell interpretation.

Because such commands are spawned directly without a shell, shell-based attacks such as command chaining are not possible for them. This reduces the risk of malicious interactions and lets users whitelist individual commands more safely.

- Reuse/Implement `exec` tool that spawns commands + arguments directly (no shell)
- Add command + arguments allow list configuration
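
An allow list for direct execution can match on the argument vector instead of a command string. The following is a sketch with hypothetical names (`AllowRule`, `isAllowed`); the rule format is an assumption.

```typescript
// A rule allows a command when its arguments start with the given prefix.
interface AllowRule {
    command: string;
    argPrefix: string[];
}

function isAllowed(command: string, args: string[], rules: AllowRule[]): boolean {
    return rules.some(rule =>
        rule.command === command
        && rule.argPrefix.length <= args.length
        && rule.argPrefix.every((expected, index) => args[index] === expected)
    );
}

const rules: AllowRule[] = [
    { command: 'git', argPrefix: ['log'] },
    { command: 'echo', argPrefix: [] },
];

console.log(isAllowed('git', ['log', '--oneline'], rules));         // true
console.log(isAllowed('git', ['push'], rules));                     // false
// Without a shell, "&&" is only a literal argument printed by echo.
console.log(isAllowed('echo', ['test', '&&', 'rm', '-rf', '/'], rules)); // true
```

This only removes shell-level bypasses. Commands that execute their own arguments (for example via an option that runs another program) would still need careful rules.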

The same mechanism can also provide Level 1 commands out of the box, using a known whitelist of command and argument combinations.

### Other potential followups

#### Enhanced UI

- Command history view
- Security status indicators (sandboxed, approved, etc.)

#### Advanced Security

- LLM-based command analysis
- Anomaly detection (unusual command patterns)

#### Remote/Container Integration

- Run commands in (dev-)container on demand or by default


Shell Command Execution Tool for AI Agents #16772
