A lightweight terminal daemon designed to be observed by LLMs, not humans.
Single static Rust binary. ~2,000 lines. Manages PTY sessions, maintains virtual terminal state, renders screenshots on demand, and emits events when interesting things happen. Exposes everything over a newline-delimited JSON protocol on a Unix socket. Ships with a thin CLI client.
Current coding agents interact with terminals by capturing stdout/stderr as text. This is wasteful — a cargo build produces thousands of lines the agent doesn't need, and structured terminal output (tables, diffs, TUI programs) becomes a mess of escape codes consuming context tokens.
Humans don't work this way. They glance at the terminal, see the overall state, and focus on specific lines when needed. t4a gives LLMs the same experience: a screenshot to glance at, line-level text extraction for precision, and event notifications so the agent doesn't waste cycles polling.
The screenshot-as-primary-observation pattern exploits the fact that vision tokens are cheap (a full terminal screenshot is ~1,334 tokens in a single tile) and that LLMs process images with bidirectional attention. For high-contrast monospace text on a fixed grid, visual comprehension is near-perfect.
Each terminal is a PTY paired with a VT100 state machine. The state machine absorbs all output (including escape codes) and maintains the current screen contents as a character grid. Screenshots are rendered from this grid on demand. The event watcher monitors the raw byte stream for patterns (silence, prompts, bell, process exit) and emits notifications.
Newline-delimited JSON over a Unix socket at /tmp/t4a.sock (configurable via T4A_SOCKET).
Each CLI invocation connects, sends one JSON line, reads one JSON response line, and disconnects. All requests include a "cmd" field. All responses include "ok": true on success or "ok": false, "error": "..." on failure.
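A harness will usually want to turn the ok/error convention into an exception. A minimal sketch of that, assuming nothing beyond the framing described above; the names T4aError, encode_request, and parse_response are illustrative and not part of t4a:

```python
import json


class T4aError(Exception):
    """Raised when the daemon answers with "ok": false (hypothetical helper)."""


def encode_request(cmd, **fields):
    """Serialize a request as one newline-terminated JSON line."""
    return json.dumps({"cmd": cmd, **fields}).encode() + b"\n"


def parse_response(line):
    """Decode one response line; raise if the daemon reported failure."""
    resp = json.loads(line)
    if not resp.get("ok"):
        raise T4aError(resp.get("error", "unknown error"))
    return resp
```

With this, encode_request("kill", id="t1") produces the bytes to write to the socket, and parse_response either returns the decoded dict or raises with the daemon's error string.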
create — Create a new terminal.
// Request
{"cmd": "create", "cols": 80, "rows": 24, "cmd_args": ["bash"], "cwd": "/home/x", "env": {}}
// Response
{"ok": true, "id": "t1", "cols": 80, "rows": 24, "pid": 12345}
All fields except cmd are optional. Defaults: 80x24, $SHELL or bash.
list — List all terminals.
// Request
{"cmd": "list"}
// Response
{"ok": true, "terminals": [{"id": "t1", "cols": 80, "rows": 24, "pid": 12345, "alive": true, "title": "bash"}]}
kill — Kill terminal and clean up. Sends SIGHUP to the shell process group.
{"cmd": "kill", "id": "t1"}
send — Write to the terminal.
// Request
{"cmd": "send", "id": "t1", "input": "cargo build --release\n"}
// Response
{"ok": true}
Input is a string. Use \n for enter, \x03 for Ctrl+C, \x1b[A for arrow up, etc. Also accepts "input_base64" for raw bytes.
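One way a client might build send requests, as a sketch: text goes in "input" and raw bytes go in "input_base64". Both field names come from the description above; picking between them by Python type, and the names send_request, CTRL_C, and ARROW_UP, are this example's own conventions.

```python
import base64
import json

CTRL_C = "\x03"      # interrupt
ARROW_UP = "\x1b[A"  # cursor-up escape sequence


def send_request(tid, data):
    """Build a "send" request line for str or bytes input (hypothetical helper)."""
    if isinstance(data, bytes):
        # Raw bytes are not valid JSON string content, so base64-encode them.
        payload = {"input_base64": base64.b64encode(data).decode("ascii")}
    else:
        payload = {"input": data}
    return json.dumps({"cmd": "send", "id": tid, **payload})
```

Note that JSON itself has no \x escapes: json.dumps writes Ctrl+C as \u0003 on the wire, and the \x03 notation applies at the language level.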
screenshot — Render the current viewport as PNG.
// Request
{"cmd": "screenshot", "id": "t1", "cursor": true, "pad": 1, "scale": 66}
// Response (two parts)
{"ok": true, "len": 12345}
<12345 raw PNG bytes follow immediately after the JSON line>
The response is a JSON header line with the byte length, followed by exactly that many raw PNG bytes. All parameters are optional.
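Because the PNG payload can itself contain newline bytes, a client has to read the header as a line and the body by length. A minimal sketch of that framing over any binary file-like object; read_screenshot is a hypothetical helper name:

```python
import io
import json


def read_screenshot(stream):
    """Read one screenshot response: a JSON header line, then header["len"] bytes."""
    header = json.loads(stream.readline())
    if not header.get("ok"):
        raise RuntimeError(header.get("error", "screenshot failed"))
    png = stream.read(header["len"])
    if len(png) != header["len"]:
        raise EOFError("connection closed before the full PNG arrived")
    return png
```

On a real connection, sock.makefile("rb") gives a suitable stream; in tests an io.BytesIO works the same way.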
text — Read text from the terminal buffer.
// Request — last 5 lines (indexed from bottom, 0 = last line)
{"cmd": "text", "id": "t1", "start": 0, "end": 5, "trim": true}
// Request — all lines (omit start/end)
{"cmd": "text", "id": "t1", "trim": true}
// Response
{"ok": true, "lines": ["$ cargo build", " Compiling..."], "region": "viewport", "start": 0, "end": 5, "total_lines": 24}
cursor — Cursor position and state.
// Request
{"cmd": "cursor", "id": "t1"}
// Response
{"ok": true, "row": 5, "col": 32, "visible": true}
resize — Resize terminal. Sends SIGWINCH to the child process.
{"cmd": "resize", "id": "t1", "cols": 120, "rows": 40}
events — Stream of terminal events as newline-delimited JSON.
// Request
{"cmd": "events", "terminal": "t1"}
// Response (streaming, one JSON line per event until disconnect)
{"event": "idle", "terminal": "t1", "after_ms": 2000}
{"event": "bell", "terminal": "t1"}
{"event": "command_done", "terminal": "t1", "code": 0}
{"event": "exit", "terminal": "t1", "code": 0}
{"event": "title", "terminal": "t1", "title": "vim src/main.rs"}
{"event": "activity", "terminal": "t1"}
The terminal filter is optional — omit it to receive events from all terminals.
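Consuming the stream is a matter of decoding one JSON object per line until the connection closes. A sketch, where iter_events and its client-side terminal filter are conveniences of this example rather than anything t4a provides:

```python
import io
import json


def iter_events(stream, terminal=None):
    """Yield event dicts from a binary NDJSON stream until EOF."""
    for raw in stream:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        event = json.loads(raw)
        # Optional client-side filter, useful when subscribed to all terminals.
        if terminal is None or event.get("terminal") == terminal:
            yield event
```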
Event types:
| Event | Trigger | Use |
|---|---|---|
| idle | No output for N ms after activity (configurable, default 2000ms) | Command probably finished |
| bell | BEL character (\x07) received | Program wants attention |
| command_done | Shell integration OSC received after command completes | Shell is ready for input, includes exit code |
| exit | Child process exited | Terminal is dead |
| title | OSC title sequence received | Window title changed |
| activity | Output resumed after idle period | Something is happening again |
config — Update daemon configuration.
{
  "cmd": "config",
  "idle_timeout_ms": 2000
}
The CLI is a thin client for the Unix socket. Every subcommand maps to one protocol message. The daemon auto-starts on first command.
t4a daemon [timeout_ms]
t4a create [-- cmd...]
t4a list
t4a send <id> <input>
t4a screenshot <id> [-o file.png]
t4a text <id> [start:end]
t4a cursor <id>
t4a resize <id> <cols> <rows>
t4a events [id]
t4a kill <id>
The send command reads from stdin if no input argument is given. The screenshot command writes PNG to stdout by default. The events command streams newline-delimited JSON to stdout.
The intended usage pattern — this is not part of t4a itself but shows how an agent harness uses it:
import json
import socket

SOCKET_PATH = "/tmp/t4a.sock"

def t4a_request(req):
    # One request per connection: send a JSON line, read one JSON line back.
    with socket.socket(socket.AF_UNIX) as s:
        s.connect(SOCKET_PATH)
        s.sendall(json.dumps(req).encode() + b"\n")
        with s.makefile("rb") as f:
            return json.loads(f.readline())

def t4a_screen(tid):
    # Screenshot responses are a JSON header line followed by raw PNG bytes.
    with socket.socket(socket.AF_UNIX) as s:
        s.connect(SOCKET_PATH)
        s.sendall(json.dumps({"cmd": "screenshot", "id": tid}).encode() + b"\n")
        with s.makefile("rb") as f:
            # The buffered reader keeps any PNG bytes that arrived with the header,
            # so the header line and the body are never mixed up.
            header = json.loads(f.readline())
            return f.read(header["len"])
# Create terminal
t = t4a_request({"cmd": "create", "cols": 80, "rows": 24})
tid = t["id"]

# Define tools for the LLM
tools = [
    {
        "name": "terminal_send",
        "description": "Send input to the terminal. Use \\n for enter, \\x03 for Ctrl+C.",
        "parameters": {"input": "string"}
    },
    {
        "name": "terminal_screen",
        "description": "Get a screenshot of the terminal viewport. Returns PNG.",
        "parameters": {}
    },
    {
        "name": "terminal_read",
        "description": "Read exact text from specific terminal lines.",
        "parameters": {"start": "int", "end": "int"}
    },
]
A typical agent turn:
1. LLM calls terminal_send("cargo build --release\n")
2. LLM calls terminal_wait()
3. ← Agent harness blocks until idle event, returns screenshot
4. LLM sees screenshot: "build failed, error near bottom of screen"
5. LLM calls terminal_read(18, 23) # read the error lines
6. ← Returns exact text: "error[E0308]: mismatched types..."
7. LLM reasons about the fix, calls terminal_send("vim src/main.rs\n")
8. LLM calls terminal_screen()
9. ← Screenshot of vim, LLM navigates visually
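The terminal_wait call in step 2 is a harness-side helper rather than a t4a command. One possible shape for its core loop, as a sketch: wait_for_quiet and WAKE_EVENTS are hypothetical names, and treating command_done and exit as wake-up conditions alongside idle is an assumption of this example.

```python
import json

# Events after which the harness stops waiting (assumed set, see above).
WAKE_EVENTS = {"idle", "command_done", "exit"}


def wait_for_quiet(event_lines, tid):
    """Consume event lines until terminal `tid` goes quiet; return the waking event.

    `event_lines` is any iterable of JSON lines, for example the file object
    from sock.makefile("rb") on an "events" connection.
    Returns None if the stream ends first.
    """
    for raw in event_lines:
        event = json.loads(raw)
        if event.get("terminal") == tid and event.get("event") in WAKE_EVENTS:
            return event
    return None
```

A harness would probably open the events connection before sending the command, so that a fast command's idle event cannot slip past unobserved, and then take the screenshot once this returns.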
Token budget per turn: ~1,334 (screenshot) + ~100 (text read) = ~1,434 tokens of observation. Compare to dumping 50KB of build output as text (~12,000+ tokens).
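The comparison can be checked with a few lines of arithmetic. The 4 bytes per token figure is an assumed rule of thumb for text, not a measured value; the other numbers are the ones quoted above.

```python
SCREENSHOT_TOKENS = 1334  # per-screenshot figure quoted above
TEXT_READ_TOKENS = 100    # a handful of extracted lines
BYTES_PER_TOKEN = 4       # assumption: rough average for plain text

observation = SCREENSHOT_TOKENS + TEXT_READ_TOKENS
raw_dump = 50 * 1024 // BYTES_PER_TOKEN  # 50KB of build output as text

print(observation)                       # 1434
print(raw_dump)                          # 12800
print(round(raw_dump / observation, 1))  # 8.9
```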
[dependencies]
portable-pty = "0.9" # PTY creation and management
vt100 = "0.16" # VT100/xterm state machine
png = "0.18" # PNG encoding
image = "0.25" # Image scaling
tokio = "1" # async runtime (socket accept + PTY reading)
serde = "1" # JSON serialization
serde_json = "1"
nix = "0.31" # Signal handling
noto-sans-mono-bitmap = "0.3" # Embedded monospace font
Embed a monospace bitmap font directly in the binary. At 20px height and 10px width, an 80×24 terminal renders to 800×480 pixels, which is then downscaled to ~528×332 at 66% for optimal vision token efficiency (~257 tokens per screenshot).
The vt100 crate maintains a cell grid with character + attributes (bold, color, inverse, etc.). The renderer walks this grid and blits each character from the embedded font, applying foreground/background colors. No text shaping, no kerning, no ligatures. It's a fixed grid of glyphs.
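The grid walk can be illustrated in a few lines. This is a toy sketch in Python rather than the Rust renderer: the 3×3 glyphs stand in for the embedded font, and the (char, fg, bg) cell layout is an assumption made for the example.

```python
WHITE, BLACK = (255, 255, 255), (0, 0, 0)

# Toy 3x3 glyphs, one string per pixel row ("#" = foreground pixel).
GLYPHS = {
    "L": ["#..", "#..", "###"],
    "T": ["###", ".#.", ".#."],
    " ": ["...", "...", "..."],
}
GLYPH_W, GLYPH_H = 3, 3


def render(grid):
    """Blit a grid of (char, fg, bg) cells into rows of RGB pixel tuples."""
    rows, cols = len(grid), len(grid[0])
    pixels = [[BLACK] * (cols * GLYPH_W) for _ in range(rows * GLYPH_H)]
    for r, row in enumerate(grid):
        for c, (ch, fg, bg) in enumerate(row):
            glyph = GLYPHS.get(ch, GLYPHS[" "])  # unknown chars render blank
            for gy, bits in enumerate(glyph):
                for gx, bit in enumerate(bits):
                    # Each cell owns a fixed GLYPH_W x GLYPH_H pixel rectangle.
                    y, x = r * GLYPH_H + gy, c * GLYPH_W + gx
                    pixels[y][x] = fg if bit == "#" else bg
    return pixels
```

Because every cell maps to a fixed rectangle, attributes such as inverse video reduce to swapping fg and bg before the blit.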
A background task per terminal reads from the PTY master fd and:
- Feeds bytes into the vt100::Parser to update screen state
- Watches for event triggers:
  - Idle: track timestamp of last byte received. A timer fires if no bytes for idle_timeout_ms. Reset on new bytes.
  - Bell: watch for \x07 in the byte stream (before VT100 parsing)
  - Command done: shell integration OSC \033]7777;done;<code>\007 via precmd/PROMPT_COMMAND hook
  - Exit: waitpid on the child PID
  - Title: the vt100 crate exposes the window title set by OSC sequences
  - Activity: transition from idle state to receiving bytes
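The byte-stream triggers above can be sketched as a small state machine. This is an illustrative Python model, not the daemon's code: EventDetector is a hypothetical name, timestamps are passed in to keep it deterministic, a fresh terminal is assumed to start idle, and markers split across reads are not handled. It strips the command-done marker before looking for a bell, since that marker ends in \x07; other BEL-terminated OSC sequences would still register as bells here.

```python
import re

# Shell-integration marker from the list above: ESC ] 7777;done;<code> BEL.
DONE_RE = re.compile(rb"\x1b\]7777;done;(-?\d+)\x07")


class EventDetector:
    def __init__(self, terminal, idle_timeout_ms=2000):
        self.terminal = terminal
        self.idle_timeout_ms = idle_timeout_ms
        self.last_byte_ms = None  # None until the first output arrives
        self.idle = True          # assumption: a fresh terminal starts idle

    def _event(self, name, **extra):
        return {"event": name, "terminal": self.terminal, **extra}

    def feed(self, data, now_ms):
        """Process a chunk of PTY output; return the events it triggers."""
        events = []
        if self.idle:
            events.append(self._event("activity"))
            self.idle = False
        self.last_byte_ms = now_ms  # resets the idle timer
        for match in DONE_RE.finditer(data):
            events.append(self._event("command_done", code=int(match.group(1))))
        if b"\x07" in DONE_RE.sub(b"", data):
            events.append(self._event("bell"))
        return events

    def tick(self, now_ms):
        """Timer callback; emits idle once per quiet period."""
        if self.idle or self.last_byte_ms is None:
            return []
        if now_ms - self.last_byte_ms < self.idle_timeout_ms:
            return []
        self.idle = True
        return [self._event("idle", after_ms=self.idle_timeout_ms)]
```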
Events are broadcast to all listeners via a tokio::sync::broadcast channel.
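The fan-out semantics, where every listener receives every event sent after it subscribed, can be modelled in Python with one queue per subscriber. This is only an analogy for the idea: the Broadcast class is hypothetical, and unlike tokio's bounded broadcast channel, which makes slow receivers skip old messages, these queues are unbounded.

```python
import asyncio


class Broadcast:
    """Fan-out channel: each subscriber gets its own queue of events."""

    def __init__(self):
        self._subscribers = set()

    def subscribe(self):
        queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue):
        self._subscribers.discard(queue)

    def send(self, event):
        """Deliver `event` to every current subscriber; return how many got it."""
        for queue in self._subscribers:
            queue.put_nowait(event)
        return len(self._subscribers)
```

Each events connection would hold one subscription and drop it on disconnect, so a slow or absent client never blocks the PTY reader.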
The daemon is single-threaded async (tokio). Each terminal has:
- A task reading from the PTY master fd and updating the VT100 state
- A task running the idle timer
Requests are handled concurrently. The VT100 screen state is behind a Mutex — reads (screenshot, text) and writes (PTY output processing) are serialized. Contention is minimal since writes are fast (just feeding bytes to the parser).
In scope:
- PTY lifecycle management
- VT100 terminal emulation (via vt100 crate)
- Screenshot rendering with embedded font
- Text extraction from viewport and scrollback
- Event detection and streaming notification
- NDJSON protocol over Unix socket
- CLI client
- Multi-terminal support
Out of scope (for v1):
- Mouse input support
- Sixel/image protocol rendering
- Recording/replay of terminal sessions
- Authentication/authorization on the socket
- Windows support
- Remote access (TCP/TLS) — use SSH tunneling if needed
- Built-in MCP server — build this as a separate thin adapter
src/
  main.rs      # CLI parsing, daemon entry point
  daemon.rs    # Unix socket server, JSON dispatch
  terminal.rs  # Terminal struct: PTY + VT100 + scrollback
  pool.rs      # Terminal pool management, ID generation
  renderer.rs  # VT100 screen → PNG rendering
  font.rs      # Embedded bitmap font data and glyph lookup
  events.rs    # Event detection, broadcast channel
  cli.rs       # CLI client (JSON over Unix socket)
Cargo.toml
README
spec.md