224 changes: 224 additions & 0 deletions docs/adr/goal-driven-cronjob.md
@@ -0,0 +1,224 @@
# ADR: Goal-Driven CronJob (disable_on_success)

- **Status:** Proposed
- **Date:** 2026-05-13
- **Author:** @chaodu-agent
- **Related:** [Basic CronJob ADR](./basic-cronjob.md), [CronJob Docs](../cronjob.md)

---

## 1. User Story & Requirements

As an OpenAB operator, I want to define a **goal** that agents must achieve, where a CronJob periodically checks if the goal is met and keeps prompting agents until it is — so that I can assign persistent objectives without manually following up.

As a team lead, I want agents to self-organize ("escape room" mode) — I tell them the goal, not the steps.

Requirements:
- Extend existing usercron `[[jobs]]` with `disable_on_success` fields
- Before sending the scheduled message, run the specified command
- If command exits 0 and the configured `disable_on_success_match` string appears in its combined stdout+stderr output → goal achieved: post `✅ Goal achieved` to the thread, auto-disable the job, and do NOT send the regular scheduled message
- If command exits 0 without the required match string → goal not met, send message as normal
- If command exits non-zero → goal not met, send message as normal (agents continue working)
- Auto-disable state must persist across restarts
- Human can re-enable a completed goal by setting `enabled = true`
- All communication stays in a single stable thread

---

## 2. Context & Decision Drivers

### The "Escape Room" Pattern

Traditional agent interaction is reactive: human sends message, agent responds. This ADR introduces **goal-driven** interaction: human sets an objective, agents work autonomously across multiple rounds until the objective is met.

The key insight: we don't need a complex goal orchestrator for Phase 1. The existing CronJob scheduler already provides periodic execution — we just need to add a "stop condition."

### Why Extend CronJob (Not a New System)

We considered two approaches:

| Approach | Pros | Cons |
|----------|------|------|
| New `[[goals]]` config section | Clean separation, dedicated semantics | New scheduler, new state machine, large MVP |
| Extend usercron `[[jobs]]` | Minimal change, reuses existing infra | Slightly overloaded config section |

**Decision: Extend usercron `[[jobs]]`** — Phase 1 is literally "cron + exit check + auto-disable." The existing scheduler, channel routing, and thread handling all apply. A full goal runner with state delta detection and escalation is deferred to Phase 2.

### Design Principle: Smallest Useful Increment

> "Don't build a goal orchestrator when a conditional cron job will do."

Phase 1 proves the concept. Phase 2 adds sophistication only after validation.

---

## 3. Design

### Configuration

`disable_on_success` is **only supported in usercron** (`$HOME/.openab/cronjob.toml`), NOT in global config. This is because auto-disable needs to write state back to the file, and only usercron is writable by the OpenAB scheduler at runtime.

```toml
# $HOME/.openab/cronjob.toml (usercron format uses [[jobs]])
[[jobs]]
id = "unit-tests-pass" # required for disable_on_success jobs
schedule = "*/10 * * * *"
channel = "123456789012345678"
thread_id = "" # auto-created on first fire if empty
message = "Goal not met: all unit tests must pass. Please continue working."
disable_on_success = "npm test && echo GOAL_ACHIEVED" # command to evaluate goal
disable_on_success_match = "GOAL_ACHIEVED" # required marker in command output
disable_on_success_timeout_secs = 60 # command timeout
disable_on_success_working_dir = "/repo" # working directory
enabled = true # scheduler sets to false on success
```

### New Fields

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `id` | ✅ (when `disable_on_success` set) | — | Stable unique identifier for state persistence. Missing `id` on a job with `disable_on_success` is a **startup error**. |
| `disable_on_success` | | — | Shell command that evaluates the goal |
| `disable_on_success_match` | ✅ (when `disable_on_success` set) | — | Required marker string that must appear as a **substring** in the combined stdout+stderr output (case-sensitive), in addition to exit 0, before the goal is considered achieved. Choose a unique marker (e.g. `GOAL_ACHIEVED`) that won't appear in normal command output to avoid false positives. |
| `disable_on_success_timeout_secs` | | `60` | Max seconds before command is killed |
| `disable_on_success_working_dir` | | — | Working directory for command execution |
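
The load-time rule in the table (a job with `disable_on_success` must also carry `id` and `disable_on_success_match`) could be checked roughly as follows. This is a minimal sketch, not the OpenAB implementation: it assumes the usercron file has already been parsed into a list of dicts (for example with `tomllib` on Python 3.11+), and the name `validate_jobs` is hypothetical. Treating an empty string as "missing" is also an assumption.

```python
# Hypothetical helper: the field names come from the ADR, everything else is illustrative.
REQUIRED_WITH_GOAL = ("id", "disable_on_success_match")


def validate_jobs(jobs):
    """Return a list of error strings; an empty list means the jobs are acceptable."""
    errors = []
    for index, job in enumerate(jobs):
        if not job.get("disable_on_success"):
            continue  # plain cron job: no extra requirements
        for field in REQUIRED_WITH_GOAL:
            # Assumption: an empty string counts as missing.
            if not job.get(field):
                errors.append(
                    f"jobs[{index}]: '{field}' is required when "
                    "'disable_on_success' is set"
                )
    return errors
```

A caller would refuse to start the scheduler when the returned list is non-empty, which matches the "startup error" wording above.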

### Execution Flow

```
CronJob schedule fires
Is enabled = false in usercron?
     ┌────┴────┐
    Yes        No
     │         │
     ▼         ▼
    Skip   Run disable_on_success command
   (done)      │
          ┌────┴─────────────┐
          │                  │
   exit 0 + marker?     No / Timeout
          │                  │
          ▼                  ▼
       Post ✅,          Send message
       set enabled       to channel/thread
       = false           (agents keep working)
```
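
The "exit 0 + marker" branch can be sketched as a single function that runs the command, enforces the timeout, and tests for the marker in the merged output. This is an illustrative sketch under stated assumptions: the name `run_goal_check` is hypothetical, it targets POSIX systems, and killing the whole process group on timeout is a design choice of this sketch rather than something the ADR specifies.

```python
import os
import signal
import subprocess


def run_goal_check(command, marker, timeout_secs=60, working_dir=None):
    """Return True only if command exits 0 and its combined output contains marker."""
    proc = subprocess.Popen(
        command,
        shell=True,                 # the ADR describes a shell command
        cwd=working_dir,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into stdout
        text=True,
        errors="replace",           # tolerate non-UTF-8 bytes in the output
        start_new_session=True,     # own process group (POSIX only)
    )
    try:
        output, _ = proc.communicate(timeout=timeout_secs)
    except subprocess.TimeoutExpired:
        # Kill the shell and anything it spawned, then reap the process.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.communicate()
        return False
    # Case-sensitive substring match, as described for disable_on_success_match.
    return proc.returncode == 0 and marker in output
```

A timeout, a non-zero exit, and a missing marker all return `False`, so the caller can treat them uniformly as "goal not met".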

### State Persistence

No separate state file needed. When goal is achieved, the **OpenAB scheduler** writes `enabled = false` directly to `$HOME/.openab/cronjob.toml`. State lives in the config itself.

| Event | Action |
|-------|--------|
| Goal achieved (exit 0 + marker) | Scheduler posts `✅ Goal achieved: <description>` to thread, then sets `enabled = false` in usercron file |
| Human re-enables | Human sets `enabled = true` in usercron file |
| Thread auto-created | Scheduler writes `thread_id` back to usercron file |

This works because usercron is designed to be runtime-writable (hot-reloaded by the scheduler), unlike global config.
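
One way the write-back could look is sketched below. It is a line-oriented illustration only: `disable_job` and `write_atomically` are hypothetical names, the sketch assumes the file contains only `[[jobs]]` tables with one key per line, and a real implementation would more likely use a format-preserving TOML library. Writing through a temporary file plus `os.replace` is an assumption of this sketch, chosen so that a hot-reload never sees a half-written file.

```python
import os
import re
import tempfile


def disable_job(toml_text, job_id):
    """Return toml_text with `enabled = false` in the [[jobs]] table whose id matches."""
    lines = toml_text.splitlines()
    starts = [i for i, line in enumerate(lines) if line.strip() == "[[jobs]]"]
    bounds = starts + [len(lines)]
    id_re = re.compile(r'^\s*id\s*=\s*"' + re.escape(job_id) + r'"')
    enabled_re = re.compile(r"^(\s*)enabled\s*=\s*\w+")
    for start, end in zip(bounds, bounds[1:]):
        block = range(start + 1, end)
        if not any(id_re.match(lines[i]) for i in block):
            continue
        for i in block:
            if enabled_re.match(lines[i]):
                # Keeps indentation and any trailing comment on the line.
                lines[i] = enabled_re.sub(r"\1enabled = false", lines[i], count=1)
                break
        else:
            lines.insert(start + 1, "enabled = false")  # key was absent
        return "\n".join(lines) + "\n"
    raise KeyError(job_id)


def write_atomically(path, text):
    """Write text to a temp file in the same directory, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)
```

Because the job is located by `id`, this also illustrates why `id` is required for jobs that use `disable_on_success`.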

### Re-enable Logic

Human edits `$HOME/.openab/cronjob.toml`:
- Set `enabled = true`

That's it. Scheduler hot-reloads the file, sees `enabled = true`, and resumes firing. No generation counter, no state comparison needed.

### Thread Lifecycle

| Scenario | Behavior |
|----------|----------|
| `thread_id` provided in config | Use that thread for all fires |
| `thread_id` empty | Auto-create thread on first fire, then write the new `thread_id` back to the usercron file |

All messages go to the **same thread** — agents need conversation history as context across rounds.

### Security

| Concern | Mitigation |
|---------|-----------|
| Arbitrary shell execution | Trust config source (same as existing cron). Only maintainers edit config. |
| False-positive success | Require both exit 0 and an explicit `disable_on_success_match` in command stdout/stderr |
| Runaway commands | `disable_on_success_timeout_secs` kills long-running processes |
| Command injection | Config is static TOML, not user-input at runtime |

Future phases may add container isolation or command allowlists.

---

## 4. Implementation Plan

### Phase 1 (This ADR)

1. Parse new fields from usercron `[[jobs]]` (`$HOME/.openab/cronjob.toml`). Validate at load time: any job with `disable_on_success` set MUST have `id` and `disable_on_success_match` — reject with a startup error if missing.
2. On cron fire, if `disable_on_success` is set:
   - Check `enabled`; if false, skip
   - Execute command with `disable_on_success_timeout_secs` and `disable_on_success_working_dir`
   - exit 0 and stdout/stderr contains `disable_on_success_match` → scheduler posts `✅ Goal achieved` to thread, writes `enabled = false` to usercron file
   - exit != 0 / timeout exceeded / marker missing → send message as normal
3. Thread auto-creation: if `thread_id` empty, create thread on first fire, scheduler writes back to usercron file
4. No separate state file — usercron IS the state
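
Step 2 above can be read as a small decision function. The sketch below is illustrative only: `on_fire` is a hypothetical name, and `check`, `post`, and `disable` are injected callables standing in for command execution, thread posting, and the usercron write-back. Treating a missing `enabled` key as `true` is an assumption.

```python
GOAL_ACHIEVED_TEXT = "✅ Goal achieved"


def on_fire(job, check, post, disable):
    """Handle one scheduled fire; return 'skipped', 'achieved', or 'sent'."""
    # Assumption: a job without an explicit `enabled` key counts as enabled.
    if not job.get("enabled", True):
        return "skipped"
    # Only jobs with a goal command are evaluated; plain jobs just send.
    if job.get("disable_on_success") and check(job):
        post(GOAL_ACHIEVED_TEXT)   # announce first, as in the State Persistence table
        disable(job["id"])         # then persist enabled = false
        return "achieved"
    # Non-zero exit, timeout, or missing marker: send the regular message.
    post(job["message"])
    return "sent"
```

Keeping the side effects behind callables lets the scenarios in the test section be exercised without a real shell, channel, or config file.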

### Phase 2 (Future — Not This ADR)

Introduce `[[goals]]` config section with:
- `progress_check` — state delta detection between rounds
- `stuck_threshold` — escalate after N rounds without progress
- `max_rounds` — hard cap
- LLM judge — tie-breaker after command passes
- Escalation messages with decision options
- Round counter and progress reporting

Phase 1 usercron `[[jobs]]` entries with `disable_on_success` remain valid and coexist with Phase 2 `[[goals]]` — no migration required.

---

## 5. Test Scenarios

### Happy Path

1. Repo has one failing test
2. Cron fires every 10 min with `disable_on_success = "npm test && echo GOAL_ACHIEVED"` and `disable_on_success_match = "GOAL_ACHIEVED"`
3. `npm test` fails → message sent → agents discuss and fix
4. Next fire → `npm test` passes and output contains `GOAL_ACHIEVED` → scheduler posts `✅ Goal achieved`, sets `enabled = false`

### Restart Resilience

1. Job is auto-disabled (scheduler wrote `enabled = false` to usercron)
2. Process restarts
3. Usercron loaded → `enabled = false` → job stays disabled

### Re-enable

1. Job is disabled (`enabled = false` in usercron)
2. Human edits `$HOME/.openab/cronjob.toml`: sets `enabled = true`
3. Scheduler hot-reloads → job fires again on next schedule

### Timeout

1. `disable_on_success` command hangs
2. After `disable_on_success_timeout_secs` → killed
3. Treated as failure → message sent

### Missing Marker

1. `disable_on_success` exits 0 but does not print `disable_on_success_match`
2. Treated as failure → regular message sent

---

## 6. Open Questions

1. **Multi-agent coordination** — How do agents avoid conflicting actions when self-organizing?
2. **Observability** — Should we log command output / exit codes for debugging?
3. **Context overflow** — Long-running goals accumulate thread history; summarization strategy TBD

---

## 7. References

- [Basic CronJob ADR](./basic-cronjob.md)
- [CronJob Docs](../cronjob.md)
- [Design Discussion (Discord)](https://discord.com/channels/1491295327620169908/1504239931940409587)