Skip to content

feat(cron): auto-disable usercron jobs on success#818

Open
chaodu-agent wants to merge 4 commits into
mainfrom
feat/usercron-disable-on-success
Open

feat(cron): auto-disable usercron jobs on success#818
chaodu-agent wants to merge 4 commits into
mainfrom
feat/usercron-disable-on-success

Conversation

@chaodu-agent
Copy link
Copy Markdown
Collaborator

Summary

  • add usercron-only disable_on_success completion checks requiring both exit code 0 and disable_on_success_match in output
  • persist scheduler writebacks to $HOME/.openab/cronjob.toml by stable id (enabled = false on success, thread_id after thread creation)
  • document goal-driven usercron fields and add focused cron tests

Behavior

[[jobs]] entries in usercron may define:

id = "fix-unit-tests"
disable_on_success = "npm test && echo OPENAB_GOAL_SUCCESS"
disable_on_success_match = "OPENAB_GOAL_SUCCESS"
disable_on_success_timeout_secs = 120
disable_on_success_working_dir = "/workspace/my-project"

A goal is complete only if the command exits 0 and stdout/stderr contains the configured marker. Plain exit 0 without the marker continues the normal cron prompt.

Tests

  • git diff --check
  • Not run locally: cargo test cron --lib (cargo is not installed in this environment)

Discord thread: https://discord.com/channels/1491295327620169908/1504239931940409587

@chaodu-agent chaodu-agent requested a review from thepagent as a code owner May 13, 2026 23:08
@github-actions github-actions Bot added the pending-screening PR awaiting automated screening label May 13, 2026
@chaodu-agent chaodu-agent force-pushed the feat/usercron-disable-on-success branch from 99e406d to f8d3d16 Compare May 13, 2026 23:11
@shaun-agent
Copy link
Copy Markdown
Contributor

OpenAB PR Screening

This is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Click 👍 if you find this useful. Human review will be done within 24 hours. We appreciate your support and contribution 🙏

Screening report ## Intent

PR #818 tries to let usercron jobs automatically turn themselves off once a configured goal has been reached.

The operator-visible problem: today, a goal-oriented scheduled job can keep running even after it has succeeded, causing repeated agent runs, noise, wasted compute, and possible repeated Discord/thread activity. This PR adds a completion check so a job can prove success and then persist enabled = false back to $HOME/.openab/cronjob.toml.

Feat

Feature.

Behaviorally, usercron jobs may define a disable_on_success command plus a required output marker via disable_on_success_match. The job is considered complete only when the command exits 0 and stdout/stderr contains the marker. On success, the scheduler writes back to the user cron TOML by stable id and disables the job.

It also persists thread_id after thread creation and documents the new fields.

Who It Serves

Primary beneficiary: agent runtime operators and deployers running goal-driven scheduled jobs.

Secondary beneficiaries: maintainers and reviewers, because completed recurring work becomes explicit state instead of ambient scheduler behavior.

Rewritten Prompt

Implement goal-completion auto-disable for usercron jobs only.

Add optional usercron fields:

disable_on_success = "command"
disable_on_success_match = "required output marker"
disable_on_success_timeout_secs = 120
disable_on_success_working_dir = "/path"

Before running the normal cron prompt, execute the completion command when configured. Treat the goal as complete only if the command exits 0 and combined stdout/stderr contains the configured marker. If complete, update $HOME/.openab/cronjob.toml for the matching stable id with enabled = false and skip the normal scheduled run.

Persist scheduler writebacks atomically where possible, avoid affecting non-usercron config, and add focused tests for success, missing marker, nonzero exit, timeout, missing id, and TOML writeback behavior.

Merge Pitch

This is worth advancing because it closes a real scheduler lifecycle gap: goal-driven jobs need a first-class way to stop themselves after success.

Risk profile is moderate. The behavior touches scheduler execution and config persistence, so reviewer concern will likely center on writeback safety, race conditions, TOML preservation, and whether shell-command success checks are too footgun-prone. The PR is directionally useful, but the large src/cron.rs delta needs careful review before merge.

Best-Practice Comparison

OpenClaw principles that apply:

  • Gateway-owned scheduling: relevant. The scheduler should own the decision to skip or disable completed jobs.
  • Durable job persistence: relevant. Writing enabled = false back to cron state matches this principle.
  • Isolated executions: relevant. The completion command should have timeout, cwd handling, bounded output, and no shared mutable execution state.
  • Explicit delivery routing: partly relevant. Persisting thread_id supports durable routing, but this PR is not mainly about message delivery.
  • Retry/backoff and run logs: relevant follow-up. Completion checks should be observable when they fail, timeout, or disable a job.

Hermes Agent principles that apply:

  • Gateway daemon tick model: relevant. Completion checks belong in the scheduler tick before normal prompt execution.
  • File locking to prevent overlap: highly relevant. Writebacks to $HOME/.openab/cronjob.toml should not race with another scheduler tick or process.
  • Atomic writes for persisted state: highly relevant. Direct TOML mutation without atomic write semantics is fragile.
  • Fresh session per scheduled run: not central, except that a completed job should avoid creating a fresh run.
  • Self-contained prompts for scheduled tasks: partly relevant. The completion command should not replace the actual scheduled prompt; it should only gate whether the prompt still needs to run.

Implementation Options

Conservative option: merge only the config fields, completion check, and skip behavior, but do not write back enabled = false yet. Log that the goal completed and require the operator to disable it manually. Fastest and lowest persistence risk, but weaker user impact.

Balanced option: keep the PR’s core behavior, but harden writeback. Require stable id, use file locking plus atomic write, preserve unrelated TOML content as much as the chosen parser allows, and add targeted tests for writeback safety. This is the best fit if the scheduler already owns usercron state.

Ambitious option: introduce a durable scheduler state layer separate from the user-authored TOML. Store job completion, thread routing, run history, retry state, and disable reasons in a scheduler-owned state file or database. This aligns better with OpenClaw/Hermes long-term patterns but is larger than this PR.

Comparison Table

Option Speed to ship Complexity Reliability Maintainability User impact Fit for OpenAB right now
Conservative: check and log only High Low Medium High Medium Good if writeback risk is too high
Balanced: auto-disable with locked atomic writeback Medium Medium High High High Best
Ambitious: separate durable scheduler state Low High Very high Medium-high High Good future direction, too broad for this PR

Recommendation

Advance the balanced path.

The feature solves a concrete lifecycle problem and matches the direction of gateway-owned scheduling, but merge discussion should focus on making persistence boring: stable id required, atomic writeback, file locking, clear logs, and focused failure tests.

If the current PR does not already guarantee safe writeback semantics, split that hardening into the required follow-up before merge rather than treating it as optional polish.

@chaodu-agent

This comment has been minimized.

@chaodu-agent

This comment has been minimized.

Copy link
Copy Markdown
Collaborator Author

@chaodu-agent chaodu-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking note before merge: the timeout path around disable_on_success does not actually guarantee the spawned command is terminated. Tokio process handles continue running after drop unless kill_on_drop is enabled or the child is explicitly killed/reaped. In check_disable_on_success, timeout(child.output()) drops the output future on timeout and returns NotAchieved, but a long-running command may keep executing in the background. That violates the documented runaway-command mitigation and can leave repeated goal checks piling up. Please switch to explicit spawn + timeout around wait/output with kill/reap on timeout, or set kill_on_drop(true) before output and add a regression test using a long sleep.

@chaodu-agent

This comment has been minimized.

1 similar comment
@chaodu-agent

This comment has been minimized.

@chaodu-agent

This comment has been minimized.

@chaodu-agent

This comment has been minimized.

@chaodu-agent

This comment has been minimized.

@chaodu-agent chaodu-agent removed the pending-screening PR awaiting automated screening label May 17, 2026
@chaodu-agent

This comment has been minimized.

@chaodu-agent

This comment has been minimized.

@chaodu-agent chaodu-agent force-pushed the feat/usercron-disable-on-success branch from f1ea0fd to 7848a92 Compare May 17, 2026 17:48
@chaodu-agent

This comment has been minimized.

@chaodu-agent

This comment has been minimized.

- update_usercron_job: write to .toml.tmp then rename (atomic on POSIX)
- check_disable_on_success: use spawn() + wait_with_output() to retain
  child handle; explicitly kill on timeout to prevent orphan processes
- Add disable_on_success_kills_child_on_timeout test (sleep 999 + 1s timeout)
@chaodu-agent chaodu-agent force-pushed the feat/usercron-disable-on-success branch from 7848a92 to 0d0959e Compare May 17, 2026 17:50
@chaodu-agent

This comment has been minimized.

@chaodu-agent

This comment has been minimized.

@chaodu-agent

This comment has been minimized.

@chaodu-agent

This comment has been minimized.

@chaodu-agent

This comment has been minimized.

@chaodu-agent
Copy link
Copy Markdown
Collaborator Author

LGTM ✅ — Well-structured goal-driven cron feature. Dual-condition check (exit 0 + marker), atomic usercron writeback, proper timeout/kill, and comprehensive tests. CI all green.

Review Details (四問框架)

1. What problem does it solve?

Enables "escape room" mode: a cron job keeps prompting agents until a goal is met (e.g., all unit tests pass). Before this, cron jobs ran indefinitely with no stop condition — operators had to manually disable them.

2. How does it solve it?

  • Adds disable_on_success + disable_on_success_match fields to usercron [[jobs]]
  • Before firing the regular prompt, runs the command and checks both exit code 0 AND marker presence
  • On success: posts ✅ Goal achieved, writes enabled = false back to cronjob.toml via toml_edit (preserves formatting)
  • On failure: fires the regular cron prompt as normal
  • Thread ID persistence: auto-created threads get their ID written back to usercron
  • usercron_write_lock serializes concurrent writebacks

3. What was considered?

  • Usercron-only restriction (correct — baseline config is not runtime-writable)
  • Stable id field required for writeback targeting
  • Atomic write (temp + rename) prevents corruption
  • Timeout + kill prevents orphan processes
  • Concurrent stdout/stderr drain prevents pipe buffer deadlock

4. Is this the best approach?

Yes for Phase 1. Key design strengths:

  • Minimal new dependency (toml_edit — already transitively present via toml crate)
  • No separate state file — usercron IS the state
  • Validation at load time (missing id or disable_on_success_match → skip with warning)
  • Baseline [[cron.jobs]] explicitly rejects disable_on_success at startup

Traffic Light

🟢 INFO — In-flight index handling improved: reload no longer clears usercron indices (prevents overlap when scheduler writeback triggers mtime change). Good catch.

🟢 INFO — Test coverage is thorough: validation, writeback, dual-condition check, timeout kill, missing-id/missing-match rejection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants