Commit ee5e4dd

feat: add cron-doctor — diagnose cron death-traps before they ship (#757)
1 parent 5565c8d commit ee5e4dd

4 files changed

Lines changed: 958 additions & 0 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -349,6 +349,7 @@ Key source families include:
349349

350350
- **[mattpocock/skills](https://github.com/mattpocock/skills)**: Source for 17 Matt Pocock workflow skills - codebase design, TDD, bug diagnosis, triage, PRDs, issues, prototyping, handoff, teaching, and skill-writing guidance (MIT).
351351
- **[emilkowalski/skills](https://github.com/emilkowalski/skills)**: Source for Emil Kowalski design engineering skills - UI polish, motion review, animation standards, component craft, and high-taste frontend guidance (MIT).
352+
- **[takeaseatventure/devops-skills](https://github.com/takeaseatventure/devops-skills)**: Source for the `cron-doctor` skill - cron expression diagnosis, validation, trap detection, and zero-dependency schedule analysis tooling (MIT).
352353
- **[drogers0/gh-image](https://github.com/drogers0/gh-image)**: Source for the `gh-image` skill - GitHub CLI image uploads that return canonical `user-attachments` embed URLs for PRs, issues, comments, and README screenshots (MIT).
353354
- **[Genefold/arrowspace-skills](https://github.com/Genefold/arrowspace-skills)**: Source for the `arrowspace` skill - spectral vector search using graph Laplacian eigenstructure for structurally aware retrieval (Apache-2.0).
354355
- **[yaojingang/yao-meta-skill](https://github.com/yaojingang/yao-meta-skill)**: Source for the `yao-meta-skill` skill - governed skill creation, refactoring, evaluation, packaging, review, and distribution workflows (MIT).

skills/cron-doctor/SKILL.md

Lines changed: 244 additions & 0 deletions
@@ -0,0 +1,244 @@
1+
---
2+
name: cron-doctor
3+
description: "Diagnose and validate cron expressions before they ship. Catches the five silent death-traps: impossible dates that never fire, OR-semantics that fire too often, midnight spikes, uneven step drift, and leap-year February 29."
4+
category: devops
5+
risk: safe
6+
source: community
7+
source_repo: takeaseatventure/devops-skills
8+
source_type: community
9+
date_added: "2026-06-26"
10+
author: takeaseat
11+
tags: [cron, crontab, scheduling, devops, debugging, kubernetes, validation]
12+
tools: [claude, cursor, codex, gemini, opencode]
13+
license: "MIT"
14+
license_source: "https://github.com/takeaseatventure/devops-skills/blob/main/LICENSE"
15+
---
16+
17+
# cron-doctor
18+
19+
## Overview
20+
21+
Cron is deceptively error-prone. The failure mode is **silent** — a syntactically
22+
valid expression that simply never fires, or fires far more often than intended.
23+
`0 0 30 2 *` parses cleanly and then sits dead forever (February has no 30th).
24+
`0 0 1,15 * 1` looks like "1st and 15th if Monday" but actually means "1st, 15th,
25+
**OR** every Monday" — ~6 fires/month instead of ~2.
26+
27+
This skill teaches an agent to catch those before they reach production. It comes
28+
with a zero-dependency validation engine (`scripts/cron-engine.js`, no install
29+
needed) that parses, describes, deep-validates, and computes next fire times.
30+
31+
## When to Use This Skill
32+
33+
- Use when a user writes, edits, reviews, or deploys a cron expression — in a
34+
crontab, a Kubernetes `CronJob`, a GitHub Actions `schedule`, an Airflow DAG,
35+
a Celery beat schedule, a systemd timer, or any scheduled task.
36+
- Use when debugging a job that "didn't fire" or "fired at the wrong time."
37+
- Use when a user asks "what does this cron expression mean?" or "when will this
38+
run next?" or "how often does this run per year?"
39+
- Use when reviewing a CI/CD pipeline or infrastructure config that contains a
40+
`schedule` field.
41+
- Use when a user pastes a 5-field cron expression and asks for a sanity check.
42+
43+
## How It Works
44+
45+
### Step 1: Parse the expression
46+
47+
Split on whitespace into 5 fields: minute, hour, day-of-month, month, day-of-week.
48+
Confirm valid ranges:
49+
50+
| Field | Position | Range | Notes |
51+
|-------|----------|-------|-------|
52+
| minute | 1 | 0–59 | |
53+
| hour | 2 | 0–23 | |
54+
| day-of-month | 3 | 1–31 | |
55+
| month | 4 | 1–12 | names (JAN–DEC) accepted |
56+
| day-of-week | 5 | 0–7 | 0 and 7 both = Sunday; names (SUN–SAT) accepted |
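
The range check in the table can be sketched as follows. This is a minimal illustration, not the logic in `scripts/cron-engine.js`: `FIELDS` and `checkRanges` are hypothetical names, and the sketch only handles plain numbers, lists, ranges, and steps (month and weekday names are reported as non-numeric).

```javascript
// Hypothetical helper, not part of scripts/cron-engine.js.
// Field order and numeric ranges follow the table above.
const FIELDS = [
  { name: 'minute', min: 0, max: 59 },
  { name: 'hour', min: 0, max: 23 },
  { name: 'day-of-month', min: 1, max: 31 },
  { name: 'month', min: 1, max: 12 },
  { name: 'day-of-week', min: 0, max: 7 },
];

// Returns a list of error strings; an empty list means every number is in range.
function checkRanges(expr) {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== 5) {
    return [`expected 5 fields, got ${parts.length}`];
  }
  const errors = [];
  parts.forEach((part, i) => {
    const { name, min, max } = FIELDS[i];
    for (const item of part.split(',')) {
      const range = item.split('/')[0]; // drop any "/step" suffix
      if (range === '*') continue;
      for (const token of range.split('-')) {
        const ok = /^\d+$/.test(token) && Number(token) >= min && Number(token) <= max;
        if (!ok) errors.push(`${name}: "${token}" is not a number in ${min}-${max}`);
      }
    }
  });
  return errors;
}

console.log(checkRanges('0 0 30 2 *')); // no errors: syntactically in range
console.log(checkRanges('60 0 * * *')); // one error for the minute field
```

Note that `0 0 30 2 *` passes this check, which is why range validation alone does not catch the traps described later.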
57+
58+
### Step 2: Describe it in plain English
59+
60+
State what the user *thinks* it does vs. what it *actually* does. Be explicit
61+
about OR-vs-AND semantics for day-of-month + day-of-week (see death-trap #2).
62+
63+
### Step 3: Run the trap checklist
64+
65+
Check the five death-traps below and flag any that apply.
66+
67+
### Step 4: Calculate next runs and annual fire count
68+
69+
Compute the next 5 fire times as concrete dates so the user can verify the
70+
schedule behaves as expected. Estimate annual fire count — a schedule that fires
71+
365×/year vs. 12×/year is a ~30× cost and load difference.
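
A rough way to estimate the annual fire count, assuming day-of-week is `*` and the fields have already been expanded into arrays of numbers. `range` and `firesPerYear` are hypothetical names for this sketch, and 2025 is used as the non-leap reference year.

```javascript
// Hypothetical helpers, not part of scripts/cron-engine.js.
// Inclusive integer range: range(1, 3) -> [1, 2, 3].
function range(from, to) {
  const out = [];
  for (let n = from; n <= to; n++) out.push(n);
  return out;
}

// minutes, hours, days, months: expanded field values (months are 1-12).
// Assumes day-of-week is "*", so only day-of-month and month decide the day.
function firesPerYear(minutes, hours, days, months, year = 2025) {
  let matchingDays = 0;
  for (const m of months) {
    // Day 0 of the following month is the last day of month m.
    const daysInMonth = new Date(year, m, 0).getDate();
    matchingDays += days.filter((d) => d <= daysInMonth).length;
  }
  return minutes.length * hours.length * matchingDays;
}

const daily = firesPerYear([0], [0], range(1, 31), range(1, 12));
const monthly = firesPerYear([0], [0], [1], range(1, 12));
console.log(daily, monthly); // 365 12
```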
72+
73+
## The Five Cron Death-Traps
74+
75+
These are the bugs that pass crontab's syntax check on install but break in production.
76+
77+
### 1. Impossible dates — the "never fires" bug
78+
79+
```
80+
0 0 30 2 *
81+
```
82+
83+
**Valid syntax. Never fires.** February has no 30th. This schedule is a dead job
84+
that silently sits forever. The same applies to day 31 in any 30-day month:
85+
`0 0 31 4 *`, `0 0 31 6 *`, `0 0 31 9 *`, `0 0 31 11 *`.
86+
87+
**Fix:** use `0 0 28-31 * *` and check for end-of-month in the script, or use `L`
88+
(last day) syntax if your scheduler supports it.
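
One way to detect this trap is to ask whether any listed day exists in any listed month. The sketch below assumes day-of-week is `*` (otherwise the OR rule can still make the job fire) and that both fields are already expanded to numbers. `MAX_DAYS` and `neverFires` are hypothetical names.

```javascript
// Hypothetical helper, not part of scripts/cron-engine.js.
// Longest possible length of each month; February counts as 29 so that
// leap-year-only schedules are not reported as dead.
const MAX_DAYS = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

// days: day-of-month values (1-31); months: month values (1-12).
// Returns true when no listed day exists in any listed month.
function neverFires(days, months) {
  return !months.some((m) => days.some((d) => d <= MAX_DAYS[m - 1]));
}

console.log(neverFires([30], [2]));            // true:  0 0 30 2 *
console.log(neverFires([31], [4, 6, 9, 11])); // true:  31st of 30-day months
console.log(neverFires([29], [2]));            // false: fires in leap years
```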
89+
90+
### 2. OR-semantics — the "fires too often" bug
91+
92+
```
93+
0 0 1,15 * 1
94+
```
95+
96+
**Does NOT mean** "midnight on the 1st and 15th if it's Monday."
97+
**Does mean** "midnight on the 1st, the 15th, **OR** every Monday." That's ~6
98+
fires/month instead of ~2.
99+
100+
This is the single most misunderstood cron rule. When **both** day-of-month AND
101+
day-of-week are restricted (neither is `*`), cron uses OR logic, not AND.
102+
103+
**Fix:** if you need "1st and 15th only if Monday," schedule on Mondays and check the date in the
104+
script:
105+
106+
```bash
107+
0 0 * * 1 [ "$(date +\%d)" = "01" -o "$(date +\%d)" = "15" ] && your-command
108+
```
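
The OR rule for the two day fields can be sketched as a small matcher. This is an illustration under assumptions: `dayMatches` and `countMatchingDays` are hypothetical names, `null` stands for a field written as `*`, and a day-of-week value of 7 is assumed to have been normalized to 0 beforehand.

```javascript
// Hypothetical helpers, not part of scripts/cron-engine.js.
// domSet / dowSet: arrays of allowed values, or null when the field is "*".
function dayMatches(date, domSet, dowSet) {
  const domOk = domSet === null || domSet.includes(date.getDate());
  const dowOk = dowSet === null || dowSet.includes(date.getDay());
  // Both fields restricted: cron fires when EITHER one matches.
  if (domSet !== null && dowSet !== null) return domOk || dowOk;
  return domOk && dowOk;
}

// Count the days in one month (monthIndex is 0-11) that match.
function countMatchingDays(year, monthIndex, domSet, dowSet) {
  const daysInMonth = new Date(year, monthIndex + 1, 0).getDate();
  let count = 0;
  for (let d = 1; d <= daysInMonth; d++) {
    if (dayMatches(new Date(year, monthIndex, d), domSet, dowSet)) count++;
  }
  return count;
}

// April 2025 has four Mondays (7, 14, 21, 28); the 1st and 15th are Tuesdays.
console.log(countMatchingDays(2025, 3, [1, 15], [1])); // 6
```

With the day-of-week field set to `*` (`null` here), the same month yields 2 matching days, which is the count most people expect from `0 0 1,15 * 1`.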
109+
110+
### 3. Midnight spike — the "everything at once" bug
111+
112+
```
113+
0 0 * * *
114+
```
115+
116+
Every job scheduled at `0 0` competes for resources at exactly 00:00. Database
117+
backups, log rotations, cert renewals, report generation — all fire simultaneously.
118+
This causes load spikes, connection-pool exhaustion, and cascading timeouts.
119+
120+
**Fix:** stagger jobs across off-peak hours and minutes. Use `17 2 * * *` or `43 3 * * *` instead of
121+
`0 0`. Jitter is your friend.
122+
123+
### 4. Uneven steps — the "drift" bug
124+
125+
```
126+
*/7 * * * *
127+
```
128+
129+
**Does NOT mean** "every 7 minutes evenly." It means "every 7 minutes starting at
130+
0, then resets at 60." So: 0, 7, 14, 21, 28, 35, 42, 49, 56 — then 0 again
131+
(a 4-minute gap). The intervals drift: 7,7,7,7,7,7,7,7,**4**.
132+
133+
**Fix:** 60 is not divisible by 7. Use step values that divide 60 evenly: `*/5`,
134+
`*/10`, `*/15`, `*/20`, `*/30`. If you truly need every-7-minutes, use a loop with
135+
`sleep 420`.
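
The gap pattern can be computed directly. In this sketch `stepGaps` is a hypothetical name; it expands a `*/step` minute field and measures the distance from each fire to the next, wrapping at the top of the hour.

```javascript
// Hypothetical helper, not part of scripts/cron-engine.js.
// Expands "*/step" over a field of the given size (60 for minutes)
// and returns the fire times plus the gap after each one.
function stepGaps(step, size = 60) {
  const times = [];
  for (let t = 0; t < size; t += step) times.push(t);
  const gaps = times.map((t, i) => {
    // The fire after the last one is the first fire of the next hour.
    const next = i + 1 < times.length ? times[i + 1] : times[0] + size;
    return next - t;
  });
  return { times, gaps };
}

console.log(stepGaps(7).gaps.join(','));  // 7,7,7,7,7,7,7,7,4
console.log(stepGaps(15).gaps.join(',')); // 15,15,15,15
```

A step is even exactly when every entry in `gaps` is the same, which happens when the step divides the field size.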
136+
137+
### 5. Leap-year February 29 — the "annual surprise"
138+
139+
```
140+
0 0 29 2 *
141+
```
142+
143+
Fires only on leap years — February 29, 2024 / 2028 / 2032… If someone writes this
144+
expecting "end of February," they'll be confused for 3 out of every 4 years.
145+
146+
**Fix:** use `0 0 28 2 *` and handle the 29th case in the script if needed.
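
To show a user when a February 29 schedule will actually run, it is enough to list upcoming leap years. `isLeapYear` and `feb29Years` below are hypothetical helper names; the rule itself is the standard Gregorian one.

```javascript
// Hypothetical helpers, not part of scripts/cron-engine.js.
// Gregorian rule: divisible by 4, except centuries not divisible by 400.
function isLeapYear(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// The next `count` years, starting at fromYear, in which "0 0 29 2 *" fires.
function feb29Years(fromYear, count) {
  const years = [];
  for (let y = fromYear; years.length < count; y++) {
    if (isLeapYear(y)) years.push(y);
  }
  return years;
}

console.log(feb29Years(2024, 3).join(', ')); // 2024, 2028, 2032
```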
147+
148+
## Using the validation script
149+
150+
This skill ships a zero-dependency engine at `scripts/cron-engine.js` (Node.js, no
151+
`npm install` needed). You can use it programmatically or from the CLI:
152+
153+
```javascript
154+
// Programmatic — Node.js, zero dependencies
155+
const { describe, validate, nextRuns, formatNextRuns } = require('./scripts/cron-engine.js');
156+
157+
// Parse + describe -> returns { text, error, parsed }
158+
const d = describe('0 0 30 2 *');
159+
console.log(d.text); // "At 00:00, on day-of-month 30 in FEB"
160+
161+
// Deep validation -> catches the traps
162+
const result = validate('0 0 30 2 *');
163+
console.log(result.valid); // true (syntax is valid)
164+
console.log(result.observations); // includes the "never fires" insight
165+
console.log(result.suggestions); // e.g. "Midnight is a common spike..."
166+
167+
// Next 5 fire times -> returns Date[]
168+
const runs = nextRuns('0 9 * * 1-5', new Date(), 5);
169+
console.log(formatNextRuns(runs, new Date())); // [{ date, relative, formatted }, ...]
170+
```
171+
172+
```bash
173+
# CLI (via the bundled wrapper)
174+
node scripts/cli.js describe "*/5 * * * *"
175+
node scripts/cli.js validate "0 0 30 2 *"
176+
node scripts/cli.js next "0 9 * * 1-5" 5
177+
```
178+
179+
## Common cron presets
180+
181+
| Expression | Description | Use case |
182+
|-----------|-------------|----------|
183+
| `*/5 * * * *` | Every 5 minutes | Health checks, polling |
184+
| `0 * * * *` | Every hour | Hourly aggregation |
185+
| `0 */2 * * *` | Every 2 hours | Semi-frequent sync |
186+
| `0 9 * * 1-5` | 9am Mon–Fri | Business-hours task |
187+
| `0 2 * * *` | 2am daily | Off-peak batch (avoid midnight) |
188+
| `0 0 * * 0` | Midnight Sunday | Weekly maintenance |
189+
| `0 0 1 * *` | Midnight 1st of month | Monthly report |
190+
| `0 0 1 1 *` | Midnight Jan 1st | Annual task |
191+
192+
## Best Practices
193+
194+
- ✅ Always provide the plain-English description AND run the trap checklist.
195+
- ✅ Stagger midnight jobs to avoid the spike.
196+
- ✅ Prefer step values that divide 60 evenly (`*/5`, `*/15`, `*/30`).
197+
- ✅ Add a comment above every crontab line explaining intent.
198+
- ✅ Set an explicit timezone (`CRON_TZ`) on schedulers that support it.
199+
- ❌ Don't rely on crontab's install-time check; it only checks syntax, not semantics.
200+
- ❌ Don't restrict both day-of-month and day-of-week without confirming OR-logic.
201+
- ❌ Don't schedule everything at `0 0`.
202+
203+
## Common Pitfalls
204+
205+
- **Problem:** "My cron job isn't running."
206+
**Solution:** Check for an impossible date (trap #1) and confirm the daemon is
207+
running (`service cron status` / `systemctl status crond`). Verify the file
208+
ends with a newline and has correct ownership.
209+
210+
- **Problem:** "My job runs far more often than expected."
211+
**Solution:** You hit OR-semantics (trap #2). If both day-of-month and
212+
day-of-week are set, cron ORs them. Move one to `*` or guard in-script.
213+
214+
- **Problem:** "Intervals are uneven — sometimes 7 min, sometimes 4."
215+
**Solution:** Step value doesn't divide 60 evenly (trap #4). Use a divisor of 60.
216+
217+
- **Problem:** "My job works locally but not in the cluster."
218+
**Solution:** Timezone mismatch. GitHub Actions `schedule` runs in UTC, and a Kubernetes
219+
`CronJob` without `spec.timeZone` uses the kube-controller-manager's time zone (often UTC). Confirm `timeZone` / `TZ` is set as intended.
220+
221+
## Limitations
222+
223+
- This skill targets standard 5-field cron as implemented by Vixie cron,
224+
Kubernetes `CronJob`, GitHub Actions `schedule`, and most libraries. systemd timers use `OnCalendar=` calendar syntax rather than cron fields. It
225+
does **not** validate Quartz 6/7-field expressions with seconds/years, nor
226+
non-standard `@reboot` / `L` / `#` extensions without a note.
227+
- Estimated annual fire counts assume a non-leap reference year; February 29
228+
schedules (trap #5) are flagged explicitly.
229+
- This skill does not replace environment-specific validation, testing, or expert
230+
review. Stop and ask for clarification if required inputs, permissions, or
231+
safety boundaries are missing.
232+
233+
## Related Skills
234+
235+
- `docker-expert` — when the cron job runs inside a container and the issue is the
236+
container/entrypoint rather than the schedule.
237+
- `kubernetes-deployment` — when validating a `CronJob` manifest's `spec.schedule`
238+
field alongside the broader resource config.
239+
240+
## Security & Safety Notes
241+
242+
This skill is read-only and `risk: safe`. The validation script performs no file
243+
writes, network calls, or mutations — it only parses and computes. It is safe to
244+
run against any cron expression without preconditions.

skills/cron-doctor/scripts/cli.js

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
1+
#!/usr/bin/env node
2+
'use strict';
3+
4+
// Minimal CLI wrapper for cron-engine.js. Zero dependencies.
5+
// Usage:
6+
// node cli.js describe "<cron>"
7+
// node cli.js validate "<cron>"
8+
// node cli.js next "<cron>" [count]
9+
10+
const cron = require('./cron-engine.js');
11+
const expr = process.argv[3];
12+
const cmd = process.argv[2];
13+
14+
if (!cmd || !expr) {
15+
console.error('Usage: node cli.js <describe|validate|next> "<cron-expr>" [count]');
16+
console.error('Examples:');
17+
console.error(' node cli.js describe "*/5 * * * *"');
18+
console.error(' node cli.js validate "0 0 30 2 *"');
19+
console.error(' node cli.js next "0 9 * * 1-5" 5');
20+
process.exit(2);
21+
}
22+
23+
function safe(fn) {
24+
try {
25+
fn();
26+
} catch (e) {
27+
console.error('Error: ' + (e.message || e));
28+
process.exit(1);
29+
}
30+
}
31+
32+
switch (cmd) {
33+
case 'describe':
34+
safe(() => {
35+
const d = cron.describe(expr);
36+
console.log(d.text || d.description || JSON.stringify(d));
37+
});
38+
break;
39+
40+
case 'validate':
41+
safe(() => {
42+
const r = cron.validate(expr);
43+
console.log('valid: ' + r.valid);
44+
if (r.description) console.log('description: ' + r.description);
45+
if (r.warnings && r.warnings.length) {
46+
console.log('warnings:');
47+
r.warnings.forEach((w) => console.log(' - ' + w));
48+
}
49+
if (r.observations && r.observations.length) {
50+
console.log('observations:');
51+
r.observations.forEach((o) => console.log(' [' + (o.level || 'info') + '] ' + o.message));
52+
}
53+
if (r.suggestions && r.suggestions.length) {
54+
console.log('suggestions:');
55+
r.suggestions.forEach((s) => console.log(' [' + (s.level || 'info') + '] ' + s.message));
56+
}
57+
});
58+
break;
59+
60+
case 'next':
61+
safe(() => {
62+
const count = Number.parseInt(process.argv[4], 10) || 5; // default to 5 when the count is missing or not a number
63+
const runs = cron.nextRuns(expr, new Date(), count);
64+
const formatted = cron.formatNextRuns(runs, new Date());
65+
formatted.forEach((f) =>
66+
console.log(f.relative + '\t' + f.formatted + '\t' + f.date.toString())
67+
);
68+
});
69+
break;
70+
71+
default:
72+
console.error('Unknown command: ' + cmd);
73+
console.error('Commands: describe, validate, next');
74+
process.exit(2);
75+
}
