Releases: razroo/iso
@razroo/iso-eval v0.1.0
Initial release of @razroo/iso-eval — behavioral eval runner for AI coding agents.
agentmd lints prompt structure, isolint lints prompt prose, and iso-harness fans out the compiled source into every harness file layout. None of them answers the question "did the agent actually do the task?" That is what iso-eval scores.
You give it a suite of tasks (baseline workspace + prompt + checks); it snapshots the workspace per trial, hands it to a runner, and verifies the resulting filesystem / command state against your checks.
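The snapshot, run, verify loop described above can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions: the workspace is modeled as an in-memory map rather than a directory, and every name here (`Workspace`, `Check`, `Runner`, `Task`, `scoreTask`) is hypothetical and not the package's actual API or its `RunnerFn` signature.

```typescript
// All names here are hypothetical; see the package README for the real API.
type Workspace = Map<string, string>; // path -> file contents

type Check =
  | { kind: "file_exists"; path: string }
  | { kind: "file_contains"; path: string; text: string }
  | { kind: "file_not_contains"; path: string; text: string };

// A runner receives a private copy of the workspace and may mutate it.
type Runner = (workspace: Workspace, prompt: string) => void;

interface Task {
  baseline: Workspace;
  prompt: string;
  checks: Check[];
}

function passes(check: Check, ws: Workspace): boolean {
  const content = ws.get(check.path);
  switch (check.kind) {
    case "file_exists":
      return content !== undefined;
    case "file_contains":
      return content !== undefined && content.includes(check.text);
    case "file_not_contains":
      // Assumption: a missing file trivially does not contain the text.
      return content === undefined || !content.includes(check.text);
  }
}

// Runs `trials` independent trials and returns the fraction passing every check.
function scoreTask(task: Task, runner: Runner, trials: number): number {
  let passed = 0;
  for (let i = 0; i < trials; i++) {
    const snapshot: Workspace = new Map(task.baseline); // fresh copy per trial
    runner(snapshot, task.prompt);
    if (task.checks.every((c) => passes(c, snapshot))) passed++;
  }
  return trials === 0 ? 0 : passed / trials;
}
```

Copying the baseline before each trial is what keeps trials independent: a runner that leaves stray files behind cannot influence the next trial's result.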
v0.1 scope
- Deterministic `fake` runner (executes `$ …` lines as shell in the snapshotted workspace) — exercises the orchestration layer offline and in CI
- Checks: `command`, `file_exists`, `file_contains`, `file_not_contains`, `file_matches`, `llm_judge`
- Real-agent runners (`claude-code`, `codex`, `cursor-agent`) coming in v0.2; the library API already accepts any `RunnerFn` today
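To make the `fake` runner more concrete, here is a minimal sketch of just the line-selection step: pulling `$ …` lines out of a prompt so they could then be executed as shell commands. The helper name `extractShellLines` and the exact matching rule (a line whose first non-space characters are `$` followed by a space) are assumptions for illustration, not the package's implementation.

```typescript
// Hypothetical helper: not part of the @razroo/iso-eval API.
// Collects the commands from lines of the form "$ <command>".
function extractShellLines(prompt: string): string[] {
  const commands: string[] = [];
  for (const rawLine of prompt.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("$ ")) {
      const command = line.slice(2).trim();
      if (command.length > 0) commands.push(command);
    }
  }
  return commands;
}
```

Because the commands come straight from the prompt text, a run is fully deterministic, which is what makes this kind of runner useful offline and in CI.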
Install
```bash
# Install as a dev dependency
npm install -D @razroo/iso-eval
# Run a suite; npx resolves the locally installed binary
npx iso-eval run eval.yml
```
See the package README for the full suite shape and library API.
iso-v0.1.1
Full Changelog: iso-harness-v0.2.0...iso-v0.1.1
iso-harness-v0.3.0
Full Changelog: iso-harness-v0.2.0...iso-harness-v0.3.0
iso-harness v0.2.0
What's new
iso-harness validate
Schema-check the iso/ source directory without writing anything:
iso-harness validate --source iso/
iso-harness validate --source iso/ --format json
Catches: missing `command` on an MCP server, non-string `env` values, duplicate agent names, unknown target-override keys (typos such as `cursor: skip` written as `Cursor: skip`), non-string `model` fields, and empty descriptions/bodies.
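As a rough illustration of the kinds of checks listed above, the sketch below validates an already-parsed source object. The shapes (`Source`, `Agent`, `McpServer`), the `validateSource` name, and the choice of which findings count as errors versus warnings are assumptions made for this example; the real schema is defined by iso-harness.

```typescript
// Hypothetical shapes: the real iso/ schema is defined by iso-harness, not here.
interface McpServer { command?: unknown; env?: Record<string, unknown> }
interface Agent { name: string; description?: string; targets?: Record<string, unknown> }
interface Source { agents: Agent[]; mcpServers: Record<string, McpServer> }
interface Issue { severity: "error" | "warning"; message: string }

// knownTargets is supplied by the caller so no harness names are hard-coded.
function validateSource(source: Source, knownTargets: ReadonlySet<string>): Issue[] {
  const issues: Issue[] = [];
  const seen = new Set<string>();

  for (const agent of source.agents) {
    if (seen.has(agent.name)) {
      issues.push({ severity: "error", message: `duplicate agent name: ${agent.name}` });
    }
    seen.add(agent.name);
    if (!agent.description || agent.description.trim() === "") {
      issues.push({ severity: "warning", message: `empty description: ${agent.name}` });
    }
    // Keys are compared case-sensitively, so "Cursor" does not match "cursor".
    for (const key of Object.keys(agent.targets ?? {})) {
      if (!knownTargets.has(key)) {
        issues.push({ severity: "warning", message: `unknown target override: ${key}` });
      }
    }
  }

  for (const [name, server] of Object.entries(source.mcpServers)) {
    if (typeof server.command !== "string" || server.command === "") {
      issues.push({ severity: "error", message: `MCP server ${name}: missing command` });
    }
    for (const [key, value] of Object.entries(server.env ?? {})) {
      if (typeof value !== "string") {
        issues.push({ severity: "error", message: `MCP server ${name}: env.${key} is not a string` });
      }
    }
  }
  return issues;
}
```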
build now gates on validation
iso-harness build runs the validator first and refuses to write output if the source has schema errors. Warnings (empty description, empty body, unknown-harness overrides) are surfaced in the build summary but do not block.
This is a behavior change: if your iso/ source had a latent schema bug, previous versions would silently generate wrong output across all four harnesses. 0.2.0 fails fast instead.
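The gating behavior can be pictured as a small wrapper around the emit step. This is a sketch under assumptions, not the iso-harness source: `Issue`, `BuildResult`, and `gatedBuild` are hypothetical names, and `emit` stands in for whatever actually writes the harness files.

```typescript
// Hypothetical types and names, for illustration only.
type Severity = "error" | "warning";
interface Issue { severity: Severity; message: string }
interface BuildResult { written: boolean; errors: string[]; warnings: string[] }

// Refuses to call `emit` when any validation issue is an error.
// Warnings are reported in the result but never block the build.
function gatedBuild(issues: Issue[], emit: () => void): BuildResult {
  const errors = issues.filter((i) => i.severity === "error").map((i) => i.message);
  const warnings = issues.filter((i) => i.severity === "warning").map((i) => i.message);
  if (errors.length > 0) {
    return { written: false, errors, warnings };
  }
  emit();
  return { written: true, errors, warnings };
}
```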
Test coverage
18 unit tests (was: 1 smoke-build script). Covers validation, build, source loading, frontmatter skip rules, TOML escaping, and the refuse-to-emit-on-error guarantee.
Full changelog: packages/iso-harness/CHANGELOG.md
agentmd v0.3.0
What's new
lint --format sarif
agentmd lint can now emit SARIF 2.1.0 for upload to GitHub code scanning. Same dialect as isolint --format sarif — driver name agentmd, rule IDs are the L-codes, severities map to error / warning / note.
agentmd lint prompts/*.md --format sarif > agentmd.sarif
# then upload with github/codeql-action/upload-sarif@v3
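For readers unfamiliar with SARIF, the sketch below shows roughly how lint diagnostics map onto a SARIF 2.1.0 log with driver name `agentmd`. The `Diagnostic` shape and the `toSarif` function are assumptions for illustration and may not match what agentmd emits field for field; the `$schema` URL is one commonly used location for the 2.1.0 schema.

```typescript
// Hypothetical diagnostic shape; agentmd's internal types may differ.
interface Diagnostic {
  ruleId: string; // an L-code such as "L3" or "L12"
  severity: "error" | "warning" | "note";
  message: string;
  file: string;
  line: number;
}

function toSarif(diagnostics: Diagnostic[]) {
  // Each distinct rule is declared once on the driver.
  const ruleIds = Array.from(new Set(diagnostics.map((d) => d.ruleId))).sort();
  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: "agentmd", rules: ruleIds.map((id) => ({ id })) } },
        results: diagnostics.map((d) => ({
          ruleId: d.ruleId,
          level: d.severity, // SARIF levels include "error", "warning", "note"
          message: { text: d.message },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: d.file },
                region: { startLine: d.line },
              },
            },
          ],
        })),
      },
    ],
  };
}
```

Serializing the returned object with `JSON.stringify` gives a file in the shape that SARIF consumers expect.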
Full changelog: packages/agentmd/CHANGELOG.md
agentmd v0.2.0
Highlights
- `lint` accepts multiple files, shell globs, and stdin (`-`).
- Machine-readable lint output: `--format json` (for CI tools) and `--format github` (workflow annotations).
- `render -` reads source from stdin for pipeline composition.
- `--flag=value` works anywhere alongside `--flag value`.
- `test --timeout <ms>` kills hung backends instead of stalling.
- New: `agentmd --version`.
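Accepting both `--flag=value` and `--flag value` is a small parsing detail that is easy to get wrong. The sketch below shows one way to handle it; `parseFlags` and its `valueFlags` parameter are hypothetical and are not agentmd's actual argument parser.

```typescript
// Hypothetical parser sketch; not agentmd's implementation.
// valueFlags lists the flags that consume the following argument.
function parseFlags(
  argv: string[],
  valueFlags: ReadonlySet<string>
): { flags: Map<string, string>; positionals: string[] } {
  const flags = new Map<string, string>();
  const positionals: string[] = [];
  let pending: string | null = null; // flag still waiting for its value

  for (const arg of argv) {
    if (pending !== null) {
      flags.set(pending, arg);
      pending = null;
      continue;
    }
    if (arg.startsWith("--") && arg.length > 2) {
      const eq = arg.indexOf("=");
      if (eq !== -1) {
        flags.set(arg.slice(2, eq), arg.slice(eq + 1)); // --flag=value
      } else if (valueFlags.has(arg.slice(2))) {
        pending = arg.slice(2); // --flag value
      } else {
        flags.set(arg.slice(2), "true"); // boolean switch
      }
    } else {
      positionals.push(arg); // includes "-" for stdin
    }
  }
  return { flags, positionals };
}
```

In this sketch a lone `-` is kept as a positional argument, so a caller could treat it as "read from stdin".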
Parser
- Strip UTF-8 BOM and normalize CRLF / bare CR so Windows and editor-exported files parse correctly (previously the `# Agent:` heading would silently fail).
- Flag duplicate `# Agent:` headings as warning L12 instead of silently dropping them.
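The BOM and line-ending normalization can be expressed as a short pre-parse step. This is a minimal sketch of the idea, with `normalizeSource` as a hypothetical name rather than the parser's real function.

```typescript
// Hypothetical pre-parse step: strip a leading BOM, then unify line endings.
function normalizeSource(text: string): string {
  // U+FEFF at position 0 is what a decoded UTF-8 BOM looks like.
  const withoutBom = text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
  // CRLF and bare CR both become LF.
  return withoutBom.replace(/\r\n?/g, "\n");
}
```

Without the BOM strip, the first line would begin with an invisible character, so a check for a line starting with `# Agent:` would not match.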
Linter
- L9 split into L9a (agent heading), L9b (procedure), L9c (at least one rule) — each check is now individually filterable.
- L3 duplicate-ID diagnostics now carry a line number and point at the first definition.
Tests
- 85 tests (was 69). New subprocess CLI coverage for stdin, multi-file, globs, JSON/github formats, `--flag=value`.
agentmd v0.1.0
First release of @razroo/agentmd — a structured-markdown dialect for authoring agent prompts, with a linter for structure and a fixture-driven harness that measures per-rule adherence against a target model.