fix(compress): harden validation, CLI, benchmark, and default model#481
fix(compress): harden validation, CLI, benchmark, and default model#481brandonwie wants to merge 5 commits into
Conversation
The in-repo default was the stale id claude-sonnet-4-5. Update it to the current claude-sonnet-4-6 and add a pointer to the canonical model list so it does not drift again. The CAVEMAN_MODEL env override still takes priority. Co-authored-by: claude <noreply@anthropic.com>
The usage/help text advertised 'caveman <filepath>', but there is no guaranteed caveman executable for this skill — the documented invocation is 'python3 -m scripts <filepath>' from the skill dir. Fix the module docstring and print_usage() to match. Co-authored-by: claude <noreply@anthropic.com>
Glob/no-args mode called sys.exit(1) when tests/caveman-compress was absent (e.g. an install copy without the repo's tests/ tree). Benchmarking the bundled fixtures is an optional helper, so print a clear note and exit 0 instead. Direct file-pair mode is unchanged. Co-authored-by: claude <noreply@anthropic.com>
SKIP_EXTENSIONS only carried .dockerfile/.makefile, which match the rare '*.dockerfile' form. Real build files are extensionless (Dockerfile, Makefile) or carry an unrelated suffix (Dockerfile.prod), so they fell through to the content sniff and could be classified as natural language and compressed. Add a case-insensitive basename match (dockerfile/makefile with optional suffix) that runs before the extension rules so they are skipped. Co-authored-by: claude <noreply@anthropic.com>
…ths)
Five reviewer-flagged validation gaps in validate.py:
- Variable-length fences: extract_inline_codes only stripped exact 3-char
```/~~~ fences, so backticks inside a 4+ backtick block were treated as
inline code -> false failures. It now reuses the line-based, CommonMark
fence-aware scanner (closing fence >= opening length).
- Indented code blocks: 4-space / tab blocks were treated as prose and could
be silently rewritten while validation passed. They are now extracted and
preserved like fenced blocks.
- Headings exact: a heading text/order/level change with equal count was only
a warning, which compress_file accepted. Anchors, TOCs, and cross-links
depend on exact headings, so it is now a hard error.
- Redundant warning: a count mismatch fired both an error and the text-change
warning. The text branch is now elif, and (per the headings-exact change)
emits an error -- so a count mismatch = one error, equal-count text change =
one error, no noise.
- File paths exact: a dropped/changed referenced path was only a warning.
A LOST path is now a hard error; an extra path-like token that appears only
in the compressed output stays a warning, because PATH_REGEX is broad and
caveman prose ("pros/cons", "req/min") can coincidentally match.
Tests: add fail/pass coverage for variable-length fences, indented blocks,
heading-exact failures (text/level/count, no redundant warning), and
path-loss-vs-prose. The claude-md-project compressed fixture mutated a
heading level (### -> ##); that encoded the old warn-only behavior and now
fails the headings-exact rule, so the fixture is corrected to keep the
original level (a correct compression).
Co-authored-by: claude <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR tightens the caveman-compress validator’s Markdown fidelity checks by improving code block detection (variable-length fences + indented blocks) and by escalating heading/path mismatches from warnings to errors, with new tests covering the edge cases.
Changes:
- Add robust code block scanning to exclude fenced/indented blocks from inline-code detection and to extract code blocks verbatim.
- Make heading changes and lost file paths hard validation errors; keep “added path-like tokens” as warnings.
- Improve detection of extensionless build files (Dockerfile/Makefile) and update CLI/help text + benchmark behavior.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_validate_inline.py | Adds regression tests for variable-length fences, indented code blocks, heading strictness, and path loss behavior. |
| tests/caveman-compress/claude-md-project.md | Updates a heading level in a bundled Markdown fixture. |
| skills/caveman-compress/scripts/validate.py | Implements _scan_code_blocks, updates inline-code extraction, and makes heading/path checks stricter. |
| skills/caveman-compress/scripts/detect.py | Skips extensionless Dockerfile/Makefile variants via basename regex. |
| skills/caveman-compress/scripts/compress.py | Updates default Anthropic model name and adds a reference comment. |
| skills/caveman-compress/scripts/cli.py | Updates usage text to python3 -m scripts <filepath>. |
| skills/caveman-compress/scripts/benchmark.py | Makes missing fixtures non-fatal (prints note and returns). |
| plugins/caveman/skills/caveman-compress/scripts/validate.py | Mirrors validator changes for the plugin copy. |
| plugins/caveman/skills/caveman-compress/scripts/detect.py | Mirrors basename skipping for extensionless build files. |
| plugins/caveman/skills/caveman-compress/scripts/compress.py | Mirrors default model update. |
| plugins/caveman/skills/caveman-compress/scripts/cli.py | Mirrors usage text update. |
| plugins/caveman/skills/caveman-compress/scripts/benchmark.py | Mirrors benchmark handling of missing fixtures. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for start, end, _ in _scan_code_blocks(lines): | ||
| block_lines.update(range(start, end)) | ||
| kept = [line for idx, line in enumerate(lines) if idx not in block_lines] | ||
| return re.findall(r"`([^`]+)`", "\n".join(kept)) |
| lost = p1 - p2 | ||
| added = p2 - p1 | ||
| if lost: | ||
| result.add_error(f"File path(s) lost in compression: {lost}") | ||
| if added: | ||
| result.add_warning(f"Path-like tokens added (possibly prose): {added}") |
| # Unclosed fences are silently skipped — they indicate malformed | ||
| # markdown and including them would cause false-positive failures. | ||
| continue |
| def _is_blank(line): | ||
| return line.strip() == "" | ||
|
|
||
|
|
||
| def _is_indented_code(line): | ||
| """A line is indented-code content if it starts with a tab or >=4 spaces.""" | ||
| if line.startswith("\t"): | ||
| return True | ||
| return len(line) - len(line.lstrip(" ")) >= 4 |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 33b3d7e251
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if lost: | ||
| result.add_error(f"File path(s) lost in compression: {lost}") |
There was a problem hiding this comment.
Avoid failing on slash-separated prose
When original prose contains slash terms that PATH_REGEX admits (for example input/output, and/or, or pros/cons) and compression rewrites or drops them, this new hard error rejects the file even though no real file path was lost; the synced plugin copy has the same branch. Since the regex is already documented as broad, keep ambiguous no-prefix matches as warnings or only hard-fail definite paths such as ./, /, ../, drive paths, or extension-bearing paths.
Useful? React with 👍 / 👎.
Summary
Five fixes to the
caveman-compressskill (inskills/caveman-compress/scripts/and the synced copy). Each closes a correctness gap where the validator could
pass while the compressed output lost structure, or where a helper failed in an
install-only copy.
Fixes
validate.py) — five gaps:extract_inline_codesonly stripped exact 3-char```/~~~fences, so backticks inside a 4+ backtick block were misread asinline code → false failures. Now reuses the CommonMark fence-aware scanner
(closing fence ≥ opening length).
silently rewritten while validation passed. Now extracted and preserved like
fenced blocks.
a warning (accepted by
compress_file). Anchors/TOCs/cross-links depend onexact headings → now a hard error.
warning; the text branch is now
elif→ one error per condition, no noise.lost path is now a hard error; an extra path-like token only in the output
stays a warning (PATH_REGEX is broad; caveman prose like "pros/cons" can
coincidentally match).
detect.py) —SKIP_EXTENSIONSonly matched therare
*.dockerfile/*.makefileforms, so realDockerfile/Makefile(extensionless or
Dockerfile.prod) fell through and could be compressed asprose. Adds a case-insensitive basename match that runs before the extension
rules.
benchmark.py) — glob/no-args mode calledsys.exit(1)whentests/caveman-compresswas absent (an install copy withoutthe repo's
tests/tree). Benchmarking bundled fixtures is optional → prints anote and exits 0. Direct file-pair mode unchanged.
cli.py) — corrects the printed usage to the module invocation.compress.py) — bumps the default toclaude-sonnet-4-6.Tests
tests/test_validate_inline.py: fail/pass coverage for variable-lengthfences, indented blocks, heading-exact failures (text/level/count, no redundant
warning), and path-loss-vs-prose.
claude-md-projectcompressed fixture, which had mutated a headinglevel (
###→##) — that encoded the old warn-only behavior and now fails theheadings-exact rule.
Local verification on this branch:
python3 -m unittest tests.test_validate_inline tests.test_compress_safety→ 29 tests OKpython3 tests/verify_repo.py→ all checks passedCo-authored-by: claude noreply@anthropic.com