fix(compress): harden validation, CLI, benchmark, and default model by brandonwie · Pull Request #481 · JuliusBrussee/caveman

brandonwie · 2026-06-01T14:17:52Z

Summary

Five fixes to the caveman-compress skill (in skills/caveman-compress/scripts/
and the synced copy). Each closes a correctness gap where the validator could
pass while the compressed output lost structure, or where a helper failed in an
install-only copy.

Fixes

Validation hardening (validate.py) — five gaps:
- Variable-length fences: extract_inline_codes only stripped exact 3-char
```/~~~ fences, so backticks inside a 4+ backtick block were misread as
inline code → false failures. Now reuses the CommonMark fence-aware scanner
(closing fence ≥ opening length).
- Indented code blocks: 4-space/tab blocks were treated as prose and could be
  silently rewritten while validation passed. Now extracted and preserved like
  fenced blocks.
- Headings exact: a heading text/order/level change with equal count was only
  a warning (accepted by compress_file). Anchors/TOCs/cross-links depend on
  exact headings → now a hard error.
- Redundant warning: a count mismatch fired both an error and the text-change
  warning; the text branch is now elif → one error per condition, no noise.
- File paths exact: a dropped/changed referenced path was only a warning → a
  lost path is now a hard error; an extra path-like token only in the output
  stays a warning (PATH_REGEX is broad; caveman prose like "pros/cons" can
  coincidentally match).
Extensionless build files (detect.py) — SKIP_EXTENSIONS only matched the
rare *.dockerfile/*.makefile forms, so real Dockerfile/Makefile
(extensionless or Dockerfile.prod) fell through and could be compressed as
prose. Adds a case-insensitive basename match that runs before the extension
rules.
Benchmark degrades gracefully (benchmark.py) — glob/no-args mode called
sys.exit(1) when tests/caveman-compress was absent (an install copy without
the repo's tests/ tree). Benchmarking bundled fixtures is optional → prints a
note and exits 0. Direct file-pair mode unchanged.
CLI usage (cli.py) — corrects the printed usage to the module invocation.
Default model (compress.py) — bumps the default to claude-sonnet-4-6.

Tests

Adds tests/test_validate_inline.py: fail/pass coverage for variable-length
fences, indented blocks, heading-exact failures (text/level/count, no redundant
warning), and path-loss-vs-prose.
Corrects the claude-md-project compressed fixture, which had mutated a heading
level (### → ##) — that encoded the old warn-only behavior and now fails the
headings-exact rule.

Local verification on this branch:

python3 -m unittest tests.test_validate_inline tests.test_compress_safety → 29 tests OK
python3 tests/verify_repo.py → all checks passed

Co-authored-by: claude noreply@anthropic.com

The in-repo default was the stale id claude-sonnet-4-5. Update it to the current claude-sonnet-4-6 and add a pointer to the canonical model list so it does not drift again. The CAVEMAN_MODEL env override still takes priority. Co-authored-by: claude <noreply@anthropic.com>

The usage/help text advertised 'caveman <filepath>', but there is no guaranteed caveman executable for this skill — the documented invocation is 'python3 -m scripts <filepath>' from the skill dir. Fix the module docstring and print_usage() to match. Co-authored-by: claude <noreply@anthropic.com>

Glob/no-args mode called sys.exit(1) when tests/caveman-compress was absent (e.g. an install copy without the repo's tests/ tree). Benchmarking the bundled fixtures is an optional helper, so print a clear note and exit 0 instead. Direct file-pair mode is unchanged. Co-authored-by: claude <noreply@anthropic.com>

SKIP_EXTENSIONS only carried .dockerfile/.makefile, which match the rare '*.dockerfile' form. Real build files are extensionless (Dockerfile, Makefile) or carry an unrelated suffix (Dockerfile.prod), so they fell through to the content sniff and could be classified as natural language and compressed. Add a case-insensitive basename match (dockerfile/makefile with optional suffix) that runs before the extension rules so they are skipped. Co-authored-by: claude <noreply@anthropic.com>

…ths) Five reviewer-flagged validation gaps in validate.py: - Variable-length fences: extract_inline_codes only stripped exact 3-char ```/~~~ fences, so backticks inside a 4+ backtick block were treated as inline code -> false failures. It now reuses the line-based, CommonMark fence-aware scanner (closing fence >= opening length). - Indented code blocks: 4-space / tab blocks were treated as prose and could be silently rewritten while validation passed. They are now extracted and preserved like fenced blocks. - Headings exact: a heading text/order/level change with equal count was only a warning, which compress_file accepted. Anchors, TOCs, and cross-links depend on exact headings, so it is now a hard error. - Redundant warning: a count mismatch fired both an error and the text-change warning. The text branch is now elif, and (per the headings-exact change) emits an error -- so a count mismatch = one error, equal-count text change = one error, no noise. - File paths exact: a dropped/changed referenced path was only a warning. A LOST path is now a hard error; an extra path-like token that appears only in the compressed output stays a warning, because PATH_REGEX is broad and caveman prose ("pros/cons", "req/min") can coincidentally match. Tests: add fail/pass coverage for variable-length fences, indented blocks, heading-exact failures (text/level/count, no redundant warning), and path-loss-vs-prose. The claude-md-project compressed fixture mutated a heading level (### -> ##); that encoded the old warn-only behavior and now fails the headings-exact rule, so the fixture is corrected to keep the original level (a correct compression). Co-authored-by: claude <noreply@anthropic.com>

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR tightens the caveman-compress validator’s Markdown fidelity checks by improving code block detection (variable-length fences + indented blocks) and by escalating heading/path mismatches from warnings to errors, with new tests covering the edge cases.

Changes:

Add robust code block scanning to exclude fenced/indented blocks from inline-code detection and to extract code blocks verbatim.
Make heading changes and lost file paths hard validation errors; keep “added path-like tokens” as warnings.
Improve detection of extensionless build files (Dockerfile/Makefile) and update CLI/help text + benchmark behavior.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
tests/test_validate_inline.py	Adds regression tests for variable-length fences, indented code blocks, heading strictness, and path loss behavior.
tests/caveman-compress/claude-md-project.md	Updates a heading level in a bundled Markdown fixture.
skills/caveman-compress/scripts/validate.py	Implements `_scan_code_blocks`, updates inline-code extraction, and makes heading/path checks stricter.
skills/caveman-compress/scripts/detect.py	Skips extensionless Dockerfile/Makefile variants via basename regex.
skills/caveman-compress/scripts/compress.py	Updates default Anthropic model name and adds a reference comment.
skills/caveman-compress/scripts/cli.py	Updates usage text to `python3 -m scripts <filepath>`.
skills/caveman-compress/scripts/benchmark.py	Makes missing fixtures non-fatal (prints note and returns).
plugins/caveman/skills/caveman-compress/scripts/validate.py	Mirrors validator changes for the plugin copy.
plugins/caveman/skills/caveman-compress/scripts/detect.py	Mirrors basename skipping for extensionless build files.
plugins/caveman/skills/caveman-compress/scripts/compress.py	Mirrors default model update.
plugins/caveman/skills/caveman-compress/scripts/cli.py	Mirrors usage text update.
plugins/caveman/skills/caveman-compress/scripts/benchmark.py	Mirrors benchmark handling of missing fixtures.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    for start, end, _ in _scan_code_blocks(lines):
+        block_lines.update(range(start, end))
+    kept = [line for idx, line in enumerate(lines) if idx not in block_lines]
+    return re.findall(r"`([^`]+)`", "\n".join(kept))


+    lost = p1 - p2
+    added = p2 - p1
+    if lost:
+        result.add_error(f"File path(s) lost in compression: {lost}")
+    if added:
+        result.add_warning(f"Path-like tokens added (possibly prose): {added}")


+            # Unclosed fences are silently skipped — they indicate malformed
+            # markdown and including them would cause false-positive failures.
+            continue


+def _is_blank(line):
+    return line.strip() == ""
+
+
+def _is_indented_code(line):
+    """A line is indented-code content if it starts with a tab or >=4 spaces."""
+    if line.startswith("\t"):
+        return True
+    return len(line) - len(line.lstrip(" ")) >= 4


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 33b3d7e251

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T14:20:41Z

+    if lost:
+        result.add_error(f"File path(s) lost in compression: {lost}")


Avoid failing on slash-separated prose

When original prose contains slash terms that PATH_REGEX admits (for example input/output, and/or, or pros/cons) and compression rewrites or drops them, this new hard error rejects the file even though no real file path was lost; the synced plugin copy has the same branch. Since the regex is already documented as broad, keep ambiguous no-prefix matches as warnings or only hard-fail definite paths such as ./, /, ../, drive paths, or extension-bearing paths.

Useful? React with 👍 / 👎.

brandonwie and others added 5 commits June 1, 2026 21:47

Copilot AI review requested due to automatic review settings June 1, 2026 14:17

Copilot AI reviewed Jun 1, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(compress): harden validation, CLI, benchmark, and default model#481

fix(compress): harden validation, CLI, benchmark, and default model#481
brandonwie wants to merge 5 commits into
JuliusBrussee:mainfrom
brandonwie:fix/compress-validation-hardening

brandonwie commented Jun 1, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if lost:
		result.add_error(f"File path(s) lost in compression: {lost}")

Uh oh!

Conversation

brandonwie commented Jun 1, 2026

Summary

Fixes

Tests

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants