Skip to content

fix(compress): harden validation, CLI, benchmark, and default model#481

Open
brandonwie wants to merge 5 commits into
JuliusBrussee:mainfrom
brandonwie:fix/compress-validation-hardening
Open

fix(compress): harden validation, CLI, benchmark, and default model#481
brandonwie wants to merge 5 commits into
JuliusBrussee:mainfrom
brandonwie:fix/compress-validation-hardening

Conversation

@brandonwie

Copy link
Copy Markdown

Summary

Five fixes to the caveman-compress skill (in skills/caveman-compress/scripts/
and the synced copy). Each closes a correctness gap where the validator could
pass while the compressed output lost structure, or where a helper failed in an
install-only copy.

Fixes

  • Validation hardening (validate.py) — five gaps:
    • Variable-length fences: extract_inline_codes only stripped exact 3-char
      ```/~~~ fences, so backticks inside a 4+ backtick block were misread as
      inline code → false failures. Now reuses the CommonMark fence-aware scanner
      (closing fence ≥ opening length).
    • Indented code blocks: 4-space/tab blocks were treated as prose and could be
      silently rewritten while validation passed. Now extracted and preserved like
      fenced blocks.
    • Headings exact: a heading text/order/level change with equal count was only
      a warning (accepted by compress_file). Anchors/TOCs/cross-links depend on
      exact headings → now a hard error.
    • Redundant warning: a count mismatch fired both an error and the text-change
      warning; the text branch is now elif → one error per condition, no noise.
    • File paths exact: a dropped/changed referenced path was only a warning → a
      lost path is now a hard error; an extra path-like token only in the output
      stays a warning (PATH_REGEX is broad; caveman prose like "pros/cons" can
      coincidentally match).
  • Extensionless build files (detect.py)SKIP_EXTENSIONS only matched the
    rare *.dockerfile/*.makefile forms, so real Dockerfile/Makefile
    (extensionless or Dockerfile.prod) fell through and could be compressed as
    prose. Adds a case-insensitive basename match that runs before the extension
    rules.
  • Benchmark degrades gracefully (benchmark.py) — glob/no-args mode called
    sys.exit(1) when tests/caveman-compress was absent (an install copy without
    the repo's tests/ tree). Benchmarking bundled fixtures is optional → prints a
    note and exits 0. Direct file-pair mode unchanged.
  • CLI usage (cli.py) — corrects the printed usage to the module invocation.
  • Default model (compress.py) — bumps the default to claude-sonnet-4-6.

Tests

  • Adds tests/test_validate_inline.py: fail/pass coverage for variable-length
    fences, indented blocks, heading-exact failures (text/level/count, no redundant
    warning), and path-loss-vs-prose.
  • Corrects the claude-md-project compressed fixture, which had mutated a heading
    level (#####) — that encoded the old warn-only behavior and now fails the
    headings-exact rule.

Local verification on this branch:

  • python3 -m unittest tests.test_validate_inline tests.test_compress_safety29 tests OK
  • python3 tests/verify_repo.pyall checks passed

Co-authored-by: claude noreply@anthropic.com

brandonwie and others added 5 commits June 1, 2026 21:47
The in-repo default was the stale id claude-sonnet-4-5. Update it to the
current claude-sonnet-4-6 and add a pointer to the canonical model list so
it does not drift again. The CAVEMAN_MODEL env override still takes priority.

Co-authored-by: claude <noreply@anthropic.com>
The usage/help text advertised 'caveman <filepath>', but there is no
guaranteed caveman executable for this skill — the documented invocation is
'python3 -m scripts <filepath>' from the skill dir. Fix the module docstring
and print_usage() to match.

Co-authored-by: claude <noreply@anthropic.com>
Glob/no-args mode called sys.exit(1) when tests/caveman-compress was absent
(e.g. an install copy without the repo's tests/ tree). Benchmarking the
bundled fixtures is an optional helper, so print a clear note and exit 0
instead. Direct file-pair mode is unchanged.

Co-authored-by: claude <noreply@anthropic.com>
SKIP_EXTENSIONS only carried .dockerfile/.makefile, which match the rare
'*.dockerfile' form. Real build files are extensionless (Dockerfile,
Makefile) or carry an unrelated suffix (Dockerfile.prod), so they fell
through to the content sniff and could be classified as natural language and
compressed. Add a case-insensitive basename match (dockerfile/makefile with
optional suffix) that runs before the extension rules so they are skipped.

Co-authored-by: claude <noreply@anthropic.com>
…ths)

Five reviewer-flagged validation gaps in validate.py:

- Variable-length fences: extract_inline_codes only stripped exact 3-char
  ```/~~~ fences, so backticks inside a 4+ backtick block were treated as
  inline code -> false failures. It now reuses the line-based, CommonMark
  fence-aware scanner (closing fence >= opening length).
- Indented code blocks: 4-space / tab blocks were treated as prose and could
  be silently rewritten while validation passed. They are now extracted and
  preserved like fenced blocks.
- Headings exact: a heading text/order/level change with equal count was only
  a warning, which compress_file accepted. Anchors, TOCs, and cross-links
  depend on exact headings, so it is now a hard error.
- Redundant warning: a count mismatch fired both an error and the text-change
  warning. The text branch is now elif, and (per the headings-exact change)
  emits an error -- so a count mismatch = one error, equal-count text change =
  one error, no noise.
- File paths exact: a dropped/changed referenced path was only a warning.
  A LOST path is now a hard error; an extra path-like token that appears only
  in the compressed output stays a warning, because PATH_REGEX is broad and
  caveman prose ("pros/cons", "req/min") can coincidentally match.

Tests: add fail/pass coverage for variable-length fences, indented blocks,
heading-exact failures (text/level/count, no redundant warning), and
path-loss-vs-prose. The claude-md-project compressed fixture mutated a
heading level (### -> ##); that encoded the old warn-only behavior and now
fails the headings-exact rule, so the fixture is corrected to keep the
original level (a correct compression).

Co-authored-by: claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 1, 2026 14:17

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR tightens the caveman-compress validator’s Markdown fidelity checks by improving code block detection (variable-length fences + indented blocks) and by escalating heading/path mismatches from warnings to errors, with new tests covering the edge cases.

Changes:

  • Add robust code block scanning to exclude fenced/indented blocks from inline-code detection and to extract code blocks verbatim.
  • Make heading changes and lost file paths hard validation errors; keep “added path-like tokens” as warnings.
  • Improve detection of extensionless build files (Dockerfile/Makefile) and update CLI/help text + benchmark behavior.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/test_validate_inline.py Adds regression tests for variable-length fences, indented code blocks, heading strictness, and path loss behavior.
tests/caveman-compress/claude-md-project.md Updates a heading level in a bundled Markdown fixture.
skills/caveman-compress/scripts/validate.py Implements _scan_code_blocks, updates inline-code extraction, and makes heading/path checks stricter.
skills/caveman-compress/scripts/detect.py Skips extensionless Dockerfile/Makefile variants via basename regex.
skills/caveman-compress/scripts/compress.py Updates default Anthropic model name and adds a reference comment.
skills/caveman-compress/scripts/cli.py Updates usage text to python3 -m scripts <filepath>.
skills/caveman-compress/scripts/benchmark.py Makes missing fixtures non-fatal (prints note and returns).
plugins/caveman/skills/caveman-compress/scripts/validate.py Mirrors validator changes for the plugin copy.
plugins/caveman/skills/caveman-compress/scripts/detect.py Mirrors basename skipping for extensionless build files.
plugins/caveman/skills/caveman-compress/scripts/compress.py Mirrors default model update.
plugins/caveman/skills/caveman-compress/scripts/cli.py Mirrors usage text update.
plugins/caveman/skills/caveman-compress/scripts/benchmark.py Mirrors benchmark handling of missing fixtures.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

for start, end, _ in _scan_code_blocks(lines):
block_lines.update(range(start, end))
kept = [line for idx, line in enumerate(lines) if idx not in block_lines]
return re.findall(r"`([^`]+)`", "\n".join(kept))
Comment on lines +193 to +198
lost = p1 - p2
added = p2 - p1
if lost:
result.add_error(f"File path(s) lost in compression: {lost}")
if added:
result.add_warning(f"Path-like tokens added (possibly prose): {added}")
Comment on lines +91 to +93
# Unclosed fences are silently skipped — they indicate malformed
# markdown and including them would cause false-positive failures.
continue
Comment on lines +41 to +49
def _is_blank(line):
return line.strip() == ""


def _is_indented_code(line):
"""A line is indented-code content if it starts with a tab or >=4 spaces."""
if line.startswith("\t"):
return True
return len(line) - len(line.lstrip(" ")) >= 4

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 33b3d7e251

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +195 to +196
if lost:
result.add_error(f"File path(s) lost in compression: {lost}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid failing on slash-separated prose

When original prose contains slash terms that PATH_REGEX admits (for example input/output, and/or, or pros/cons) and compression rewrites or drops them, this new hard error rejects the file even though no real file path was lost; the synced plugin copy has the same branch. Since the regex is already documented as broad, keep ambiguous no-prefix matches as warnings or only hard-fail definite paths such as ./, /, ../, drive paths, or extension-bearing paths.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants