Add summarize-changes skill for snapshot PR review

wmitsuda · claude · wmitsuda · commit bece18ea69b9 · 2026-02-17T00:25:59.000-03:00
Adds a Claude Code skill that analyzes snapshot automation PRs and
produces structured review reports with hash change detection,
unexpected deletion alerts, merged range tables, and new MDBX data
file listings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/.claude/skills/summarize-changes/SKILL.md b/.claude/skills/summarize-changes/SKILL.md
@@ -0,0 +1,168 @@
+---
+name: summarize-changes
+description: Analyze snapshot automation PR changes and produce a structured review report. Use when asked to summarize or review a snapshot PR.
+argument-hint: "<PR number or URL>"
+---
+
+Analyze the changes in a snapshot automation PR and produce a structured review report.
+
+The PR to analyze: $ARGUMENTS
+
+If $ARGUMENTS is a full URL, extract the PR number from it. If it is just a number, use it directly. The repo is erigontech/erigon-snapshot.
+
+## Step 1: Fetch PR Data and Run Analysis
+
+Run these commands using the Bash tool:
+
+1. `gh pr view <number> --repo erigontech/erigon-snapshot` to get PR title, description, and metadata
+2. Save the diff to a temp file and run the analysis script:
+   ```
+   gh pr diff <number> --repo erigontech/erigon-snapshot > /tmp/pr_diff.txt && python3 "$(git rev-parse --show-toplevel)/.claude/skills/summarize-changes/analyze_diff.py" /tmp/pr_diff.txt
+   ```
+
+The script (`analyze_diff.py` in the skill directory) parses the diff, classifies all changes, and outputs structured sections. Use its output to build the final report.
+
+### What the script detects
+
+- **Hash Changes**: same filename in both the removed and added sets with a different hash (CRITICAL)
+- **Range Merges**: multiple smaller removed ranges replaced by a single larger added range
+- **Version Upgrades**: removed entries at one version replaced by added entries at a newer version
+- **New Data Pruned from MDBX**: added entries whose range overlaps no removed range of the same type (typically beyond the previously highest block number)
+- **Unexpected Deletions**: removed entries not covered by any of the above (CRITICAL)
+
+## Step 2: Generate Report
+
+### File grouping
+
+Throughout the ENTIRE report, group files into these high-level sections:
+
+- **State Snapshots**: files under `accessor/`, `domain/`, `history/`, `idx/`
+- **CL Snapshots** (Consensus Layer): files under `caplin/`
+- **EL Block Snapshots** (Execution Layer): root-level files with version-range patterns (bodies, headers, transactions, beaconblocks and their indices)
+- **Other Files**: root-level files without version-range patterns (e.g., `salt-blocks.txt`, `salt-state.txt`)
+
+Apply this grouping to ALL sections: Hash Changes, Unexpected Deletions, Merged Ranges, New Data Pruned from MDBX, Version Upgrades, and Other Changes.
+
+### Output structure
+
+---
+
+## Snapshot PR Summary: [PR Title]
+
+**PR:** #N | **Chain:** chain | **Top Block:** X
+
+---
+
+### HASH CHANGES
+
+If hash changes exist, use 🚨 emoji and show:
+
+### 🚨🚨🚨 HASH CHANGES — ACTION REQUIRED
+
+> **Changed hashes mean existing snapshot content was regenerated. Nodes that already downloaded the old version will have mismatched data.**
+
+Group changed files by State Snapshots / CL Snapshots / EL Block Snapshots:
+
+| File | Old Hash | New Hash |
+|------|----------|----------|
+| ... | ... | ... |
+
+If NO hash changes, use ✅ emoji:
+
+### ✅ Hash Changes
+
+**No hash changes detected.** All existing files retain their original hashes.
+
+---
+
+### UNEXPECTED DELETIONS
+
+If unexpected deletions exist, use 🚨 emoji and show:
+
+### 🚨🚨🚨 UNEXPECTED DELETIONS — ACTION REQUIRED
+
+> **Deleted files not accounted for by merges or version upgrades could mean data loss.**
+
+Group by State Snapshots / CL Snapshots / EL Block Snapshots. List each with its range and explain why it's concerning.
+
+If NO unexpected deletions, use ✅ emoji:
+
+### ✅ Unexpected Deletions
+
+**No unexpected deletions detected.** All removed files are accounted for by range merges or version upgrades.
+
+---
+
+### Merged Ranges
+
+Present merges in a table format, with one table per high-level group (State Snapshots / CL Snapshots / EL Block Snapshots). Sort rows by subdir (accessor, domain, history, idx, caplin, or root) then by snapshot type (datatype).
+
+If a merge also involves a version upgrade, note it in the Notes column.
+
+Table format:
+
+| Subdir | Type | Ext | Old Ranges | New Range | Notes |
+|--------|------|-----|------------|-----------|-------|
+| accessor | code | .vi | 32-40, 40-44, 44-46 | 32-48 [v1.1] | |
+| domain | accounts | .kv | 32-40, 40-44, 44-46 | 32-48 [v2.0] | cross-version: absorbs v1.1 |
+
+When multiple types share the exact same merge pattern (same old ranges, same new range, same version), they can be combined into a single row with types comma-separated.
+
+IMPORTANT: Keep table columns narrow so they render as proper tables in the terminal. When a merge has more than 3 old ranges, split them across multiple continuation rows. Each continuation row has empty cells for all columns except Old Ranges. Example with 8 old ranges:
+
+| Subdir | Type | Ext | Old Ranges | New Range | Notes |
+|--------|------|-----|------------|-----------|-------|
+| (root) | bodies, headers, txns | .seg | 2100-2110, 2110-2120, 2120-2130 | 2100-2200 [v1.1] | |
+| | | | 2130-2140, 2140-2150 | | |
+| | | | 2150-2151, 2151-2152, 2152-2153 | | |
+
+This keeps each row under ~40 chars in the Old Ranges column so the terminal renders it as a proper table.
+
+---
+
+### New Data Pruned from MDBX
+
+Present in a table format, with one table per high-level group (State Snapshots / CL Snapshots / EL Block Snapshots). List every individual file, one per row.
+
+Table format:
+
+| Subdir | File |
+|--------|------|
+| accessor | accessor/v1.1-code.48-50.vi |
+| | accessor/v1.1-commitment.48-50.vi |
+| | accessor/v1.1-rcache.48-50.vi |
+| | accessor/v1.1-storage.48-50.vi |
+| domain | domain/v1.1-accounts.48-50.bt |
+| | domain/v1.1-accounts.48-50.kvei |
+
+Use continuation rows (empty Subdir cell) for subsequent files in the same subdir. Start a new subdir label when the subdir changes. Files are sorted by: subdir, extension, snapshot type, range, then version.
+
+---
+
+### Version Upgrades
+
+Group by State Snapshots / CL Snapshots / EL Block Snapshots. List version transitions (e.g., v1.1 -> v2.0) by category and datatype.
+
+---
+
+### Other Changes
+
+Any changes to Other Files (salt files, etc.) or anything not fitting the above categories. If none, say "No other changes."
+
+## Step 3: Offer to Post as PR Comment
+
+After displaying the full report, ask the user if they want you to post it as a comment on the PR.
+
+If the user confirms, post the report as a GitHub PR comment using:
+
+```
+gh pr comment <number> --repo erigontech/erigon-snapshot --body-file /tmp/pr_comment.txt
+```
+
+Before posting, write the comment body to `/tmp/pr_comment.txt`. The comment MUST start with the following header before the report content:
+
+```
+> 🤖 This report was generated by [Claude Code](https://claude.ai/claude-code).
+```
+
+Then include the full report (everything from `## Snapshot PR Summary` onward).
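
Review note on SKILL.md: the continuation-row rule under "Merged Ranges" could be sketched as a small helper. This is only an illustration under stated assumptions. `merge_rows` and its `per_row` parameter are hypothetical names that do not appear in this commit, and the sketch assumes a fixed three ranges per row, whereas the example table in SKILL.md splits its rows slightly differently.

```python
def merge_rows(subdir, types, ext, old_ranges, new_range, notes="", per_row=3):
    """Render one merge as markdown table rows (hypothetical helper).

    old_ranges is a list of (start, end) tuples; at most `per_row` of them
    are placed in the Old Ranges cell of each row.
    """
    chunks = [old_ranges[i:i + per_row] for i in range(0, len(old_ranges), per_row)]
    rows = []
    for idx, chunk in enumerate(chunks):
        cell = ", ".join(f"{s}-{e}" for s, e in chunk)
        if idx == 0:
            # First row carries every column.
            rows.append(f"| {subdir} | {types} | {ext} | {cell} | {new_range} | {notes} |")
        else:
            # Continuation row: every cell except Old Ranges stays empty.
            rows.append(f"| | | | {cell} | | |")
    return rows


# Made-up ranges, used only to exercise the helper.
sample = [(2100 + 10 * i, 2110 + 10 * i) for i in range(8)]
for row in merge_rows("(root)", "bodies", ".seg", sample, "2100-2200 [v1.1]"):
    print(row)
```

With eight old ranges this yields one full row followed by two continuation rows, which keeps the Old Ranges cell short enough for terminal rendering.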
diff --git a/.claude/skills/summarize-changes/analyze_diff.py b/.claude/skills/summarize-changes/analyze_diff.py
@@ -0,0 +1,204 @@
+#!/usr/bin/env python3
+"""Analyze a snapshot PR diff file and classify all changes.
+
+Usage: python3 analyze_diff.py <diff_file>
+
+The diff file should be the raw output of `gh pr diff`.
+Outputs structured text sections for hash changes, merges, version upgrades,
+new data pruned from MDBX, and unexpected deletions.
+"""
+
+import re
+import sys
+from collections import defaultdict
+
+
+def parse_diff(path):
+    removed = {}
+    added = {}
+    with open(path, encoding="utf-8") as f:
+        for line in f:
+            line = line.rstrip("\n")
+            if not line or line[0] not in ("+", "-"):
+                continue
+            m = re.match(r"^([+-])'([^']+)'\s*=\s*'([a-f0-9]+)'", line)
+            if not m:
+                continue
+            sign, fname, hsh = m.group(1), m.group(2), m.group(3)
+            if sign == "-":
+                removed[fname] = hsh
+            else:
+                added[fname] = hsh
+    return removed, added
+
+
+def parse_filename(fname):
+    if fname in ("salt-blocks.txt", "salt-state.txt"):
+        return {"cat": "other", "fname": fname}
+    m = re.match(r"^caplin/(v[\d.]+)-(\d+)-(\d+)-([^.]+)\.(.+)$", fname)
+    if m:
+        return {"cat": "caplin", "ver": m[1], "s": int(m[2]), "e": int(m[3]), "dt": m[4], "ext": m[5]}
+    m = re.match(r"^(accessor|domain|history|idx)/(v[\d.]+)-([^.]+)\.(\d+)-(\d+)\.(.+)$", fname)
+    if m:
+        return {"cat": m[1], "ver": m[2], "dt": m[3], "s": int(m[4]), "e": int(m[5]), "ext": m[6]}
+    m = re.match(r"^(v[\d.]+)-(\d+)-(\d+)-(transactions-to-block|[^.]+)\.(.+)$", fname)
+    if m:
+        return {"cat": "blocks", "ver": m[1], "dt": m[4], "s": int(m[2]), "e": int(m[3]), "ext": m[5]}
+    return {"cat": "unknown", "fname": fname}
+
+
+def hgroup(cat):
+    if cat in ("accessor", "domain", "history", "idx"):
+        return "state"
+    if cat == "caplin":
+        return "cl"
+    if cat == "blocks":
+        return "el"
+    return "other"
+
+
+def classify(removed, added):
+    # 1. Hash changes
+    hash_changes = []
+    for fname in removed:
+        if fname in added and removed[fname] != added[fname]:
+            hash_changes.append((fname, removed[fname], added[fname]))
+    hash_change_fnames = {f for f, _, _ in hash_changes}
+
+    # 2. Build groups by (cat, dt, ext)
+    groups = defaultdict(lambda: {"rem": [], "add": []})
+    for f, h in removed.items():
+        if f in hash_change_fnames:
+            continue
+        info = parse_filename(f)
+        if info["cat"] in ("other", "unknown"):
+            continue
+        groups[(info["cat"], info["dt"], info["ext"])]["rem"].append({**info, "fname": f})
+    for f, h in added.items():
+        if f in hash_change_fnames:
+            continue
+        info = parse_filename(f)
+        if info["cat"] in ("other", "unknown"):
+            continue
+        groups[(info["cat"], info["dt"], info["ext"])]["add"].append({**info, "fname": f})
+
+    merges = []
+    version_upgrades_list = []
+    frontier = []
+    unexpected = []
+    explained_r = set()
+    explained_a = set()
+
+    for key, data in sorted(groups.items()):
+        cat, dt, ext = key
+        rem = sorted(data["rem"], key=lambda x: (x["s"], x["e"]))
+        add = sorted(data["add"], key=lambda x: (x["s"], x["e"]))
+
+        for a in add:
+            covered = [r for r in rem if r["s"] >= a["s"] and r["e"] <= a["e"] and r["fname"] not in explained_r]
+            if covered:
+                old_vers = list(set(r["ver"] for r in covered))
+                is_vu = a["ver"] not in set(r["ver"] for r in covered)
+                info = {
+                    "cat": cat, "dt": dt, "ext": ext,
+                    "rem_ranges": [(r["s"], r["e"], r["ver"]) for r in covered],
+                    "add_range": (a["s"], a["e"], a["ver"]),
+                    "is_vu": is_vu, "old_vers": old_vers, "new_ver": a["ver"],
+                }
+                if is_vu:
+                    version_upgrades_list.append(info)
+                if len(covered) >= 2 or (len(covered) == 1 and (covered[0]["s"] != a["s"] or covered[0]["e"] != a["e"])):
+                    merges.append(info)
+                for r in covered:
+                    explained_r.add(r["fname"])
+                explained_a.add(a["fname"])
+
+        for a in add:
+            if a["fname"] not in explained_a:
+                if not any(r["s"] < a["e"] and r["e"] > a["s"] for r in rem):
+                    frontier.append({"cat": cat, "dt": dt, "ext": ext, "s": a["s"], "e": a["e"], "ver": a["ver"], "fname": a["fname"]})
+                    explained_a.add(a["fname"])
+
+        for r in rem:
+            if r["fname"] not in explained_r:
+                unexpected.append({"cat": cat, "dt": dt, "ext": ext, "s": r["s"], "e": r["e"], "ver": r["ver"], "fname": r["fname"]})
+
+    return hash_changes, merges, version_upgrades_list, frontier, unexpected
+
+
+def print_report(removed, added, hash_changes, merges, version_upgrades_list, frontier, unexpected):
+    # Hash changes
+    print("=== HASH CHANGES ===")
+    for f, oh, nh in hash_changes:
+        p = parse_filename(f)
+        print(f"  [{hgroup(p['cat'])}] {f}  old={oh}  new={nh}")
+    print(f"  count={len(hash_changes)}")
+
+    # Unexpected deletions
+    print("=== UNEXPECTED DELETIONS ===")
+    for u in unexpected:
+        print(f"  [{hgroup(u['cat'])}] {u['fname']}")
+    print(f"  count={len(unexpected)}")
+
+    # Merges table
+    print("=== MERGES TABLE ===")
+    mp = defaultdict(list)
+    for m in merges:
+        rr = tuple((s, e) for s, e, v in m["rem_ranges"])
+        ar = (m["add_range"][0], m["add_range"][1])
+        ov = tuple(sorted(m["old_vers"]))
+        pk = (hgroup(m["cat"]), m["cat"], rr, ar, m["new_ver"], ov, m["is_vu"])
+        mp[pk].append(f"{m['dt']} (.{m['ext']})")
+
+    for (hg, cat, rr, ar, nv, ov, is_vu), items in sorted(mp.items()):
+        old_r = ", ".join(f"{s}-{e}" for s, e in rr)
+        note = ""
+        if is_vu:
+            note = f"cross-version: absorbs {','.join(ov)}"
+        else:
+            mixed = [v for v in ov if v != nv]
+            if mixed:
+                note = f"cross-version: absorbs {','.join(mixed)}"
+        types_str = ", ".join(sorted(set(items)))
+        print(f"  [{hg}] | {cat} | {types_str} | {old_r} | {ar[0]}-{ar[1]} [{nv}] | {note}")
+
+    # Version upgrades
+    print("=== VERSION UPGRADES ===")
+    vup = defaultdict(list)
+    for vu in version_upgrades_list:
+        vk = (hgroup(vu["cat"]), vu["cat"], tuple(sorted(vu["old_vers"])), vu["new_ver"])
+        rr = [(s, e) for s, e, v in vu["rem_ranges"]]
+        vup[vk].append(f"{vu['dt']} (.{vu['ext']}): {', '.join(f'{s}-{e}' for s, e in rr)} -> {vu['add_range'][0]}-{vu['add_range'][1]}")
+    for (hg, cat, ov, nv), items in sorted(vup.items()):
+        print(f"  [{hg}] {cat}: {','.join(ov)} -> {nv}")
+        for i in sorted(items):
+            print(f"    {i}")
+
+    # Frontier / new data pruned from MDBX
+    print("=== NEW DATA PRUNED FROM MDBX ===")
+    fg = defaultdict(list)
+    for f in frontier:
+        fg[(hgroup(f["cat"]), f["cat"])].append(f)
+    for (hg, cat), items in sorted(fg.items()):
+        items_sorted = sorted(items, key=lambda x: (x["ext"], x["dt"], x["s"], x["e"], x["ver"]))
+        print(f"  [{hg}] {cat}: {len(items)} files")
+        for item in items_sorted:
+            print(f"    {item['fname']}")
+
+    # Totals
+    print(f"=== TOTALS: removed={len(removed)} added={len(added)} hash_changes={len(hash_changes)} merges={len(merges)} vu={len(version_upgrades_list)} frontier={len(frontier)} unexpected={len(unexpected)} ===")
+
+
+def main():
+    if len(sys.argv) != 2:
+        print(f"Usage: {sys.argv[0]} <diff_file>", file=sys.stderr)
+        sys.exit(1)
+
+    diff_file = sys.argv[1]
+    removed, added = parse_diff(diff_file)
+    hash_changes, merges, version_upgrades_list, frontier, unexpected = classify(removed, added)
+    print_report(removed, added, hash_changes, merges, version_upgrades_list, frontier, unexpected)
+
+
+if __name__ == "__main__":
+    main()
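
Review note on analyze_diff.py: a minimal sketch of the line format `parse_diff` appears to expect and of how the hash-change check follows from it. The regular expression is copied from the script. Everything else is assumed for illustration: the sample diff text, the `mainnet.toml` file name, the hashes, and the `parse_lines` helper are made up and are not part of the commit.

```python
import re

# Same pattern as in parse_diff: sign, quoted file name, quoted hex hash.
ENTRY_RE = re.compile(r"^([+-])'([^']+)'\s*=\s*'([a-f0-9]+)'")

# Hypothetical diff excerpt; names and hashes are invented.
SAMPLE = """\
--- a/mainnet.toml
+++ b/mainnet.toml
-'v1.1-000000-000500-bodies.seg' = 'aaaa'
+'v1.1-000000-000500-bodies.seg' = 'bbbb'
+'domain/v1.1-accounts.48-50.kv' = 'cccc'
"""


def parse_lines(lines):
    """Split diff lines into removed/added maps of file name -> hash."""
    removed, added = {}, {}
    for line in lines:
        m = ENTRY_RE.match(line)
        if not m:
            continue  # file headers such as '--- a/...' do not match
        sign, fname, hsh = m.groups()
        (removed if sign == "-" else added)[fname] = hsh
    return removed, added


removed, added = parse_lines(SAMPLE.splitlines())
# A hash change is a file present on both sides with differing hashes.
changed = sorted(f for f in removed if f in added and removed[f] != added[f])
print(changed)
```

Because both maps are keyed by file name alone, a diff that touches several chain files containing the same snapshot names would collapse them into one entry; that may be worth confirming against real PRs.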