Commit bece18e

wmitsuda and claude committed
Add summarize-changes skill for snapshot PR review
Adds a Claude Code skill that analyzes snapshot automation PRs and produces structured review reports with hash change detection, unexpected deletion alerts, merged range tables, and new MDBX data file listings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 626f1fe commit bece18e

File tree: 2 files changed, +372 additions, -0 deletions
Lines changed: 168 additions & 0 deletions
---
name: summarize-changes
description: Analyze snapshot automation PR changes and produce a structured review report. Use when asked to summarize or review a snapshot PR.
argument-hint: "<PR number or URL>"
---

Analyze the changes in a snapshot automation PR and produce a structured review report.

The PR to analyze: $PR_URL_OR_NUMBER

If $PR_URL_OR_NUMBER is a full URL, extract the PR number from it. If it is just a number, use it directly. The repo is erigontech/erigon-snapshot.
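For illustration, the URL-versus-number handling described above amounts to a one-regex check. This is an editorial sketch, not part of the skill file; the helper name and regex are assumptions:

```python
import re

def extract_pr_number(arg: str) -> int:
    """Accept either a bare PR number or a full GitHub PR URL."""
    m = re.search(r"/pull/(\d+)", arg)
    if m:
        return int(m.group(1))
    return int(arg)

print(extract_pr_number("https://github.com/erigontech/erigon-snapshot/pull/123"))  # 123
print(extract_pr_number("456"))  # 456
```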
## Step 1: Fetch PR Data and Run Analysis

Run these commands using the Bash tool:

1. `gh pr view <number> --repo erigontech/erigon-snapshot` to get PR title, description, and metadata
2. Save the diff to a temp file and run the analysis script:

```
gh pr diff <number> --repo erigontech/erigon-snapshot > /tmp/pr_diff.txt && python3 "$(git rev-parse --show-toplevel)/.claude/skills/summarize-changes/analyze_diff.py" /tmp/pr_diff.txt
```

The script (`analyze_diff.py` in the skill directory) parses the diff, classifies all changes, and outputs structured sections. Use its output to build the final report.
### What the script detects

- **Hash Changes**: same filename in both removed and added sets with different hash (CRITICAL)
- **Range Merges**: multiple smaller removed ranges replaced by a single larger added range
- **Version Upgrades**: removed entries at one version replaced by added entries at a newer version
- **New Data Pruned from MDBX**: added entries beyond the previously highest block number
- **Unexpected Deletions**: removed entries not covered by any of the above (CRITICAL)
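The hash-change rule above (same filename on both sides of the diff, different hash) can be sketched on toy data; the filenames and hashes below are hypothetical:

```python
removed = {"v1.0-002100-002200-bodies.seg": "aaa111",
           "v1.0-002200-002300-bodies.seg": "bbb222"}
added = {"v1.0-002100-002200-bodies.seg": "ccc333",
         "v1.0-002300-002400-bodies.seg": "ddd444"}

# A hash change is the same filename present on both sides with a different hash.
hash_changes = [(f, removed[f], added[f])
                for f in removed
                if f in added and removed[f] != added[f]]
print(hash_changes)  # [('v1.0-002100-002200-bodies.seg', 'aaa111', 'ccc333')]
```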
## Step 2: Generate Report
### File grouping

Throughout the ENTIRE report, group files into these high-level sections:

- **State Snapshots**: files under `accessor/`, `domain/`, `history/`, `idx/`
- **CL Snapshots** (Consensus Layer): files under `caplin/`
- **EL Block Snapshots** (Execution Layer): root-level files with version-range patterns (bodies, headers, transactions, beaconblocks, and their indices)
- **Other Files**: root-level files without version-range patterns (e.g., `salt-blocks.txt`, `salt-state.txt`)

Apply this grouping to ALL sections: Hash Changes, Unexpected Deletions, Merged Ranges, New Data Pruned from MDBX, Version Upgrades, and Other Changes.
### Output structure

---

## Snapshot PR Summary: [PR Title]

**PR:** #N | **Chain:** chain | **Top Block:** X

---

### HASH CHANGES

If hash changes exist, use 🚨 emoji and show:

### 🚨🚨🚨 HASH CHANGES — ACTION REQUIRED

> **Changed hashes mean existing snapshot content was regenerated. Nodes that already downloaded the old version will have mismatched data.**

Group changed files by State Snapshots / CL Snapshots / EL Block Snapshots:

| File | Old Hash | New Hash |
|------|----------|----------|
| ... | ... | ... |

If NO hash changes, use ✅ emoji:

### ✅ Hash Changes

**No hash changes detected.** All existing files retain their original hashes.

---

### UNEXPECTED DELETIONS

If unexpected deletions exist, use 🚨 emoji and show:

### 🚨🚨🚨 UNEXPECTED DELETIONS — ACTION REQUIRED

> **Deleted files not accounted for by merges or version upgrades could mean data loss.**

Group by State Snapshots / CL Snapshots / EL Block Snapshots. List each with its range and explain why it's concerning.

If NO unexpected deletions, use ✅ emoji:

### ✅ Unexpected Deletions

**No unexpected deletions detected.** All removed files are accounted for by range merges or version upgrades.

---
### Merged Ranges

Present merges in a table format, with one table per high-level group (State Snapshots / CL Snapshots / EL Block Snapshots). Sort rows by subdir (accessor, domain, history, idx, caplin, or root), then by snapshot type (datatype).

If a merge also involves a version upgrade, note it in the Notes column.

Table format:

| Subdir | Type | Ext | Old Ranges | New Range | Notes |
|--------|------|-----|------------|-----------|-------|
| accessor | code | .vi | 32-40, 40-44, 44-46 | 32-48 [v1.1] | |
| domain | accounts | .kv | 32-40, 40-44, 44-46 | 32-48 [v2.0] | cross-version: absorbs v1.1 |

When multiple types share the exact same merge pattern (same old ranges, same new range, same version), they can be combined into a single row with types comma-separated.

IMPORTANT: Keep table columns narrow so they render as proper tables in the terminal. When a merge has more than 3 old ranges, split them across multiple continuation rows. Each continuation row has empty cells for all columns except Old Ranges. Example with 8 old ranges:

| Subdir | Type | Ext | Old Ranges | New Range | Notes |
|--------|------|-----|------------|-----------|-------|
| (root) | bodies, headers, txns | .seg | 2100-2110, 2110-2120, 2120-2130 | 2100-2200 [v1.1] | |
| | | | 2130-2140, 2140-2150 | | |
| | | | 2150-2151, 2151-2152, 2152-2153 | | |

This keeps each row under ~40 chars in the Old Ranges column so the terminal renders it as a proper table.
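The continuation-row rule above (at most 3 old ranges per table row) amounts to simple list chunking. A minimal editorial sketch, with hypothetical ranges:

```python
def chunk_ranges(ranges, per_row=3):
    """Split a merge's old ranges into table rows of at most per_row entries."""
    return [ranges[i:i + per_row] for i in range(0, len(ranges), per_row)]

old_ranges = ["2100-2110", "2110-2120", "2120-2130", "2130-2140", "2140-2150"]
for row in chunk_ranges(old_ranges):
    print(", ".join(row))
# 2100-2110, 2110-2120, 2120-2130
# 2130-2140, 2140-2150
```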
---

### New Data Pruned from MDBX

Present in a table format, with one table per high-level group (State Snapshots / CL Snapshots / EL Block Snapshots). List every individual file, one per row.

Table format:

| Subdir | File |
|--------|------|
| accessor | accessor/v1.1-code.48-50.vi |
| | accessor/v1.1-commitment.48-50.vi |
| | accessor/v1.1-rcache.48-50.vi |
| | accessor/v1.1-storage.48-50.vi |
| domain | domain/v1.1-accounts.48-50.bt |
| | domain/v1.1-accounts.48-50.kvei |

Use continuation rows (empty Subdir cell) for subsequent files in the same subdir. Start a new subdir label when the subdir changes. Files are sorted by: subdir, extension, snapshot type, range, then version.
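The sort order described above maps onto a tuple key, as in the script's own `sorted(..., key=...)` calls. A sketch with hypothetical entries:

```python
files = [
    {"subdir": "domain", "ext": "bt", "dt": "accounts", "s": 48, "e": 50, "ver": "v1.1"},
    {"subdir": "accessor", "ext": "vi", "dt": "storage", "s": 48, "e": 50, "ver": "v1.1"},
    {"subdir": "accessor", "ext": "vi", "dt": "code", "s": 48, "e": 50, "ver": "v1.1"},
]
# Sort by: subdir, extension, snapshot type, range, then version.
files.sort(key=lambda x: (x["subdir"], x["ext"], x["dt"], x["s"], x["e"], x["ver"]))
print([f'{x["subdir"]}/{x["dt"]}' for x in files])
# ['accessor/code', 'accessor/storage', 'domain/accounts']
```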
---

### Version Upgrades

Group by State Snapshots / CL Snapshots / EL Block Snapshots. List version transitions (e.g., v1.1 -> v2.0) by category and datatype.

---

### Other Changes

Any changes to Other Files (salt files, etc.) or anything not fitting the above categories. If none, say "No other changes."
## Step 3: Offer to Post as PR Comment
After displaying the full report, ask the user if they want you to post it as a comment on the PR.

If the user confirms, post the report as a GitHub PR comment using:

```
gh pr comment <number> --repo erigontech/erigon-snapshot --body-file /tmp/pr_comment.txt
```

Before posting, write the comment body to `/tmp/pr_comment.txt`. The comment MUST start with the following header before the report content:

```
> 🤖 This report was generated by [Claude Code](https://claude.ai/claude-code).
```

Then include the full report (everything from `## Snapshot PR Summary` onward).
Lines changed: 204 additions & 0 deletions
```python
#!/usr/bin/env python3
"""Analyze a snapshot PR diff file and classify all changes.

Usage: python3 analyze_diff.py <diff_file>

The diff file should be the raw output of `gh pr diff`.
Outputs structured text sections for hash changes, merges, version upgrades,
new data pruned from MDBX, and unexpected deletions.
"""

import re
import sys
from collections import defaultdict


def parse_diff(path):
    removed = {}
    added = {}
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line or line[0] not in ("+", "-"):
                continue
            m = re.match(r"^([+-])'([^']+)'\s*=\s*'([a-f0-9]+)'", line)
            if not m:
                continue
            sign, fname, hsh = m.group(1), m.group(2), m.group(3)
            if sign == "-":
                removed[fname] = hsh
            else:
                added[fname] = hsh
    return removed, added


def parse_filename(fname):
    if fname in ("salt-blocks.txt", "salt-state.txt"):
        return {"cat": "other", "fname": fname}
    m = re.match(r"^caplin/(v[\d.]+)-(\d+)-(\d+)-([^.]+)\.(.+)$", fname)
    if m:
        return {"cat": "caplin", "ver": m[1], "s": int(m[2]), "e": int(m[3]), "dt": m[4], "ext": m[5]}
    m = re.match(r"^(accessor|domain|history|idx)/(v[\d.]+)-([^.]+)\.(\d+)-(\d+)\.(.+)$", fname)
    if m:
        return {"cat": m[1], "ver": m[2], "dt": m[3], "s": int(m[4]), "e": int(m[5]), "ext": m[6]}
    m = re.match(r"^(v[\d.]+)-(\d+)-(\d+)-(transactions-to-block|[^.]+)\.(.+)$", fname)
    if m:
        return {"cat": "blocks", "ver": m[1], "dt": m[4], "s": int(m[2]), "e": int(m[3]), "ext": m[5]}
    return {"cat": "unknown", "fname": fname}


def hgroup(cat):
    if cat in ("accessor", "domain", "history", "idx"):
        return "state"
    if cat == "caplin":
        return "cl"
    if cat == "blocks":
        return "el"
    return "other"


def classify(removed, added):
    # 1. Hash changes
    hash_changes = []
    for fname in removed:
        if fname in added and removed[fname] != added[fname]:
            hash_changes.append((fname, removed[fname], added[fname]))
    hash_change_fnames = set(f for f, _, _ in hash_changes)

    # 2. Build groups by (cat, dt, ext)
    groups = defaultdict(lambda: {"rem": [], "add": []})
    for f, h in removed.items():
        if f in hash_change_fnames:
            continue
        info = parse_filename(f)
        if info["cat"] in ("other", "unknown"):
            continue
        groups[(info["cat"], info["dt"], info["ext"])]["rem"].append({**info, "fname": f})
    for f, h in added.items():
        if f in hash_change_fnames:
            continue
        info = parse_filename(f)
        if info["cat"] in ("other", "unknown"):
            continue
        groups[(info["cat"], info["dt"], info["ext"])]["add"].append({**info, "fname": f})

    merges = []
    version_upgrades_list = []
    frontier = []
    unexpected = []
    explained_r = set()
    explained_a = set()

    for key, data in sorted(groups.items()):
        cat, dt, ext = key
        rem = sorted(data["rem"], key=lambda x: (x["s"], x["e"]))
        add = sorted(data["add"], key=lambda x: (x["s"], x["e"]))

        for a in add:
            covered = [r for r in rem if r["s"] >= a["s"] and r["e"] <= a["e"] and r["fname"] not in explained_r]
            if covered:
                old_vers = list(set(r["ver"] for r in covered))
                is_vu = a["ver"] not in set(r["ver"] for r in covered)
                info = {
                    "cat": cat, "dt": dt, "ext": ext,
                    "rem_ranges": [(r["s"], r["e"], r["ver"]) for r in covered],
                    "add_range": (a["s"], a["e"], a["ver"]),
                    "is_vu": is_vu, "old_vers": old_vers, "new_ver": a["ver"],
                }
                if is_vu:
                    version_upgrades_list.append(info)
                if len(covered) >= 2 or (len(covered) == 1 and (covered[0]["s"] != a["s"] or covered[0]["e"] != a["e"])):
                    merges.append(info)
                for r in covered:
                    explained_r.add(r["fname"])
                explained_a.add(a["fname"])

        for a in add:
            if a["fname"] not in explained_a:
                if not any(r["s"] < a["e"] and r["e"] > a["s"] for r in rem):
                    frontier.append({"cat": cat, "dt": dt, "ext": ext, "s": a["s"], "e": a["e"], "ver": a["ver"], "fname": a["fname"]})
                explained_a.add(a["fname"])

        for r in rem:
            if r["fname"] not in explained_r:
                unexpected.append({"cat": cat, "dt": dt, "ext": ext, "s": r["s"], "e": r["e"], "ver": r["ver"], "fname": r["fname"]})

    return hash_changes, merges, version_upgrades_list, frontier, unexpected


def print_report(removed, added, hash_changes, merges, version_upgrades_list, frontier, unexpected):
    # Hash changes
    print("=== HASH CHANGES ===")
    for f, oh, nh in hash_changes:
        p = parse_filename(f)
        print(f" [{hgroup(p['cat'])}] {f} old={oh} new={nh}")
    print(f" count={len(hash_changes)}")

    # Unexpected deletions
    print("=== UNEXPECTED DELETIONS ===")
    for u in unexpected:
        print(f" [{hgroup(u['cat'])}] {u['fname']}")
    print(f" count={len(unexpected)}")

    # Merges table
    print("=== MERGES TABLE ===")
    mp = defaultdict(list)
    for m in merges:
        rr = tuple((s, e) for s, e, v in m["rem_ranges"])
        ar = (m["add_range"][0], m["add_range"][1])
        ov = tuple(sorted(m["old_vers"]))
        pk = (hgroup(m["cat"]), m["cat"], rr, ar, m["new_ver"], ov, m["is_vu"])
        mp[pk].append(f"{m['dt']} (.{m['ext']})")

    for (hg, cat, rr, ar, nv, ov, is_vu), items in sorted(mp.items()):
        old_r = ", ".join(f"{s}-{e}" for s, e in rr)
        note = ""
        if is_vu:
            note = f"cross-version: absorbs {','.join(ov)}"
        else:
            mixed = [v for v in ov if v != nv]
            if mixed:
                note = f"cross-version: absorbs {','.join(mixed)}"
        types_str = ", ".join(sorted(set(items)))
        print(f" [{hg}] | {cat} | {types_str} | {old_r} | {ar[0]}-{ar[1]} [{nv}] | {note}")

    # Version upgrades
    print("=== VERSION UPGRADES ===")
    vup = defaultdict(list)
    for vu in version_upgrades_list:
        vk = (hgroup(vu["cat"]), vu["cat"], tuple(sorted(vu["old_vers"])), vu["new_ver"])
        rr = [(s, e) for s, e, v in vu["rem_ranges"]]
        vup[vk].append(f"{vu['dt']} (.{vu['ext']}): {', '.join(f'{s}-{e}' for s, e in rr)} -> {vu['add_range'][0]}-{vu['add_range'][1]}")
    for (hg, cat, ov, nv), items in sorted(vup.items()):
        print(f" [{hg}] {cat}: {','.join(ov)} -> {nv}")
        for i in sorted(items):
            print(f"   {i}")

    # Frontier / new data pruned from MDBX
    print("=== NEW DATA PRUNED FROM MDBX ===")
    fg = defaultdict(list)
    for f in frontier:
        fg[(hgroup(f["cat"]), f["cat"])].append(f)
    for (hg, cat), items in sorted(fg.items()):
        items_sorted = sorted(items, key=lambda x: (x["ext"], x["dt"], x["s"], x["e"], x["ver"]))
        print(f" [{hg}] {cat}: {len(items)} files")
        for item in items_sorted:
            print(f"   {item['fname']}")

    # Totals
    print(f"=== TOTALS: removed={len(removed)} added={len(added)} hash_changes={len(hash_changes)} merges={len(merges)} vu={len(version_upgrades_list)} frontier={len(frontier)} unexpected={len(unexpected)} ===")


def main():
    if len(sys.argv) != 2:
        print(f"Usage: {sys.argv[0]} <diff_file>", file=sys.stderr)
        sys.exit(1)

    diff_file = sys.argv[1]
    removed, added = parse_diff(diff_file)
    hash_changes, merges, version_upgrades_list, frontier, unexpected = classify(removed, added)
    print_report(removed, added, hash_changes, merges, version_upgrades_list, frontier, unexpected)


if __name__ == "__main__":
    main()
```
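As a quick sanity check of the parsing stage, the regex in `parse_diff` can be exercised on a few sample diff lines. The filenames and hashes below are hypothetical but follow the `'file' = 'hash'` format the script expects:

```python
import re

sample_diff = """\
-'v1.0-002100-002200-bodies.seg' = 'aaa111'
+'v1.0-002100-002300-bodies.seg' = 'bbb222'
 context line that is ignored
"""

removed, added = {}, {}
for line in sample_diff.splitlines():
    # Same pattern as parse_diff: sign, quoted filename, quoted hex hash.
    m = re.match(r"^([+-])'([^']+)'\s*=\s*'([a-f0-9]+)'", line)
    if not m:
        continue
    (removed if m.group(1) == "-" else added)[m.group(2)] = m.group(3)

print(removed)  # {'v1.0-002100-002200-bodies.seg': 'aaa111'}
print(added)    # {'v1.0-002100-002300-bodies.seg': 'bbb222'}
```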
