Skip to content

Commit 9a5b5e6

Browse files
pablodeymoCopilot
andauthored
test(l1): add per-phase timing breakdown to multisync Slack notifications (#6136)
**Motivation** The multisync monitoring script (`docker_monitor.py`) sends Slack notifications at the end of each sync run, but they only report the total sync time per network. When investigating performance regressions or comparing runs, we had to manually SSH into the server and parse raw container logs to figure out which phase was slow. This is time-consuming and error-prone. The sync logs already contain per-phase completion markers like: ``` ✓ BLOCK HEADERS complete: 25,693,009 headers in 0:29:00 ✓ STORAGE HEALING complete: 87,414 storage accounts healed in 1:42:00 ``` This PR surfaces that data directly in the Slack notification, so performance bottlenecks are visible at a glance. **Description** Adds three things to `tooling/sync/docker_monitor.py`: 1. **`PHASE_COMPLETION_PATTERNS` dict** — Regex patterns for all 8 snap sync phases: - Block Headers, Account Ranges, Account Insertion, Storage Ranges, Storage Insertion, State Healing, Storage Healing, Bytecodes 2. **`parse_phase_timings(run_id, container)` function** — Reads saved container log files from `multisync_logs/run_{run_id}/{container}.log` and extracts `(phase_name, item_count, duration)` for each completed phase. Returns an empty list if logs are missing or if a phase didn't complete (e.g., on a failed run), so the behavior is graceful. 3. **Phase breakdown in Slack and run logs** — After the per-instance status line, a code block is appended showing the full phase timing table. The same breakdown is also written to `run_history.log` and the per-run `summary.txt`. ### Expected Slack output (successful run) The Slack message will now include a section like this for each network instance: ``` 📊 Phase Breakdown — mainnet Block Headers 0:29:00 (25,693,009) Account Ranges 0:45:12 (12,345,678) Account Insertion 0:12:34 (12,345,678) Storage Ranges 0:38:45 (1,234,567) Storage Insertion 0:08:23 (1,234,567) State Healing 0:15:00 (87,414) Storage Healing 1:42:00 (87,414) Bytecodes 0:05:30 (45,678) ``` Phase names are left-aligned with padding for readability. The count in parentheses corresponds to the number of items processed (headers, accounts, storage slots, etc.). ### Expected Slack output (failed run with partial phases) If a run fails mid-sync (e.g., timeout during storage healing), only the phases that completed are shown: ``` 📊 Phase Breakdown — mainnet Block Headers 0:29:00 (25,693,009) Account Ranges 0:45:12 (12,345,678) Account Insertion 0:12:34 (12,345,678) Storage Ranges 0:38:45 (1,234,567) Storage Insertion 0:08:23 (1,234,567) ``` Phases that never completed (State Healing, Storage Healing, Bytecodes in this case) are simply omitted — no placeholder or "N/A" rows. ### Expected text log output (`summary.txt` / `run_history.log`) ``` ✅ mainnet: success (sync: 4h 32m 15s) Phase Breakdown: Block Headers 0:29:00 (25,693,009) Account Ranges 0:45:12 (12,345,678) Account Insertion 0:12:34 (12,345,678) Storage Ranges 0:38:45 (1,234,567) Storage Insertion 0:08:23 (1,234,567) State Healing 0:15:00 (87,414) Storage Healing 1:42:00 (87,414) Bytecodes 0:05:30 (45,678) ``` ### How it works The flow is: 1. `save_all_logs()` saves container logs to disk (already existed, no changes) 2. `log_run_result()` now calls `parse_phase_timings()` and appends breakdown to text log 3. `slack_notify()` now calls `parse_phase_timings()` and appends code blocks to Slack payload Since `save_all_logs()` is called before both `log_run_result()` and `slack_notify()` (lines 721→725 in main loop), the saved log files are always available for parsing. ### Edge cases | Scenario | Behavior | |----------|----------| | Run fails before any phase completes | No breakdown section shown | | Log file missing or unreadable | Empty list returned, no breakdown | | Only some phases completed | Only completed phases listed | | Multiple networks (hoodi, sepolia, mainnet) | Separate breakdown per instance | **Checklist** - [ ] Updated `STORE_SCHEMA_VERSION` (crates/storage/lib.rs) if the PR includes breaking changes to the `Store` requiring a re-sync. N/A — This PR only modifies the Python monitoring script, no Rust code or storage changes. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 59ccf80 commit 9a5b5e6

1 file changed

Lines changed: 60 additions & 0 deletions

File tree

tooling/sync/docker_monitor.py

Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,18 @@
4848
"block_processing": "📦", "success": "🎉", "failed": "❌"
4949
}
5050

51+
# Phase completion patterns for parsing sync logs
52+
PHASE_COMPLETION_PATTERNS = {
53+
"Block Headers": r"✓ BLOCK HEADERS complete: ([\d,]+) headers in (\d+:\d{2}:\d{2})",
54+
"Account Ranges": r"✓ ACCOUNT RANGES complete: ([\d,]+) accounts in (\d+:\d{2}:\d{2})",
55+
"Account Insertion": r"✓ ACCOUNT INSERTION complete: ([\d,]+) accounts inserted in (\d+:\d{2}:\d{2})",
56+
"Storage Ranges": r"✓ STORAGE RANGES complete: ([\d,]+) storage slots in (\d+:\d{2}:\d{2})",
57+
"Storage Insertion": r"✓ STORAGE INSERTION complete: ([\d,]+) storage slots inserted in (\d+:\d{2}:\d{2})",
58+
"State Healing": r"✓ STATE HEALING complete: ([\d,]+) state paths healed in (\d+:\d{2}:\d{2})",
59+
"Storage Healing": r"✓ STORAGE HEALING complete: ([\d,]+) storage accounts healed in (\d+:\d{2}:\d{2})",
60+
"Bytecodes": r"✓ BYTECODES complete: ([\d,]+) bytecodes in (\d+:\d{2}:\d{2})",
61+
}
62+
5163

5264
@dataclass
5365
class Instance:
@@ -262,6 +274,30 @@ def rpc_call(url: str, method: str) -> Optional[Any]:
262274
return None
263275

264276

277+
def parse_phase_timings(run_id: str, container: str) -> list[tuple[str, str, str]]:
278+
"""Parse phase completion times from saved container logs.
279+
280+
Returns list of (phase_name, count, duration) tuples.
281+
"""
282+
log_file = LOGS_DIR / f"run_{run_id}" / f"{container}.log"
283+
if not log_file.exists():
284+
return []
285+
286+
try:
287+
logs = log_file.read_text()
288+
except Exception:
289+
return []
290+
291+
phases = []
292+
for phase_name, pattern in PHASE_COMPLETION_PATTERNS.items():
293+
match = re.search(pattern, logs)
294+
if match:
295+
count = match.group(1)
296+
duration = match.group(2)
297+
phases.append((phase_name, count, duration))
298+
return phases
299+
300+
265301
def slack_notify(run_id: str, run_count: int, instances: list, hostname: str, branch: str, commit: str, build_profile: str = ""):
266302
"""Send a single summary Slack message for the run."""
267303
all_success = all(i.status == "success" for i in instances)
@@ -319,6 +355,21 @@ def slack_notify(run_id: str, run_count: int, instances: list, hostname: str, br
319355
if i.error:
320356
line += f"\n Error: {i.error}"
321357
blocks.append({"type": "section", "text": {"type": "mrkdwn", "text": line}})
358+
359+
# Add phase breakdown for each instance
360+
for i in instances:
361+
phases = parse_phase_timings(run_id, i.container)
362+
if phases:
363+
phase_lines = [f"📊 *Phase Breakdown — {i.name}*", "```"]
364+
max_name_len = max(len(name) for name, _, _ in phases)
365+
for name, count, duration in phases:
366+
phase_lines.append(f"{name:<{max_name_len}} {duration} ({count})")
367+
phase_lines.append("```")
368+
blocks.append({
369+
"type": "section",
370+
"text": {"type": "mrkdwn", "text": "\n".join(phase_lines)}
371+
})
372+
322373
try:
323374
requests.post(url, json={"blocks": blocks}, timeout=10)
324375
except Exception:
@@ -417,6 +468,15 @@ def log_run_result(run_id: str, run_count: int, instances: list[Instance], hostn
417468
if inst.error:
418469
line += f"\n Error: {inst.error}"
419470
lines.append(line)
471+
472+
# Add phase breakdown
473+
phases = parse_phase_timings(run_id, inst.container)
474+
if phases:
475+
lines.append(f" Phase Breakdown:")
476+
max_name_len = max(len(name) for name, _, _ in phases)
477+
for name, count, duration in phases:
478+
lines.append(f" {name:<{max_name_len}} {duration} ({count})")
479+
420480
lines.append("")
421481
# Append to log file
422482
with open(RUN_LOG_FILE, "a") as f:

0 commit comments

Comments
 (0)