Commit 9a5b5e6
test(l1): add per-phase timing breakdown to multisync Slack notifications (#6136)
**Motivation**
The multisync monitoring script (`docker_monitor.py`) sends Slack
notifications at the end of each sync run, but they only report the
total sync time per network. When investigating performance regressions
or comparing runs, we had to manually SSH into the server and parse raw
container logs to figure out which phase was slow. This is
time-consuming and error-prone.
The sync logs already contain per-phase completion markers like:
```
✓ BLOCK HEADERS complete: 25,693,009 headers in 0:29:00
✓ STORAGE HEALING complete: 87,414 storage accounts healed in 1:42:00
```
This PR surfaces that data directly in the Slack notification, so
performance bottlenecks are visible at a glance.
**Description**
Adds three things to `tooling/sync/docker_monitor.py`:
1. **`PHASE_COMPLETION_PATTERNS` dict** — Regex patterns for all 8 snap
sync phases:
- Block Headers, Account Ranges, Account Insertion, Storage Ranges,
Storage Insertion, State Healing, Storage Healing, Bytecodes
2. **`parse_phase_timings(run_id, container)` function**: Reads saved
container log files from `multisync_logs/run_{run_id}/{container}.log`
and extracts `(phase_name, item_count, duration)` for each completed
phase. Returns an empty list if the log file is missing or unreadable.
Phases that didn't complete (e.g., on a failed run) are omitted from the
result, so partial runs degrade gracefully.
3. **Phase breakdown in Slack and run logs** — After the per-instance
status line, a code block is appended showing the full phase timing
table. The same breakdown is also written to `run_history.log` and the
per-run `summary.txt`.
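
As a rough illustration of items 1 and 2, here is a minimal sketch of how the pattern dict and parser could fit together. It is not the code from the diff: the two regexes are inferred only from the sample log lines quoted in the Motivation section (the real dict covers all 8 phases), the `logs_dir` parameter is a hypothetical addition to make the sketch testable, and counts are kept as the comma-formatted strings found in the log.

```python
import re
from pathlib import Path

# Assumption: patterns inferred from the two sample log lines in this
# description; the actual dict has one entry per snap sync phase (8 total).
PHASE_COMPLETION_PATTERNS = {
    "Block Headers": re.compile(
        r"BLOCK HEADERS complete: ([\d,]+) headers in (\d+:\d{2}:\d{2})"
    ),
    "Storage Healing": re.compile(
        r"STORAGE HEALING complete: ([\d,]+) storage accounts healed in (\d+:\d{2}:\d{2})"
    ),
}


def parse_phase_timings(run_id, container, logs_dir="multisync_logs"):
    """Return [(phase_name, item_count, duration), ...] for completed phases.

    `logs_dir` is a hypothetical parameter; the description only gives the
    path layout multisync_logs/run_{run_id}/{container}.log.
    """
    log_path = Path(logs_dir) / f"run_{run_id}" / f"{container}.log"
    try:
        text = log_path.read_text(encoding="utf-8", errors="replace")
    except OSError:
        # Missing or unreadable log: no breakdown rather than an exception.
        return []

    timings = []
    for phase, pattern in PHASE_COMPLETION_PATTERNS.items():
        match = pattern.search(text)
        if match:  # phases without a completion marker are skipped
            timings.append((phase, match.group(1), match.group(2)))
    return timings
```

Iterating over the dict rather than over log lines keeps the output in phase order regardless of how the markers are interleaved with other log noise.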
### Expected Slack output (successful run)
The Slack message will now include a section like this for each network
instance:
```
📊 Phase Breakdown — mainnet
Block Headers 0:29:00 (25,693,009)
Account Ranges 0:45:12 (12,345,678)
Account Insertion 0:12:34 (12,345,678)
Storage Ranges 0:38:45 (1,234,567)
Storage Insertion 0:08:23 (1,234,567)
State Healing 0:15:00 (87,414)
Storage Healing 1:42:00 (87,414)
Bytecodes 0:05:30 (45,678)
```
Phase names are left-aligned with padding for readability. The count in
parentheses corresponds to the number of items processed (headers,
accounts, storage slots, etc.).
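
The padding described above can be sketched roughly as follows. The helper name `format_phase_rows` and the two-space gap between columns are assumptions for illustration; the actual script may compute the column width differently.

```python
def format_phase_rows(timings):
    """Format (phase_name, item_count, duration) tuples as aligned text rows.

    Hypothetical helper: pads every phase name to the longest name so the
    duration column lines up inside a monospaced code block.
    """
    if not timings:
        return []
    width = max(len(name) for name, _count, _duration in timings)
    return [
        f"{name:<{width}}  {duration} ({count})"
        for name, count, duration in timings
    ]


rows = format_phase_rows([
    ("Block Headers", "25,693,009", "0:29:00"),
    ("Bytecodes", "45,678", "0:05:30"),
])
print("\n".join(rows))
```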
### Expected Slack output (failed run with partial phases)
If a run fails mid-sync (e.g., timeout during storage healing), only the
phases that completed are shown:
```
📊 Phase Breakdown — mainnet
Block Headers 0:29:00 (25,693,009)
Account Ranges 0:45:12 (12,345,678)
Account Insertion 0:12:34 (12,345,678)
Storage Ranges 0:38:45 (1,234,567)
Storage Insertion 0:08:23 (1,234,567)
```
Phases that never completed (State Healing, Storage Healing, Bytecodes
in this case) are simply omitted — no placeholder or "N/A" rows.
### Expected text log output (`summary.txt` / `run_history.log`)
```
✅ mainnet: success (sync: 4h 32m 15s)
Phase Breakdown:
Block Headers 0:29:00 (25,693,009)
Account Ranges 0:45:12 (12,345,678)
Account Insertion 0:12:34 (12,345,678)
Storage Ranges 0:38:45 (1,234,567)
Storage Insertion 0:08:23 (1,234,567)
State Healing 0:15:00 (87,414)
Storage Healing 1:42:00 (87,414)
Bytecodes 0:05:30 (45,678)
```
### How it works
The flow is:
1. `save_all_logs()` saves container logs to disk (already existed, no
changes)
2. `log_run_result()` now calls `parse_phase_timings()` and appends the
breakdown to the text log
3. `slack_notify()` now calls `parse_phase_timings()` and appends the
breakdown code blocks to the Slack payload
Since `save_all_logs()` is called before both `log_run_result()` and
`slack_notify()` (lines 721→725 in main loop), the saved log files are
always available for parsing.
### Edge cases
| Scenario | Behavior |
|----------|----------|
| Run fails before any phase completes | No breakdown section shown |
| Log file missing or unreadable | Empty list returned, no breakdown |
| Only some phases completed | Only completed phases listed |
| Multiple networks (hoodi, sepolia, mainnet) | Separate breakdown per instance |
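
To make the table concrete, the sketch below shows one way the per-instance behavior could look when assembling Slack blocks. It is an assumption-laden illustration, not the code in `docker_monitor.py`: the function name `build_breakdown_blocks` is hypothetical, and it assumes the payload uses Slack Block Kit `section` blocks with `mrkdwn` text.

```python
def build_breakdown_blocks(timings_by_network):
    """Build one Slack section block per network that has phase timings.

    `timings_by_network` maps a network name to the list of
    (phase_name, item_count, duration) tuples parsed from its log.
    Networks with an empty list get no breakdown section at all.
    """
    fence = "`" * 3  # Slack renders triple backticks as a code block
    blocks = []
    for network, timings in timings_by_network.items():
        if not timings:
            continue  # failed before any phase completed, or log missing
        rows = "\n".join(
            f"{name}  {duration} ({count})" for name, count, duration in timings
        )
        text = f"*Phase Breakdown: {network}*\n{fence}\n{rows}\n{fence}"
        blocks.append({"type": "section", "text": {"type": "mrkdwn", "text": text}})
    return blocks
```

Skipping empty results at this level means callers do not need a separate check for failed runs or missing logs.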
**Checklist**
- [ ] Updated `STORE_SCHEMA_VERSION` (crates/storage/lib.rs) if the PR
includes breaking changes to the `Store` requiring a re-sync.
N/A — This PR only modifies the Python monitoring script, no Rust code
or storage changes.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 59ccf80 commit 9a5b5e6
1 file changed
Lines changed: 60 additions & 0 deletions