Commit 4e041e6

Merge branch 'next' into refactor/catalog-edge-resolve
2 parents 2cc225d + f03abea commit 4e041e6

3 files changed

Lines changed: 1031 additions & 295 deletions

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
# Utoopack Performance Report

**Report ID**: `utoopack_performance_report_20260313_202108`
**Generated**: 2026-03-13 20:21:08
**Trace File**: `trace_20260313_201000.json` (0.6 GB, 1.65M spans)
**Test Project**: `examples/with-antd`


---

## Executive Summary

| Metric | Value | Assessment |
|--------|-------|------------|
| Total Wall Time | **2,101.6 ms** | Baseline |
| Total Thread Work (de-duped) | **13,241.5 ms** | Non-overlapping busy time |
| Effective Parallelism | **6.3x** | thread_work / wall_time |
| Working Threads | **13** | Threads with actual spans |
| Thread Utilization | **48.5%** | ⚠️ Suboptimal |
| Total Spans | **1,650,705** | All B/E + X events |
| Meaningful Spans (>= 10us) | **217,778** | (13.2% of total) |
| Tracing Noise (< 10us) | **1,432,927** | (86.8% of total) |

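The derived rows in the summary follow from simple ratios. The sketch below is one way to reproduce them, under two assumptions the report does not spell out: that "de-duped" thread work is the union of each thread's busy intervals, and that Thread Utilization is Effective Parallelism divided by the number of working threads (this happens to reproduce the reported 48.5%). The function names `busy_time` and `summarize` are hypothetical.

```python
def busy_time(intervals):
    """Length of the union of (start, end) intervals recorded on one thread."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: flush it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or nested span: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def summarize(wall_ms, thread_work_ms, working_threads):
    """Return (effective parallelism, thread utilization as a fraction)."""
    parallelism = thread_work_ms / wall_ms           # thread_work / wall_time
    utilization = parallelism / working_threads      # assumed definition
    return parallelism, utilization


# Nested and overlapping spans are counted once: (0,5), (2,3), (4,8) merge to (0,8).
overlap_demo = busy_time([(0, 5), (2, 3), (4, 8), (10, 12)])

# Values taken from the Executive Summary table.
parallelism, utilization = summarize(2101.6, 13241.5, 13)
print(f"{parallelism:.1f}x, {utilization:.1%}")
```

Summing `busy_time` over all threads would give the total thread work; the wall time is the span between the earliest start and the latest end across threads.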

---

## Build Phase Timeline

This table shows when each build phase is active and how much CPU time it consumes.
**Self-Time** is the time spent *exclusively* in that phase (excluding children).

| Phase | Spans | Inclusive (ms) | Self-Time (ms) | Wall Range (ms) |
|-------|-------|----------------|----------------|-----------------|
| Resolve | 30,353 | 1,112.6 | 746.7 | 902.5 |
| Parse | 6,851 | 2,366.1 | 1,369.6 | 2,037.0 |
| Analyze | 132,394 | 5,490.3 | 4,175.1 | 1,839.8 |
| Chunk | 17,285 | 976.7 | 894.9 | 641.1 |
| Codegen | 20,179 | 1,540.9 | 1,063.5 | 529.7 |
| Emit | 33 | 212.3 | 105.4 | 28.6 |
| Other | 10,683 | 766.0 | 706.5 | 2,101.6 |

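To make the distinction between inclusive time and self-time concrete, here is a minimal sketch that derives both from nested spans. It assumes spans arrive as `(name, start, end)` tuples from a single thread and are properly nested; the tuple layout and the function name `self_times` are illustrative and not taken from the report's tooling.

```python
from collections import defaultdict


def self_times(spans):
    """Return {name: (inclusive, self)} summed over all spans with that name.

    spans: (name, start, end) tuples from one thread, properly nested.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    stack = []  # open spans: [name, start, end, time covered by direct children]

    def close(span):
        name, start, end, child_time = span
        totals[name][0] += end - start
        totals[name][1] += (end - start) - child_time

    # Sort so that a parent comes before its children:
    # earlier start first, and for equal starts the longer span first.
    for name, start, end in sorted(spans, key=lambda s: (s[1], -s[2])):
        # Close every open span that ended before this one starts.
        while stack and stack[-1][2] <= start:
            close(stack.pop())
        if stack:
            # This span is a direct child of the innermost open span.
            stack[-1][3] += end - start
        stack.append([name, start, end, 0.0])
    while stack:
        close(stack.pop())
    return {name: tuple(values) for name, values in totals.items()}


demo = self_times([
    ("process module", 0, 10),
    ("analyze ecmascript module", 2, 7),
    ("parse ecmascript", 3, 5),
])
```

In this toy input, `process module` has 10 units of inclusive time but only 5 of self-time, because its child accounts for the other 5. Note that summing inclusive time per name double-counts when a task nests inside another span of the same name.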

---

## Workload Distribution by Diagnostic Tier

| Category | Spans | Inclusive (ms) | % Work | Self-Time (ms) | % Self |
|----------|-------|----------------|--------|----------------|--------|
| P0: Scheduling & Resolution | 161,386 | 6,449.8 | 48.7% | 4,768.7 | 36.0% |
| P1: I/O & Heavy Tasks | 2,928 | 1,415.8 | 10.7% | 1,308.9 | 9.9% |
| P2: Architecture (Locks/Memory) | 1 | 0.0 | 0.0% | 0.0 | 0.0% |
| P3: Asset Pipeline | 42,781 | 3,833.2 | 28.9% | 2,277.6 | 17.2% |
| P4: Bridge/Interop | 0 | 0.0 | 0.0% | 0.0 | 0.0% |
| Other | 10,682 | 766.0 | 5.8% | 706.5 | 5.3% |


---

## Top 20 Tasks by Self-Time

Self-time is the *exclusive* duration: time spent in the task itself, not in sub-tasks.
This is the most accurate indicator of where CPU cycles are actually spent.

| Self (ms) | Inclusive (ms) | Count | Avg Self (us) | P95 Self (ms) | Max Self (ms) | % Work | Task Name | Top Caller |
|-----------|----------------|-------|---------------|---------------|---------------|--------|-----------|------------|
| 1,977.5 | 2,232.7 | 70,323 | 28.1 | 0.1 | 2.4 | 14.9% | `module` | `write all entrypoints to disk` (1%) |
| 1,095.4 | 1,095.4 | 2,170 | 504.8 | 1.6 | 5.5 | 8.3% | `read file` | `parse ecmascript` (91%) |
| 1,015.7 | 1,015.7 | 20,539 | 49.5 | 0.2 | 71.9 | 7.7% | `compute async module info` | `None` (0%) |
| 797.3 | 1,048.6 | 16,341 | 48.8 | 0.1 | 63.6 | 6.0% | `analyze ecmascript module` | `process module` (79%) |
| 672.2 | 725.9 | 10,098 | 66.6 | 0.0 | 159.0 | 5.1% | `write all entrypoints to disk` | `None` (0%) |
| 517.1 | 517.1 | 9,169 | 56.4 | 0.2 | 2.6 | 3.9% | `precompute code generation` | `code generation` (52%) |
| 487.9 | 553.7 | 7,827 | 62.3 | 0.2 | 24.7 | 3.7% | `chunking` | `write all entrypoints to disk` (0%) |
| 465.5 | 942.9 | 9,677 | 48.1 | 0.2 | 14.4 | 3.5% | `code generation` | `chunking` (4%) |
| 393.9 | 394.7 | 9,378 | 42.0 | 0.1 | 23.0 | 3.0% | `compute async chunks` | `None` (0%) |
| 342.8 | 552.8 | 13,346 | 25.7 | 0.1 | 1.7 | 2.6% | `internal resolving` | `resolving` (30%) |
| 318.2 | 1,110.8 | 23,953 | 13.3 | 0.1 | 1.0 | 2.4% | `process module` | `module` (8%) |
| 295.8 | 451.8 | 16,285 | 18.2 | 0.1 | 2.7 | 2.2% | `resolving` | `module` (18%) |
| 274.2 | 1,270.7 | 4,678 | 58.6 | 0.2 | 9.8 | 2.1% | `parse ecmascript` | `analyze ecmascript module` (46%) |
| 108.1 | 108.1 | 722 | 149.7 | 0.3 | 0.6 | 0.8% | `read directory` | `internal resolving` (100%) |
| 105.3 | 105.3 | 13 | 8,098.3 | 21.3 | 23.5 | 0.8% | `write file` | `apply effects` (100%) |
| 80.9 | 80.9 | 1,333 | 60.7 | 0.2 | 5.6 | 0.6% | `generate source map` | `code generation` (97%) |
| 45.0 | 45.0 | 639 | 70.4 | 0.2 | 11.2 | 0.3% | `compute binding usage info` | `write all entrypoints to disk` (0%) |
| 16.4 | 16.4 | 584 | 28.1 | 0.0 | 7.0 | 0.1% | `collect mergeable modules` | `compute merged modules` (0%) |
| 13.1 | 28.3 | 80 | 163.4 | 0.4 | 7.4 | 0.1% | `make production chunks` | `chunking` (5%) |
| 8.7 | 9.5 | 311 | 28.0 | 0.1 | 0.4 | 0.1% | `async reference` | `write all entrypoints to disk` (1%) |

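The per-task columns (Count, Avg Self, P95 Self, Max Self) can be derived from individual self-time samples. The following is a rough sketch of that aggregation; the report does not say how its P95 is computed, so the nearest-rank percentile used here is an assumption, and `task_stats` and the field names are hypothetical.

```python
from collections import defaultdict


def task_stats(samples):
    """Aggregate (task_name, self_time_us) samples into per-task rows.

    Rows are sorted by total self-time, largest first.
    """
    by_task = defaultdict(list)
    for name, self_us in samples:
        by_task[name].append(self_us)

    rows = []
    for name, values in by_task.items():
        values.sort()
        n = len(values)
        # Nearest-rank P95: ceil(0.95 * n), done in integers to avoid float error.
        rank = (95 * n + 99) // 100
        rows.append({
            "task": name,
            "count": n,
            "total_us": sum(values),
            "avg_us": sum(values) / n,
            "p95_us": values[rank - 1],
            "max_us": values[-1],
        })
    rows.sort(key=lambda r: r["total_us"], reverse=True)
    return rows


# 19 fast calls and one slow outlier: the average is pulled up, P95 is not.
samples = [("a", 10)] * 19 + [("a", 1000)] + [("b", 50)] * 4
stats = task_stats(samples)
```

A row such as `write all entrypoints to disk` (average 66.6 us, maximum 159.0 ms) shows the same pattern as task `a` here: a few long-tail outliers dominate while the typical call stays small, which is why the average, P95, and maximum are worth reading together.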

---

## Critical Path Analysis

These are the longest sequential dependency chains, which determine wall-clock time.
Focus on reducing the depth of these chains to improve parallelism.

| Rank | Self-Time (ms) | Depth | Path |
|------|----------------|-------|------|
| 1 | 63.6 | 2 | process module → analyze ecmascript module |
| 2 | 23.5 | 2 | apply effects → write file |
| 3 | 21.5 | 2 | process module → analyze ecmascript module |
| 4 | 20.0 | 2 | code generation → generate source map |
| 5 | 19.8 | 2 | apply effects → write file |


---

## Batching Candidates

High-volume tasks dominated by a single parent. Batching them in the parent
can substantially reduce scheduler overhead.

| Task Name | Count | Top Caller (Attribution) | Avg Self | P95 Self | Total Self |
|-----------|-------|--------------------------|----------|----------|------------|
| `analyze ecmascript module` | 16,341 | `process module` (79%) | 48.8 us | 0.15 ms | 797.3 ms |

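The selection rule described above (high call count, one dominant caller) can be sketched as a simple filter. The thresholds `min_count=10_000` and `min_share=0.5` are assumptions picked so that the filter reproduces the single candidate listed when applied to rows from the Top 20 table; the report does not state the thresholds its tooling actually uses.

```python
def batching_candidates(rows, min_count=10_000, min_share=0.5):
    """Return task names that are called often and mostly from one parent."""
    return [
        row["task"]
        for row in rows
        if row["count"] >= min_count and row["top_caller_share"] >= min_share
    ]


# A few rows copied from the Top 20 table (count and top-caller attribution).
rows = [
    {"task": "module", "count": 70_323, "top_caller_share": 0.01},
    {"task": "read file", "count": 2_170, "top_caller_share": 0.91},
    {"task": "analyze ecmascript module", "count": 16_341, "top_caller_share": 0.79},
    {"task": "precompute code generation", "count": 9_169, "top_caller_share": 0.52},
    {"task": "internal resolving", "count": 13_346, "top_caller_share": 0.30},
]
candidates = batching_candidates(rows)
print(candidates)
```

`read file` has a dominant caller but a low call count, and `internal resolving` is called often but from many parents, so neither passes both conditions.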

---

## Duration Distribution

| Range | Count | Percentage |
|-------|-------|------------|
| <10us | 1,432,927 | 86.8% |
| 10us-100us | 195,084 | 11.8% |
| 100us-1ms | 21,846 | 1.3% |
| 1ms-10ms | 811 | 0.0% |
| 10ms-100ms | 35 | 0.0% |
| >100ms | 2 | 0.0% |

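A distribution like the one above can be produced by bucketing span durations against fixed edges. This is a minimal sketch: the lower edge follows the report (spans of exactly 10 us count as meaningful, per the ">= 10us" row in the summary), while treating every other edge the same way is an assumption. The names `EDGES_US`, `LABELS`, and `histogram` are hypothetical.

```python
from bisect import bisect_right

# Bucket boundaries in microseconds; a duration equal to an edge goes to the upper bucket.
EDGES_US = [10, 100, 1_000, 10_000, 100_000]
LABELS = ["<10us", "10us-100us", "100us-1ms", "1ms-10ms", "10ms-100ms", ">100ms"]


def histogram(durations_us):
    """Count durations per bucket and return {label: count} in bucket order."""
    counts = [0] * len(LABELS)
    for duration in durations_us:
        counts[bisect_right(EDGES_US, duration)] += 1
    return dict(zip(LABELS, counts))


demo = histogram([5, 10, 50, 150, 2_000, 50_000, 200_000])

# Share of the 1ms-10ms bucket, using the counts from the table above.
share_1ms_10ms = 811 / 1_650_705
print(f"{share_1ms_10ms:.2%}")
```

The last line shows why the three slowest buckets all display as 0.0%: 811 spans out of 1,650,705 is roughly 0.05%, which rounds down at one decimal place even though those spans include the heaviest individual tasks.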

---

## Action Items
1. **[P0]** Focus on tasks with the highest **Self-Time**; these are where CPU cycles are *actually* spent.
2. **[P0]** Use Batching Candidates to identify callers that should use `try_join` or reduce `#[turbo_tasks::function]` granularity.
3. **[P1]** Check Build Phase Timeline for phases whose wall range is disproportionately large relative to their self-time (a sign of serialization).
4. **[P1]** Inspect `P95 Self (ms)` for heavy monolith tasks. Focus on long-tail outliers, not averages.
5. **[P1]** Review Critical Paths: reducing the longest chain depth directly improves wall-clock time.
6. **[P2]** If Thread Utilization < 60%, investigate scheduling gaps (lock contention or deep dependency chains).

*Report generated by Utoopack Performance Analysis Agent*
