| 1 | +# Utoopack Performance Report |
| 2 | + |
| 3 | +**Report ID**: `utoopack_performance_report_20260313_202108` |
| 4 | +**Generated**: 2026-03-13 20:21:08 |
| 5 | +**Trace File**: `trace_20260313_201000.json` (0.6 GB, 1.65M spans) |
| 6 | +**Test Project**: `examples/with-antd` |
| 7 | + |
| 8 | +--- |
| 9 | + |
| 10 | +## Executive Summary |
| 11 | + |
| 12 | +| Metric | Value | Assessment | |
| 13 | +|--------|-------|------------| |
| 14 | +| Total Wall Time | **2,101.6 ms** | Baseline | |
| 15 | +| Total Thread Work (de-duped) | **13,241.5 ms** | Non-overlapping busy time | |
| 16 | +| Effective Parallelism | **6.3x** | thread_work / wall_time | |
| 17 | +| Working Threads | **13** | Threads with actual spans | |
| 18 | +| Thread Utilization | **48.5%** | ⚠️ Suboptimal | |
| 19 | +| Total Spans | **1,650,705** | All B/E + X events | |
| 20 | +| Meaningful Spans (>= 10us) | **217,778** | (13.2% of total) | |
| 21 | +| Tracing Noise (< 10us) | **1,432,927** | (86.8% of total) | |
| 22 | + |
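
The two derived metrics in the summary can be reproduced from the raw totals. The sketch below is a minimal check, assuming Thread Utilization means thread work divided by (wall time × working threads); the report does not state that formula, but it reproduces the 48.5% figure. The function names are illustrative and not part of any Utoopack API.

```python
def effective_parallelism(thread_work_ms: float, wall_time_ms: float) -> float:
    """thread_work / wall_time, as stated in the summary table."""
    return thread_work_ms / wall_time_ms


def thread_utilization(thread_work_ms: float, wall_time_ms: float, threads: int) -> float:
    """Assumed definition: busy time as a fraction of available thread-time."""
    return thread_work_ms / (wall_time_ms * threads)


# Values copied from the Executive Summary table.
WALL_MS = 2101.6
WORK_MS = 13241.5
THREADS = 13

parallelism = effective_parallelism(WORK_MS, WALL_MS)
utilization = thread_utilization(WORK_MS, WALL_MS, THREADS)

print(f"Effective Parallelism: {parallelism:.1f}x")
print(f"Thread Utilization:    {utilization:.1%}")
```

Under this definition, utilization is simply effective parallelism divided by the number of working threads, so the two metrics rise and fall together.
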
| 23 | +--- |
| 24 | + |
| 25 | +## Build Phase Timeline |
| 26 | + |
| 27 | +This table shows when each build phase is active and how much CPU time it consumes. |
| 28 | +**Self-Time** is the time spent *exclusively* in that phase (excluding children). |
| 29 | + |
| 30 | +| Phase | Spans | Inclusive (ms) | Self-Time (ms) | Wall Range (ms) | |
| 31 | +|-------|-------|----------------|----------------|-----------------| |
| 32 | +| Resolve | 30,353 | 1,112.6 | 746.7 | 902.5 | |
| 33 | +| Parse | 6,851 | 2,366.1 | 1,369.6 | 2,037.0 | |
| 34 | +| Analyze | 132,394 | 5,490.3 | 4,175.1 | 1,839.8 | |
| 35 | +| Chunk | 17,285 | 976.7 | 894.9 | 641.1 | |
| 36 | +| Codegen | 20,179 | 1,540.9 | 1,063.5 | 529.7 | |
| 37 | +| Emit | 33 | 212.3 | 105.4 | 28.6 | |
| 38 | +| Other | 10,683 | 766.0 | 706.5 | 2,101.6 | |
| 39 | + |
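
One rough way to read this table is to divide each phase's self-time by its wall range, which gives the average number of threads busy with that phase while it was active. This is a sketch of that arithmetic, not a metric the report itself defines, and the name `average_concurrency` is an assumption. The numbers are copied from the table above.

```python
# (self_time_ms, wall_range_ms) per phase, copied from the Build Phase Timeline.
PHASES = {
    "Resolve": (746.7, 902.5),
    "Parse": (1369.6, 2037.0),
    "Analyze": (4175.1, 1839.8),
    "Chunk": (894.9, 641.1),
    "Codegen": (1063.5, 529.7),
    "Emit": (105.4, 28.6),
    "Other": (706.5, 2101.6),
}


def average_concurrency(self_ms: float, wall_ms: float) -> float:
    """Average number of threads busy in the phase over its active window."""
    return self_ms / wall_ms


ratios = {name: average_concurrency(s, w) for name, (s, w) in PHASES.items()}

# Lowest ratios first: these phases are the least parallel relative to their span.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:<8} {ratio:5.2f}")
```

On these numbers, Parse and Resolve fall below 1.0, meaning that on average less than one thread was doing their work during the window in which they were active. Other is also below 1.0, but its wall range covers the entire build, so that figure says little on its own.
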
| 40 | +--- |
| 41 | + |
| 42 | +## Workload Distribution by Diagnostic Tier |
| 43 | + |
| 44 | +| Category | Spans | Inclusive (ms) | % Work | Self-Time (ms) | % Self | |
| 45 | +|----------|-------|----------------|--------|----------------|--------| |
| 46 | +| P0: Scheduling & Resolution | 161,386 | 6,449.8 | 48.7% | 4,768.7 | 36.0% | |
| 47 | +| P1: I/O & Heavy Tasks | 2,928 | 1,415.8 | 10.7% | 1,308.9 | 9.9% | |
| 48 | +| P2: Architecture (Locks/Memory) | 1 | 0.0 | 0.0% | 0.0 | 0.0% | |
| 49 | +| P3: Asset Pipeline | 42,781 | 3,833.2 | 28.9% | 2,277.6 | 17.2% | |
| 50 | +| P4: Bridge/Interop | 0 | 0.0 | 0.0% | 0.0 | 0.0% | |
| 51 | +| Other | 10,682 | 766.0 | 5.8% | 706.5 | 5.3% | |
| 52 | + |
| 53 | +--- |
| 54 | + |
| 55 | +## Top 20 Tasks by Self-Time |
| 56 | + |
| 57 | +Self-time is the *exclusive* duration: time spent in the task itself, not in sub-tasks. |
| 58 | +This is the most accurate indicator of where CPU cycles are actually spent. |
| 59 | + |
| 60 | +| Self (ms) | Inclusive (ms) | Count | Avg Self (us) | P95 Self (ms) | Max Self (ms) | % Work | Task Name | Top Caller | |
| 61 | +|-----------|----------------|-------|---------------|---------------|---------------|--------|-----------|------------| |
| 62 | +| 1,977.5 | 2,232.7 | 70,323 | 28.1 | 0.1 | 2.4 | 14.9% | `module` | `write all entrypoints to disk` (1%) | |
| 63 | +| 1,095.4 | 1,095.4 | 2,170 | 504.8 | 1.6 | 5.5 | 8.3% | `read file` | `parse ecmascript` (91%) | |
| 64 | +| 1,015.7 | 1,015.7 | 20,539 | 49.5 | 0.2 | 71.9 | 7.7% | `compute async module info` | `None` (0%) | |
| 65 | +| 797.3 | 1,048.6 | 16,341 | 48.8 | 0.1 | 63.6 | 6.0% | `analyze ecmascript module` | `process module` (79%) | |
| 66 | +| 672.2 | 725.9 | 10,098 | 66.6 | 0.0 | 159.0 | 5.1% | `write all entrypoints to disk` | `None` (0%) | |
| 67 | +| 517.1 | 517.1 | 9,169 | 56.4 | 0.2 | 2.6 | 3.9% | `precompute code generation` | `code generation` (52%) | |
| 68 | +| 487.9 | 553.7 | 7,827 | 62.3 | 0.2 | 24.7 | 3.7% | `chunking` | `write all entrypoints to disk` (0%) | |
| 69 | +| 465.5 | 942.9 | 9,677 | 48.1 | 0.2 | 14.4 | 3.5% | `code generation` | `chunking` (4%) | |
| 70 | +| 393.9 | 394.7 | 9,378 | 42.0 | 0.1 | 23.0 | 3.0% | `compute async chunks` | `None` (0%) | |
| 71 | +| 342.8 | 552.8 | 13,346 | 25.7 | 0.1 | 1.7 | 2.6% | `internal resolving` | `resolving` (30%) | |
| 72 | +| 318.2 | 1,110.8 | 23,953 | 13.3 | 0.1 | 1.0 | 2.4% | `process module` | `module` (8%) | |
| 73 | +| 295.8 | 451.8 | 16,285 | 18.2 | 0.1 | 2.7 | 2.2% | `resolving` | `module` (18%) | |
| 74 | +| 274.2 | 1,270.7 | 4,678 | 58.6 | 0.2 | 9.8 | 2.1% | `parse ecmascript` | `analyze ecmascript module` (46%) | |
| 75 | +| 108.1 | 108.1 | 722 | 149.7 | 0.3 | 0.6 | 0.8% | `read directory` | `internal resolving` (100%) | |
| 76 | +| 105.3 | 105.3 | 13 | 8,098.3 | 21.3 | 23.5 | 0.8% | `write file` | `apply effects` (100%) |
| 77 | +| 80.9 | 80.9 | 1,333 | 60.7 | 0.2 | 5.6 | 0.6% | `generate source map` | `code generation` (97%) | |
| 78 | +| 45.0 | 45.0 | 639 | 70.4 | 0.2 | 11.2 | 0.3% | `compute binding usage info` | `write all entrypoints to disk` (0%) | |
| 79 | +| 16.4 | 16.4 | 584 | 28.1 | 0.0 | 7.0 | 0.1% | `collect mergeable modules` | `compute merged modules` (0%) | |
| 80 | +| 13.1 | 28.3 | 80 | 163.4 | 0.4 | 7.4 | 0.1% | `make production chunks` | `chunking` (5%) | |
| 81 | +| 8.7 | 9.5 | 311 | 28.0 | 0.1 | 0.4 | 0.1% | `async reference` | `write all entrypoints to disk` (1%) | |
| 82 | + |
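
The distinction between the Self and Inclusive columns can be illustrated with a small span tree. The sketch below assumes child spans lie inside their parent and do not overlap one another. The `Span` type is hypothetical, and the durations are made up for illustration; only the task names come from the table.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    start_us: float
    end_us: float
    children: list["Span"] = field(default_factory=list)

    @property
    def inclusive_us(self) -> float:
        """Total duration of the span, including time spent in children."""
        return self.end_us - self.start_us

    @property
    def self_us(self) -> float:
        """Exclusive duration: inclusive time minus the children's inclusive time."""
        return self.inclusive_us - sum(c.inclusive_us for c in self.children)


# Made-up timings: a parse task that spends most of its time waiting on a file read.
read = Span("read file", start_us=50.0, end_us=550.0)
parse = Span("parse ecmascript", start_us=0.0, end_us=600.0, children=[read])

for span in (parse, read):
    print(f"{span.name:<18} inclusive={span.inclusive_us:6.1f}us self={span.self_us:6.1f}us")
```

This is why a task such as `parse ecmascript` can show a large inclusive time but a much smaller self-time: most of the elapsed time is attributed to the child it calls.
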
| 83 | +--- |
| 84 | + |
| 85 | +## Critical Path Analysis |
| 86 | + |
| 87 | +These are the longest sequential dependency chains, which determine wall-clock time. |
| 88 | +Focus on reducing the depth of these chains to improve parallelism. |
| 89 | + |
| 90 | +| Rank | Self-Time (ms) | Depth | Path | |
| 91 | +|------|----------------|-------|------| |
| 92 | +| 1 | 63.6 | 2 | process module → analyze ecmascript module | |
| 93 | +| 2 | 23.5 | 2 | apply effects → write file | |
| 94 | +| 3 | 21.5 | 2 | process module → analyze ecmascript module | |
| 95 | +| 4 | 20.0 | 2 | code generation → generate source map | |
| 96 | +| 5 | 19.8 | 2 | apply effects → write file | |
| 97 | + |
| 98 | +--- |
| 99 | + |
| 100 | +## Batching Candidates |
| 101 | + |
| 102 | +These are high-volume tasks dominated by a single parent. If the parent can batch them, |
| 103 | +scheduler overhead drops substantially. |
| 104 | + |
| 105 | +| Task Name | Count | Top Caller (Attribution) | Avg Self | P95 Self | Total Self | |
| 106 | +|-----------|-------|--------------------------|----------|----------|------------| |
| 107 | +| `analyze ecmascript module` | 16,341 | `process module` (79%) | 48.8 us | 0.15 ms | 797.3 ms | |
| 108 | + |
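
The selection rule behind this table is not stated in the report. The sketch below shows one plausible filter: a task qualifies when it is called often and one caller accounts for most of those calls. Both thresholds (`MIN_COUNT`, `MIN_CALLER_SHARE`) are assumptions, chosen only so that the filter reproduces the single row above from a sample of rows in the Top 20 table.

```python
from typing import NamedTuple


class TaskStat(NamedTuple):
    name: str
    count: int
    top_caller: str
    caller_share: float  # fraction of calls attributed to the top caller


# Hypothetical thresholds; the report does not document the real ones.
MIN_COUNT = 10_000
MIN_CALLER_SHARE = 0.75

# A sample of rows copied from the Top 20 Tasks table.
TASKS = [
    TaskStat("module", 70_323, "write all entrypoints to disk", 0.01),
    TaskStat("read file", 2_170, "parse ecmascript", 0.91),
    TaskStat("analyze ecmascript module", 16_341, "process module", 0.79),
    TaskStat("precompute code generation", 9_169, "code generation", 0.52),
    TaskStat("internal resolving", 13_346, "resolving", 0.30),
    TaskStat("generate source map", 1_333, "code generation", 0.97),
]


def batching_candidates(tasks: list[TaskStat]) -> list[TaskStat]:
    """Keep tasks that are both high-volume and dominated by one caller."""
    return [
        t for t in tasks
        if t.count >= MIN_COUNT and t.caller_share >= MIN_CALLER_SHARE
    ]


candidates = batching_candidates(TASKS)
for t in candidates:
    print(f"{t.name}: {t.count:,} calls, {t.caller_share:.0%} from {t.top_caller}")
```

Lowering either threshold would also surface tasks such as `read file` or `generate source map`, which have a dominant caller but far fewer invocations.
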
| 109 | +--- |
| 110 | + |
| 111 | +## Duration Distribution |
| 112 | + |
| 113 | +| Range | Count | Percentage | |
| 114 | +|-------|-------|------------| |
| 115 | +| <10us | 1,432,927 | 86.8% | |
| 116 | +| 10us-100us | 195,084 | 11.8% | |
| 117 | +| 100us-1ms | 21,846 | 1.3% | |
| 118 | +| 1ms-10ms | 811 | <0.1% |
| 119 | +| 10ms-100ms | 35 | <0.1% |
| 120 | +| >100ms | 2 | <0.1% |
| 121 | + |
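
A histogram like the one above can be built by bucketing each span duration against fixed edges. The sketch below assumes each range includes its lower bound and excludes its upper bound; the report does not say how boundary values are handled. The sample durations are illustrative inputs, not data read from the trace.

```python
from bisect import bisect_right
from collections import Counter

# Bucket edges in microseconds, matching the ranges in the table above.
EDGES_US = [10, 100, 1_000, 10_000, 100_000]
LABELS = ["<10us", "10us-100us", "100us-1ms", "1ms-10ms", "10ms-100ms", ">100ms"]


def bucket(duration_us: float) -> str:
    """Return the label of the range containing duration_us."""
    return LABELS[bisect_right(EDGES_US, duration_us)]


def histogram(durations_us: list[float]) -> dict[str, tuple[int, float]]:
    """Map each label to (count, fraction of all spans)."""
    counts = Counter(bucket(d) for d in durations_us)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return {label: (counts[label], counts[label] / total) for label in LABELS}


# Illustrative durations in microseconds.
sample = [3.0, 8.5, 28.1, 504.8, 8_098.3, 71_900.0, 159_000.0]
hist = histogram(sample)
for label, (count, share) in hist.items():
    print(f"{label:<11} {count:>3} {share:6.1%}")
```

With roughly 1.65M spans, a bucket holding 811 spans is about 0.05% of the total, so a one-decimal percentage format rounds it down to 0.0% even though the bucket is not empty.
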
| 122 | +--- |
| 123 | + |
| 124 | +## Action Items |
| 125 | +1. **[P0]** Focus on tasks with the highest **Self-Time** — these are where CPU cycles are *actually* spent. |
| 126 | +2. **[P0]** Use Batching Candidates to identify callers that should use `try_join` or reduce `#[turbo_tasks::function]` granularity. |
| 127 | +3. **[P1]** Check Build Phase Timeline for phases whose wall range is disproportionately large relative to self-time, which suggests serialized work. |
| 128 | +4. **[P1]** Inspect `P95 Self (ms)` for heavy monolith tasks. Focus on long-tail outliers, not averages. |
| 129 | +5. **[P1]** Review Critical Paths — reducing the longest chain depth directly improves wall-clock time. |
| 130 | +6. **[P2]** If Thread Utilization < 60% (48.5% in this run), investigate scheduling gaps (lock contention or deep dependency chains). |
| 131 | + |
| 132 | +*Report generated by Utoopack Performance Analysis Agent* |