---
name: profiling
description: >
  Profile Rubydex indexer performance: CPU flamegraphs, memory usage, phase-level timing.
  Use this skill whenever the user mentions profiling, performance, flamegraphs, benchmarking,
  "why is X slow", bottlenecks, hot paths, memory usage, or wants to understand where time
  is spent during indexing/resolution. Also trigger when comparing performance before/after
  a change.
---

# Profiling Rubydex

This skill helps you profile the Rubydex indexer to find CPU and memory bottlenecks.
The indexer has a multi-phase pipeline (listing → indexing → resolution → querying).
Use `--stats` to see which phase dominates, then profile to find what's expensive inside it.

## Profiling tool: samply

Use **samply**, a sampling profiler that opens its results in the Firefox Profiler (in the browser).
It captures call stacks at high frequency and produces interactive flamegraphs with filtering,
timeline views, and per-function cost breakdowns.

Install if needed:

```bash
cargo install samply
```

## Build profile

Profiling needs optimized code *with* debug info so the flamegraph shows real function names,
inlined frames, and source locations instead of bare addresses. The workspace Cargo.toml has a
custom profile for this:

```toml
# rust/Cargo.toml
[profile.profiling]
inherits = "release"
debug = true # Full debug symbols for readable flamegraphs
strip = false # Keep symbols in the binary
```

If this profile doesn't exist yet, **add it** to `rust/Cargo.toml` before profiling. The
release profile uses `lto = true`, `opt-level = 3`, and `codegen-units = 1`. The profiling
profile inherits all of that and only adds debug info.
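
If you'd rather script that check, here is a minimal sketch. `ensure_profiling_profile` is a hypothetical helper, not part of the repo. It assumes the section header sits on its own line exactly as `[profile.profiling]`, and it appends the block above only when that line is missing, so running it twice is harmless.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the [profile.profiling] block to a Cargo.toml
# unless a line reading exactly "[profile.profiling]" is already present.
ensure_profiling_profile() {
  local manifest="${1:-rust/Cargo.toml}"
  # -x: whole-line match, -F: fixed string (no regex escaping needed)
  if grep -qxF '[profile.profiling]' "$manifest"; then
    echo "profiling profile already present in $manifest"
    return 0
  fi
  cat >> "$manifest" <<'EOF'

[profile.profiling]
inherits = "release"
debug = true # Full debug symbols for readable flamegraphs
strip = false # Keep symbols in the binary
EOF
  echo "added profiling profile to $manifest"
}
```

For example, run `ensure_profiling_profile rust/Cargo.toml` from the repository root before building.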

Build from the `rust/` directory (or pass `--manifest-path rust/Cargo.toml` from the repository root):

```bash
cargo build --profile profiling
```

The binary lands at `rust/target/profiling/rubydex_cli` (not `target/release/`). The commands
below use that path relative to the repository root.

The first build is slow because LTO with a single codegen unit must compile and optimize
everything. Subsequent builds after small changes are faster because Cargo caches compiled
dependencies in `target/profiling/`, although the final LTO step still runs each time. Don't
delete that directory between runs.
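
Profiling a binary that predates your latest edit is an easy mistake to make. Below is a minimal sketch of a guard. It assumes the sources live under `rust/` and treats the binary as stale when any `.rs` file or `Cargo.toml` there (ignoring `rust/target/`) is newer than it. `profiling_binary_is_stale` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the profiling binary is missing or
# older than any Rust source file or Cargo.toml under the given source directory.
profiling_binary_is_stale() {
  local bin="${1:-rust/target/profiling/rubydex_cli}"
  local src="${2:-rust}"
  [ -e "$bin" ] || return 0 # never built counts as stale
  local newer
  # Prune the build output directory, then look for inputs modified after the binary.
  newer=$(find "$src" -path "$src/target" -prune -o \
    \( -name '*.rs' -o -name 'Cargo.toml' \) -newer "$bin" -print | head -n 1)
  [ -n "$newer" ]
}
```

For example: `profiling_binary_is_stale && echo 'rebuild first'`. Modification times are only a heuristic. When in doubt, just run the build, because Cargo skips work that is already up to date.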

## Running a profile

### Full pipeline

```bash
samply record rust/target/profiling/rubydex_cli <TARGET_PATH> --stats
```

The `--stats` flag prints the timing breakdown and memory stats to stderr after completion,
so you get both the samply profile AND the summary stats in one run.

Useful samply flags (put them right after `samply record`, before the binary path, because
everything after the binary path is passed to `rubydex_cli`):
- `--no-open`: don't auto-open the browser (useful for scripted runs)
- `--save-only`: save the profile to disk without starting the local server; load it later
  with `samply load <profile.json>`

### Isolating a phase

Use `--stop-after` to profile only up to a specific stage. This is useful when you want
a cleaner flamegraph focused on one phase without the noise of later stages:

```bash
# Profile only listing + indexing (skip resolution)
samply record rust/target/profiling/rubydex_cli <TARGET_PATH> --stats --stop-after indexing

# Profile through resolution (skip querying)
samply record rust/target/profiling/rubydex_cli <TARGET_PATH> --stats --stop-after resolution
```

Valid `--stop-after` values: `listing`, `indexing`, `resolution`.
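
To save one profile per stopping point in a single scripted pass, a loop like the one below should work. It is a sketch that assumes samply's `--save-only` and `-o`/`--output` options. `profile_stages` and the `/tmp/rubydex-<stage>.json` file names are hypothetical. samply's own options must come before the binary path, because everything after it is passed to `rubydex_cli`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: save one samply profile per --stop-after stage, without
# opening the browser or starting samply's local server.
profile_stages() {
  local target="$1"
  local bin="${2:-rust/target/profiling/rubydex_cli}"
  local stage
  for stage in listing indexing resolution; do
    # samply options come before the binary; rubydex_cli options come after it.
    samply record --save-only -o "/tmp/rubydex-${stage}.json" \
      "$bin" "$target" --stats --stop-after "$stage" || return 1
  done
}
```

Afterwards, open any of them with `samply load /tmp/rubydex-indexing.json`. Each run still executes every stage up to its stopping point, so compare the profiles against each other rather than reading any one of them as a single phase in isolation.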

### Common target paths

The user should have a `DEFAULT_BENCH_WORKSPACE` configured to point to a target codebase.

For synthetic corpora, use `utils/bench` with a size argument (`tiny`, `small`, `medium`,
`large`, or `huge`). It auto-generates corpora at `../rubydex_corpora/<size>/`, which you
can then pass as `<TARGET_PATH>`.

## Reading the results

When samply finishes, it automatically opens the Firefox Profiler in the browser (unless you
passed `--no-open` or `--save-only`). Key things to guide the user through:

### Firefox Profiler tips

1. **Call Tree tab**: shows running (total) time per function, sorted by total cost. Start here
   to find the most expensive call paths.

2. **Flame Graph tab**: visual representation where width = time. Look for wide bars, because those
   are the hot functions. Click to zoom in.

3. **Timeline**: shows activity over time. Useful for spotting if one phase is unexpectedly
   long or if there are idle gaps. Drag across it to select a range, and the Call Tree and
   Flame Graph will show only the samples from that range.

4. **Filtering**: type a function name in the filter box to isolate it.

5. **Right-click > Focus on subtree only**: shows only the selected node and its callees. Perfect
   for drilling into a specific phase.

6. **Right-click > Merge function**: removes a function from the call tree and charges its self
   time to its caller, which is handy for hiding thin wrappers. To fold recursive calls into a
   single node and see their aggregate cost, use **Collapse recursion** instead.

### Text-based profiling with `sample` (macOS)

When you can't interact with a browser (e.g., running from a script or agent), use macOS's
built-in `sample` command for a text-based call tree:

```bash
# Start the indexer in the background, then sample it
# (sample's default duration is 10 seconds)
rust/target/profiling/rubydex_cli <TARGET_PATH> --stats &
PID=$!
sample "$PID" -mayDie -file /tmp/rubydex-sample.txt
wait "$PID"
```

To skip the early phases and sample for a longer, fixed duration:

```bash
rust/target/profiling/rubydex_cli <TARGET_PATH> --stats &
PID=$!
sleep 2 # let it get past listing/indexing into resolution (tune using --stats timings)
sample "$PID" 30 -mayDie -file /tmp/rubydex-sample.txt # sample for 30 seconds
wait "$PID"
```

The output is a text call tree with sample counts. The "Sort by top of stack" section near the
end ranks functions by self samples, so start there to find hot functions.
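
Scripted runs benefit from two extra pieces: a guard against the indexer finishing before the delay elapses (common on small targets), and a way to pull out just the self-sample ranking. This is a sketch. `sample_rubydex` and `sample_top_of_stack` are hypothetical names. `-mayDie` is the `sample` option that reads symbol information up front in case the process exits. The section headers `Sort by top of stack` and `Binary Images:` are assumed from the report layout of recent macOS versions.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: wait DELAY seconds, then sample PID for DURATION seconds.
# Fails early if the process already exited (e.g. a small target finished fast).
sample_rubydex() {
  local pid="$1" delay="${2:-2}" duration="${3:-30}" out="${4:-/tmp/rubydex-sample.txt}"
  sleep "$delay"
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "process $pid exited before sampling; lower the delay" >&2
    return 1
  fi
  # -mayDie reads symbol information right away in case the process exits mid-sample.
  sample "$pid" "$duration" -mayDie -file "$out"
}

# Print only the "Sort by top of stack" section (assumed header), i.e. functions
# ranked by self samples, stopping before the "Binary Images:" section.
sample_top_of_stack() {
  awk '/Sort by top of stack/ { on = 1 }
       /Binary Images:/       { on = 0 }
       on' "${1:-/tmp/rubydex-sample.txt}"
}
```

For example: `rust/target/profiling/rubydex_cli <TARGET_PATH> --stats & sample_rubydex $! 2 30 && sample_top_of_stack | head -40`.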

### How to read the profile

Don't assume which functions are hot; let the data tell you. Hot paths change as the
codebase evolves.

1. **Sort by self-time** (time spent in the function itself, not its callees). This reveals
   the actual hot spots. In the Firefox Profiler, tick **Invert call stack** in the Call Tree
   to rank functions by self time. High total time but low self time means the function is
   just a caller, so drill into its children.

2. **Look for concentration vs. spread.** A single function dominating self-time suggests
   an algorithmic fix (memoization, better data structure). Time spread across many functions
   suggests the workload itself is large, so a local fix to one function won't help much;
   look for ways to reduce the total amount of work instead.

3. **Check for allocation pressure.** If `alloc` / `malloc` / `realloc` / `free` show up
   prominently in self-time, the bottleneck is memory allocation, not computation.

## Memory profiling

For memory, the `--stats` flag already reports Maximum RSS at the end. For a closer look at
memory usage:

### Quick check with utils/mem-use

```bash
utils/mem-use rust/target/profiling/rubydex_cli <TARGET_PATH> --stats
```

This wraps the command with `/usr/bin/time -l` (the macOS form of `time`) and reports:
- Maximum Resident Set Size (RSS)
- Peak Memory Footprint
- Execution Time
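
You can also capture the report yourself, for example with `/usr/bin/time -l rust/target/profiling/rubydex_cli <TARGET_PATH> 2> /tmp/rubydex-time.txt`. A short extractor can then pull out the peak RSS. This sketch assumes the macOS `time -l` layout, where the value comes before the label `maximum resident set size` and is reported in bytes. GNU `time -v` on Linux uses a different layout and reports kilobytes, so it would need another pattern. `max_rss_mb` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read macOS `/usr/bin/time -l` output (file argument or stdin)
# and print the maximum resident set size converted from bytes to MiB.
max_rss_mb() {
  awk '/maximum resident set size/ { printf "%.1f\n", $1 / (1024 * 1024); found = 1 }
       END { exit (found ? 0 : 1) }' "${1:--}"
}
```

For example, `max_rss_mb /tmp/rubydex-time.txt` prints a single number such as `200.0`, and exits non-zero if the line is missing.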

## Before/after comparison workflow

When the user has made a change and wants to measure impact:

1. **Get baseline**: build and run on main (or the branch point) before applying the changes:
   ```bash
   cargo build --profile profiling --manifest-path rust/Cargo.toml
   samply record rust/target/profiling/rubydex_cli <TARGET_PATH> --stats 2>&1 | tee /tmp/rubydex-baseline.txt
   ```
   Save the samply profile URL from the browser (the Firefox Profiler can upload a profile and
   share it via permalink). samply keeps serving the profile until you stop it with Ctrl+C.

2. **Apply changes** and rebuild:
   ```bash
   cargo build --profile profiling --manifest-path rust/Cargo.toml
   ```

3. **Get new measurement**:
   ```bash
   samply record rust/target/profiling/rubydex_cli <TARGET_PATH> --stats 2>&1 | tee /tmp/rubydex-after.txt
   ```

4. **Compare**: parse both output files and show a side-by-side delta of:
   - Total time and per-phase breakdown (listing, indexing, resolution, querying)
   - Memory (RSS)
   - Declaration/definition counts (sanity check that output is equivalent)

Present the comparison as a formatted table showing absolute values and % change.
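
The arithmetic for that table is easy to script. The sketch below deliberately does not parse the `--stats` output, whose exact layout isn't specified here. Instead, it assumes you have already extracted each metric into `metric baseline after` lines, one per line, with metric names that contain no spaces. `compare_table` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "metric baseline after" lines on stdin and print a
# Markdown table with the absolute values and the percentage change.
compare_table() {
  printf '| Metric | Baseline | After | Change |\n'
  printf '|--------|----------|-------|--------|\n'
  awk '{
    if ($2 == 0) { change = "n/a" } # avoid dividing by zero
    else         { change = sprintf("%+.1f%%", ($3 - $2) / $2 * 100) }
    printf "| %s | %s | %s | %s |\n", $1, $2, $3, change
  }'
}
```

For example, `printf 'resolution_s 10 8\nrss_mb 500 550\n' | compare_table` prints rows showing `-20.0%` for resolution and `+10.0%` for RSS.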

### Quick benchmark (no flamegraph)

When the user just wants timing/memory numbers without the profiler's overhead:

```bash
# Plain release build: same optimizations as the profiling profile, without debug info
cargo build --release --manifest-path rust/Cargo.toml
utils/bench # uses DEFAULT_BENCH_WORKSPACE
utils/bench medium # synthetic corpus
utils/bench /path/to/project # specific directory
```

## Timing phases (`--stats` output)

The `--stats` flag on rubydex_cli prints a timing breakdown using the internal timer system.
The phases are:

| Phase | What it measures |
|-------|------------------|
| Initialization | Setup and configuration |
| Listing | File discovery (walking directories, filtering .rb files) |
| Indexing | Parsing Ruby files and extracting definitions/references |
| Resolution | Computing fully qualified names, resolving constants, linearizing ancestors |
| Integrity check | Validating graph consistency (optional) |
| Querying | Building query indices |
| Cleanup | Time not accounted for by other phases |

It also prints:
- Maximum RSS in bytes and MB
- Declaration/definition counts and breakdown by kind
- Orphan rate (definitions not linked to declarations)
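
To see at a glance which phase dominates, you can rank the phases by their share of the total. This sketch assumes `phase seconds` input lines transcribed from the `--stats` output, so it does not rely on that output's exact format. Pass only the individual phases, not an overall total, or the total will be counted twice. `phase_shares` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "phase seconds" lines on stdin and print each phase
# with its share of the summed time, largest first.
phase_shares() {
  awk '{ name[NR] = $1; secs[NR] = $2; total += $2 }
       END {
         if (total <= 0) exit # nothing to rank (also avoids division by zero)
         for (i = 1; i <= NR; i++)
           printf "%s %s %.1f%%\n", name[i], secs[i], secs[i] / total * 100
       }' | sort -k2,2nr
}
```

For example, `printf 'listing 1\nindexing 3\nresolution 6\n' | phase_shares` lists `resolution 6 60.0%` first.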

## Troubleshooting

### samply permission errors on macOS

samply may need extra permissions on macOS. If you get permission errors, try running it
with `sudo`:

```bash
sudo samply record rust/target/profiling/rubydex_cli <TARGET_PATH> --stats
```

Or grant Terminal/iTerm the "Developer Tools" permission in System Settings > Privacy & Security.

### Empty or unhelpful flamegraphs

If the flamegraph shows mostly `[unknown]` frames:
- Make sure you built with `--profile profiling` (not `--release`) and are profiling the
  binary in `rust/target/profiling/`
- Verify debug symbols (macOS): `dsymutil -s rust/target/profiling/rubydex_cli | head -20` should
  show symbol entries.
- On macOS, ensure `strip = false` is set in the profiling profile

### Comparing runs with variance

Indexer performance can vary by ±5% between runs due to OS scheduling, file system caching, etc.
For reliable comparisons, run 3 times and take the median, or at minimum run twice and check
that the results are consistent before drawing conclusions.
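
A small wrapper can run a command several times and report the median wall-clock time. This sketch assumes bash, because it relies on the `time -p` keyword, which prints a `real <seconds>` line. It discards the command's own output, including `--stats`. `median` and `median_runtime` are hypothetical names.

```bash
#!/usr/bin/env bash
# Print the median of the numbers read from stdin (one per line).
median() {
  sort -n | awk '{ v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) { print v[(NR + 1) / 2] }
      else { m = (v[NR / 2] + v[NR / 2 + 1]) / 2; print m }
    }'
}

# Run a command RUNS times (default 3) and print the median wall-clock ("real")
# time in seconds as reported by `time -p`. The command's own output is discarded.
median_runtime() {
  local runs="${RUNS:-3}" i=0
  while [ "$i" -lt "$runs" ]; do
    # The group's stderr carries the `time -p` report; keep only the "real" value.
    { time -p "$@" > /dev/null 2>&1; } 2>&1 | awk '$1 == "real" { print $2 }'
    i=$((i + 1))
  done | median
}
```

For example: `RUNS=3 median_runtime rust/target/profiling/rubydex_cli <TARGET_PATH>`. Use the same build profile and target on both sides of a comparison.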