| name | profiling |
|---|---|
| description | Profile Rubydex indexer performance — CPU flamegraphs, memory usage, phase-level timing. Use this skill whenever the user mentions profiling, performance, flamegraphs, benchmarking, "why is X slow", bottlenecks, hot paths, memory usage, or wants to understand where time is spent during indexing/resolution. Also trigger when comparing performance before/after a change. |
This skill helps you profile the Rubydex indexer to find CPU and memory bottlenecks.
The indexer has a multi-phase pipeline (listing → indexing → resolution → querying).
Use --stats to see which phase dominates, then profile to find what's expensive inside it.
Use samply — a sampling profiler that opens results in Firefox Profiler (in-browser). It captures call stacks at high frequency and produces interactive flamegraphs with filtering, timeline views, and per-function cost breakdowns.
Install if needed:
cargo install samplyProfiling needs optimized code with debug symbols so you get real function names in the flamegraph instead of mangled addresses. The workspace Cargo.toml has a custom profile for this:
# rust/Cargo.toml
[profile.profiling]
inherits = "release"
debug = true # Full debug symbols for readable flamegraphs
strip = false # Keep symbols in the binaryIf this profile doesn't exist yet, add it to rust/Cargo.toml before profiling. The
release profile uses lto = true, opt-level = 3, codegen-units = 1 — the profiling
profile inherits all of that and just adds debug info.
Build with:
cargo build --profile profilingThe binary lands at rust/target/profiling/rubydex_cli (not target/release/).
The first build is slow (LTO + single codegen unit must recompile everything). Subsequent
builds after small changes are faster since Cargo caches intermediate artifacts in
target/profiling/. Don't delete that directory between runs.
samply record rust/target/profiling/rubydex_cli <TARGET_PATH> --statsThe --stats flag prints the timing breakdown and memory stats to stderr after completion,
so you get both the samply profile AND the summary stats in one run.
Useful samply flags:
--no-open— don't auto-open the browser (useful for scripted runs)--save-only— save the profile to disk without starting the local server; load later withsamply load <profile.json>
Use --stop-after to profile only up to a specific stage. This is useful when you want
a cleaner flamegraph focused on one phase without the noise of later stages:
# Profile only listing + indexing (skip resolution)
samply record rust/target/profiling/rubydex_cli <TARGET_PATH> --stats --stop-after indexing
# Profile through resolution (skip querying)
samply record rust/target/profiling/rubydex_cli <TARGET_PATH> --stats --stop-after resolutionValid --stop-after values: listing, indexing, resolution.
The user should have a DEFAULT_BENCH_WORKSPACE configured pointing to a target codebase.
For synthetic corpora, use utils/bench with size arguments (tiny/small/medium/large/huge),
which auto-generates corpora at ../rubydex_corpora/<size>/.
When samply finishes, it automatically opens Firefox Profiler in the browser. Key things to guide the user through:
-
Call Tree tab — shows cumulative time per function, sorted by total cost. Start here to find the most expensive call paths.
-
Flame Graph tab — visual representation where width = time. Look for wide bars — those are the hot functions. Click to zoom in.
-
Timeline — shows activity over time. Useful for spotting if one phase is unexpectedly long or if there are idle gaps.
-
Filtering — type a function name in the filter box to isolate it.
-
Transform > Focus on subtree — right-click a function to see only its callees. Perfect for drilling into a specific phase.
-
Transform > Merge function — collapse recursive calls to see aggregate cost.
When you can't interact with a browser (e.g., running from a script or agent), use macOS's
built-in sample command for a text-based call tree:
# Start the indexer in the background, then sample it
rust/target/profiling/rubydex_cli <TARGET_PATH> --stats &
PID=$!
sample $PID -f /tmp/rubydex-sample.txt
wait $PIDOr for a simpler approach, sample for a fixed duration while the indexer runs:
rust/target/profiling/rubydex_cli <TARGET_PATH> --stats &
PID=$!
sleep 2 # let it get past listing/indexing into resolution
sample $PID 30 -f /tmp/rubydex-sample.txt # sample for 30 seconds
wait $PIDThe output is a text call tree with sample counts — sort by "self" samples to find hot functions.
Don't assume which functions are hot — let the data tell you. Hot paths change as the codebase evolves.
-
Sort by self-time (time spent in the function itself, not its callees). This reveals the actual hot spots. High total-time but low self-time means the function is just a caller — drill into its children.
-
Look for concentration vs. spread. A single function dominating self-time suggests an algorithmic fix (memoization, better data structure). Time spread across many functions suggests the workload itself is large and optimization requires a different approach.
-
Check for allocation pressure. If
alloc/malloc/reallocshow up prominently in self-time, the bottleneck is memory allocation, not computation.
For memory, the --stats flag already reports Maximum RSS at the end. For deeper memory
analysis:
utils/mem-use rust/target/profiling/rubydex_cli <TARGET_PATH> --statsThis wraps the command with /usr/bin/time -l and reports:
- Maximum Resident Set Size (RSS)
- Peak Memory Footprint
- Execution Time
When the user has made a change and wants to measure impact:
-
Get baseline — run on the current main/branch before changes:
samply record rust/target/profiling/rubydex_cli <TARGET_PATH> --stats 2>&1 | tee /tmp/rubydex-baseline.txt
Save the samply profile URL from the browser (Firefox Profiler allows sharing via permalink).
-
Apply changes and rebuild:
cargo build --profile profiling
-
Get new measurement:
samply record rust/target/profiling/rubydex_cli <TARGET_PATH> --stats 2>&1 | tee /tmp/rubydex-after.txt
-
Compare — parse both output files and show a side-by-side delta of:
- Total time and per-phase breakdown (listing, indexing, resolution, querying)
- Memory (RSS)
- Declaration/definition counts (sanity check that output is equivalent)
Present the comparison as a formatted table showing absolute values and % change.
When the user just wants timing/memory numbers without the full profiler overhead:
# Release build (faster than profiling profile since no debug symbols)
cargo build --release
utils/bench # uses DEFAULT_BENCH_WORKSPACE
utils/bench medium # synthetic corpus
utils/bench /path/to/project # specific directoryThe --stats flag on rubydex_cli prints a timing breakdown using the internal timer system.
The phases are:
| Phase | What it measures |
|---|---|
| Initialization | Setup and configuration |
| Listing | File discovery (walking directories, filtering .rb files) |
| Indexing | Parsing Ruby files and extracting definitions/references |
| Resolution | Computing fully qualified names, resolving constants, linearizing ancestors |
| Integrity check | Validating graph consistency (optional) |
| Querying | Building query indices |
| Cleanup | Time not accounted for by other phases |
It also prints:
- Maximum RSS in bytes and MB
- Declaration/definition counts and breakdown by kind
- Orphan rate (definitions not linked to declarations)
samply uses the dtrace backend on macOS which may need elevated permissions. If you get
permission errors:
sudo samply record rust/target/profiling/rubydex_cli <TARGET_PATH> --statsOr grant Terminal/iTerm the "Developer Tools" permission in System Settings > Privacy & Security.
If the flamegraph shows mostly [unknown] frames:
- Make sure you built with
--profile profiling(not--release) - Verify debug symbols:
dsymutil -s rust/target/profiling/rubydex_cli | head -20should show symbol entries. - On macOS, ensure
strip = falsein the profiling profile
Indexer performance can vary ±5% between runs due to OS scheduling, file system caching, etc. For reliable comparisons, run 3 times and take the median, or at minimum run twice and check consistency before drawing conclusions.