- OTLP Profiles export (`--otlp-endpoint`): stream profiling data via gRPC in OpenTelemetry Profiles v1development format to Pyroscope, devfiler, OTel Collector, or any OTLP-compatible backend. Proto compilation uses `protox` (pure-Rust, no `protoc` binary needed) with `tonic` for gRPC transport.
- Two export modes, auto-selected: pre-symbolized mode sends function names directly (works with Pyroscope out of the box); native-address mode sends real ELF virtual addresses with `/proc/pid/maps` mappings and htlhash build IDs (for devfiler's server-side symbolization). Mode is chosen automatically based on whether a symbol server is configured.
- Symbol server (`symbol-server` crate): standalone daemon that accepts ELF binary uploads from profile-bee, extracts symbols from `.symtab`/`.dynsym` with Rust/C++ demangling, and serves devfiler-compatible symbfiles (zstd-compressed protobuf with delta-encoded address ranges). Devfiler auto-fetches symbols via `--symb-endpoint`.
- Embedded symbol server (`--symbol-server-listen <port>`): runs the symbol server inside the profile-bee process for single-binary workflows. No separate daemon needed. Feature-gated behind `symbol-server` (default on).
- Automatic binary upload (`--symbol-server <url>`): discovers ELF binaries from `/proc/*/maps` for all running processes and uploads them to the symbol server in the background. Deduplicates by path, verifies ELF magic, with connect/request timeouts.
- Continuous profiling mode (`--flush-interval <ms>`): run indefinitely (or until `--time` expires), uploading a batch of samples every N milliseconds. Headless alternative to `--serve` without HTTP server overhead.
- Shared `profile-bee-symbols` crate: htlhash FileId computation (SHA-256 of head+tail+length, compatible with devfiler/Elastic) and ELF symbol extraction, shared between profile-bee and symbol-server.
- OTLP works in all modes: `--tui`, `--serve`, `--tui --serve` (combined), batch, and `--flush-interval` all support OTLP export via `--otlp-endpoint`.
- Real instruction pointer addresses: `StackFrameInfo.address` is now populated with raw IPs after blazesym symbolization, enabling the native-address OTLP path.
- ELF VA normalization: runtime virtual addresses are normalized to ELF VA space (`runtime_addr - mapping_start + file_offset`) so devfiler's SymTree interval lookups resolve correctly.
- Non-fatal OTLP errors: transport failures log a warning and reset the gRPC client for automatic reconnection on the next flush cycle. Profiling and local output sinks continue uninterrupted.
- 10-second timeouts on all OTLP `connect()` and `export()` calls via `tokio::time::timeout` to prevent indefinite blocking.
- Atomic symbfile writes: symbol-server uses temp+rename for ranges and metadata.json to prevent partial entries.
- Blocking work off async workers: both symbol-server and embedded server use `tokio::task::spawn_blocking` for ELF parsing and symbfile generation.
- Readiness signal for embedded server: `spawn_embedded_server` returns a `oneshot::Receiver<()>` that fires after `TcpListener::bind` succeeds, eliminating the startup race.
- C++ demangling priority: C++ Itanium demangling is attempted before Rust to prevent `_ZN...` names from being mis-parsed by `rustc_demangle`.
- Feature independence: `opt.tui` is cfg-gated in the OTLP block so `otlp` compiles without the `tui` feature.
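
The ELF VA normalization entry above can be illustrated with a minimal sketch. It applies only the formula quoted in that entry (`runtime_addr - mapping_start + file_offset`); the `Mapping` type, its field names, and the `normalize` function are hypothetical and are not taken from profile-bee's source.

```rust
/// One executable mapping, as it might be read from /proc/<pid>/maps.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy)]
pub struct Mapping {
    pub start: u64,       // runtime start address of the mapping
    pub end: u64,         // runtime end address (exclusive)
    pub file_offset: u64, // offset column of the maps entry
}

/// Apply `runtime_addr - mapping_start + file_offset`.
/// Returns None when the address lies outside the mapping or the sum overflows.
pub fn normalize(runtime_addr: u64, m: &Mapping) -> Option<u64> {
    if runtime_addr < m.start || runtime_addr >= m.end {
        return None;
    }
    (runtime_addr - m.start).checked_add(m.file_offset)
}
```

Returning `Option` keeps out-of-range addresses from silently producing a bogus lookup key; whether the real code handles that case the same way is not stated in the notes above.
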
| Flag | Description |
|---|---|
| `--otlp-endpoint <host:port>` | OTLP gRPC endpoint for profile export |
| `--otlp-insecure` | Use plaintext gRPC (default: true) |
| `--otlp-service-name <name>` | Service name for resource attributes |
| `--symbol-server <url>` | External symbol server URL for binary upload |
| `--symbol-server-listen <port>` | Embedded symbol server port |
| `--flush-interval <ms>` | Continuous profiling flush interval |
- New `docs/otlp_export.md`: full guide covering architecture, CLI flags, Pyroscope/devfiler setup, symbol server endpoints, and symbolization approaches across the ecosystem (devfiler, Pyroscope, Parca, OTel Collector, Grafana Alloy).
- Updated `docs/NEXT_STEPS.md` with OTLP export status and future enhancement roadmap.
- Updated README with OTLP and symbol-server feature descriptions and quick-start examples.
- Node.js profiling support: profile Node.js applications with JavaScript function names resolved via V8's perf-map files. When spawning via `probee -- node app.js`, automatically injects `NODE_OPTIONS` with `--perf-basic-prof` (writes `/tmp/perf-<pid>.map`) and `--interpreted-frames-native-stack` (enables frame pointers in interpreter frames). blazesym's built-in perf-map support reads these files during symbolization.
- V8 symbol formatting: V8 perf-map symbols like `LazyCompile:*processData /app/server.js:42:5` are formatted into clean display names (`processData (server.js:42)`) for readable flamegraphs. Handles all V8 symbol types: optimized (`*`), interpreted (`~`), builtins, stubs, regex, and eval.
- FP-only zones for JIT code: anonymous executable memory mappings (V8 JIT, JVM HotSpot, etc.) are registered in the eBPF LPM trie with `shard_id=SHARD_NONE` so the unwinder uses frame-pointer walking through JIT regions while preserving DWARF unwinding for surrounding native frames. Enables correct mixed native/JIT stack traces with `--dwarf`.
- Node.js missing perf-map warning: when profiling an existing Node.js process via `--pid` without a perf-map file, prints a warning with instructions to restart with `--perf-basic-prof` or use `probee -- node <script>` for automatic injection.
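
As a rough illustration of the V8 symbol formatting entry above, the sketch below turns the example symbol from that entry into the display name it quotes. The function name `format_v8_symbol` and the parsing rules are assumptions for illustration; the real formatter covers more symbol kinds (builtins, stubs, regex, eval) than this sketch attempts.

```rust
/// Format a V8 perf-map symbol such as
/// "LazyCompile:*processData /app/server.js:42:5" as "processData (server.js:42)".
/// Hypothetical helper; not profile-bee's actual implementation.
pub fn format_v8_symbol(raw: &str) -> String {
    // Drop a leading "Kind:" tag (e.g. "LazyCompile:") when one is present.
    let body = match raw.split_once(':') {
        Some((kind, rest))
            if !kind.is_empty() && kind.chars().all(|c| c.is_ascii_alphanumeric()) =>
        {
            rest
        }
        _ => raw,
    };
    // Strip the optimization markers: '*' (optimized) and '~' (interpreted).
    let body = body.trim_start_matches(|c: char| c == '*' || c == '~');

    // Split "name location"; symbols without a location are returned as-is.
    let (name, location) = match body.split_once(' ') {
        Some(parts) => parts,
        None => return body.to_string(),
    };

    // The location is "path:line:col"; split from the right so that colons
    // inside the path do not confuse the parse.
    let mut parts = location.rsplitn(3, ':');
    let (line, path) = match (parts.next(), parts.next(), parts.next()) {
        (Some(_col), Some(line), Some(path)) => (line, path),
        _ => return name.to_string(),
    };

    // Keep only the file name for a compact flamegraph label.
    let file = path.rsplit('/').next().unwrap_or(path);
    format!("{name} ({file}:{line})")
}
```
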
- E2E test infrastructure: `run_test` now recognizes exit code 77 as SKIP (autotools convention) with distinct yellow status output, so missing optional dependencies (e.g. `node`) are reported accurately instead of silently passing.
- Node.js E2E tests: three new test cases covering sample collection, JS function name resolution, and DWARF mode with FP-only JIT zones. Tests skip gracefully when `node` is not installed.
- Unified output flag (`-o`/`--output`): replace per-format flags (`--svg`, `--html`, `--json`, `--collapse`, `--pprof`, `--codeguru`) with a single repeatable `-o <file>` flag. Format is inferred from file extension: `.svg`, `.html`, `.json`, `.folded`, `.pb.gz`, `.pprof`, `.codeguru.json`. Multiple outputs at once: `-o flame.svg -o profile.pb.gz`. (#88)
- Unified event/probe flag (`-e`/`--event`): replace `--kprobe`, `--uprobe`, `--tracepoint` with a single `-e <spec>` flag using prefix syntax: `kprobe:fn`, `uprobe:spec`, `uretprobe:spec`, `tracepoint:cat:name` (short forms: `k:`, `u:`, `ur:`, `t:`, `tp:`). Bare names default to uprobe. (#25)
- Output format inference: `infer_output_format()` maps file extensions to output sinks. Compound extensions like `.codeguru.json` and `.cg.json` are supported. Unknown extensions produce a clear error message listing valid options.
- Event prefix parsing: `parse_event_prefix()` in `probe_spec.rs` with doc-tests covering all 9 prefix variants. `EventKind` enum exported from the library for programmatic use.
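
The extension-based inference described above might look roughly like the following. This is a sketch under assumptions: the `OutputFormat` enum, the `infer_format` name, and the mapping of `.folded` to the collapse format are illustrative guesses, not the signature of the real `infer_output_format()`.

```rust
/// Output formats named in the entries above. Hypothetical enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputFormat {
    Svg,
    Html,
    Json,
    Collapse,
    Pprof,
    CodeGuru,
}

/// Suffix table. Compound extensions come first so that
/// "x.codeguru.json" is not mistaken for plain ".json".
const SUFFIXES: &[(&str, OutputFormat)] = &[
    (".codeguru.json", OutputFormat::CodeGuru),
    (".cg.json", OutputFormat::CodeGuru),
    (".pb.gz", OutputFormat::Pprof),
    (".pprof", OutputFormat::Pprof),
    (".svg", OutputFormat::Svg),
    (".html", OutputFormat::Html),
    (".folded", OutputFormat::Collapse),
    (".json", OutputFormat::Json),
];

/// Infer the output format from a file name, or return an error
/// that lists the accepted extensions.
pub fn infer_format(path: &str) -> Result<OutputFormat, String> {
    let lower = path.to_ascii_lowercase();
    for &(suffix, format) in SUFFIXES {
        if lower.ends_with(suffix) {
            return Ok(format);
        }
    }
    let valid: Vec<&str> = SUFFIXES.iter().map(|&(s, _)| s).collect();
    Err(format!(
        "cannot infer output format from '{path}'; valid extensions: {}",
        valid.join(", ")
    ))
}
```

Ordering the table from most to least specific is the design point here: a first-match scan then handles compound extensions without any special casing.
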
- Legacy flags preserved: `--svg`, `--html`, `--json`, `--collapse`, `--pprof`, `--codeguru`, `--kprobe`, `--uprobe`, `--tracepoint` continue to work (hidden from `--help` but documented in the examples section). 100% backward compatible.
- Simplified help output: `--help` now shows `-o` and `-e` as the primary interface, with legacy flags mentioned in the examples footer. Usage line updated to `(sudo) probee [OPTIONS] [-o <FILE>...] [-e <EVENT>...] [-- <COMMAND>...]`.
- `--list-probes` accepts unified prefix: `--list-probes 'uprobe:pthread_*'` works alongside bare specs.
- TUI `build_profiler_config` uses event parsing: both batch and TUI paths share the same `parse_event_specs()` code path.
- README and docs updated: all ~37 example commands converted to new syntax.
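
The `-e` prefix syntax shared by these code paths can be sketched as a small splitter. The `ProbeKind` enum and `split_event_prefix` function are hypothetical stand-ins; the variants of the real `EventKind` enum and the behavior of `parse_event_prefix()` beyond what is listed in these notes are not known here.

```rust
/// Probe kinds selectable through the `-e` prefix syntax. Hypothetical enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProbeKind {
    Kprobe,
    Uprobe,
    Uretprobe,
    Tracepoint,
}

/// Split an event spec into (kind, remainder).
/// Recognizes the four long prefixes and five short forms listed above;
/// anything else, including bare names, is treated as a uprobe spec.
pub fn split_event_prefix(spec: &str) -> (ProbeKind, &str) {
    if let Some((prefix, rest)) = spec.split_once(':') {
        let kind = match prefix {
            "kprobe" | "k" => Some(ProbeKind::Kprobe),
            "uprobe" | "u" => Some(ProbeKind::Uprobe),
            "uretprobe" | "ur" => Some(ProbeKind::Uretprobe),
            "tracepoint" | "t" | "tp" => Some(ProbeKind::Tracepoint),
            _ => None,
        };
        if let Some(kind) = kind {
            return (kind, rest);
        }
    }
    // No recognized prefix: keep the whole spec and default to uprobe.
    (ProbeKind::Uprobe, spec)
}
```

Only the first colon is consumed, so `tracepoint:cat:name` keeps its `cat:name` remainder intact for later parsing.
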
- eBPF process lifecycle tracking: new `sched_process_exec` and broadened `sched_process_exit` tracepoints detect process exec and exit events in the kernel. Events are delivered to userspace via a dedicated ring buffer. Enabled automatically when DWARF unwinding is active; agents can enable it explicitly via `SessionConfig::track_process_lifecycle`. (#89)
- `ProcessMetadataCache` (`process_metadata` module): lazy, capacity-bounded cache of per-process metadata read from `/proc/[pid]/`. Provides `cmdline`, `cwd`, `environ`, `exe`, and mount namespace inode for any PID seen during profiling. Integrates with lifecycle events for automatic cache invalidation (exec) and eviction (exit). Agents can enrich stack traces with `cache.get_or_load(pid)` and `cache.environ_var(pid, "MY_VAR")`.
- `ProcessEvent` shared type: unified 16-byte `#[repr(C)]` struct in `profile-bee-common` carrying event type, PID, and timestamp for both exec and exit events.
- Thread exit filtering in eBPF: `handle_process_exit()` and `handle_process_exec()` return early when `tid != tgid`, avoiding thousands of spurious ring buffer events on thread-heavy workloads (Java, Go runtime threads).
- Deferred metadata eviction: exit events are queued in `pending_exit_pids` and evicted at the start of the next `drain_events()` cycle, covering the async delivery race where StackInfo and exit event for the same PID arrive in different drain windows.
- Trace count flushed on exec: when a PID execs, its accumulated `trace_count` entries are flushed before cache invalidation, so the old binary's samples are not lost.
- PID reuse detection: `ProcessMetadataCache::get_or_load()` validates cached entries against `/proc/[pid]/stat` starttime; mismatches trigger automatic reload.
- DWARF table reload on exec: exec events are forwarded to the DWARF thread, which reloads unwind tables for the new binary.
- Debug impl redacts secrets: `ProcessMetadata`'s `Debug` output shows entry counts instead of raw `environ`/`cmdline` values, preventing accidental secret leakage in logs.
- Robust `/proc` stat handling: `ProcessMetadata::load()` returns `None` when `/proc/[pid]/stat` is unreadable (process already exited), preventing `start_time = 0` entries from bypassing PID-reuse detection. `get_or_load()` removes stale entries when the process disappears between cache hit and validation.
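
For readers unfamiliar with the starttime check used for PID-reuse detection, here is a minimal sketch of reading it from a `/proc/[pid]/stat` line. `starttime` is field 22 of that file per proc(5); the function names are hypothetical and the real `ProcessMetadata::load()` may parse the file differently.

```rust
/// Extract `starttime` (field 22 of /proc/[pid]/stat) from one stat line.
/// Returns None if the line is malformed, mirroring the "unreadable stat
/// yields None" behavior described above. Hypothetical helper.
pub fn parse_starttime(stat_line: &str) -> Option<u64> {
    // Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    // or parentheses, so resume parsing after the LAST ')'.
    let after_comm = &stat_line[stat_line.rfind(')')? + 1..];
    // The first token after comm is field 3 (state), so field 22 is index 19.
    after_comm.split_ascii_whitespace().nth(19)?.parse().ok()
}

/// A cached entry is still valid only if the start time is unchanged;
/// a different value means the PID was reused by a new process.
pub fn is_same_process(cached_starttime: u64, stat_line: &str) -> bool {
    parse_starttime(stat_line) == Some(cached_starttime)
}
```

Treating a parse failure as "not the same process" avoids the `start_time = 0` pitfall mentioned above, since a missing value never compares equal to a cached one.
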
- Group by process (`--group-by-process`): prefix each stack with `process_name (pid)` to split flamegraphs into per-process sub-trees. Works with all output formats. (#86)
- TUI process list view: new Processes tab (via Tab key) showing all processes sorted by sample count with CPU% bar visualization. Enter to zoom into a process.
- TUI expandable call tree: press `t` to toggle tree mode in Top or Processes views. Shows a perf report-style expandable call tree with overhead% and self%, expand/collapse with Enter/h. Expanded state persists across live data refreshes.
- TUI PID mode toggle: press `p` to toggle PID mode on the fly, splitting the flamegraph by process without restarting.
- CodeGuru idle stack classification: idle/swapper stacks (pid == 0) now use the `IDLE` counter type, which CodeGuru excludes from CPU and Latency views.
- Library API (`ProfilingSession`): profile-bee can now be used as a Rust library, not just a CLI binary. `ProfilingSession::new(config)` consolidates the entire eBPF + DWARF setup sequence into a single call. Supports batch and streaming modes via `OutputSink` trait. (#48)
- pprof output (`--pprof`): gzip-compressed protobuf format compatible with `go tool pprof`, Grafana/Pyroscope, Speedscope, Datadog, and Polar Signals/Parca.
- AWS CodeGuru Profiler JSON (`--codeguru`): recursive call-tree format with proper thread-state counter types (`RUNNABLE` for on-CPU, `WAITING` for off-CPU). Uploadable via AWS CLI.
- Direct CodeGuru upload (`--codeguru-upload`): uploads profiles directly to CodeGuru's `PostAgentProfile` API using the AWS SDK. Uses the standard credential chain. Included by default (behind `aws` feature flag).
- CodeGuru format documentation: `docs/codeguru_format.md` with full schema reference covering all 7 counter types, metadata fields, and CodeGuru console visualization views.
- Library refactor: moved ~660 lines of orchestration from the binary into reusable library modules: `session.rs`, `event_loop.rs`, `pipeline.rs`. Binary reduced from 1753 to ~1100 lines.
- println/eprintln replaced with tracing in all library code (spawn.rs, ebpf.rs, html.rs, trace_handler.rs).
- Parameterized web server port: `html::start_server_on_port(port)` for library consumers.
- Process-exit monitoring added to TUI modes: `--pid` auto-stop and DWARF cleanup now work in `--tui` and `--tui --serve` modes (was missing).
- Ring buffer tasks exit on receiver drop: prevents background tasks from running indefinitely after profiling completes.
- Ctrl-C handler logs errors instead of treating signal setup failure as Ctrl-C received.
- Size checks on ring buffer reads: defensive guard before unsafe pointer casts.
- `syscall_name_to_nr` gated to x86_64: `#[cfg(target_arch = "x86_64")]` with stubs for other architectures.
- Sink duration accuracy: pprof and CodeGuru sinks now receive actual profiling duration via `set_actual_duration_ms` instead of the requested timeout.
- TUI warns about ignored output flags: `--tui --pprof` etc. now prints a warning instead of silently dropping the output.
- Fix misleading "DWARF-unwound stack" log message appearing without `--dwarf` (the stacked_pointers map is shared by FP and DWARF paths).
- Fix `CodeGuruUploadSink` panic ("Cannot start a runtime from within a runtime"): replaced `block_on()` with `spawn()` + channel bridge.
- Fix `event_loop.rs` batch-mode channel disconnect not setting `stopped = true`.
- Fix `session.rs` `group_by_cpu` hardcoded to `false`; it is now wired through `SessionConfig`.
- Replace PROC_INFO HashMap with EXEC_MAPPINGS LPM trie: O(log n) address-to-mapping lookups replace O(n) linear scan, removing the per-process 8-mapping limit. Supports up to 200K total LPM entries across all processes.
- Correct ExecMappingKey alignment: changed from `#[repr(C, packed)]` to `#[repr(C)]` with explicit padding to avoid unaligned 64-bit access in eBPF and userspace.
- Prevent overflow in `summarize_address_range`: use u128 arithmetic so range-length computation cannot wrap when address ranges approach `u64::MAX`.
- DWARF mapping refresh rebuilds from scratch: process mappings are recomputed each scan instead of cloning and skipping existing ranges, preventing stale shard/load_bias data when memory ranges are reused.
- Always propagate exec mapping updates to eBPF: `send_refresh` is now called after every successful `refresh_process`, not only when new shards are created, ensuring dlopen'd libraries with cached binaries get LPM trie entries (matches lightswitch's approach of unconditionally writing mappings to BPF).
- Reduce refresh channel overhead: `send_refresh` now only clones the changed process's mappings instead of all tracked processes.
- Surface LPM trie insert failures: replaced silent `let _ = trie.insert(...)` with explicit error logging including tgid, mapping range, and block details.
- Guard against invalid mapping ranges: added `debug_assert` to catch corrupted begin/end values before LPM trie population.
- Derive LPM key bit-width from struct size: replaced magic `128` in `LpmKey::new` calls with `EXEC_MAPPING_KEY_BITS` constant derived from `size_of::<ExecMappingKey>()`.
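
The alignment and bit-width entries above fit together as in the sketch below. The field layout of `ExecMappingKeySketch` is an assumption chosen to produce a 16-byte key; the real `ExecMappingKey` fields are not given in these notes.

```rust
use std::mem::size_of;

/// Hypothetical LPM key layout: a 32-bit tgid, explicit padding, and a
/// 64-bit address. With `#[repr(C)]` and the explicit pad, the u64 field
/// sits at offset 8, so it is naturally aligned, unlike a packed layout.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct ExecMappingKeySketch {
    pub tgid: u32,
    pub _pad: u32,
    pub address: u64,
}

/// Key width in bits, derived from the struct instead of a magic `128`.
pub const KEY_BITS: u32 = (size_of::<ExecMappingKeySketch>() * 8) as u32;

// Compile-time guard: fails the build if the layout ever changes size.
const _: () = assert!(size_of::<ExecMappingKeySketch>() == 16);
```

Deriving the constant means a later field change updates the prefix length automatically, and the compile-time assertion makes any such change visible in review.
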
- Added 7 unit tests for `summarize_address_range` edge cases (empty range, single address, power-of-two boundaries, near `u64::MAX`, full address space).
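
The overflow these tests guard against is easy to reproduce in isolation. The sketch below shows only the range-length step, assuming an inclusive `[begin, end]` range; `range_len` is a hypothetical helper and the rest of `summarize_address_range` is not reconstructed here.

```rust
/// Length of the inclusive range [begin, end], computed in u128.
/// In u64, `end - begin + 1` wraps to 0 for the full address space
/// (begin = 0, end = u64::MAX); widening first avoids that.
pub fn range_len(begin: u64, end: u64) -> u128 {
    if end < begin {
        return 0; // empty or inverted range
    }
    (end as u128) - (begin as u128) + 1
}
```
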
- Updated DWARF design docs to reflect LPM trie architecture (replaces old PROC_INFO references).
- Off-CPU profiling (`--off-cpu`): trace context switches via `kprobe:finish_task_switch` to find where threads block on I/O, locks, or sleep. Per-CPU tracking with configurable block-time filters (`--min-block-time`, `--max-block-time`). All output formats supported (TUI, SVG, HTML, JSON, collapse, web).
- TUI mouse support: click to select frames, double-click to zoom, scroll wheel navigation. Enabled by default (`--no-tui-mouse` to disable).
- Web UI improvements: rewritten flamegraph viewer with viewport-sized canvas, live controls, client-side accumulate mode, sort-by-name, bottom-up toggle, pause/refresh, green live indicator. Fixed zoom, Y layout, and ancestor frame visibility.
- ArrayOfMaps for DWARF shards: replaced 8 individual `shard_0..shard_7` eBPF Array maps with a single `BPF_MAP_TYPE_ARRAY_OF_MAPS`. Supports up to 64 binaries (was 8) with up to 131K unwind entries each (was 65K). Inner maps created on-demand to reduce idle memory usage.
- Oversized unwind tables truncated instead of skipped: large binaries (e.g. glibc) now get partial DWARF coverage instead of none.
- DWARF refresh and truncation logs moved to debug level: no longer interfere with TUI mode. Use `RUST_LOG=debug` to see them.
- Test fixture binaries removed from git: rebuilt from source via `tests/build_fixtures.sh`. E2E test runner auto-detects missing or stale fixtures.
- Fix HTML output script injection vulnerability: escape user-controlled strings in generated HTML
- Fix HTML file write ordering: defer writes to avoid partial output on error
- Fix HTML replacement order for correct template substitution
- Fix eBPF `get_stackid` type inference with updated aya API (turbofish annotations)
- Fix kernel <5.14 compatibility for ArrayOfMaps: use fixed `max_entries` matching the eBPF template
- `aya`/`aya-ebpf`: switched to `github.com/zz85/aya` branch `array-of-maps` (adds `BPF_MAP_TYPE_ARRAY_OF_MAPS` support)
- E2E test suite expanded to 16 tests (added off-CPU profiling tests)
- Prebuilt eBPF binary updated for `cargo install` compatibility
The first release published to crates.io. Install with `cargo install profile-bee`; no nightly Rust required.
- Interactive TUI flamegraph viewer: live, real-time flamegraphs directly in your terminal with vim-style navigation, search, zoom, and freeze/unfreeze. Forked and adapted from flamelens. Three update modes: reset, accumulate, and decay.
- DWARF-based stack unwinding in eBPF: profiles binaries compiled without frame pointers (`-O2`/`-O3`). Parses `.eh_frame` sections into flat unwind tables loaded into eBPF maps for in-kernel stack walking. Supports PIE, shared libraries, vDSO, PLT stubs, and signal trampolines.
- Smart uprobe/uretprobe targeting: GDB-style symbol resolution with auto-discovery across loaded ELF binaries. Supports glob (`pthread_*`), regex (`/pattern/`), demangled C++/Rust names, source file:line (DWARF), explicit library prefixes, and multi-attach. Discovery mode (`--list-probes`) to inspect matches without attaching.
- Raw tracepoint support: bypasses BPF LSM restrictions on `PERF_EVENT_IOC_SET_BPF`. Multi-tier fallback: syscall-specific raw_tp → task_pt_regs raw_tp → generic raw_tp → perf tracepoint.
- TUI mode (`--tui`) with real-time flamegraph updates, configurable refresh interval (`--tui-refresh-ms`), and update modes (`--update-mode reset|accumulate|decay`)
- Combined TUI + web server mode (`--tui --serve`) for simultaneous terminal and browser access
- DWARF unwinding (`--dwarf`) with background thread for detecting `dlopen`-loaded libraries within ~1 second
- Smart uprobes (`--uprobe`) with glob, regex, demangled name, and source:line matching
- Uprobe discovery mode (`--list-probes`) to search symbols without attaching
- Raw tracepoint attachment for kprobe, tracepoint, and syscall events
- Frame pointer unwinding in eBPF: custom stack walker using `pt_regs` for deeper stacks than `bpf_get_stackid`
- Process spawning (`--cmd`, `-- <command>`): spawn a process and profile it, auto-terminates when it exits
- PID exit detection: profiler automatically stops when `--pid` target process exits
- Prebuilt eBPF binary: bundled for `cargo install` without nightly Rust; `build.rs` auto-detects fresh builds for development
- blazesym integration: symbol resolution via blazesym library with Rust and C++ demangling
- Multi-process DWARF: system-wide profiling with per-process unwind tables and sharded eBPF array maps
- Fix TUI/serve modes stopping after 10s due to `--time` defaulting to 10000ms unconditionally
- Fix `TracePointContext` incorrectly cast to `pt_regs` (tracepoint data struct != registers)
- Fix `sys_exit` tracepoint filtering reading return value instead of syscall NR from `args[1]`
- Fix syscall tracepoint fallback using per-syscall names (`sys_enter_write`) that don't exist as raw tracepoints
- Fix double-counting in headless `process_profiling_data`
- Fix combined mode missing stopping mechanisms (timer, Ctrl-C, child/PID exit)
- Fix shared library unwinding race condition
- Fix signal trampoline unwinding (`__restore_rt`)
- Fix PID filtering and empty stacks with `--cmd`/`--`
- Fix spawned processes not terminated on profiler exit
- Fix timing issue: load DWARF tables before setting `TARGET_PID`
- Fix BPF verifier rejection with DWARF unwinding constants
- Binary names changed: `profile-bee` → `probee` (primary) and `pbee` (short alias)
- `--dwarf` now defaults to `false` (frame pointer unwinding is the default for stability/performance)
- `--time` behavior changed in TUI/serve modes: defaults to 0 (unlimited) instead of 10000ms. CLI mode retains the 10s default.
- Compact `UnwindEntry` from 32 → 12 bytes (u32 PC, deduplicated consecutive entries)
- Sharded array maps for unwind tables (replaced single HashMap)
- Build-ID based caching for unwind table lookups
- Inode/metadata-based cache keys instead of reading full binaries
- BPF-based stack aggregation to reduce kernel ↔ userspace transfers
- `cargo install profile-bee` works on stable Rust (prebuilt eBPF binary bundled)
- TUI feature enabled by default (`--no-default-features` to exclude)
- Crate metadata added for crates.io publishing (license, description, repository)
- GitHub Actions CI workflow for Rust packages and E2E tests
- E2E test framework (`tests/run_e2e.sh`) with 14 test cases covering FP, DWARF, deep stacks, shared libraries, PIE, and Rust binaries
- Test fixtures: C binaries in 6 variants (FP/no-FP × O0/O2), Rust binary, shared library, PIE, signal handler
- Linux x86_64
- DWARF unwinding: x86_64 only (ARM support planned)
- Kernel >= 4.15 for basic profiling, >= 5.15 for raw tracepoint with task_pt_regs