Add HotSpot JVM memory metrics via GCTracer uprobe and USDT probes by REASY · Pull Request #2305 · open-telemetry/opentelemetry-ebpf-instrumentation

REASY · 2026-06-11T18:04:58Z

This PR adds HotSpot JVM runtime metrics support to OBI. The remaining Beyla-facing product wiring will be handled in grafana/beyla.

Summary

Adds JVM runtime metric collection using:

a uprobe on HotSpot GCTracer::report_gc_heap_summary
HotSpot USDT probes:
- hotspot:mem__pool__gc__begin
- hotspot:mem__pool__gc__end

It also adds the generic USDT attach path needed for this work: .note.stapsdt parsing, USDT argument spec parsing, BPF-side argument lookup, semaphore/ref-counter support, and
cleanup of USDT IP map entries when links are closed.

Details

JVM runtime events are emitted from eBPF, decoded with generated BPF structs, converted into RuntimeMetricSnapshot JVM sections, and routed through the shared runtime metrics path
used by both the Go tracer and generic tracer.

Collection is controlled by:

jvm_runtime_metrics.enabled
jvm_runtime_metrics.sampling_interval

Export is controlled by the application_jvm metrics feature.

The BPF side uses jvm_sampling_interval_ns as both the sampling interval and feature gate, so disabled JVM runtime metrics do not do USDT argument reads. The BPF side uses jvm_sampling_interval_ns as both the sampling interval and feature gate, so disabled JVM runtime metrics do not do USDT argument reads. USDT spec lookups are keyed by PID, PID namespace ID, and instruction pointer. Userspace inserts both the host PID and NSpid aliases for the target process, scoped by PID namespace ID, so lookups work whether valid_pid() returns the host PID or a namespace PID. Emitted events still carry namespace PID and PID namespace ID for service decoration.

Metrics added:

jvm.memory.used / jvm_memory_used_bytes
jvm.memory.committed / jvm_memory_committed_bytes
jvm.memory.limit / jvm_memory_limit_bytes
jvm.memory.used_after_last_gc / jvm_memory_used_after_last_gc_bytes
obi.jvm.heap.used / obi_jvm_heap_used_bytes

Attributes added/used:

jvm.memory.type
jvm.memory.pool.name
jvm.gc.phase

Notes

GCTracer::report_gc_heap_summary and GCHeapSummary are HotSpot internals rather than a public JVM ABI. I checked the current OpenJDK LTS lines 8, 11, 17, 21, and 25, and the
method signature plus the relevant GCHeapSummary layout are stable across those versions.

I also evaluated github.com/parca-dev/usdt. This PR keeps the local USDT parser for now because OBI needs target-ELF-driven argument parsing and a BPF map ABI tailored to this
tracer. A TODO is left in the code to revisit Parca USDT if it exposes reusable pieces that fit this path.

Testing

make docker-generate
make compile
make lint
go test ./pkg/appolly/app/runtime
go test ./pkg/ebpf/common
go test ./pkg/runtimemetrics
go test ./pkg/internal/ebpf/generictracer
go test -run JVM ./pkg/export/otel ./pkg/export/prom
go test ./internal/test/integration -run '^TestJVMRuntimeMetrics$' -count=1 -v
go test ./internal/test/integration -run '^TestJVMRuntimeEventsLive$' -count=1 -v

linux-foundation-easycla · 2026-06-11T18:05:06Z

The committers listed above are authorized under a signed CLA.

✅ login: REASY / name: Artavazd Balaian (0c0e978, 44fad90, 5401eea, 58e545c, 887d38c, 9b0a08d, a7f444f, fb9a9e4)

grcevski · 2026-06-11T18:07:14Z

Just FYI, this PR started here accidentally grafana#50. I already put some comments in which were already addressed by @REASY. I will review this once more later today.

codecov · 2026-06-11T18:11:13Z

Codecov Report

❌ Patch coverage is 67.29607% with 433 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.04%. Comparing base (2f9f4c4) to head (6d65c91).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
pkg/ebpf/usdt.go	57.75%	108 Missing and 39 partials ⚠️
pkg/ebpf/instrumenter.go	57.00%	72 Missing and 20 partials ⚠️
pkg/internal/ebpf/generictracer/generictracer.go	68.51%	38 Missing and 13 partials ⚠️
pkg/export/otel/metrics_jvm.go	66.08%	32 Missing and 7 partials ⚠️
pkg/export/otel/metrics_runtime.go	55.31%	13 Missing and 8 partials ⚠️
pkg/ebpf/common/jvm.go	58.13%	14 Missing and 4 partials ⚠️
pkg/appolly/app/runtime/jvm.go	83.56%	11 Missing and 1 partial ⚠️
pkg/runtimemetrics/reader.go	72.72%	9 Missing and 3 partials ⚠️
pkg/export/prom/prom_jvm.go	85.29%	8 Missing and 2 partials ⚠️
pkg/ebpf/common/common.go	52.94%	6 Missing and 2 partials ⚠️
... and 8 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2305      +/-   ##
==========================================
- Coverage   69.22%   69.04%   -0.18%     
==========================================
  Files         325      331       +6     
  Lines       42738    43937    +1199     
==========================================
+ Hits        29584    30337     +753     
- Misses      11449    11792     +343     
- Partials     1705     1808     +103

Flag	Coverage Δ
integration-test	`50.96% <68.41%> (-0.19%)`	⬇️
integration-test-arm	`27.14% <15.59%> (-0.33%)`	⬇️
integration-test-vm-5.15-lts	`28.70% <16.32%> (-1.84%)`	⬇️
integration-test-vm-6.18-lts	`28.41% <16.32%> (-0.69%)`	⬇️
k8s-integration-test	`37.24% <15.79%> (-0.60%)`	⬇️
oats-test	`35.57% <15.70%> (-1.14%)`	⬇️
unittests	`62.54% <59.64%> (+0.20%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

mmat11 · 2026-06-11T18:13:59Z

I'll review this carefully tomorrow, I see the USDT probes parsing/linking implementation in here which should be out of scope -- have you tried using https://github.com/parca-dev/usdt so we have the basic infrastructure already done for us?

REASY · 2026-06-11T18:23:19Z

I'll review this carefully tomorrow, I see the USDT probes parsing/linking implementation in here which should be out of scope -- have you tried using https://github.com/parca-dev/usdt so we have the basic infrastructure already done for us?

Hi, @mmat11

I can't comment on the maturity of that library and I do not see the usage of it here. If we accept that risk, I can refactor and use it.

Copilot

Pull request overview

This PR adds JVM (HotSpot) runtime memory metrics support to OBI by instrumenting HotSpot via a GCTracer::report_gc_heap_summary uprobe and HotSpot USDT probes, decoding the resulting eBPF events into Go JVMRuntimeEvents, and exporting them through both OTLP and Prometheus under a new application_jvm feature.

Changes:

Add JVM runtime metrics event types, decoding, and pipeline wiring (shared ringbuf → JVMRuntimeEvents queue → OTLP/Prom exporters).
Add first-class USDT attach support (USDT note parsing + argument-spec parsing + spec/IP maps keyed by PID namespace/PID/IP).
Add configuration, semantic registry schema entries, and integration/live tests for JVM metrics/events.

Reviewed changes

Copilot reviewed 54 out of 54 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
schemas/obi/groups/jvm_runtime.yaml	Adds OBI registry entries for JVM runtime attributes and the `beyla.jvm.heap.used` metric.
pkg/obi/config.go	Introduces `jvm_runtime_metrics` config and validation.
pkg/obi/config_test.go	Adds config tests for JVM runtime metrics settings and adjusts env isolation.
pkg/internal/procs/procsym.go	Extends `Sym` to retain the matched ELF symbol name.
pkg/internal/procs/proclang_linux.go	Adds exact vs substring symbol lookup for ELF symbol resolution.
pkg/internal/procs/proclang_linux_test.go	Adds tests covering exact and substring symbol lookup helpers.
pkg/internal/procs/proclang_darwin.go	Adds stub for substring symbol lookup on Darwin.
pkg/internal/ebpf/gotracer/gotracer.go	Routes JVM runtime metric records from the shared ring buffer to the JVM handler.
pkg/internal/ebpf/generictracer/jvm_live_test.go	Adds a privileged live eBPF test for JVM runtime events using real HotSpot probes.
pkg/internal/ebpf/generictracer/generictracer.go	Enables JVM runtime metric constants, adds libjvm uprobe + USDT probes, and dispatches JVM metric records.
pkg/internal/ebpf/generictracer/generictracer_test.go	Adds unit tests for JVM record decoding/dispatch and USDT probe exposure.
pkg/internal/ebpf/generictracer/generictracer_notlinux.go	Keeps non-Linux build stubs compiling after signature formatting change.
pkg/internal/appolly/appolly.go	Wires a JVM runtime events queue into the appolly pipeline when enabled.
pkg/export/prom/prom_jvm.go	Adds Prometheus exporter for JVM runtime metrics with expiring label sets.
pkg/export/prom/prom_jvm_test.go	Adds Prometheus exporter unit tests for JVM runtime metrics.
pkg/export/otel/metrics_jvm.go	Adds OTLP metrics exporter for JVM runtime metrics (gauges + “current value” up-down counters).
pkg/export/otel/metrics_jvm_test.go	Adds OTLP exporter unit tests validating produced metric shapes/attributes.
pkg/export/feature.go	Adds `FeatureApplicationJVM` and feature-mapper entry.
pkg/export/feature_test.go	Extends feature parsing tests to cover `application_jvm` and alias behavior.
pkg/export/attributes/names/attrs.go	Adds JVM attribute name constants (`jvm.memory.*`, `jvm.gc.phase`).
pkg/export/attributes/metric.go	Adds JVM metric name definitions for Prom/OTLP naming.
pkg/export/attributes/attr_defs.go	Adds JVM metric attribute groups to attribute-selection definitions.
pkg/ebpf/usdt.go	Implements USDT note parsing, arg-spec parsing, and target resolution from ELF + proc maps.
pkg/ebpf/usdt_test.go	Adds unit tests for USDT note parsing and arg-spec parsing (x86_64 + arm64).
pkg/ebpf/tracer.go	Adds `USDTTracer` interface and a JVM runtime events queue on `ProcessTracer`.
pkg/ebpf/tracer_linux.go	Passes JVM runtime events queue into the EBPF event context and attaches USDT probes on executable setup.
pkg/ebpf/tracer_darwin.go	Keeps Darwin compilation working after `Run` signature formatting.
pkg/ebpf/instrumenter.go	Adds USDT instrumentation path (maps/specs/IP map population + uprobe attach at USDT IP).
pkg/ebpf/instrumenter_test.go	Adds tests for substring symbol matching behavior and USDT IP-map PID derivation.
pkg/ebpf/common/jvm.go	Adds JVM runtime metric event dispatch + decoding/decorating helpers.
pkg/ebpf/common/jvm_test.go	Adds unit tests for JVM runtime metric record handling behavior.
pkg/ebpf/common/common.go	Adds `SymbolMatcher`, `USDTSpecManager`, `USDTProbeDesc`, and JVM runtime events queue in `EBPFEventContext`.
pkg/appolly/instrumenter.go	Threads JVM runtime events into the metrics sub-pipeline and adds OTLP/Prom JVM endpoints.
pkg/appolly/app/runtime/jvm.go	Adds Go-side raw event decoding and JVM metric event modeling.
pkg/appolly/app/runtime/jvm_test.go	Adds unit tests for JVM raw payload decoding, time conversion, and metric mapping.
internal/test/integration/jvm_runtime_metrics_test.go	Adds integration test validating JVM runtime metrics via Prometheus queries + weaver validation.
internal/test/integration/jvm_runtime_events_live_test.go	Adds docker-compose based live JVM runtime event test runner.
internal/test/integration/docker-compose-jvm-runtime-metrics.yml	Adds integration test compose stack for JVM runtime metrics.
internal/test/integration/docker-compose-jvm-runtime-events-live.yml	Adds compose stack to run the privileged live JVM runtime event test.
internal/test/integration/configs/obi-config-jvm-runtime-metrics.yml	Adds OBI config enabling `application_jvm` and JVM runtime metrics sampling.
internal/test/integration/components/jvm-runtime-event-test/Dockerfile	Adds image for running the privileged JVM live test (Go + Temurin).
internal/test/integration/components/javatestserver/src/main/java/de/fstab/demo/greeting/GreetingController.java	Adds `/gc` endpoint to trigger GC activity for integration tests.
internal/test/integration/components/docker/compose.go	Adds `Compose.Run` helper and avoids waiting for `obi` stop when running abort-on-exit stacks.
bpf/generictracer/types/jvm.h	Adds JVM event/key types for the generic tracer BPF programs.
bpf/generictracer/maps/jvm_mem_pool_samples.h	Adds internal LRU map for JVM memory-pool sampling.
bpf/generictracer/maps/jvm_heap_summary_samples.h	Adds internal LRU map for JVM heap-summary sampling.
bpf/generictracer/jvm.h	Adds sampling helpers and runtime-config constants for JVM BPF logic.
bpf/generictracer/jvm.c	Adds BPF uprobe + USDT programs emitting JVM runtime events into the shared ring buffer.
bpf/generictracer/generictracer.c	Includes the new JVM BPF source into the generic tracer build.
bpf/common/usdt.h	Adds BPF-side USDT arg decoding keyed by PID namespace/PID/IP and spec maps.
bpf/common/usdt_types.h	Adds shared USDT types/constants for BPF programs.
bpf/common/maps/obi_usdt_specs.h	Adds internal array map holding USDT argument specs.
bpf/common/maps/obi_usdt_ip_to_spec_id.h	Adds internal hash map mapping (pid, ns, ip) to spec IDs.
bpf/common/event_defs.h	Adds JVM event type IDs for the shared ring buffer event stream.

grcevski · 2026-06-12T12:55:49Z

I'll review this carefully tomorrow, I see the USDT probes parsing/linking implementation in here which should be out of scope -- have you tried using https://github.com/parca-dev/usdt so we have the basic infrastructure already done for us?

Hi, @mmat11

I can't comment on the maturity of that library and I do not see the usage of it here. If we accept that risk, I can refactor and use it.

If it's possible to use parca-dev/usdt that would be much better. Parca is an eBPF profiler from Polar Signals and well maintained project. It would be great if we could use that instead of maintaining our own version, if it's compatible with what you are adding.

REASY · 2026-06-12T13:12:59Z

Hi, @grcevski @mmat11

I checked parca-dev/usdt. It can likely replace part of our Go-side USDT parsing/linking code, but it is not a full drop-in for this PR. The main concern is that its BPF helper path uses bpf_get_attach_cookie, and the project documents Linux 5.15+ as a requirement for that. This repo still needs to support older kernels, including 5.10 in CI, so adopting its BPF-side model directly would change our compatibility assumptions.

A reasonable path may be to use Parca’s ELF/USDT argument parsing on the Go side, then adapt the result into our existing map-based BPF lookup by pid/ns/ip. I’d also want to verify HotSpot probe args don’t need any syntax that our current parser supports but Parca’s parser/helper path does not.

WDYT?

grcevski · 2026-06-12T13:16:30Z

Hi, @grcevski @mmat11

I checked parca-dev/usdt. It can likely replace part of our Go-side USDT parsing/linking code, but it is not a full drop-in for this PR. The main concern is that its BPF helper path uses bpf_get_attach_cookie, and the project documents Linux 5.15+ as a requirement for that. This repo still needs to support older kernels, including 5.10 in CI, so adopting its BPF-side model directly would change our compatibility assumptions.

A reasonable path may be to use Parca’s ELF/USDT argument parsing on the Go side, then adapt the result into our existing map-based BPF lookup by pid/ns/ip. I’d also want to verify HotSpot probe args don’t need any syntax that our current parser supports but Parca’s parser/helper path does not.

WDYT?

Thanks for looking into it. If we can use any of their Go code that would be good, otherwise I'm good with having our own code and then adding a TODO to look into leveraging Parca USDT in the future.

grcevski · 2026-06-12T13:26:28Z

@@ -56,7 +58,12 @@ func newGraphBuilder(
 	tracesCh *msg.Queue[[]request.Span],
 	processEventsCh *msg.Queue[exec.ProcessEvent],
 	runtimeMetrics *msg.Queue[[]runtimemetrics.RuntimeMetricSnapshot],
+	jvmRuntimeEventsOpt ...*msg.Queue[[]jvmruntime.JVMRuntimeEvent],


I'm wondering if we can simplify this now, because at the moment we have these separate arguments that create separate pipelines and if we were to add runtime metrics for NodeJS, Python, Ruby..., it would get unwieldy.

Can we keep the base type as runtimemetrics.RuntimeMetricSnapshot but then inside the struct, refactor it to have go section and then java section. With this we can hide all the complexity and reduce memory usage, since each prom and otel metrics exporters have memory demands on expirers etc.

Just like we have setupGoRuntimeMeters we can have setup JVM meters etc.

@grcevski I've made the changes, let me know whether this is how you envisioned it.

REASY · 2026-06-12T13:49:29Z

@grcevski

I evaluated github.com/parca-dev/usdt, the blockers I found are:

The parser is host-architecture driven. At HEAD (aeb59ba), ParseUSDTArguments resolves registers through runtime.GOARCH. In OBI we need to parse USDT args based on the target ELF machine, because the instrumented binary/container may not match the collector build/runtime architecture.
The current Parca parser does not model x86 SIB operands, e.g. offset(base,index,scale). This branch currently handles those because some USDT argument expressions can reference both a base register and an index register with scale. Parca’s public arg model has CONST / REG / REG_DEREF, but no equivalent representation for that form.
The ELF note parser is narrower than the parser added here. At the checked commit it reads STAPSDT note fields as little-endian ELF64 values. The implementation here is target-ELF driven and handles the ELF class/data encoding explicitly.

Am I missing something?

grcevski · 2026-06-12T13:59:10Z

@grcevski

I evaluated github.com/parca-dev/usdt, the blockers I found are:

The parser is host-architecture driven. At HEAD (aeb59ba), ParseUSDTArguments resolves registers through runtime.GOARCH. In OBI we need to parse USDT args based on the target ELF machine, because the instrumented binary/container may not match the collector build/runtime architecture.

The current Parca parser does not model x86 SIB operands, e.g. offset(base,index,scale). This branch currently handles those because some USDT argument expressions can reference both a base register and an index register with scale. Parca’s public arg model has CONST / REG / REG_DEREF, but no equivalent representation for that form.

The ELF note parser is narrower than the parser added here. At the checked commit it reads STAPSDT note fields as little-endian ELF64 values. The implementation here is target-ELF driven and handles the ELF class/data encoding explicitly.

Am I missing something?

That's fine then! Let's keep your code, we have this documented now and if we decide to refactor in the future we can do it. I only had one other comment, moving the handling of the JVM metrics signals under the runtime metrics umbrella, so we don't proliferate even more code in the future.

You can use the same top object, just put Go as separate pointer and JVM as separate pointer. Then the metrics code can do checks if go != nil, emit Go runtime metrics, if jvm != nil emit JVM runtime metrics etc.

I also really like your design decision to have the last timestamp the metrics were captured, so if you get flurry of GC activity you don't spam OBI now with all of that.

Copilot

Pull request overview

Copilot reviewed 61 out of 61 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (3)

pkg/export/prom/prom_runtime.go:1

The continue inside the Go-branch drops JVM metrics if a snapshot ever carries both Go and JVM payloads (or if that becomes possible later). Instead of continuing the loop on non-Go services, only skip Go collection (e.g., guard collectGoRuntimeMetrics with a language check) and still allow JVM collection to run.
pkg/export/otel/metrics_runtime.go:1
The early return inside each branch can incorrectly prevent recording the other runtime section if both snapshot.Go and snapshot.JVM are set. Use branch-local guards (skip just the mismatching section) rather than returning from recordRuntimeMetrics, so JVM recording isn't accidentally suppressed by the Go-language check (and vice versa).
pkg/internal/procs/proclang_linux.go:1
Substring matching can be ambiguous: multiple ELF symbols may contain the same requested substring, and the current logic will silently pick whichever symbol is visited last, potentially attaching uprobes to the wrong function. Consider detecting multiple matches per substring (and failing for required probes or logging a clear warning + choosing deterministically, e.g., prefer exact match, then shortest-name match, or first match in address order).

mmat11

did a first pass, looks good!

mmat11 · 2026-06-12T14:58:11Z

-				return s, true, nil
-			}
-			return s, ignore, err
+			return p.processSharedRingbufRecord(ctx, parseContext, cfg, record)


can't JVM metrics be handled inside ReadBPFTraceAsSpan?

The current reason it is separate is that JVM metric records are not spans and need EBPFEventContext.RuntimeMetrics.

We can add a shared common ring-buffer record handler that first handles JVM metric records, then delegates to ReadBPFTraceAsSpan. That avoids embedding metric side effects into a span decoder while addressing the duplication concern.

Right, cool, so this is the same pattern as the Go tracer uses for Go runtime metrics.

mmat11 · 2026-06-12T15:55:59Z

+#include <common/usdt_types.h>
+
+struct {
+    __uint(type, BPF_MAP_TYPE_HASH);


we should probably use an HASH_LRU here

I already added explicit key cleanup on USDT link close in 9a2494c. LRU can evict active probe mappings and silently drop metrics. Plain hash plus explicit cleanup gives deterministic behavior.

Actually it doesn't hurt to change this to LRU hash, it will still be deterministic, one thing that won't happen is leak if for some reason the delete fails.

mmat11 · 2026-06-12T17:11:25Z

/all-vm-tests

github-actions · 2026-06-12T17:11:36Z

Dispatched Integration tests VM on all kernels for 9a2494ceae1d6556c4f9f69a09ee5a86c2137c43 (PR #2305). Watch progress: https://github.com/open-telemetry/opentelemetry-ebpf-instrumentation/actions/workflows/workflow_integration_tests_vm.yml?query=event%3Aworkflow_dispatch

Copilot

Pull request overview

Copilot reviewed 66 out of 66 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (5)

pkg/internal/procs/proclang_linux.go:1

FindExeSymbolsBySubstring can return ambiguous results if multiple ELF symbols contain the same substring: collectSymbols uses the substring as the map key and overwrites addresses[key] depending on symbol iteration order. This can cause nondeterministic probe attachment to the wrong symbol. Consider detecting collisions (e.g., if addresses[key] is already set) and either (a) return an error indicating ambiguity, or (b) implement a deterministic tie-breaker (e.g., prefer exact match, then shortest name / first encountered and never overwrite).
pkg/internal/procs/proclang_linux.go:1
FindExeSymbolsBySubstring can return ambiguous results if multiple ELF symbols contain the same substring: collectSymbols uses the substring as the map key and overwrites addresses[key] depending on symbol iteration order. This can cause nondeterministic probe attachment to the wrong symbol. Consider detecting collisions (e.g., if addresses[key] is already set) and either (a) return an error indicating ambiguity, or (b) implement a deterministic tie-breaker (e.g., prefer exact match, then shortest name / first encountered and never overwrite).
pkg/internal/procs/proclang_linux.go:1
FindExeSymbolsBySubstring can return ambiguous results if multiple ELF symbols contain the same substring: collectSymbols uses the substring as the map key and overwrites addresses[key] depending on symbol iteration order. This can cause nondeterministic probe attachment to the wrong symbol. Consider detecting collisions (e.g., if addresses[key] is already set) and either (a) return an error indicating ambiguity, or (b) implement a deterministic tie-breaker (e.g., prefer exact match, then shortest name / first encountered and never overwrite).
pkg/ebpf/instrumenter.go:1
When goexec.FindReturnOffsets fails, the probe keeps StartOffset but ReturnOffsets remain empty and Skip is not set. For probes that require return instrumentation (i.e., probe.End != nil, and especially when probe.Required is true), silently proceeding can produce partially-instrumented programs without an explicit failure or skip, which is difficult to debug. Consider handling returnErr similarly to handleSymbolDataReadFailure: if probe.End != nil, either return an error when probe.Required is true, or mark the probe as Skip (optional case) to avoid attaching an incomplete probe set.
pkg/internal/appolly/appolly.go:1
This condition mixes && and || across lines without parentheses, which makes the intended grouping harder to read and easier to accidentally change incorrectly later. Consider adding parentheses to make the precedence explicit (e.g., group the “no runtime metrics features enabled” branch) and/or factoring into well-named booleans for clarity.

+        "sampling_interval": {
+          "type": "string",
+          "pattern": "^[0-9]+(ms|s|m)$",
+          "default": "1s",
+          "examples": [
+            "30s",
+            "5m",
+            "1ms"
+          ],
+          "x-env-var": "OBI_JVM_RUNTIME_METRICS_SAMPLING_INTERVAL"
+        }


grcevski · 2026-06-16T13:19:59Z

Hi @REASY, can you please sync with main, we had some tests break which are now fixed, but they are blocking your PR. If you sync with main I expect we'll get green CI.

grcevski

This is looking pretty good! Thanks for all the hard work @REASY!

I think we just need to refactor and clean-up the way the runtime metrics are checked and used throughout the code. I'd like to have one global helper and common functions, which then internally check for Go and JVM, or don't check for now and we can add that later. But, it's important that all runtime metrics handling to be encapsulated as runtime metrics for the rest of the code.

grcevski · 2026-06-16T14:16:54Z

 	}
 	ta.processInstances = maps.MultiCounter[uint64]{}
 	ta.EbpfEventContext.CommonPIDsFilter = ebpfcommon.NewPIDsFilter(&ta.Cfg.Discovery, slog.With("component", "ebpfCommon.CommonPIDsFilter"), ta.Metrics)
+	if ta.RuntimeMetrics != nil {


Just a question, I think this seems reasonable, but I'm wondering why we didn't need this for the Go metrics?

Oh I think I can answer my own question, for Go metrics if we just have a Go application they are created in the Go path.

grcevski · 2026-06-16T14:18:25Z

 	}

-	if jointMetricsConfig.Features.AppRuntime() {
+	if jointMetricsConfig.Features.AppRuntime() || jointMetricsConfig.Features.AppJVM() {


Is it possible to combine all runtime metrics under Features.AppRuntime(). We can add as a follow-up option to disable some if needed. That way the check will be much simpler.

I can do that, but want to confirm the intended config semantics first.

Current behavior treats application_runtime and application_jvm as separate runtime-metric feature gates: Go runtime metrics are enabled by AppRuntime(), JVM runtime metrics by AppJVM(). The upside is users can enable JVM runtime metrics independently. The downside is the code has to carry a small Go/JVM runtime-metrics split.

If we move all runtime metrics under Features.AppRuntime(), the code gets cleaner and application_runtime becomes the single umbrella for runtime metrics, which is probably better for future Node/Python/Ruby runtime metrics. The tradeoff is that existing configs with only application_jvm would no longer enable JVM runtime metrics unless users also set application_runtime, and we may still need internal per-runtime flags later to avoid initializing/exporting metrics users do not want.

My preference would be to keep the internal runtimemetrics.Enabled{Go,JVM} helper for clarity, but I’m fine changing the public gate so AppRuntime() is the umbrella if we accept the config behavior change.

grcevski · 2026-06-16T14:23:17Z

 			go func() {
-				if err := v.provider.ForceFlush(ctx); err != nil {
-					llog.Warn("error flushing evicted runtime metrics provider", "error", err)
+				if err := v.provider.Shutdown(ctx); err != nil {


grcevski · 2026-06-16T14:29:39Z

-				return s, true, nil
-			}
-			return s, ignore, err
+			return p.processSharedRingbufRecord(ctx, parseContext, cfg, record)


Right, cool, so this is the same pattern as the Go tracer uses for Go runtime metrics.

grcevski · 2026-06-16T14:31:05Z

+
+package generictracer
+
+import (


What is this test for, maybe it's better to just keep an integration test?

This is not intended to run as a regular unit/package test. It is behind the linux && jvm_live build tag and is invoked by the integration test suite through docker-compose-jvm-runtime-events-live.yml. I put the implementation near generictracer because it exercises the raw HotSpot USDT to shared-ringbuf to runtime-event path without Prometheus/OTel exporter layers. The higher-level TestJVMRuntimeMetrics covers exported Prometheus metrics. If you prefer, I can move the live test implementation fully under internal/test/integration, but functionally it is already only run as an integration test.`

grcevski · 2026-06-16T14:33:47Z

 		p.cfg,
 		p.bpfObjects.Events,
 		func(record *ringbuf.Record) (request.Span, bool, error) {
+			if handled, err := ebpfcommon.HandleJVMRuntimeMetricRecord(ctx, ebpfEventContext, record, p.pidsFilter, p.log); handled {


We need to collapse these under one common HandleRuntimeMetricsRecord which will be used by the Go tracer and the generic tracer. That way we don't expand the code here and in the generic tracer part as we add more tracers.

One thing to note, I think the current Go runtime metrics have a bug, they should also be handled by the generic tracer just like you added in this PR. So it's good to have a common function that does if jvm do X, if go do Y.

Done in bdc2cef. I collapsed the runtime metric dispatch through ebpfcommon.HandleRuntimeMetricsRecord and now generictracer uses the same common path as gotracer.

The helper handles Go runtime metric records directly, so generictracer will now consume Go runtime metrics too if they arrive there. For JVM records, generictracer passes a tracer-local handler into the common helper because the JVM payload structs are generated by bpf2go in the generictracer package, and moving that decoding into common would introduce an import cycle.

So the flow is now: common helper decides "is this a runtime metric record?", handles Go directly, and delegates JVM decoding where the generated types live.

Route JVM runtime metric events through the shared runtimemetrics.RuntimeMetricSnapshot queue instead of maintaining a dedicated JVM event pipeline. Add Go and JVM sections to runtime snapshots, adapt eBPF JVM event forwarding to the runtime metrics sender, and fold JVM Prometheus and OTEL meters into the existing runtime metrics exporters. Preserve per-runtime feature gating so Go-only and JVM-only configurations initialize only the required meter sections. Filter OTEL snapshots before allocating per-service reporters and shut down evicted runtime providers through sharedExporter to avoid leaking periodic readers. Update runtime, exporter, and generic tracer tests for the unified snapshot path. Verification: - go tool -modfile=internal/tools/go.mod golangci-lint run ./pkg/appolly ./pkg/appolly/discover ./pkg/internal/appolly ./pkg/ebpf ./pkg/ebpf/common ./pkg/export/otel ./pkg/export/prom ./pkg/runtimemetrics ./pkg/internal/ebpf/generictracer ./pkg/internal/ebpf/gotracer --timeout=6m - go test ./pkg/appolly ./pkg/appolly/discover ./pkg/internal/appolly ./pkg/ebpf ./pkg/ebpf/common ./pkg/export/prom ./pkg/runtimemetrics ./pkg/internal/ebpf/gotracer - go test ./pkg/export/otel -run 'TestRuntimeMetricsReporter|TestSetupRuntimeMeters' - GOOS=linux go test -c -o /tmp/generictracer.test ./pkg/internal/ebpf/generictracer - git diff --check

Track the USDT IP-map keys inserted for each attached target and delete them when the owning uprobe link is closed. This prevents stale IP-to-spec mappings from accumulating across process churn while keeping the existing hash map semantics. Wrap USDT links in an idempotent closer guarded by sync.Once so repeated or concurrent shutdown paths do not race the underlying eBPF link close or delete the same IP-map keys multiple times. Clean partially inserted IP-map keys on map update or attach failures, and make the substring symbol test architecture-aware so pkg/ebpf validates correctly on both amd64 and arm64. Tests: docker run --rm -v "/Users/abalaian/github/grafana/opentelemetry-ebpf-instrumentation":/src -v go-mod-cache:/go/pkg/mod -v go-build-cache:/root/.cache/go-build -w /src golang:1.26 go test ./pkg/ebpf -count=1 Tests: docker run --rm -v "/Users/abalaian/github/grafana/opentelemetry-ebpf-instrumentation":/src -v go-mod-cache:/go/pkg/mod -v go-build-cache:/root/.cache/go-build -w /src golang:1.26 go test -race ./pkg/ebpf -run 'TestUSDTLinkCloser' -count=1 Tests: GOOS=linux go test -c -o /tmp/pkg-ebpf.test ./pkg/ebpf Tests: go tool -modfile=internal/tools/go.mod golangci-lint run ./pkg/ebpf --timeout=6m Tests: docker run --rm -v "/Users/abalaian/github/grafana/opentelemetry-ebpf-instrumentation":/src -v go-mod-cache:/go/pkg/mod -v go-build-cache:/root/.cache/go-build -w /src golang:1.26 go tool -modfile=internal/tools/go.mod golangci-lint run ./pkg/ebpf --timeout=6m

Rename the HotSpot heap metric and JVM runtime metrics environment variables from Beyla-specific names to OBI names across config, exporters, schemas, and tests. Remove x86 SIB USDT argument support from the userspace parser and BPF ABI, add explicit layout and parser coverage, and parse STAPSDT note headers with typed binary reads. Expose the JVM memory-pool BPF event shape for generated layout checks, add USDT map value comments, warn when the runtime metrics queue is missing, and fix clang-tidy BPF diagnostics.

Define an SDT note header type and parse namesz, descsz, and type with binary.Read using the target ELF byte order. Add coverage for big-endian header decoding to keep the parser behavior explicit.

Regenerate the config schema and reference docs so the new jvm_runtime_metrics section and application_jvm metric feature are present in generated artifacts. Make TestConfig_Overrides independent of ambient OTLP protocol environment variables by unsetting them for the test and expecting the loaded config protocol fields to remain unset. Exporter setup still resolves the default protocol via its existing guess path.

Make the procs symbol lookup tests build a small unstripped Go fixture binary instead of inspecting the temporary go test executable, which can be stripped of the symbol table in CI. Update the config override test environment cleanup to satisfy lint while still testing with OTLP protocol variables absent.

Add an explicit unknown JVM memory type so exported memory labels are never empty for vendor-specific pools. Update generated configuration schemas to describe Go-compatible duration values, including hours, microseconds, fractional values, and compound durations.

Route Go and JVM runtime metric ring-buffer records through a single ebpf/common handler backed by the shared EBPF event context. Group Go and JVM runtime metric feature checks behind runtimemetrics.Enabled so exporters and queue creation do not maintain parallel runtime booleans.

Use an LRU hash for the USDT IP-to-spec map so stale process mappings cannot permanently fill the pinned map when processes churn. Share instrumentation path resolution between uprobes and USDT probes, make USDT probe discovery part of the tracer contract, and avoid double-registering USDT link closers. Tighten the BPF-side USDT lookup by using valid_pid consistently, adding a pt_regs size assertion, and simplifying verifier-friendly bounds handling.

Use valid_pid only as the BPF-side filter gate and key USDT IP lookups with pid_from_pid_tgid so userspace and BPF agree on the host PID used for uprobe attachment. Remove namespace PID alias insertion from the userspace IP map setup and drop the namespace field from obi_usdt_ip_key. Keep explicit padding in the key layout so the Go and C map-key ABI stays stable and free of implicit padding warnings.

Replace per-byte bpf_probe_read_user calls in JVM USDT string parsing with a single bounded bulk read. The destination buffer is still zeroed first and the copied length remains capped to k_jvm_raw_string_len - 1, preserving NUL termination while avoiding repeated user-memory probe overhead.

Move the JVM runtime metrics and PID filter checks into the top-level HotSpot memory-pool USDT probe preambles. This lets disabled or filtered processes return before decoding USDT arguments, and passes the validated pid_tgid and pid into the shared event helper.

Move JVM runtime metric ring buffer decoding into generictracer, where the bpf2go-generated event types are available, and use ReinterpretCast instead of manually maintained raw mirror structs. Keep the runtime package focused on mapping typed JVM fields into runtime metric events, and leave common runtime metric handling as a fallback consumer for JVM event types. Update tests to construct generated JVM BPF event payloads directly and assert the generated layouts used by the decoder.

Follow the established ring buffer dispatch pattern by assigning record.RawSample[0] to a local eventType before switching on it. Apply the same clarity pattern to the shared runtime metric fallback and the generictracer JVM runtime metric dispatcher.

Decorate only the first JVM memory-pool runtime event from a raw sample and copy its resolved service to the remaining fanned-out events, avoiding repeated PID filter lookups for events that share the same PID identity. Add a regression assertion for the single lookup behavior and document why peeking at the first event is sufficient. Keep lint clean by marking the unused shared runtime-metric logger parameter and adding the missing runtimemetrics vanity import.

Add a combined ELF symbol lookup helper that scans .symtab and .dynsym once while collecting both exact-name and substring matches. Use the combined lookup from gatherOffsetsImpl so probes with mixed symbol matchers do not force duplicate symbol-table traversal. Keep the existing exact and substring lookup APIs as wrappers around the combined path, and cover the combined lookup with a Linux test fixture.

Remove the redundant seen map from symbolNames because probe descriptors are already grouped by symbol name in the input map. Stop scanning a symbol's probe descriptors after the first probe with the requested matcher, preserving behavior while avoiding duplicate append bookkeeping.

Note that USDT specs and their manager IDs are append-only while link closers only clean up IP map entries.

Add an optional runtime metric record handler hook so tracers can decode payloads whose generated BPF types live outside pkg/ebpf/common. Use the shared HandleRuntimeMetricsRecord path from generictracer for both Go and JVM runtime metric events, while keeping JVM bpf2go decoding in the generictracer package to avoid an import cycle. Add coverage for custom runtime metric handler delegation.

Replace local JVM probe uses of __builtin_memset, __builtin_memcpy, and __builtin_memcmp with the optimized bpf_memset, bpf_memcpy, and bpf_memcmp helpers from bpf_builtins.h.

Remove the separate JVM runtime metrics BPF boolean and use jvm_sampling_interval_ns as the single feature gate. Set the interval to zero when JVM runtime metrics are disabled and to the configured positive interval when enabled. Add generictracer coverage for the constants map and verifier permutations for disabled and enabled interval values.

Restore PID namespace id to the USDT IP map key and insert the host PID plus namespaced PID aliases when registering HotSpot USDT probe specs. The host-PID-only key path prevented BPF-side USDT argument decoding from resolving the spec for memory-pool probes in container PID namespaces. Heap-summary uprobes kept working because they do not use the USDT spec lookup map, which hid the regression unless the live test asserted memory-pool events. Keep the existing closer-based IP map cleanup, extend the ABI layout tests, and restore the live JVM runtime event test to require memory-pool events.

REASY · 2026-06-19T03:04:41Z

The failure in CI was real. I brought namespace awareness back, 0ee9458, because the BPF-side USDT argument decoding lookup happens in the context of the probed task, and the PID identity used there is not always just the host PID.

he USDT attach itself is done from userspace using the host PID, but the BPF program resolves the argument spec at runtime from obi_usdt_ip_to_spec_id. That lookup key must match what BPF observes when the USDT probe fires. With PID filtering enabled, valid_pid(pid_tgid) can return the namespace PID, and the lookup also needs the PID namespace id to avoid collisions between containers where the same namespace PID can exist, e.g. PID 1 in multiple containers.

The host-PID-only refactor broke this path for HotSpot memory-pool probes: heap-summary events still worked because they are normal uprobes and do not use obi_usdt_arg() or the USDT spec map, but memory-pool events disappeared because obi_usdt_arg() could not resolve the USDT spec for the firing probe.

CC: @grcevski @rafaelroquetto

REASY requested a review from a team as a code owner June 11, 2026 18:04

REASY force-pushed the jvm-runtime-metrics branch from 0c0e978 to bb1e24d Compare June 11, 2026 18:14

REASY force-pushed the jvm-runtime-metrics branch from adab8d7 to 00a022c Compare June 11, 2026 20:21

REASY changed the title ~~Insights Add HotSpot JVM memory metrics via GCTracer uprobe and USDT probes~~ Add HotSpot JVM memory metrics via GCTracer uprobe and USDT probes Jun 12, 2026

mariomac requested a review from Copilot June 12, 2026 08:04

Copilot started reviewing on behalf of mariomac June 12, 2026 08:04 View session

Copilot AI reviewed Jun 12, 2026

View reviewed changes

Comment thread pkg/ebpf/instrumenter.go

grcevski reviewed Jun 12, 2026

View reviewed changes

REASY requested review from Copilot and grcevski June 12, 2026 15:02

Copilot AI reviewed Jun 12, 2026

View reviewed changes

Comment thread bpf/common/maps/obi_usdt_ip_to_spec_id.h

mmat11 reviewed Jun 12, 2026

View reviewed changes

REASY requested review from Copilot and mmat11 June 15, 2026 03:51

Copilot AI reviewed Jun 15, 2026

View reviewed changes

REASY force-pushed the jvm-runtime-metrics branch from 5109750 to 12b12bb Compare June 15, 2026 03:57

grcevski reviewed Jun 16, 2026

View reviewed changes

REASY added 28 commits June 19, 2026 10:56

fix(ebpf): read STAPSDT note headers as structs

faf8469

Define an SDT note header type and parse namesz, descsz, and type with binary.Read using the target ELF byte order. Add coverage for big-endian header decoding to keep the parser behavior explicit.

fix(ebpf): drop unrelated h2_defs formatting

95bd435

fix: improve jvm_read_usdt_string

919dcc5

fix: undo unrelated changes

1832a1a

docs(ebpf): clarify USDT spec lifetime

6401353

Note that USDT specs and their manager IDs are append-only while link closers only clean up IP map entries.

refactor(bpf): use memory helpers in JVM probes

66b51dd

Replace local JVM probe uses of __builtin_memset, __builtin_memcpy, and __builtin_memcmp with the optimized bpf_memset, bpf_memcpy, and bpf_memcmp helpers from bpf_builtins.h.

fix(test): satisfy generictracer lint rule

ca5af3e

fix: undo the change

9415955

fix(schema): omit redundant duration description

51cefc2

REASY force-pushed the jvm-runtime-metrics branch from 6d65c91 to 0ee9458 Compare June 19, 2026 03:01

Conversation

REASY commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Details

Notes

Testing

Uh oh!

linux-foundation-easycla Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

grcevski commented Jun 11, 2026

Uh oh!

codecov Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

mmat11 commented Jun 11, 2026

Uh oh!

REASY commented Jun 11, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

grcevski commented Jun 12, 2026

Uh oh!

REASY commented Jun 12, 2026

Uh oh!

grcevski commented Jun 12, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

REASY commented Jun 12, 2026

Uh oh!

grcevski commented Jun 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

mmat11 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mmat11 commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

grcevski commented Jun 16, 2026

Uh oh!

REASY commented Jun 11, 2026 •

edited

Loading

linux-foundation-easycla Bot commented Jun 11, 2026 •

edited

Loading

codecov Bot commented Jun 11, 2026 •

edited

Loading

REASY Jun 17, 2026 •

edited

Loading

REASY commented Jun 19, 2026 •

edited

Loading