
interpreter: add Golang #408


Merged
merged 23 commits into from
Apr 8, 2025

florianl
Contributor

Add symbolization functionality for Go executables.

@florianl florianl force-pushed the interpreter-golang branch from 5ad982b to 9078329 Compare March 17, 2025 18:22
@florianl florianl marked this pull request as ready for review March 17, 2025 18:32
@florianl florianl requested review from a team as code owners March 17, 2025 18:32
Contributor

@fabled fabled left a comment

Some comments added. But before we just blindly start mixing in Rust, I'd like a larger discussion and comments from all maintainers on the feasibility of this.

I understand using existing Rust code might be faster, and the Rust code is likely good and tested.

On the other hand, the downsides are:

  • core maintainers should also know Rust (possibly true?)
  • cgo calls are slower than regular Go calls; I'm not sure how large the overhead is
  • there is room for non-trivial errors, as the calls need to use unsafe Go, potentially in non-trivial ways

Additionally, Go symbolization could be done directly using the Go standard runtime libraries, or by extending the existing elfgopclntab code we have in-tree.

Could you elaborate on why embedded Rust was chosen instead of the two approaches mentioned above? Do these reasons justify the overhead that mixing Go and Rust brings in maintenance, complexity, and debugging?
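
For reference, the standard-library route mentioned above could look roughly like the sketch below. It is not code from this PR: it assumes an ELF binary that exposes a `.gopclntab` section, uses only `debug/elf` and `debug/gosym`, and the helper name `openGoTable` is made up for illustration. Binaries that place the table elsewhere, or that need relocation handling, are not covered.

```go
package main

import (
	"debug/elf"
	"debug/gosym"
	"fmt"
	"os"
)

// openGoTable loads the Go symbol table of an ELF executable from its
// .gopclntab section. Only binaries that expose that section are handled here.
func openGoTable(path string) (*gosym.Table, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	pclntab := f.Section(".gopclntab")
	text := f.Section(".text")
	if pclntab == nil || text == nil {
		return nil, fmt.Errorf("%s: no .gopclntab or .text section", path)
	}
	data, err := pclntab.Data()
	if err != nil {
		return nil, err
	}
	// Since Go 1.3 the symbol data lives in the pclntab, so no separate
	// symtab is passed here.
	return gosym.NewTable(nil, gosym.NewLineTable(data, text.Addr))
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: gosymdemo <go-elf-binary>")
		return
	}
	tab, err := openGoTable(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fn := tab.LookupFunc("main.main")
	if fn == nil {
		fmt.Println("main.main not found")
		return
	}
	// PCToLine maps a program counter to its source file and line.
	file, line, _ := tab.PCToLine(fn.Entry)
	fmt.Printf("%s at %s:%d\n", fn.Name, file, line)
}
```

Like the point lookup discussed in this PR, `PCToLine` returns a single function per address, so inlined frames would need extra work.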

@christos68k
Member

christos68k commented Mar 17, 2025

Could you elaborate on why embedded Rust was chosen instead of the two approaches mentioned above? Do these reasons justify the overhead that mixing Go and Rust brings in maintenance, complexity, and debugging?

Besides the main reasons you outlined (Rust code performant/safe/well tested), using symblib allows us to bring this feature to production quickly, and solves a number of issues we have with deployed Go binaries. We do plan to also use symblib for extracting/uploading symbols from other native (non-Go) binaries, albeit this doesn't need to take place in-process in the agent, and can be a separate utility.

Regardless, I'm not opposed to doing Go symbolization in Go, assuming someone implements it we could evaluate and transition. But we don't have that right now (and at least at Elastic, have no plans to work on it) and we do have symblib.

@fabled
Contributor

fabled commented Mar 17, 2025

using symblib allows us to bring this feature to production quickly, and solves a number of issues we have with deployed Go binaries

If going this way, I'd rather use symblib as the dynamic symbolizer for native code and have it symbolize all native code. We could limit it to dynamic symbols only if wanted, but getting some symbols for C code this way would make sense, rather than special-casing this for Golang only.

Regardless, I'm not opposed to doing Go symbolization in Go, assuming someone implements it we could evaluate and transition. But we don't have that right now (and at least at Elastic, have no plans to work on it) and we do have symblib.

Also, looking at the Rust code a bit, it seems to mmap files, which is sort of good but also previously unused. Also, since Go binaries can contain C code and DWARF, the symblib DWARF code will get used for these executables, bringing a lot of potentially unexpected new behavior, such as automatically decompressing compressed DWARF into large temporary files. Edit: I missed that the Rust API used here only looks at gopclntab.

IMHO the symblib code would probably need a mode the host agent can safely use for all binaries to do dynamic/ELF/gopclntab symbolization but skip DWARF and compressed things. The integration as a "native symbolizer" would be much cleaner than trying to special-case Golang.

@florianl
Contributor Author

IMHO the symblib code would probably need a mode the host agent can safely use for all binaries to do dynamic/ELF/gopclntab symbolization but skip DWARF and compressed things. The integration as a "native symbolizer" would be much cleaner than trying to special-case Golang.

That is the general idea going forward. At the moment the scope is limited to Go, as this ecosystem has the highest demand for symbols.

}

frameID := libpf.NewFrameID(libpf.NewFileID(uint64(frame.File), uint64(frame.File)),
libpf.AddressOrLineno(symbolsSlice[0].line_number))
Contributor Author

I switched the AddressOrLineno part of frameID in 5bf7fd7. Previously pc was used, but this generated too many different frame IDs for the same function/source file/source line combination.
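
The effect described above can be illustrated with a small sketch. The `sample` and `frameKey` types and the sample values are hypothetical and are not the profiler's `libpf` types; the point is only that several program counters inside one source line collapse into a single ID once the line number is used as the key.

```go
package main

import "fmt"

// sample is a hypothetical, simplified view of one symbolized leaf frame.
type sample struct {
	file uint64 // identifier of the executable
	pc   uint64 // sampled program counter
	line uint32 // source line the PC resolves to
}

// frameKey mimics a (file ID, address-or-line) pair used as a frame ID.
type frameKey struct {
	file       uint64
	addrOrLine uint64
}

// countDistinct counts the frame IDs produced when keying by PC or by line.
func countDistinct(samples []sample, byLine bool) int {
	seen := make(map[frameKey]struct{})
	for _, s := range samples {
		k := frameKey{file: s.file, addrOrLine: s.pc}
		if byLine {
			k.addrOrLine = uint64(s.line)
		}
		seen[k] = struct{}{}
	}
	return len(seen)
}

// demoSamples returns made-up samples: three PCs on line 42, one on line 43.
func demoSamples() []sample {
	return []sample{
		{file: 1, pc: 0x401000, line: 42},
		{file: 1, pc: 0x401004, line: 42},
		{file: 1, pc: 0x401009, line: 42},
		{file: 1, pc: 0x401020, line: 43},
	}
}

func main() {
	fmt.Println("distinct IDs keyed by PC:  ", countDistinct(demoSamples(), false)) // 4
	fmt.Println("distinct IDs keyed by line:", countDistinct(demoSamples(), true))  // 2
}
```

The trade-off is that the instruction-level address is no longer part of the ID.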

Contributor

This does prevent using the data for PGO. Do you have statistics?

Contributor Author

I didn't collect statistics. As everywhere else with the on-CPU sampling approach, leaf frames differ, which causes high variance in frame IDs.

Contributor

@fabled fabled left a comment

Some additional new comments to clean up a bit.

Also, how much does the profiler executable size go up with this, now that it pulls in static Rust stuff?

frameID := libpf.NewFrameID(libpf.NewFileID(uint64(frame.File), uint64(frame.File)),
libpf.AddressOrLineno(symbolsSlice[0].line_number))

trace.AppendFrameID(libpf.GolangFrame, frameID)
Contributor

Should this provide mapping information, similar to what is done for non-symbolized native code?

Contributor Author

At this point Go frames are treated as interpreter frames and no longer as native code. So I guess the answer here is no: mapping information is not provided the way it is for other native code.

@fabled
Contributor

fabled commented Mar 18, 2025

A small note: #407 also adds a Golang interpreter, which does quite different things.

@florianl
Contributor Author

Also, how much does the profiler executable size go up with this, now that it pulls in static Rust stuff?

#408 @ 7616b6c

$ ls -lia ebpf-profiler 
42649440 -rwxr-xr-x 1 user user 32535648 Mar 18 11:46 ebpf-profiler

main @ 422bd6b

$ ls -lia ebpf-profiler 
42649394 -rwxr-xr-x 1 user user 30313560 Mar 18 11:48 ebpf-profiler
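
Taking the two sizes from the listings above, the growth works out to 2,222,088 bytes, roughly 2.1 MiB or about 7.3%. The arithmetic can be reproduced with a few lines of Go (the helper name `sizeDelta` is only illustrative):

```go
package main

import "fmt"

// sizeDelta returns the absolute growth in bytes and the relative growth in percent.
func sizeDelta(before, after int64) (int64, float64) {
	diff := after - before
	return diff, float64(diff) / float64(before) * 100
}

func main() {
	// Sizes in bytes, copied from the ls output for main and for this PR.
	const mainSize, prSize = 30313560, 32535648
	diff, pct := sizeDelta(mainSize, prSize)
	fmt.Printf("+%d bytes (%.2f MiB, %.1f%%)\n", diff, float64(diff)/(1<<20), pct)
}
```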

@florianl florianl force-pushed the interpreter-golang branch from 3222c8d to 9a26452 Compare March 20, 2025 11:08
@florianl
Contributor Author

Force pushed to resolve merge conflict with main branch.

@florianl florianl force-pushed the interpreter-golang branch 11 times, most recently from 8be0993 to 9a26452 Compare March 21, 2025 11:27
@florianl florianl mentioned this pull request Mar 21, 2025
@korniltsev
Contributor

On the other hand, the downsides are:

Another potential downside is the use of malloc. Some implementations are known not to return freed memory to the OS, and some keep freed memory in per-thread arenas. This may lead to increased memory consumption, which may be amplified by the number of threads the cgo call was issued from. Consider a cgo call that allocates a total of 10 MiB in small chunks. If such calls are made from different goroutines (and therefore from different threads, since we do not control the thread by default) and we have 32 threads, we may eventually waste 320 MiB.

This can be mitigated by locking the goroutine to its OS thread. Nothing unsolvable, just something to keep in mind.
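
One way to apply that mitigation is to funnel all native calls through a single goroutine that is pinned with `runtime.LockOSThread`, so per-thread arenas are only ever touched from one thread. The sketch below is an assumption about how this could be structured, not what this PR implements: `startPinnedWorker`, `resolve`, and the `lookup` callback (a stand-in for the cgo call) are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
)

// request carries one lookup to the pinned worker together with a reply channel.
type request struct {
	pc    uint64
	reply chan string
}

// startPinnedWorker starts a single goroutine that is locked to one OS thread
// and serves every request from it. lookup stands in for the cgo call.
func startPinnedWorker(lookup func(uint64) string) chan<- request {
	reqs := make(chan request)
	go func() {
		// Pin this goroutine so all native calls (and their mallocs) happen
		// on the same thread for the lifetime of the worker.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		for r := range reqs {
			r.reply <- lookup(r.pc)
		}
	}()
	return reqs
}

// resolve sends one request to the worker and waits for the answer.
func resolve(reqs chan<- request, pc uint64) string {
	reply := make(chan string, 1)
	reqs <- request{pc: pc, reply: reply}
	return <-reply
}

func main() {
	// fakeLookup is a placeholder; a real implementation would call into cgo.
	fakeLookup := func(pc uint64) string { return fmt.Sprintf("func_%#x", pc) }
	reqs := startPinnedWorker(fakeLookup)
	fmt.Println(resolve(reqs, 0x401000))
	close(reqs)
}
```

Serializing calls through one worker trades throughput for bounded allocator state; a small fixed pool of pinned workers would be a middle ground.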

symbolsSlice := unsafe.Slice((*C.SymblibResolvedSymbol)(unsafe.Pointer(symbols.data)),
symbols.len)
if len(symbolsSlice) != 1 {
return fmt.Errorf("unexpected return for point lookup: %d", len(symbolsSlice))
Contributor

Why 1? Does it not resolve inlined functions?

Contributor Author

Correct, inlined functions are not resolved at the moment.
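
The pattern in the snippet under review, viewing a pointer plus a length as a Go slice with `unsafe.Slice` and then insisting on exactly one result, can be tried without cgo. In this sketch `resolvedSymbol` is a hypothetical stand-in for the C struct and a Go array plays the role of the native buffer:

```go
package main

import (
	"fmt"
	"unsafe"
)

// resolvedSymbol is a hypothetical stand-in for a C struct returned by native code.
type resolvedSymbol struct {
	lineNumber uint32
}

// singleSymbol views n elements starting at ptr as a Go slice and insists on
// exactly one element, as a point lookup without inline expansion would return.
func singleSymbol(ptr *resolvedSymbol, n int) (resolvedSymbol, error) {
	syms := unsafe.Slice(ptr, n) // no copy; the slice aliases the original memory
	if len(syms) != 1 {
		return resolvedSymbol{}, fmt.Errorf("unexpected return for point lookup: %d", len(syms))
	}
	return syms[0], nil
}

func main() {
	backing := [2]resolvedSymbol{{lineNumber: 42}, {lineNumber: 43}}

	sym, err := singleSymbol(&backing[0], 1)
	fmt.Println(sym.lineNumber, err)

	// Two results, as an inline-aware lookup might return, are rejected.
	_, err = singleSymbol(&backing[0], 2)
	fmt.Println(err)
}
```

Supporting inlined functions would mean accepting more than one element and emitting one frame per entry instead of returning an error.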

@florianl florianl force-pushed the interpreter-golang branch from df97089 to dbc4130 Compare April 7, 2025 07:14
@florianl florianl force-pushed the interpreter-golang branch from dbc4130 to 733570f Compare April 7, 2025 07:19
@florianl
Contributor Author

florianl commented Apr 7, 2025

PR has been rebased on current main and conflicts got resolved.

friendly ping for feedback @fabled && @christos68k

Signed-off-by: Florian Lehner <[email protected]>
@florianl florianl force-pushed the interpreter-golang branch from ac92755 to b93d090 Compare April 7, 2025 12:48
Contributor

@fabled fabled left a comment

LGTM. We can revisit later if using debug/gosym is feasible when the other prerequisites for it are done first.

Member

@christos68k christos68k left a comment

LGTM, thanks!

I also tested locally and everything works as expected. I'm assuming the crashes you saw on your end were not related to pinning (otherwise they'd also manifest for me and @fabled), and I'm also assuming you can no longer trigger them on your end.

Signed-off-by: Florian Lehner <[email protected]>
@florianl florianl merged commit a996c24 into main Apr 8, 2025
26 checks passed
@florianl florianl deleted the interpreter-golang branch April 8, 2025 08:18
@florianl florianl self-assigned this Apr 23, 2025
cauemarcondes pushed a commit to cauemarcondes/kibana that referenced this pull request May 8, 2025
## Summary

OTel Semantic Conventions
[defines](open-telemetry/semantic-conventions#2003)
a type for Go and OTel eBPF profiler is about to start with pushing Go
frames (either with
open-telemetry/opentelemetry-ebpf-profiler#409
or
open-telemetry/opentelemetry-ebpf-profiler#408)

FYI: @elastic/ingest-otel-data

### Checklist

Check the PR satisfies following conditions.

Reviewers should verify this PR satisfies this list as well.

- [ ] ~~Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)~~
not relevant
- [ ]
~~[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials~~ not
relevant
- [ ] ~~[Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios~~ not relevant
- [ ] ~~If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)~~
not relevant
- [ ] ~~This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.~~
not relevant
- [ ] ~~[Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed~~ not relevant
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- `release_note:skip`

Signed-off-by: Florian Lehner <[email protected]>
(cherry picked from commit 7c4af05)
cauemarcondes added a commit to elastic/kibana that referenced this pull request May 8, 2025
# Backport

This will backport the following commits from `main` to `8.19`:
- [[Profiling] Add FrameType and color for Go
(#215697)](#215697)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)


Signed-off-by: Florian Lehner <[email protected]>
Co-authored-by: Florian Lehner <[email protected]>
5 participants