fix(python): handle cold interpreter func ranges #414
- handle cold interpreter func chunks
- add coredump alpine320, alpine320-nobuildid
I don't really like the debug-file symbol lookup. I'm thinking of rolling back this change (and keeping only the hardcoded table for now) until we find a better fix.
Hacky, but I solved a similar problem for luajit by deriving the size of the function from the stack deltas: https://github.com/parca-dev/opentelemetry-ebpf-profiler/blob/main/interpreter/luajit/luajit.go#L173
So maybe, if the size of the function is less than some cutoff (e.g. 1000), we decide we have a .cold situation, walk the stack deltas, and assume the largest function in the binary is the rest of the interpreter. The 3 largest functions in python:
So it seems like there's a large margin for error.
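A minimal sketch of the size-cutoff heuristic described in this comment. The `funcInfo` type, the `coldCutoff` constant, and `guessInterpreterBody` are hypothetical names made up for illustration; they are not the profiler's API, and the function list stands in for whatever the stack deltas yield. The sample sizes are illustrative only.

```go
package main

import "fmt"

// funcInfo is a hypothetical record for one function recovered from
// the stack deltas: its start address and its size in bytes.
type funcInfo struct {
	addr, size uint64
}

// coldCutoff is the assumed threshold below which the interpreter
// symbol is treated as a small hot stub with its body split elsewhere.
const coldCutoff = 1000

// guessInterpreterBody returns sym unchanged when it is large enough.
// Otherwise it returns the largest function in all, on the assumption
// that the largest function is the rest of the interpreter loop.
func guessInterpreterBody(sym funcInfo, all []funcInfo) funcInfo {
	if sym.size >= coldCutoff {
		return sym
	}
	best := sym
	for _, f := range all {
		if f.size > best.size {
			best = f
		}
	}
	return best
}

func main() {
	sym := funcInfo{addr: 0x171780, size: 362}
	all := []funcInfo{
		sym,
		{addr: 0x88a9a, size: 56543},
		{addr: 0x200000, size: 4096},
	}
	body := guessInterpreterBody(sym, all)
	fmt.Printf("interpreter body guessed at %#x (%d bytes)\n", body.addr, body.size)
}
```

This stays a heuristic: it picks the wrong function whenever some unrelated function in the binary is larger than the cold chunk.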
I don't think we can use the size of the function unfortunately. This is from alpine 3.21
Nice one, I will take a look.
dumping another idea for validation:
Interesting. Does the hot function call the cold function as a normal function call, or does it just jmp between the two? I guess it must be a jmp, or this wouldn't be an issue?
It jumps: either a direct/conditional jump, or an indirect jump through a switch/case jump table.
merging open-telemetry#414

Squashed commit of the following:

commit 43b26dc
Author: Tolya Korniltsev <[email protected]>
Date: Mon Mar 24 11:18:47 2025 +0700

    revert debug-file-lookup changes

commit 71ce508
Author: Tolya Korniltsev <[email protected]>
Date: Fri Mar 21 18:41:36 2025 +0700

    update arm kernel blobs

commit 1346f70
Author: Tolya Korniltsev <[email protected]>
Date: Fri Mar 21 18:36:21 2025 +0700

    Lint

commit 85861b1
Author: Tolya Korniltsev <[email protected]>
Date: Fri Mar 21 16:16:16 2025 +0700

    fix(python): handle cold interpreter func chunks

    - handle cold interpreter func chunks
    - add coredump alpine320, alpine320-nobuildid
Well, that escalated quickly. I found the following images with cold function chunks
I will mark the PR as a draft as I don't believe this should be merged in this form.
I propose to split this PR in two pieces.
I will close this and submit separate PRs for the above.
Squashed commit of the following:

commit 69d4ab0e79a6073b7e67ef3e0fa732647f928c1f
Author: Tolya Korniltsev <[email protected]>
Date: Tue Mar 25 13:13:01 2025 +0700

    add more alpines

commit 43b26dc
Author: Tolya Korniltsev <[email protected]>
Date: Mon Mar 24 11:18:47 2025 +0700

    revert debug-file-lookup changes

commit 71ce508
Author: Tolya Korniltsev <[email protected]>
Date: Fri Mar 21 18:41:36 2025 +0700

    update arm kernel blobs

commit 1346f70
Author: Tolya Korniltsev <[email protected]>
Date: Fri Mar 21 18:36:21 2025 +0700

    Lint

commit 85861b1
Author: Tolya Korniltsev <[email protected]>
Date: Fri Mar 21 16:16:16 2025 +0700

    fix(python): handle cold interpreter func chunks

    - handle cold interpreter func chunks
    - add coredump alpine320, alpine320-nobuildid

# Conflicts:
#   interpreter/loaderinfo.go
cherry-pick open-telemetry#414 but without hardcoded cold ranges
alpine:3.20 python3.12 has separate _PyEval_EvalFrameDefault and _PyEval_EvalFrameDefault.cold functions:
    readelf -s -W /usr/lib/debug/usr/lib/libpython3.12.so.1.0.debug | grep EvalFrameDef
      2143: 0000000000171780   362 FUNC    LOCAL  DEFAULT    9 _PyEval_EvalFrameDefault.localalias
      2234: 0000000000088a9a 56543 FUNC    LOCAL  DEFAULT    9 _PyEval_EvalFrameDefault.cold
     43791: 0000000000171780   362 FUNC    GLOBAL DEFAULT    9 _PyEval_EvalFrameDefault
Therefore the ebpf code assumes the interpreter function is only 362 bytes long and fails to select the python unwinder, because the meat of the interpreter is in the cold chunk and most sampled PCs fall outside the 362-byte range.
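To illustrate the failure mode, here is a minimal sketch of the range check, using the addresses and sizes from the readelf output above. The `addrRange` and `interpRanges` types are hypothetical names for illustration only and are not the profiler's actual data structures.

```go
package main

import "fmt"

// addrRange is a half-open [start, end) address range in the ELF file.
type addrRange struct {
	start, end uint64
}

// contains reports whether pc lies inside the range.
func (r addrRange) contains(pc uint64) bool {
	return pc >= r.start && pc < r.end
}

// interpRanges holds every chunk that belongs to the interpreter loop.
type interpRanges []addrRange

// contains reports whether pc lies inside any of the chunks.
func (rs interpRanges) contains(pc uint64) bool {
	for _, r := range rs {
		if r.contains(pc) {
			return true
		}
	}
	return false
}

func main() {
	// _PyEval_EvalFrameDefault: 362 bytes at 0x171780.
	hot := addrRange{0x171780, 0x171780 + 362}
	// _PyEval_EvalFrameDefault.cold: 56543 bytes at 0x88a9a.
	cold := addrRange{0x88a9a, 0x88a9a + 56543}

	// A PC somewhere inside the cold chunk.
	pc := uint64(0x88a9a + 0x100)

	fmt.Println(interpRanges{hot}.contains(pc))       // false: hot range only
	fmt.Println(interpRanges{hot, cold}.contains(pc)) // true: hot and cold ranges
}
```

With only the hot range registered, a PC in the cold chunk is not recognized as interpreter code; registering both chunks makes the check succeed.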
In this PR I fix this issue in two ways.
Debug file lookup (at least gives someone an opportunity to install a debug package and allow profiling). The downside is that we run this check even for interpreters that have no cold functions. Maybe we can come up with some heuristic to not waste resources.
Edit: I've reverted the change to keep the PR smaller and not introduce unnecessary resource consumption. I've created a tracking issue for the debug file lookup: "python: handle cold interpreter func ranges" #416.
(EDIT: the second test, alpine320-nobuildid.json, is skipped for now)