Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python): handle cold interpreter func ranges #414

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

korniltsev
Copy link
Contributor

alpine:3.20 python3.12 has a separate _PyEval_EvalFrameDefault _PyEval_EvalFrameDefault.cold

readelf -s -W /usr/lib/debug/usr/lib/libpython3.12.so.1.0.debug | grep EvalFrameDef
  2143: 0000000000171780   362 FUNC    LOCAL  DEFAULT    9 _PyEval_EvalFrameDefault.localalias
  2234: 0000000000088a9a 56543 FUNC    LOCAL  DEFAULT    9 _PyEval_EvalFrameDefault.cold
 43791: 0000000000171780   362 FUNC    GLOBAL DEFAULT    9 _PyEval_EvalFrameDefault

Therefore ebpf code only assumes the interpreter function is only 362 bytes long and fails to select python unwinder as the meat of the interpreter is in the cold one.

In this PR I fix this issue in two ways.

  1. A band-aid table for known binaries with a separate cold function.
  2. Debug file lookup (at least gives someone an opportunity to install a debug package and allow profiling). The downside is that we run this check even for interpreters who has no cold functions. Maybe we can come up with some heuristic to not waste resources.
  3. Update the kernel code to handle cases when an interpreter has two chunks
  4. Add coredumps for both cases (the table, the debug file lookup)

- handle cold interpreter func chunks
- add coredump alpine320, alpine320-nobuildid
@korniltsev
Copy link
Contributor Author

korniltsev commented Mar 21, 2025

I don't really like the debug-file symbol lookup. I'm thinking to rollback this change (and keep only the hardcoded table for now) until we find a better fix.
Curious what others think.

@korniltsev korniltsev marked this pull request as ready for review March 21, 2025 15:56
@korniltsev korniltsev requested review from a team as code owners March 21, 2025 15:56
@gnurizen
Copy link
Contributor

gnurizen commented Mar 21, 2025

Hacky but I solved a similar problem for luajit by using the size of the function by looking at the stack deltas.

https://github.com/parca-dev/opentelemetry-ebpf-profiler/blob/main/interpreter/luajit/luajit.go#L173

So maybe if the size of the function is less than some cut off (ie 1000) decide we have a .cold situation and walk the stack deltas and assume the largest function in the binary is the rest of the interpreter.

The 3 largest functions in python:

  1024: 0000000000353e30  9730 FUNC    GLOBAL DEFAULT   11 PyUnicode_Format
  1974: 000000000013a640 12000 FUNC    GLOBAL DEFAULT   11 _PyUnicode_ToNumeric
  1799: 0000000000108f90 66171 FUNC    GLOBAL DEFAULT   11 _PyEval_EvalFrameDefault

So seems like there's a large margin for error.

@korniltsev
Copy link
Contributor Author

I don't think we can use the size of the function unfortunately. This is from alpine 3.21

  2270: 0000000000089228 13785 FUNC    LOCAL  DEFAULT    9 _PyEval_EvalFrameDefault.cold
 43735: 0000000000167f60 45799 FUNC    GLOBAL DEFAULT    9 _PyEval_EvalFrameDefault

Hacky but I solved a similar problem for luajit by using the size of the function by looking at the stack deltas.

Nice one, I will take a look.

@korniltsev
Copy link
Contributor Author

korniltsev commented Mar 22, 2025

dumping another idea for validation:
maybe we can inspect contents of sysconfig.get_config_vars() to decide if we need searching for the cold range (not exactly sure what flags to search for, maybe -fprofile-use ?)

"OPT": "-DNDEBUG -g -O2 -Wall",
  "CONFIG_ARGS": "'--enable-shared' '--prefix=/usr' '--libdir=/usr/lib/x86_64-linux-gnu' '--enable-ipv6' '--enable-loadable-sqlite-extensions' '--with-dbmliborder=bdb:gdbm' '--with-computed-gotos' '--without-ensurepip' '--with-system-expat' '--with-dtrace' '--with-ssl-default-suites=openssl' '--with-wheel-pkg-dir=/usr/share/python-wheels/' 'MKDIR_P=/bin/mkdir -p' 'CC=x86_64-linux-gnu-gcc'",
"LDSHARED": "x86_64-linux-gnu-gcc -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions  -Wl,-z,relro -g -fwrapv -O2   ",
  "LDVERSION": "3.12",
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants