-
Notifications
You must be signed in to change notification settings - Fork 148
Revert to jemalloc allocator #3954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR switches the default memory allocator from mimalloc to jemalloc across all binaries and simplifies heap profiling by making it purely runtime-controlled via environment variables. The change allows production systems to generate memory dumps without recompilation or separate profiling builds.
Key changes:
- Jemalloc is now the default allocator (previously mimalloc), with built-in heap profiling support
- Mimalloc can be optionally enabled via
--features mimalloc-allocatorfor cases where heap profiling is not needed - Heap profiling is controlled entirely at runtime via
MALLOC_CONFenvironment variable instead of compile-time features
Reviewed changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| crates/*/src/main.rs | Swapped allocator configuration: jemalloc is now default, mimalloc is feature-gated |
| crates/*/Cargo.toml | Updated dependencies: tikv-jemallocator is now required, mimalloc is optional; feature renamed from jemalloc-profiling to mimalloc-allocator |
| crates/observe/src/heap_dump_handler.rs | Simplified conditional compilation to always compile on unix; added documentation about runtime profiling control |
| crates/observe/src/lib.rs | Removed feature gate from heap_dump_handler module |
| crates/observe/Cargo.toml | Made jemalloc_pprof a required dependency; removed jemalloc-profiling feature |
| crates/*/src/{run.rs,lib.rs} | Updated heap dump handler spawn calls to remove jemalloc-profiling feature gate |
| README.md | Updated heap profiling documentation to reflect runtime-only configuration |
| .github/workflows/deploy.yaml | Added optional features input parameter to support building with different allocators |
| .github/workflows/deploy-profiling.yaml | Removed specialized profiling workflow (no longer needed) |
| Dockerfile | Simplified: build-essential now always installed; removed conditional jemalloc installation logic |
| Cargo.lock | Updated to reflect new dependency structure with jemalloc_pprof only in observe crate |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
MartinquaXD
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I guess using jemalloc for 2 weeks in prod should be sufficient to remove mimalloc entirely, right?
| "jemalloc profiling not available - heap dump handler not started. Ensure service is \ | ||
| built with jemalloc-profiling feature and MALLOC_CONF is set." | ||
| ); | ||
| // Profiling is disabled - do nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not merge blocking but long term it would be nice if we could always spawn the dump handler and also support enabling profiling while the process is already running. This is possible with jemalloc but I'm not sure if just updating the env variable after starting the process would be sufficient for that.
What I have in mind is only measuring the memory growth without having to manually diff to memory dumps. In other words:
- start pod
- wait 1h to let caches fill to a reasonable degree
- enable profiling
- wait 24h
- collect memory dump
- analyze only the memory allocated during the 24h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, it doesn't work well with delayed profiling(tikv/jemallocator#140). Probably, if the profiler missed the root allocation, it is not possible to record detached children, I don't know.
Also, this overcomplicates the solution, where the unix socket connection would need to accept more commands, etc. Anyway, I can try to do that in the future. But since the enabled profiling doesn't really affect the performance, this is not so critical.
| inputs: | ||
| features: | ||
| description: 'Optional cargo features (currently supported: "mimalloc-allocator")' | ||
| required: false | ||
| default: '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! This can be made even more user friendly by using type: choice and providing a list of options (see here). That way you don't even have to know the exact feature names. (obviously runs the risk of having the action out of sync with the actually supported features).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking about a list, but that would be more difficult to test new features since, in that case, you would need to provide custom inputs. AFAIK, GH doesn’t support a hybrid "preset options + custom text" input. A choice input only accepts values listed in options.
Description
This PR makes jemalloc the default memory allocator and simplifies heap profiling configuration by making it purely runtime-controlled via the
MALLOC_CONFenvironment variable. The reason for switching back to jemalloc is that with a tuned config, it performs better than mimalloc and also allows creating memory dumps without redeployments.Changes
jemalloc-profilingfeature).--features mimalloc-allocator(was default).MALLOC_CONF=prof:true.github/workflows/deploy-profiling.yaml(no longer needed).github/workflows/deploy.yamlto support optionalfeaturesparametermimalloc-allocatorHow to test
Tested on staging and prod. No issues observed. For some reason, on mainnet-shadow, when active profiling is enabled, liquidity fetching performance degrades, unlike in other environments (mainnet-prod, base-prod).