Skip to content

Conversation

@squadgazzz
Copy link
Contributor

@squadgazzz squadgazzz commented Dec 9, 2025

Description

This PR makes jemalloc the default memory allocator and simplifies heap profiling configuration by making it purely runtime-controlled via the MALLOC_CONF environment variable. The reason for switching back to jemalloc is that with a tuned config, it performs better than mimalloc and also allows creating memory dumps without redeployments.

Changes

  • Jemalloc is now the default allocator (was behind jemalloc-profiling feature).
  • Mimalloc is now optional via --features mimalloc-allocator (was default).
  • All binaries now use jemalloc with built-in heap profiling support by default.
  • Developers can optionally use mimalloc if heap profiling is not needed or there are problems with the jemalloc allocator.
  • Heap profiling is now controlled entirely at runtime via MALLOC_CONF=prof:true
  • Removed .github/workflows/deploy-profiling.yaml (no longer needed)
  • Updated .github/workflows/deploy.yaml to support optional features parameter
    • Currently supported: mimalloc-allocator
    • Ready for future features to be added (tokio-console will be another one)

How to test

Tested on staging and prod. No issues observed. For some reason, on mainnet-shadow, when active profiling is enabled, liquidity fetching performance degrades, unlike in other environments (mainnet-prod, base-prod).

@squadgazzz squadgazzz marked this pull request as ready for review December 9, 2025 11:15
@squadgazzz squadgazzz requested a review from a team as a code owner December 9, 2025 11:15
Copilot AI review requested due to automatic review settings December 9, 2025 11:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR switches the default memory allocator from mimalloc to jemalloc across all binaries and simplifies heap profiling by making it purely runtime-controlled via environment variables. The change allows production systems to generate memory dumps without recompilation or separate profiling builds.

Key changes:

  • Jemalloc is now the default allocator (previously mimalloc), with built-in heap profiling support
  • Mimalloc can be optionally enabled via --features mimalloc-allocator for cases where heap profiling is not needed
  • Heap profiling is controlled entirely at runtime via MALLOC_CONF environment variable instead of compile-time features

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
crates/*/src/main.rs Swapped allocator configuration: jemalloc is now default, mimalloc is feature-gated
crates/*/Cargo.toml Updated dependencies: tikv-jemallocator is now required, mimalloc is optional; feature renamed from jemalloc-profiling to mimalloc-allocator
crates/observe/src/heap_dump_handler.rs Simplified conditional compilation to always compile on unix; added documentation about runtime profiling control
crates/observe/src/lib.rs Removed feature gate from heap_dump_handler module
crates/observe/Cargo.toml Made jemalloc_pprof a required dependency; removed jemalloc-profiling feature
crates/*/src/{run.rs,lib.rs} Updated heap dump handler spawn calls to remove jemalloc-profiling feature gate
README.md Updated heap profiling documentation to reflect runtime-only configuration
.github/workflows/deploy.yaml Added optional features input parameter to support building with different allocators
.github/workflows/deploy-profiling.yaml Removed specialized profiling workflow (no longer needed)
Dockerfile Simplified: build-essential now always installed; removed conditional jemalloc installation logic
Cargo.lock Updated to reflect new dependency structure with jemalloc_pprof only in observe crate

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@MartinquaXD MartinquaXD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I guess using jemalloc for 2 weeks in prod should be sufficient to remove mimalloc entirely, right?

"jemalloc profiling not available - heap dump handler not started. Ensure service is \
built with jemalloc-profiling feature and MALLOC_CONF is set."
);
// Profiling is disabled - do nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not merge blocking but long term it would be nice if we could always spawn the dump handler and also support enabling profiling while the process is already running. This is possible with jemalloc but I'm not sure if just updating the env variable after starting the process would be sufficient for that.
What I have in mind is only measuring the memory growth without having to manually diff to memory dumps. In other words:

  1. start pod
  2. wait 1h to let caches fill to a reasonable degree
  3. enable profiling
  4. wait 24h
  5. collect memory dump
  6. analyze only the memory allocated during the 24h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it doesn't work well with delayed profiling(tikv/jemallocator#140). Probably, if the profiler missed the root allocation, it is not possible to record detached children, I don't know.

Also, this overcomplicates the solution, where the unix socket connection would need to accept more commands, etc. Anyway, I can try to do that in the future. But since the enabled profiling doesn't really affect the performance, this is not so critical.

Comment on lines +7 to +11
inputs:
features:
description: 'Optional cargo features (currently supported: "mimalloc-allocator")'
required: false
default: ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! This can be made even more user friendly by using type: choice and providing a list of options (see here). That way you don't even have to know the exact feature names. (obviously runs the risk of having the action out of sync with the actually supported features).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking about a list, but that would be more difficult to test new features since, in that case, you would need to provide custom inputs. AFAIK, GH doesn’t support a hybrid "preset options + custom text" input. A choice input only accepts values listed in options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants