Replies: 17 comments 23 replies
-
This is very helpful to use on google colab as well, good find 👍
-
Truly magical! This is a complete solution for me too, as I experienced a similar issue here: #7479
-
very cool, a few comments:
-
also @vladmandic I believe you've gotten confused with your comment on
-
what would an equivalent command on Windows be?
-
@mcmonkey4eva export affects the current context and all child processes, so if it's set in the shell before you run webui, any other processes you start before or after webui will see the same effect. But yeah, if that session is used only to start webui, then no issues. @gsgoldma Windows works completely differently; there is no equivalent.
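The scoping difference can be illustrated with hypothetical `DEMO_*` variables standing in for `LD_PRELOAD` (so nothing is actually preloaded):

```shell
# `export` makes the variable visible to every child process started from
# this shell afterwards, not just webui:
export DEMO_EXPORTED=from_export
sh -c 'echo "child sees: $DEMO_EXPORTED"'        # child sees: from_export

# a per-command assignment scopes the variable to that single process:
DEMO_SCOPED=only_here sh -c 'echo "child sees: $DEMO_SCOPED"'
sh -c 'echo "later child sees: [$DEMO_SCOPED]"'  # later child sees: []
```

So if you want the allocator swap limited to webui alone, the per-command form `LD_PRELOAD=libtcmalloc.so.4 ./webui.sh` is the safer choice.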
-
Thank you, looks like it worked for me. Running on WSL2, I had 8.5 GB taken after initial loading; swapping to another model then maxed out my memory, but the swap I created only took 300 MB, and after the model loaded, memory usage was back to 8.5 GB. Earlier I couldn't even load a second model. Great fix!
-
Edit: The fix did not work for me, and it left webui unable to run until I rebooted my computer. Are there any other ways to apply this fix?
-
How can I use this on colab? I am not a Linux wizard, so I would be interested in specific commands. Thanks in advance!
-
Wanted to set LD_PRELOAD when launching web-ui from the terminal. It turned out I needed to specify the full path to the library, like this:
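The exact command didn't survive in this copy of the thread; as an illustration, the path below is an assumption typical for Ubuntu x86_64 (locate yours with `dpkg -L libgoogle-perftools-dev | grep tcmalloc`):

```shell
# hypothetical full path; adjust to wherever your distro installs the library
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 ./webui.sh
```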
-
Had this same RAM leak issue. I use a custom startup wrapper script for Automatic1111 to always enable some command-line options; I think this is cleaner than editing one of the provided scripts, so I added the LD_PRELOAD setting there. For others struggling with this: make sure the tcmalloc package is actually installed first.
-
well, here's a couple of mine...

import os

# framework/logging toggles
os.environ.setdefault('TF_CPP_MIN_LOG_LEVEL', '2')
os.environ.setdefault('ACCELERATE', 'True')
os.environ.setdefault('FORCE_CUDA', '1')
os.environ.setdefault('ATTN_PRECISION', 'fp16')
# CUDA allocator and runtime tuning
os.environ.setdefault('PYTORCH_CUDA_ALLOC_CONF', 'garbage_collection_threshold:0.9,max_split_size_mb:512')
os.environ.setdefault('CUDA_LAUNCH_BLOCKING', '0')
os.environ.setdefault('CUDA_CACHE_DISABLE', '0')
os.environ.setdefault('CUDA_AUTO_BOOST', '1')
os.environ.setdefault('CUDA_MODULE_LOADING', 'LAZY')
os.environ.setdefault('CUDA_DEVICE_DEFAULT_PERSISTING_L2_CACHE_PERCENTAGE_LIMIT', '0')
# app settings
os.environ.setdefault('GRADIO_ANALYTICS_ENABLED', 'False')
os.environ.setdefault('SAFETENSORS_FAST_GPU', '1')
os.environ.setdefault('NUMEXPR_MAX_THREADS', '16')
-
I am using Debian testing and I installed the libtcmalloc_minimal package. I patched my webui.sh file like this to make it preload the library:
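The patch itself wasn't captured in this copy of the thread; a minimal sketch of that kind of change, assuming the Debian multiarch path, might look like:

```shell
# near the top of webui.sh, before python is launched (path is an assumption)
TCMALLOC=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
if [ -e "$TCMALLOC" ]; then
    export LD_PRELOAD="$TCMALLOC"
fi
```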
HTH |
-
I had found a fix that used this on an automatic1111 notebook; it's similar to what's posted here. Been using it via google colab for weeks just fine until today. Today, for whatever reason, it just causes the google colab cell to stop before showing the URLs.
Commenting out the memfix code lines gets the WebUI to run, but with the memory issue. Any idea of what went wrong? SD version: 1.4.0 • python: 3.10.6 • torch: 2.0.1+cu118 • xformers: 0.0.20 • gradio: 3.32.0
-
So, I've read and re-read all this and I have tried multiple different ways of getting this right and making A1111 work correctly. I always installed 'libgoogle-perftools-dev' without issues and A1111 always worked like a charm. I decided to dual boot linux (ubuntu & Mint/edge) and now I see the following:

ERROR: ld.so: object 'libtcmalloc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

I'm not as educated with linux as I wish I was; however, I am learning at a rate that makes this frustrating to figure out. A1111 still works and all, but my mind will not let this go. Like I said, this has always worked without fail before. And since Mint/edge is basically ubuntu "Jammy", I figured everything would fall in line. If I try to downgrade xapp at all, it wants to take Cinnamon with it. What am I missing here? I've tried adjusting /etc/ld.so.conf and that doesn't seem to work. I've added lines to webui-user.sh and no dice there either. Would love some guidance, as I know I'm missing something, and google searching and trying different things has gotten me multiple reinstalls of mint. lol Learning is fun and all, but I figured it was time to ask for help. DeweyDecibel
-
that could be anything. first, make sure that libtcmalloc is set as an env variable ONLY for sd. it's NOT compatible as a system-wide allocator, so if you set it in .bashrc or something like that, it will result in exactly the errors you're seeing.
second, as with any system lib, my recommendation remains to make sure it's resolvable by the system and to avoid using absolute paths.
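One way to check resolvability, assuming a glibc system where `ldconfig` is on PATH:

```shell
# ask the dynamic linker cache whether tcmalloc resolves by bare soname;
# if nothing matches, LD_PRELOAD=libtcmalloc.so.4 will be ignored with a
# "cannot be preloaded" warning
ldconfig -p | grep tcmalloc || echo "tcmalloc not found in linker cache"
```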
-
I have configured and used it according to the above method (export LD_PRELOAD=libtcmalloc.so) and it runs normally. But when I use dwpose, an error occurs and the SD service is interrupted. I found that as long as an ".onnx" file is used, there will be a service interruption. How should I solve this problem? Thanks.
-
So, I use webui on XUbuntu, on a system with a Nvidia Optimus graphics card with only 2GB on it. As such, I need to run it on "lowvram" mode.
What frustrated me was an inconsistent pattern of memory leakage - each time I would generate an image, it would use up more memory, until my memory was full. And attempts at profiling to find where the memory was leaking failed repeatedly - none of the tools could see the excessively used memory (while the program was using more than 9 GB of memory, the profilers would only see about 250 MB of data).
While trying to figure out what was going on, I looked into optimisations, in the hopes it would give me ideas, and I came across the idea of using a different malloc to handle memory allocations.
And when I used tcmalloc... amazingly, the memory leak was gone. Not only that, but I got a maybe 2% speed boost in the process. Now I can run multiple batches, and don't need to worry about running out of memory!
To do it, you need to have the appropriate library installed - on Ubuntu 22.10, it's in libgoogle-perftools-dev (although it also seems to work pretty well with libtcmalloc-minimal4)...
Then, as an environment variable, add
LD_PRELOAD=libtcmalloc.so.4
(you may want to confirm the "4" at the end, I don't know if it varies from system to system). If you went with the minimal one, it'll be libtcmalloc_minimal rather than just libtcmalloc. You can do this on the command line as usual, or you can edit webui-user.sh to add export LD_PRELOAD=libtcmalloc.so.4
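Concretely, the two options read something like this (the `.4` soname is the common one shipped by gperftools packages, but confirm what's installed on your own system first):

```shell
# one-off, scoped to a single launch:
LD_PRELOAD=libtcmalloc.so.4 ./webui.sh

# or persistent, added to webui-user.sh:
export LD_PRELOAD=libtcmalloc.so.4
```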
This was such a substantial improvement, I'm considering putting it forward as a Feature Request Issue, to be added to webui-user.sh (commented out, ready to be uncommented) and mentioned in the wiki as an optimisation.