Can't Free GPU Memory in SGLang? #16231
jamesdborin started this conversation in General
Hi, I have been working on use cases where I might want to delete weights or move them to the CPU after they are initialized (for example, pruning or offloading).
However, this doesn't work the way I expect.
For example, if I delete all of the expert weights from a Qwen3-30B-A3B model with a snippet like this in `load_weights` in `loader.py`:
I find that the available memory doesn't change at all:
Usually this is because there is another reference to the tensor somewhere, but I can't find where that reference might be.
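A minimal pure-Python sketch of that situation: `del` removes one reference, and the object is only freed once no other reference remains. A `weakref` probe reports whether the object actually died without itself keeping it alive. `FakeTensor`, `delete_and_check`, and the `cache` list are hypothetical stand-ins (for a `torch.Tensor`, the deletion step in `load_weights`, and whatever extra structure might still hold the weight). `torch.Tensor` also supports weak references, so the same probe should work on real parameters.

```python
import gc
import weakref


class FakeTensor:
    """Hypothetical stand-in for a GPU tensor (plain classes support weakrefs)."""

    def __init__(self, name: str):
        self.name = name


def delete_and_check(params: dict, key: str) -> bool:
    """Delete params[key]; return True only if the object was actually freed."""
    probe = weakref.ref(params[key])  # a weak reference does not keep it alive
    del params[key]                    # drops only this one strong reference
    gc.collect()                       # also clear any reference cycles
    return probe() is None             # None means nothing else held the object


# Case 1: the dict is the only holder, so deleting the entry frees the object.
params = {"experts.w13_weight": FakeTensor("experts.w13_weight")}
print(delete_and_check(params, "experts.w13_weight"))  # True

# Case 2: a second (hypothetical) structure still references the same object.
weight = FakeTensor("experts.w2_weight")
params = {"experts.w2_weight": weight}
cache = [weight]  # e.g. a forgotten list or dict elsewhere in the loader
del weight
print(delete_and_check(params, "experts.w2_weight"))  # False: `cache` keeps it alive
```

Even after the last reference is gone, PyTorch's CUDA caching allocator normally keeps the freed block reserved for reuse, so free memory as reported by the driver (for example `torch.cuda.mem_get_info()` or `nvidia-smi`) may not rise until `torch.cuda.empty_cache()` is called; `torch.cuda.memory_allocated()` reflects the release more directly.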
Any suggestions would be really appreciated!
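To track down which object still holds the lingering reference, Python's `gc.get_referrers()` lists the GC-tracked objects that point directly at a given object. Below is a minimal sketch using a hypothetical `FakeTensor` and a hypothetical forgotten `loader_cache` list; on a real model you would pass the parameter tensor itself. Referrers are often anonymous containers (a `dict` or `list`), so you may need to call the helper again on those to walk up to the owning object. Scanning every tracked object can also be slow on a large process, so this is a debugging aid rather than something to leave in the loader.

```python
import gc
import types


class FakeTensor:
    """Hypothetical stand-in for a weight we expected to be freed."""

    def __init__(self, name: str):
        self.name = name


def find_referrers(obj, ignore=()):
    """Return GC-tracked objects holding a direct reference to `obj`.

    Frame objects are skipped (they only reflect local variables), and any
    containers listed in `ignore` are excluded from the result.
    """
    ignored_ids = {id(x) for x in ignore}
    holders = []
    for ref in gc.get_referrers(obj):
        if isinstance(ref, types.FrameType) or id(ref) in ignored_ids:
            continue
        holders.append(ref)
    return holders


def describe(holder) -> str:
    """Short, human-readable summary of a referrer."""
    if isinstance(holder, dict):
        return f"dict with keys {list(holder)[:5]}"
    return type(holder).__name__


def demo() -> list:
    weight = FakeTensor("experts.w13_weight")
    module_params = {"w13_weight": weight}
    loader_cache = [weight]  # hypothetical forgotten reference
    del module_params["w13_weight"]
    # `weight` is a local variable, so only the list should be reported.
    return [describe(h) for h in find_referrers(weight)]


print(demo())  # ['list']
```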