## 💥 GGUF support!

Thanks to both @diegovelilla and @vm7608, hf-mem now supports GGUF!

```shell
uvx hf-mem --model-id TheBloke/deepseek-llm-7B-chat-GGUF --gguf-file deepseek-llm-7b-chat.Q2_K.gguf --experimental
```
## 🐍 Use via Python!

Now you can use hf-mem programmatically with Python as:

```python
from hf_mem import run

run(model_id="MiniMaxAI/MiniMax-M2", experimental=True)
# Result(model_id='MiniMaxAI/MiniMax-M2', revision='main', filename=None, memory=230121630720, kv_cache=24964497408, total_memory=255086128128, details=False)
```

## 🐛 Fixes
- The table now displays the memory requirements in GiB instead of GB to be more accurate, thanks to @vrdn-23!
- The KV cache estimations for Safetensors now use a more accurate formula that properly handles both full and sliding attention, rather than always assuming full attention, and use the `head_dim` if specified instead of calculating it, thanks to https://huggingface.co/YouJiacheng in https://huggingface.co/Qwen/Qwen3.5-397B-A17B/discussions/20#69a5bf82a2b3b0f27e8eacef
```shell
uvx hf-mem --model-id Qwen/Qwen3.5-397B-A17B --experimental --kv-cache-dtype fp8
```
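To illustrate why the full-vs-sliding distinction matters for the estimate, here is a minimal sketch of the standard KV cache sizing formula. This is not hf-mem's actual implementation; the function name and parameters are hypothetical:

```python
# Hypothetical sketch of KV cache sizing, not hf-mem's actual code.
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, dtype_bytes=2,
                   sliding_window=None, full_attention_layers=None):
    """Estimate KV cache size in bytes for a given sequence length."""
    total = 0
    for layer in range(num_layers):
        is_full = full_attention_layers is None or layer in full_attention_layers
        # Sliding-attention layers only ever cache up to `sliding_window` tokens,
        # so assuming full attention everywhere overestimates their share.
        tokens = seq_len if is_full else min(seq_len, sliding_window)
        # Keys and values each store (tokens, num_kv_heads, head_dim) entries.
        total += 2 * tokens * num_kv_heads * head_dim * dtype_bytes
    return total
```

For a model where half the layers use a sliding window, this can noticeably shrink the estimate at long sequence lengths compared with the all-full-attention assumption.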
## 🚨 Deprecated!

`--ignore-table-width` is deprecated and won't have any effect, in favour of always resizing the table to fit the content regardless of how wide it is.
## What's Changed

- Fix display from GB to GiB for consistency by @vrdn-23 in #34
- Add `version` display on table and JSON output by @alvarobartt in #36
- Add support for GGUF files + KV cache from GGUF metadata by @diegovelilla in #25
- (fix) Fetch referenced config on models that require it by @Napuh in #38
- Release version `v0.5.0` + add `--hf-token`, use `hf-mem` as lib, etc. by @alvarobartt in #39
## New Contributors

- @vrdn-23 made their first contribution in #34
- @diegovelilla made their first contribution in #25

**Full Changelog**: 0.4.4...0.5.0

