Describe the bug
When using llama-cpp-2 (v0.1.138) on Windows with the MSVC toolchain (x86_64-pc-windows-msvc), loading any GGUF model larger than 4GB (e.g., a 4.5GB 7B model) fails immediately with:
gguf_init_from_file_impl: failed to read magic
llama_model_load_from_file_impl: failed to load model
Steps to Reproduce
- Environment: Windows 11, Rust x86_64-pc-windows-msvc (64-bit).
- Model: Any GGUF file > 4GB (e.g., qwen2.5-coder-7b-instruct-q4_k_m.gguf which is ~4.5GB).
- Try to load it using LlamaModel::load_from_file(&backend, &model_path, &model_params).
- It fails.
- Models < 2GB (e.g., 1.5B models) load perfectly fine with the exact same code and environment.
- Crucial baseline: The exact same 4.5GB physical file loads perfectly in < 1 second using llama-cpp-python on the exact same machine.
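To rule out a damaged file or a path problem on the Rust side, the magic can be checked with the standard library alone. The following is a minimal sketch, not part of llama-cpp-2: the helper names probe_gguf and has_gguf_magic are hypothetical, and the only fact assumed about the format is that a GGUF file begins with the four ASCII bytes "GGUF".

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical helper: returns the file size in bytes and the first
/// four bytes of the file, using only Rust's standard library.
fn probe_gguf(path: &Path) -> io::Result<(u64, [u8; 4])> {
    let mut file = File::open(path)?;
    // `len()` is a u64, so sizes above 4GB are reported without truncation.
    let size = file.metadata()?.len();
    let mut magic = [0u8; 4];
    // Fails with `UnexpectedEof` if the file is shorter than four bytes.
    file.read_exact(&mut magic)?;
    Ok((size, magic))
}

/// Hypothetical helper: a GGUF file starts with the ASCII bytes "GGUF".
fn has_gguf_magic(magic: &[u8; 4]) -> bool {
    magic == b"GGUF"
}
```

Calling probe_gguf on the same model_path that is passed to LlamaModel::load_from_file and checking the result with has_gguf_magic would show whether the file itself is readable from this Rust toolchain. If it is, the failure is more likely inside the native loader than in the file or the path.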
Attempted workaround
I suspected a Windows mmap 4GB boundary issue, so I tried bypassing the Rust API to forcefully disable mmap using unsafe:
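unsafe {
    // Reinterpret the wrapper as the raw FFI struct. This assumes the
    // llama_model_params field sits at offset 0 of model_params.
    let raw_ptr = &mut model_params as *mut _ as *mut llama_cpp_sys_2::llama_model_params;
    // Force standard file I/O instead of memory mapping.
    (*raw_ptr).use_mmap = false;
}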
However, the exact same failed to read magic error persists even with standard I/O forced.
Root Cause Suspicion
Since Python (llama-cpp-python) handles this file flawlessly and 1.5B models work in Rust, this strongly suggests a 32-bit integer truncation issue in the llama-cpp-sys-2 build on Windows MSVC. With MSVC, the C long type is 32 bits wide even on 64-bit targets. It is likely that build.rs or the CMake configuration is missing large-file support (the MSVC equivalent of _FILE_OFFSET_BITS=64), causing file offsets or mmap offsets to overflow or truncate when addressing files larger than 4GB.
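The arithmetic behind this suspicion can be sketched in a few lines. This is only an illustration of what happens when a 64-bit offset is stored in a 32-bit signed integer such as MSVC's long. It is not taken from llama.cpp or llama-cpp-sys-2, the function names are hypothetical, and the 4.5 GiB figure (4,831,838,208 bytes) is an assumed stand-in for the "~4.5GB" model.

```rust
/// Hypothetical illustration: store a 64-bit file offset in a 32-bit
/// signed integer, the width of `long` under MSVC. The `as` cast keeps
/// only the low 32 bits.
fn truncate_to_msvc_long(offset: u64) -> i32 {
    offset as i32
}

/// Hypothetical illustration: true when the offset is representable in
/// a 32-bit signed integer without loss.
fn fits_in_msvc_long(offset: u64) -> bool {
    offset <= i32::MAX as u64
}

/// Assumed size for the example: 4.5 GiB expressed in bytes.
const MODEL_SIZE_BYTES: u64 = 4_831_838_208;
```

With these definitions, truncate_to_msvc_long(MODEL_SIZE_BYTES) yields 536,870,912 (512 MiB), because 4,831,838,208 minus 2^32 is 2^29. A 3 GiB offset (3,221,225,472) becomes -1,073,741,824. Any offset above 2,147,483,647 fails fits_in_msvc_long, which is the kind of boundary the truncation hypothesis would predict.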
Expected behavior
Models > 4GB should load in Rust on Windows MSVC exactly as they do in Python or on Linux.