[Feature Request] Respect tensor_split for GPU visibility in Docker + NVIDIA passthrough
#9623
okcodemaybe started this conversation in Ideas
Replies: 0 comments
📌 Summary
When running LocalAI in a Docker container with NVIDIA GPU passthrough, the inference backend (e.g., `llama.cpp`) automatically initializes across all visible GPUs in the container, even when `tensor_split` is explicitly configured to partition the model across only specific devices. This leads to unnecessary GPU context creation, VRAM reservation, and potential instability in multi-GPU container deployments.

🔍 Current Behavior
- The container is started with `--gpus all` or equivalent NVIDIA passthrough.
- The model config sets `tensor_split: "0,1"` (or similar) to target specific GPUs.
- The backend creates `cuda`/`metal`/`vulkan` contexts on every available GPU in the container, regardless of the `tensor_split` directive.

✅ Expected Behavior
The backend should only initialize and expose GPUs that are explicitly referenced in the `tensor_split` configuration. For example:
- `tensor_split: "0,1"` → Only GPUs 0 and 1 are initialized.
- `tensor_split: "0"` → Only GPU 0 is initialized.
- If `tensor_split` is omitted or set to auto, fall back to current behavior (use all available GPUs).

🛠 Technical Context
- `nvidia-container-toolkit` / `--gpus` flag
- `llama.cpp` (or GGUF-compatible inference engine)
- `tensor_split: "0,1"`

💡 Why This Matters
`tensor_split` should act as both a partitioning directive and a visibility filter.

🔧 Suggested Implementation
1. Parse `tensor_split` early in the backend initialization phase.
2. Set `CUDA_VISIBLE_DEVICES` (or equivalent) based on the referenced GPU indices before loading the inference engine.
3. Alternatively, add a config option such as `visible_gpus: "0,1"` that overrides auto-detection while keeping `tensor_split` for weight distribution.
4. Log the decision, e.g. `INFO: tensor_split targets GPUs [0,1]. Restricting CUDA_VISIBLE_DEVICES accordingly.`

🔄 Current Workaround
Manually set `CUDA_VISIBLE_DEVICES` in the Docker run command or environment file. This works but is not per-model and requires manual orchestration, defeating the purpose of declarative `.yaml` configuration.
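To make the suggested implementation concrete, here is a minimal Python sketch of the parse-then-restrict step. It is not LocalAI code: the function names `gpus_from_tensor_split` and `backend_env` are hypothetical, and it assumes the backend runs as a separate process whose environment can be set before the CUDA runtime initializes. It also takes this request's reading of `tensor_split` at face value, treating each entry as a GPU index. Note that llama.cpp documents `--tensor-split` as a list of per-device proportions, so under that reading `"0,1"` would place the whole model on the second device, and the filter would instead need to keep the positions with a nonzero share.

```python
from __future__ import annotations

import logging
import os
from typing import Mapping, Optional

log = logging.getLogger("gpu_visibility")


def gpus_from_tensor_split(tensor_split: Optional[str]) -> Optional[list[int]]:
    """Return the GPU indices named in tensor_split, or None for "use all GPUs".

    ASSUMPTION: entries are device indices, as this feature request proposes,
    not per-device proportions.
    """
    if tensor_split is None:
        return None
    value = tensor_split.strip()
    if value == "" or value.lower() == "auto":
        return None

    indices: list[int] = []
    for part in value.split(","):
        index = int(part.strip())  # raises ValueError on malformed input
        if index < 0:
            raise ValueError(f"negative GPU index: {index}")
        if index not in indices:  # drop duplicates, keep first-seen order
            indices.append(index)
    return indices


def backend_env(
    tensor_split: Optional[str],
    base_env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Build the environment for a backend process (hypothetical helper).

    If tensor_split names specific GPUs, CUDA_VISIBLE_DEVICES is restricted
    to them; otherwise the base environment is returned unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    gpus = gpus_from_tensor_split(tensor_split)
    if gpus is None:
        return env  # fall back to current behavior: all GPUs stay visible

    visible = ",".join(str(i) for i in gpus)
    env["CUDA_VISIBLE_DEVICES"] = visible
    log.info(
        "tensor_split targets GPUs [%s]. Restricting CUDA_VISIBLE_DEVICES accordingly.",
        visible,
    )
    return env
```

A launcher could pass the result as the `env` argument of `subprocess.Popen` when starting the backend for a given model, which would make the restriction per-model. One caveat: CUDA renumbers the visible devices from 0 inside the process, so a split that targets GPUs 1 and 2 would see them as devices 0 and 1, and any index-based setting passed on to the engine would need remapping.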