Description
🚀 Feature Description and Motivation
I would like to propose support for dynamic LoRA loading for diffusion models in AIBrix.
This feature would let AIBrix load and switch LoRA adapters at runtime for diffusion workloads, so that a single deployment can serve many adapters without being redeployed for each one.
Motivation
LoRA-based adaptation is widely used for diffusion models to support customization with relatively small adapter weights. In multi-tenant or multi-task serving scenarios, dynamically loading LoRAs instead of statically baking them into deployments could provide important benefits such as:
- Better resource utilization
- Faster adaptation to different requests
- Reduced need to deploy separate model instances per LoRA
Dynamic LoRA loading seems like a strong fit for AIBrix’s goals around efficient model serving, especially if designed in a way that aligns with its existing architecture and model lifecycle patterns.
Use Case
This feature would make AIBrix more capable for diffusion serving scenarios where many LoRA adapters need to be served efficiently on shared infrastructure.
Proposed Solution
Task breakdown
- Profile the benefits of multi-LoRA serving
- Measure performance tradeoffs
- Compare static LoRA deployment vs dynamic loading
- Introduce a mechanism to load/unload or activate LoRA adapters at runtime
- Ensure compatibility with AIBrix’s existing model management and serving flow
- Define how LoRA artifacts are registered, cached, selected, and evicted
- Consider latency overhead, memory pressure, and isolation between requests
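To make the register/cache/select/evict item more concrete, here is a minimal sketch of one possible policy: an LRU cache of resident adapters bounded by a memory budget. Everything in it is an assumption for discussion purposes. `LoraAdapter`, `LoraAdapterCache`, `budget_mb`, and the `loader` callback are hypothetical names, not existing AIBrix or diffusion-runtime APIs, and a real implementation would need to load actual weights and coordinate with in-flight requests.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LoraAdapter:
    """Hypothetical record for a loaded adapter (not an AIBrix type)."""
    name: str
    size_mb: int
    weights: Optional[object] = None  # placeholder for real adapter weights


class LoraAdapterCache:
    """LRU cache of resident adapters, bounded by a memory budget in MB."""

    def __init__(self, budget_mb: int, loader: Callable[[str], LoraAdapter]):
        self.budget_mb = budget_mb
        self.loader = loader  # assumed to fetch an adapter from a registry
        self._resident: "OrderedDict[str, LoraAdapter]" = OrderedDict()
        self.used_mb = 0
        self.evicted: List[str] = []  # eviction log, oldest first

    def select(self, name: str) -> LoraAdapter:
        """Return the adapter for a request, loading and evicting as needed."""
        if name in self._resident:
            # Cache hit: mark as most recently used.
            self._resident.move_to_end(name)
            return self._resident[name]

        adapter = self.loader(name)
        if adapter.size_mb > self.budget_mb:
            raise ValueError(f"adapter {name!r} exceeds the cache budget")

        # Evict least recently used adapters until the new one fits.
        while self.used_mb + adapter.size_mb > self.budget_mb:
            old_name, old = self._resident.popitem(last=False)
            self.used_mb -= old.size_mb
            self.evicted.append(old_name)

        self._resident[name] = adapter
        self.used_mb += adapter.size_mb
        return adapter

    def resident(self) -> List[str]:
        """Names of resident adapters, least recently used first."""
        return list(self._resident)
```

With a 100 MB budget and 40 MB adapters, selecting `a`, `b`, `a`, `c` in that order would evict `b` and leave `a` and `c` resident. LRU is only one candidate policy here; the profiling tasks above should inform whether load latency or memory pressure dominates and whether a different eviction strategy is a better fit.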