[Feature] Dynamic LoRA Loading for diffusion models #2006

@Suyi32

🚀 Feature Description and Motivation

I would like to propose support for dynamic LoRA loading for diffusion models in AIBrix.

This would let AIBrix load and switch LoRA adapters at runtime for diffusion workloads, without redeploying the base model for each adapter, making LoRA serving more flexible and efficient.

Motivation

LoRA-based adaptation is widely used to customize diffusion models with relatively small adapter weights. In multi-tenant or multi-task serving, loading LoRAs dynamically rather than baking them statically into deployments could bring benefits such as:

  • Better resource utilization
  • Faster adaptation to different requests
  • Reduced need to deploy separate model instances per LoRA
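For context on why adapters are cheap to swap, here is the standard LoRA formulation: a frozen weight matrix $W_0$ is adapted by a low-rank update, so each adapter stores only the pair $A, B$. The dimensions in the worked line are illustrative and not taken from any particular diffusion model. Because $W_0$ is shared, the update can also be applied unmerged, as $W_0 x + \frac{\alpha}{r} B(Ax)$, which lets different requests use different adapters against the same base weights.

```latex
W' = W_0 + \frac{\alpha}{r} B A, \qquad
B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

\text{full matrix: } dk \text{ parameters}, \qquad
\text{adapter: } r(d + k) \text{ parameters}

d = k = 4096,\; r = 16:\quad
dk = 16{,}777{,}216,\quad r(d + k) = 131{,}072 = \tfrac{1}{128}\, dk
```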

Dynamic LoRA loading seems like a strong fit for AIBrix’s goals around efficient model serving, especially if it is designed to align with AIBrix’s existing architecture and model lifecycle patterns.

Use Case

This feature would make AIBrix more capable for diffusion serving scenarios where many LoRA adapters need to be served efficiently on shared infrastructure.

Proposed Solution

Task breakdown

  1. Profile the benefits of multi-LoRA serving
     • Measure performance tradeoffs
     • Compare static LoRA deployment vs. dynamic loading
  2. Introduce a mechanism to load/unload or activate LoRA adapters at runtime
     • Ensure compatibility with AIBrix’s existing model management and serving flow
     • Define how LoRA artifacts are registered, cached, selected, and evicted
     • Consider latency overhead, memory pressure, and isolation between requests
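One way to make the "registered, cached, selected, and evicted" lifecycle concrete is a small LRU cache bounded by a memory budget. This is a hypothetical sketch, not AIBrix's API: `AdapterSpec`, `LoRAAdapterCache`, and the `loader` callable are invented names, the URIs are placeholders, and the sizes are toy numbers. Registration records metadata only, selection loads an adapter on a miss, and eviction frees least-recently-used adapters until the new one fits.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class AdapterSpec:
    """Registration record for a LoRA artifact (hypothetical schema)."""
    name: str
    uri: str         # where the adapter weights live; placeholder format
    size_bytes: int  # estimated device memory once the adapter is loaded


class LoRAAdapterCache:
    """LRU cache of loaded LoRA adapters under a fixed memory budget.

    `loader` is a hypothetical callable that fetches an adapter from its URI
    and places it on the device; it stands in for whatever the serving
    engine actually provides.
    """

    def __init__(self, budget_bytes: int, loader: Callable[[AdapterSpec], Any]):
        self.budget_bytes = budget_bytes
        self.loader = loader
        self.registry: Dict[str, AdapterSpec] = {}
        # name -> (loaded adapter, bytes charged); order tracks recency of use
        self.loaded: "OrderedDict[str, Tuple[Any, int]]" = OrderedDict()
        self.used_bytes = 0

    def register(self, spec: AdapterSpec) -> None:
        """Record adapter metadata; no weights are loaded at this point."""
        if spec.size_bytes > self.budget_bytes:
            raise ValueError(f"{spec.name} exceeds the cache budget")
        self.registry[spec.name] = spec

    def select(self, name: str) -> Any:
        """Return the adapter for a request, loading it on a cache miss."""
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return self.loaded[name][0]
        spec = self.registry[name]  # KeyError if never registered
        # Evict least-recently-used adapters until the new one fits.
        while self.used_bytes + spec.size_bytes > self.budget_bytes:
            self.evict_lru()
        adapter = self.loader(spec)
        self.loaded[name] = (adapter, spec.size_bytes)
        self.used_bytes += spec.size_bytes
        return adapter

    def evict_lru(self) -> str:
        """Unload the least-recently-used adapter and return its name."""
        victim, (_, size) = self.loaded.popitem(last=False)
        self.used_bytes -= size
        return victim


if __name__ == "__main__":
    loads = []

    def fake_loader(spec: AdapterSpec) -> str:
        loads.append(spec.name)  # record each real load (cache miss)
        return f"weights:{spec.name}"

    cache = LoRAAdapterCache(budget_bytes=300, loader=fake_loader)
    for n in ("style-a", "style-b", "style-c"):
        cache.register(AdapterSpec(n, f"file:///adapters/{n}", 120))
    cache.select("style-a")
    cache.select("style-b")
    cache.select("style-a")  # hit: style-a becomes most recently used
    cache.select("style-c")  # 360 > 300, so style-b (LRU) is evicted
    print(list(cache.loaded))  # ['style-a', 'style-c']
    print(loads)               # ['style-a', 'style-b', 'style-c']
```

LRU by byte budget is only one possible policy. The sketch is single-threaded and does not pin adapters that in-flight requests are still using; a real implementation would need reference counting or locking so that eviction cannot break isolation between concurrent requests.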
