v1.1 LoRA Serving

Goal

Serve multiple LoRA adapters on top of one base model.

Why

Small multi-tenant products often need many task- or customer-specific adapters but cannot afford to load a full model for each one.

Scope

  • load and unload LoRA adapters
  • request-level adapter selection
  • simple adapter cache
  • adapter metrics
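As a rough illustration of how the scope items above could fit together, here is a minimal sketch of a simple adapter cache with load, unload, and metrics. Everything in it is an assumption made for illustration: the names `AdapterCache` and `AdapterCacheMetrics`, the LRU eviction policy, and the `loader` callable that stands in for whichever LoRA integration library is eventually chosen.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AdapterCacheMetrics:
    """Counters that make cache behavior observable (hypothetical names)."""
    hits: int = 0
    misses: int = 0
    evictions: int = 0


class AdapterCache:
    """LRU cache of loaded adapters, keyed by adapter name.

    `loader` is a placeholder for the real adapter-loading call; this
    sketch does not assume any particular LoRA library.
    """

    def __init__(self, loader: Callable[[str], object], capacity: int = 2) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._loader = loader
        self._capacity = capacity
        self._loaded: "OrderedDict[str, object]" = OrderedDict()
        self.metrics = AdapterCacheMetrics()

    def get(self, name: str) -> object:
        """Return the adapter, loading it on a miss and evicting the LRU entry if full."""
        if name in self._loaded:
            self.metrics.hits += 1
            self._loaded.move_to_end(name)  # mark as most recently used
            return self._loaded[name]
        self.metrics.misses += 1
        adapter = self._loader(name)
        self._loaded[name] = adapter
        if len(self._loaded) > self._capacity:
            self._loaded.popitem(last=False)  # drop the least recently used adapter
            self.metrics.evictions += 1
        return adapter

    def unload(self, name: str) -> bool:
        """Explicitly unload an adapter; returns False if it was not loaded."""
        if name in self._loaded:
            del self._loaded[name]
            return True
        return False

    def loaded_names(self) -> List[str]:
        """Names of loaded adapters, least recently used first."""
        return list(self._loaded)
```

With `capacity=2`, requesting adapters `a`, `b`, `a`, `c` in that order leaves `a` and `c` loaded and evicts `b`, since `b` was the least recently used entry when `c` arrived. LRU is only one reasonable policy for a "simple" cache; a fixed set of pinned adapters would also satisfy the scope.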

Out of Scope

  • large-scale adapter paging
  • distributed adapter placement
  • training adapters

Acceptance Criteria

  • A request can select an adapter.
  • Multiple adapters can be loaded over one base model.
  • Adapter cache behavior is visible through adapter metrics.
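The first two criteria could be checked with something like the sketch below, which resolves a per-request adapter name against a registry of adapters that share one base model. The `GenerateRequest` type, its `adapter` field, `UnknownAdapterError`, and the registry layout (name to adapter location) are all hypothetical; the actual request schema is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class GenerateRequest:
    """Hypothetical request shape; `adapter` is None for base-model-only requests."""
    prompt: str
    adapter: Optional[str] = None


class UnknownAdapterError(KeyError):
    """Raised when a request names an adapter that is not registered."""


def resolve_adapter(request: GenerateRequest, registry: Dict[str, str]) -> Optional[str]:
    """Map the request's adapter name to its registered location.

    Returns None when the request does not select an adapter, meaning the
    base model should be used unmodified.
    """
    if request.adapter is None:
        return None
    try:
        return registry[request.adapter]
    except KeyError:
        raise UnknownAdapterError(request.adapter) from None
```

Rejecting unknown adapter names, rather than silently falling back to the base model, is a design choice assumed here: in a multi-tenant setting a silent fallback could return output from the wrong model without the caller noticing.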

Progress

  • Choose LoRA integration library.
  • Add adapter registry.
  • Add adapter load and unload.
  • Add request adapter option.
  • Add simple adapter cache.
  • Add demo with two adapters.
  • Document LoRA serving use cases.