Serve multiple LoRA adapters on top of one base model.
Small multi-tenant products often need many task- or customer-specific adapters but cannot afford to load a separate full model for each one.

In scope:
- load and unload LoRA adapters
- request-level adapter selection
- simple adapter cache
- adapter metrics

Out of scope:
- large-scale adapter paging
- distributed adapter placement
- training adapters

Acceptance criteria:
- A request can select an adapter.
- Multiple adapters can be loaded over one base model.
- Adapter cache behavior is visible through adapter metrics.

Tasks:
- Choose LoRA integration library.
- Add adapter registry.
- Add adapter load and unload.
- Add request adapter option.
- Add simple adapter cache.
- Add demo with two adapters.
- Document LoRA serving use cases.
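
A minimal sketch of how the registry, load/unload, request option, simple cache, and metrics could fit together. It is library-agnostic because the integration library has not been chosen yet: `AdapterCache`, `AdapterMetrics`, `handle_request`, the `load_fn`/`unload_fn` callbacks, and the adapter names and paths are all hypothetical placeholders, not an existing API. The eviction policy shown is plain LRU, which is one reasonable reading of "simple adapter cache".

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AdapterMetrics:
    """Counters that make cache behavior visible (hypothetical names)."""
    hits: int = 0
    misses: int = 0
    evictions: int = 0


@dataclass
class AdapterCache:
    """LRU cache of loaded adapters that share one base model."""
    registry: Dict[str, str]             # adapter name -> weights path (hypothetical)
    load_fn: Callable[[str], object]     # stand-in for the chosen library's load call
    unload_fn: Callable[[object], None]  # stand-in for the chosen library's unload call
    capacity: int = 2
    metrics: AdapterMetrics = field(default_factory=AdapterMetrics)
    _loaded: "OrderedDict[str, object]" = field(default_factory=OrderedDict)

    def __post_init__(self) -> None:
        if self.capacity < 1:
            raise ValueError("capacity must be at least 1")

    def get(self, name: str) -> object:
        """Return a loaded adapter handle, loading and evicting as needed."""
        if name not in self.registry:
            raise KeyError(f"unknown adapter: {name}")
        if name in self._loaded:
            self.metrics.hits += 1
            self._loaded.move_to_end(name)  # mark as most recently used
            return self._loaded[name]
        self.metrics.misses += 1
        if len(self._loaded) >= self.capacity:
            # Evict the least recently used adapter before loading a new one.
            _, evicted = self._loaded.popitem(last=False)
            self.unload_fn(evicted)
            self.metrics.evictions += 1
        handle = self.load_fn(self.registry[name])
        self._loaded[name] = handle
        return handle

    def loaded_names(self) -> List[str]:
        """Loaded adapter names, least recently used first."""
        return list(self._loaded)


def handle_request(cache: AdapterCache, prompt: str,
                   adapter: Optional[str] = None) -> str:
    """Request-level selection: adapter=None means the base model only."""
    if adapter is None:
        return f"[base] {prompt}"
    cache.get(adapter)  # ensures the adapter is loaded; real inference is omitted
    return f"[{adapter}] {prompt}"


# Demo with placeholder adapters; the loader just records the path.
registry = {
    "support": "adapters/support",
    "legal": "adapters/legal",
    "sql": "adapters/sql",
}
cache = AdapterCache(
    registry,
    load_fn=lambda path: {"path": path},
    unload_fn=lambda handle: None,
    capacity=2,
)
for name in ["support", "legal", "support", "sql"]:
    handle_request(cache, "hello", adapter=name)
print(cache.loaded_names(), cache.metrics)
```

With a capacity of two, the fourth request forces the least recently used adapter out, so the counters show one hit, three misses, and one eviction. Swapping the lambdas for the chosen library's real load and unload calls is the only change the cache itself should need.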