- Limit the number of adapters that can be attached to one base model to avoid excessive GPU memory consumption (consider automatically unloading and reloading adapters).
- Ensure that each created adapter can be unloaded promptly, even if no unload request is issued by the user.
Both `SamplingBackend` and `TrainingBackend` need to implement the above functionality. In particular, `VLLMSamplingBackend` currently lacks a mechanism to remove adapter parameters from vLLM (#5).
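
One possible shape for the two requirements above is a small bookkeeping helper shared by both backends. The sketch below is only an illustration: `AdapterPool`, `touch`, `sweep`, and `unload_fn` are hypothetical names, not existing APIs in this project or in vLLM. The `unload_fn` callback stands in for whatever backend-specific call actually frees the adapter parameters, which is the piece still missing on the vLLM side (#5).

```python
import time
from collections import OrderedDict
from typing import Callable, List


class AdapterPool:
    """Hypothetical helper that tracks adapters attached to one base model."""

    def __init__(
        self,
        max_adapters: int,
        idle_ttl: float,
        unload_fn: Callable[[str], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if max_adapters < 1:
            raise ValueError("max_adapters must be at least 1")
        self.max_adapters = max_adapters
        self.idle_ttl = idle_ttl
        # Assumption: the backend supplies the call that frees adapter weights.
        self._unload_fn = unload_fn
        self._clock = clock
        # Maps adapter id to last-use time, ordered least recently used first.
        self._last_used: "OrderedDict[str, float]" = OrderedDict()

    def touch(self, adapter_id: str) -> None:
        """Register or refresh an adapter; evict the LRU one if over the cap."""
        self._last_used[adapter_id] = self._clock()
        self._last_used.move_to_end(adapter_id)
        while len(self._last_used) > self.max_adapters:
            victim, _ = self._last_used.popitem(last=False)
            self._unload_fn(victim)

    def unload(self, adapter_id: str) -> bool:
        """Unload one adapter explicitly; return False if it was not loaded."""
        if self._last_used.pop(adapter_id, None) is None:
            return False
        self._unload_fn(adapter_id)
        return True

    def sweep(self) -> List[str]:
        """Unload adapters idle for at least idle_ttl seconds, without a user request."""
        now = self._clock()
        expired = [a for a, t in self._last_used.items() if now - t >= self.idle_ttl]
        for adapter_id in expired:
            self.unload(adapter_id)
        return expired

    def loaded(self) -> List[str]:
        """Return loaded adapter ids, least recently used first."""
        return list(self._last_used)
```

A backend would call `touch` whenever an adapter is created or used, and run `sweep` periodically (for example from a background task) so that abandoned adapters are released even when no unload request arrives. Least-recently-used eviction is only one possible policy; the cap and the idle timeout would need tuning against actual GPU memory usage.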