Improve `delete_model` semantics for multi-tenancy: when doing multi-tenant LoRA training, we should avoid unnecessary `ray.shutdown()` calls even when the number of active tenants drops to zero. However, we should preserve the `ray.shutdown()` behavior for the full-parameter fine-tuning case.
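The gating described above could be sketched roughly as follows. This is an illustrative sketch only, assuming a hypothetical `TenantRegistry` helper and an injected `shutdown_backend` callback (standing in for `ray.shutdown`); none of these names are tinker's real API.

```python
import threading


class TenantRegistry:
    """Hypothetical helper: tracks active tenants and decides when to shut down Ray."""

    def __init__(self, full_param_mode: bool, shutdown_backend):
        self._lock = threading.Lock()
        self._count = 0
        self._full_param_mode = full_param_mode
        self._shutdown_backend = shutdown_backend  # e.g. ray.shutdown in practice

    def add_tenant(self) -> None:
        with self._lock:
            self._count += 1

    def delete_model(self) -> bool:
        """Drop one tenant; return True if the backend was shut down."""
        with self._lock:
            self._count = max(0, self._count - 1)
            # Multi-tenant LoRA: keep the Ray cluster warm even at zero tenants.
            # Full-parameter fine-tuning: preserve the existing shutdown behavior.
            if self._count == 0 and self._full_param_mode:
                self._shutdown_backend()
                return True
            return False
```

The key design point is that the shutdown decision depends on the training mode, not merely on the tenant count reaching zero.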
Ensure that an incoming request can never hit a stale proxy URL (the scenario raised in [tinker] Forward sample requests directly to backend vLLM (non-colocated) #1638). We still need some way of upserting new proxy URLs for the full-parameter fine-tuning case, but the update logic should be clean enough that routing to an old proxy URL is impossible.
In the multi-tenant case, if we avoid unnecessary shutdowns, we should only ever need a single proxy URL upsert.
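One way to make stale proxy URLs unreachable is to tag each upsert with a monotonically increasing generation and reject writes carrying an older one. A minimal sketch, assuming a hypothetical `ProxyUrlStore` (not tinker's actual implementation):

```python
import threading
from typing import Optional, Tuple


class ProxyUrlStore:
    """Hypothetical generation-tagged store for the current proxy URL.

    A writer that restarts the backend bumps the generation; any upsert
    carrying a generation <= the stored one is a stale writer and is ignored,
    so readers can never observe a rollback to an old URL.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._generation = 0
        self._url: Optional[str] = None

    def upsert(self, url: str, generation: int) -> bool:
        with self._lock:
            if generation <= self._generation:
                return False  # stale writer: reject silently
            self._generation = generation
            self._url = url
            return True

    def current(self) -> Tuple[Optional[str], int]:
        with self._lock:
            return self._url, self._generation
```

Under this scheme the multi-tenant path performs exactly one upsert (generation 1) for the lifetime of the cluster, while the full-parameter path bumps the generation on each restart.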
Per discussion, cc: @pcmoritz