Description
The current implementation of Ray Data LLM has begun to diverge from vLLM in terms of supported models. Some models that work with vLLM break when used with Ray Data LLM. The cause is Ray Data LLM's dependency on the transformers library for loading the model config, tokenizer, etc. Newer model architectures such as GLM-4.7-Flash (glm4_moe_lite) are not supported by the transformers version that vLLM requires (<5.0.0); the architecture only exists in a newer transformers release (5.1.0). The same applies to DeepSeek v3.2 (#60056).
These models work with vLLM serve, and even with Ray Serve, but break with Ray Data LLM.
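To make the version conflict concrete, here is a minimal sketch using only the standard library. The two version numbers are taken from the description above; the helper names (`parse_version`, `can_satisfy_both`) are hypothetical and are not part of Ray, vLLM, or transformers. It only illustrates that no single transformers version can satisfy both constraints; it does not inspect any installed package.

```python
# Illustrative only: the version bounds come from the issue text,
# not from reading any package metadata.

def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))

# Exclusive upper bound on transformers, as described in this issue.
VLLM_TRANSFORMERS_UPPER_BOUND = parse_version("5.0.0")

# transformers release said to contain the glm4_moe_lite architecture.
GLM4_MOE_LITE_MIN_VERSION = parse_version("5.1.0")

def can_satisfy_both(upper_exclusive: tuple[int, ...],
                     minimum: tuple[int, ...]) -> bool:
    """True if some version v exists with minimum <= v < upper_exclusive."""
    return minimum < upper_exclusive

# The two requirements cannot be met by one transformers install.
conflict = not can_satisfy_both(VLLM_TRANSFORMERS_UPPER_BOUND,
                                GLM4_MOE_LITE_MIN_VERSION)
print(f"constraints conflict: {conflict}")
```

Any code path that needs transformers to recognize the architecture hits this conflict, which is why loading the config and tokenizer through transformers is the sticking point.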
Question: Is it possible to remove the dependency on transformers and rely purely on vLLM?
Use case
This would sync the models supported by Ray and vLLM, so users would not need to worry about whether a model actually works on Ray. Currently, not all models supported by vLLM run on Ray, which is a hindrance.