@@ -8,18 +8,21 @@ Overall, this microservice offers a streamlined way to integrate large language
 
 ## Validated LLM Models
 
-| Model                                       | TGI-Gaudi | vLLM-CPU | vLLM-Gaudi | OVMS     |
-| ------------------------------------------- | --------- | -------- | ---------- | -------- |
-| [Intel/neural-chat-7b-v3-3]                 | ✓         | ✓        | ✓          | ✓        |
-| [meta-llama/Llama-2-7b-chat-hf]             | ✓         | ✓        | ✓          | ✓        |
-| [meta-llama/Llama-2-70b-chat-hf]            | ✓         | -        | ✓          | -        |
-| [meta-llama/Meta-Llama-3-8B-Instruct]       | ✓         | ✓        | ✓          | ✓        |
-| [meta-llama/Meta-Llama-3-70B-Instruct]      | ✓         | -        | ✓          | -        |
-| [Phi-3]                                     | x         | Limit 4K | Limit 4K   | Limit 4K |
-| [deepseek-ai/DeepSeek-R1-Distill-Llama-70B] | ✓         | -        | ✓          | -        |
-| [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B]  | ✓         | -        | ✓          | -        |
-| [mistralai/Mistral-Small-24B-Instruct-2501] | ✓         | -        | ✓          | -        |
-| [mistralai/Mistral-Large-Instruct-2411]     | x         | -        | ✓          | -        |
+| Model                                       | TGI-Gaudi | vLLM-CPU | vLLM-Gaudi | OVMS     | Optimum-Habana |
+| ------------------------------------------- | --------- | -------- | ---------- | -------- | -------------- |
+| [Intel/neural-chat-7b-v3-3]                 | ✓         | ✓        | ✓          | ✓        | ✓              |
+| [meta-llama/Llama-2-7b-chat-hf]             | ✓         | ✓        | ✓          | ✓        | ✓              |
+| [meta-llama/Llama-2-70b-chat-hf]            | ✓         | -        | ✓          | -        | ✓              |
+| [meta-llama/Meta-Llama-3-8B-Instruct]       | ✓         | ✓        | ✓          | ✓        | ✓              |
+| [meta-llama/Meta-Llama-3-70B-Instruct]      | ✓         | -        | ✓          | -        | ✓              |
+| [Phi-3]                                     | x         | Limit 4K | Limit 4K   | Limit 4K | ✓              |
+| [Phi-4]                                     | x         | x        | x          | x        | ✓              |
+| [deepseek-ai/DeepSeek-R1-Distill-Llama-8B]  | ✓         | -        | ✓          | -        | ✓              |
+| [deepseek-ai/DeepSeek-R1-Distill-Llama-70B] | ✓         | -        | ✓          | -        | ✓              |
+| [deepseek-ai/DeepSeek-R1-Distill-Qwen-14B]  | ✓         | -        | ✓          | -        | ✓              |
+| [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B]  | ✓         | -        | ✓          | -        | ✓              |
+| [mistralai/Mistral-Small-24B-Instruct-2501] | ✓         | -        | ✓          | -        | ✓              |
+| [mistralai/Mistral-Large-Instruct-2411]     | x         | -        | ✓          | -        | ✓              |
 
 ### System Requirements for LLM Models
 
@@ -31,7 +34,10 @@ Overall, this microservice offers a streamlined way to integrate large language
 | [meta-llama/Meta-Llama-3-8B-Instruct]       | 1 |
 | [meta-llama/Meta-Llama-3-70B-Instruct]      | 2 |
 | [Phi-3]                                     | x |
+| [Phi-4]                                     | x |
+| [deepseek-ai/DeepSeek-R1-Distill-Llama-8B]  | 1 |
 | [deepseek-ai/DeepSeek-R1-Distill-Llama-70B] | 8 |
+| [deepseek-ai/DeepSeek-R1-Distill-Qwen-14B]  | 2 |
 | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B]  | 4 |
 | [mistralai/Mistral-Small-24B-Instruct-2501] | 1 |
 | [mistralai/Mistral-Large-Instruct-2411]     | 4 |
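The card counts added in the hunk above describe how many Gaudi devices a model needs, which in practice corresponds to the model-parallelism degree the serving backend is launched with. The sketch below is illustrative only, under the assumption that the vLLM backend maps this count to its standard `--tensor-parallel-size` option; the diff itself does not show launch commands, and the repository's compose files remain the source of truth for Gaudi-specific setup. The model name and card count are taken from the table above.

```bash
# Hedged sketch: launch vLLM with a parallelism degree matching the
# "Gaudi cards needed" value for the chosen model (2 for Llama-3-70B above).
# Gaudi/HPU-specific environment setup (drivers, device visibility, compose
# wiring) is assumed to be handled by the repo's deployment files.
vllm serve meta-llama/Meta-Llama-3-70B-Instruct \
  --tensor-parallel-size 2
```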
@@ -192,8 +198,11 @@ curl http://${host_ip}:${TEXTGEN_PORT}/v1/chat/completions \
 [meta-llama/Meta-Llama-3-8B-Instruct]: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
 [meta-llama/Meta-Llama-3-70B-Instruct]: https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct
 [Phi-3]: https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3
+[Phi-4]: https://huggingface.co/collections/microsoft/phi-4-677e9380e514feb5577a40e4
 [HuggingFace]: https://huggingface.co/
+[deepseek-ai/DeepSeek-R1-Distill-Llama-8B]: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B
 [deepseek-ai/DeepSeek-R1-Distill-Llama-70B]: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B
 [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B]: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
+[deepseek-ai/DeepSeek-R1-Distill-Qwen-14B]: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
 [mistralai/Mistral-Small-24B-Instruct-2501]: https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501
 [mistralai/Mistral-Large-Instruct-2411]: https://huggingface.co/mistralai/Mistral-Large-Instruct-2411
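For orientation, the last hunk sits just below the README's `curl` call against `http://${host_ip}:${TEXTGEN_PORT}/v1/chat/completions` shown in the hunk context. The minimal request sketch below assumes that endpoint follows the OpenAI-compatible chat completions schema; the payload fields are an assumption, and the README's own example remains authoritative. `host_ip` and `TEXTGEN_PORT` come from the hunk context, and the model name from the validated-models table.

```bash
# Hedged sketch of a chat completions request against the textgen microservice,
# assuming an OpenAI-compatible body ("model", "messages", "max_tokens",
# "stream"); see the README's curl example for the exact fields it uses.
curl http://${host_ip}:${TEXTGEN_PORT}/v1/chat/completions \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Intel/neural-chat-7b-v3-3",
        "messages": [{"role": "user", "content": "What is Deep Learning?"}],
        "max_tokens": 32,
        "stream": false
      }'
```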