What's new in 1.4.0 (2025-03-21)
These are the changes in inference v1.4.0.
New features
- FEAT: Support the text part of gemma-3 by @zky001 in #3077
- FEAT: Gemma-3-it with vision support by @qinxuye in #3102
- FEAT: Add DeepSeek V3 function calling (see the sketch after this list) by @rogercloud in #3103
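A minimal sketch of using the new DeepSeek V3 function calling through the server's OpenAI-compatible endpoint. The base URL, the model UID "deepseek-v3", and the `get_weather` tool schema are illustrative assumptions, not part of the release itself.

```python
from openai import OpenAI

# Point the OpenAI client at the local Xinference endpoint (assumed URL).
client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

# Hypothetical tool schema for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="deepseek-v3",  # assumed UID of a launched DeepSeek V3 model
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# The model may respond with a tool call instead of plain text.
print(response.choices[0].message.tool_calls)
```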
Enhancements
- ENH: Make the xllamacpp backend raise an exception on failure (see the sketch after this list) by @codingl2k1 in #3053
- ENH: [UI] change 'GPU Count' to 'GPU Count per Replica' by @yiboyasss in #3078
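A minimal sketch of the new failure behavior from #3053: launching a model on the xllamacpp backend now surfaces errors as an exception rather than failing silently. The model name, format, and the exact exception type caught below are assumptions for illustration.

```python
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed endpoint
try:
    uid = client.launch_model(
        model_name="qwen2.5-instruct",  # assumed model name
        model_engine="xllamacpp",
        model_format="ggufv2",
    )
    print(f"launched model uid: {uid}")
except RuntimeError as err:  # assumed: backend failures surface as RuntimeError
    print(f"launch failed: {err}")
```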
Bug fixes
- BUG: [UI] fix dark mode bugs by @yiboyasss in #3028
- BUG: fix InternVL2.5-MPO AWQ and a model card info typo by @Minamiyama in #3067
- BUG: fix max_tokens for MLX VL models by @qinxuye in #3072
- BUG: fix the vLLM parameter "enable_prefix_caching" (see the sketch after this list) by @Gmgge in #3081
- BUG: fix first-token error and support the DeepSeek streaming API by @amumu96 in #3090
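A minimal sketch of passing the now-fixed "enable_prefix_caching" parameter from #3081 through to the vLLM engine at launch time. The model name and endpoint are assumptions, and the kwarg passthrough follows the client's usual launch flow.

```python
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed endpoint
uid = client.launch_model(
    model_name="deepseek-v3",    # assumed model name
    model_engine="vllm",
    enable_prefix_caching=True,  # forwarded to the vLLM engine
)
```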
Documentation
- DOC: add an auth usage guide for HTTP requests (see the sketch after this list) by @Minamiyama in #3065
- DOC: add xllamacpp related docs by @qinxuye in #3088
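A minimal sketch of an authenticated HTTP request against the server's OpenAI-compatible API, in the spirit of the new auth guide from #3065. How the access token is obtained (login flow, endpoint) is covered by the guide; here it is assumed to be available in an environment variable.

```python
import os

import requests

# Assumed: a valid access token is stored in this environment variable.
token = os.environ["XINFERENCE_TOKEN"]

resp = requests.get(
    "http://localhost:9997/v1/models",  # assumed endpoint
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print(resp.json())
```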
Others
- FIX: [UI] remove the model_format restriction on n_gpu for llama.cpp by @yiboyasss in #3050
New Contributors
- @Gmgge made their first contribution in #3081
- @zky001 made their first contribution in #3077
- @rogercloud made their first contribution in #3103
Full Changelog: v1.3.1...v1.4.0