What's new in 1.4.0 (2025-03-21)
These are the changes in inference v1.4.0.
New features
- FEAT: Support the text part of gemma-3 by @zky001 in #3077
- FEAT: Gemma-3-it with vision support by @qinxuye in #3102
- FEAT: Add DeepSeek V3 function calling (see the sketch after this list) by @rogercloud in #3103
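A minimal sketch of using the new DeepSeek V3 function calling through the server's OpenAI-compatible endpoint. The base URL, the model UID "deepseek-v3", and the `get_weather` tool schema are illustrative assumptions, not part of the release itself.

```python
from openai import OpenAI

# Point the OpenAI client at the local Xinference endpoint (assumed URL).
client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

# Hypothetical tool schema for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="deepseek-v3",  # assumed UID of a launched DeepSeek V3 model
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# The model may respond with a tool call instead of plain text.
print(response.choices[0].message.tool_calls)
```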
Enhancements
- ENH: Make the xllamacpp backend raise an exception on failure (see the sketch after this list) by @codingl2k1 in #3053
- ENH: [UI] change 'GPU Count' to 'GPU Count per Replica' by @yiboyasss in #3078
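A minimal sketch of the new failure behavior from #3053: launching a model on the xllamacpp backend now surfaces errors as an exception rather than failing silently. The model name, format, and the exact exception type caught below are assumptions for illustration.

```python
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed endpoint
try:
    uid = client.launch_model(
        model_name="qwen2.5-instruct",  # assumed model name
        model_engine="xllamacpp",
        model_format="ggufv2",
    )
    print(f"launched model uid: {uid}")
except RuntimeError as err:  # assumed: backend failures surface as RuntimeError
    print(f"launch failed: {err}")
```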
Bug fixes
- BUG: [UI] fix dark mode bugs by @yiboyasss in #3028
- BUG: fix InternVL2.5-MPO AWQ and a model card info typo by @Minamiyama in #3067
- BUG: fix max_tokens for MLX VL models by @qinxuye in #3072
- BUG: fix the vLLM parameter "enable_prefix_caching" (see the sketch after this list) by @Gmgge in #3081
- BUG: fix first-token error and support the DeepSeek streaming API by @amumu96 in #3090
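A minimal sketch of passing the now-fixed "enable_prefix_caching" parameter from #3081 through to the vLLM engine at launch time. The model name and endpoint are assumptions, and the kwarg passthrough follows the client's usual launch flow.

```python
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed endpoint
uid = client.launch_model(
    model_name="deepseek-v3",    # assumed model name
    model_engine="vllm",
    enable_prefix_caching=True,  # forwarded to the vLLM engine
)
```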
Documentation
- DOC: add an auth usage guide for HTTP requests (see the sketch after this list) by @Minamiyama in #3065
- DOC: add xllamacpp related docs by @qinxuye in #3088
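A minimal sketch of an authenticated HTTP request against the server's OpenAI-compatible API, in the spirit of the new auth guide from #3065. How the access token is obtained (login flow, endpoint) is covered by the guide; here it is assumed to be available in an environment variable.

```python
import os

import requests

# Assumed: a valid access token is stored in this environment variable.
token = os.environ["XINFERENCE_TOKEN"]

resp = requests.get(
    "http://localhost:9997/v1/models",  # assumed endpoint
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print(resp.json())
```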
Others
- FIX: [UI] remove the model_format restriction on n_gpu for llama.cpp by @yiboyasss in #3050
New Contributors
- @Gmgge made their first contribution in #3081
- @zky001 made their first contribution in #3077
- @rogercloud made their first contribution in #3103
Full Changelog: v1.3.1...v1.4.0