3.31.3

Latest

davidkoski released this 15 Apr 14:33

1c05248

First 3.x release.

See the upgrade documentation for information on what changed and how to upgrade.

What's Changed

Decouple from tokenizer and downloader packages by @DePasqualeOrg in #118
Batched LLM inference part 1 - consolidating RoPE calls by @ronaldmannak in #178
Add speculative decoding by @petrukha-ivan in #173
Fix doc comments and verify in CI by @DePasqualeOrg in #176
Add more documentation for integrations to readme by @DePasqualeOrg in #201
Fix tool calling for Llama 3 by @aleroot in #145
Add IntegrationTesting Xcode project and additional integration test models by @atdrendel in #142
Fix links in readme by @DePasqualeOrg in #204
Fix Swift 6 Sendable error in Llama3ToolCallParser by @Lakr233 in #203
Add Gemma 4 model (text, vision, MoE) by @adrgrondin in #180
Add Gemma 4 text model support (E2B and E4B) by @stefan-geens in #185
Small v3 API fixes by @davidkoski in #190
v3 API embedder fixes by @davidkoski in #202
Fix prompt-cache round-trip support for ArraysCache, MambaCache, and CacheList by @ronaldmannak in #155
Prepare Gemma4Text for batched RoPE offsets by @ronaldmannak in #212
Fix Gemma 4 system message and modality order by @adrgrondin in #211
Add qwen3_next to tool call format inference by @alankessler in #166
Add upgrade docs: how to use, how to develop by @davidkoski in #206

New Contributors

@Lakr233 made their first contribution in #203
@stefan-geens made their first contribution in #185

Full Changelog: 2.31.3...3.31.3

Contributors

atdrendel, ronaldmannak, and 8 other contributors
