First 3.x release.
See the upgrade documentation for information on what changed and how to upgrade.
What's Changed
- Decouple from tokenizer and downloader packages by @DePasqualeOrg in #118
- Batched LLM inference part 1 - consolidating RoPE calls by @ronaldmannak in #178
- Add speculative decoding by @petrukha-ivan in #173
- Fix doc comments and verify in CI by @DePasqualeOrg in #176
- Add more documentation for integrations to readme by @DePasqualeOrg in #201
- Fix tool calling for Llama 3 by @aleroot in #145
- Add IntegrationTesting Xcode project and additional integration test models by @atdrendel in #142
- Fix links in readme by @DePasqualeOrg in #204
- Fix Swift 6 Sendable error in Llama3ToolCallParser by @Lakr233 in #203
- Add gemma 4 model (text, vision, MoE) by @adrgrondin in #180
- Add Gemma 4 text model support (E2B and E4B) by @stefan-geens in #185
- small v3 api fixes by @davidkoski in #190
- v3 api embedder fixes by @davidkoski in #202
- Fix prompt-cache round-trip support for ArraysCache, MambaCache, and CacheList by @ronaldmannak in #155
- Prepare Gemma4Text for batched RoPE offsets by @ronaldmannak in #212
- Fix Gemma 4 system message and modality order by @adrgrondin in #211
- Add qwen3_next to tool call format inference by @alankessler in #166
- add upgrade docs, how to use, how to develop. by @davidkoski in #206
New Contributors
- @Lakr233 made their first contribution in #203
- @stefan-geens made their first contribution in #185
Full Changelog: 2.31.3...3.31.3