Commit beb9393

Add Jlama - Java-based LLM inference engine to Inference Engines section

- Jlama: Modern LLM inference engine for Java with 1.2k+ stars
- Apache 2.0 licensed, actively maintained (last release Jan 2026)
- Supports multiple models (Gemma, Llama, Mistral, Qwen, IBM Granite, GPT-2, BERT)
- Features: PagedAttention, MoE, tool calling, embeddings, quantization (Q8/Q4)
- Includes CLI and REST API for easy integration

1 parent b77337c commit beb9393

1 file changed

Lines changed: 1 addition & 0 deletions

README.md

@@ -348,6 +348,7 @@
 - **[CTranslate2](https://github.com/OpenNMT/CTranslate2)** ![GitHub stars](https://img.shields.io/github/stars/OpenNMT/CTranslate2?style=social) - Fast inference engine for Transformer models supporting OpenNMT and Hugging Face models. Optimized for CPU and GPU with batching, quantization (INT8/FP16), and dynamic memory management. Powers faster-whisper and other production deployments. MIT licensed.
 - **[PowerInfer](https://github.com/SJTU-IPADS/PowerInfer)** ![GitHub stars](https://img.shields.io/github/stars/SJTU-IPADS/PowerInfer?style=social) - High-speed LLM inference for local deployment on consumer GPUs. Achieves up to 11x speedup over llama.cpp on RTX 4090 by exploiting power-law neuron activation patterns. MIT licensed.
 - **[distributed-llama](https://github.com/b4rtaz/distributed-llama)** ![GitHub stars](https://img.shields.io/github/stars/b4rtaz/distributed-llama?style=social) - Distributed LLM inference connecting home devices into a powerful cluster. More devices means faster inference via tensor parallelism over Ethernet. Supports Linux, macOS, Windows, ARM, and x86_64 AVX2 CPUs. MIT licensed.
+- **[Jlama](https://github.com/tjake/Jlama)** ![GitHub stars](https://img.shields.io/github/stars/tjake/Jlama?style=social) - Modern LLM inference engine for Java. Supports multiple models (Gemma, Llama, Mistral, Qwen, IBM Granite, GPT-2, BERT) with PagedAttention, MoE, tool calling, embeddings, and quantization (Q8/Q4). Includes CLI and REST API. Apache 2.0 licensed.

 #### Quantization, Distillation & Optimization
