1 change: 1 addition & 0 deletions README.md
@@ -455,6 +455,7 @@
- [vLLM](https://github.com/vllm-project/vllm) - A high-throughput and memory-efficient inference and serving engine for LLMs.
- [llama.cpp](https://github.com/ggerganov/llama.cpp) - LLM inference in C/C++.
- [ollama](https://github.com/ollama/ollama) - Get up and running with Llama 3, Mistral, Gemma, and other large language models.
- [OmniRoute](https://github.com/diegosouzapw/OmniRoute) - A self-hostable AI gateway with 4-tier cascading fallback, multi-provider load balancing, and OpenAI-compatible API. Supports 200+ models across OpenAI, Anthropic, Google, and local providers.
Copilot AI Feb 20, 2026

OmniRoute appears to be a gateway/routing tool rather than a direct inference engine. Similar tools like "AI Gateway" (Portkey) and "TensorZero" are listed in the "LLM Applications" section (lines 510, 551), while the "LLM Inference" section primarily contains tools that perform actual inference (vLLM, llama.cpp, ollama, TGI, TensorRT-LLM). Consider moving this entry to the "LLM Applications" section, or to the "other deployment tools" subsection within "LLM Inference" (after line 461) where similar routing/serving tools like FastChat are located.
- [TGI](https://huggingface.co/docs/text-generation-inference/en/index) - A toolkit for deploying and serving Large Language Models (LLMs).
- [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) - NVIDIA's framework for high-performance LLM inference.
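Several of the tools above (vLLM, ollama, and the proposed OmniRoute entry) expose an OpenAI-compatible HTTP API, which is what makes them interchangeable behind the same client code. As a minimal sketch of what "OpenAI-compatible" means in practice, the helper below builds the URL and JSON body for a `/chat/completions` request; the base URL and model name are placeholders, not endpoints from any of these projects.

```python
import json

def build_chat_request(base_url: str, model: str, prompt: str) -> tuple[str, dict]:
    """Build the URL and JSON body for an OpenAI-compatible /chat/completions call."""
    url = f"{base_url.rstrip('/')}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload

# Placeholder gateway address and model name, for illustration only.
url, payload = build_chat_request("http://localhost:8000/v1", "llama3", "Hello!")
print(url)                 # http://localhost:8000/v1/chat/completions
print(json.dumps(payload))
```

Because every engine speaks the same request shape, swapping vLLM for ollama (or routing through a gateway) only changes the base URL, not the client logic.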
<details>