From 535501c3788191a563ac7c9375551b94fbab4942 Mon Sep 17 00:00:00 2001
From: Diego Souza <8016841+diegosouzapw@users.noreply.github.com>
Date: Fri, 20 Feb 2026 13:42:46 -0300
Subject: [PATCH] =?UTF-8?q?Add=20OmniRoute=20=E2=80=94=20self-hostable=20A?=
 =?UTF-8?q?I=20gateway=20with=204-tier=20fallback?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 083ff999..c3ece483 100644
--- a/README.md
+++ b/README.md
@@ -455,6 +455,7 @@
 - [vLLM](https://github.com/vllm-project/vllm) - A high-throughput and memory-efficient inference and serving engine for LLMs.
 - [llama.cpp](https://github.com/ggerganov/llama.cpp) - LLM inference in C/C++.
 - [ollama](https://github.com/ollama/ollama) - Get up and running with Llama 3, Mistral, Gemma, and other large language models.
+- [OmniRoute](https://github.com/diegosouzapw/OmniRoute) - A self-hostable AI gateway with 4-tier cascading fallback, multi-provider load balancing, and OpenAI-compatible API. Supports 200+ models across OpenAI, Anthropic, Google, and local providers.
 - [TGI](https://huggingface.co/docs/text-generation-inference/en/index) - a toolkit for deploying and serving Large Language Models (LLMs).
 - [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) - Nvidia Framework for LLM Inference