Intelligent multi-provider LLM gateway with cost-optimized routing, semantic caching, automated benchmarking, and real-time observability. Drop-in OpenAI API replacement.
Built on the Hono framework with a zero-dependency runtime.
Everything you need to run LLMs in production with full control over cost, quality, and reliability.
Classifies prompt complexity and routes to the optimal model. Choose cost, quality, latency, or balanced strategies with configurable weights.
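As a rough sketch of how configurable weights could drive model choice: normalize each metric across candidates, then take a weighted sum. The model names, prices, and scoring formula below are illustrative assumptions, not the gateway's actual routing table.

```typescript
// Illustrative weighted-routing sketch (names and numbers are assumptions,
// not the gateway's real model table or scoring formula).
interface ModelStats {
  name: string;
  costPer1kTokens: number; // USD, lower is better
  quality: number;         // 0..1 benchmark score, higher is better
  latencyMs: number;       // p50 latency, lower is better
}

interface Weights { cost: number; quality: number; latency: number }

// Normalize each metric to 0..1 across candidates, then take a weighted sum.
function pickModel(candidates: ModelStats[], w: Weights): ModelStats {
  const maxCost = Math.max(...candidates.map((m) => m.costPer1kTokens));
  const maxLat = Math.max(...candidates.map((m) => m.latencyMs));
  const score = (m: ModelStats) =>
    w.cost * (1 - m.costPer1kTokens / maxCost) + // cheap -> high score
    w.quality * m.quality +
    w.latency * (1 - m.latencyMs / maxLat);      // fast -> high score
  return candidates.reduce((best, m) => (score(m) > score(best) ? m : best));
}

const models: ModelStats[] = [
  { name: "gpt-4o",      costPer1kTokens: 0.005,  quality: 0.95, latencyMs: 900 },
  { name: "gpt-4o-mini", costPer1kTokens: 0.0006, quality: 0.82, latencyMs: 400 },
  { name: "llama3-groq", costPer1kTokens: 0.0002, quality: 0.75, latencyMs: 120 },
];

// A cost-heavy strategy favors the cheapest adequate model.
const costFirst = pickModel(models, { cost: 0.7, quality: 0.2, latency: 0.1 });
```

Swapping the weights is all a "quality" or "latency" strategy would need; the candidate list and normalization stay the same.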
Redis-backed cosine similarity cache catches rephrased queries. "What's 2+2?" and "What is two plus two?" hit the same cache entry.
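The lookup behind that behavior can be sketched without Redis: embed the prompt, then return any cached response whose embedding is within a cosine-similarity threshold. The in-memory store and the 0.92 threshold below are stand-in assumptions.

```typescript
// Minimal semantic-cache lookup sketch. An in-memory array stands in for
// Redis, and embeddings are assumed to come from some embedding model.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface CacheEntry { embedding: number[]; response: string }

// Return a cached response if any stored prompt is similar enough.
function lookup(
  entries: CacheEntry[],
  queryEmbedding: number[],
  threshold = 0.92, // assumed similarity cutoff
): string | null {
  for (const e of entries) {
    if (cosine(e.embedding, queryEmbedding) >= threshold) return e.response;
  }
  return null;
}
```

With this scheme, "What's 2+2?" and "What is two plus two?" embed to nearby vectors and resolve to the same entry, while unrelated prompts fall through to the providers.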
Per-provider state machines with automatic failover. When a provider goes down, requests fail fast instead of waiting for timeouts.
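A per-provider breaker of this kind can be sketched as a three-state machine (closed, open, half-open). The failure threshold and cooldown below are illustrative defaults, not the gateway's configured values.

```typescript
// Per-provider circuit breaker sketch (thresholds are assumptions).
type State = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: State = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 5, // consecutive failures before opening
    private cooldownMs = 30_000,  // how long to fail fast before probing
  ) {}

  // Callers check this before issuing a request; open circuits fail fast.
  canRequest(now = Date.now()): boolean {
    if (this.state === "open" && now - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // allow one probe request through
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.state = "closed";
    this.failures = 0;
  }

  recordFailure(now = Date.now()): void {
    this.failures++;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }
}
```

While a breaker is open, the fallback chain moves straight to the next provider instead of burning a timeout.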
Per-key and global monthly token/USD limits with alerts at 80% and 95%. Never get a surprise bill again.
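The alerting rule reduces to a ratio check against the 80% and 95% thresholds named above; the storage layer and alert transport are out of scope for this sketch.

```typescript
// Budget alert sketch: the 80% / 95% thresholds come from the feature
// description; the Budget shape itself is an illustrative assumption.
interface Budget { monthlyLimitUsd: number; spentUsd: number }

function budgetStatus(b: Budget): "ok" | "warn-80" | "warn-95" | "exceeded" {
  const ratio = b.spentUsd / b.monthlyLimitUsd;
  if (ratio >= 1) return "exceeded"; // the request should be rejected
  if (ratio >= 0.95) return "warn-95";
  if (ratio >= 0.8) return "warn-80";
  return "ok";
}
```

The same check works per key or globally; only the `Budget` row being consulted changes.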
14 standardized tasks across 5 categories. Data-driven model selection — no more guessing which model is best for your use case.
Change your base_url and you're done. Works with any OpenAI SDK, LangChain, LlamaIndex, or plain curl.
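A plain-fetch sketch of what "change your base_url" means in practice: the payload is the standard OpenAI chat-completions JSON, and only the endpoint changes. The URL and key below are placeholders for your own deployment, not defaults shipped by the gateway.

```typescript
// Drop-in usage sketch: only the base URL changes.
const GATEWAY_BASE_URL = "http://localhost:8787/v1"; // placeholder address
const API_KEY = "sk-placeholder";                    // your gateway key

// Standard OpenAI chat-completions payload, unchanged.
const payload = {
  model: "gpt-4o-mini",
  messages: [{ role: "user" as const, content: "What's 2+2?" }],
};

async function chat(): Promise<unknown> {
  const res = await fetch(`${GATEWAY_BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify(payload),
  });
  return res.json();
}
```

An OpenAI SDK, LangChain, or LlamaIndex client would send exactly this request once pointed at the gateway's base URL.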
Full-featured admin dashboard with real-time monitoring, routing configuration, and interactive playground.
Request pipeline: Auth → Cache → Route → Execute → Log
                        ┌────────────────────────────┐
                        │    Client Applications     │
                        │ (OpenAI SDK / curl / any)  │
                        └─────────────┬──────────────┘
                                      │
                         POST /v1/chat/completions
                                      │
┌─────────────────────────────────────▼──────────────────────────────────┐
│                           LLM Gateway (Hono)                           │
│                                                                        │
│   ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────────────────┐     │
│   │ Auth &   │→│ Semantic  │→│  Router  │→│    Fallback Chain    │     │
│   │ Budget   │ │ Cache     │ │ Engine   │ │  + Circuit Breakers  │     │
│   └──────────┘ └───────────┘ └──────────┘ └──────────┬───────────┘     │
│                                                      │                 │
│     ┌────────────────────────────────────────────────┘                 │
│     │                                                                  │
│     ▼          ▼            ▼           ▼           ▼                  │
│   Ollama     OpenAI     Anthropic     Groq      Together               │
│  (local)    (cloud)      (cloud)    (cloud)     (cloud)                │
│                                                                        │
│   ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────────────────┐     │
│   │Prometheus│ │ PostgreSQL│ │  Redis   │ │   Benchmark Suite    │     │
│   │ Metrics  │ │   Logs    │ │  Cache   │ │      (14 tasks)      │     │
│   └──────────┘ └───────────┘ └──────────┘ └──────────────────────┘     │
└────────────────────────────────────────────────────────────────────────┘
Route across local and cloud providers with intelligent model selection.
| Provider  | Models / Strength    | Cost        |
|-----------|----------------------|-------------|
| Ollama    | Local GPU inference  | Free        |
| Groq      | Ultra-fast cloud     | Very Low    |
| Together  | Open-source models   | Low         |
| OpenAI    | GPT-4o, GPT-4o-mini  | Medium-High |
| Anthropic | Claude Sonnet, Haiku | Medium-High |