LLM Gateway

Intelligent multi-provider LLM gateway with cost-optimized routing, semantic caching, automated benchmarking, and real-time observability. Drop-in OpenAI API replacement.

Quick Start
$ git clone https://github.com/aptsalt/llm-gateway.git
$ cd llm-gateway && npm install && npm run dev
5 Providers · 14 Benchmark Tasks · 111 Tests Passing · 0 Dependencies*

*Built on the Hono framework, which itself has a zero-dependency runtime.

Features

Everything you need to run LLMs in production with full control over cost, quality, and reliability.

🧠 Smart Routing

Classifies prompt complexity and routes to the optimal model. Choose cost, quality, latency, or balanced strategies with configurable weights.
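The strategy selection can be sketched as a weighted score over normalized per-model metrics. This is an illustrative sketch, not the gateway's actual code; the `Candidate` fields, weight names, and normalization convention are assumptions:

```typescript
// Illustrative routing sketch: pick the candidate model with the best
// weighted trade-off. Quality counts positively; cost and latency count
// against a model. All metrics are assumed normalized to roughly 0..1.
type Candidate = { model: string; cost: number; quality: number; latencyMs: number };
type Weights = { cost: number; quality: number; latency: number };

function pickModel(candidates: Candidate[], w: Weights): string {
  let best = candidates[0];
  let bestScore = -Infinity;
  for (const c of candidates) {
    const score =
      w.quality * c.quality - w.cost * c.cost - w.latency * (c.latencyMs / 1000);
    if (score > bestScore) {
      bestScore = score;
      best = c;
    }
  }
  return best.model;
}
```

Under this sketch, the cost / quality / latency / balanced strategies are just different weight vectors fed to the same scoring function.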

Semantic Caching

Redis-backed cosine similarity cache catches rephrased queries. "What's 2+2?" and "What is two plus two?" hit the same cache entry.
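The cache-hit test behind this behavior reduces to a cosine-similarity comparison between query embeddings. A minimal sketch, assuming precomputed embedding vectors; embedding generation and the Redis lookup are elided, and the 0.95 threshold is an assumed default, not a documented setting:

```typescript
// Cosine similarity between two embedding vectors: dot product divided by
// the product of the vector norms. Assumes equal-length, non-zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// A cached entry counts as a hit when its embedding is close enough.
function isCacheHit(query: number[], cached: number[], threshold = 0.95): boolean {
  return cosineSimilarity(query, cached) >= threshold;
}
```

Two phrasings of the same question produce nearby embeddings, so their similarity clears the threshold and the second request is served from cache.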

🛡 Circuit Breakers

Per-provider state machines with automatic failover. When a provider goes down, requests fail fast instead of waiting for timeouts.
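A per-provider breaker can be modeled as the classic closed / open / half-open state machine. This is a hypothetical sketch of that pattern; the failure threshold and cooldown below are invented values, not the gateway's defaults:

```typescript
// Circuit breaker sketch: closed (normal) → open (fail fast) → half-open
// (allow one probe after a cooldown) → closed on success or open on failure.
type State = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: State = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  canRequest(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // let one probe request through
    }
    return this.state !== "open";
  }

  onSuccess(): void {
    this.state = "closed";
    this.failures = 0;
  }

  onFailure(now: number): void {
    this.failures++;
    if (this.state === "half-open" || this.failures >= this.threshold) {
      this.state = "open"; // subsequent calls fail fast, no timeout wait
      this.openedAt = now;
    }
  }
}
```

While a provider's breaker is open, the fallback chain moves straight to the next provider instead of burning a request timeout.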

💰 Budget Control

Per-key and global monthly token/USD limits with alerts at 80% and 95%. Never get a surprise bill again.
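The alerting logic reduces to a ratio check against the configured limit. The 80% and 95% levels come from the feature description above; the function and status names here are made up for illustration:

```typescript
// Budget threshold sketch: classify current spend against a monthly USD
// limit. The same check would apply to token limits and per-key limits.
type BudgetStatus = "ok" | "warn80" | "warn95" | "exceeded";

function checkBudget(spentUsd: number, limitUsd: number): BudgetStatus {
  const ratio = spentUsd / limitUsd;
  if (ratio >= 1) return "exceeded"; // a gateway would reject requests here
  if (ratio >= 0.95) return "warn95";
  if (ratio >= 0.8) return "warn80";
  return "ok";
}
```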

📊 Benchmarking

14 standardized tasks across 5 categories. Data-driven model selection — no more guessing which model is best for your use case.

🔄 OpenAI Compatible

Change your base_url and you're done. Works with any OpenAI SDK, LangChain, LlamaIndex, or plain curl.
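As a usage sketch, any OpenAI-style client can target the gateway by swapping the base URL. The `/v1/chat/completions` path appears elsewhere in this README; the local port and API key below are placeholders, not documented defaults:

```typescript
// Placeholder base URL: substitute wherever your gateway instance runs.
const GATEWAY_BASE_URL = "http://localhost:3000/v1";

// Build a request body in the OpenAI chat-completions wire format.
function buildChatRequest(model: string, prompt: string) {
  return {
    model,
    messages: [{ role: "user" as const, content: prompt }],
  };
}

// With the official OpenAI SDK, only the base URL changes:
//   const client = new OpenAI({ baseURL: GATEWAY_BASE_URL, apiKey: "gw-..." });
// Or with plain fetch against the same endpoint:
async function chat(prompt: string): Promise<unknown> {
  const res = await fetch(`${GATEWAY_BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer gw-your-key", // gateway-issued key (placeholder)
    },
    body: JSON.stringify(buildChatRequest("gpt-4o-mini", prompt)),
  });
  return res.json();
}
```

Because the wire format is unchanged, LangChain, LlamaIndex, and plain curl all work the same way.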

Dashboard

Full-featured admin dashboard with real-time monitoring, routing configuration, and interactive playground.

Overview — Provider health, latency charts, token usage, budget tracking
Providers — Real-time latency monitoring and health status
Routing — Strategy selector, weight sliders, fallback chain config
API Keys — Key management with per-key budget limits
Analytics — Cost breakdown, provider usage, token consumption
Playground — Interactive prompt testing with routing metadata
Benchmarks — Model comparison with scorecards, radar charts, and detailed results

Architecture

Request pipeline: Auth → Cache → Route → Execute → Log

                     ┌─────────────────────────────┐
                     │     Client Applications     │
                     │  (OpenAI SDK / curl / any)  │
                     └──────────────┬──────────────┘
                                    │
                       POST /v1/chat/completions
                                    │
┌───────────────────────────────────▼────────────────────────────────────┐
│                           LLM Gateway (Hono)                           │
│                                                                        │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌────────────────────┐   │
│  │  Auth &  │ → │ Semantic │ → │  Router  │ → │   Fallback Chain   │   │
│  │  Budget  │   │  Cache   │   │  Engine  │   │ + Circuit Breakers │   │
│  └──────────┘   └──────────┘   └──────────┘   └─────────┬──────────┘   │
│                                                         │              │
│    ┌──────────┬──────────┬──────────┬──────────┬────────┘              │
│    ▼          ▼          ▼          ▼          ▼                       │
│  Ollama    OpenAI    Anthropic    Groq     Together                    │
│ (local)    (cloud)    (cloud)    (cloud)    (cloud)                    │
│                                                                        │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌────────────────────┐   │
│  │Prometheus│   │PostgreSQL│   │  Redis   │   │  Benchmark Suite   │   │
│  │ Metrics  │   │   Logs   │   │  Cache   │   │     (14 tasks)     │   │
│  └──────────┘   └──────────┘   └──────────┘   └────────────────────┘   │
└────────────────────────────────────────────────────────────────────────┘
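The five-stage pipeline above can be sketched as composed async middleware. This is generic TypeScript in the Koa/Hono style, not the gateway's actual Hono code; the stage names follow the pipeline, everything else is invented for illustration:

```typescript
// Middleware pipeline sketch: each stage does its work, then calls next()
// to hand off to the following stage (auth → cache → route → execute → log).
type Ctx = { trace: string[] };
type Middleware = (ctx: Ctx, next: () => Promise<void>) => Promise<void>;

function compose(stages: Middleware[]): (ctx: Ctx) => Promise<void> {
  return async (ctx) => {
    let i = -1;
    const dispatch = async (n: number): Promise<void> => {
      if (n <= i) throw new Error("next() called twice in one stage");
      i = n;
      const stage = stages[n];
      if (stage) await stage(ctx, () => dispatch(n + 1));
    };
    await dispatch(0);
  };
}

// Toy stage factory: records the stage name, then passes control on.
const stage = (name: string): Middleware => async (ctx, next) => {
  ctx.trace.push(name); // real stages would auth, check cache, etc. here
  await next();
};

const pipeline = compose(
  ["auth", "cache", "route", "execute", "log"].map(stage),
);
```

The onion structure matters for the real pipeline: a cache stage that finds a hit can simply return without calling `next()`, short-circuiting routing and execution entirely.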

Supported Providers

Route across local and cloud providers with intelligent model selection.

Provider       Description             Cost
Ollama         Local GPU inference     Free
Groq           Ultra-fast cloud        Very Low
Together AI    Open-source models      Low
OpenAI         GPT-4o, GPT-4o-mini     Medium-High
Anthropic      Claude Sonnet, Haiku    Medium-High