Intelligent multi-provider LLM gateway with cost-optimized routing, semantic caching, automated benchmarking, and real-time observability. Drop-in OpenAI API replacement.
Built on the Hono framework with a zero-dependency runtime.
Everything you need to run LLMs in production with full control over cost, quality, and reliability.
Classifies prompt complexity and routes to the optimal model. Choose cost, quality, latency, or balanced strategies with configurable weights.
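As a rough sketch of how configurable weights could drive model choice: normalize each metric across candidates, then take a weighted sum. The model names, prices, and scoring formula below are illustrative assumptions, not the gateway's actual routing table.

```typescript
// Illustrative weighted-routing sketch (names and numbers are assumptions,
// not the gateway's real model table or scoring formula).
interface ModelStats {
  name: string;
  costPer1kTokens: number; // USD, lower is better
  quality: number;         // 0..1 benchmark score, higher is better
  latencyMs: number;       // p50 latency, lower is better
}

interface Weights { cost: number; quality: number; latency: number }

// Normalize each metric to 0..1 across candidates, then take a weighted sum.
function pickModel(candidates: ModelStats[], w: Weights): ModelStats {
  const maxCost = Math.max(...candidates.map((m) => m.costPer1kTokens));
  const maxLat = Math.max(...candidates.map((m) => m.latencyMs));
  const score = (m: ModelStats) =>
    w.cost * (1 - m.costPer1kTokens / maxCost) + // cheap -> high score
    w.quality * m.quality +
    w.latency * (1 - m.latencyMs / maxLat);      // fast -> high score
  return candidates.reduce((best, m) => (score(m) > score(best) ? m : best));
}

const models: ModelStats[] = [
  { name: "gpt-4o",      costPer1kTokens: 0.005,  quality: 0.95, latencyMs: 900 },
  { name: "gpt-4o-mini", costPer1kTokens: 0.0006, quality: 0.82, latencyMs: 400 },
  { name: "llama3-groq", costPer1kTokens: 0.0002, quality: 0.75, latencyMs: 120 },
];

// A cost-heavy strategy favors the cheapest adequate model.
const costFirst = pickModel(models, { cost: 0.7, quality: 0.2, latency: 0.1 });
```

Swapping the weights is all a "quality" or "latency" strategy would need; the candidate list and normalization stay the same.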
Redis-backed cosine similarity cache catches rephrased queries. "What's 2+2?" and "What is two plus two?" hit the same cache entry.
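The lookup behind that behavior can be sketched without Redis: embed the prompt, then return any cached response whose embedding is within a cosine-similarity threshold. The in-memory store and the 0.92 threshold below are stand-in assumptions.

```typescript
// Minimal semantic-cache lookup sketch. An in-memory array stands in for
// Redis, and embeddings are assumed to come from some embedding model.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface CacheEntry { embedding: number[]; response: string }

// Return a cached response if any stored prompt is similar enough.
function lookup(
  entries: CacheEntry[],
  queryEmbedding: number[],
  threshold = 0.92, // assumed similarity cutoff
): string | null {
  for (const e of entries) {
    if (cosine(e.embedding, queryEmbedding) >= threshold) return e.response;
  }
  return null;
}
```

With this scheme, "What's 2+2?" and "What is two plus two?" embed to nearby vectors and resolve to the same entry, while unrelated prompts fall through to the providers.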
Per-provider state machines with automatic failover. When a provider goes down, requests fail fast instead of waiting for timeouts.
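A per-provider breaker of this kind can be sketched as a three-state machine (closed, open, half-open). The failure threshold and cooldown below are illustrative defaults, not the gateway's configured values.

```typescript
// Per-provider circuit breaker sketch (thresholds are assumptions).
type State = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: State = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 5, // consecutive failures before opening
    private cooldownMs = 30_000,  // how long to fail fast before probing
  ) {}

  // Callers check this before issuing a request; open circuits fail fast.
  canRequest(now = Date.now()): boolean {
    if (this.state === "open" && now - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // allow one probe request through
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.state = "closed";
    this.failures = 0;
  }

  recordFailure(now = Date.now()): void {
    this.failures++;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }
}
```

While a breaker is open, the fallback chain moves straight to the next provider instead of burning a timeout.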
Per-key and global monthly token/USD limits with alerts at 80% and 95%. Never get a surprise bill again.
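The alerting rule reduces to a ratio check against the 80% and 95% thresholds named above; the storage layer and alert transport are out of scope for this sketch.

```typescript
// Budget alert sketch: the 80% / 95% thresholds come from the feature
// description; the Budget shape itself is an illustrative assumption.
interface Budget { monthlyLimitUsd: number; spentUsd: number }

function budgetStatus(b: Budget): "ok" | "warn-80" | "warn-95" | "exceeded" {
  const ratio = b.spentUsd / b.monthlyLimitUsd;
  if (ratio >= 1) return "exceeded"; // the request should be rejected
  if (ratio >= 0.95) return "warn-95";
  if (ratio >= 0.8) return "warn-80";
  return "ok";
}
```

The same check works per key or globally; only the `Budget` row being consulted changes.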
14 standardized tasks across 5 categories. Data-driven model selection — no more guessing which model is best for your use case.
Change your base_url and you're done. Works with any OpenAI SDK, LangChain, LlamaIndex, or plain curl.
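A plain-fetch sketch of what "change your base_url" means in practice: the payload is the standard OpenAI chat-completions JSON, and only the endpoint changes. The URL and key below are placeholders for your own deployment, not defaults shipped by the gateway.

```typescript
// Drop-in usage sketch: only the base URL changes.
const GATEWAY_BASE_URL = "http://localhost:8787/v1"; // placeholder address
const API_KEY = "sk-placeholder";                    // your gateway key

// Standard OpenAI chat-completions payload, unchanged.
const payload = {
  model: "gpt-4o-mini",
  messages: [{ role: "user" as const, content: "What's 2+2?" }],
};

async function chat(): Promise<unknown> {
  const res = await fetch(`${GATEWAY_BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify(payload),
  });
  return res.json();
}
```

An OpenAI SDK, LangChain, or LlamaIndex client would send exactly this request once pointed at the gateway's base URL.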
Full-featured admin dashboard with real-time monitoring, routing configuration, and interactive playground.
Request pipeline: Auth → Cache → Route → Execute → Log
                        ┌────────────────────────────┐
                        │    Client Applications     │
                        │ (OpenAI SDK / curl / any)  │
                        └─────────────┬──────────────┘
                                      │
                         POST /v1/chat/completions
                                      │
┌─────────────────────────────────────▼──────────────────────────────────┐
│                           LLM Gateway (Hono)                           │
│                                                                        │
│   ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────────────────┐     │
│   │ Auth &   │→│ Semantic  │→│  Router  │→│    Fallback Chain    │     │
│   │ Budget   │ │ Cache     │ │ Engine   │ │  + Circuit Breakers  │     │
│   └──────────┘ └───────────┘ └──────────┘ └──────────┬───────────┘     │
│                                                      │                 │
│     ┌────────────────────────────────────────────────┘                 │
│     │                                                                  │
│     ▼          ▼            ▼           ▼           ▼                  │
│   Ollama     OpenAI     Anthropic     Groq      Together               │
│  (local)    (cloud)      (cloud)    (cloud)     (cloud)                │
│                                                                        │
│   ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────────────────┐     │
│   │Prometheus│ │ PostgreSQL│ │  Redis   │ │   Benchmark Suite    │     │
│   │ Metrics  │ │   Logs    │ │  Cache   │ │      (14 tasks)      │     │
│   └──────────┘ └───────────┘ └──────────┘ └──────────────────────┘     │
└────────────────────────────────────────────────────────────────────────┘
Route across local and cloud providers with intelligent model selection.
| Provider  | Models / Strength    | Cost        |
|-----------|----------------------|-------------|
| Ollama    | Local GPU inference  | Free        |
| Groq      | Ultra-fast cloud     | Very Low    |
| Together  | Open-source models   | Low         |
| OpenAI    | GPT-4o, GPT-4o-mini  | Medium-High |
| Anthropic | Claude Sonnet, Haiku | Medium-High |