Inference Gateway
Overview
The inference gateway is the core data plane of an Agentic AI platform. We recommend a 2-Tier architecture: kgateway (Tier 1) handles authentication, rate limiting, and guardrails, while Bifrost (Tier 2) handles model routing, fallback, and cost tracking. This section covers routing strategy, production deployment, Cascade Routing tuning, and an OpenClaw implementation example.
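The split of responsibilities between the two tiers can be sketched in a few lines of Python. This is an illustrative model only, not kgateway or Bifrost code: the names `Tier1Gateway`, `Tier2Router`, `Model`, `Request`, and `handle` are hypothetical, and the per-call cost and the in-memory request counter are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class Request:
    api_key: str
    prompt: str


class Tier1Gateway:
    """Hypothetical stand-in for Tier 1: authentication, rate limiting, guardrails."""

    def __init__(self, valid_keys, max_requests, blocked_terms):
        self.valid_keys = set(valid_keys)
        self.max_requests = max_requests          # assumed fixed quota per key
        self.blocked_terms = [t.lower() for t in blocked_terms]
        self.counts = {}                          # admitted requests per API key

    def admit(self, req):
        if req.api_key not in self.valid_keys:
            return False, "unauthenticated"
        used = self.counts.get(req.api_key, 0)
        if used >= self.max_requests:
            return False, "rate_limited"
        if any(term in req.prompt.lower() for term in self.blocked_terms):
            return False, "guardrail_blocked"
        # Only admitted requests count against the quota (a design choice here).
        self.counts[req.api_key] = used + 1
        return True, "ok"


@dataclass
class Model:
    name: str
    cost_per_call: float   # assumed flat cost; real systems usually price per token
    healthy: bool = True

    def complete(self, prompt):
        if not self.healthy:
            raise RuntimeError(f"{self.name} unavailable")
        return f"[{self.name}] response"


class Tier2Router:
    """Hypothetical stand-in for Tier 2: model routing, fallback, cost tracking."""

    def __init__(self, models):
        self.models = models   # ordered by preference; later entries are fallbacks
        self.spend = {}        # accumulated cost per API key

    def route(self, req):
        for model in self.models:
            try:
                text = model.complete(req.prompt)
            except RuntimeError:
                continue       # fall back to the next model in the list
            self.spend[req.api_key] = (
                self.spend.get(req.api_key, 0.0) + model.cost_per_call
            )
            return text
        raise RuntimeError("all models failed")


def handle(req, tier1, tier2):
    """Run a request through Tier 1, then Tier 2 only if it was admitted."""
    ok, reason = tier1.admit(req)
    if not ok:
        return {"status": reason}
    return {"status": "ok", "body": tier2.route(req)}
```

Calling `handle(Request("k1", "hello"), tier1, tier2)` with a `Tier1Gateway` that accepts `"k1"` and a `Tier2Router` whose first model is marked unhealthy exercises the whole path: the request passes the Tier 1 checks, the router skips the failed model, and the fallback's cost is recorded under that key. Rejected requests never reach Tier 2, which is the main reason for separating the tiers.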
Document List
📄️ Gateway Routing Strategy
kgateway + Bifrost/LiteLLM 2-Tier architecture with Cascade Routing, Semantic Router, and Hybrid Routing design patterns
📄️ Cascade Routing Tuning
Guide to tuning Inference Gateway Cascade Routing from production traces: classification thresholds, canary rollout, fallback, and cost drift alerts
📄️ OpenClaw AI Gateway
Deploy the OpenClaw AI Agent Gateway on EKS with cost optimization, and achieve full observability using Bifrost Auto-Router + Cilium Hubble + Langfuse
🗃️ Setup Guide
3 items