5 docs tagged with "inference-gateway"

Cascade Routing Production Tuning

Guide to tuning Inference Gateway cascade routing from production traces: classification thresholds, canary rollout, fallback, and cost-drift alerts

Inference Gateway

Routing strategies, deployment, cascade tuning, and implementation examples for kgateway and Bifrost-based 2-Tier inference gateways

Semantic Caching Strategy

Semantic caching strategy at the LLM gateway level, with a comparison of implementation options (GPTCache, Redis Semantic Cache, Portkey, Helicone, Bifrost+Redis)