
Basic Deployment

This document covers the procedures for deploying the core components of the kgateway + Bifrost inference gateway: routing multiple services by path behind a single NLB endpoint, and integrating multiple providers via Bifrost Gateway Mode.

Time Required

Learning: 30 min | Deployment: 45 min


1. kgateway Installation and Basic Resource Configuration

1.1 Install Gateway API CRDs

# Install Gateway API standard CRDs (v1.2.0+)
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/standard-install.yaml

# Install with experimental features (HTTPRoute filters, etc.)
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/experimental-install.yaml
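
A quick sanity check that the CRDs landed (standard kubectl commands, no kgateway-specific assumptions):

# Confirm the core Gateway API CRDs are installed
kubectl get crd gateways.gateway.networking.k8s.io httproutes.gateway.networking.k8s.io

# List all resources served by the Gateway API group
kubectl api-resources --api-group=gateway.networking.k8s.io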

1.2 Install kgateway v2.2.2 via Helm

# Note: helm repo add does not support oci:// URLs, so install the chart
# directly from the OCI registry reference instead

# Create namespace
kubectl create namespace kgateway-system

# Install kgateway v2.2.2
helm install kgateway oci://ghcr.io/kgateway-dev/charts/kgateway \
  --namespace kgateway-system \
  --version v2.2.2 \
  --set controller.replicaCount=2 \
  --set controller.resources.requests.cpu=500m \
  --set controller.resources.requests.memory=512Mi \
  --set controller.resources.limits.cpu=1000m \
  --set controller.resources.limits.memory=1Gi \
  --set metrics.enabled=true \
  --set metrics.port=9091
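
To confirm the control plane came up with the expected replica count and settings (standard Helm and kubectl commands):

# Verify the kgateway control plane is running (expect 2 controller replicas)
kubectl get pods -n kgateway-system
helm status kgateway -n kgateway-system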

1.3 Define GatewayClass

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: kgateway
spec:
  controllerName: kgateway.dev/kgateway-controller
  description: "Kgateway for AI inference routing"
  parametersRef:
    group: kgateway.dev
    kind: GatewayClassConfig
    name: kgateway-config
---
apiVersion: kgateway.dev/v1alpha1
kind: GatewayClassConfig
metadata:
  name: kgateway-config
spec:
  proxy:
    replicas: 3
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"
      limits:
        cpu: "2"
        memory: "4Gi"
    connectionSettings:
      maxConnections: 10000
      connectTimeout: 10s
      idleTimeout: 60s
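
Before moving on, it is worth confirming the controller accepted the GatewayClass. A minimal check, assuming the manifests above were saved to a file such as gatewayclass.yaml (the filename is illustrative):

# Apply and check the Accepted condition defined by the Gateway API spec
kubectl apply -f gatewayclass.yaml
kubectl get gatewayclass kgateway \
  -o jsonpath='{.status.conditions[?(@.type=="Accepted")].status}'
# Expected: True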

1.4 Gateway Resource (Single NLB Integration)

Production Environment Required

This is a basic configuration for development/test. In production, always apply the Advanced Features (CloudFront + WAF/Shield) and do not expose the NLB directly. Opening the security group to the public without authentication will be blocked automatically by company policy.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: unified-gateway
  namespace: ai-gateway
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  gatewayClassName: kgateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
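
Applying this manifest triggers NLB provisioning, which typically takes a few minutes. A sketch of the wait, assuming the ai-gateway namespace already exists (create it first with kubectl create namespace ai-gateway if not):

# Watch until the Gateway reports a Programmed condition and an address
kubectl get gateway unified-gateway -n ai-gateway -w

# The kgateway-managed proxy Service should receive the NLB DNS name
kubectl get svc -n ai-gateway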

1.5 ReferenceGrant (Cross-Namespace Access)

ReferenceGrant is required for HTTPRoute to reference Services in different namespaces.

# Allow access to Services in ai-inference namespace
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateway-to-services
  namespace: ai-inference
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: ai-gateway
  to:
    - group: ""
      kind: Service
---
# Allow access to Langfuse Service in observability namespace
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateway-to-langfuse
  namespace: observability
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: ai-gateway
  to:
    - group: ""
      kind: Service
---
# Allow access to the Bifrost Service in ai-external namespace
# (required by bifrost-route in section 2.2)
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateway-to-bifrost
  namespace: ai-external
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: ai-gateway
  to:
    - group: ""
      kind: Service
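
After applying, the grants can be listed cluster-wide to confirm they exist in the backend namespaces:

# ReferenceGrants live in the namespace of the referenced Service
kubectl get referencegrant -A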

2. HTTPRoute Configuration

Route multiple services by path behind a single NLB endpoint.

2.1 Direct vLLM Routing

This pattern routes directly from kgateway to vLLM without going through Bifrost. It is the simplest approach when serving a single model.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: vllm-route
  namespace: ai-inference
spec:
  parentRefs:
    - name: unified-gateway
      namespace: ai-gateway
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/
      backendRefs:
        - name: vllm-service
          port: 8000
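
A quick smoke test once the route is programmed. Because the route matches on the api.example.com hostname, the Host header must be set when calling the raw NLB address (<NLB_ENDPOINT> is a placeholder for your Gateway address):

# List models directly from vLLM through the gateway
curl -s -H "Host: api.example.com" http://<NLB_ENDPOINT>/v1/models | jq .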

2.2 Routing via Bifrost

Route through Bifrost when you need multi-provider integration, Cascade Routing, or OTel monitoring.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: bifrost-route
  namespace: ai-gateway
spec:
  parentRefs:
    - name: unified-gateway
      namespace: ai-gateway
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/
      backendRefs:
        - name: bifrost-service
          namespace: ai-external
          port: 8080
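
With this route in place, an end-to-end chat completion can be sent through Bifrost. Note the provider/model format (see section 5); the Host header must match the route's hostnames entry:

# Chat completion via kgateway → Bifrost → vLLM
curl -s -H "Host: api.example.com" -H "Content-Type: application/json" \
  http://<NLB_ENDPOINT>/v1/chat/completions \
  -d '{"model": "openai/glm-5", "messages": [{"role": "user", "content": "Hello"}]}' | jq .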

2.3 Langfuse Sub-path Routing (URLRewrite)

Langfuse (a Next.js app) serves from the root path /, so a URLRewrite filter is required to expose it under the /langfuse prefix. For Langfuse architecture and deployment details, refer to the Langfuse Deployment Guide.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: langfuse-route
  namespace: observability
spec:
  parentRefs:
    - name: unified-gateway
      namespace: ai-gateway
  hostnames:
    - "api.example.com"
  rules:
    # /langfuse → / prefix removal
    - matches:
        - path:
            type: PathPrefix
            value: /langfuse/
      filters:
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplacePrefixMatch
              replacePrefixMatch: /
      backendRefs:
        - name: langfuse-web
          port: 3000
    # Next.js static assets
    - matches:
        - path:
            type: PathPrefix
            value: /_next
      backendRefs:
        - name: langfuse-web
          port: 3000
    # Langfuse auth API
    - matches:
        - path:
            type: PathPrefix
            value: /api/auth
      backendRefs:
        - name: langfuse-web
          port: 3000
    # Langfuse public API
    - matches:
        - path:
            type: PathPrefix
            value: /api/public
      backendRefs:
        - name: langfuse-web
          port: 3000
    # Favicon and static files
    - matches:
        - path:
            type: PathPrefix
            value: /icon.svg
      backendRefs:
        - name: langfuse-web
          port: 3000
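
A minimal check that the rewrite works (expected status is 200 once Langfuse is deployed; see the Langfuse Deployment Guide):

# The Langfuse UI should be reachable under the /langfuse prefix
curl -s -o /dev/null -w "%{http_code}" -H "Host: api.example.com" http://<NLB_ENDPOINT>/langfuse/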

2.4 OTel URLRewrite (Bifrost → Langfuse)

The Bifrost OTel plugin uses only the base path of collector_url, so kgateway rewrites the request to the full OTLP path. For OTel integration details, refer to Langfuse OTel Configuration.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: langfuse-otel-route
  namespace: observability
spec:
  parentRefs:
    - name: unified-gateway
      namespace: ai-gateway
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api/public/otel
      filters:
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplacePrefixMatch
              replacePrefixMatch: /api/public/otel/v1/traces
      backendRefs:
        - name: langfuse-web
          port: 3000
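
To verify the rewrite without a full trace pipeline, an unauthenticated POST to the base path should reach Langfuse's OTLP endpoint rather than fall through to a 404 (the exact non-404 status returned depends on the Langfuse version):

# A 401/405-style response means the rewrite reached Langfuse; 404 means it did not
curl -s -o /dev/null -w "%{http_code}" -X POST -H "Host: api.example.com" \
  http://<NLB_ENDPOINT>/api/public/otel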

2.5 Routing Endpoint Structure Summary

http://<NLB_ENDPOINT>/v1/*          → vLLM or Bifrost (Inference API)
http://<NLB_ENDPOINT>/langfuse/*    → Langfuse (Observability UI)
http://<NLB_ENDPOINT>/_next/*       → Langfuse (Static Assets)
http://<NLB_ENDPOINT>/api/public/*  → Langfuse (API + OTel)
https://<AMG_ENDPOINT>              → Grafana (Separate managed service)

Configuration Changes Apply Immediately

Gateway API CRD-based routing reflects in real-time without Pod restarts. When you modify HTTPRoute or Gateway resources, the kgateway controller automatically detects and applies changes immediately.


3. Bifrost Gateway Mode Configuration

3.1 config.json Structure

Bifrost Gateway Mode is configured declaratively via config.json. The format below is verified to work.

{
  "$schema": "https://www.getbifrost.ai/schema",
  "providers": {
    "openai": {
      "keys": [
        {
          "name": "local-vllm",
          "value": "dummy",
          "weight": 1.0,
          "models": ["glm-5"]
        }
      ],
      "network_config": {
        "base_url": "http://glm5-serving.agentic-serving.svc.cluster.local:8000"
      }
    }
  },
  "plugins": [
    {
      "enabled": true,
      "name": "otel",
      "config": {
        "service_name": "bifrost",
        "trace_type": "otel",
        "protocol": "http",
        "collector_url": "http://langfuse-web.langfuse.svc.cluster.local:3000/api/public/otel/v1/traces",
        "headers": {
          "Authorization": "Basic <BASE64(pk:sk)>",
          "x-langfuse-ingestion-version": "4"
        }
      }
    }
  ]
}

3.2 Key Configuration Items

providers (Map Structure)

  • providers is a map (not an array); keys are Bifrost's built-in provider names (openai, anthropic, etc.)
  • keys is an array; models restricts the available models
  • Request model names use the provider/model format (e.g., openai/glm-5)

providers Format Warning

Writing "providers": [...] (an array) makes the settings invisible in the UI. Always use "providers": {...} (a map).

OTel Plugin

  • trace_type must be "otel" (traces won't arrive at Langfuse if using "genai_extension")
  • collector_url is the full Langfuse OTLP path: /api/public/otel/v1/traces
  • Authorization header: Basic <BASE64(public_key:secret_key)> format
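
A sketch for generating the Authorization header value, assuming Langfuse project keys of the usual pk-lf-.../sk-lf-... form (the key values below are placeholders):

# Basic auth value is base64("public_key:secret_key"); -n avoids a trailing newline
echo -n "pk-lf-XXXX:sk-lf-XXXX" | base64
# Use the output in config.json as: "Authorization": "Basic <output>"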

4. Bifrost K8s Deployment Pattern (PVC + initContainer)

Bifrost keeps config.json and its SQLite database under the path given by -app-dir. The pattern below implements declarative deployment with a PVC and an initContainer.

4.1 PVC + ConfigMap + Deployment

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bifrost-data
  namespace: ai-external
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: bifrost-gateway-config
  namespace: ai-external
data:
  config.json: |
    {
      "$schema": "https://www.getbifrost.ai/schema",
      "providers": {
        "openai": {
          "keys": [{"name": "local-vllm", "value": "dummy", "weight": 1.0, "models": ["glm-5"]}],
          "network_config": {"base_url": "http://vllm-service:8000"}
        }
      },
      "plugins": [{
        "enabled": true,
        "name": "otel",
        "config": {
          "service_name": "bifrost",
          "trace_type": "otel",
          "protocol": "http",
          "collector_url": "http://langfuse-web.langfuse.svc.cluster.local:3000/api/public/otel/v1/traces",
          "headers": {
            "Authorization": "Basic <BASE64(pk:sk)>",
            "x-langfuse-ingestion-version": "4"
          }
        }
      }]
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
  namespace: ai-external
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      securityContext:
        fsGroup: 1000
      initContainers:
        - name: setup
          image: busybox
          command:
            - sh
            - -c
            - |
              cp /config/config.json /app/data/config.json
              chown 1000:1000 /app/data/config.json
          volumeMounts:
            - name: bifrost-data
              mountPath: /app/data
            - name: gateway-config
              mountPath: /config
      containers:
        - name: bifrost
          image: bifrost/bifrost:v2.0.0
          args: ["-app-dir", "/app/data"]
          ports:
            - containerPort: 8080
              name: http
          volumeMounts:
            - name: bifrost-data
              mountPath: /app/data
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi
      volumes:
        - name: bifrost-data
          persistentVolumeClaim:
            claimName: bifrost-data
        - name: gateway-config
          configMap:
            name: bifrost-gateway-config
---
apiVersion: v1
kind: Service
metadata:
  name: bifrost-service
  namespace: ai-external
spec:
  selector:
    app: bifrost
  ports:
    - port: 8080
      targetPort: 8080
  type: ClusterIP

fsGroup: 1000 Required

The Bifrost container runs as UID 1000. Without securityContext.fsGroup: 1000, writes to the PVC will fail with permission errors.
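
A way to confirm the permissions are correct in a running Pod, assuming the image ships a shell and ls (if not, check the Pod events for permission errors instead):

# config.json and config.db should be owned by UID/GID 1000
kubectl exec -n ai-external deploy/bifrost -- ls -ln /app/data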


5. Bifrost provider/model Format and IDE Compatibility

Bifrost uses the provider/model format for model names.

5.1 Correct Model Name Format

✅ openai/gpt-4o              (provider/model)
✅ anthropic/claude-sonnet-4
✅ openai/glm-5               (self-hosted vLLM also uses the openai provider)

❌ gpt-4o                     (missing provider)
❌ openai-gpt-4o              (hyphen instead of slash)

5.2 IDE/Coding Tool Compatibility

Tool           model Field Transmission    Bifrost Compatible        Configuration Method
Cline          Passed as-is                ✅                         Model ID: openai/glm-5
Continue.dev   Passed as-is                ✅                         model: openai/glm-5
Aider          LiteLLM prefix removal      ⚠️ double-prefix needed    openai/openai/glm-5
Cursor         Self-validation rejects     ❌                         Rejects model names with /

5.3 Aider Connection Example

# double-prefix trick: LiteLLM strips the first openai/ → sends openai/glm-5 to Bifrost
aider --model openai/openai/glm-5 \
  --openai-api-base http://<NLB_ENDPOINT>/v1 \
  --openai-api-key dummy \
  --no-auto-commits

5.4 Continue.dev Configuration Example

{
  "models": [
    {
      "title": "GLM-5 (Bifrost)",
      "provider": "openai",
      "model": "openai/glm-5",
      "apiBase": "http://<NLB_ENDPOINT>/v1",
      "apiKey": "dummy"
    }
  ]
}

5.5 Cline Configuration Example

Settings -> API Provider -> OpenAI Compatible

  • Base URL: http://<NLB_ENDPOINT>/v1
  • Model: openai/glm-5
  • API Key: dummy

5.6 Python Client Example

from openai import OpenAI

client = OpenAI(
    base_url="http://<NLB_ENDPOINT>/v1",
    api_key="dummy"
)

response = client.chat.completions.create(
    model="openai/glm-5",  # provider/model format required
    messages=[{"role": "user", "content": "Hello"}]
)

Endpoint De-identification

In production, map NLB endpoints to domain names (e.g., api.your-company.com). Do not expose direct IP addresses or AWS auto-generated DNS names.


6. SQLite Initialization Procedure (When config.json Changes)

Bifrost reads config.json once at startup and persists the settings in SQLite. From then on it reads from SQLite, so the database must be regenerated whenever config.json changes.

Change Procedure

# 1. Update ConfigMap
kubectl apply -f bifrost-gateway-config.yaml

# 2. Delete the Pods (config.db on the PVC is automatically reinitialized)
kubectl delete pod -l app=bifrost -n ai-external

# 3. The initContainer copies the new config.json → Bifrost regenerates SQLite
kubectl get pods -n ai-external -l app=bifrost -w
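
Once the Pods are Running again, a quick check that the new settings are live (the model name assumes the config.json above):

# 4. Confirm the regenerated configuration is being served
curl -s -H "Host: api.example.com" http://<NLB_ENDPOINT>/v1/models | jq .
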
Difference from kgateway CRD Changes

kgateway reflects CRD changes automatically (no Pod restart needed), but Bifrost requires a Pod restart when its ConfigMap changes. Keep this difference in mind during operations.


Verification

After deployment, verify the configuration with the following commands.

# 1. Check Gateway status
kubectl get gateway -n ai-gateway

# 2. Check HTTPRoute status
kubectl get httproute -A

# 3. Check NLB endpoint
export NLB_ENDPOINT=$(kubectl get gateway unified-gateway -n ai-gateway \
  -o jsonpath='{.status.addresses[0].value}')
echo "NLB Endpoint: ${NLB_ENDPOINT}"

# 4. Test direct vLLM access (when using vllm-route)
#    The Host header must match the HTTPRoute hostnames entry
curl -s -H "Host: api.example.com" http://${NLB_ENDPOINT}/v1/models | jq .

# 5. Test via Bifrost (when using bifrost-route)
curl -s -H "Host: api.example.com" http://${NLB_ENDPOINT}/v1/models | jq .

# 6. Test Langfuse access
curl -s -o /dev/null -w "%{http_code}" -H "Host: api.example.com" http://${NLB_ENDPOINT}/langfuse/
# Expected: 200
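
Finally, an end-to-end inference smoke test (the model name assumes the Bifrost config from section 3; with the direct vLLM route, use the plain model name instead):

# 7. End-to-end chat completion through the gateway
curl -s -H "Host: api.example.com" -H "Content-Type: application/json" \
  http://${NLB_ENDPOINT}/v1/chat/completions \
  -d '{"model": "openai/glm-5", "messages": [{"role": "user", "content": "ping"}]}' | jq .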

Next Steps

Basic deployment is complete. Proceed to the next steps:

  1. Troubleshooting: If errors occurred during deployment, refer to Troubleshooting Guide.
  2. Advanced Features: Configure LLM Classifier, CloudFront/WAF, Semantic Caching for production environments.
  3. Monitoring: Complete OTel integration by referring to Langfuse Deployment Guide.

References