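Basic Deployment
This document covers deploying the core components of an inference gateway built on kgateway + Bifrost. Multiple services are routed by path behind a single NLB endpoint, and multi-provider integration is handled by Bifrost Gateway Mode.
Reading: 30 min | Deployment: 45 min
1. kgateway Installation and Base Resource Configuration
1.1 Installing the Gateway API CRDs
# Install the standard Gateway API CRDs (v1.2.0+)
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/standard-install.yaml
# Install with experimental features included (HTTPRoute filters, etc.)
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/experimental-install.yaml
1.2 Installing kgateway v2.2.2 with Helm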
# OCI registries cannot be registered with `helm repo add`; the chart is referenced directly by its oci:// URL
# Create the namespace
kubectl create namespace kgateway-system
# Install kgateway v2.2.2
helm install kgateway oci://ghcr.io/kgateway-dev/charts/kgateway \
  --namespace kgateway-system \
  --version v2.2.2 \
  --set controller.replicaCount=2 \
  --set controller.resources.requests.cpu=500m \
  --set controller.resources.requests.memory=512Mi \
  --set controller.resources.limits.cpu=1000m \
  --set controller.resources.limits.memory=1Gi \
  --set metrics.enabled=true \
  --set metrics.port=9091
1.3 GatewayClass Definition
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: kgateway
spec:
  controllerName: kgateway.dev/kgateway-controller
  description: "Kgateway for AI inference routing"
  parametersRef:
    group: kgateway.dev
    kind: GatewayClassConfig
    name: kgateway-config
---
apiVersion: kgateway.dev/v1alpha1
kind: GatewayClassConfig
metadata:
  name: kgateway-config
spec:
  proxy:
    replicas: 3
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"
      limits:
        cpu: "2"
        memory: "4Gi"
  connectionSettings:
    maxConnections: 10000
    connectTimeout: 10s
    idleTimeout: 60s
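1.4 Gateway Resource (Single NLB Integration)
The following is a basic configuration for development and testing. In production you must apply Advanced Features: CloudFront + WAF/Shield and must not expose the NLB directly. A security group opened to the public without authentication is blocked automatically by company policy.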
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: unified-gateway
  namespace: ai-gateway
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  gatewayClassName: kgateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
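1.5 ReferenceGrant (Cross-Namespace Access)
An HTTPRoute needs a ReferenceGrant to reference a Service in another namespace. The grant is created in the namespace that owns the Service.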
# Allow access to Services in the ai-inference namespace
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateway-to-services
  namespace: ai-inference
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: ai-gateway
  to:
    - group: ""
      kind: Service
---
# Allow access to the Langfuse Service in the observability namespace
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateway-to-langfuse
  namespace: observability
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: ai-gateway
  to:
    - group: ""
      kind: Service
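The grant rule can be sketched as a small check. This is a simplified model, assuming one from/to entry per grant and ignoring the optional name field; the ReferenceGrant dataclass and backend_ref_allowed function are hypothetical helpers, not part of any Kubernetes client.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReferenceGrant:
    """Simplified ReferenceGrant with a single from/to entry (assumption)."""
    namespace: str       # namespace the grant lives in (the Service's namespace)
    from_kind: str
    from_namespace: str
    to_kind: str


def backend_ref_allowed(route_ns: str, backend_ns: str,
                        grants: List[ReferenceGrant]) -> bool:
    """Return True if an HTTPRoute in route_ns may reference a Service in backend_ns."""
    if route_ns == backend_ns:
        return True  # same-namespace references never need a grant
    return any(
        g.namespace == backend_ns
        and g.from_kind == "HTTPRoute"
        and g.from_namespace == route_ns
        and g.to_kind == "Service"
        for g in grants
    )


# The two grants defined in this section
grants = [
    ReferenceGrant("ai-inference", "HTTPRoute", "ai-gateway", "Service"),
    ReferenceGrant("observability", "HTTPRoute", "ai-gateway", "Service"),
]
print(backend_ref_allowed("ai-gateway", "ai-inference", grants))
print(backend_ref_allowed("ai-gateway", "ai-external", grants))
```

Under this model the second call is not allowed, because no grant exists in ai-external; a route in ai-gateway that points at a Service there would need its own ReferenceGrant.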
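2. HTTPRoute Configuration
Multiple services are routed by path behind a single NLB endpoint.
2.1 Direct Routing to vLLM
In this pattern kgateway routes straight to vLLM without going through Bifrost. It is the simplest option when only a single model is served.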
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: vllm-route
  namespace: ai-inference
spec:
  parentRefs:
    - name: unified-gateway
      namespace: ai-gateway
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/
      backendRefs:
        - name: vllm-service
          port: 8000
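2.2 Routing via Bifrost
Route through Bifrost when you need multi-provider integration, Cascade Routing, or OTel monitoring. This route matches the same hostname and /v1/ prefix as vllm-route, so apply only one of the two. Because the backend Service lives in ai-external, the route also needs a ReferenceGrant in that namespace; section 1.5 covers only ai-inference and observability.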
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: bifrost-route
  namespace: ai-gateway
spec:
  parentRefs:
    - name: unified-gateway
      namespace: ai-gateway
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/
      backendRefs:
        - name: bifrost-service
          namespace: ai-external
          port: 8080
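2.3 Langfuse Sub-path Routing (URLRewrite)
Langfuse (Next.js) serves from /, so a URLRewrite is required to reach it under the /langfuse prefix. For Langfuse architecture and deployment details, see the Langfuse Deployment Guide.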
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: langfuse-route
  namespace: observability
spec:
  parentRefs:
    - name: unified-gateway
      namespace: ai-gateway
  hostnames:
    - "api.example.com"
  rules:
    # /langfuse -> / (strip the prefix)
    - matches:
        - path:
            type: PathPrefix
            value: /langfuse/
      filters:
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplacePrefixMatch
              replacePrefixMatch: /
      backendRefs:
        - name: langfuse-web
          port: 3000
    # Next.js static assets
    - matches:
        - path:
            type: PathPrefix
            value: /_next
      backendRefs:
        - name: langfuse-web
          port: 3000
    # Langfuse auth API
    - matches:
        - path:
            type: PathPrefix
            value: /api/auth
      backendRefs:
        - name: langfuse-web
          port: 3000
    # Langfuse public API
    - matches:
        - path:
            type: PathPrefix
            value: /api/public
      backendRefs:
        - name: langfuse-web
          port: 3000
    # Favicon and other static files
    - matches:
        - path:
            type: PathPrefix
            value: /icon.svg
      backendRefs:
        - name: langfuse-web
          port: 3000
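2.4 OTel URLRewrite (Bifrost → Langfuse)
The Bifrost OTel plugin uses only the base path of collector_url, so kgateway rewrites the request to the full OTLP path. Note that the config.json in section 3.1 already sends traces to the in-cluster Langfuse Service with the full path, in which case this rewrite is not on the request path. For OTel integration details, see Langfuse OTel Configuration.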
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: langfuse-otel-route
  namespace: observability
spec:
  parentRefs:
    - name: unified-gateway
      namespace: ai-gateway
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api/public/otel
      filters:
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplacePrefixMatch
              replacePrefixMatch: /api/public/otel/v1/traces
      backendRefs:
        - name: langfuse-web
          port: 3000
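The effect of the two ReplacePrefixMatch filters in sections 2.3 and 2.4 can be sketched as follows. This is a simplified model of the Gateway API rewrite semantics, not kgateway's implementation; replace_prefix_match is a hypothetical helper.

```python
def replace_prefix_match(path: str, prefix: str, replacement: str) -> str:
    """Simplified model of the Gateway API ReplacePrefixMatch path rewrite."""
    prefix = prefix.rstrip("/")
    # PathPrefix matches whole path segments only
    if path != prefix and not path.startswith(prefix + "/"):
        return path  # rule does not match; the path is left untouched
    rest = path[len(prefix):]
    rewritten = replacement.rstrip("/") + rest
    return rewritten or "/"


# Section 2.3: strip the /langfuse prefix
print(replace_prefix_match("/langfuse/traces", "/langfuse/", "/"))
# Section 2.4: expand the base path to the full OTLP path
print(replace_prefix_match("/api/public/otel", "/api/public/otel",
                           "/api/public/otel/v1/traces"))
# A client that already sends the full path gets the suffix appended twice
print(replace_prefix_match("/api/public/otel/v1/traces", "/api/public/otel",
                           "/api/public/otel/v1/traces"))
```

The last call illustrates why the OTel route should only receive requests that use the base path: under this model a request that already carries /v1/traces would be rewritten to a path ending in /v1/traces/v1/traces.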
2.5 Summary of the Routing Endpoint Structure
http://<NLB_ENDPOINT>/v1/* → vLLM or Bifrost (inference API)
http://<NLB_ENDPOINT>/langfuse/* → Langfuse (Observability UI)
http://<NLB_ENDPOINT>/_next/* → Langfuse (static assets)
http://<NLB_ENDPOINT>/api/public/* → Langfuse (API + OTel)
https://<AMG_ENDPOINT> → Grafana (hosted separately)
Routing based on Gateway API CRDs takes effect without restarting Pods. After an HTTPRoute or Gateway resource is modified, the kgateway controller detects the change and applies it automatically.
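When several PathPrefix rules match the same request, the Gateway API gives precedence to the longest matching prefix. A minimal sketch of that selection for the prefixes used in this section, assuming all routes share one hostname; the ROUTES table and its labels are illustrative, not generated from the cluster.

```python
from typing import Optional

# Path prefixes from sections 2.1 to 2.4 (labels are illustrative)
ROUTES = {
    "/v1": "vLLM or Bifrost",
    "/langfuse": "Langfuse UI",
    "/_next": "Langfuse static assets",
    "/api/auth": "Langfuse auth API",
    "/api/public": "Langfuse public API",
    "/api/public/otel": "Langfuse OTel ingestion",
}


def prefix_matches(path: str, prefix: str) -> bool:
    """PathPrefix matching works on whole path segments."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def resolve(path: str) -> Optional[str]:
    """Pick the backend label whose prefix is the longest match, if any."""
    candidates = [p for p in ROUTES if prefix_matches(path, p)]
    if not candidates:
        return None
    return ROUTES[max(candidates, key=len)]


print(resolve("/v1/chat/completions"))
print(resolve("/api/public/otel"))
print(resolve("/api/public/health"))
```

This is why /api/public/otel can coexist with the broader /api/public rule: the more specific prefix wins for OTel requests, and everything else under /api/public goes to the public API rule.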
3. Bifrost Gateway Mode Configuration
3.1 config.json Structure
Bifrost Gateway Mode is configured with a declarative config.json. The format below has been verified in practice.
{
  "$schema": "https://www.getbifrost.ai/schema",
  "providers": {
    "openai": {
      "keys": [
        {
          "name": "local-vllm",
          "value": "dummy",
          "weight": 1.0,
          "models": ["glm-5"]
        }
      ],
      "network_config": {
        "base_url": "http://glm5-serving.agentic-serving.svc.cluster.local:8000"
      }
    }
  },
  "plugins": [
    {
      "enabled": true,
      "name": "otel",
      "config": {
        "service_name": "bifrost",
        "trace_type": "otel",
        "protocol": "http",
        "collector_url": "http://langfuse-web.langfuse.svc.cluster.local:3000/api/public/otel/v1/traces",
        "headers": {
          "Authorization": "Basic <BASE64(pk:sk)>",
          "x-langfuse-ingestion-version": "4"
        }
      }
    }
  ]
}
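3.2 Key Configuration Items
providers (map structure)
- providers is a map, not an array. Each key is a Bifrost built-in provider name (openai, anthropic, etc.).
- keys is an array, and models restricts which models are available.
- In requests, the model name uses the provider/model format (e.g. openai/glm-5).
If you write "providers": [...] (array), the configuration does not appear in the UI. Always write "providers": {...} (map).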
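A quick structural check before deploying can catch the array-versus-map mistake. The sketch below only encodes the rules stated above; check_providers is a hypothetical helper and does not reproduce Bifrost's own schema validation.

```python
from typing import List


def check_providers(config: dict) -> List[str]:
    """Return a list of structural problems found in the providers section."""
    providers = config.get("providers")
    if not isinstance(providers, dict):
        return ["providers must be a map (JSON object), got "
                + type(providers).__name__]
    problems = []
    for name, provider in providers.items():
        keys = provider.get("keys")
        if not isinstance(keys, list):
            problems.append(f"{name}.keys must be an array")
            continue
        for i, key in enumerate(keys):
            if not isinstance(key.get("models", []), list):
                problems.append(f"{name}.keys[{i}].models must be an array")
    return problems


good = {"providers": {"openai": {"keys": [{"name": "local-vllm",
                                            "models": ["glm-5"]}]}}}
bad = {"providers": [{"name": "openai"}]}
print(check_providers(good))
print(check_providers(bad))
```

In practice the dict would come from json.load on the config.json file; an empty list means none of the problems described in this section were found.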
OTel plugin
- trace_type must be "otel" (with "genai_extension", traces do not reach Langfuse).
- collector_url is the full Langfuse OTLP path: /api/public/otel/v1/traces
- Authorization header: Basic <BASE64(public_key:secret_key)>
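The Authorization value is standard HTTP Basic encoding of the key pair. A minimal sketch for producing it; the key values are placeholders and langfuse_basic_auth is a hypothetical helper name.

```python
import base64


def langfuse_basic_auth(public_key: str, secret_key: str) -> str:
    """Build the Basic auth header value from a Langfuse key pair."""
    raw = f"{public_key}:{secret_key}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Placeholder keys; substitute the project's real public and secret keys
print(langfuse_basic_auth("pk-lf-example", "sk-lf-example"))
```

The printed string is what goes into the "Authorization" entry under headers in the otel plugin config.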
4. Bifrost Kubernetes Deployment Pattern (PVC + initContainer)
Bifrost keeps config.json and its SQLite database under the -app-dir path. A PVC combined with an initContainer makes the deployment declarative.
4.1 PVC + ConfigMap + Deployment
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bifrost-data
  namespace: ai-external
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: bifrost-gateway-config
  namespace: ai-external
data:
  # base_url uses the full Service DNS name because vllm-service runs in ai-inference, not ai-external
  config.json: |
    {
      "$schema": "https://www.getbifrost.ai/schema",
      "providers": {
        "openai": {
          "keys": [{"name": "local-vllm", "value": "dummy", "weight": 1.0, "models": ["glm-5"]}],
          "network_config": {"base_url": "http://vllm-service.ai-inference.svc.cluster.local:8000"}
        }
      },
      "plugins": [{
        "enabled": true,
        "name": "otel",
        "config": {
          "service_name": "bifrost",
          "trace_type": "otel",
          "protocol": "http",
          "collector_url": "http://langfuse-web.langfuse.svc.cluster.local:3000/api/public/otel/v1/traces",
          "headers": {
            "Authorization": "Basic <BASE64(pk:sk)>",
            "x-langfuse-ingestion-version": "4"
          }
        }
      }]
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
  namespace: ai-external
spec:
  # A ReadWriteOnce PVC attaches to one node at a time and the SQLite file is not meant
  # to be shared between Pods, so this pattern runs a single replica.
  replicas: 1
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      securityContext:
        fsGroup: 1000
      initContainers:
        - name: setup
          image: busybox
          command:
            - sh
            - -c
            - |
              # Copy the declarative config and drop the old database so that
              # Bifrost rebuilds SQLite from the new config.json (see section 6)
              cp /config/config.json /app/data/config.json
              rm -f /app/data/config.db
              chown 1000:1000 /app/data/config.json
          volumeMounts:
            - name: bifrost-data
              mountPath: /app/data
            - name: gateway-config
              mountPath: /config
      containers:
        - name: bifrost
          image: bifrost/bifrost:v2.0.0
          args: ["-app-dir", "/app/data"]
          ports:
            - containerPort: 8080
              name: http
          volumeMounts:
            - name: bifrost-data
              mountPath: /app/data
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi
      volumes:
        - name: bifrost-data
          persistentVolumeClaim:
            claimName: bifrost-data
        - name: gateway-config
          configMap:
            name: bifrost-gateway-config
---
apiVersion: v1
kind: Service
metadata:
  name: bifrost-service
  namespace: ai-external
spec:
  selector:
    app: bifrost
  ports:
    - port: 8080
      targetPort: 8080
  type: ClusterIP
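The Bifrost container runs as UID 1000. Without securityContext.fsGroup: 1000, writes to the PVC fail with a permission error.
5. Bifrost provider/model Format and IDE Compatibility
Bifrost expects model names in the provider/model form.
5.1 Correct Model Name Format
openai/gpt-4o (provider/model)
anthropic/claude-sonnet-4
openai/glm-5 (a self-hosted vLLM also uses the openai provider)
gpt-4o (provider missing: invalid)
openai-gpt-4o (hyphen instead of slash: invalid)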
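The naming rule above can be expressed as a small client-side check. This is a sketch of the convention as described here, not Bifrost's parser; split_model is a hypothetical helper.

```python
from typing import Tuple


def split_model(model: str) -> Tuple[str, str]:
    """Split 'provider/model' at the first slash, rejecting malformed names."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected provider/model, got {model!r}")
    return provider, name


print(split_model("openai/glm-5"))
for invalid in ("gpt-4o", "openai-gpt-4o"):
    try:
        split_model(invalid)
    except ValueError as err:
        print(err)
```

Splitting only at the first slash keeps model names that themselves contain a slash intact after the provider segment.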
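5.2 IDE and Coding Tool Compatibility
| Tool | How the model field is passed | Bifrost compatible | Configuration |
|---|---|---|---|
| Cline | Passed as-is | ✅ | Model ID: openai/glm-5 |
| Continue.dev | Passed as-is | ✅ | model: openai/glm-5 |
| Aider | LiteLLM prefix is stripped | ⚠️ double-prefix required | openai/openai/glm-5 |
| Cursor | Rejected by its own validation | ❌ | Rejects model names containing / |
5.3 Aider Connection Example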
# double-prefix trick: LiteLLM strips the first openai/ and passes openai/glm-5 on to Bifrost
aider --model openai/openai/glm-5 \
  --openai-api-base http://<NLB_ENDPOINT>/v1 \
  --openai-api-key dummy \
  --no-auto-commits
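Why the doubled prefix works can be seen with a toy model of the stripping step. This only mimics the behaviour described in the comment above and is not LiteLLM's actual code; strip_first_prefix is a hypothetical helper.

```python
def strip_first_prefix(model: str) -> str:
    """Mimic a client that drops the leading 'provider/' segment before sending."""
    _, sep, rest = model.partition("/")
    return rest if sep else model


# With the doubled prefix, a valid provider/model name reaches Bifrost
print(strip_first_prefix("openai/openai/glm-5"))
# With a single prefix, the provider is lost and the name becomes invalid
print(strip_first_prefix("openai/glm-5"))
```

Under this model the single-prefix form arrives as a bare model name, which is the "provider missing" case from section 5.1.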
5.4 Continue.dev Configuration Example
{
  "models": [
    {
      "title": "GLM-5 (Bifrost)",
      "provider": "openai",
      "model": "openai/glm-5",
      "apiBase": "http://<NLB_ENDPOINT>/v1",
      "apiKey": "dummy"
    }
  ]
}
5.5 Cline Configuration Example
Settings -> API Provider -> OpenAI Compatible
- Base URL: http://<NLB_ENDPOINT>/v1
- Model: openai/glm-5
- API Key: dummy
5.6 Python Client Example
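from openai import OpenAI

# Point the OpenAI SDK at the gateway instead of api.openai.com
client = OpenAI(
    base_url="http://<NLB_ENDPOINT>/v1",
    api_key="dummy",
)
response = client.chat.completions.create(
    model="openai/glm-5",  # the provider/model format is required
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)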
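In production, map the NLB endpoint to a domain name (e.g. api.your-company.com). Do not expose IP addresses or the AWS-generated DNS name directly.
6. SQLite Initialization Procedure (When config.json Changes)
Bifrost reads config.json once at startup and stores it in SQLite. From then on it uses SQLite, so the database must be regenerated whenever config.json changes.
Change procedure
# 1. Update the ConfigMap
kubectl apply -f bifrost-gateway-config.yaml
# 2. Delete the Pods (config.db on the PVC is re-initialized automatically)
kubectl delete pod -l app=bifrost -n ai-external
# 3. The initContainer copies the new config.json → Bifrost regenerates SQLite
kubectl get pods -n ai-external -l app=bifrost -w
kgateway CRD changes are picked up automatically (no Pod restart), whereas Bifrost ConfigMap changes require a Pod restart. Operators need to keep this difference in mind.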
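One way to decide whether a restart is needed is to compare a digest of the applied config with the new one. The sketch below is an assumed operational helper, not a Bifrost feature; config_digest and needs_restart are hypothetical names.

```python
import hashlib
import json


def config_digest(config_text: str) -> str:
    """SHA-256 of the canonicalized JSON, so formatting changes are ignored."""
    canonical = json.dumps(json.loads(config_text),
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_restart(applied_text: str, new_text: str) -> bool:
    """True when the new config.json differs in content from the applied one."""
    return config_digest(applied_text) != config_digest(new_text)


applied = '{"providers": {"openai": {"keys": [{"models": ["glm-5"]}]}}}'
reformatted = '{ "providers": { "openai": { "keys": [ {"models": ["glm-5"]} ] } } }'
changed = '{"providers": {"openai": {"keys": [{"models": ["glm-5", "glm-6"]}]}}}'
print(needs_restart(applied, reformatted))
print(needs_restart(applied, changed))
```

A digest like this could also be stored as a Pod template annotation so that applying a changed ConfigMap triggers a rollout, which is a common Kubernetes pattern rather than something this guide prescribes.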
Verification
After deployment, verify the configuration with the following commands.
# 1. Check the Gateway status
kubectl get gateway -n ai-gateway
# 2. Check the HTTPRoute status
kubectl get httproute -A
# 3. Look up the NLB endpoint
export NLB_ENDPOINT=$(kubectl get gateway unified-gateway -n ai-gateway \
  -o jsonpath='{.status.addresses[0].value}')
echo "NLB Endpoint: ${NLB_ENDPOINT}"
# 4. Test direct vLLM access (when vllm-route is in use)
curl -s http://${NLB_ENDPOINT}/v1/models | jq .
# 5. Test access via Bifrost (when bifrost-route is in use)
curl -s http://${NLB_ENDPOINT}/v1/models | jq .
# 6. Test Langfuse access
curl -s -o /dev/null -w "%{http_code}" http://${NLB_ENDPOINT}/langfuse/
# Expected: 200
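For scripted checks, the Gateway status can be evaluated from the JSON that kubectl returns. A minimal sketch, assuming the standard Gateway API Accepted and Programmed conditions; gateway_ready is a hypothetical helper and the sample object is hand-written, not captured from a cluster.

```python
def gateway_ready(gateway: dict) -> bool:
    """True when the Gateway reports both Accepted and Programmed as True."""
    conditions = gateway.get("status", {}).get("conditions", [])
    status = {c.get("type"): c.get("status") for c in conditions}
    return status.get("Accepted") == "True" and status.get("Programmed") == "True"


# Hand-written sample shaped like `kubectl get gateway ... -o json` output
sample = {
    "status": {
        "conditions": [
            {"type": "Accepted", "status": "True"},
            {"type": "Programmed", "status": "True"},
        ]
    }
}
print(gateway_ready(sample))
```

In a real check the dict would come from json.loads on the output of kubectl get gateway unified-gateway -n ai-gateway -o json.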