OpenClaw AI Agent Gateway Deployment & Full Observability
Overview
OpenClaw (226k+ GitHub stars) is a general-purpose AI agent framework that provides autonomous agent workflows leveraging various LLMs. AWS offers the aws-samples/sample-OpenClaw-on-AWS-with-Bedrock sample as an EC2 + CloudFormation quick-start guide — a simple setup connecting a single Bedrock model on a single instance. While sufficient for prototyping or personal use, enterprise environments require a different approach.
This document covers deploying OpenClaw on an existing EKS cluster, combining multi-model routing with 3-layer observability to build a production-ready architecture.
| Aspect | EC2 Standalone Deployment | EKS-Based Deployment (This Document) |
|---|---|---|
| Infrastructure | EC2 + CloudFormation, instance-level management | Add Pods to existing EKS cluster, Karpenter auto-scaling |
| LLM Integration | Bedrock single model | Bifrost Auto-Router → content-based Bedrock multi-model (Claude/GLM/Solar) |
| Observability | CloudWatch basic metrics | 3-Layer: Network (Hubble) + LLM (Langfuse) + System (OTEL/Prometheus) |
| Cost Control | Instance size adjustment | Graviton4 ARM + Spot + semantic caching + budget control |
| Scalability | Manual scaling | HPA/Karpenter auto-scaling, native Spot instance interruption handling |
Related Documents
| Document | Content | Relationship |
|---|---|---|
| Inference Gateway | Kgateway-based routing | Theoretical foundation |
| Agent Monitoring | Langfuse/LangSmith monitoring | Monitoring theory |
| 17. OpenClaw AI Gateway (this document) | OpenClaw production deployment + full o11y | Hands-on implementation |
Architecture Design
This architecture rests on six key design decisions that together deliver the cost efficiency, observability, and operational automation required in enterprise environments.
Key Design Decisions Summary
Each layer of this architecture is based on the following design decisions:
| Decision Area | Choice | Alternative | Key Rationale |
|---|---|---|---|
| Hosting Platform | EKS | EC2 standalone / AgentCore | Karpenter auto-scaling, o11y stack flexibility, Spot/Graviton combination possible. AgentCore is in Experimental stage with no cron support and limited o11y customization |
| LLM Gateway | Bifrost Proxy | LiteLLM / llm-d | Optimized for Bedrock multi-model architecture. Rust-based 50x faster performance, 100+ providers, budget control, one-line success_callback: ["langfuse"] integration. Hybrid Bifrost → llm-d → vLLM possible when adding self-hosted vLLM. LiteLLM is a viable alternative |
| LLM Observability | Langfuse (self-hosted) | Tempo / Loki | LLM-native: token usage, cost, tool call chains, prompt/completion content tracking. Tempo/Loki are general-purpose infra o11y tools that cannot track at the prompt level |
| Network Observability | Cilium Hubble (ENI mode) | CW Network Flow Monitor | L3/L4/L7 visibility (HTTP paths, status codes, DNS), interactive service map, $0. CW NFM supports L3/L4 only at $20-45/month |
| IAM Authentication | EKS Pod Identity | IRSA | No OIDC provider required, single command aws eks create-pod-identity-association mapping |
| Compute | Graviton4 M8g + Spot | x86 On-Demand | ARM64 20-40% cheaper, additional Spot savings, Karpenter native interruption handling |
Technology Stack
Compute
OpenClaw (TypeScript/Node.js) and Bifrost (Rust) are fully ARM64 compatible and use multi-arch images.
| Item | Configuration | Details |
|---|---|---|
| Instance | Graviton4 M8g | GA, 20-40% cost reduction vs x86, 60% better energy efficiency |
| Future Transition | Graviton5 M9g | 25% performance improvement over M8g, expected GA in 2026. Same cost with additional performance after GA transition |
| Purchase Option | Spot Instance preferred | 60-90% savings vs On-Demand. Karpenter v1.2+ native interruption handling + NMA node health monitoring |
Stable Spot Instance Operations
The OpenClaw gateway is a stateless workload, making it well suited to Spot Instances. Three layers are combined for stable operation:
| Layer | Tool | Role |
|---|---|---|
| Spot Interruption Handling | Karpenter v1.2+ | 2-minute warning detection → replacement node provisioning → Pod rescheduling |
| Node Health Monitoring | NMA (EKS Add-on) | Kernel/containerd/disk/network anomaly detection → Node Condition update |
| Auto Recovery | Node Auto Repair | Automatic replacement of unhealthy nodes reported by NMA |
Karpenter NodePool, EC2NodeClass, and NMA configurations are detailed in 5.1 Infrastructure.
Pod Configuration — graceful shutdown + AZ distribution:
spec:
terminationGracePeriodSeconds: 120 # Wait for in-flight LLM responses to complete
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: openclaw-gateway
terminationGracePeriodSeconds: 120 ensures in-flight LLM responses have time to complete during Spot interruptions, while topologySpreadConstraints distributes Pods across AZs to guard against single-AZ failures.
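On top of the grace period and AZ spread, a PodDisruptionBudget keeps at least one replica serving while Karpenter consolidates or replaces Spot nodes, complementing the NodePool disruption budget shown in 5.1. A minimal sketch, assuming the gateway runs two or more replicas:
# PodDisruptionBudget: keep at least one gateway Pod during voluntary disruptions
# such as Karpenter consolidation or node drains (sketch; assumes >= 2 replicas)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: openclaw-gateway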
LLM Models — Content-Based Routing
| Query Type | Model | Provider | Rationale |
|---|---|---|---|
| General (default) | Claude Sonnet 4.6 | Bedrock | 1M context, best agent performance |
| Coding / Programming | GLM-4.7 | Bedrock | 102B MoE, optimized for code generation |
| Korean / Korea-related | Solar Pro 3 | Bedrock | 128K context, Korean-optimized, MoE 12B active |
Networking
| Component | Technology |
|---|---|
| CNI | Cilium (ENI mode) |
| Service Map | Hubble UI + Grafana |
| Bedrock Connection | VPC Endpoint (com.amazonaws.<region>.bedrock-runtime) |
Observability (3-Layer)
| Layer | Tool | Tracks |
|---|---|---|
| Network | Cilium Hubble | L7 HTTP flows, DNS, service map |
| LLM | Langfuse | Prompts/completions, token cost, tool call chains |
| System | OTEL → Prometheus/Grafana | CPU, memory, Pod health, custom metrics |
Cost Analysis
| Item | Estimated Monthly Cost |
|---|---|
| Gateway + Bifrost (ARM) | ~$30 |
| Bedrock API (Claude/GLM/Solar) | ~$15-40 |
| Langfuse (self-hosted) | ~$10 |
| Redis (cache) | ~$5 |
| Cilium + Hubble | $0 |
| Prometheus/Grafana | $0 (existing stack) |
| Total | $60-85/month |
Cost Optimization Strategies
| Strategy | Detail | Savings |
|---|---|---|
| Graviton4 ARM | vs x86 | 20-40% |
| Spot Instance | vs On-Demand, Karpenter native interruption handling | 60-90% (compute) |
| Auto-Router | Cost/quality optimization via specialized model routing | Optimized per model |
| Semantic Caching | Redis-based, caching identical/similar requests | ~90% for repeated requests |
| Budget Control | Monthly budget per virtual key, rate limiting | Overspend prevention |
| VPC Endpoint | Eliminates Bedrock NAT Gateway costs | Data transfer cost reduction |
Deployment Guide
5.1 Infrastructure
EKS Cluster Prerequisites
| Requirement | Details |
|---|---|
| EKS Version | 1.30+ |
| Karpenter | v1.0+ (includes native Spot interruption handling) |
| EKS Add-ons | Pod Identity Agent, EKS Node Monitoring Agent |
| VPC | Bedrock VPC Endpoint (com.amazonaws.<region>.bedrock-runtime) |
| CNI | Cilium ENI mode (if Hubble L7 visibility is needed) or VPC CNI |
EKS Auto Mode is recommended for new clusters. For existing clusters, just ensure the above requirements are met.
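If the Bedrock VPC endpoint listed above does not exist yet, it can be provisioned with a small CloudFormation fragment such as the following (a sketch; VpcId, PrivateSubnetIds, and EndpointSecurityGroup are placeholder parameters for your environment):
# CloudFormation fragment: interface endpoint for bedrock-runtime (sketch)
Resources:
  BedrockRuntimeEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Interface
      ServiceName: !Sub "com.amazonaws.${AWS::Region}.bedrock-runtime"
      VpcId: !Ref VpcId                      # the VPC hosting the EKS cluster
      SubnetIds: !Ref PrivateSubnetIds       # private subnets where worker nodes run
      SecurityGroupIds:
        - !Ref EndpointSecurityGroup         # must allow 443 from node/Pod CIDRs
      PrivateDnsEnabled: true                # resolve the default Bedrock endpoint to the VPC endpoint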
Karpenter — Cost-Optimized Node Configuration
Graviton4 M8g Spot-first configuration to minimize compute costs.
# EC2NodeClass — Graviton4 ARM64 node configuration
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
name: openclaw-arm64
spec:
role: "KarpenterNodeRole-${CLUSTER_NAME}"
amiSelectorTerms:
- alias: al2023@latest # Amazon Linux 2023, automatically selects ARM64
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: "${CLUSTER_NAME}"
securityGroupSelectorTerms:
- tags:
karpenter.sh/discovery: "${CLUSTER_NAME}"
blockDeviceMappings:
- deviceName: /dev/xvda
ebs:
volumeSize: 30Gi
volumeType: gp3
deleteOnTermination: true
# NodePool — All Graviton gen 4+, Spot first, On-Demand fallback
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: openclaw-gateway
spec:
template:
metadata:
labels:
workload-type: ai-gateway
spec:
requirements:
- key: kubernetes.io/arch
operator: In
values: ["arm64"]
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"] # Spot preferred when available
- key: karpenter.k8s.aws/instance-generation
operator: Gt
values: ["3"] # Gen 4+ → includes m8g, c8g, r8g, m9g, etc.
- key: karpenter.k8s.aws/instance-size
operator: In
values: ["medium", "large"]
nodeClassRef:
group: karpenter.k8s.aws
kind: EC2NodeClass
name: openclaw-arm64
disruption:
consolidationPolicy: WhenEmptyOrUnderutilized
budgets:
- nodes: "1" # Max 1 node evicted at a time → ensures availability
limits:
cpu: "8" # Max 8 vCPU — cost ceiling
memory: 16Gi
By specifying instance-generation: Gt "3" + arch: arm64 instead of a specific instance-family, all families of Graviton gen 4+ (m8g, c8g, r8g, m9g, etc.) are included as candidates. When new Graviton generations become GA, Karpenter will automatically select the optimal instance based on price/performance without any NodePool changes.
Node Monitoring Agent — Node Health Monitoring
Karpenter handles Spot interruption events, but it does not detect system-level node issues such as kernel problems, containerd failures, or disk and network anomalies. Enabling the EKS Node Monitoring Agent (NMA) as an EKS Add-on covers this gap.
| Detection Area | Karpenter | NMA |
|---|---|---|
| 2-min Spot interruption warning | Detects + provisions replacement node | - |
| Kernel/containerd failures | - | Detects → updates Node Condition |
| Disk/network anomalies | - | Detects → creates Kubernetes Event |
| Node Auto Repair integration | - | Triggers automatic replacement of unhealthy nodes |
# Enable NMA EKS Add-on
aws eks create-addon \
--cluster-name ${CLUSTER_NAME} \
--addon-name eks-node-monitoring-agent
By combining Karpenter (Spot interruption handling) + NMA (node health monitoring) + Node Auto Repair (automatic replacement), high availability can be maintained even in Spot instance environments.
IAM — EKS Pod Identity
# After enabling Pod Identity Agent add-on:
aws eks create-pod-identity-association \
--cluster-name ${CLUSTER_NAME} \
--namespace openclaw \
--service-account openclaw-sa \
--role-arn arn:aws:iam::${ACCOUNT_ID}:role/openclaw-bedrock-role
Required IAM Role permissions:
- bedrock:InvokeModel
- bedrock:InvokeModelWithResponseStream
OpenClaw and Bifrost share the same ServiceAccount.
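The shared ServiceAccount itself stays plain: unlike IRSA, Pod Identity needs no eks.amazonaws.com/role-arn annotation, because the association created above does the mapping. A minimal sketch matching the names used in that command:
# ServiceAccount shared by the OpenClaw and Bifrost Pods; no IRSA annotation required,
# the Pod Identity association maps it to the Bedrock role
apiVersion: v1
kind: ServiceAccount
metadata:
  name: openclaw-sa
  namespace: openclaw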
5.2 Bifrost AI Gateway
Config — Multi-Model + Auto-Router
model_list:
- model_name: claude-sonnet
litellm_params:
model: bedrock/anthropic.claude-sonnet-4-6
- model_name: glm-4.7
litellm_params:
model: bedrock/zhipu.glm-4.7
- model_name: solar-pro-3
litellm_params:
model: bedrock/upstage.solar-pro-3
router_settings:
routing_strategy: "content-based"
auto_router:
encoder_type: openai
encoder_name: text-embedding-3-small
routes:
- name: korean-queries
model: solar-pro-3
utterances:
- "Answer in Korean"
- "Korea-related question"
- "Korean language query"
- "Korean news"
- "Korean translation"
description: "Korean language or Korea-related queries"
score_threshold: 0.5
- name: coding-queries
model: glm-4.7
utterances:
- "write code"
- "debug this function"
- "Write code for me"
- "programming"
- "fix this bug"
description: "Coding, programming, and debugging queries"
score_threshold: 0.5
default_route: claude-sonnet
litellm_settings:
cache: true
cache_params:
type: redis
host: redis
port: 6379
success_callback: ["langfuse"]
failure_callback: ["langfuse"]
general_settings:
  master_key: os.environ/BIFROST_MASTER_KEY
Create Secrets
kubectl create secret generic bifrost-secrets \
--from-literal=BIFROST_MASTER_KEY=<your-master-key>
Since all models are invoked through Bedrock, no separate API keys are needed. IAM authentication is automatically handled via EKS Pod Identity.
- Redis sidecar: Auto-Router embedding cache + semantic cache
- Service: ClusterIP (port 4000, OpenAI-compatible API)
The above configuration is a practical example using LiteLLM cache: true + Redis. For overall design principles including similarity threshold selection, cache key design (multi-tenant namespace), PII-safe handling, and observability metrics, refer to Semantic Caching Strategy.
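For similarity-based rather than exact-match caching, LiteLLM also documents a Redis semantic cache type. A sketch of what that variant of the litellm_settings block could look like; verify the parameter names against the LiteLLM caching docs for the version you deploy, and treat the threshold as a starting point to tune:
# Semantic cache variant (sketch; verify parameter names against your LiteLLM version)
litellm_settings:
  cache: true
  cache_params:
    type: redis-semantic
    host: redis
    port: 6379
    similarity_threshold: 0.8          # how close a new prompt must be to reuse a cached response
    redis_semantic_cache_embedding_model: text-embedding-3-small   # reuses the Auto-Router embedding model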
5.3 OpenClaw Gateway
Deployment
- Image: ghcr.io/openclaw/openclaw:latest
- Resources: 512Mi memory, 250m CPU
- NodeSelector: kubernetes.io/arch: arm64
- Service: ClusterIP (port 18789)
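A minimal Deployment and Service sketch that puts these settings together (the replica count and omission of probes are illustrative; adjust for your environment):
# OpenClaw gateway Deployment + Service sketch (illustrative values)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  replicas: 2
  selector:
    matchLabels:
      app: openclaw-gateway
  template:
    metadata:
      labels:
        app: openclaw-gateway
    spec:
      serviceAccountName: openclaw-sa        # Pod Identity association from 5.1
      terminationGracePeriodSeconds: 120     # wait for in-flight LLM responses
      nodeSelector:
        kubernetes.io/arch: arm64
      containers:
        - name: openclaw
          image: ghcr.io/openclaw/openclaw:latest
          ports:
            - containerPort: 18789
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  type: ClusterIP
  selector:
    app: openclaw-gateway
  ports:
    - port: 18789
      targetPort: 18789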
Config (openclaw.json)
{
"ai": {
"provider": "openai",
"baseUrl": "http://bifrost-proxy:4000",
"model": "claude-sonnet"
},
"diagnostics": {
"enabled": true,
"otel": {
"enabled": true,
"endpoint": "http://otel-collector:4317",
"serviceName": "openclaw-gateway",
"traces": true,
"metrics": true,
"logs": true
}
}
}
5.4 Cilium CNI (ENI Mode) + Hubble
Cilium ENI Mode — Full VPC CNI replacement, single eBPF datapath
- Pod IPs assigned directly from ENI
- Full NetworkPolicy support
- Optimal performance with a single datapath
Hubble UI capabilities:
- Interactive service map: Visualize HTTP request flows between Pods
- L7 HTTP flows: View POST /v1/chat/completions → 200 OK (320ms)
- DNS query tracking: Real-time visibility into which external APIs are called
- Hubble Grafana dashboard: Prometheus metrics integration
- Cost: $0
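A sketch of the Helm values behind this setup (key names follow the upstream cilium chart; double-check them against the chart version you install):
# Cilium Helm values sketch: ENI IPAM plus Hubble relay/UI and L7/DNS metrics
eni:
  enabled: true              # allocate Pod IPs directly from ENIs
ipam:
  mode: eni
routingMode: native          # no overlay tunnel in ENI mode
egressMasqueradeInterfaces: eth+
hubble:
  enabled: true
  relay:
    enabled: true
  ui:
    enabled: true
  metrics:
    enabled:
      - dns
      - drop
      - tcp
      - flow
      - http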
5.5 LLM Observability (Langfuse)
- Langfuse self-hosted Helm chart (PostgreSQL + Langfuse server)
- Automatic integration via Bifrost's success_callback: ["langfuse"] (credentials sketch below)
Tracked items:
- Prompt/completion content
- Token usage
- Cost per model
- Tool call chains
- Latency
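The callback authenticates with the standard Langfuse environment variables. A sketch of providing them via a Secret (the LANGFUSE_* names follow the Langfuse SDK; the host assumes the self-hosted chart's web service name, and the key values are placeholders):
# Langfuse credentials sketch: mounted into the Bifrost proxy via envFrom
apiVersion: v1
kind: Secret
metadata:
  name: langfuse-credentials
  namespace: openclaw
stringData:
  LANGFUSE_PUBLIC_KEY: pk-lf-...            # placeholder
  LANGFUSE_SECRET_KEY: sk-lf-...            # placeholder
  LANGFUSE_HOST: http://langfuse-web:3000   # assumed service name of the self-hosted chart
Reference the Secret from the proxy container with envFrom so the callback picks the variables up at startup.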
5.6 System Observability (OTEL + Prometheus/Grafana)
- OTEL Collector: receives OTLP (gRPC :4317) and exports to Prometheus (config sketch below)
- Reuse existing kube-prometheus-stack if available; otherwise deploy via Helm
- OpenClaw metrics: openclaw.tokens, openclaw.cost.usd, openclaw.run.duration_ms, openclaw.message.*
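A minimal Collector configuration for this pipeline (a sketch: only the metrics pipeline is shown and the Prometheus exporter port is illustrative; add trace and log pipelines with suitable exporters if those signals stay enabled in openclaw.json):
# otel-collector config sketch: OTLP in, Prometheus scrape endpoint out
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889     # scraped by the existing Prometheus
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]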
Dashboards & Alerts
Langfuse (LLM Level)
- Agent Trace Explorer: Message → LLM call → tool execution → response chain
- Token Usage: Token consumption by model and time
- Cost Analytics: Daily/weekly cost trends
- Prompt/Completion Inspector: View actual inputs and outputs
Hubble (Network Level)
- Interactive Service Map: HTTP request flows between Pods
- L7 Visibility: POST /v1/chat/completions → 200 OK (320ms)
- DNS Query Tracking: Which external APIs are being called
Grafana (System Level)
- Gateway Health: Uptime, connections, memory, CPU
- Bifrost: Cache hit rate, request throughput, latency
- Pod/Node Resource Usage
Alert Rules
| Alert | Condition |
|---|---|
| Budget Threshold Approaching | Bifrost budget > 80% |
| Gateway Down | Pod restart > 3 in 5min |
| LLM Response Delay | Latency > 5s |
| Cache Hit Rate Drop | Cache hit < 30% |
| Error Rate | Error rate > 5% |
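With the existing kube-prometheus-stack, alerts like these can be declared as a PrometheusRule. A sketch covering the Gateway Down rule; the budget, latency, cache-hit, and error-rate alerts depend on which Bifrost/LLM metrics you actually export, so their expressions are left out rather than guessed:
# PrometheusRule sketch: Gateway Down alert from the table above
# (kube_pod_container_status_restarts_total comes from kube-state-metrics)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: openclaw-gateway-alerts
  namespace: monitoring                # adjust to where your Prometheus picks up rules
  labels:
    release: kube-prometheus-stack     # match your Prometheus ruleSelector labels
spec:
  groups:
    - name: openclaw-gateway
      rules:
        - alert: OpenClawGatewayDown
          expr: increase(kube_pod_container_status_restarts_total{namespace="openclaw", pod=~"openclaw-gateway.*"}[5m]) > 3
          labels:
            severity: critical
          annotations:
            summary: "OpenClaw gateway Pod restarted more than 3 times in 5 minutes"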
Verification Checklist
| # | Check | Command / Action |
|---|---|---|
| 1 | All Pods Running | kubectl get pods |
| 2 | Gateway Status | openclaw status (after port-forward) |
| 3 | Auto-Router Routing | Verify model list + routing in Bifrost UI |
| 4 | Service Map | Verify HTTP flows between Pods in Hubble UI |
| 5 | LLM Trace | Verify prompt → tool → response trace in Langfuse UI |
| 6 | System Metrics | Check Grafana dashboards |
| 7 | Bedrock Audit | Check CloudTrail logs |
| 8 | Routing Validation | Test Korean / coding / general queries individually |
| 9 | Spot Stability | Verify Karpenter node transition events with kubectl get events, confirm replacement node provisioning with kubectl get nodeclaims |
- Adding self-hosted vLLM: See llm-d Distributed Inference for a Bifrost → llm-d → vLLM hybrid setup
- LiteLLM alternative: If Bifrost doesn't meet your requirements, LiteLLM can be used as a drop-in replacement (Python-based, same OpenAI-compatible API)
- Adding vector search RAG: See Milvus Vector DB
- Agent evaluation: Measure response quality with Ragas Evaluation