
OpenClaw AI Agent Gateway Deployment & Full Observability

Overview

OpenClaw (226k+ GitHub stars) is a general-purpose AI agent framework that provides autonomous agent workflows leveraging various LLMs. AWS offers the aws-samples/sample-OpenClaw-on-AWS-with-Bedrock sample as an EC2 + CloudFormation quick-start guide — a simple setup connecting a single Bedrock model on a single instance. While sufficient for prototyping or personal use, enterprise environments require a different approach.

This document covers deploying OpenClaw on an existing EKS cluster, combining multi-model routing with 3-layer observability to build a production-ready architecture.

| | EC2 Standalone Deployment | EKS-Based Deployment (This Document) |
|---|---|---|
| Infrastructure | EC2 + CloudFormation, instance-level management | Add Pods to existing EKS cluster, Karpenter auto-scaling |
| LLM Integration | Bedrock single model | Bifrost Auto-Router → content-based Bedrock multi-model (Claude/GLM/Solar) |
| Observability | CloudWatch basic metrics | 3-Layer: Network (Hubble) + LLM (Langfuse) + System (OTEL/Prometheus) |
| Cost Control | Instance size adjustment | Graviton4 ARM + Spot + semantic caching + budget control |
| Scalability | Manual scaling | HPA/Karpenter auto-scaling, native Spot instance interruption handling |
| Document | Content | Relationship |
|---|---|---|
| Inference Gateway | Kgateway-based routing | Theoretical foundation |
| Agent Monitoring | Langfuse/LangSmith monitoring | Monitoring theory |
| 17. OpenClaw AI Gateway (this document) | OpenClaw production deployment + full o11y | Hands-on implementation |

Architecture Design

This architecture is based on 6 key design decisions to simultaneously achieve cost efficiency, observability, and operational automation required in enterprise environments.

Key Design Decisions Summary

Each layer of this architecture is based on the following design decisions:

| Decision Area | Choice | Alternative | Key Rationale |
|---|---|---|---|
| Hosting Platform | EKS | EC2 standalone / AgentCore | Karpenter auto-scaling, o11y stack flexibility, Spot/Graviton combination possible. AgentCore is in Experimental stage with no cron support and limited o11y customization |
| LLM Gateway | Bifrost Proxy | LiteLLM / llm-d | Optimized for Bedrock multi-model architecture. Rust-based, 50x faster performance, 100+ providers, budget control, one-line success_callback: ["langfuse"] integration. Hybrid Bifrost → llm-d → vLLM possible when adding self-hosted vLLM. LiteLLM is a viable alternative |
| LLM Observability | Langfuse (self-hosted) | Tempo / Loki | LLM-native: token usage, cost, tool call chains, prompt/completion content tracking. Tempo/Loki are general-purpose infra o11y tools that cannot track at the prompt level |
| Network Observability | Cilium Hubble (ENI mode) | CW Network Flow Monitor | L3/L4/L7 visibility (HTTP paths, status codes, DNS), interactive service map, $0. CW NFM supports L3/L4 only at $20-45/month |
| IAM Authentication | EKS Pod Identity | IRSA | No OIDC provider required, single command aws eks create-pod-identity-association mapping |
| Compute | Graviton4 M8g + Spot | x86 On-Demand | ARM64 20-40% cheaper, additional Spot savings, Karpenter native interruption handling |

Technology Stack

Compute

OpenClaw (TypeScript/Node.js) and Bifrost (Rust) are fully ARM64 compatible and use multi-arch images.

| Item | Configuration | Details |
|---|---|---|
| Instance | Graviton4 M8g | GA, 20-40% cost reduction vs x86, 60% better energy efficiency |
| Future Transition | Graviton5 M9g | 25% performance improvement over M8g, expected GA in 2026. Same cost with additional performance after GA transition |
| Purchase Option | Spot Instance preferred | 60-90% savings vs On-Demand. Karpenter v1.2+ native interruption handling + NMA node health monitoring |

Stable Spot Instance Operations

The OpenClaw gateway is a stateless workload, which makes it well suited to Spot Instances. Three layers combine to keep Spot operation stable:

| Layer | Tool | Role |
|---|---|---|
| Spot Interruption Handling | Karpenter v1.2+ | 2-minute warning detection → replacement node provisioning → Pod rescheduling |
| Node Health Monitoring | NMA (EKS Add-on) | Kernel/containerd/disk/network anomaly detection → Node Condition update |
| Auto Recovery | Node Auto Repair | Automatic replacement of unhealthy nodes reported by NMA |

Karpenter NodePool, EC2NodeClass, and NMA configurations are detailed in 5.1 Infrastructure.

Pod Configuration — graceful shutdown + AZ distribution:

spec:
  terminationGracePeriodSeconds: 120   # Wait for in-flight LLM responses to complete
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: openclaw-gateway

terminationGracePeriodSeconds: 120 ensures in-flight LLM responses have time to complete during Spot interruptions, while topologySpreadConstraints distributes Pods across AZs to guard against single-AZ failures.
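A PodDisruptionBudget can complement these settings so that voluntary disruptions (Karpenter consolidation, node drains) never evict all replicas at once. A minimal sketch, assuming the gateway runs with at least two replicas; the resource name is illustrative:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: openclaw-gateway-pdb      # illustrative name
  namespace: openclaw
spec:
  minAvailable: 1                 # keep at least one replica serving during drains
  selector:
    matchLabels:
      app: openclaw-gateway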

LLM Models — Content-Based Routing

| Query Type | Model | Provider | Rationale |
|---|---|---|---|
| General (default) | Claude Sonnet 4.6 | Bedrock | 1M context, best agent performance |
| Coding / Programming | GLM-4.7 | Bedrock | 102B MoE, optimized for code generation |
| Korean / Korea-related | Solar Pro 3 | Bedrock | 128K context, Korean-optimized, MoE 12B active |

Networking

| Component | Technology |
|---|---|
| CNI | Cilium (ENI mode) |
| Service Map | Hubble UI + Grafana |
| Bedrock Connection | VPC Endpoint (com.amazonaws.<region>.bedrock-runtime) |
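The Bedrock VPC Endpoint can be created with the AWS CLI. A sketch, assuming private subnet IDs and a security group that allows HTTPS from the cluster; the shell variables are placeholders:

aws ec2 create-vpc-endpoint \
  --vpc-id ${VPC_ID} \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.${AWS_REGION}.bedrock-runtime \
  --subnet-ids ${PRIVATE_SUBNET_IDS} \
  --security-group-ids ${ENDPOINT_SG_ID} \
  --private-dns-enabled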

Observability (3-Layer)

| Layer | Tool | Tracks |
|---|---|---|
| Network | Cilium Hubble | L7 HTTP flows, DNS, service map |
| LLM | Langfuse | Prompts/completions, token cost, tool call chains |
| System | OTEL → Prometheus/Grafana | CPU, memory, Pod health, custom metrics |

Cost Analysis

| Item | Estimated Monthly Cost |
|---|---|
| Gateway + Bifrost (ARM) | ~$30 |
| Bedrock API (Claude/GLM/Solar) | ~$15-40 |
| Langfuse (self-hosted) | ~$10 |
| Redis (cache) | ~$5 |
| Cilium + Hubble | $0 |
| Prometheus/Grafana | $0 (existing stack) |
| Total | $60-85/month |

Cost Optimization Strategies

| Strategy | Detail | Savings |
|---|---|---|
| Graviton4 ARM | vs x86 | 20-40% |
| Spot Instance | vs On-Demand, Karpenter native interruption handling | 60-90% (compute) |
| Auto-Router | Cost/quality optimization via specialized model routing | Optimized per model |
| Semantic Caching | Redis-based, caching identical/similar requests | ~90% for repeated requests |
| Budget Control | Monthly budget per virtual key, rate limiting | Overspend prevention |
| VPC Endpoint | Eliminates Bedrock NAT Gateway costs | Data transfer cost reduction |

Deployment Guide

5.1 Infrastructure

EKS Cluster Prerequisites

| Requirement | Details |
|---|---|
| EKS Version | 1.30+ |
| Karpenter | v1.0+ (includes native Spot interruption handling) |
| EKS Add-ons | Pod Identity Agent, EKS Node Monitoring Agent |
| VPC | Bedrock VPC Endpoint (com.amazonaws.<region>.bedrock-runtime) |
| CNI | Cilium ENI mode (if Hubble L7 visibility is needed) or VPC CNI |

EKS Auto Mode is recommended for new clusters. For existing clusters, just ensure the above requirements are met.

Karpenter — Cost-Optimized Node Configuration

Graviton4 M8g Spot-first configuration to minimize compute costs.

# EC2NodeClass — Graviton4 ARM64 node configuration
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: openclaw-arm64
spec:
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  amiSelectorTerms:
    - alias: al2023@latest   # Amazon Linux 2023, automatically selects ARM64
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 30Gi
        volumeType: gp3
        deleteOnTermination: true
---
# NodePool — All Graviton gen 4+, Spot first, On-Demand fallback
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: openclaw-gateway
spec:
  template:
    metadata:
      labels:
        workload-type: ai-gateway
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # Spot preferred when available
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["3"]                   # Gen 4+ → includes m8g, c8g, r8g, m9g, etc.
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["medium", "large"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: openclaw-arm64
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    budgets:
      - nodes: "1"   # Max 1 node evicted at a time → ensures availability
  limits:
    cpu: "8"         # Max 8 vCPU — cost ceiling
    memory: 16Gi

Automatic Graviton Generation Upgrade

By specifying instance-generation: Gt "3" + arch: arm64 instead of a specific instance-family, all families of Graviton gen 4+ (m8g, c8g, r8g, m9g, etc.) are included as candidates. When new Graviton generations become GA, Karpenter will automatically select the optimal instance based on price/performance without any NodePool changes.

Node Monitoring Agent — Node Health Monitoring

Karpenter handles Spot interruption events, but it does not detect system-level node issues such as kernel problems, containerd failures, or disk/network anomalies. Enabling the EKS Node Monitoring Agent (NMA) as an EKS Add-on covers this gap.

| Detection Area | Karpenter | NMA |
|---|---|---|
| 2-min Spot interruption warning | Detects + provisions replacement node | - |
| Kernel/containerd failures | - | Detects → updates Node Condition |
| Disk/network anomalies | - | Detects → creates Kubernetes Event |
| Node Auto Repair integration | - | Triggers automatic replacement of unhealthy nodes |

# Enable NMA EKS Add-on
aws eks create-addon \
--cluster-name ${CLUSTER_NAME} \
--addon-name eks-node-monitoring-agent

By combining Karpenter (Spot interruption handling) + NMA (node health monitoring) + Node Auto Repair (automatic replacement), high availability can be maintained even in Spot instance environments.

IAM — EKS Pod Identity

# After enabling Pod Identity Agent add-on:
aws eks create-pod-identity-association \
--cluster-name ${CLUSTER_NAME} \
--namespace openclaw \
--service-account openclaw-sa \
--role-arn arn:aws:iam::${ACCOUNT_ID}:role/openclaw-bedrock-role

Required IAM Role permissions:

  • bedrock:InvokeModel
  • bedrock:InvokeModelWithResponseStream

OpenClaw and Bifrost share the same ServiceAccount.
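
A sketch of the corresponding IAM role follows: the permissions policy carries only the two Bedrock actions above (scoping Resource to specific model ARNs is preferable where possible), and the trust policy allows the EKS Pod Identity service principal to assume the role.

Permissions policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}

Trust policy (EKS Pod Identity):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "pods.eks.amazonaws.com" },
      "Action": [ "sts:AssumeRole", "sts:TagSession" ]
    }
  ]
}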

5.2 Bifrost AI Gateway

Config — Multi-Model + Auto-Router

model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-sonnet-4-6

  - model_name: glm-4.7
    litellm_params:
      model: bedrock/zhipu.glm-4.7

  - model_name: solar-pro-3
    litellm_params:
      model: bedrock/upstage.solar-pro-3

router_settings:
  routing_strategy: "content-based"
  auto_router:
    encoder_type: openai
    encoder_name: text-embedding-3-small
    routes:
      - name: korean-queries
        model: solar-pro-3
        utterances:
          - "Answer in Korean"
          - "Korea-related question"
          - "Korean language query"
          - "Korean news"
          - "Korean translation"
        description: "Korean language or Korea-related queries"
        score_threshold: 0.5
      - name: coding-queries
        model: glm-4.7
        utterances:
          - "write code"
          - "debug this function"
          - "Write code for me"
          - "programming"
          - "fix this bug"
        description: "Coding, programming, and debugging queries"
        score_threshold: 0.5
    default_route: claude-sonnet

litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: redis
    port: 6379
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY

Create Secrets

# Key name must match the env var referenced by general_settings.master_key
kubectl create secret generic bifrost-secrets \
  --from-literal=LITELLM_MASTER_KEY=<your-master-key>

Since all models are invoked through Bedrock, no separate API keys are needed. IAM authentication is automatically handled via EKS Pod Identity.

  • Redis sidecar: Auto-Router embedding cache + semantic cache
  • Service: ClusterIP (port 4000, OpenAI-compatible API)
Semantic Cache Design Principles

The above configuration is a practical example using LiteLLM cache: true + Redis. For overall design principles including similarity threshold selection, cache key design (multi-tenant namespace), PII-safe handling, and observability metrics, refer to Semantic Caching Strategy.
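
Budget control per virtual key (see the Cost Optimization Strategies table) is configured on the same proxy. A sketch using the LiteLLM-style /key/generate endpoint, assuming the proxy Service is reachable as bifrost-proxy:4000 and the master key from the secret above; the budget and rate-limit values are illustrative:

# Issue a virtual key with a monthly budget and request rate limit
curl -X POST http://bifrost-proxy:4000/key/generate \
  -H "Authorization: Bearer ${LITELLM_MASTER_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "key_alias": "openclaw-gateway",
    "max_budget": 50,
    "budget_duration": "30d",
    "rpm_limit": 60
  }'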

5.3 OpenClaw Gateway

Deployment

  • Image: ghcr.io/openclaw/openclaw:latest
  • Resources: 512Mi memory, 250m CPU
  • NodeSelector: kubernetes.io/arch: arm64
  • Service: ClusterIP (port 18789)
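
A condensed Deployment sketch matching the bullet points above; the replica count, config mount path, and ConfigMap name are assumptions rather than values from the upstream project (topologySpreadConstraints and the PodDisruptionBudget from the Spot operations section are omitted for brevity):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  replicas: 2                                  # assumed; pair with the PDB above
  selector:
    matchLabels:
      app: openclaw-gateway
  template:
    metadata:
      labels:
        app: openclaw-gateway
    spec:
      serviceAccountName: openclaw-sa          # Pod Identity association from 5.1
      terminationGracePeriodSeconds: 120
      nodeSelector:
        kubernetes.io/arch: arm64
      containers:
        - name: gateway
          image: ghcr.io/openclaw/openclaw:latest
          ports:
            - containerPort: 18789
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
          volumeMounts:
            - name: config
              mountPath: /app/config           # assumed mount path for openclaw.json
      volumes:
        - name: config
          configMap:
            name: openclaw-config              # illustrative ConfigMap holding openclaw.json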

Config (openclaw.json)

{
  "ai": {
    "provider": "openai",
    "baseUrl": "http://bifrost-proxy:4000",
    "model": "claude-sonnet"
  },
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "http://otel-collector:4317",
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "logs": true
    }
  }
}

5.4 Cilium CNI (ENI Mode) + Hubble

Cilium ENI Mode — Full VPC CNI replacement, single eBPF datapath

  • Pod IPs assigned directly from ENI
  • Full NetworkPolicy support
  • Optimal performance with a single datapath

Hubble UI capabilities:

  • Interactive service map: Visualize HTTP request flows between Pods
  • L7 HTTP flows: View POST /v1/chat/completions → 200 OK (320ms)
  • DNS query tracking: Real-time visibility into which external APIs are called
  • Hubble Grafana dashboard: Prometheus metrics integration
  • Cost: $0
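
A sketch of Helm values for the ENI mode + Hubble setup described above, assuming Cilium is installed as the cluster's primary CNI; the value names follow the upstream cilium chart and the metrics list is illustrative:

# values.yaml for the cilium Helm chart (ENI mode + Hubble)
eni:
  enabled: true                  # allocate Pod IPs directly from ENIs
ipam:
  mode: eni
routingMode: native              # no overlay tunnel, single eBPF datapath
egressMasqueradeInterfaces: eth0
hubble:
  relay:
    enabled: true
  ui:
    enabled: true
  metrics:
    enabled:
      - dns
      - drop
      - tcp
      - flow
      - http                     # L7 HTTP metrics for the Grafana dashboard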

5.5 LLM Observability (Langfuse)

  • Langfuse self-hosted Helm chart (PostgreSQL + Langfuse server)
  • Automatic integration via Bifrost success_callback: ["langfuse"]
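
The callback only works if the proxy Pod can reach Langfuse and authenticate. A sketch of the environment variables involved, assuming a project keypair created in the Langfuse UI; the Secret name and in-cluster service address are illustrative:

# Env vars on the Bifrost/LiteLLM proxy container for the langfuse callback
env:
  - name: LANGFUSE_HOST
    value: "http://langfuse-web:3000"          # assumed in-cluster Langfuse service
  - name: LANGFUSE_PUBLIC_KEY
    valueFrom:
      secretKeyRef:
        name: langfuse-keys                    # illustrative Secret
        key: public-key
  - name: LANGFUSE_SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: langfuse-keys
        key: secret-key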

Tracked items:

  • Prompt/completion content
  • Token usage
  • Cost per model
  • Tool call chains
  • Latency

5.6 System Observability (OTEL + Prometheus/Grafana)

  • OTEL Collector: Receivers OTLP (gRPC :4317), Exporters Prometheus
  • Reuse existing kube-prometheus-stack if available; otherwise deploy via Helm
  • OpenClaw metrics: openclaw.tokens, openclaw.cost.usd, openclaw.run.duration_ms, openclaw.message.*
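
A minimal collector configuration for the pipeline in the first bullet (OTLP gRPC receiver on :4317, Prometheus exporter), assuming an otel-collector contrib deployment; traces and logs pipelines would need their own exporters and are not shown:

# otel-collector config: receives OTLP from openclaw-gateway, exposes Prometheus metrics
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889       # scraped by the existing Prometheus stack
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]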

Dashboards & Alerts

Langfuse (LLM Level)

  • Agent Trace Explorer: Message → LLM call → tool execution → response chain
  • Token Usage: Token consumption by model and time
  • Cost Analytics: Daily/weekly cost trends
  • Prompt/Completion Inspector: View actual inputs and outputs

Hubble (Network Level)

  • Interactive Service Map: HTTP request flows between Pods
  • L7 Visibility: POST /v1/chat/completions → 200 OK (320ms)
  • DNS Query Tracking: Which external APIs are being called

Grafana (System Level)

  • Gateway Health: Uptime, connections, memory, CPU
  • Bifrost: Cache hit rate, request throughput, latency
  • Pod/Node Resource Usage

Alert Rules

| Alert | Condition |
|---|---|
| Budget Threshold Approaching | Bifrost budget > 80% |
| Gateway Down | Pod restart > 3 in 5min |
| LLM Response Delay | Latency > 5s |
| Cache Hit Rate Drop | Cache hit < 30% |
| Error Rate | Error rate > 5% |
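
With kube-prometheus-stack, these can be expressed as PrometheusRule resources. A sketch of two of the rules above: the restart rule uses the standard kube-state-metrics metric, while the latency metric name is an assumption based on LiteLLM's Prometheus exporter and should be adjusted to whatever the gateway actually exports:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: openclaw-gateway-alerts
  namespace: monitoring
spec:
  groups:
    - name: openclaw-gateway
      rules:
        - alert: OpenClawGatewayDown
          # Pod restart > 3 in 5 minutes (kube-state-metrics)
          expr: increase(kube_pod_container_status_restarts_total{namespace="openclaw", pod=~"openclaw-gateway.*"}[5m]) > 3
          labels:
            severity: critical
          annotations:
            summary: "openclaw-gateway restarted more than 3 times in 5 minutes"
        - alert: LLMResponseDelay
          # p95 latency > 5s; metric name is assumed, adjust to the gateway's exported histogram
          expr: histogram_quantile(0.95, sum by (le) (rate(litellm_request_total_latency_metric_bucket[5m]))) > 5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "p95 LLM response latency above 5s"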

Verification Checklist

| # | Check | Command / Action |
|---|---|---|
| 1 | All Pods Running | kubectl get pods |
| 2 | Gateway Status | openclaw status (after port-forward) |
| 3 | Auto-Router Routing | Verify model list + routing in Bifrost UI |
| 4 | Service Map | Verify HTTP flows between Pods in Hubble UI |
| 5 | LLM Trace | Verify prompt → tool → response trace in Langfuse UI |
| 6 | System Metrics | Check Grafana dashboards |
| 7 | Bedrock Audit | Check CloudTrail logs |
| 8 | Routing Validation | Test Korean / coding / general queries individually |
| 9 | Spot Stability | Verify Karpenter node transition events with kubectl get events, confirm replacement node provisioning with kubectl get nodeclaims |
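
For item 8, routing can be exercised directly against the proxy's OpenAI-compatible endpoint after kubectl port-forward svc/bifrost-proxy 4000:4000 (assuming the Service name matches the baseUrl in openclaw.json). The routed model appears in the response's model field and in the Langfuse trace. The "auto" model alias below is a placeholder for whatever name the gateway exposes for its auto-router, and the virtual key is one issued in 5.2:

# Coding query: expected to route to glm-4.7
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer ${VIRTUAL_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Write code to reverse a linked list and fix this bug"}]
  }' | jq '.model'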

Next Steps
  • Adding self-hosted vLLM: See llm-d Distributed Inference for Bifrost → llm-d → vLLM hybrid setup
  • LiteLLM alternative: If Bifrost doesn't meet your requirements, LiteLLM can be used as a drop-in replacement (Python-based, same OpenAI-compatible API)
  • Adding vector search RAG: See Milvus Vector DB
  • Agent evaluation: Measure response quality with Ragas Evaluation