OpenClaw AI Agent Gateway Deployment & Full Observability
Overview
OpenClaw (226k+ GitHub stars) is a general-purpose AI agent framework that provides autonomous agent workflows leveraging various LLMs. AWS offers the aws-samples/sample-OpenClaw-on-AWS-with-Bedrock sample as an EC2 + CloudFormation quick-start guide — a simple setup connecting a single Bedrock model on a single instance. While sufficient for prototyping or personal use, enterprise environments require a different approach.
This document covers deploying OpenClaw on an existing EKS cluster, combining multi-model routing with 3-layer observability to build a production-ready architecture.
| Aspect | EC2 Standalone Deployment | EKS-Based Deployment (This Document) |
|---|---|---|
| Infrastructure | EC2 + CloudFormation, instance-level management | Add Pods to existing EKS cluster, Karpenter auto-scaling |
| LLM Integration | Bedrock single model | Bifrost Auto-Router → content-based Bedrock multi-model (Claude/GLM/Solar) |
| Observability | CloudWatch basic metrics | 3-Layer: Network (Hubble) + LLM (Langfuse) + System (OTEL/Prometheus) |
| Cost Control | Instance size adjustment | Graviton4 ARM + Spot + semantic caching + budget control |
| Scalability | Manual scaling | HPA/Karpenter auto-scaling, native Spot instance interruption handling |
Related Documents
| Document | Content | Relationship |
|---|---|---|
| Inference Gateway | Kgateway-based routing | Theoretical foundation |
| Agent Monitoring | Langfuse/LangSmith monitoring | Monitoring theory |
| 17. OpenClaw AI Gateway (this document) | OpenClaw production deployment + full o11y | Hands-on implementation |
Architecture Design
This architecture rests on six key design decisions that together deliver the cost efficiency, observability, and operational automation required in enterprise environments.
Key Design Decisions Summary
Each layer of this architecture is based on the following design decisions:
| Decision Area | Choice | Alternative | Key Rationale |
|---|---|---|---|
| Hosting Platform | EKS | EC2 standalone / AgentCore | Karpenter auto-scaling, o11y stack flexibility, Spot/Graviton combination possible. AgentCore is in Experimental stage with no cron support and limited o11y customization |
| LLM Gateway | Bifrost Proxy | LiteLLM / llm-d | Optimized for Bedrock multi-model architecture. Rust-based 50x faster performance, 100+ providers, budget control, one-line success_callback: ["langfuse"] integration. Hybrid Bifrost → llm-d → vLLM possible when adding self-hosted vLLM. LiteLLM is a viable alternative |
| LLM Observability | Langfuse (self-hosted) | Tempo / Loki | LLM-native: token usage, cost, tool call chains, prompt/completion content tracking. Tempo/Loki are general-purpose infra o11y tools that cannot track at the prompt level |
| Network Observability | Cilium Hubble (ENI mode) | CW Network Flow Monitor | L3/L4/L7 visibility (HTTP paths, status codes, DNS), interactive service map, $0. CW NFM supports L3/L4 only at $20-45/month |
| IAM Authentication | EKS Pod Identity | IRSA | No OIDC provider required, single command aws eks create-pod-identity-association mapping |
| Compute | Graviton4 M8g + Spot | x86 On-Demand | ARM64 20-40% cheaper, additional Spot savings, Karpenter native interruption handling |
Technology Stack
Compute
OpenClaw (TypeScript/Node.js) and Bifrost (Rust) are fully ARM64 compatible and use multi-arch images.
| Item | Configuration | Details |
|---|---|---|
| Instance | Graviton4 M8g | GA, 20-40% cost reduction vs x86, 60% better energy efficiency |
| Future Transition | Graviton5 M9g | 25% performance improvement over M8g, expected GA in 2026. Same cost with additional performance after GA transition |
| Purchase Option | Spot Instance preferred | 60-90% savings vs On-Demand. Karpenter v1.2+ native interruption handling + NMA node health monitoring |
Stable Spot Instance Operations
The OpenClaw gateway is a stateless workload, making it well suited to Spot Instances. Three layers are combined for stable operation:
| Layer | Tool | Role |
|---|---|---|
| Spot Interruption Handling | Karpenter v1.2+ | 2-minute warning detection → replacement node provisioning → Pod rescheduling |
| Node Health Monitoring | NMA (EKS Add-on) | Kernel/containerd/disk/network anomaly detection → Node Condition update |
| Auto Recovery | Node Auto Repair | Automatic replacement of unhealthy nodes reported by NMA |
Karpenter NodePool, EC2NodeClass, and NMA configurations are detailed in 5.1 Infrastructure.
Pod Configuration — graceful shutdown + AZ distribution:
spec:
terminationGracePeriodSeconds: 120 # Wait for in-flight LLM responses to complete
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: openclaw-gateway
terminationGracePeriodSeconds: 120 ensures in-flight LLM responses have time to complete during Spot interruptions, while topologySpreadConstraints distributes Pods across AZs to guard against single-AZ failures.
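On top of the grace period and AZ spread, a PodDisruptionBudget keeps at least one replica serving while Karpenter consolidates or replaces Spot nodes, complementing the NodePool disruption budget shown in 5.1. A minimal sketch, assuming the gateway runs two or more replicas:
# PodDisruptionBudget: keep at least one gateway Pod during voluntary disruptions
# such as Karpenter consolidation or node drains (sketch; assumes >= 2 replicas)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: openclaw-gateway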
LLM Models — Content-Based Routing
| Query Type | Model | Provider | Rationale |
|---|---|---|---|
| General (default) | Claude Sonnet 4.6 | Bedrock | 1M context, best agent performance |
| Coding / Programming | GLM-4.7 | Bedrock | 102B MoE, optimized for code generation |
| Korean / Korea-related | Solar Pro 3 | Bedrock | 128K context, Korean-optimized, MoE 12B active |
Networking
| Component | Technology |
|---|---|
| CNI | Cilium (ENI mode) |
| Service Map | Hubble UI + Grafana |
| Bedrock Connection | VPC Endpoint (com.amazonaws.<region>.bedrock-runtime) |
Observability (3-Layer)
| Layer | Tool | Tracks |
|---|---|---|
| Network | Cilium Hubble | L7 HTTP flows, DNS, service map |
| LLM | Langfuse | Prompts/completions, token cost, tool call chains |
| System | OTEL → Prometheus/Grafana | CPU, memory, Pod health, custom metrics |
Cost Analysis
| Item | Estimated Monthly Cost |
|---|---|
| Gateway + Bifrost (ARM) | ~$30 |
| Bedrock API (Claude/GLM/Solar) | ~$15-40 |
| Langfuse (self-hosted) | ~$10 |
| Redis (cache) | ~$5 |
| Cilium + Hubble | $0 |
| Prometheus/Grafana | $0 (existing stack) |
| Total | $60-85/month |
Cost Optimization Strategies
| Strategy | Detail | Savings |
|---|---|---|
| Graviton4 ARM | vs x86 | 20-40% |
| Spot Instance | vs On-Demand, Karpenter native interruption handling | 60-90% (compute) |
| Auto-Router | Cost/quality optimization via specialized model routing | Optimized per model |
| Semantic Caching | Redis-based, caching identical/similar requests | ~90% for repeated requests |
| Budget Control | Monthly budget per virtual key, rate limiting | Overspend prevention |
| VPC Endpoint | Eliminates Bedrock NAT Gateway costs | Data transfer cost reduction |
Deployment Guide
5.1 Infrastructure
EKS Cluster Prerequisites
| Requirement | Details |
|---|---|
| EKS Version | 1.30+ |
| Karpenter | v1.0+ (includes native Spot interruption handling) |
| EKS Add-ons | Pod Identity Agent, EKS Node Monitoring Agent |
| VPC | Bedrock VPC Endpoint (com.amazonaws.<region>.bedrock-runtime) |
| CNI | Cilium ENI mode (if Hubble L7 visibility is needed) or VPC CNI |
EKS Auto Mode is recommended for new clusters. For existing clusters, just ensure the above requirements are met.
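If the Bedrock VPC endpoint listed above does not exist yet, it can be provisioned with a small CloudFormation fragment such as the following (a sketch; VpcId, PrivateSubnetIds, and EndpointSecurityGroup are placeholder parameters for your environment):
# CloudFormation fragment: interface endpoint for bedrock-runtime (sketch)
Resources:
  BedrockRuntimeEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Interface
      ServiceName: !Sub "com.amazonaws.${AWS::Region}.bedrock-runtime"
      VpcId: !Ref VpcId                      # the VPC hosting the EKS cluster
      SubnetIds: !Ref PrivateSubnetIds       # private subnets where worker nodes run
      SecurityGroupIds:
        - !Ref EndpointSecurityGroup         # must allow 443 from node/Pod CIDRs
      PrivateDnsEnabled: true                # resolve the default Bedrock endpoint to the VPC endpoint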
Karpenter — Cost-Optimized Node Configuration
Graviton4 M8g Spot-first configuration to minimize compute costs.
# EC2NodeClass — Graviton4 ARM64 node configuration
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
name: openclaw-arm64
spec:
role: "KarpenterNodeRole-${CLUSTER_NAME}"
amiSelectorTerms:
- alias: al2023@latest # Amazon Linux 2023, automatically selects ARM64
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: "${CLUSTER_NAME}"
securityGroupSelectorTerms:
- tags:
karpenter.sh/discovery: "${CLUSTER_NAME}"
blockDeviceMappings:
- deviceName: /dev/xvda
ebs:
volumeSize: 30Gi
volumeType: gp3
deleteOnTermination: true
# NodePool — All Graviton gen 4+, Spot first, On-Demand fallback
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: openclaw-gateway
spec:
template:
metadata:
labels:
workload-type: ai-gateway
spec:
requirements:
- key: kubernetes.io/arch
operator: In
values: ["arm64"]
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"] # Spot preferred when available
- key: karpenter.k8s.aws/instance-generation
operator: Gt
values: ["3"] # Gen 4+ → includes m8g, c8g, r8g, m9g, etc.
- key: karpenter.k8s.aws/instance-size
operator: In
values: ["medium", "large"]
nodeClassRef:
group: karpenter.k8s.aws
kind: EC2NodeClass
name: openclaw-arm64
disruption:
consolidationPolicy: WhenEmptyOrUnderutilized
budgets:
- nodes: "1" # Max 1 node evicted at a time → ensures availability
limits:
cpu: "8" # Max 8 vCPU — cost ceiling
memory: 16Gi
By specifying instance-generation: Gt "3" + arch: arm64 instead of a specific instance-family, all families of Graviton gen 4+ (m8g, c8g, r8g, m9g, etc.) are included as candidates. When new Graviton generations become GA, Karpenter will automatically select the optimal instance based on price/performance without any NodePool changes.
Node Monitoring Agent — Node Health Monitoring
Karpenter handles Spot interruption events, but it does not detect system-level node issues such as kernel problems, containerd failures, or disk and network anomalies. Enabling the EKS Node Monitoring Agent (NMA) as an EKS Add-on covers this gap.
| Detection Area | Karpenter | NMA |
|---|---|---|
| 2-min Spot interruption warning | Detects + provisions replacement node | - |
| Kernel/containerd failures | - | Detects → updates Node Condition |
| Disk/network anomalies | - | Detects → creates Kubernetes Event |
| Node Auto Repair integration | - | Triggers automatic replacement of unhealthy nodes |
# Enable NMA EKS Add-on
aws eks create-addon \
--cluster-name ${CLUSTER_NAME} \
--addon-name eks-node-monitoring-agent
By combining Karpenter (Spot interruption handling) + NMA (node health monitoring) + Node Auto Repair (automatic replacement), high availability can be maintained even in Spot instance environments.
IAM — EKS Pod Identity
# After enabling Pod Identity Agent add-on:
aws eks create-pod-identity-association \
--cluster-name ${CLUSTER_NAME} \
--namespace openclaw \
--service-account openclaw-sa \
--role-arn arn:aws:iam::${ACCOUNT_ID}:role/openclaw-bedrock-role
Required IAM Role permissions:
- bedrock:InvokeModel
- bedrock:InvokeModelWithResponseStream
OpenClaw and Bifrost share the same ServiceAccount.
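The shared ServiceAccount itself stays plain: unlike IRSA, Pod Identity needs no eks.amazonaws.com/role-arn annotation, because the association created above does the mapping. A minimal sketch matching the names used in that command:
# ServiceAccount shared by the OpenClaw and Bifrost Pods; no IRSA annotation required,
# the Pod Identity association maps it to the Bedrock role
apiVersion: v1
kind: ServiceAccount
metadata:
  name: openclaw-sa
  namespace: openclaw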
5.2 Bifrost AI Gateway
Config — Multi-Model + Auto-Router
model_list:
- model_name: claude-sonnet
litellm_params:
model: bedrock/anthropic.claude-sonnet-4-6
- model_name: glm-4.7
litellm_params:
model: bedrock/zhipu.glm-4.7
- model_name: solar-pro-3
litellm_params:
model: bedrock/upstage.solar-pro-3
router_settings:
routing_strategy: "content-based"
auto_router:
encoder_type: openai
encoder_name: text-embedding-3-small
routes:
- name: korean-queries
model: solar-pro-3
utterances:
- "Answer in Korean"
- "Korea-related question"
- "Korean language query"
- "Korean news"
- "Korean translation"
description: "Korean language or Korea-related queries"
score_threshold: 0.5
- name: coding-queries
model: glm-4.7
utterances:
- "write code"
- "debug this function"
- "Write code for me"
- "programming"
- "fix this bug"
description: "Coding, programming, and debugging queries"
score_threshold: 0.5
default_route: claude-sonnet
litellm_settings:
cache: true
cache_params:
type: redis
host: redis
port: 6379
success_callback: ["langfuse"]
failure_callback: ["langfuse"]
general_settings:
  master_key: os.environ/BIFROST_MASTER_KEY
Create Secrets
kubectl create secret generic bifrost-secrets \
--from-literal=BIFROST_MASTER_KEY=<your-master-key>
Since all models are invoked through Bedrock, no separate API keys are needed. IAM authentication is automatically handled via EKS Pod Identity.
- Redis sidecar: Auto-Router embedding cache + semantic cache
- Service: ClusterIP (port 4000, OpenAI-compatible API)
The above configuration is a practical example using LiteLLM cache: true + Redis. For overall design principles including similarity threshold selection, cache key design (multi-tenant namespace), PII-safe handling, and observability metrics, refer to Semantic Caching Strategy.
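For similarity-based rather than exact-match caching, LiteLLM also documents a Redis semantic cache type. A sketch of what that variant of the litellm_settings block could look like; verify the parameter names against the LiteLLM caching docs for the version you deploy, and treat the threshold as a starting point to tune:
# Semantic cache variant (sketch; verify parameter names against your LiteLLM version)
litellm_settings:
  cache: true
  cache_params:
    type: redis-semantic
    host: redis
    port: 6379
    similarity_threshold: 0.8          # how close a new prompt must be to reuse a cached response
    redis_semantic_cache_embedding_model: text-embedding-3-small   # reuses the Auto-Router embedding model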
5.3 OpenClaw Gateway
Deployment
- Image: ghcr.io/openclaw/openclaw:latest
- Resources: 512Mi memory, 250m CPU
- NodeSelector: kubernetes.io/arch: arm64
- Service: ClusterIP (port 18789)
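A minimal Deployment and Service sketch that puts these settings together (the replica count and omission of probes are illustrative; adjust for your environment):
# OpenClaw gateway Deployment + Service sketch (illustrative values)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  replicas: 2
  selector:
    matchLabels:
      app: openclaw-gateway
  template:
    metadata:
      labels:
        app: openclaw-gateway
    spec:
      serviceAccountName: openclaw-sa        # Pod Identity association from 5.1
      terminationGracePeriodSeconds: 120     # wait for in-flight LLM responses
      nodeSelector:
        kubernetes.io/arch: arm64
      containers:
        - name: openclaw
          image: ghcr.io/openclaw/openclaw:latest
          ports:
            - containerPort: 18789
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  type: ClusterIP
  selector:
    app: openclaw-gateway
  ports:
    - port: 18789
      targetPort: 18789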
Config (openclaw.json)
{
"ai": {
"provider": "openai",
"baseUrl": "http://bifrost-proxy:4000",
"model": "claude-sonnet"
},
"diagnostics": {
"enabled": true,
"otel": {
"enabled": true,
"endpoint": "http://otel-collector:4317",
"serviceName": "openclaw-gateway",
"traces": true,
"metrics": true,
"logs": true
}
}
}
5.4 Cilium CNI (ENI Mode) + Hubble
Cilium ENI Mode — Full VPC CNI replacement, single eBPF datapath
- Pod IPs assigned directly from ENI
- Full NetworkPolicy support
- Optimal performance with a single datapath
Hubble UI capabilities:
- Interactive service map: Visualize HTTP request flows between Pods
- L7 HTTP flows: View POST /v1/chat/completions → 200 OK (320ms)
- DNS query tracking: Real-time visibility into which external APIs are called
- Hubble Grafana dashboard: Prometheus metrics integration
- Cost: $0
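A sketch of the Helm values behind this setup (key names follow the upstream cilium chart; double-check them against the chart version you install):
# Cilium Helm values sketch: ENI IPAM plus Hubble relay/UI and L7/DNS metrics
eni:
  enabled: true              # allocate Pod IPs directly from ENIs
ipam:
  mode: eni
routingMode: native          # no overlay tunnel in ENI mode
egressMasqueradeInterfaces: eth+
hubble:
  enabled: true
  relay:
    enabled: true
  ui:
    enabled: true
  metrics:
    enabled:
      - dns
      - drop
      - tcp
      - flow
      - http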
5.5 LLM Observability (Langfuse)
- Langfuse self-hosted Helm chart (PostgreSQL + Langfuse server)
- Automatic integration via Bifrost's success_callback: ["langfuse"] (credentials sketch below)
Tracked items:
- Prompt/completion content
- Token usage
- Cost per model
- Tool call chains
- Latency
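The callback authenticates with the standard Langfuse environment variables. A sketch of providing them via a Secret (the LANGFUSE_* names follow the Langfuse SDK; the host assumes the self-hosted chart's web service name, and the key values are placeholders):
# Langfuse credentials sketch: mounted into the Bifrost proxy via envFrom
apiVersion: v1
kind: Secret
metadata:
  name: langfuse-credentials
  namespace: openclaw
stringData:
  LANGFUSE_PUBLIC_KEY: pk-lf-...            # placeholder
  LANGFUSE_SECRET_KEY: sk-lf-...            # placeholder
  LANGFUSE_HOST: http://langfuse-web:3000   # assumed service name of the self-hosted chart
Reference the Secret from the proxy container with envFrom so the callback picks the variables up at startup.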
5.6 System Observability (OTEL + Prometheus/Grafana)
- OTEL Collector: receives OTLP (gRPC :4317) and exports to Prometheus (config sketch below)
- Reuse existing kube-prometheus-stack if available; otherwise deploy via Helm
- OpenClaw metrics: openclaw.tokens, openclaw.cost.usd, openclaw.run.duration_ms, openclaw.message.*
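A minimal Collector configuration for this pipeline (a sketch: only the metrics pipeline is shown and the Prometheus exporter port is illustrative; add trace and log pipelines with suitable exporters if those signals stay enabled in openclaw.json):
# otel-collector config sketch: OTLP in, Prometheus scrape endpoint out
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889     # scraped by the existing Prometheus
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]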
Dashboards & Alerts
Langfuse (LLM Level)
- Agent Trace Explorer: Message → LLM call → tool execution → response chain
- Token Usage: Token consumption by model and time
- Cost Analytics: Daily/weekly cost trends
- Prompt/Completion Inspector: View actual inputs and outputs
Hubble (Network Level)
- Interactive Service Map: HTTP request flows between Pods
- L7 Visibility: POST /v1/chat/completions → 200 OK (320ms)
- DNS Query Tracking: Which external APIs are being called
Grafana (System Level)
- Gateway Health: Uptime, connections, memory, CPU
- Bifrost: Cache hit rate, request throughput, latency
- Pod/Node Resource Usage
Alert Rules
| Alert | Condition |
|---|---|
| Budget Threshold Approaching | Bifrost budget > 80% |
| Gateway Down | Pod restart > 3 in 5min |
| LLM Response Delay | Latency > 5s |
| Cache Hit Rate Drop | Cache hit < 30% |
| Error Rate | Error rate > 5% |
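With the existing kube-prometheus-stack, alerts like these can be declared as a PrometheusRule. A sketch covering the Gateway Down rule; the budget, latency, cache-hit, and error-rate alerts depend on which Bifrost/LLM metrics you actually export, so their expressions are left out rather than guessed:
# PrometheusRule sketch: Gateway Down alert from the table above
# (kube_pod_container_status_restarts_total comes from kube-state-metrics)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: openclaw-gateway-alerts
  namespace: monitoring                # adjust to where your Prometheus picks up rules
  labels:
    release: kube-prometheus-stack     # match your Prometheus ruleSelector labels
spec:
  groups:
    - name: openclaw-gateway
      rules:
        - alert: OpenClawGatewayDown
          expr: increase(kube_pod_container_status_restarts_total{namespace="openclaw", pod=~"openclaw-gateway.*"}[5m]) > 3
          labels:
            severity: critical
          annotations:
            summary: "OpenClaw gateway Pod restarted more than 3 times in 5 minutes"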
Verification Checklist
| # | Check | Command / Action |
|---|---|---|
| 1 | All Pods Running | kubectl get pods |
| 2 | Gateway Status | openclaw status (after port-forward) |
| 3 | Auto-Router Routing | Verify model list + routing in Bifrost UI |
| 4 | Service Map | Verify HTTP flows between Pods in Hubble UI |
| 5 | LLM Trace | Verify prompt → tool → response trace in Langfuse UI |
| 6 | System Metrics | Check Grafana dashboards |
| 7 | Bedrock Audit | Check CloudTrail logs |
| 8 | Routing Validation | Test Korean / coding / general queries individually |
| 9 | Spot Stability | Verify Karpenter node transition events with kubectl get events, confirm replacement node provisioning with kubectl get nodeclaims |
- Adding self-hosted vLLM: See llm-d Distributed Inference for a Bifrost → llm-d → vLLM hybrid setup
- LiteLLM alternative: If Bifrost doesn't meet your requirements, LiteLLM can be used as a drop-in replacement (Python-based, same OpenAI-compatible API)
- Adding vector search RAG: See Milvus Vector DB
- Agent evaluation: Measure response quality with Ragas Evaluation