Kagent - Kubernetes AI Agent Management
In a multi-model ecosystem, AI agents must call multiple LLMs/SLMs, connect to tools and other agents via the MCP/A2A protocols, and scale dynamically with traffic. Kubernetes' Operator pattern is a natural fit for declaratively defining such agents as CRDs and automatically managing their lifecycles. Kagent is a reference architecture that applies this pattern to AI agents.
1. Overview
Kagent declaratively defines agents, tools, and workflows through Custom Resource Definitions (CRDs), with an Operator automatically deploying and managing them. Instead of manually writing Deployments, Services, and ConfigMaps, a single Agent CRD integrates model connections, tool bindings, and scaling policies.
Kagent is currently in the reference architecture and design pattern stage, and an official open-source project has not yet been released. Examples in this document are based on conceptual implementations. For production environments, consider validated alternatives such as Bedrock AgentCore, KubeAI, or LangGraph Platform.
Alternative Solutions Comparison
| Solution | Features | Suitable Use Cases |
|---|---|---|
| Kagent (Reference) | AI agent-specific CRDs, workflow orchestration | Multi-agent systems, complex workflows |
| KubeAI | Lightweight LLM serving, OpenAI-compatible API | Simple model serving, rapid prototyping |
| Bedrock AgentCore | AWS-managed agent runtime, MCP/A2A native, auto-scaling | AWS-native agent deployment, managed infrastructure preferred |
| LangGraph Platform | Agent workflow framework, state management, native LangSmith integration | Complex multi-step agents, stateful workflows |
Key Features
- Declarative agent management: YAML-based agent definition and deployment
- Tool registry: Central management of tools available to agents via CRDs
- Auto-scaling: Dynamic scaling through HPA/KEDA integration
- Multi-agent orchestration: Inter-agent collaboration for complex workflows
- Observability integration: Native integration with Langfuse/LangSmith, OpenTelemetry
This document is intended for Kubernetes administrators, platform engineers, and MLOps engineers. Understanding of basic Kubernetes concepts (Pod, Deployment, CRD) is required.
Related re:Invent session — CNS421: Streamline Amazon EKS Operations with Agentic AI, a code talk covering automated EKS cluster management, real-time issue diagnosis, and automatic recovery using AI agents such as Kagent.
Key Topics:
- Model Context Protocol (MCP): Standard protocol for AI agent integration with AWS services
- Automated Incident Response: Automatic diagnosis and recovery for Pod failures, resource shortages, network issues
- AWS Service Integration: Native integration with CloudWatch, Systems Manager, EKS API
2. Kagent Architecture
Kagent follows the Kubernetes Operator pattern, consisting of Controller, CRDs, and Webhooks.
Component Description
| Component | Role | Description |
|---|---|---|
| Kagent Controller | Reconciliation loop | Detect CRD changes and reconcile resources to desired state |
| Admission Webhook | Validation/Mutation | Validate and set defaults on CRD creation/modification |
| Metrics Server | Metrics collection | Expose agent state and performance metrics |
| Agent CRD | Agent definition | Spec, model, and tool configuration for AI agents |
| Tool CRD | Tool definition | Define tools (API, search, etc.) for agent use |
| Workflow CRD | Workflow definition | Define multi-agent collaboration workflows |
Component Interaction
Prerequisites
- Kubernetes cluster (v1.25+)
- kubectl CLI tool
- Helm v3 (for Helm installation)
- cert-manager (Webhook TLS certificate management)
3. CRD Structure
Agent CRD
The Agent CRD declaratively defines all settings for an AI agent. Below is the core spec structure:
```yaml
apiVersion: kagent.dev/v1alpha1
kind: Agent
metadata:
  name: customer-support-agent
  namespace: ai-agents
spec:
  # Agent basic info
  displayName: "Customer Support Agent"
  description: "AI agent that responds to customer inquiries and creates tickets"

  # Model configuration
  model:
    provider: openai            # openai, anthropic, bedrock, vllm
    name: gpt-4-turbo
    endpoint: ""                # Custom endpoint (vLLM, etc.)
    temperature: 0.7
    maxTokens: 4096
    apiKeySecretRef:
      name: openai-api-key
      key: api-key

  # System prompt
  systemPrompt: |
    You are a friendly and professional customer support agent.

  # Tool list
  tools:
    - name: search-knowledge-base
    - name: create-ticket

  # Memory configuration
  memory:
    type: redis
    config:
      host: redis-master.ai-data.svc.cluster.local
      ttl: 3600
      maxHistory: 50

  # Scaling configuration
  scaling:
    minReplicas: 2
    maxReplicas: 10
    metrics:
      - type: cpu
        target:
          averageUtilization: 70
    keda:
      enabled: true
      triggers:
        - type: prometheus
          metadata:
            metricName: agent_active_sessions
            threshold: "50"

  # Resource limits
  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "1Gi"
      cpu: "500m"

  # Observability configuration
  observability:
    tracing:
      enabled: true
      provider: langfuse        # langfuse, langsmith, cloudwatch (details: ../operations-mlops/observability/llmops-observability.md)
    metrics:
      enabled: true
      port: 9090
```
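The `apiKeySecretRef` field above points at a standard Kubernetes Secret rather than embedding the key in the CRD. A minimal sketch of that Secret, with the name and key matching the Agent spec (the key value is a placeholder):

```yaml
# Secret referenced by spec.model.apiKeySecretRef in the Agent above.
apiVersion: v1
kind: Secret
metadata:
  name: openai-api-key
  namespace: ai-agents
type: Opaque
stringData:
  api-key: "<your-openai-api-key>"   # stringData avoids manual base64 encoding
```

Keeping credentials in a Secret lets the Agent CRD itself stay safe to commit to Git for GitOps workflows.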
Tool CRD
The Tool CRD defines tools available to agents. Tool types include api, retrieval, code, and human.
Key fields:
| Field | Description | Example |
|---|---|---|
| spec.type | Tool type | retrieval, api, code, human |
| spec.description | Description the LLM reads for tool selection | "Search documents in knowledge base" |
| spec.retrieval | Vector store connection config | Milvus, Pinecone, etc. |
| spec.api | REST API call config | Endpoint, auth, timeout |
| spec.parameters | Input parameter schema | name, type, required, enum |
| spec.output | Output schema | JSON Schema format |
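Since Kagent is a reference architecture, the exact schema is illustrative; the fields in the table above could map to a resource like the following sketch (same `kagent.dev/v1alpha1` group as the Agent example, with vector store and parameter details assumed):

```yaml
apiVersion: kagent.dev/v1alpha1
kind: Tool
metadata:
  name: search-knowledge-base
  namespace: ai-agents
spec:
  type: retrieval
  description: "Search documents in the knowledge base"   # read by the LLM for tool selection
  retrieval:
    vectorStore: milvus
    endpoint: milvus.ai-data.svc.cluster.local:19530
    collection: support-docs
    topK: 5
  parameters:
    - name: query
      type: string
      required: true
      description: "Natural-language search query"
  output:
    type: object          # JSON Schema-style output contract
    properties:
      documents:
        type: array
```

The `description` field matters more than it looks: it is the text the LLM uses to decide when to call the tool, so it should state what the tool does and when to use it.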
Memory CRD
Memory configuration for storing agent conversation context and state.
Key features:
| Feature | Description |
|---|---|
| Session memory | Redis/PostgreSQL-based short-term conversation history (TTL config) |
| Conversation compression | LLM-based conversation summarization when threshold exceeded |
| Long-term memory | Vector store-based agent experience accumulation |
| Memory types | redis, postgres, in-memory |
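Combining the features in the table, a Memory resource might look like the following sketch (field names beyond the table — `compression`, `longTerm` — are illustrative assumptions):

```yaml
apiVersion: kagent.dev/v1alpha1
kind: Memory
metadata:
  name: support-agent-memory
  namespace: ai-agents
spec:
  type: postgres                  # redis | postgres | in-memory
  config:
    host: postgres.ai-data.svc.cluster.local
    database: agent_memory
    ttl: 86400                    # session memory retention (seconds)
  compression:
    enabled: true
    maxMessages: 40               # LLM-summarize history past this threshold
  longTerm:
    enabled: true
    vectorStore: milvus           # accumulate agent experience as embeddings
```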
Workflow CRD
Define multi-agent workflows using the Workflow CRD.
Core structure:
| Field | Description |
|---|---|
| spec.input | Workflow input parameter definition |
| spec.steps | Per-step agent execution definition (sequential/parallel) |
| spec.steps[].dependsOn | Dependency step specification (DAG construction) |
| spec.steps[].parallel | Parallel execution flag |
| spec.output | Workflow final output mapping |
| spec.errorHandling | Behavior on step/workflow failure |
| spec.timeout | Overall workflow timeout |
| spec.concurrency | Concurrent execution limit (queue/reject/replace) |
4. Multi-Agent Orchestration
Define workflows where multiple agents collaborate to handle complex tasks.
Inter-Agent Communication Patterns
Orchestration Patterns
| Pattern | Description | Suitable For |
|---|---|---|
| Sequential Pipeline | Step-by-step sequential execution, previous step output feeds next input | Data processing, ETL |
| Parallel Fan-out | Same input sent to multiple agents in parallel | Multi-angle analysis, A/B comparison |
| DAG Workflow | Dependency-based directed acyclic graph execution | Complex research, report generation |
| Loop | Repeated execution until condition is met | Review-revision cycles, quality verification |
| Routing | Branch to different agents based on input content | Inquiry classification, domain distribution |
Workflow Example: Research Report
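An illustrative sketch of such a workflow using the DAG fields from the Workflow CRD section — two research agents fan out in parallel, then an analysis step joins their results before a writer produces the report (agent names and templating syntax are assumptions):

```yaml
apiVersion: kagent.dev/v1alpha1
kind: Workflow
metadata:
  name: research-report
  namespace: ai-agents
spec:
  input:
    - name: topic
      type: string
  steps:
    - name: web-research
      agent: web-research-agent
    - name: paper-research
      agent: paper-research-agent
      parallel: true                              # fan-out: runs alongside web-research
    - name: analysis
      agent: analysis-agent
      dependsOn: [web-research, paper-research]   # DAG join
    - name: write-report
      agent: writer-agent
      dependsOn: [analysis]
  output:
    report: "{{ steps.write-report.output }}"
  errorHandling:
    retryLimit: 2
    onFailure: abort
  timeout: 30m
  concurrency:
    limit: 5
    policy: queue                                 # queue | reject | replace
```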
Workflow execution status is tracked via WorkflowRun CRD:
| Status | Description |
|---|---|
| Pending | Waiting for execution |
| Running | One or more steps are running |
| Succeeded | All steps completed successfully |
| Failed | One or more steps failed (retries exhausted) |
5. Agent Lifecycle Management
Operator-Managed Resources
When an Agent CRD is created, the Controller automatically creates/manages the following resources:
Agent CRD Creation
├── Deployment (agent Pod management)
├── Service (network access)
├── HPA/KEDA ScaledObject (autoscaling)
├── ConfigMap (agent configuration)
└── Secret reference (API keys, credentials)
Update Strategies
| Strategy | Description | Recommended Scenario |
|---|---|---|
| Rolling Update | Default strategy. Progressively replace Pods | General config changes |
| Canary Deployment | Test new version with separate Agent CRD | Model changes, major prompt revisions |
| Blue-Green | Run two versions simultaneously, switch traffic | Zero-downtime migration |
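The canary strategy in the table can be sketched as a second, small Agent CRD running the candidate model alongside the stable one; traffic weighting itself would come from a gateway or service mesh, which is outside the Agent spec (labels and field values here are illustrative):

```yaml
# Illustrative canary: a second Agent CRD running the new model at small scale.
apiVersion: kagent.dev/v1alpha1
kind: Agent
metadata:
  name: customer-support-agent-canary
  namespace: ai-agents
  labels:
    app: customer-support-agent
    track: canary                 # selector hook for gateway/mesh traffic splitting
spec:
  displayName: "Customer Support Agent (canary)"
  model:
    provider: openai
    name: gpt-4o                  # candidate model under test
  scaling:
    minReplicas: 1
    maxReplicas: 2                # keep the canary footprint small
```

Once the canary's error rate and latency match the stable version, the main Agent CRD is updated and the canary deleted.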
Scaling Strategies
| Metric | Description | Threshold Example |
|---|---|---|
| CPU utilization | Basic resource-based scaling | 70% |
| Memory utilization | Scale out on memory pressure | 80% |
| Active session count | KEDA + Prometheus custom metric | 50 sessions/Pod |
| Request throughput | Requests per second based | 100 RPS/Pod |
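For the custom-metric rows, the operator would render the Agent's `spec.scaling.keda` block into a standard KEDA `ScaledObject`. A sketch of what that rendered object could look like (the `agent_active_sessions` metric and Prometheus address are assumptions; the `keda.sh/v1alpha1` schema itself is standard KEDA):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: customer-support-agent
  namespace: ai-agents
spec:
  scaleTargetRef:
    name: customer-support-agent          # Deployment the operator created for the agent
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(agent_active_sessions{app="customer-support-agent"})
        threshold: "50"                   # scale out at 50 active sessions per Pod
```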
6. Observability Integration
Agent execution traces are sent to Langfuse, LangSmith, or CloudWatch Generative AI Observability. For a comparison of each tool, see LLMOps Observability Comparison.
Deployment guides:
- Langfuse: Architecture, Helm Deployment
- LangSmith: LangSmith Official Documentation
- CloudWatch: AWS Generative AI Observability
Key Alert Rules
| Alert | Condition | Severity |
|---|---|---|
| Agent error rate increase | Error rate > 5% (5 min sustained) | Critical |
| Agent response delay | P99 > 30s (5 min sustained) | Warning |
| Pod availability degradation | Ready Pods < 50% (5 min sustained) | Critical |
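With the Prometheus Operator installed, the first alert in the table can be expressed as a `PrometheusRule` (a real `monitoring.coreos.com/v1` resource); the `agent_requests_total` metric name is an assumption about what the agent exposes:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: agent-alerts
  namespace: ai-agents
spec:
  groups:
    - name: agent.rules
      rules:
        - alert: AgentErrorRateHigh
          expr: |
            sum(rate(agent_requests_total{status="error"}[5m]))
              / sum(rate(agent_requests_total[5m])) > 0.05
          for: 5m                          # sustained for 5 minutes
          labels:
            severity: critical
          annotations:
            summary: "Agent error rate above 5% for 5 minutes"
```

The latency and availability alerts follow the same shape with histogram-quantile and `kube_pod_status_ready` expressions.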
7. Conclusion
Using Kagent enables declarative management of AI agents in Kubernetes environments. Key benefits include:
- Declarative management: YAML-based agent definitions support GitOps workflows
- Automated operations: Automatic recovery and scaling through the Operator pattern
- Standardization: Agent definition standardization through CRDs
- Scalability: Leveraging Kubernetes-native scaling mechanisms
- Observability: Integrated monitoring and tracing support
Related Documents
- Agentic AI Platform Architecture - Overall platform design
- Agent Monitoring - Langfuse/LangSmith integration guide
- GPU Resource Management - Dynamic resource allocation
References
Official Documentation
- Kagent Concepts and Design Patterns
- KubeAI - Kubernetes AI Platform
- Bedrock AgentCore
- LangGraph Platform
- Kubernetes Operator Pattern
- KEDA Documentation
- re:Invent 2025 CNS421 - Streamline EKS Operations with Agentic AI