Kagent - Kubernetes AI Agent Management

In a multi-model ecosystem, AI agents must call multiple LLMs/SLMs, connect to tools and other agents via MCP/A2A protocols, and dynamically scale based on traffic. Kubernetes' Operator pattern is the most natural approach for declaratively defining such agents as CRDs and automatically managing their lifecycles. Kagent is a reference architecture that applies this pattern to AI agents.

1. Overview

Kagent declaratively defines agents, tools, and workflows through Custom Resource Definitions (CRDs), with an Operator automatically deploying and managing them. Instead of manually writing Deployments, Services, and ConfigMaps, a single Agent CRD integrates model connections, tool bindings, and scaling policies.

Kagent Project Status

Kagent is currently in the reference architecture and design pattern stage, and an official open-source project has not yet been released. Examples in this document are based on conceptual implementations. For production environments, consider validated alternatives such as Bedrock AgentCore, KubeAI, or LangGraph Platform.

If an official Kagent release becomes available, consult its documentation for deployment guides.

Alternative Solutions Comparison

| Solution | Features | Suitable Use Cases |
|---|---|---|
| Kagent (Reference) | AI agent-specific CRDs, workflow orchestration | Multi-agent systems, complex workflows |
| KubeAI | Lightweight LLM serving, OpenAI-compatible API | Simple model serving, rapid prototyping |
| Bedrock AgentCore | AWS-managed agent runtime, MCP/A2A native, auto-scaling | AWS-native agent deployment, managed infrastructure preferred |
| LangGraph Platform | Agent workflow framework, state management, native LangSmith integration | Complex multi-step agents, stateful workflows |

Key Features

  • Declarative agent management: YAML-based agent definition and deployment
  • Tool registry: Central management of tools available to agents via CRDs
  • Auto-scaling: Dynamic scaling through HPA/KEDA integration
  • Multi-agent orchestration: Inter-agent collaboration for complex workflows
  • Observability integration: Native integration with Langfuse/LangSmith, OpenTelemetry

Target Audience

This document is intended for Kubernetes administrators, platform engineers, and MLOps engineers. Understanding of basic Kubernetes concepts (Pod, Deployment, CRD) is required.

re:Invent 2025 Related Session

CNS421: Streamline Amazon EKS Operations with Agentic AI — A code talk session covering automated EKS cluster management, real-time issue diagnosis, and automatic recovery using AI agents like Kagent.

Key Topics:

  • Model Context Protocol (MCP): Standard protocol for AI agent integration with AWS services
  • Automated Incident Response: Automatic diagnosis and recovery for Pod failures, resource shortages, network issues
  • AWS Service Integration: Native integration with CloudWatch, Systems Manager, EKS API

Watch the session


2. Kagent Architecture

Kagent follows the Kubernetes Operator pattern, consisting of a Controller, CRDs, and Admission Webhooks.

Component Description

| Component | Role | Description |
|---|---|---|
| Kagent Controller | Reconciliation loop | Detects CRD changes and reconciles resources to the desired state |
| Admission Webhook | Validation/Mutation | Validates and sets defaults on CRD creation/modification |
| Metrics Server | Metrics collection | Exposes agent state and performance metrics |
| Agent CRD | Agent definition | Spec, model, and tool configuration for an AI agent |
| Tool CRD | Tool definition | Defines tools (API, search, etc.) for agent use |
| Workflow CRD | Workflow definition | Defines multi-agent collaboration workflows |

Component Interaction

The Controller watches Agent, Tool, and Workflow CRDs and reconciles the cluster toward the declared state; the Admission Webhook validates and defaults resources before they are persisted; and the Metrics Server exposes agent state for monitoring and autoscaling.

Prerequisites

  • Kubernetes cluster (v1.25+)
  • kubectl CLI tool
  • Helm v3 (for Helm installation)
  • cert-manager (Webhook TLS certificate management)

3. CRD Structure

Agent CRD

The Agent CRD declaratively defines all settings for an AI agent. Below is the core spec structure:

apiVersion: kagent.dev/v1alpha1
kind: Agent
metadata:
  name: customer-support-agent
  namespace: ai-agents
spec:
  # Agent basic info
  displayName: "Customer Support Agent"
  description: "AI agent that responds to customer inquiries and creates tickets"

  # Model configuration
  model:
    provider: openai   # openai, anthropic, bedrock, vllm
    name: gpt-4-turbo
    endpoint: ""       # Custom endpoint (vLLM, etc.)
    temperature: 0.7
    maxTokens: 4096
    apiKeySecretRef:
      name: openai-api-key
      key: api-key

  # System prompt
  systemPrompt: |
    You are a friendly and professional customer support agent.

  # Tool list
  tools:
    - name: search-knowledge-base
    - name: create-ticket

  # Memory configuration
  memory:
    type: redis
    config:
      host: redis-master.ai-data.svc.cluster.local
      ttl: 3600
      maxHistory: 50

  # Scaling configuration
  scaling:
    minReplicas: 2
    maxReplicas: 10
    metrics:
      - type: cpu
        target:
          averageUtilization: 70
    keda:
      enabled: true
      triggers:
        - type: prometheus
          metadata:
            metricName: agent_active_sessions
            threshold: "50"

  # Resource limits
  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "1Gi"
      cpu: "500m"

  # Observability configuration
  observability:
    tracing:
      enabled: true
      provider: langfuse   # langfuse, langsmith, cloudwatch (details: ../operations-mlops/observability/llmops-observability.md)
    metrics:
      enabled: true
      port: 9090

Tool CRD

The Tool CRD defines tools available to agents. Tool types include api, retrieval, code, and human.

Key fields:

| Field | Description | Example |
|---|---|---|
| spec.type | Tool type | retrieval, api, code, human |
| spec.description | Description the LLM references for tool selection | "Search documents in knowledge base" |
| spec.retrieval | Vector store connection config | Milvus, Pinecone, etc. |
| spec.api | REST API call config | Endpoint, auth, timeout |
| spec.parameters | Input parameter schema | name, type, required, enum |
| spec.output | Output schema | JSON Schema format |
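
Below is a conceptual sketch of the retrieval Tool that the Agent example above references. Since no official Kagent schema exists, the field names under spec.retrieval (vectorStore, collection, topK) are illustrative assumptions:

apiVersion: kagent.dev/v1alpha1
kind: Tool
metadata:
  name: search-knowledge-base
  namespace: ai-agents
spec:
  type: retrieval
  # The LLM reads this description when deciding whether to call the tool
  description: "Search documents in knowledge base"
  retrieval:
    vectorStore: milvus        # assumed field; Milvus, Pinecone, etc.
    collection: support-docs
    topK: 5
  parameters:
    - name: query
      type: string
      required: true
  output:
    type: object               # JSON Schema format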

Memory CRD

The Memory CRD defines how agent conversation context and state are stored.

Key features:

| Feature | Description |
|---|---|
| Session memory | Redis/PostgreSQL-based short-term conversation history (TTL configurable) |
| Conversation compression | LLM-based conversation summarization when a threshold is exceeded |
| Long-term memory | Vector store-based accumulation of agent experience |
| Memory types | redis, postgres, in-memory |
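
A conceptual Memory CRD sketch matching the features above; the compression block and its threshold field are illustrative assumptions:

apiVersion: kagent.dev/v1alpha1
kind: Memory
metadata:
  name: support-session-memory
  namespace: ai-agents
spec:
  type: redis                  # redis, postgres, in-memory
  session:
    host: redis-master.ai-data.svc.cluster.local
    ttl: 3600                  # short-term history expires after 1 hour
    maxHistory: 50
  compression:
    enabled: true              # assumed field: summarize via LLM past a threshold
    threshold: 40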

Workflow CRD

Define multi-agent workflows using the Workflow CRD.

Core structure:

| Field | Description |
|---|---|
| spec.input | Workflow input parameter definition |
| spec.steps | Per-step agent execution definition (sequential/parallel) |
| spec.steps[].dependsOn | Dependency steps (DAG construction) |
| spec.steps[].parallel | Parallel execution flag |
| spec.output | Workflow final output mapping |
| spec.errorHandling | Behavior on step/workflow failure |
| spec.timeout | Overall workflow timeout |
| spec.concurrency | Concurrent execution limit (queue/reject/replace) |
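
A minimal Workflow skeleton showing how these fields fit together; values and field spellings are illustrative, not an official schema:

apiVersion: kagent.dev/v1alpha1
kind: Workflow
metadata:
  name: two-step-example
  namespace: ai-agents
spec:
  input:
    - name: topic
      type: string
  steps:
    - name: analyze
      agent: analysis-agent
    - name: summarize
      agent: writer-agent
      dependsOn: [analyze]     # builds the DAG edge
  output:
    summary: "{{ steps.summarize.output }}"
  errorHandling:
    onStepFailure: retry       # assumed enum: retry before failing the run
    maxRetries: 2
  timeout: 600s
  concurrency:
    limit: 5
    policy: queue              # queue/reject/replace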

4. Multi-Agent Orchestration

Define workflows where multiple agents collaborate to handle complex tasks.

Inter-Agent Communication Patterns

As noted in the overview, agents call tools over MCP (Model Context Protocol) and exchange messages with other agents over A2A; the Workflow CRD composes these interactions into the orchestration patterns below.

Orchestration Patterns

| Pattern | Description | Suitable For |
|---|---|---|
| Sequential Pipeline | Step-by-step execution; each step's output feeds the next step's input | Data processing, ETL |
| Parallel Fan-out | Same input sent to multiple agents in parallel | Multi-angle analysis, A/B comparison |
| DAG Workflow | Dependency-based directed acyclic graph execution | Complex research, report generation |
| Loop | Repeated execution until a condition is met | Review-revision cycles, quality verification |
| Routing | Branch to different agents based on input content | Inquiry classification, domain distribution |

Workflow Example: Research Report
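
A hedged sketch of a research-report workflow using the DAG pattern above: two research agents fan out in parallel, a writer agent synthesizes their outputs once both complete, and a reviewer agent checks the result. Agent names are hypothetical:

apiVersion: kagent.dev/v1alpha1
kind: Workflow
metadata:
  name: research-report
  namespace: ai-agents
spec:
  input:
    - name: topic
      type: string
  steps:
    - name: web-research
      agent: web-research-agent      # hypothetical agent
      parallel: true
    - name: internal-research
      agent: knowledge-base-agent    # hypothetical agent
      parallel: true
    - name: synthesize
      agent: writer-agent
      dependsOn: [web-research, internal-research]
    - name: review
      agent: reviewer-agent
      dependsOn: [synthesize]
  output:
    report: "{{ steps.review.output }}"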

Workflow execution status is tracked via the WorkflowRun CRD:

| Status | Description |
|---|---|
| Pending | Waiting for execution |
| Running | One or more steps are running |
| Succeeded | All steps completed successfully |
| Failed | One or more steps failed (retries exhausted) |

5. Agent Lifecycle Management

Operator-Managed Resources

When an Agent CRD is created, the Controller automatically creates/manages the following resources:

Agent CRD Creation
├── Deployment (agent Pod management)
├── Service (network access)
├── HPA/KEDA ScaledObject (autoscaling)
├── ConfigMap (agent configuration)
└── Secret reference (API keys, credentials)

Update Strategies

| Strategy | Description | Recommended Scenario |
|---|---|---|
| Rolling Update | Default strategy; progressively replaces Pods | General config changes |
| Canary Deployment | Test the new version with a separate Agent CRD | Model changes, major prompt revisions |
| Blue-Green | Run two versions simultaneously, then switch traffic | Zero-downtime migration |
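
As a sketch of the canary strategy under the conceptual Agent CRD: deploy a second Agent with the new model at a small replica count, route a slice of traffic to it (e.g., via a weighted Service or service mesh), and promote once metrics hold. All values are illustrative:

apiVersion: kagent.dev/v1alpha1
kind: Agent
metadata:
  name: customer-support-agent-canary
  namespace: ai-agents
  labels:
    app: customer-support-agent
    track: canary                # selector used to split traffic
spec:
  displayName: "Customer Support Agent (Canary)"
  model:
    provider: openai
    name: gpt-4o                 # new model under test (illustrative)
    apiKeySecretRef:
      name: openai-api-key
      key: api-key
  scaling:
    minReplicas: 1               # keep the canary footprint small
    maxReplicas: 2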

Scaling Strategies

| Metric | Description | Threshold Example |
|---|---|---|
| CPU utilization | Basic resource-based scaling | 70% |
| Memory utilization | Scale out on memory pressure | 80% |
| Active session count | KEDA + Prometheus custom metric | 50 sessions/Pod |
| Request throughput | Requests-per-second based | 100 RPS/Pod |
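
For the custom-metric rows, the controller would plausibly render the Agent's keda block into a standard KEDA ScaledObject like the following. KEDA's API is real; the generated names, Prometheus address, and metric name are assumptions:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: customer-support-agent
  namespace: ai-agents
spec:
  scaleTargetRef:
    name: customer-support-agent   # Deployment created by the Operator
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090   # assumed address
        query: sum(agent_active_sessions{app="customer-support-agent"})
        threshold: "50"            # target 50 active sessions per Pod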

6. Observability Integration

Agent execution traces are sent to Langfuse, LangSmith, or CloudWatch Generative AI Observability. For a comparison of each tool, see LLMOps Observability Comparison.

Deployment guides:

Key Alert Rules

| Alert | Condition | Severity |
|---|---|---|
| Agent error rate increase | Error rate > 5% (sustained 5 min) | Critical |
| Agent response delay | P99 latency > 30 s (sustained 5 min) | Warning |
| Pod availability degradation | Ready Pods < 50% (sustained 5 min) | Critical |
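
The first rule could be expressed as a Prometheus Operator PrometheusRule; the agent_* metric names are hypothetical stand-ins for whatever the agent runtime exposes:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kagent-agent-alerts
  namespace: ai-agents
spec:
  groups:
    - name: agent.rules
      rules:
        - alert: AgentErrorRateHigh
          expr: |
            sum(rate(agent_request_errors_total[5m]))
              / sum(rate(agent_requests_total[5m])) > 0.05
          for: 5m                  # condition must hold for 5 minutes
          labels:
            severity: critical
          annotations:
            summary: "Agent error rate above 5% for 5 minutes"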

7. Conclusion

Using Kagent enables declarative management of AI agents in Kubernetes environments. Key benefits include:

  • Declarative management: YAML-based agent definitions support GitOps workflows
  • Automated operations: Automatic recovery and scaling through the Operator pattern
  • Standardization: Agent definition standardization through CRDs
  • Scalability: Leveraging Kubernetes-native scaling mechanisms
  • Observability: Integrated monitoring and tracing support