Kagent - Kubernetes AI Agent Management

In a multi-model ecosystem, AI agents must call multiple LLMs/SLMs, connect to tools and other agents via MCP/A2A protocols, and dynamically scale based on traffic. Kubernetes' Operator pattern is the most natural approach for declaratively defining such agents as CRDs and automatically managing their lifecycles. Kagent is a reference architecture that applies this pattern to AI agents.

1. Overview

Kagent declaratively defines agents, tools, and workflows through Custom Resource Definitions (CRDs), with an Operator automatically deploying and managing them. Instead of manually writing Deployments, Services, and ConfigMaps, a single Agent CRD integrates model connections, tool bindings, and scaling policies.

Kagent Project Status

Kagent is currently in the reference architecture and design pattern stage, and an official open-source project has not yet been released. Examples in this document are based on conceptual implementations. For production environments, consider validated alternatives such as Bedrock AgentCore, KubeAI, or LangGraph Platform.

If an official Kagent release becomes available, consult its documentation for deployment guides.

Alternative Solutions Comparison

| Solution | Features | Suitable Use Cases |
|---|---|---|
| Kagent (Reference) | AI agent-specific CRDs, workflow orchestration | Multi-agent systems, complex workflows |
| KubeAI | Lightweight LLM serving, OpenAI-compatible API | Simple model serving, rapid prototyping |
| Bedrock AgentCore | AWS-managed agent runtime, MCP/A2A native, auto-scaling | AWS-native agent deployment, managed infrastructure preferred |
| LangGraph Platform | Agent workflow framework, state management, native LangSmith integration | Complex multi-step agents, stateful workflows |

Key Features

  • Declarative agent management: YAML-based agent definition and deployment
  • Tool registry: Central management of tools available to agents via CRDs
  • Auto-scaling: Dynamic scaling through HPA/KEDA integration
  • Multi-agent orchestration: Inter-agent collaboration for complex workflows
  • Observability integration: Native integration with Langfuse/LangSmith, OpenTelemetry

Target Audience

This document is intended for Kubernetes administrators, platform engineers, and MLOps engineers. Understanding of basic Kubernetes concepts (Pod, Deployment, CRD) is required.

re:Invent 2025 Related Session

CNS421: Streamline Amazon EKS Operations with Agentic AI — A code talk session covering automated EKS cluster management, real-time issue diagnosis, and automatic recovery using AI agents like Kagent.

Key Topics:

  • Model Context Protocol (MCP): Standard protocol for AI agent integration with AWS services
  • Automated Incident Response: Automatic diagnosis and recovery for Pod failures, resource shortages, network issues
  • AWS Service Integration: Native integration with CloudWatch, Systems Manager, EKS API

Watch the session


2. Kagent Architecture

Kagent follows the Kubernetes Operator pattern, consisting of a Controller, CRDs, and Admission Webhooks.

Component Description

| Component | Role | Description |
|---|---|---|
| Kagent Controller | Reconciliation loop | Detects CRD changes and reconciles resources to the desired state |
| Admission Webhook | Validation/Mutation | Validates and sets defaults on CRD creation/modification |
| Metrics Server | Metrics collection | Exposes agent state and performance metrics |
| Agent CRD | Agent definition | Spec, model, and tool configuration for an AI agent |
| Tool CRD | Tool definition | Defines tools (API, search, etc.) for agent use |
| Workflow CRD | Workflow definition | Defines multi-agent collaboration workflows |

Component Interaction

The Controller watches Agent, Tool, and Workflow CRDs and reconciles the cluster toward the declared state; the Admission Webhook validates and defaults resources before they are persisted; and the Metrics Server exposes agent state for monitoring and autoscaling.

Prerequisites

  • Kubernetes cluster (v1.25+)
  • kubectl CLI tool
  • Helm v3 (for Helm installation)
  • cert-manager (Webhook TLS certificate management)

3. CRD Structure

Agent CRD

The Agent CRD declaratively defines all settings for an AI agent. Below is the core spec structure:

apiVersion: kagent.dev/v1alpha1
kind: Agent
metadata:
  name: customer-support-agent
  namespace: ai-agents
spec:
  # Agent basic info
  displayName: "Customer Support Agent"
  description: "AI agent that responds to customer inquiries and creates tickets"

  # Model configuration
  model:
    provider: openai   # openai, anthropic, bedrock, vllm
    name: gpt-4-turbo
    endpoint: ""       # Custom endpoint (vLLM, etc.)
    temperature: 0.7
    maxTokens: 4096
    apiKeySecretRef:
      name: openai-api-key
      key: api-key

  # System prompt
  systemPrompt: |
    You are a friendly and professional customer support agent.

  # Tool list
  tools:
    - name: search-knowledge-base
    - name: create-ticket

  # Memory configuration
  memory:
    type: redis
    config:
      host: redis-master.ai-data.svc.cluster.local
      ttl: 3600
      maxHistory: 50

  # Scaling configuration
  scaling:
    minReplicas: 2
    maxReplicas: 10
    metrics:
      - type: cpu
        target:
          averageUtilization: 70
    keda:
      enabled: true
      triggers:
        - type: prometheus
          metadata:
            metricName: agent_active_sessions
            threshold: "50"

  # Resource limits
  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "1Gi"
      cpu: "500m"

  # Observability configuration
  observability:
    tracing:
      enabled: true
      provider: langfuse   # langfuse, langsmith, cloudwatch (details: ../operations-mlops/observability/llmops-observability.md)
    metrics:
      enabled: true
      port: 9090

Tool CRD

The Tool CRD defines tools available to agents. Tool types include api, retrieval, code, and human.

Key fields:

| Field | Description | Example |
|---|---|---|
| spec.type | Tool type | retrieval, api, code, human |
| spec.description | Description the LLM references for tool selection | "Search documents in knowledge base" |
| spec.retrieval | Vector store connection config | Milvus, Pinecone, etc. |
| spec.api | REST API call config | Endpoint, auth, timeout |
| spec.parameters | Input parameter schema | name, type, required, enum |
| spec.output | Output schema | JSON Schema format |
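
Below is a conceptual sketch of the retrieval Tool that the Agent example above references. Since no official Kagent schema exists, the field names under spec.retrieval (vectorStore, collection, topK) are illustrative assumptions:

apiVersion: kagent.dev/v1alpha1
kind: Tool
metadata:
  name: search-knowledge-base
  namespace: ai-agents
spec:
  type: retrieval
  # The LLM reads this description when deciding whether to call the tool
  description: "Search documents in knowledge base"
  retrieval:
    vectorStore: milvus        # assumed field; Milvus, Pinecone, etc.
    collection: support-docs
    topK: 5
  parameters:
    - name: query
      type: string
      required: true
  output:
    type: object               # JSON Schema format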

Memory CRD

The Memory CRD defines how agent conversation context and state are stored.

Key features:

| Feature | Description |
|---|---|
| Session memory | Redis/PostgreSQL-based short-term conversation history (TTL configurable) |
| Conversation compression | LLM-based conversation summarization when a threshold is exceeded |
| Long-term memory | Vector store-based accumulation of agent experience |
| Memory types | redis, postgres, in-memory |
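
A conceptual Memory CRD sketch matching the features above; the compression block and its threshold field are illustrative assumptions:

apiVersion: kagent.dev/v1alpha1
kind: Memory
metadata:
  name: support-session-memory
  namespace: ai-agents
spec:
  type: redis                  # redis, postgres, in-memory
  session:
    host: redis-master.ai-data.svc.cluster.local
    ttl: 3600                  # short-term history expires after 1 hour
    maxHistory: 50
  compression:
    enabled: true              # assumed field: summarize via LLM past a threshold
    threshold: 40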

Workflow CRD

Define multi-agent workflows using the Workflow CRD.

Core structure:

| Field | Description |
|---|---|
| spec.input | Workflow input parameter definition |
| spec.steps | Per-step agent execution definition (sequential/parallel) |
| spec.steps[].dependsOn | Dependency steps (DAG construction) |
| spec.steps[].parallel | Parallel execution flag |
| spec.output | Workflow final output mapping |
| spec.errorHandling | Behavior on step/workflow failure |
| spec.timeout | Overall workflow timeout |
| spec.concurrency | Concurrent execution limit (queue/reject/replace) |
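
A minimal Workflow skeleton showing how these fields fit together; values and field spellings are illustrative, not an official schema:

apiVersion: kagent.dev/v1alpha1
kind: Workflow
metadata:
  name: two-step-example
  namespace: ai-agents
spec:
  input:
    - name: topic
      type: string
  steps:
    - name: analyze
      agent: analysis-agent
    - name: summarize
      agent: writer-agent
      dependsOn: [analyze]     # builds the DAG edge
  output:
    summary: "{{ steps.summarize.output }}"
  errorHandling:
    onStepFailure: retry       # assumed enum: retry before failing the run
    maxRetries: 2
  timeout: 600s
  concurrency:
    limit: 5
    policy: queue              # queue/reject/replace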

4. Multi-Agent Orchestration

Define workflows where multiple agents collaborate to handle complex tasks.

Inter-Agent Communication Patterns

As noted in the overview, agents call tools over MCP (Model Context Protocol) and exchange messages with other agents over A2A; the Workflow CRD composes these interactions into the orchestration patterns below.

Orchestration Patterns

| Pattern | Description | Suitable For |
|---|---|---|
| Sequential Pipeline | Step-by-step execution; each step's output feeds the next step's input | Data processing, ETL |
| Parallel Fan-out | Same input sent to multiple agents in parallel | Multi-angle analysis, A/B comparison |
| DAG Workflow | Dependency-based directed acyclic graph execution | Complex research, report generation |
| Loop | Repeated execution until a condition is met | Review-revision cycles, quality verification |
| Routing | Branch to different agents based on input content | Inquiry classification, domain distribution |

Workflow Example: Research Report
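
A hedged sketch of a research-report workflow using the DAG pattern above: two research agents fan out in parallel, a writer agent synthesizes their outputs once both complete, and a reviewer agent checks the result. Agent names are hypothetical:

apiVersion: kagent.dev/v1alpha1
kind: Workflow
metadata:
  name: research-report
  namespace: ai-agents
spec:
  input:
    - name: topic
      type: string
  steps:
    - name: web-research
      agent: web-research-agent      # hypothetical agent
      parallel: true
    - name: internal-research
      agent: knowledge-base-agent    # hypothetical agent
      parallel: true
    - name: synthesize
      agent: writer-agent
      dependsOn: [web-research, internal-research]
    - name: review
      agent: reviewer-agent
      dependsOn: [synthesize]
  output:
    report: "{{ steps.review.output }}"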

Workflow execution status is tracked via the WorkflowRun CRD:

| Status | Description |
|---|---|
| Pending | Waiting for execution |
| Running | One or more steps are running |
| Succeeded | All steps completed successfully |
| Failed | One or more steps failed (retries exhausted) |

5. Agent Lifecycle Management

Operator-Managed Resources

When an Agent CRD is created, the Controller automatically creates/manages the following resources:

Agent CRD Creation
├── Deployment (agent Pod management)
├── Service (network access)
├── HPA/KEDA ScaledObject (autoscaling)
├── ConfigMap (agent configuration)
└── Secret reference (API keys, credentials)

Update Strategies

| Strategy | Description | Recommended Scenario |
|---|---|---|
| Rolling Update | Default strategy; progressively replaces Pods | General config changes |
| Canary Deployment | Test the new version with a separate Agent CRD | Model changes, major prompt revisions |
| Blue-Green | Run two versions simultaneously, then switch traffic | Zero-downtime migration |
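
As a sketch of the canary strategy under the conceptual Agent CRD: deploy a second Agent with the new model at a small replica count, route a slice of traffic to it (e.g., via a weighted Service or service mesh), and promote once metrics hold. All values are illustrative:

apiVersion: kagent.dev/v1alpha1
kind: Agent
metadata:
  name: customer-support-agent-canary
  namespace: ai-agents
  labels:
    app: customer-support-agent
    track: canary                # selector used to split traffic
spec:
  displayName: "Customer Support Agent (Canary)"
  model:
    provider: openai
    name: gpt-4o                 # new model under test (illustrative)
    apiKeySecretRef:
      name: openai-api-key
      key: api-key
  scaling:
    minReplicas: 1               # keep the canary footprint small
    maxReplicas: 2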

Scaling Strategies

| Metric | Description | Threshold Example |
|---|---|---|
| CPU utilization | Basic resource-based scaling | 70% |
| Memory utilization | Scale out on memory pressure | 80% |
| Active session count | KEDA + Prometheus custom metric | 50 sessions/Pod |
| Request throughput | Requests-per-second based | 100 RPS/Pod |
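
For the custom-metric rows, the controller would plausibly render the Agent's keda block into a standard KEDA ScaledObject like the following. KEDA's API is real; the generated names, Prometheus address, and metric name are assumptions:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: customer-support-agent
  namespace: ai-agents
spec:
  scaleTargetRef:
    name: customer-support-agent   # Deployment created by the Operator
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090   # assumed address
        query: sum(agent_active_sessions{app="customer-support-agent"})
        threshold: "50"            # target 50 active sessions per Pod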

6. Observability Integration

Agent execution traces are sent to Langfuse, LangSmith, or CloudWatch Generative AI Observability. For a comparison of each tool, see LLMOps Observability Comparison.

Deployment guides:

Key Alert Rules

| Alert | Condition | Severity |
|---|---|---|
| Agent error rate increase | Error rate > 5% (sustained 5 min) | Critical |
| Agent response delay | P99 latency > 30 s (sustained 5 min) | Warning |
| Pod availability degradation | Ready Pods < 50% (sustained 5 min) | Critical |
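
The first rule could be expressed as a Prometheus Operator PrometheusRule; the agent_* metric names are hypothetical stand-ins for whatever the agent runtime exposes:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kagent-agent-alerts
  namespace: ai-agents
spec:
  groups:
    - name: agent.rules
      rules:
        - alert: AgentErrorRateHigh
          expr: |
            sum(rate(agent_request_errors_total[5m]))
              / sum(rate(agent_requests_total[5m])) > 0.05
          for: 5m                  # condition must hold for 5 minutes
          labels:
            severity: critical
          annotations:
            summary: "Agent error rate above 5% for 5 minutes"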

7. Conclusion

Using Kagent enables declarative management of AI agents in Kubernetes environments. Key benefits include:

  • Declarative management: YAML-based agent definitions support GitOps workflows
  • Automated operations: Automatic recovery and scaling through the Operator pattern
  • Standardization: Agent definition standardization through CRDs
  • Scalability: Leveraging Kubernetes-native scaling mechanisms
  • Observability: Integrated monitoring and tracing support