AWS Native Agentic AI Platform: Agent-Centric Approach with Managed Services

Overview

By leveraging AWS managed services, you can focus on Agent business logic rather than infrastructure operations. AWS handles GPU management, scaling, availability, and security, while development teams focus their effort solely on the problems Agents need to solve.

The AWS Agentic AI stack consists of three pillars.

| Pillar | Service | Role |
|---|---|---|
| Foundation | Amazon Bedrock | Model access, RAG, guardrails, prompt caching |
| Development | Strands Agents SDK | Agent framework, MCP native, tool integration |
| Operations | Amazon Bedrock AgentCore | Serverless deployment, memory, gateway, policy, evaluation |

Key Perspective

This document covers the Agent development optimization approach provided by AWS managed services. The strategy is to delegate to AWS the areas where managed services are sufficient, focusing team capabilities on Agent business logic. However, this approach is the first step in a multi-model journey. When cost pressure from traffic growth, the need for domain-specific SLMs, or data sovereignty requirements arise, the realistic optimum is to expand to an EKS-based Open Architecture and combine self-hosted models with Bedrock in a hybrid approach.

Challenge Resolution Mapping

How the 5 key challenges covered in Technical Challenges are addressed with the AWS Native approach:

| Challenge | AWS Native Solution |
|---|---|
| GPU resource management and cost optimization | Bedrock serverless inference — no GPU management required |
| Intelligent inference routing and gateway | Bedrock Cross-Region Inference + AgentCore Gateway |
| LLMOps observability and cost governance | AgentCore Observability + CloudWatch |
| Agent orchestration and safety | Strands SDK + Bedrock Guardrails + AgentCore Policy |
| Model supply chain management | Bedrock Model Evaluation + Prompt Management |

Core Value of AWS Native

Since AWS handles GPU infrastructure management, scaling, availability, and security, teams can focus solely on Agent business logic. For more fine-grained control, it can be combined with EKS-based Open Architecture.


AWS Agentic AI Service Architecture

3-Pillar Architecture


Amazon Bedrock: Foundation Layer

Amazon Bedrock provides the foundational infrastructure for the Agentic AI Platform. It offers single API access to over 100 foundation models, with managed support for RAG, guardrails, and prompt caching.

Key Features

| Feature | Description | Core Value |
|---|---|---|
| Model Access | 100+ models: Claude, Nova, Llama, Mistral, etc. | Single API, no code changes for model switching |
| Knowledge Bases | Document parsing → chunking → embedding → indexing → search | One-click RAG pipeline, completed with just S3 upload |
| Guardrails | PII filtering, prompt injection defense, topic restrictions | Policy configuration in console, no code changes |
| Prompt Caching | Caching of repeated contexts | Up to 90% cost reduction, up to 85% latency reduction |
| Cross-Region Inference | Automatic cross-region traffic distribution | Auto-fallback on capacity limits, improved availability |
| Prompt Management | Prompt version control, A/B testing | Prompt history tracking, rollback support |
| Model Evaluation | Automated model evaluation, batch processing | LLM-as-a-judge, human evaluation workflows |

Prompt Caching Usage

Agents with long system prompts or repetitive tool definitions can significantly reduce cost and latency by enabling Prompt Caching. It is especially effective for patterns where RAG context is frequently repeated.
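As a concrete sketch, the Bedrock Converse API lets you place a `cachePoint` block after the long, stable part of the system prompt so that subsequent calls reuse the cached prefix. The snippet below only builds the request payload; the commented-out `boto3` call shows how it would be sent, and the prompt text is illustrative:

```python
# Minimal sketch of enabling prompt caching with the Bedrock Converse API.
# The stable system prompt goes before the cachePoint marker; everything
# above the marker is eligible for caching on repeated calls.

LONG_SYSTEM_PROMPT = (
    "You are a customer service Agent. "
    "(imagine several thousand tokens of stable tool definitions here)"
)

def build_converse_request(user_message: str) -> dict:
    """Build a Converse request with a cache checkpoint after the static prefix."""
    return {
        "modelId": "anthropic.claude-sonnet-4-20250514",
        "system": [
            {"text": LONG_SYSTEM_PROMPT},
            {"cachePoint": {"type": "default"}},  # cache boundary marker
        ],
        "messages": [
            {"role": "user", "content": [{"text": user_message}]},
        ],
    }

request = build_converse_request("Where is my order?")
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**request)
```

Only the static prefix before the marker is cached; the per-request user message stays outside the cache boundary, which is why repeated RAG context benefits most.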


Strands Agents SDK: Development Framework

Strands Agents SDK is an open-source agent framework released by AWS under Apache 2.0. It implements production-grade Agents with minimal code and supports various model providers beyond Bedrock through its model-agnostic design.

Minimal Code Agent Implementation

from strands import Agent
from strands.models import BedrockModel

# Basic Agent — defined in a few lines of code
agent = Agent(
    model=BedrockModel(model_id="anthropic.claude-sonnet-4-20250514"),
    tools=["calculator", "web_search"],
)
result = agent("What is the current temperature in Seoul, in both Celsius and Fahrenheit?")

MCP Native Support

from strands import Agent
from strands.models import BedrockModel
from strands.tools.mcp import MCPClient

# MCP server connection — auto-discovers external tools and integrates with Agent
mcp_client = MCPClient(server_url="http://mcp-server:8080")

agent = Agent(
    model=BedrockModel(model_id="anthropic.claude-sonnet-4-20250514"),
    tools=[mcp_client],  # Automatic MCP tool discovery and registration
)
result = agent("Look up recent order history and check delivery status")

Custom Tool Definition

from strands import Agent, tool
from strands.models import BedrockModel

@tool
def lookup_customer(customer_id: str) -> dict:
    """Looks up customer information."""
    # Business logic implementation
    return {"name": "John Doe", "tier": "GOLD", "since": "2023-01"}

@tool
def create_ticket(title: str, priority: str, description: str) -> dict:
    """Creates a customer inquiry ticket."""
    return {"ticket_id": "TK-2026-0042", "status": "OPEN"}

agent = Agent(
    model=BedrockModel(model_id="anthropic.claude-sonnet-4-20250514"),
    tools=[lookup_customer, create_ticket],
    system_prompt="You are a customer service Agent. Look up customer information and create tickets when needed.",
)

Strands SDK Key Characteristics

| Characteristic | Description |
|---|---|
| Apache 2.0 | Free for commercial use, forkable |
| Model-agnostic | Supports various backends: Bedrock, OpenAI, Anthropic API, Ollama, etc. |
| Framework-agnostic | Runs on any runtime: FastAPI, Flask, Lambda, etc. |
| MCP Native | Built-in Model Context Protocol support, no separate adapter needed |
| AgentCore Integration | Production deployment with single agentcore deploy command |
| Streaming Responses | Per-token streaming, real-time UX support |

Amazon Bedrock AgentCore: Operations Platform

AgentCore is a platform that provides everything needed for production Agent operations as a managed service. Released as GA in 2025, it consists of 7 core services.

7 Core Services

1. Runtime — Serverless Agent Deployment

AgentCore Runtime provides an isolated execution environment based on Firecracker MicroVM.

| Item | Specification |
|---|---|
| Isolation Level | Firecracker MicroVM (hardware-level isolation) |
| Session Duration | Up to 8 hours continuous session |
| Scaling | Auto-scale from 0, scale to 0 when idle |
| Deployment | agentcore deploy CLI or CloudFormation |
| Cold Start | Within seconds |

# Deploy a Strands Agent to AgentCore
agentcore deploy \
  --agent-name "customer-service" \
  --entry-point "agent.py" \
  --runtime python3.12 \
  --memory 512 \
  --timeout 3600

2. Memory — Short/Long-term Memory Management

A managed memory service that enables Agents to remember conversation context and user preferences.

| Memory Type | Description | Usage Example |
|---|---|---|
| Short-term Memory | In-session conversation history | Referencing previous questions in multi-turn conversations |
| Long-term Memory | Persistent cross-session information | User preferences, past interaction patterns |
| Auto-summarization | Automatically summarizes long conversations | Retaining key information when context window is exceeded |
| User Profiles | Personalization learning | "This user prefers concise answers" |
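AgentCore Memory is fully managed, but the short-term-plus-summarization behavior described above can be illustrated with a toy local sketch. The class name and the crude "fold oldest turn into a summary string" logic are hypothetical stand-ins, not the AgentCore API:

```python
from collections import deque

class SessionMemory:
    """Toy short-term memory: keep recent turns, fold older ones into a summary."""

    def __init__(self, max_turns: int = 6):
        self.turns = deque()   # (role, text) pairs for recent conversation turns
        self.max_turns = max_turns
        self.summary = ""      # stand-in for managed auto-summarization

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))
        if len(self.turns) > self.max_turns:
            # AgentCore summarizes with a model; here we just truncate and append.
            old_role, old_text = self.turns.popleft()
            self.summary += f"[{old_role}] {old_text[:40]} "

    def context(self) -> str:
        """Assemble the context an Agent would receive on the next turn."""
        prefix = f"Summary: {self.summary}\n" if self.summary else ""
        return prefix + "\n".join(f"{r}: {t}" for r, t in self.turns)
```

The design point this illustrates: recent turns stay verbatim for accuracy, while older turns are compressed so the context window never grows without bound.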

3. Gateway — Intelligent Tool Routing

AgentCore Gateway automatically converts REST APIs to MCP protocol and uses semantic tool search to select only relevant tools from hundreds of registered tools.

Semantic Tool Search

Even with 300 tools registered, the Gateway analyzes the user request and delivers only the 4 relevant tools to the Agent. This saves LLM context window and improves tool selection accuracy.

| Feature | Description |
|---|---|
| REST → MCP Conversion | Automatically wraps existing REST APIs as MCP tools |
| Semantic Search | 300 tools → 4 relevant auto-filtered |
| Tool Registry | Centralized tool registration and version management |
| Auth Propagation | Safely propagates user authentication to tools |
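The semantic filtering step can be sketched in a few lines. The Gateway's real implementation is managed and uses proper vector embeddings; the bag-of-words "embedding" below is a deliberately simple stand-in to show the ranking idea:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words term counts (real gateways use vector models)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_tools(request: str, tools: dict, top_k: int = 4) -> list:
    """Return the top_k tool names whose descriptions best match the request."""
    q = embed(request)
    ranked = sorted(tools, key=lambda name: cosine(q, embed(tools[name])), reverse=True)
    return ranked[:top_k]

# Hypothetical tool registry: name -> description
tools = {
    "get_order_status": "look up order status and delivery tracking",
    "create_refund": "process a refund for an order",
    "search_products": "search the product catalog",
    "reset_password": "reset a user account password",
    "get_weather": "current weather forecast",
}
print(select_tools("check delivery status of my order", tools, top_k=2))
```

Only the selected subset is forwarded to the Agent, which is exactly how the 300-tools-to-4 reduction saves context window.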

4. Identity — Delegated Authentication

| Feature | Description |
|---|---|
| IdP Integration | Okta, Amazon Cognito, OIDC-compatible providers |
| Delegated Auth | Agent authenticates to tools on behalf of the user (OAuth 2.0 token exchange) |
| Fine-grained Permissions | Per-tool, per-resource access control |
| Audit Logs | All authentication events recorded in CloudTrail |

5. Policy — Natural Language Policy Definition

Policies defined in natural language are compiled into deterministic runtimes to ensure consistent policy enforcement.

# Natural language policy examples
Policy: "Only allow refund processing for Gold tier and above customers"
  → Compiled → Executed by deterministic rule engine (no LLM calls)

Policy: "PII must be masked when calling external APIs"
  → Compiled → Automatically applied at Gateway level

| Characteristic | Description |
|---|---|
| Natural Language Input | Non-developers can define policies |
| Deterministic Execution | Compiled policies are applied deterministically without LLM |
| Real-time Enforcement | Policy verification on every request at runtime |
| Audit Trail | Complete history of policy application/rejection |
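What "compiled into a deterministic runtime" means in practice: the first policy above reduces to a plain predicate that runs with no LLM call at enforcement time. The `Request` shape and tier ranking below are illustrative, not AgentCore's internal representation:

```python
from dataclasses import dataclass

# Hypothetical tier ordering used by the compiled rule
TIER_RANK = {"BRONZE": 0, "SILVER": 1, "GOLD": 2, "PLATINUM": 3}

@dataclass
class Request:
    action: str
    customer_tier: str

def refund_policy(req: Request) -> bool:
    """Compiled form of: 'Only allow refund processing for Gold tier and above.'

    Pure rule logic — deterministic, auditable, no model inference involved.
    """
    if req.action != "process_refund":
        return True  # policy does not apply to other actions
    return TIER_RANK.get(req.customer_tier, -1) >= TIER_RANK["GOLD"]

print(refund_policy(Request("process_refund", "SILVER")))  # False — blocked
print(refund_policy(Request("process_refund", "GOLD")))    # True — allowed
```

Because enforcement is a predicate, every allow/deny decision is reproducible, which is what makes the audit trail in the table meaningful.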

6. Observability — Integrated Monitoring

| Feature | Description |
|---|---|
| CloudWatch Integration | Automatic collection of metrics, logs, and alarms |
| OpenTelemetry | Standard instrumentation compatible with existing monitoring tools |
| Per-step Traces | Full tracking from Agent reasoning → tool calls → responses |
| Cost Dashboard | Per-model, per-Agent, per-session cost visualization |

7. Evaluation — Continuous Quality Monitoring

| Feature | Description |
|---|---|
| LLM-as-judge | LLM automatically evaluates Agent response quality |
| 13 Evaluation Criteria | Accuracy, relevance, harmfulness, consistency, etc. |
| A/B Testing | Quantitative measurement of quality impact from prompt/model changes |
| Continuous Monitoring | Real-time quality tracking on production traffic |
| Human Evaluation Workflows | Parallel automated and expert evaluation |

Architecture Patterns

Build → Deploy → Operate Workflow

Simple Agent Pattern

Suitable for Agents performing single tasks such as FAQ, billing lookup, and status checks.

Complex Agent Pattern (Multi-step)

Suitable for Agents that call multiple tools sequentially/in parallel and branch based on intermediate results.

Multi-Agent Pattern

Independent Agents collaborate to handle complex business processes.

from strands import Agent
from strands.models import BedrockModel
from strands.multiagent import MultiAgentOrchestrator

# Define specialist Agents
research_agent = Agent(
    model=BedrockModel(model_id="anthropic.claude-sonnet-4-20250514"),
    system_prompt="You are a research specialist.",
    tools=["web_search", "document_reader"],
)

analysis_agent = Agent(
    model=BedrockModel(model_id="anthropic.claude-sonnet-4-20250514"),
    system_prompt="You are a data analysis specialist.",
    tools=["calculator", "chart_generator"],
)

writer_agent = Agent(
    model=BedrockModel(model_id="anthropic.claude-sonnet-4-20250514"),
    system_prompt="You are a report writing specialist.",
    tools=["document_writer"],
)

# Multi-Agent Orchestration
orchestrator = MultiAgentOrchestrator(
    agents=[research_agent, analysis_agent, writer_agent],
    strategy="sequential",  # Sequential execution: Research → Analysis → Writing
)
result = orchestrator("Write a Q1 2026 market trends report")

Deployment Guide

The practical deployment methods for the AWS Native Agentic AI Platform consist of three approaches:

Deployment Methods Overview

| Approach | Tool | Suitable Scenario |
|---|---|---|
| CLI Deployment | agentcore deploy | Quick prototyping, single Agent deployment |
| IaC Deployment | CloudFormation / CDK | Production environments, reproducible infrastructure |
| Full-stack Template | FAST Templates | Full stack (Agent + API + UI) bootstrap |

Strands + AgentCore Concept

Strands Agent Structure:

from strands import Agent
from strands.models import BedrockModel

# Define Agent with minimal code
agent = Agent(
    model=BedrockModel(model_id="anthropic.claude-sonnet-4-20250514"),
    tools=["calculator", "web_search"],
    system_prompt="You are a math assistant.",
)

# Wrap as Lambda handler
def handler(event, context):
    return agent(event["prompt"])

AgentCore Deployment Workflow:

  1. Write Agent code (Python)
  2. Execute agentcore deploy → Automatically deploy to Firecracker MicroVM
  3. Endpoint created → Agent callable via REST API
  4. Memory/Gateway/Policy automatically connected

CloudFormation IaC Pattern

Using AWS CloudFormation allows declarative management of Agents and related resources (Knowledge Base, Guardrails, etc.):

Resources:
  CustomerServiceAgent:
    Type: AWS::Bedrock::AgentCoreEndpoint
    Properties:
      AgentName: customer-service
      Runtime: python3.12
      EntryPoint: agent.py:handler
      Environment:
        Variables:
          MODEL_ID: anthropic.claude-sonnet-4-20250514
          KNOWLEDGE_BASE_ID: !Ref KnowledgeBase

  KnowledgeBase:
    Type: AWS::Bedrock::KnowledgeBase
    Properties:
      Name: customer-faq
      StorageConfiguration:
        Type: OPENSEARCH_SERVERLESS

Production Deployment Guide

For detailed kubectl/helm commands, complete YAML manifests, and Python boto3 deployment scripts, refer to the Reference Architecture section. This document focuses on concepts and patterns of the AWS Native approach.


Enterprise Use Cases

Baemin (Woowa Brothers): RAG-based Knowledge Agent

| Item | Details |
|---|---|
| Challenge | Reducing internal policy search time for customer service agents |
| Architecture | Strands Agent + Bedrock Knowledge Bases + Claude |
| Results | 30% improvement in consultation efficiency, 90% reduction in policy search time |
| Core Value | Completed knowledge Agent with just S3 document upload, without building a RAG pipeline |

CJ OnStyle: Multi-Agent Live Commerce

| Item | Details |
|---|---|
| Challenge | Automating real-time customer Q&A during live broadcasts |
| Architecture | Multi-agent (Product Info Agent + Order Agent + Recommendation Agent) |
| Results | 3x improvement in customer response rate, real-time processing latency under 2 seconds |
| Core Value | AgentCore Runtime auto-scaling handled live broadcast traffic surges |

Amazon Devices: Manufacturing Agent

| Item | Details |
|---|---|
| Challenge | Automating quality inspection model fine-tuning for manufacturing lines |
| Architecture | Strands Agent + Bedrock Fine-tuning + AgentCore |
| Results | Fine-tuning time reduced from days to 1 hour |
| Core Value | Agent auto-orchestrated data preprocessing → fine-tuning → evaluation |

Cost Structure

The cost of an AgentCore-based platform follows a pay-as-you-go serverless model.

Billing Model

| Service | Billing Basis | Characteristics |
|---|---|---|
| Bedrock Inference | Input/output token count | On-demand or provisioned throughput options |
| AgentCore Runtime | Session time + memory usage | $0 when idle, up to 8-hour sessions |
| Knowledge Bases | Storage + query count | OpenSearch Serverless based |
| Guardrails | Processed text units | Billed separately for input/output |
| Prompt Caching | 90% discount on cache hits | Greater savings with more repeated patterns |

Operational Cost Savings

| Area | Savings Factor |
|---|---|
| GPU Management | No need for GPU instance provisioning, patching, scaling operations personnel |
| Infrastructure Operations | Serverless architecture eliminates cluster management burden |
| Security Compliance | Leverage AWS SOC 2, HIPAA, ISO 27001 certifications |
| Availability Management | Multi-AZ automatic placement, DR built-in with Cross-Region Inference |
| Monitoring Setup | Native CloudWatch integration eliminates need for separate monitoring stack |

Cost Optimization Tips
  • Prompt Caching: Must enable for Agents with long system prompts
  • Provisioned Throughput: Up to 50% savings vs on-demand for stable traffic
  • Cross-Region Inference: Prevents throttling with auto-fallback when regional capacity is limited
  • Batch Inference: Use batch mode for evaluation/analysis tasks that don't require real-time processing to save costs
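To make the caching tip concrete, here is a rough cost model. The token prices and request volumes below are illustrative assumptions, not published Bedrock pricing; the 90% cache discount comes from the table above:

```python
def monthly_token_cost(
    requests_per_month: int,
    input_tokens: int,
    output_tokens: int,
    price_in_per_1k: float,
    price_out_per_1k: float,
    cached_fraction: float = 0.0,  # share of input tokens served from cache
    cache_discount: float = 0.9,   # "90% discount on cache hits"
) -> float:
    """Rough monthly inference cost with part of the input prefix cached."""
    effective_in = input_tokens * (1 - cached_fraction * cache_discount)
    per_request = (
        (effective_in / 1000) * price_in_per_1k
        + (output_tokens / 1000) * price_out_per_1k
    )
    return requests_per_month * per_request

# Illustrative: 100k requests/month, 4,000 input + 500 output tokens each,
# hypothetical prices of $0.003 / $0.015 per 1K tokens
no_cache = monthly_token_cost(100_000, 4000, 500, 0.003, 0.015)
with_cache = monthly_token_cost(100_000, 4000, 500, 0.003, 0.015, cached_fraction=0.75)
print(no_cache)    # 1950.0 — no caching
print(with_cache)  # 1140.0 — 75% of input tokens cached
```

The larger the stable prefix (system prompt, tool definitions, RAG context) relative to the per-request suffix, the higher `cached_fraction` climbs and the bigger the savings.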

MCP Protocol and EKS Integration

MCP (Model Context Protocol) Overview

MCP is a standard communication protocol between AI Agents and tools:

  • Tool Discovery: Agents dynamically discover available tools
  • Context Passing: Execution context and state passed in standardized format
  • Result Return: Tool execution results returned in structured format
  • Inter-Agent Communication: Multi-agent collaboration via A2A protocol
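On the wire, MCP messages are JSON-RPC 2.0; `tools/list` (discovery) and `tools/call` (invocation) are the protocol's method names. The sketch below only constructs the messages — the `get_pod_logs` tool name and its arguments are hypothetical examples:

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requires a unique id per request

def mcp_request(method: str, params: dict = None) -> str:
    """Build a JSON-RPC 2.0 request string as used by MCP."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# Tool discovery: ask the server which tools it exposes
discover = mcp_request("tools/list")

# Tool invocation: call a (hypothetical) tool with structured arguments
invoke = mcp_request(
    "tools/call",
    {"name": "get_pod_logs", "arguments": {"pod": "api-7f9c"}},
)
print(discover)
```

Because both sides speak this one envelope format, any MCP client (a Strands Agent, the AgentCore Gateway) can use any MCP server without a bespoke adapter.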

EKS MCP Server Integration

AWS provides EKS-dedicated hosted MCP servers to support integration between Kubernetes clusters and AI Agents:

EKS MCP Server Features

| Feature | Description |
|---|---|
| Pod Log Retrieval | Real-time streaming of specific Pod logs |
| K8s Event Retrieval | Cluster event search and analysis |
| CloudWatch Metrics | Cluster metrics retrieval and analysis |
| Resource Status Check | Resource status retrieval for Deployments, Services, etc. |
| Troubleshooting | Agent-based automated diagnostics |

EKS MCP Server Deployment Concept:

MCP servers run within the Kubernetes cluster, enabling Agents to query cluster status and perform operations without executing kubectl commands.

# Clone AWS MCP server repository
git clone https://github.com/awslabs/mcp.git
cd mcp/servers/eks

# Build Docker image and deploy to EKS
docker build -t eks-mcp-server:latest .
kubectl apply -f k8s/deployment.yaml

AgentCore + MCP Integration Pattern:

Bedrock AgentCore registers MCP servers as Action Groups, enabling Agents to use Kubernetes tools:

import boto3

bedrock_agent = boto3.client('bedrock-agent')

# Create Agent
response = bedrock_agent.create_agent(
    agentName='sre-agent',
    foundationModel='anthropic.claude-sonnet-4-20250514',
    instruction='You are an SRE agent for Kubernetes troubleshooting.',
    agentResourceRoleArn='arn:aws:iam::ACCOUNT:role/BedrockAgentRole',
)

# Connect MCP tools (Action Group)
bedrock_agent.create_agent_action_group(
    agentId=response['agent']['agentId'],
    agentVersion='DRAFT',
    actionGroupName='eks-mcp-tools',
    actionGroupExecutor={'customControl': 'RETURN_CONTROL'},
    apiSchema={
        'payload': {
            'openapi': '3.0.0',
            'info': {'title': 'EKS MCP Tools', 'version': '1.0'},
            'paths': {
                '/pod-logs': {'post': {'description': 'Get pod logs'}},
                '/k8s-events': {'post': {'description': 'Get K8s events'}},
            },
        },
    },
)

Production Deployment Details

For complete boto3 scripts, IAM policies, and YAML manifests, refer to the Reference Architecture section.

Hybrid Strategy with Self-hosted Agents

EKS-based self-hosted Agents and Bedrock AgentCore can be used together:

Kagent vs Bedrock AgentCore
| Comparison Item | Kagent (Self-managed) | Bedrock AgentCore |
|---|---|---|
| Execution Environment | EKS Pod | AWS Managed Runtime |
| Model Selection | Flexible (vLLM, external API) | Bedrock Models |
| Tool Protocol | Custom CRD | MCP Standard |
| Scaling | Karpenter/HPA | Auto-scaling |
| Cost | GPU Infrastructure Cost | API Call Cost |
| Best For | GPU availability, custom models | Fast production deployment |

Hybrid Approach: An effective strategy is to route high-frequency, cost-sensitive calls to EKS self-hosted Agents, and low-frequency calls that require complex reasoning to Bedrock AgentCore.
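Such a routing decision can live in a thin dispatch layer in front of both backends. The heuristic and the 50-calls-per-minute threshold below are illustrative, not a recommendation:

```python
def route_backend(
    call_frequency_per_min: float,
    needs_complex_reasoning: bool,
    freq_threshold: float = 50.0,  # illustrative cutoff for "high-frequency"
) -> str:
    """Toy router for the hybrid strategy described above.

    High-frequency, cost-sensitive traffic -> self-hosted model on EKS
    (fixed GPU cost amortizes well). Reasoning-heavy or bursty low-frequency
    traffic -> Bedrock AgentCore (pay per call, stronger models).
    """
    if needs_complex_reasoning:
        return "bedrock-agentcore"
    if call_frequency_per_min >= freq_threshold:
        return "eks-self-hosted"
    return "bedrock-agentcore"

print(route_backend(200.0, needs_complex_reasoning=False))  # eks-self-hosted
print(route_backend(5.0, needs_complex_reasoning=True))     # bedrock-agentcore
```

In production the inputs would come from observed traffic metrics and a per-task capability tag rather than hard-coded arguments.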

Multi-Agent Orchestration

AgentCore supports inter-Agent collaboration via MCP/A2A:

Multi-Agent Patterns

| Pattern | Description | Example |
|---|---|---|
| Sequential | Sequential agent invocation | Diagnose → Analyze → Remediate |
| Parallel | Parallel agent invocation | Simultaneous multi-cluster inspection |
| Hierarchical | Hierarchical agent structure | Master agent + specialist agents |
| Collaborative | Agent-to-agent collaboration | Complex problem solving |

AWS MCP Server Ecosystem

AWS provides official MCP servers as open source (github.com/awslabs/mcp):

| Server | Role |
|---|---|
| EKS MCP Server | Kubernetes cluster management |
| CloudWatch MCP Server | Metrics and logs retrieval |
| IAM Policy Autopilot | Least privilege policy generation |
| S3 MCP Server | Object storage access |
| RDS MCP Server | Database management |

CloudWatch Gen AI Observability Integration

CloudWatch Gen AI Observability GA

CloudWatch Generative AI Observability became GA in October 2025. It is natively integrated with AgentCore: Agent invocations, tool executions, and token usage are automatically recorded in CloudWatch without additional configuration.

  • Agent Execution Tracing: End-to-end tracing for full inference flow visibility
  • Tool Call Monitoring: Per-MCP-server call count, latency, and error rate tracking
  • Token Consumption Analysis: Per-model input/output token usage and cost tracking
  • Anomaly Detection: Automatic detection of abnormal patterns via CloudWatch Anomaly Detection
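Token-consumption tracking can then be queried like any other CloudWatch metric. The sketch below only builds a `GetMetricData` request; the namespace, metric name, and dimension key are assumptions for illustration — check the Gen AI Observability documentation for the exact names:

```python
from datetime import datetime, timedelta, timezone

def token_usage_query(agent_name: str) -> dict:
    """Build a CloudWatch GetMetricData request for an Agent's token usage.

    Namespace/metric/dimension names here are illustrative assumptions.
    """
    now = datetime.now(timezone.utc)
    return {
        "StartTime": now - timedelta(hours=24),
        "EndTime": now,
        "MetricDataQueries": [{
            "Id": "tokens",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/BedrockAgentCore",   # assumed namespace
                    "MetricName": "OutputTokenCount",       # assumed metric name
                    "Dimensions": [{"Name": "AgentName", "Value": agent_name}],
                },
                "Period": 3600,  # hourly buckets
                "Stat": "Sum",
            },
        }],
    }

query = token_usage_query("customer-service")
# import boto3
# resp = boto3.client("cloudwatch").get_metric_data(**query)
```

The same query shape feeds dashboards and CloudWatch Anomaly Detection alarms, so cost tracking and anomaly alerts share one metric pipeline.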
