AgentCore Hybrid Strategy
The implementation samples in this document (IAM OAuth propagation, Lambda Trace Forwarder, Dual-write Memory, Cost-arbitrage Router) are design references only. No E2E integration test of the AgentCore + EKS combination has been run in us-east-2 since the teardown on 2026-04-18. E2E verification of IAM boundaries, trace correlation, and the 3 PrivateLink endpoint types will be performed before Phase 3 deployment.
The Decision Matrix 8-axis weights, Hand-off pattern catalog, IAM session sharing flow, and migration roadmap are in a pre-review state, pending MLOps lead review and real-deployment E2E verification. Value footnotes and this banner will be updated upon validation.
Deployment verification tracking: Issue #6
Overview
Bedrock AgentCore is a powerful managed Agent platform, but enterprise environments often need to combine it with self-hosted infrastructure. This document provides a decision framework and a pattern catalog for designing a hybrid architecture that combines AgentCore's serverless advantages with the flexibility of EKS-based self-hosted infrastructure.
Before reading this document, refer to:
- AWS Native Platform — AgentCore 7-service overview (avoid duplication)
- EKS-based Open Architecture — Self-hosted stack composition
- AI Platform Selection Guide — Managed vs open-source decision
- SageMaker-EKS Integration — Hybrid VPC/IAM reference
Hybrid Deployment Motivation
Single Approach Limitations
Constraints of AgentCore Only:
- Centered on the 100+ GA models in Bedrock (self fine-tuned SLMs cannot be hosted)
- Token-based pricing (cost grows quickly for high-frequency simple tasks)
- Added latency when reaching on-premises data sources
- Complex PrivateLink setup when MCP servers are inside a VPC
Constraints of EKS Self-hosted Only:
- Operational burden of Agent Runtime infrastructure (Kagent Pod + Redis State Store)
- Autoscaling is more complex than serverless scaling (KEDA queue-based)
- No managed memory service (must be implemented directly)
- Multi-agent orchestration framework must be built in-house
Core Value of Hybrid
Cost Break-even Calculation
| Monthly Inference Volume | AgentCore Only | EKS Self-hosted Only | Hybrid (Cascade) | Optimal Approach |
|---|---|---|---|---|
| ~100K requests | $300-500 | $800-1,200 | $400-700 | AgentCore Only |
| ~500K requests | $1,500-2,000 | $1,200-1,800 | $800-1,200 | Hybrid starting point |
| ~1.5M requests | $4,500-6,000 | $2,500-3,500 | $2,000-2,800 | Hybrid required |
| ~5M+ requests | $15,000+ | $3,500-5,000 | $4,000-6,000 | EKS-centric Hybrid |
The hybrid approach becomes cost-effective from roughly 500K requests per month. See Coding Tools Cost Analysis for detailed calculation formulas.
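The crossover in the table can be approximated with a simple linear cost model. The sketch below is illustrative only: the coefficients are assumptions fitted to the low end of the table's ranges (AgentCore about $300 per 100K requests; EKS about $800 at 100K and $1,200 at 500K requests), not measured values, and the function names are hypothetical.

```python
def agentcore_cost(requests: int, price_per_request: float) -> float:
    """Usage-priced service: cost scales linearly with request count."""
    return requests * price_per_request


def eks_cost(requests: int, fixed_monthly: float, marginal_per_request: float) -> float:
    """Self-hosted: a fixed monthly baseline plus a small per-request cost."""
    return fixed_monthly + requests * marginal_per_request


def break_even_requests(price_per_request: float,
                        fixed_monthly: float,
                        marginal_per_request: float) -> float:
    """Monthly volume above which the self-hosted option is cheaper."""
    saving_per_request = price_per_request - marginal_per_request
    if saving_per_request <= 0:
        raise ValueError("EKS never breaks even if its marginal cost is not lower")
    return fixed_monthly / saving_per_request


# Assumed coefficients (linear fit to the low end of the table's ranges)
PRICE = 0.003      # USD per AgentCore request
FIXED = 700.0      # USD per month, EKS baseline
MARGINAL = 0.001   # USD per EKS request

volume = break_even_requests(PRICE, FIXED, MARGINAL)
print(f"break-even at about {volume:,.0f} requests/month")
```

With these assumed coefficients the crossover falls between the table's 100K and 500K rows, which is consistent with the 500K row being marked as the hybrid starting point.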
Decision Matrix: Where to Place Agents
Evaluate agent placement using 8 core dimensions.
| Evaluation Axis | AgentCore | EKS Kagent | Hybrid | Decision Criteria |
|---|---|---|---|---|
| Inference Latency | Medium (50-200ms) | Low (10-50ms) | Low | VPC internal tool calls → EKS |
| Cost | High for high-frequency | Low for high-frequency | Optimal | Simple=EKS, Complex=AgentCore |
| PII Handling | VPC external (constraints) | VPC internal (advantageous) | Flexible | Sensitive data → EKS MCP |
| Model Customization | Bedrock models only | Free (Qwen3, custom) | Free | Fine-tuned models → EKS |
| Tool Chain | REST→MCP conversion | K8s native | Both | External SaaS → AgentCore Gateway |
| Session Length | Max 8 hours | Unlimited | Unlimited | Long conversations → EKS State |
| Audit Requirements | CloudTrail automatic | Direct implementation required | CloudTrail + Custom | Regulatory → AgentCore priority |
| Team Capability | Kubernetes unnecessary | Kubernetes required | Optional | K8s beginner → AgentCore-centric |
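One way to turn the matrix into a repeatable decision is a weighted score per workload. The sketch below is an assumption-laden illustration: the axis keys, the uniform weights, the score convention (-1 favours AgentCore, +1 favours EKS), and the 0.2 margin are all hypothetical, since the actual 8-axis weights are still under review.

```python
AXES = [
    "latency", "cost", "pii", "model_customization",
    "tool_chain", "session_length", "audit", "team_capability",
]

# Hypothetical uniform weights; replace with reviewed values when available.
DEFAULT_WEIGHTS = {axis: 1.0 for axis in AXES}


def placement_score(scores: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted average of per-axis scores in [-1, 1].

    Negative values favour AgentCore, positive values favour EKS Kagent.
    """
    total_weight = sum(weights[axis] for axis in AXES)
    return sum(scores[axis] * weights[axis] for axis in AXES) / total_weight


def recommend(scores: dict, weights: dict = DEFAULT_WEIGHTS, margin: float = 0.2) -> str:
    """Map the score to a placement; scores near zero suggest a hybrid split."""
    score = placement_score(scores, weights)
    if score > margin:
        return "EKS Kagent"
    if score < -margin:
        return "AgentCore"
    return "Hybrid"


# Example: PII-heavy workload with a fine-tuned model, otherwise neutral
workload = {axis: 0.0 for axis in AXES}
workload.update({"pii": 1.0, "model_customization": 1.0, "latency": 0.5})
print(recommend(workload))
```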
Decision Flowchart
Data Gravity and Tool Colocation Patterns
What is Data Gravity?
Placing compute where the data resides minimizes network latency and cost.
Typical Scenario:
- Milvus vector DB inside EKS VPC (GB~TB scale)
- AgentCore Runtime outside VPC (Bedrock service account)
- When the Agent queries Milvus for RAG search, the call must traverse PrivateLink → increased latency and complexity
Reverse Call Pattern
Architecture where AgentCore Runtime calls MCP servers inside EKS VPC.
PrivateLink Setup
# privatelink-mcp-endpoint.yaml
apiVersion: v1
kind: Service
metadata:
  name: mcp-server-nlb
  namespace: mcp-system
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
spec:
  type: LoadBalancer
  selector:
    app: mcp-server
  ports:
    - port: 443
      targetPort: 8080
      protocol: TCP
---
# Create VPC Endpoint Service (AWS Console or Terraform)
# 1. Confirm NLB ARN
# 2. Create VPC Endpoint Service (Acceptance required: No)
# 3. Add Endpoint access permission to AgentCore IAM Role
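Steps 2 and 3 in the comments above can also be scripted. The following is a minimal sketch using the boto3 EC2 client, assuming the NLB ARN from step 1 is already known; the function name and the choice of the AgentCore role ARN as the allowed principal are illustrative, not a verified configuration.

```python
def create_mcp_endpoint_service(ec2, nlb_arn: str, allowed_principal_arn: str) -> str:
    """Expose an internal NLB as a PrivateLink endpoint service.

    `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the ID of the created endpoint service.
    """
    # Step 2: create the endpoint service in front of the NLB,
    # without manual acceptance of connection requests.
    created = ec2.create_vpc_endpoint_service_configuration(
        NetworkLoadBalancerArns=[nlb_arn],
        AcceptanceRequired=False,
    )
    service_id = created["ServiceConfiguration"]["ServiceId"]

    # Step 3: allow the given principal to create endpoints to this service.
    ec2.modify_vpc_endpoint_service_permissions(
        ServiceId=service_id,
        AddAllowedPrincipals=[allowed_principal_arn],
    )
    return service_id


# Usage (requires AWS credentials; ARNs are placeholders):
#   import boto3
#   service_id = create_mcp_endpoint_service(
#       boto3.client("ec2"),
#       nlb_arn="arn:aws:elasticloadbalancing:...",
#       allowed_principal_arn="arn:aws:iam::ACCOUNT:role/AgentCoreExecutionRole",
#   )
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before running it against a real account.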
S3+KMS Boundary Setup
Securely share sensitive data between AgentCore and EKS via S3 + KMS encryption.
# secure_artifact_manager.py
import boto3
import json


class SecureArtifactManager:
    def __init__(self, bucket: str, kms_key_id: str):
        self.s3 = boto3.client('s3')
        self.kms = boto3.client('kms')
        self.bucket = bucket
        self.kms_key_id = kms_key_id

    def store_sensitive_result(self, agent_id: str, session_id: str, data: dict) -> str:
        """Store sensitive results encrypted in S3"""
        key = f"agentcore/{agent_id}/{session_id}/result.json"
        self.s3.put_object(
            Bucket=self.bucket,
            Key=key,
            Body=json.dumps(data),
            ServerSideEncryption='aws:kms',
            SSEKMSKeyId=self.kms_key_id,
            Metadata={'pii': 'true', 'agent-session': session_id}
        )
        return f"s3://{self.bucket}/{key}"

    def load_from_eks(self, s3_uri: str) -> dict:
        """Load S3 object from EKS Pod (KMS decryption with Pod Identity)"""
        bucket, key = s3_uri.replace('s3://', '').split('/', 1)
        response = self.s3.get_object(Bucket=bucket, Key=key)
        return json.loads(response['Body'].read())
Bucket Policy (resource-based, attached to the S3 bucket):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT:role/AgentCoreExecutionRole"
      },
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::my-secure-artifacts/agentcore/*",
      "Condition": {
        "StringEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT:role/EKSPodRole"
      },
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-secure-artifacts/agentcore/*"
    }
  ]
}
Hand-off Pattern Catalog
Pattern (a): Router-front (AgentCore Gateway→Self-hosted)
AgentCore Gateway analyzes requests and routes to either AgentCore Agent or EKS Self-hosted Agent.
Classification Criteria:
| Complexity Score | Routing Target | Example Tasks |
|---|---|---|
| 0.0-0.3 | EKS Qwen3-4B | Code completion, translation, summarization |
| 0.3-0.7 | AgentCore Claude Haiku | Basic analysis, simple reasoning |
| 0.7-1.0 | AgentCore Claude Sonnet | Architecture review, complex reasoning |
Implementation:
# classifier_router.py
import hashlib
import json

import boto3
import requests
from strands import Agent
from strands.models import BedrockModel

bedrock_runtime = boto3.client('bedrock-agent-runtime')


class HybridRouter:
    def __init__(self):
        self.classifier = Agent(
            model=BedrockModel(model_id="anthropic.claude-haiku-20250320"),
            system_prompt="""You are a request complexity classifier.
Evaluate complexity on a 0.0-1.0 scale and respond with JSON.
{"complexity": 0.0-1.0, "reason": "explanation"}"""
        )

    def route(self, user_request: str) -> dict:
        result = self.classifier(f"Request: {user_request}")
        # The agent returns text, not a dict: parse the JSON reply first
        # (assumes the model answers with bare JSON, as the prompt asks).
        complexity = float(json.loads(str(result))['complexity'])
        if complexity < 0.3:
            return self._route_to_eks(user_request)
        elif complexity < 0.7:
            return self._route_to_agentcore(user_request, model='haiku')
        else:
            return self._route_to_agentcore(user_request, model='sonnet')

    def _route_to_eks(self, request: str) -> dict:
        """Route to EKS Kagent"""
        response = requests.post(
            "http://kagent-service.agents.svc.cluster.local/invoke",
            json={"prompt": request, "model": "qwen3-4b"},
            timeout=30
        )
        response.raise_for_status()
        return {"response": response.json(), "routed_to": "eks-kagent"}

    def _route_to_agentcore(self, request: str, model: str) -> dict:
        """Route to AgentCore"""
        # hash() is randomized per process, so derive a stable ID from a digest instead
        session_suffix = hashlib.sha256(request.encode('utf-8')).hexdigest()[:16]
        response = bedrock_runtime.invoke_agent(
            agentId='AGENT123',
            agentAliasId='ALIAS456',
            sessionId='session-' + session_suffix,
            inputText=request
        )
        # `model` only labels the route here; it is not passed to invoke_agent
        return {"response": response, "routed_to": f"agentcore-{model}"}
Pattern (b): Escalation (Qwen3 Self→AgentCore Reasoning)
EKS Self-hosted Agent processes first, escalates to AgentCore when complexity exceeds threshold.
Escalation Triggers:
- LLM response confidence score < 0.7
- Tool call failure 2+ times
- User explicit request ("need more accurate answer")
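The three triggers above can be combined into a single check. The sketch below is a minimal illustration; the `AttemptResult` fields and the function name are hypothetical, and it assumes the agent can report a confidence score and a tool-failure count.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.7   # escalate below this confidence
MAX_TOOL_FAILURES = 2        # escalate at this many failed tool calls


@dataclass
class AttemptResult:
    confidence: float
    tool_failures: int = 0
    user_requested_escalation: bool = False


def escalation_reason(result: AttemptResult) -> Optional[str]:
    """Return the first matching trigger, or None if no escalation is needed."""
    if result.user_requested_escalation:
        return "user_request"
    if result.tool_failures >= MAX_TOOL_FAILURES:
        return "tool_failures"
    if result.confidence < CONFIDENCE_THRESHOLD:
        return "low_confidence"
    return None


print(escalation_reason(AttemptResult(confidence=0.9, tool_failures=2)))
```

Returning the reason rather than a bare boolean makes it straightforward to log why a request left the self-hosted path.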
Implementation:
# escalation_agent.py
import uuid

import boto3
from strands import Agent

# LocalModel is a placeholder for a model provider that targets the in-cluster
# vLLM endpoint; it is not defined in this sample and must be supplied.


class EscalatingAgent:
    def __init__(self):
        self.primary_agent = Agent(
            model=LocalModel("http://vllm-qwen3.vllm.svc.cluster.local"),
            tools=["code_completion", "translation"]
        )
        self.bedrock_runtime = boto3.client('bedrock-agent-runtime')

    def process(self, user_request: str) -> dict:
        # 1st attempt: EKS Self-hosted Agent
        response = self.primary_agent(user_request)
        # Assumes the response exposes a confidence score in its metadata
        confidence = response.metadata.get('confidence', 0.0)
        if confidence >= 0.7:
            return {"response": response, "agent": "eks-qwen3", "confidence": confidence}

        # Escalation: AgentCore Claude Sonnet
        print(f"⚠️ Low confidence ({confidence}) → AgentCore escalation")
        agentcore_response = self.bedrock_runtime.invoke_agent(
            agentId='EXPERT_AGENT_ID',
            agentAliasId='PROD_ALIAS',
            # Fresh session per escalation so unrelated requests do not share context
            sessionId=f"escalation-{uuid.uuid4()}",
            inputText=f"Original request: {user_request}\n\nInitial attempt failed (Confidence: {confidence}). Accurate answer required."
        )
        return {"response": agentcore_response, "agent": "agentcore-sonnet", "escalated": True}
Pattern (c): Dual-write Memory (AgentCore Memory↔EKS Langfuse)
Synchronize conversation history between AgentCore and EKS Agents to maintain consistent context.
Synchronization Strategy:
| Event | AgentCore → EKS | EKS → AgentCore |
|---|---|---|
| Session start | Memory Session ID → S3 | Langfuse Trace ID → DynamoDB |
| Tool invocation | Action Group execution log → CloudWatch → Langfuse | Langfuse Span → CloudWatch Logs Insights |
| Session end | Memory summarization → S3 → Langfuse | Langfuse session statistics → AgentCore Analytics |
Implementation:
# dual_memory_sync.py
import json
from datetime import datetime, timezone

import boto3
from langfuse import Langfuse


class DualMemoryManager:
    def __init__(self):
        self.s3 = boto3.client('s3')
        self.langfuse = Langfuse(
            public_key="lf_pk_...",
            secret_key="lf_sk_...",
            host="https://langfuse.eks.internal"
        )
        self.agentcore_memory_bucket = "agentcore-memory-export"

    def sync_agentcore_to_langfuse(self, agent_id: str, session_id: str):
        """AgentCore Memory → Langfuse synchronization"""
        # Export AgentCore Memory (S3)
        memory_key = f"{agent_id}/{session_id}/memory.json"
        memory_obj = self.s3.get_object(Bucket=self.agentcore_memory_bucket, Key=memory_key)
        memory_data = json.loads(memory_obj['Body'].read())

        # Create Langfuse Trace
        trace = self.langfuse.trace(
            id=session_id,
            name=f"AgentCore Session {agent_id}",
            metadata={"source": "agentcore", "agent_id": agent_id}
        )
        # One span per conversation turn
        for turn in memory_data['conversation']:
            trace.span(
                name=f"Turn {turn['turn_id']}",
                input=turn['user_input'],
                output=turn['agent_response'],
                metadata={"timestamp": turn['timestamp']}
            )
        trace.update(output=memory_data.get('summary'))
        print(f"✅ AgentCore Memory → Langfuse sync complete: {session_id}")

    def sync_langfuse_to_agentcore(self, trace_id: str, agent_id: str):
        """Langfuse → AgentCore Memory synchronization"""
        trace = self.langfuse.get_trace(trace_id)

        # Convert to AgentCore Memory format
        memory_data = {
            "agent_id": agent_id,
            "session_id": trace_id,
            "conversation": [
                {"turn_id": i, "user_input": span.input, "agent_response": span.output}
                for i, span in enumerate(trace.spans)
            ],
            # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
            "synced_at": datetime.now(timezone.utc).isoformat()
        }

        # Upload to S3 (AgentCore imports)
        self.s3.put_object(
            Bucket=self.agentcore_memory_bucket,
            Key=f"{agent_id}/{trace_id}/imported-memory.json",
            Body=json.dumps(memory_data)
        )
        print(f"✅ Langfuse → AgentCore Memory sync complete: {trace_id}")
Pattern (d): Cost-arbitrage (High-freq=EKS, Low-freq Complex=AgentCore)
Select cost-optimal Agent based on request frequency and complexity.
Cost Model:
| Scenario | Monthly Requests | Avg Tokens | AgentCore Cost | EKS Cost | Optimal Choice |
|---|---|---|---|---|---|
| Code completion | 5M requests | 300 tokens | ~$15,000 | ~$3,500 | EKS |
| Architecture review | 50K requests | 5,000 tokens | ~$2,500 | $3,500 (GPU idle) | AgentCore |
| Translation | 2M requests | 500 tokens | ~$10,000 | ~$2,000 | EKS |
| Complex reasoning | 100K requests | 8,000 tokens | ~$8,000 | $4,000 (dedicated GPU) | AgentCore |
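The AgentCore Cost column is consistent with a flat blended rate of about $10 per million tokens ($0.01 per 1K tokens). That rate is inferred from the table rows, not a published price, so treat the sketch below as a way to check the arithmetic rather than as a pricing reference.

```python
# Inferred from the table rows; not a published price.
BLENDED_USD_PER_MILLION_TOKENS = 10


def monthly_token_cost(requests: int, avg_tokens: int,
                       usd_per_million: int = BLENDED_USD_PER_MILLION_TOKENS) -> float:
    """Monthly cost of a token-priced workload: requests x tokens x rate."""
    return requests * avg_tokens * usd_per_million / 1_000_000


# (monthly requests, average tokens per request) from the table
scenarios = {
    "Code completion": (5_000_000, 300),
    "Architecture review": (50_000, 5_000),
    "Translation": (2_000_000, 500),
    "Complex reasoning": (100_000, 8_000),
}

for name, (requests, tokens) in scenarios.items():
    print(f"{name}: ${monthly_token_cost(requests, tokens):,.0f}")
```

The example coefficient of 0.003 per 1K tokens in the routing logic is a separate illustrative value and will not reproduce these figures.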
Routing Logic:
# cost_arbitrage_router.py
class CostArbitrageRouter:
    def __init__(self):
        self.request_counts = {}  # Track request frequency
        # Cost coefficients (example)
        self.agentcore_cost_per_1k_tokens = 0.003  # Claude Haiku
        self.eks_fixed_monthly = 500  # GPU instance fixed cost
        self.eks_break_even_requests = 200000  # Break-even point

    def should_use_eks(self, task_type: str, estimated_tokens: int) -> bool:
        """Cost-based routing decision"""
        monthly_requests = self.request_counts.get(task_type, 0)
        # High-frequency tasks → EKS
        if monthly_requests > self.eks_break_even_requests:
            return True
        # Low-frequency + complex → AgentCore
        if estimated_tokens > 5000 and monthly_requests < 50000:
            return False
        # Simple task → EKS (amortize fixed cost)
        if estimated_tokens < 1000:
            return True
        return False  # Default: AgentCore
IAM, Session, and Observability Integration Boundaries
AgentCore Identity OAuth Token Propagation
Safely propagate OAuth tokens issued by AgentCore Identity to EKS MCP servers.
EKS MCP Server Authentication Verification:
# mcp_auth_middleware.py
import os
from functools import wraps

import jwt
from flask import Flask, request, jsonify

app = Flask(__name__)

# Signing keys are fetched from the issuer's JWKS endpoint.
# Assumption: the URL is supplied via AGENTCORE_JWKS_URL; the source does not specify it.
jwks_client = jwt.PyJWKClient(os.environ['AGENTCORE_JWKS_URL'])


def validate_agentcore_token(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        token = request.headers.get('X-Forwarded-Authorization', '').removeprefix('Bearer ')
        if not token:
            return jsonify({"error": "Missing authorization token"}), 401
        try:
            # Verify with AgentCore Identity public key
            signing_key = jwks_client.get_signing_key_from_jwt(token)
            payload = jwt.decode(
                token,
                signing_key.key,
                audience="mcp-server",
                issuer="https://bedrock.amazonaws.com/agentcore/identity",
                algorithms=["RS256"],
                options={"verify_signature": True}
            )
            request.user_id = payload['sub']
            request.scopes = payload['scope']
            return f(*args, **kwargs)
        except jwt.ExpiredSignatureError:
            return jsonify({"error": "Token expired"}), 401
        except jwt.PyJWTError:
            # Covers invalid tokens as well as signing-key lookup failures
            return jsonify({"error": "Invalid token"}), 401
    return decorated


@app.route('/mcp/customer-lookup', methods=['POST'])
@validate_agentcore_token
def customer_lookup():
    """Only authenticated users can query customers"""
    customer_id = request.json.get('customer_id')
    # Record audit log with request.user_id
    # fetch_customer is an application-specific lookup that is not shown here
    return {"customer": fetch_customer(customer_id)}
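The middleware stores the token's `scope` claim on the request but the handler does not check it. A minimal sketch of a scope check follows, assuming the claim uses the common OAuth convention of a space-delimited string; the scope name `mcp:customer:read` and the helper name are hypothetical.

```python
def has_scope(scope_claim, required: str) -> bool:
    """Check whether `required` is among the granted scopes.

    Accepts either a space-delimited string (common OAuth convention)
    or a list of scope strings. Matching is exact, never by substring.
    """
    if scope_claim is None:
        return False
    if isinstance(scope_claim, str):
        granted = scope_claim.split()
    else:
        granted = list(scope_claim)
    return required in granted


# Hypothetical scope names for illustration
print(has_scope("mcp:customer:read mcp:orders:read", "mcp:customer:read"))
print(has_scope("mcp:customer:readwrite", "mcp:customer:read"))
```

Splitting the claim before comparing avoids the substring pitfall shown in the second call, where a naive `in` test on the raw string would wrongly grant access.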
CloudWatch GenAI Observability ↔ Langfuse OTel Bridge
Integrate AgentCore traces and EKS Langfuse traces to track complete Agent flows.
Trace Correlation ID Rules:
| Source | Trace ID Format | Parent Span ID |
|---|---|---|
| AgentCore | ac-{session_id}-{timestamp} | ac-root |
| EKS Kagent | eks-{pod_name}-{trace_id} | ac-{session_id} (when calling AgentCore) |
| Hybrid Trace | hybrid-{session_id} | Shared by both sides |
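The ID formats in the table can be generated by small helpers so that both sides build them the same way. This is a sketch under assumptions: the table does not define the timestamp format, so epoch seconds are used here, and the function names are hypothetical.

```python
import time
from typing import Optional


def agentcore_trace_id(session_id: str, timestamp: Optional[int] = None) -> str:
    """ac-{session_id}-{timestamp}; timestamp assumed to be epoch seconds."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"ac-{session_id}-{timestamp}"


def eks_trace_id(pod_name: str, trace_id: str) -> str:
    """eks-{pod_name}-{trace_id}"""
    return f"eks-{pod_name}-{trace_id}"


def eks_parent_span_id(session_id: str) -> str:
    """Parent span used by EKS Kagent when calling AgentCore: ac-{session_id}"""
    return f"ac-{session_id}"


def hybrid_trace_id(session_id: str) -> str:
    """hybrid-{session_id}, shared by both sides."""
    return f"hybrid-{session_id}"


def session_from_hybrid(trace_id: str) -> str:
    """Recover the session ID from a hybrid trace ID."""
    prefix = "hybrid-"
    if not trace_id.startswith(prefix):
        raise ValueError(f"not a hybrid trace ID: {trace_id!r}")
    return trace_id[len(prefix):]


print(hybrid_trace_id("sess-42"))
```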
Lambda Trace Forwarder:
# trace_forwarder_lambda.py
import boto3
import json
import os
import requests
from datetime import datetime

cloudwatch = boto3.client('logs')
langfuse_endpoint = "https://langfuse.eks.internal/api/public/ingestion"


def lambda_handler(event, context):
    """CloudWatch GenAI Observability → Langfuse forwarding"""
    # Records arrive through an SNS subscription
    for record in event['Records']:
        message = json.loads(record['Sns']['Message'])
        if message['source'] == 'aws.bedrock.agentcore':
            trace_data = message['detail']
            # Convert to Langfuse format
            langfuse_trace = {
                "id": f"hybrid-{trace_data['sessionId']}",
                "name": f"AgentCore {trace_data['agentId']}",
                "metadata": {
                    "source": "agentcore",
                    "agent_id": trace_data['agentId'],
                    "aws_region": message['region']
                },
                "spans": [
                    {
                        "name": step['actionGroupName'],
                        "input": step['input'],
                        "output": step['output'],
                        "start_time": step['startTime'],
                        "end_time": step['endTime']
                    }
                    for step in trace_data.get('actionGroupInvocations', [])
                ]
            }
            # Send to Langfuse
            response = requests.post(
                langfuse_endpoint,
                json=langfuse_trace,
                headers={"Authorization": f"Bearer {os.environ['LANGFUSE_API_KEY']}"},
                timeout=10
            )
            # Fail the invocation on HTTP errors instead of logging a false success
            response.raise_for_status()
            print(f"✅ Trace forwarding complete: {trace_data['sessionId']} → Langfuse")
    return {"statusCode": 200}
Gradual Migration Roadmap
Phase 1: AgentCore Only (0-3 months)
Goal: Fast production deployment, zero infrastructure operational burden
Checklist:
- Select Bedrock model (Claude Sonnet/Haiku)
- Agent implementation with Strands SDK
- Deploy to AgentCore (agentcore deploy)
- Configure Knowledge Bases RAG
- Enable CloudWatch GenAI Observability
Exit Criteria (Phase 2 transition triggers):
- Monthly inference volume exceeds 500K requests
- Bedrock token cost exceeds $1,500/month
- High VPC internal tool call frequency (p95 latency > 100ms)
Phase 2: Bedrock + Self-hosted SLM (3-6 months)
Goal: Cost optimization, offload simple tasks to EKS Qwen3-4B
Checklist:
- Configure EKS cluster (Auto Mode or Karpenter)
- Deploy Qwen3-4B with vLLM
- Implement LLM Classifier (Cascade Routing)
- Configure kgateway + Bifrost 2-Tier Gateway
- Build cost dashboard (track AgentCore vs EKS cost)
Exit Criteria (Phase 3 transition triggers):
- Context sharing required between EKS Agent and AgentCore Agent
- The same session must be maintained on both sides
- Fine-tuned custom models needed
Phase 3: Full Hybrid Cross-routing (6-12 months)
Goal: Bidirectional routing, unified context, optimal cost
Checklist:
- Implement dual-write Memory synchronization (pattern c)
- Integrate Trace Correlation ID
- PrivateLink Endpoint for MCP
- Implement cost-arbitrage Router (pattern d)
- Implement escalation logic (pattern b)
- Unified dashboard (AgentCore + EKS integrated observability)
Success Metrics:
- Cost reduction rate: 40-60% (vs Bedrock Only)
- p95 latency: 20% improvement vs AgentCore standalone
- Session context consistency: 95%+
- Agent availability: 99.9% (both-side Failover)
Transition Trigger Metrics
Quantitative metrics to determine each Phase transition.
| Metric | Phase 1 → 2 Threshold | Phase 2 → 3 Threshold |
|---|---|---|
| Monthly Inference Volume | > 500K requests | > 1.5M requests |
| Monthly Cost | > $1,500 | > $3,000 |
| Average Latency (p95) | > 100ms | > 200ms |
| Session Context Loss Rate | N/A | > 5% |
| Custom Model Requirement | Fine-tuning needed | Domain-specific SLM needed |
| Team K8s Capability | Beginner | Intermediate+ |
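The quantitative rows of the table can be checked automatically. The sketch below covers only the four numeric metrics and reports which thresholds are exceeded; whether one exceeded threshold or several should trigger a transition is not stated in the table, so that policy is left to the caller. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Metrics:
    monthly_requests: int
    monthly_cost_usd: float
    p95_latency_ms: float
    context_loss_rate: Optional[float] = None  # fraction, e.g. 0.05 for 5%


# Thresholds copied from the table; qualitative rows are omitted.
THRESHOLDS = {
    "1->2": {
        "monthly_requests": 500_000,
        "monthly_cost_usd": 1_500,
        "p95_latency_ms": 100,
    },
    "2->3": {
        "monthly_requests": 1_500_000,
        "monthly_cost_usd": 3_000,
        "p95_latency_ms": 200,
        "context_loss_rate": 0.05,
    },
}


def exceeded_thresholds(metrics: Metrics, transition: str) -> List[str]:
    """Return the names of metrics that exceed the given transition's thresholds."""
    hits = []
    for name, limit in THRESHOLDS[transition].items():
        value = getattr(metrics, name)
        if value is not None and value > limit:
            hits.append(name)
    return hits


current = Metrics(monthly_requests=600_000, monthly_cost_usd=1_200, p95_latency_ms=80)
print(exceeded_thresholds(current, "1->2"))
```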
References
Official Documentation
- Amazon Bedrock AgentCore — AgentCore official documentation
- AgentCore Identity & Policy — Authentication & policy guide
- EKS PrivateLink — VPC internal connection
- AWS PrivateLink for Services — Service endpoints
Papers / Technical Blogs
- CloudWatch Generative AI Observability — Observability integration
- Hybrid AI Architecture Patterns — Hybrid patterns
- Langfuse Self-Hosting Guide — Self-hosting guide
- Building Cost-Effective AI Systems — Cost optimization
Related Documents (Internal)
- AWS Native Platform — AgentCore 7-service overview
- EKS-based Open Architecture — Self-hosted stack
- SageMaker-EKS Integration — VPC/IAM reference
- Coding Tools Cost Analysis — Break-even calculation