
AgentCore Hybrid Strategy

Verification Status

The implementation samples in this document (IAM OAuth propagation, Lambda Trace Forwarder, Dual-write Memory, Cost-arbitrage Router) are design references and have no E2E integration test history for AgentCore + EKS combinations in us-east-2 since the teardown (2026-04-18). E2E verification of the IAM boundaries, trace correlation, and the 3 PrivateLink endpoint types will be performed before the Phase 3 deployment.

Verification Pending

The Decision Matrix's 8-axis weights, the hand-off pattern catalog, the IAM session-sharing flow, and the migration roadmap are in a pre-review state, awaiting MLOps lead review and real-deployment E2E verification. Value footnotes and the banner will be updated upon validation.

Deployment verification tracking: Issue #6

Overview

Bedrock AgentCore is a powerful managed agent platform, but enterprise environments often require combining it with self-hosted infrastructure. This document provides a decision framework and a validated pattern catalog for designing an optimal hybrid architecture that combines AgentCore's serverless advantages with the flexibility of EKS-based self-hosted infrastructure.

Prerequisite Documents

Before reading this document, refer to:


Hybrid Deployment Motivation

Single Approach Limitations

Constraints of AgentCore Only:

  • Centered on Bedrock's 100+ GA models (cannot host your own fine-tuned SLMs)
  • Token-based pricing (costs grow quickly for high-frequency simple tasks)
  • Added latency when reaching on-premises data sources
  • Complex PrivateLink setup when MCP servers live inside a VPC

Constraints of EKS Self-hosted Only:

  • Operational burden of running the agent runtime infrastructure (Kagent Pods + Redis state store)
  • Autoscaling (KEDA queue-based) is more complex than serverless scaling
  • No managed memory management (must be implemented directly)
  • Multi-agent orchestration framework must be built directly

Core Value of Hybrid

Cost Break-even Calculation

| Monthly Inference Volume | AgentCore Only | EKS Self-hosted Only | Hybrid (Cascade) | Optimal Approach |
|---|---|---|---|---|
| ~100K requests | $300-500 | $800-1,200 | $400-700 | AgentCore Only |
| ~500K requests | $1,500-2,000 | $1,200-1,800 | $800-1,200 | Hybrid starting point |
| ~1.5M requests | $4,500-6,000 | $2,500-3,500 | $2,000-2,800 | Hybrid required |
| ~5M+ requests | $15,000+ | $3,500-5,000 | $4,000-6,000 | EKS-centric Hybrid |
Break-even Point

The hybrid approach becomes cost-effective at 500K+ monthly inference requests. See Coding Tools Cost Analysis for the detailed calculation formulas.
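The break-even claim above can be sketched numerically. A minimal model, assuming a flat per-request AgentCore token cost versus a fixed-plus-marginal EKS cost; the coefficients below are illustrative values roughly fitted to the table, not measured figures:

```python
def break_even_requests(agentcore_per_request: float,
                        eks_fixed_monthly: float,
                        eks_per_request: float) -> float:
    """Monthly request volume at which EKS self-hosting starts to
    beat AgentCore's pure per-request pricing."""
    return eks_fixed_monthly / (agentcore_per_request - eks_per_request)

# Illustrative coefficients, roughly fitted to the cost table above
volume = break_even_requests(agentcore_per_request=0.003,
                             eks_fixed_monthly=950.0,
                             eks_per_request=0.00066)
# volume ≈ 406K requests/month, consistent with the 500K+ guidance
```

Below the break-even volume the EKS fixed cost (GPU nodes, control plane) dominates; above it, AgentCore's linear token cost does.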


Decision Matrix: Where to Place Agents

Evaluate agent placement using 8 core dimensions.

| Evaluation Axis | AgentCore | EKS Kagent | Hybrid | Decision Criteria |
|---|---|---|---|---|
| Inference Latency | Medium (50-200ms) | Low (10-50ms) | Low | VPC internal tool calls → EKS |
| Cost | High for high-frequency | Low for high-frequency | Optimal | Simple=EKS, Complex=AgentCore |
| PII Handling | VPC external (constraints) | VPC internal (advantageous) | Flexible | Sensitive data → EKS MCP |
| Model Customization | Bedrock models only | Free (Qwen3, custom) | Free | Fine-tuned models → EKS |
| Tool Chain | REST→MCP conversion | K8s native | Both | External SaaS → AgentCore Gateway |
| Session Length | Max 8 hours | Unlimited | Unlimited | Long conversations → EKS State |
| Audit Requirements | CloudTrail automatic | Direct implementation required | CloudTrail + Custom | Regulatory → AgentCore priority |
| Team Capability | Kubernetes unnecessary | Kubernetes required | Optional | K8s beginner → AgentCore-centric |
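The 8 axes can be combined into a weighted placement score. A minimal sketch: each axis is scored from -1.0 (favors AgentCore) to +1.0 (favors EKS), and a weighted sum decides placement. Both the equal weights and the example scores below are illustrative assumptions (the real weights are noted above as pending MLOps review):

```python
AXES = ["latency", "cost", "pii", "custom_model",
        "tool_chain", "session_length", "audit", "team_k8s"]

def placement_score(scores: dict, weights: dict) -> float:
    """Weighted sum over the 8 axes: positive suggests EKS,
    negative suggests AgentCore."""
    return sum(weights[a] * scores[a] for a in AXES)

weights = {a: 1 / len(AXES) for a in AXES}  # equal weights as a placeholder
# Example: latency-sensitive, PII-heavy workload with a K8s-capable team
scores = {"latency": 1.0, "cost": 0.5, "pii": 1.0, "custom_model": 0.0,
          "tool_chain": -0.5, "session_length": 0.5, "audit": -1.0,
          "team_k8s": 1.0}
decision = "eks" if placement_score(scores, weights) > 0 else "agentcore"
```

For borderline scores near zero, the hybrid patterns in the catalog below are the natural choice rather than a hard either/or placement.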

Decision Flowchart


Data Gravity and Tool Colocation Patterns

What is Data Gravity?

Placing compute where the data resides minimizes network latency and cost.

Typical Scenario:

  • Milvus vector DB inside the EKS VPC (GB-TB scale)
  • AgentCore Runtime outside the VPC (Bedrock service account)
  • When the agent queries Milvus for RAG search, a PrivateLink traversal is required → increased latency and complexity

Reverse Call Pattern

Architecture where AgentCore Runtime calls MCP servers inside EKS VPC.

# privatelink-mcp-endpoint.yaml
apiVersion: v1
kind: Service
metadata:
  name: mcp-server-nlb
  namespace: mcp-system
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
spec:
  type: LoadBalancer
  selector:
    app: mcp-server
  ports:
    - port: 443
      targetPort: 8080
      protocol: TCP
---
# Create VPC Endpoint Service (AWS Console or Terraform)
# 1. Confirm the NLB ARN
# 2. Create the VPC Endpoint Service (Acceptance required: No)
# 3. Add Endpoint access permission to the AgentCore IAM Role

S3+KMS Boundary Setup

Securely share sensitive data between AgentCore and EKS via S3 + KMS encryption.

# secure_artifact_manager.py
import json

import boto3

class SecureArtifactManager:
    def __init__(self, bucket: str, kms_key_id: str):
        self.s3 = boto3.client('s3')
        self.kms = boto3.client('kms')
        self.bucket = bucket
        self.kms_key_id = kms_key_id

    def store_sensitive_result(self, agent_id: str, session_id: str, data: dict) -> str:
        """Store sensitive results encrypted in S3"""
        key = f"agentcore/{agent_id}/{session_id}/result.json"

        self.s3.put_object(
            Bucket=self.bucket,
            Key=key,
            Body=json.dumps(data),
            ServerSideEncryption='aws:kms',
            SSEKMSKeyId=self.kms_key_id,
            Metadata={'pii': 'true', 'agent-session': session_id}
        )
        return f"s3://{self.bucket}/{key}"

    def load_from_eks(self, s3_uri: str) -> dict:
        """Load an S3 object from an EKS Pod (KMS decryption via Pod Identity)"""
        bucket, key = s3_uri.replace('s3://', '').split('/', 1)
        response = self.s3.get_object(Bucket=bucket, Key=key)
        return json.loads(response['Body'].read())

IAM Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT:role/AgentCoreExecutionRole"
      },
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::my-secure-artifacts/agentcore/*",
      "Condition": {
        "StringEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT:role/EKSPodRole"
      },
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-secure-artifacts/agentcore/*"
    }
  ]
}

Hand-off Pattern Catalog

Pattern (a): Router-front (AgentCore Gateway→Self-hosted)

AgentCore Gateway analyzes requests and routes to either AgentCore Agent or EKS Self-hosted Agent.

Classification Criteria:

| Complexity Score | Routing Target | Example Tasks |
|---|---|---|
| 0.0-0.3 | EKS Qwen3-4B | Code completion, translation, summarization |
| 0.3-0.7 | AgentCore Claude Haiku | Basic analysis, simple reasoning |
| 0.7-1.0 | AgentCore Claude Sonnet | Architecture review, complex reasoning |

Implementation:

# classifier_router.py
import json

import boto3
import requests
from strands import Agent
from strands.models import BedrockModel

bedrock_runtime = boto3.client('bedrock-agent-runtime')

class HybridRouter:
    def __init__(self):
        self.classifier = Agent(
            model=BedrockModel(model_id="anthropic.claude-haiku-20250320"),
            system_prompt="""You are a request complexity classifier.
Evaluate complexity on a 0.0-1.0 scale and respond with JSON.
{"complexity": 0.0-1.0, "reason": "explanation"}"""
        )

    def route(self, user_request: str) -> dict:
        # The classifier replies with text; parse the JSON it was instructed to emit
        classification = json.loads(str(self.classifier(f"Request: {user_request}")))
        complexity = classification['complexity']

        if complexity < 0.3:
            return self._route_to_eks(user_request)
        elif complexity < 0.7:
            return self._route_to_agentcore(user_request, model='haiku')
        else:
            return self._route_to_agentcore(user_request, model='sonnet')

    def _route_to_eks(self, request: str) -> dict:
        """Route to EKS Kagent"""
        response = requests.post(
            "http://kagent-service.agents.svc.cluster.local/invoke",
            json={"prompt": request, "model": "qwen3-4b"}
        )
        return {"response": response.json(), "routed_to": "eks-kagent"}

    def _route_to_agentcore(self, request: str, model: str) -> dict:
        """Route to AgentCore"""
        response = bedrock_runtime.invoke_agent(
            agentId='AGENT123',
            agentAliasId='ALIAS456',
            sessionId='session-' + str(abs(hash(request))),
            inputText=request
        )
        return {"response": response, "routed_to": f"agentcore-{model}"}

Pattern (b): Escalation (Qwen3 Self→AgentCore Reasoning)

EKS Self-hosted Agent processes first, escalates to AgentCore when complexity exceeds threshold.

Escalation Triggers:

  • LLM response confidence score < 0.7
  • Tool call failure 2+ times
  • User explicit request ("need more accurate answer")
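The three triggers above can be folded into a single guard. A minimal sketch; the field names and thresholds mirror the list, any of the three conditions is sufficient to escalate:

```python
def should_escalate(confidence: float,
                    tool_failures: int,
                    user_requested: bool) -> bool:
    """Escalate from the EKS agent to AgentCore when any trigger fires."""
    return (confidence < 0.7          # low-confidence LLM response
            or tool_failures >= 2     # repeated tool-call failures
            or user_requested)        # explicit user request for a better answer
```

Keeping the triggers in one function makes the escalation policy auditable and easy to tune independently of the agent implementation.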

Implementation:

# escalation_agent.py
import boto3
from strands import Agent

class EscalatingAgent:
    def __init__(self):
        # LocalModel: placeholder for an OpenAI-compatible model client
        # pointed at the in-cluster vLLM endpoint
        self.primary_agent = Agent(
            model=LocalModel("http://vllm-qwen3.vllm.svc.cluster.local"),
            tools=["code_completion", "translation"]
        )
        self.bedrock_runtime = boto3.client('bedrock-agent-runtime')

    def process(self, user_request: str) -> dict:
        # 1st attempt: EKS self-hosted agent
        response = self.primary_agent(user_request)
        confidence = response.metadata.get('confidence', 0.0)

        if confidence >= 0.7:
            return {"response": response, "agent": "eks-qwen3", "confidence": confidence}

        # Escalation: AgentCore Claude Sonnet
        print(f"⚠️ Low confidence ({confidence}) → AgentCore escalation")
        agentcore_response = self.bedrock_runtime.invoke_agent(
            agentId='EXPERT_AGENT_ID',
            agentAliasId='PROD_ALIAS',
            sessionId='escalation-session',
            inputText=(f"Original request: {user_request}\n\n"
                       f"Initial attempt failed (Confidence: {confidence}). "
                       "Accurate answer required.")
        )
        return {"response": agentcore_response, "agent": "agentcore-sonnet", "escalated": True}

Pattern (c): Dual-write Memory (AgentCore Memory↔EKS Langfuse)

Synchronize conversation history between AgentCore and EKS Agents to maintain consistent context.

Synchronization Strategy:

| Event | AgentCore → EKS | EKS → AgentCore |
|---|---|---|
| Session start | Memory Session ID → S3 | Langfuse Trace ID → DynamoDB |
| Tool invocation | Action Group execution log → CloudWatch → Langfuse | Langfuse Span → CloudWatch Logs Insights |
| Session end | Memory summarization → S3 → Langfuse | Langfuse session statistics → AgentCore Analytics |

Implementation:

# dual_memory_sync.py
import json
from datetime import datetime

import boto3
from langfuse import Langfuse

class DualMemoryManager:
    def __init__(self):
        self.s3 = boto3.client('s3')
        self.langfuse = Langfuse(
            public_key="lf_pk_...",
            secret_key="lf_sk_...",
            host="https://langfuse.eks.internal"
        )
        self.agentcore_memory_bucket = "agentcore-memory-export"

    def sync_agentcore_to_langfuse(self, agent_id: str, session_id: str):
        """AgentCore Memory → Langfuse synchronization"""
        # Export AgentCore Memory (S3)
        memory_key = f"{agent_id}/{session_id}/memory.json"
        memory_obj = self.s3.get_object(Bucket=self.agentcore_memory_bucket, Key=memory_key)
        memory_data = json.loads(memory_obj['Body'].read())

        # Create a Langfuse Trace
        trace = self.langfuse.trace(
            id=session_id,
            name=f"AgentCore Session {agent_id}",
            metadata={"source": "agentcore", "agent_id": agent_id}
        )

        for turn in memory_data['conversation']:
            trace.span(
                name=f"Turn {turn['turn_id']}",
                input=turn['user_input'],
                output=turn['agent_response'],
                metadata={"timestamp": turn['timestamp']}
            )

        trace.update(output=memory_data.get('summary'))
        print(f"✅ AgentCore Memory → Langfuse sync complete: {session_id}")

    def sync_langfuse_to_agentcore(self, trace_id: str, agent_id: str):
        """Langfuse → AgentCore Memory synchronization"""
        trace = self.langfuse.get_trace(trace_id)

        # Convert to AgentCore Memory format
        memory_data = {
            "agent_id": agent_id,
            "session_id": trace_id,
            "conversation": [
                {"turn_id": i, "user_input": span.input, "agent_response": span.output}
                for i, span in enumerate(trace.spans)
            ],
            "synced_at": datetime.utcnow().isoformat()
        }

        # Upload to S3 (AgentCore imports it)
        self.s3.put_object(
            Bucket=self.agentcore_memory_bucket,
            Key=f"{agent_id}/{trace_id}/imported-memory.json",
            Body=json.dumps(memory_data)
        )
        print(f"✅ Langfuse → AgentCore Memory sync complete: {trace_id}")

Pattern (d): Cost-arbitrage (High-freq=EKS, Low-freq Complex=AgentCore)

Select cost-optimal Agent based on request frequency and complexity.

Cost Model:

| Scenario | Monthly Requests | Avg Tokens | AgentCore Cost | EKS Cost | Optimal Choice |
|---|---|---|---|---|---|
| Code completion | 5M requests | 300 tokens | ~$15,000 | ~$3,500 | EKS |
| Architecture review | 50K requests | 5,000 tokens | ~$2,500 | $3,500 (GPU idle) | AgentCore |
| Translation | 2M requests | 500 tokens | ~$10,000 | ~$2,000 | EKS |
| Complex reasoning | 100K requests | 8,000 tokens | ~$8,000 | $4,000 (dedicated GPU) | AgentCore |

Routing Logic:

# cost_arbitrage_router.py
class CostArbitrageRouter:
    def __init__(self):
        self.request_counts = {}  # Track request frequency per task type

        # Cost coefficients (example)
        self.agentcore_cost_per_1k_tokens = 0.003  # Claude Haiku
        self.eks_fixed_monthly = 500               # GPU instance fixed cost
        self.eks_break_even_requests = 200000      # Break-even point

    def should_use_eks(self, task_type: str, estimated_tokens: int) -> bool:
        """Cost-based routing decision"""
        monthly_requests = self.request_counts.get(task_type, 0)

        # High-frequency tasks → EKS
        if monthly_requests > self.eks_break_even_requests:
            return True

        # Low-frequency + complex → AgentCore
        if estimated_tokens > 5000 and monthly_requests < 50000:
            return False

        # Simple task → EKS (amortize fixed cost)
        if estimated_tokens < 1000:
            return True

        return False  # Default: AgentCore

IAM, Session, and Observability Integration Boundaries

AgentCore Identity OAuth Token Propagation

Safely propagate OAuth tokens issued by AgentCore Identity to EKS MCP servers.

EKS MCP Server Authentication Verification:

# mcp_auth_middleware.py
from functools import wraps

import jwt
from flask import Flask, jsonify, request

app = Flask(__name__)

# Public key for verifying AgentCore Identity tokens
# (in practice, fetch and cache it from the issuer's JWKS endpoint)
AGENTCORE_PUBLIC_KEY = "-----BEGIN PUBLIC KEY-----..."

def validate_agentcore_token(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        token = request.headers.get('X-Forwarded-Authorization', '').replace('Bearer ', '')

        if not token:
            return jsonify({"error": "Missing authorization token"}), 401

        try:
            # Verify signature, audience, and issuer
            payload = jwt.decode(
                token,
                AGENTCORE_PUBLIC_KEY,
                audience="mcp-server",
                issuer="https://bedrock.amazonaws.com/agentcore/identity",
                algorithms=["RS256"],
                options={"verify_signature": True}
            )
            request.user_id = payload['sub']
            request.scopes = payload['scope']
            return f(*args, **kwargs)
        except jwt.ExpiredSignatureError:
            return jsonify({"error": "Token expired"}), 401
        except jwt.InvalidTokenError:
            return jsonify({"error": "Invalid token"}), 401

    return decorated

@app.route('/mcp/customer-lookup', methods=['POST'])
@validate_agentcore_token
def customer_lookup():
    """Only authenticated callers can query customers"""
    customer_id = request.json.get('customer_id')
    # Record an audit log entry with request.user_id
    return {"customer": fetch_customer(customer_id)}
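On the calling side, the AgentCore-issued token is forwarded as-is rather than re-minted. A minimal sketch; the header name matches the middleware above, while the endpoint URL and token source are illustrative assumptions:

```python
def build_forwarded_headers(oauth_token: str) -> dict:
    """Propagate the AgentCore Identity workload token to an EKS MCP server."""
    return {
        "X-Forwarded-Authorization": f"Bearer {oauth_token}",
        "Content-Type": "application/json",
    }

# Usage (inside an AgentCore tool handler, hypothetical endpoint):
# requests.post("https://mcp.internal.example/mcp/customer-lookup",
#               json={"customer_id": "C-123"},
#               headers=build_forwarded_headers(token))
```

Forwarding the original token preserves the end user's identity and scopes for audit logging on the EKS side.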

CloudWatch GenAI Observability ↔ Langfuse OTel Bridge

Integrate AgentCore traces and EKS Langfuse traces to track complete Agent flows.

Trace Correlation ID Rules:

| Source | Trace ID Format | Parent Span ID |
|---|---|---|
| AgentCore | ac-{session_id}-{timestamp} | ac-root |
| EKS Kagent | eks-{pod_name}-{trace_id} | ac-{session_id} (when calling AgentCore) |
| Hybrid Trace | hybrid-{session_id} | Shared by both sides |
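The ID rules in the table can be sketched as small helpers so both sides generate identical correlation IDs. The formats follow the table; the timestamp source is an assumption:

```python
import time

def agentcore_trace_id(session_id: str, ts=None) -> str:
    """ac-{session_id}-{timestamp}"""
    return f"ac-{session_id}-{ts if ts is not None else int(time.time())}"

def eks_trace_id(pod_name: str, trace_id: str) -> str:
    """eks-{pod_name}-{trace_id}"""
    return f"eks-{pod_name}-{trace_id}"

def hybrid_trace_id(session_id: str) -> str:
    """hybrid-{session_id}: the shared ID both sides attach spans to."""
    return f"hybrid-{session_id}"
```

Deriving every ID from the session ID means a single query on either backend can reassemble the full cross-platform trace.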

Lambda Trace Forwarder:

# trace_forwarder_lambda.py
import json
import os

import boto3
import requests

cloudwatch = boto3.client('logs')
langfuse_endpoint = "https://langfuse.eks.internal/api/public/ingestion"

def lambda_handler(event, context):
    """CloudWatch GenAI Observability → Langfuse forwarding"""
    for record in event['Records']:
        message = json.loads(record['Sns']['Message'])

        if message['source'] == 'aws.bedrock.agentcore':
            trace_data = message['detail']

            # Convert to Langfuse format
            langfuse_trace = {
                "id": f"hybrid-{trace_data['sessionId']}",
                "name": f"AgentCore {trace_data['agentId']}",
                "metadata": {
                    "source": "agentcore",
                    "agent_id": trace_data['agentId'],
                    "aws_region": message['region']
                },
                "spans": [
                    {
                        "name": step['actionGroupName'],
                        "input": step['input'],
                        "output": step['output'],
                        "start_time": step['startTime'],
                        "end_time": step['endTime']
                    }
                    for step in trace_data.get('actionGroupInvocations', [])
                ]
            }

            # Send to Langfuse
            response = requests.post(
                langfuse_endpoint,
                json=langfuse_trace,
                headers={"Authorization": f"Bearer {os.environ['LANGFUSE_API_KEY']}"}
            )
            print(f"✅ Trace forwarding complete: {trace_data['sessionId']} → Langfuse")

    return {"statusCode": 200}

Gradual Migration Roadmap

Phase 1: AgentCore Only (0-3 months)

Goal: Fast production deployment, zero infrastructure operational burden

Checklist:

  • Select Bedrock model (Claude Sonnet/Haiku)
  • Agent implementation with Strands SDK
  • Deploy to AgentCore (agentcore deploy)
  • Configure Knowledge Bases RAG
  • Enable CloudWatch GenAI Observability

Exit Criteria (Phase 2 transition triggers):

  • Monthly inference volume exceeds 500K requests
  • Bedrock token cost exceeds $1,500/month
  • High VPC internal tool call frequency (p95 latency > 100ms)

Phase 2: Bedrock + Self-hosted SLM (3-6 months)

Goal: Cost optimization, offload simple tasks to EKS Qwen3-4B

Checklist:

  • Configure EKS cluster (Auto Mode or Karpenter)
  • Deploy Qwen3-4B with vLLM
  • Implement LLM Classifier (Cascade Routing)
  • Configure kgateway + Bifrost 2-Tier Gateway
  • Build cost dashboard (track AgentCore vs EKS cost)

Exit Criteria (Phase 3 transition triggers):

  • Context sharing required between EKS Agent and AgentCore Agent
  • Require maintaining same session on both sides
  • Fine-tuned custom models needed

Phase 3: Full Hybrid Cross-routing (6-12 months)

Goal: Bidirectional routing, unified context, optimal cost

Checklist:

  • Implement dual-write Memory synchronization (pattern c)
  • Integrate Trace Correlation ID
  • PrivateLink Endpoint for MCP
  • Implement cost-arbitrage Router (pattern d)
  • Implement escalation logic (pattern b)
  • Unified dashboard (AgentCore + EKS integrated observability)

Success Metrics:

  • Cost reduction: 40-60% (vs Bedrock only)
  • p95 latency: 20% improvement over AgentCore standalone
  • Session context consistency: 95%+
  • Agent availability: 99.9% (failover across both sides)

Transition Trigger Metrics

Quantitative metrics to determine each Phase transition.

| Metric | Phase 1 → 2 Threshold | Phase 2 → 3 Threshold |
|---|---|---|
| Monthly Inference Volume | > 500K requests | > 1.5M requests |
| Monthly Cost | > $1,500 | > $3,000 |
| Average Latency (p95) | > 100ms | > 200ms |
| Session Context Loss Rate | N/A | > 5% |
| Custom Model Requirement | Fine-tuning needed | Domain-specific SLM needed |
| Team K8s Capability | Beginner | Intermediate+ |
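The quantitative rows of the table can be evaluated automatically against monthly metrics. A minimal sketch using the numeric thresholds from the table; the qualitative axes (custom model need, team capability) are left to human judgment and omitted here:

```python
def next_phase(current_phase: int, monthly_requests: int,
               monthly_cost: float, p95_latency_ms: float,
               context_loss_rate: float = 0.0) -> int:
    """Return the recommended phase given this month's metrics;
    any single threshold breach triggers the transition."""
    if current_phase == 1 and (monthly_requests > 500_000
                               or monthly_cost > 1_500
                               or p95_latency_ms > 100):
        return 2
    if current_phase == 2 and (monthly_requests > 1_500_000
                               or monthly_cost > 3_000
                               or p95_latency_ms > 200
                               or context_loss_rate > 0.05):
        return 3
    return current_phase
```

Running this check monthly (e.g. from the cost dashboard built in Phase 2) turns the roadmap into a measurable trigger rather than a judgment call.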

References

Official Documentation

Papers / Technical Blogs