
AI Platform Selection Guide: Managed vs Open Source vs Hybrid

When customers begin building AI in-house, the first question they face is "Should we use managed services or build with open source?" This document provides a decision framework for choosing the optimal approach among SageMaker Unified Studio, Bedrock AgentCore, and an EKS-based open-source architecture, based on customer circumstances.

AI platform construction paths are broadly divided into three categories:

  • (A) AWS Managed: Start with no infrastructure operations using Bedrock + Strands SDK + AgentCore
  • (B) EKS + Open Source: Secure maximum control with self-hosting using vLLM, llm-d, Langfuse, etc.
  • (C) Hybrid: Achieve balance of cost, control, and speed by combining Bedrock and EKS

Prerequisites

Before reading this document, refer to the following:


AWS AI Platform Service Landscape

AWS AI services are structured into 4 tiers. Customers start at lower tiers and move to higher tiers as needed.

Key Tier Distinctions:

  • Tier 1-3: AWS managed services allow you to start without infrastructure operations.
  • Tier 4: Choose when fine-grained control, cost optimization, or data sovereignty is required.
  • Most customers start at Tier 1 and expand incrementally, while enterprises tend to combine Tier 3 and Tier 4 in hybrid configurations.

SageMaker Unified Studio

Integrated AI Development Environment

SageMaker Unified Studio is an integrated AI development environment released in H2 2024, designed to perform ML/data/analytics tasks in a single IDE. Previously, teams had to use fragmented tools like SageMaker Studio Classic, Athena, and Glue Studio separately, but Unified Studio consolidates them into one.

Key Differentiators

| Feature | Description | Improvement vs Previous |
|---|---|---|
| Unified IDE | JupyterLab + SQL Editor + no-code interface | Data+ML integration vs SageMaker Studio Classic |
| Built-in MLflow | Experiment tracking, model registry, model comparison | No need to operate a separate MLflow server |
| Lakehouse Integration | Apache Iceberg tables, Glue Catalog native integration | One-stop data engineering → ML pipeline |
| Governance Collaboration | Amazon DataZone-based IAM sharing, data lineage tracking | Secure data/model sharing between teams |
| Unified Compute | Manage training, notebooks, pipelines in a single environment | Prevents resource fragmentation |

Positioning: When to Choose?

Key Message

SageMaker Unified Studio is a development environment (Tier 2). It has a complementary relationship with Bedrock (inference) or EKS (serving), and provides the greatest value when data teams and ML teams need to collaborate on a single platform.


Platform Comparison Matrix

The optimal approach varies based on customer circumstances. Compare each platform option across 5 key evaluation dimensions.

AI Platform 5-Axis Comparison Matrix
| Evaluation Axis | Bedrock + AgentCore | SageMaker Unified Studio | EKS + Open Source | Hybrid |
|---|---|---|---|---|
| Cost Structure | Usage-based pricing, no GPU management | Instance + usage hybrid, notebook/training billed separately | Spot/MIG optimization, upfront investment needed | Bedrock + self-hosted SLM, ~66% savings with Cascade |
| Operational Burden | Minimal — AWS fully managed | Low — minimal infra management, focus on ML workflows | Medium — K8s/GPU ops capability needed (reduced with Auto Mode) | Medium — understanding of both environments required |
| Data Sovereignty | Processed within AWS region | VPC isolation, training data stays in S3 | Full control — model and data isolated within VPC | Selective isolation per workload |
| Customization | Limited — Bedrock-supported models, within Guardrails scope | MLflow, custom pipelines, fine-tuning supported | Fully flexible — any open model, LoRA, custom gateway | Selective expansion as needed |
| Time-to-Value | 2-4 weeks — start with API calls only | 4-8 weeks — environment setup + pipeline configuration | 2-4 months — cluster + GPU + model serving setup | 1-3 months — start on Bedrock, gradually expand to EKS |
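
The five axes above can be turned into a simple weighted scorecard for customer workshops. The sketch below is illustrative only: the per-option scores (1-5) and the example weights are assumptions to be tuned with the customer, not published benchmarks.

```python
# Weighted scoring over the five evaluation axes.
# Scores (1-5 per axis) and weights are illustrative assumptions.

OPTIONS = {
    "Bedrock + AgentCore":      {"cost": 3, "ops": 5, "sovereignty": 2, "custom": 2, "ttv": 5},
    "SageMaker Unified Studio": {"cost": 3, "ops": 4, "sovereignty": 3, "custom": 4, "ttv": 3},
    "EKS + Open Source":        {"cost": 4, "ops": 2, "sovereignty": 5, "custom": 5, "ttv": 2},
    "Hybrid":                   {"cost": 4, "ops": 3, "sovereignty": 4, "custom": 4, "ttv": 3},
}

def best_option(weights: dict) -> str:
    """Return the option with the highest weighted score."""
    def score(axes: dict) -> float:
        return sum(weights[axis] * value for axis, value in axes.items())
    return max(OPTIONS, key=lambda name: score(OPTIONS[name]))

# A speed-focused startup weights time-to-value and low ops burden heavily:
print(best_option({"cost": 1, "ops": 2, "sovereignty": 1, "custom": 1, "ttv": 3}))
# → Bedrock + AgentCore
```

Re-running with sovereignty and customization weighted highest instead would favor the EKS open-source option, which is the point of the exercise: the "best" platform is a function of the customer's priorities.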
Detailed Cost Analysis

For detailed cost comparison between self-hosting and Bedrock (break-even points, Cascade Routing savings), refer to Coding Tools Cost Analysis.


Decision Flowchart

A decision flow you can use in customer meetings. Find the optimal approach by answering key questions.

The Flowchart is a Starting Point

This flowchart is the starting point for conversation, not the final conclusion. Actual customer situations are complex, and most enterprises converge on a hybrid approach.


Starting points and expansion paths vary based on the customer's current AI/ML maturity level.

AI Platform Maturity Path
| Maturity Level | Characteristics | Recommended Stack | Core Services | Timeline |
|---|---|---|---|---|
| Level 1 — AI Explorer | No AI/ML workloads, need fast PoC | AWS Managed First | Bedrock API + Strands SDK + AgentCore | 2-4 weeks |
| Level 2 — AI Builder | Some ML in production, training pipelines needed | SageMaker + Bedrock Hybrid | SageMaker Unified Studio + Bedrock + S3/Glue | 1-3 months |
| Level 3 — AI Optimizer | Large-scale inference, cost pressure, custom models | EKS Open Architecture + Cascade Routing | EKS + vLLM/llm-d + kgateway + Bifrost + Langfuse | 3-6 months |

Detailed Guide by Level:


Hybrid Combination Patterns

Most enterprises converge on hybrid approaches rather than a single path. Here are 4 proven combination patterns.

Hybrid Pattern Summary
| Pattern | Configuration | Best Fit Scenario | Complexity |
|---|---|---|---|
| Bedrock + EKS SLM | Bedrock (inference) + EKS self-hosted SLM (high-frequency) | Large-scale inference with urgent API cost reduction | ★★☆☆☆ |
| SageMaker Training + EKS Serving | SageMaker (training/experimentation) + EKS + vLLM (serving) | Organizations with separate ML and serving teams | ★★★☆☆ |
| AgentCore + Self-hosted Models | AgentCore (agent runtime) + EKS (custom model inference) | AWS-managed agent operations with self-hosted models | ★★★☆☆ |
| Full Stack | Unified Studio (dev) + Bedrock (external) + EKS (self-hosted) + AgentCore (ops) | Enterprise AI CoE, full AI lifecycle management | ★★★★☆ |

Pattern 1: Bedrock + EKS SLM (Cascade Routing)

When to use: When monthly inference volume exceeds 500K requests and 60-70% of requests are simple tasks (code completion, translation, summarization)

Core value: Maintain Bedrock API quality while reducing costs by 40-60%

Reference: Inference Gateway & Cascade Routing
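
The routing logic behind this pattern can be sketched in a few lines. This is a minimal illustrative sketch, not the gateway's actual API: the task taxonomy, length threshold, and the two stub model calls are all assumptions standing in for a real SLM endpoint on EKS and a real Bedrock invocation.

```python
# Minimal cascade-routing sketch: cheap simple tasks go to a self-hosted
# SLM; everything else escalates to Bedrock. All names are illustrative.
from dataclasses import dataclass

@dataclass
class RouteResult:
    model: str    # which tier served the request
    answer: str

SIMPLE_TASKS = {"completion", "translation", "summarization"}

def call_slm(prompt: str) -> str:
    """Stand-in for a self-hosted SLM on EKS (e.g. behind a vLLM endpoint)."""
    return f"slm:{prompt[:20]}"

def call_bedrock(prompt: str) -> str:
    """Stand-in for a Bedrock model invocation."""
    return f"bedrock:{prompt[:20]}"

def route(prompt: str, task: str) -> RouteResult:
    # The 60-70% of traffic that is simple (and short) stays on the SLM.
    if task in SIMPLE_TASKS and len(prompt) < 2000:
        return RouteResult("eks-slm", call_slm(prompt))
    return RouteResult("bedrock", call_bedrock(prompt))

print(route("Translate to French: hello", "translation").model)  # → eks-slm
print(route("Design a migration plan for ...", "reasoning").model)  # → bedrock
```

Production cascade routers typically add a confidence check (escalate when the SLM's answer scores below a threshold), but the cost lever is the same: keep the bulk of simple traffic off the per-token API.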


Pattern 2: SageMaker Training + EKS Serving

When to use: When training custom models and minimizing inference costs

Core value: SageMaker's managed training environment + EKS cost-efficient serving

Reference: SageMaker-EKS Integration


Pattern 3: AgentCore + Self-Hosted Models

When to use: When operating Agent runtime serverlessly but self-hosting specific domain models

Core value: AgentCore's serverless operability + custom model domain accuracy

Reference: AWS Native Platform


Pattern 4: Full Stack (SageMaker + Bedrock + EKS)

The most complex but most flexible pattern:

  • Data & Training: SageMaker Unified Studio + Pipelines
  • Production Inference: Bedrock API (high-reliability tasks) + EKS vLLM (high-volume tasks)
  • Agent Runtime: AgentCore (serverless) + Kagent (Kubernetes-native)
  • Observability: CloudWatch (managed) + Langfuse (self-hosted)

This pattern is chosen by large enterprises to meet different requirements across teams. Due to high architectural complexity, clear operational responsibility boundaries and a service catalog are essential.

Reference: For technical implementation of hybrid architecture, refer to SageMaker-EKS Integration.


Cost Simulation Summary

Optimal options and estimated costs based on monthly inference volume.

| Monthly Inference Volume | Optimal Option | Est. Monthly Cost | Notes |
|---|---|---|---|
| ~100K requests | Bedrock API | ~$300-500 | No GPU management, fastest start |
| ~500K requests | Bedrock + Cascade | ~$800-1,200 | Start separating simple requests with SLM |
| ~1.5M requests | Hybrid transition point | ~$2,500-3,500 | Near self-hosting break-even |
| ~5M+ requests | EKS self-hosting | ~$3,500-5,000 | 60%+ savings with Spot + Cascade |
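
The break-even logic behind the table can be reproduced with back-of-envelope arithmetic. All prices below are illustrative placeholders (not AWS list prices): the API is modeled as purely usage-based, while self-hosting carries a fixed GPU fleet cost plus a small marginal cost per request.

```python
# Back-of-envelope break-even sketch for Bedrock API vs. EKS self-hosting.
# All dollar figures are illustrative placeholders, not AWS list prices.

def monthly_cost_bedrock(requests: int, cost_per_1k: float = 3.0) -> float:
    """Pure usage-based pricing: cost scales linearly with volume."""
    return requests / 1000 * cost_per_1k

def monthly_cost_eks(requests: int,
                     gpu_fixed: float = 3000.0,
                     cost_per_1k: float = 0.4) -> float:
    """Fixed GPU fleet cost plus a small marginal cost per request."""
    return gpu_fixed + requests / 1000 * cost_per_1k

def break_even_requests(cost_per_1k_bedrock: float = 3.0,
                        gpu_fixed: float = 3000.0,
                        cost_per_1k_eks: float = 0.4) -> int:
    """Volume at which self-hosting becomes cheaper than the API."""
    return round(gpu_fixed / (cost_per_1k_bedrock - cost_per_1k_eks) * 1000)

print(monthly_cost_bedrock(100_000))   # → 300.0  (matches the ~100K row)
print(monthly_cost_eks(5_000_000))     # → 5000.0 (matches the ~5M+ row)
print(break_even_requests())           # ≈ 1.15M requests/month
```

With these placeholder prices the break-even lands near the ~1.5M-request transition point in the table; plugging in the customer's actual token mix and instance pricing shifts it accordingly.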
Detailed Cost Analysis

For detailed analysis of instance costs, Spot savings rates, and Cascade Routing effects, refer to Coding Tools Cost Analysis.


Customer Discovery Checklist

10 key questions to identify the optimal approach in customer meetings.

  1. Are you currently operating AI/ML workloads? → Determine maturity level
  2. What is your monthly inference request volume? → Cost optimization path
  3. Do you need to self-host Open Weight models? → EKS necessity
  4. Do you have data sovereignty or VPC isolation requirements? → Self-hosting/hybrid
  5. Does your team have Kubernetes operations experience? → Assess operational burden
  6. Do you perform ML training and data engineering together? → SageMaker Unified Studio
  7. What is your monthly budget range? → Cost structure matching
  8. When is your target production deployment date? → Time-to-value path
  9. Do you have multi-cloud or on-premises hybrid requirements? → EKS Hybrid Nodes
  10. What AWS services are you currently using? → Leverage existing investments
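
The first three checklist answers are usually enough to place a customer on the maturity path above. The scoring rules in this sketch are illustrative assumptions, not an official rubric; the 1.5M-request threshold reuses the break-even point from the cost table.

```python
# Sketch: mapping discovery answers to a maturity level from the
# AI Platform Maturity Path table. Thresholds are illustrative.

def maturity_level(running_ml_in_prod: bool,
                   monthly_requests: int,
                   needs_custom_models: bool) -> str:
    if not running_ml_in_prod:
        return "Level 1 - AI Explorer"    # fast PoC: Bedrock + AgentCore
    if monthly_requests >= 1_500_000 or needs_custom_models:
        return "Level 3 - AI Optimizer"   # EKS open architecture + Cascade
    return "Level 2 - AI Builder"         # SageMaker + Bedrock hybrid

print(maturity_level(False, 0, False))           # → Level 1 - AI Explorer
print(maturity_level(True, 5_000_000, True))     # → Level 3 - AI Optimizer
```

Questions 4-10 then refine the recommendation within a level, for example tipping a Level 2 customer toward VPC-isolated self-hosting or toward leveraging existing AWS service investments.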

References

Official Documentation

Papers / Technical Blogs