Agentic AI Platform
In-depth technical documentation on the architecture, deployment, and operations of the Agentic AI Platform
Technical status and EKS application scenarios for GPU workload checkpoint/restore during Spot reclaim and scheduling events (Experimental)
Hands-on guide to deploying large open-source models on EKS, based on the GLM-5.1 experience
Optimal node strategies for GPU workloads across EKS Auto Mode, Karpenter, MNG, and Hybrid Nodes
A complete guide for adopting Amazon EKS Hybrid Nodes: architecture, configuration, networking, DNS, GPU servers, cost analysis, and Dynamic Resource Allocation (DRA)
Guide to building an Agentic AI platform using Amazon EKS and the open-source ecosystem
EKS GPU node strategy, Karpenter/KEDA/DRA resource management, NVIDIA GPU stack, AWS Neuron stack
GPU resource management and cost optimization using Karpenter, KEDA, and DRA on EKS
2-Tier GPU autoscaling, DCGM/vLLM monitoring, Bifrost→Bedrock Cascade Fallback, Hybrid Node on-premises integration, large MoE deployment lessons learned
EKS architecture overview for maximizing LLM Inference performance — starting point for vLLM, KV Cache-Aware Routing, Disaggregated Serving, LWS multi-node, and Hybrid Node integration
Benchmark comparing performance and cost efficiency of GPU instances (p5, p4d, g6e) and AWS custom silicon (Trainium2, Inferentia2) for vLLM-based Llama 4 model serving
llm-d architecture concepts, KV Cache-aware routing, Disaggregated Serving, EKS Auto Mode integration strategy
Model serving guide divided into a GPU infrastructure layer and an inference/training framework layer
Architecture concepts, distributed deployment strategies, and performance optimization principles for Mixture of Experts models
Benchmark comparing Aggregated vs Disaggregated LLM serving performance using NVIDIA Dynamo, running four AIPerf modes in an EKS environment
Architecture and EKS integration for GPU Operator, DCGM, MIG, Time-Slicing, and Dynamo
Production deployment and configuration reference architecture for the Agentic AI Platform
5 key challenges faced when operating Agentic AI workloads
vLLM PagedAttention, parallelization strategies, Multi-LoRA, and hardware support architecture