6 篇文档已标记「optimization」

Disaggregated Serving + LWS 멀티노드

Prefill/Decode 분리 아키텍처와 NIXL 공통 KV 전송 엔진, LeaderWorkerSet 기반 700B+ 대형 MoE 모델 멀티노드 배포 가이드

EKS Pod Resource Optimization Guide

Kubernetes Pod CPU/Memory resource configuration, QoS classes, VPA/HPA autoscaling, and resource right-sizing strategies

GPU 리소스·관측·Hybrid Node·실전 교훈

2-Tier GPU 오토스케일링·DCGM/vLLM 모니터링·Bifrost→Bedrock Cascade Fallback·Hybrid Node 온프레 통합·대형 MoE 배포 실전 교훈

Inference Optimization on EKS

LLM Inference 성능을 극대화하는 EKS 아키텍처 개요 — vLLM, KV Cache-Aware Routing, Disaggregated Serving, LWS 멀티노드, Hybrid Node 통합의 시작점

KV Cache 최적화 (vLLM Deep Dive + Cache-Aware Routing)

vLLM PagedAttention·Continuous Batching·FP8 KV Cache 등 핵심 기술 정리와 llm-d/NVIDIA Dynamo의 KV Cache-Aware Routing 비교 및 Gateway 구성

Large-Scale EKS Cost Management: 30-90% Reduction Strategy

FinOps strategies for achieving 30-90% cost reduction in Amazon EKS environments. Includes cost structure analysis, Karpenter optimization, tool selection, and real-world success cases