GPU Infrastructure

The layer that determines which GPU instances to use, how to schedule them, and which driver/partitioning stack to manage on Kubernetes. This layer must be in place for upper-layer inference frameworks (vLLM, llm-d, etc.) to run stably.
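In practice, scheduling a workload onto a GPU node on Kubernetes comes down to requesting the extended resource that the vendor's device plugin advertises. A minimal sketch, assuming the NVIDIA device plugin is installed on the cluster (the pod name and image tag are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference                  # illustrative name
spec:
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest   # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1            # extended resource advertised by the NVIDIA device plugin
```

The scheduler will only place this pod on a node whose device plugin reports at least one free `nvidia.com/gpu`, which is why the driver/partitioning stack below must be working before the inference layer can run.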

Selection Guide

If you are focused on NVIDIA, read Node Strategy → Resource Management → NVIDIA Stack; if you are considering AWS silicon (Trainium/Inferentia), read Node Strategy → Neuron Stack.
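For AWS silicon the scheduling pattern is the same, but the Neuron device plugin advertises `aws.amazon.com/neuron` (whole devices) instead of `nvidia.com/gpu`. A hedged sketch, assuming the Neuron device plugin is running (pod name, image, and instance type are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: neuron-inference                            # illustrative name
spec:
  nodeSelector:
    node.kubernetes.io/instance-type: inf2.xlarge   # illustrative Inferentia instance type
  containers:
    - name: app
      image: my-neuron-image:latest                 # illustrative image
      resources:
        limits:
          aws.amazon.com/neuron: 1                  # extended resource advertised by the Neuron device plugin
```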