GPU Infrastructure

The layer that determines which GPU instances to use, how to schedule them, and which driver/partitioning stack to manage on Kubernetes. This layer must be in place for upper-layer inference frameworks (vLLM, llm-d, etc.) to run stably.
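In practice, scheduling a workload onto a GPU node on Kubernetes comes down to requesting the extended resource that the vendor's device plugin advertises. A minimal sketch, assuming the NVIDIA device plugin is installed on the cluster (the pod name and image tag are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference                  # illustrative name
spec:
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest   # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1            # extended resource advertised by the NVIDIA device plugin
```

The scheduler will only place this pod on a node whose device plugin reports at least one free `nvidia.com/gpu`, which is why the driver/partitioning stack below must be working before the inference layer can run.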

Selection Guide

If you are focused on NVIDIA, read Node Strategy → Resource Management → NVIDIA Stack; if you are considering AWS silicon (Trainium/Inferentia), read Node Strategy → Neuron Stack.
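For AWS silicon the scheduling pattern is the same, but the Neuron device plugin advertises `aws.amazon.com/neuron` (whole devices) instead of `nvidia.com/gpu`. A hedged sketch, assuming the Neuron device plugin is running (pod name, image, and instance type are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: neuron-inference                            # illustrative name
spec:
  nodeSelector:
    node.kubernetes.io/instance-type: inf2.xlarge   # illustrative Inferentia instance type
  containers:
    - name: app
      image: my-neuron-image:latest                 # illustrative image
      resources:
        limits:
          aws.amazon.com/neuron: 1                  # extended resource advertised by the Neuron device plugin
```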