
EKS GPU Node Strategy

1. Overview

When operating GPU workloads on EKS, node type selection directly impacts operational complexity, cost, and available features. GPU inference and training workloads have requirements that general container workloads do not:

  • Driver dependencies: NVIDIA GPU drivers, Container Toolkit, Device Plugin
  • Advanced features: MIG (Multi-Instance GPU), Time-Slicing, Fractional GPU
  • Monitoring: DCGM (Data Center GPU Manager)-based metrics
  • Scheduling: Topology-Aware Placement, Gang Scheduling

AWS EKS provides 4 node types for GPU workloads:

| Node Type | Description |
| --- | --- |
| EKS Auto Mode | AWS fully manages the entire node lifecycle (GPU drivers pre-installed) |
| Karpenter | Auto-scaling + Custom AMI, MIG, full user customization |
| Managed Node Group | AWS-managed node groups, only option supporting DRA (Dynamic Resource Allocation) |
| Hybrid Node | Connect on-premises GPU servers to the EKS cluster |
Core Principle

You can operate multiple node types simultaneously in a single EKS cluster. Configure the optimal node combination matching your workload characteristics.

Scope of This Document

This document focuses on node type selection and hybrid architecture design. Detailed NVIDIA software stack (GPU Operator/DCGM/Dynamo), GPU autoscaling, llm-d distributed inference, and security/troubleshooting are covered in their respective specialized documents (see Section 7 Related Documents).


2. Node Type Comparison

2.1 Feature Comparison Table

| Feature | Auto Mode | Karpenter | Managed Node Group | Hybrid Node |
| --- | --- | --- | --- | --- |
| Management Owner | AWS fully managed | Self-managed | AWS managed | On-premises |
| Auto-scaling | Automatic (AWS controlled) | Automatic (NodePool-based) | Manual/Limited | Manual |
| Custom AMI | Not available | Available | Available | Available |
| SSH Access | Not available | Available | Available | Available |
| GPU Driver | Pre-installed (AWS) | User-installed | User-installed | User-installed |
| GPU Operator | Available (Device Plugin label disabled) | Available | Available | Available |
| Root Filesystem | Read-only | Read-write | Read-write | Read-write |
| MIG Support | Not available (NodeClass read-only) | Available | Available | Available |
| DRA Compatible | Not available (internal Karpenter-based) | Not available (#1231) | Available (recommended) | Available |
| DCGM Exporter | Install via GPU Operator | Included in GPU Operator | Manual installation | Included in GPU Operator |
| Run:ai Compatible | Available (Device Plugin disabled) | Available | Available | Available |
| Cost | Low (no management needed) | Medium | Medium | Low (CapEx) |
| Suitable Workloads | Simple inference | Advanced GPU features | DRA workloads | On-premises integration |

2.2 Selection Guide: When to Use Which Node

Choose Auto Mode when:

  • You want to quickly start inference services without GPU driver management burden
  • Serving large models (70B+) that don't require MIG or Fractional GPU
  • System/non-GPU workloads (API Gateway, Agent, Observability)

Choose Karpenter when:

  • You need flexible control over MIG partitioning, Custom AMI, Spot Instances
  • Using projects dependent on GPU Operator ClusterPolicy (Run:ai, KAI Scheduler)
  • Optimizing GPU utilization for small/medium models (MIG partitioning)

Choose Managed Node Group when:

  • DRA (Dynamic Resource Allocation)-based GPU management is required
  • Using DRA-exclusive instances like P6e-GB200 UltraServer

Choose Hybrid Node when:

  • Integrating existing on-premises GPU server assets into EKS
  • Data residency requirements

3. EKS Auto Mode GPU Support and Limitations

3.1 GPU Stack Auto-Provided by Auto Mode

EKS Auto Mode pre-installs the following on GPU instances:

  1. NVIDIA GPU Driver - AWS-managed version, /dev/nvidia* devices auto-created
  2. NVIDIA Container Toolkit - containerd plugin auto-configured
  3. NVIDIA Device Plugin - nvidia.com/gpu resource auto-registered
  4. GPU Resource Registration - Pods can immediately request nvidia.com/gpu: 1
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  containers:
    - name: cuda-test
      image: nvidia/cuda:12.2.0-runtime-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```
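A quick way to confirm the pre-installed stack is working, assuming the manifest above is saved as gpu-test.yaml (file name is an assumption):

```bash
kubectl apply -f gpu-test.yaml   # Pod manifest from above
kubectl get pod gpu-test -w      # wait while Auto Mode provisions a GPU node and starts the Pod
kubectl logs gpu-test            # should print the nvidia-smi device table
```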

3.2 Installing GPU Operator on Auto Mode: Device Plugin Disable Pattern

GPU Operator can be installed on Auto Mode. The key is to disable only the Device Plugin via node labels while keeping other components (DCGM Exporter, NFD, GFD) running normally. This pattern was validated in awslabs/ai-on-eks PR #288.

Why is GPU Operator needed? Several projects including KAI Scheduler and Run:ai depend on GPU Operator's ClusterPolicy CRD. Without ClusterPolicy, these projects cannot even start. This is the core reason for installing GPU Operator on Auto Mode.

For complete GPU Operator architecture and component details, see NVIDIA GPU Stack.

```
ClusterPolicy CRD (GPU Operator)
    ↓ depends on
KAI Scheduler (GPU-aware Pod placement)
Run:ai (Fractional GPU, Gang Scheduling)
    ↓ reads
DCGM Exporter (GPU metrics)
NFD/GFD (Hardware labels)
```

| GPU Operator Component | Auto Mode Setting | Reason |
| --- | --- | --- |
| Driver | enabled: false | Pre-installed in AMI |
| Container Toolkit | enabled: false | Pre-installed in AMI |
| Device Plugin | Disabled via label | AWS manages its own Device Plugin |
| DCGM Exporter | enabled: true | GPU metrics collection |
| NFD / GFD | enabled: true | Hardware feature detection and GPU attribute labeling |

NodePool label configuration to disable Device Plugin:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-auto-mode
spec:
  template:
    metadata:
      labels:
        nvidia.com/gpu.deploy.device-plugin: "false"
    spec:
      requirements:
        - key: eks.amazonaws.com/instance-family
          operator: In
          values: ["p5", "g6e", "g5"]
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
```

Helm Values (for Auto Mode):

```yaml
driver:
  enabled: false
toolkit:
  enabled: false
devicePlugin:
  enabled: true        # Globally enabled, selectively disabled via node labels
dcgmExporter:
  enabled: true
  serviceMonitor:
    enabled: true
nfd:
  enabled: true
gfd:
  enabled: true
```
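A minimal install sketch using the values above, assuming they are saved as values-auto-mode.yaml (file name and the gpu-operator namespace are assumptions; chart and repository names are NVIDIA's published defaults):

```bash
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  -f values-auto-mode.yaml
```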
Actual Auto Mode Limitations

While GPU Operator installation is possible, the NodeClass is read-only, so the following are not available:

  • MIG Partitioning: Cannot configure MIG profiles in NodeClass
  • Custom AMI: Cannot pin specific driver versions
  • SSH/SSM Access: Cannot directly debug nodes

If MIG-based GPU partitioning is needed, switch to Karpenter + GPU Operator.

3.3 Large GPU Instance Support Status (Verified 2026.04)

Auto Mode large-GPU instance support was confirmed during a GLM-5 (744B MoE) deployment: p5.48xlarge Spot provisioning succeeded, but p5en/p6 currently have limitations.

Detailed Support Status: See EKS Auto Mode GPU Instance Support Status

3.4 Auto Mode + MNG Hybrid Limitation

The hybrid pattern of adding MNG to an Auto Mode cluster for p5en/p6 usage is currently not possible:

  • MNG creation stalls in CREATING state for 30+ minutes
  • CloudFormation stack Resources field remains null
  • Auto Mode's managed compute layer conflicts internally with MNG's ASG-based management

Conclusion: For large GPUs (H200+, B200), use EKS Standard Mode + Karpenter + MNG.

3.5 Device Plugin Conflict Resolution

Installing GPU Operator with devicePlugin.enabled=true on Auto Mode nodes conflicts with the built-in Device Plugin.

```bash
kubectl describe node <gpu-node> | grep nvidia.com/gpu
# Allocatable: nvidia.com/gpu: 0 (expected: 8)
```

Solution: Add nvidia.com/gpu.deploy.device-plugin: "false" label to NodePool (see Section 3.2)
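To confirm the fix took effect (node name is a placeholder), check that the label propagated and that the allocatable GPU count is restored:

```bash
kubectl get nodes -l nvidia.com/gpu.deploy.device-plugin=false
kubectl describe node <gpu-node> | grep "nvidia.com/gpu"
# Allocatable should again report the full GPU count (e.g. 8) once only one Device Plugin is active
```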

3.6 Node Force Termination Not Available

EC2 instances managed by Auto Mode block direct ec2:TerminateInstances calls. To recover an abnormal node:

  1. Delete workload: kubectl delete pod <gpu-pod>
  2. Delete NodeClaim: kubectl delete nodeclaim <nodeclaim-name>
  3. Karpenter detects empty node and auto-terminates (5-10 min)
  4. New NodeClaim creation starts a healthy node
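The same recovery sequence as kubectl commands; Pod, namespace, and NodeClaim names are illustrative:

```bash
kubectl delete pod <gpu-pod> -n <namespace>     # 1. remove the workload from the bad node
kubectl get nodeclaim                           #    find the NodeClaim backing that node
kubectl delete nodeclaim <nodeclaim-name>       # 2. delete the NodeClaim
kubectl get nodeclaim -w                        # 3-4. watch the node drain and its replacement come up
```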

3.7 How to Verify Auto Mode Instance Support

You can pre-verify specific instance type support with a NodePool dry-run:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-test-dryrun
spec:
  template:
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["p5en.48xlarge"]
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
  limits:
    nvidia.com/gpu: "8"
```

If NoCompatibleInstanceTypes appears in the NodeClaim events (kubectl get nodeclaim / kubectl describe nodeclaim) after the dry-run, that instance type is not supported in Auto Mode.
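One way to run the check end to end, assuming the NodePool above is saved as gpu-test-dryrun.yaml (file name is an assumption):

```bash
kubectl apply -f gpu-test-dryrun.yaml
# Leave a pending Pod that requests nvidia.com/gpu against this NodePool, then inspect the NodeClaims:
kubectl get nodeclaim
kubectl describe nodeclaim | grep -i "NoCompatibleInstanceTypes"
```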


4. Karpenter GPU NodePool Configuration

4.1 Why Karpenter

Karpenter offers the best balance: it retains Auto Mode's auto-scaling advantages while allowing full use of GPU Operator.

| Feature | Auto Mode | Karpenter |
| --- | --- | --- |
| Auto-scaling | Automatic (AWS controlled) | Automatic (NodePool-based) |
| GPU Operator | Available (Device Plugin disabled) | Fully available |
| Custom AMI | Not available | Available |
| MIG Support | Not available | Available |
| Spot Instances | Limited | Fully supported |
| Node Replacement Speed | Fast | Very fast |

4.2 Inference Workload NodePool

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-inference
spec:
  template:
    metadata:
      labels:
        node-type: gpu-inference
        gpu-operator: enabled
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - p5.48xlarge    # H100 x8 (640GB HBM3)
            - g6e.12xlarge   # L40S x4 (192GB GDDR6)
            - g5.12xlarge    # A10G x4 (96GB GDDR6)
        - key: karpenter.sh/capacity-type
          operator: In
          values: [on-demand]
        - key: topology.kubernetes.io/zone
          operator: In
          values: [us-west-2a, us-west-2b, us-west-2c]
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule
          value: "true"
      kubelet:
        maxPods: 110
        evictionHard:
          memory.available: "10Gi"
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m
  limits:
    cpu: "1000"
    memory: "4000Gi"
    nvidia.com/gpu: "32"
```

4.3 Training Workload NodePool (Spot + On-Demand fallback)

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-training
spec:
  template:
    metadata:
      labels:
        node-type: gpu-training
        gpu-operator: enabled
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - p5.48xlarge   # H100 x8
        - key: karpenter.sh/capacity-type
          operator: In
          values: [spot, on-demand]   # Spot first, On-Demand fallback
      taints:
        - key: workload
          effect: NoSchedule
          value: "training"
      kubelet:
        maxPods: 50
        evictionHard:
          memory.available: "20Gi"
  disruption:
    consolidationPolicy: WhenUnderutilized
    consolidateAfter: 30m   # Prevent training interruption
  limits:
    nvidia.com/gpu: "64"
```

4.4 EC2NodeClass Configuration

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: gpu-inference
spec:
  amiSelectorTerms:
    - alias: al2023
  role: KarpenterNodeRole-eks-genai-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: eks-genai-cluster
        subnet-type: private
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: eks-genai-cluster
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 200Gi
        volumeType: gp3
        iops: 16000
        throughput: 1000
        encrypted: true
        deleteOnTermination: true
  metadataOptions:
    httpEndpoint: enabled
    httpPutResponseHopLimit: 2
    httpTokens: required   # IMDSv2
  tags:
    Environment: production
    ManagedBy: karpenter
```
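Applying the NodePool/EC2NodeClass pair and confirming Karpenter registered them (file names are assumptions; resource names match the manifests above):

```bash
kubectl apply -f ec2nodeclass-gpu-inference.yaml -f nodepool-gpu-inference.yaml
kubectl get ec2nodeclass gpu-inference
kubectl get nodepool gpu-inference
```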

4.5 GPU Operator Helm Values (for Karpenter Nodes)

```yaml
# helm install gpu-operator nvidia/gpu-operator -f values.yaml
driver:
  enabled: false   # AL2023: Pre-installed in AMI

toolkit:
  enabled: false   # AL2023: Pre-installed in AMI

devicePlugin:
  enabled: true
  nodeSelector:
    gpu-operator: enabled
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule

migManager:
  enabled: true
  nodeSelector:
    gpu-operator: enabled
  config:
    name: mig-parted-config
    default: "all-balanced"

dcgmExporter:
  enabled: true
  serviceMonitor:
    enabled: true
    interval: 15s
  nodeSelector:
    gpu-operator: enabled

nfd:
  enabled: true

gfd:
  enabled: true
  nodeSelector:
    gpu-operator: enabled

operator:
  nodeSelector:
    node-type: gpu-inference   # Karpenter NodePool label
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  defaultRuntime: containerd
```

Key Configuration Points:

  • nodeSelector: gpu-operator: enabled -- Excludes Auto Mode nodes
  • driver/toolkit: false -- Pre-installed in AL2023 AMI
  • migManager: true -- Enables MIG functionality on Karpenter nodes
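A quick check that these selectors behave as intended; the gpu-operator namespace is an assumption and depends on how the chart was installed:

```bash
kubectl get nodes -l gpu-operator=enabled    # Karpenter GPU nodes only
kubectl get pods -n gpu-operator -o wide     # operands should land only on those nodes
```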

4.6 GPU Topology-Based Scheduling

In distributed training, placing GPUs connected via NVLink on the same node is critical for performance:

```yaml
# GPU topology hints in Pod configuration
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: pytorch-ddp
      resources:
        limits:
          nvidia.com/gpu: 4
  # Place GPUs within the same NVLink domain
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: distributed-training
```

4.7 Spot Price Comparison (us-east-2, 2026.04)

| Instance | On-Demand | Spot (Lowest) | VRAM | Savings |
| --- | --- | --- | --- | --- |
| p5.48xlarge | $98/hr | $12.5/hr | 640GB | 87% |
| p5en.48xlarge | ~$120/hr | $12.1/hr | 1,128GB | 90% |
| p6-b200.48xlarge | $180/hr | $11.4/hr | 1,536GB | 94% |
Spot Usage Recommendation

Large GPU instances can achieve 85-94% cost savings with Spot. Actively use Spot for PoC/demo environments, and set consolidationPolicy: WhenEmpty to prevent unnecessary disruption. Prices are approximate; verify real-time pricing at AWS Spot Pricing.


5. Hybrid Architecture Design

5.1 3-Node Type Coexistence Architecture

Operate Auto Mode + Karpenter + Hybrid Node simultaneously in a single EKS cluster.

5.2 Per-Workload Node Placement Strategy

| Workload Type | Node Type | GPU Operator | Reason |
| --- | --- | --- | --- |
| System Components | Auto Mode | Not needed | No management needed, cost minimization |
| API Gateway / Agent | Auto Mode | Not needed | CPU workloads |
| Simple GPU Inference (70B+) | Auto Mode | Optional (needed for DCGM) | MIG not needed, fast scaling |
| MIG-Based Inference | Karpenter | Required | MIG Manager needed |
| Fractional GPU | Karpenter | Required | Run:ai needed |
| Model Training | Karpenter | Required | Gang Scheduling, Spot |
| DRA Workloads | Managed Node Group | Required | Not supported on Karpenter/Auto Mode |
| On-Premises GPU | Hybrid Node | Required | No AWS-managed GPU stack |
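A minimal sketch of how a workload is routed to the intended node type, using the node-type label and nvidia.com/gpu taint defined in the Karpenter NodePools of Section 4 (Deployment name and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-inference
  template:
    metadata:
      labels:
        app: vllm-inference
    spec:
      nodeSelector:
        node-type: gpu-inference          # Karpenter gpu-inference NodePool label
      tolerations:
        - key: nvidia.com/gpu             # taint applied by the gpu-inference NodePool
          operator: Exists
          effect: NoSchedule
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest  # illustrative image
          resources:
            limits:
              nvidia.com/gpu: 1
```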

5.3 MNG Hybrid for DRA Workloads

DRA (Dynamic Resource Allocation) was promoted to GA in K8s 1.34, providing advanced GPU management beyond Device Plugin including fine-grained GPU memory allocation and NVLink topology-aware scheduling. However, DRA cannot be used with Karpenter and Auto Mode.

DRA + Karpenter/Auto Mode Incompatibility

Karpenter skips node provisioning when it detects spec.resourceClaims in a Pod (PR #2384). Karpenter simulates Pod requirements to calculate the optimal instance type, but DRA's ResourceSlice is only published by the DRA driver after a node exists, making pre-node-creation simulation impossible (a chicken-and-egg problem).

The only officially supported node management method for DRA workloads is Managed Node Group + Cluster Autoscaler.
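For orientation, a minimal DRA request sketch as it would run on an MNG node. It assumes the NVIDIA DRA driver is installed and publishes a gpu.nvidia.com DeviceClass, and uses the resource.k8s.io/v1 API that went GA in Kubernetes 1.34; verify the exact schema against your cluster with kubectl explain resourceclaim.spec:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: single-h100
spec:
  devices:
    requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.nvidia.com   # DeviceClass published by the NVIDIA DRA driver (assumed)
          count: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-gpu-test
spec:
  resourceClaims:
    - name: gpu
      resourceClaimName: single-h100
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-runtime-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        claims:
          - name: gpu
```

Because the Pod carries spec.resourceClaims, Karpenter will not provision a node for it; it has to land on an existing Managed Node Group node.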

| Workload | Node Type | GPU Allocation Method | Scaling |
| --- | --- | --- | --- |
| DRA Workloads (llm-d, P6e-GB200) | Managed Node Group | ResourceClaim (DRA) | Cluster Autoscaler |
| Standard GPU Inference (vLLM standalone) | Karpenter / Auto Mode | nvidia.com/gpu (Device Plugin) | Karpenter |
| Non-GPU Workloads | Karpenter / Auto Mode | - | Karpenter |

For detailed DRA scale-out strategies, see GPU Resource Management.

5.4 Node Selection by Model Size

| Model Size | Example | Recommended Node | Reason |
| --- | --- | --- | --- |
| 70B+ | Qwen3-72B, Llama-3-70B | Auto Mode + llm-d | Uses nearly the full GPU, management convenience |
| 30B-65B | Qwen3-32B | Auto Mode or Karpenter | 50%+ GPU usage, choose based on situation |
| 13B-30B | Llama-3-13B | Karpenter + MIG 2-way split | GPU utilization improvement needed |
| 7B and below | Llama-3-8B, Mistral-7B | Karpenter + MIG 4-7 way split | Severe GPU waste, MIG essential |
| Multi-Model | Multiple models simultaneously | Karpenter + MIG | Separate MIG partitions per model |
| Dev/Test | Model agnostic | Auto Mode | Quick start |

5.5 Cost Impact by Model Size

Based on p5.48xlarge (H100 x8), monthly cost approximately $98,000:

| Configuration | 7B Model Instances | GPUs Used | GPU Utilization | Effective Cost per Instance |
| --- | --- | --- | --- | --- |
| Auto Mode (full GPU allocation) | 8 | 8 GPUs | ~25% | $12,250 |
| Karpenter + MIG (4-way split) | 8 | 2 GPUs | ~80% | $3,063 |
| Savings | Same | 75% reduction | 3.2x improvement | 75% reduction |
Model Size and Cost Efficiency

The smaller the model parameters, the greater the GPU waste on Auto Mode. Running a 7B model on H100 leaves 80% of GPU memory idle, which is a direct cost waste. MIG partitioning is essential for small/medium models.
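As an illustration of the MIG approach, a hedged Pod sketch requesting a single MIG slice for a 7B-class model. The exact extended resource name depends on the MIG profile the MIG Manager applies (1g.20gb is shown as an example H100 profile), and the image is illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mistral-7b-mig
spec:
  nodeSelector:
    node-type: gpu-inference            # Karpenter NodePool label from Section 4.2
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest    # illustrative image
      resources:
        limits:
          nvidia.com/mig-1g.20gb: 1     # one MIG slice instead of a full H100
```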

5.6 Optimal Configuration for Current Timeframe (2026.04)

For most LLM serving environments, DRA is not yet essential. Device Plugin + MIG combination can sufficiently cover GPU partitioning and topology placement, and Karpenter's fast scale-out is more favorable for LLM serving SLOs than MNG + Cluster Autoscaler.

| Criteria | Karpenter + Device Plugin | MNG + DRA |
| --- | --- | --- |
| Scale-out Speed | Fast (Karpenter) | Slow (Cluster Autoscaler) |
| GPU Partitioning | MIG supported (GPU Operator) | DRA native |
| Operational Complexity | Single stack | MNG + Karpenter mixed |
| K8s Version | 1.32+ | 1.34+ (DRA GA) |
| Ecosystem Maturity | Production-proven | Early stage |

Small Scale (< 32 GPUs)

Configuration: Auto Mode + Karpenter (GPU dedicated)
- Auto Mode: General workloads
- Karpenter: GPU inference (Device Plugin)
- GPU Operator: DCGM monitoring
Cost: $5,000 - $15,000/month

Medium Scale (32 - 128 GPUs)

Configuration: Karpenter + GPU Operator + KEDA
- Karpenter NodePool: Separate Prefill / Decode / Small models
- GPU Operator: MIG, DCGM, NFD/GFD
- KEDA: KV Cache / TTFT-based Pod scaling
Cost: $15,000 - $80,000/month

Large Scale (> 128 GPUs)

Configuration: Karpenter + GPU Operator + Run:ai + Hybrid Node
- Karpenter: GPU Operator + Run:ai
- Hybrid Node: On-premises GPU farm integration
- When adopting P6e-GB200: Add MNG + DRA
Cost: $80,000 - $500,000/month (cloud) + Capex (on-premises)

5.8 DRA Transition Timing

| Condition | Transition Required |
| --- | --- |
| P6e-GB200 UltraServer Adoption | Required (Device Plugin not supported) |
| Multi-Node NVLink / IMEX Needed | Required (ComputeDomain is DRA-exclusive) |
| CEL-Based Fine-Grained GPU Attribute Selection | Recommended |
| GPU Sharing (MPS) | Recommended |
| Karpenter DRA Support GA | Optimal transition timing (MNG not needed) |
Transition Strategy

Now: Karpenter + GPU Operator (Device Plugin + MIG) -- Fastest and most operationally viable production configuration

When Adopting P6e-GB200: MNG (DRA, GPU) + Karpenter (non-GPU) hybrid

After Karpenter DRA GA: Karpenter + DRA integration -- Final target configuration


6. AWS Accelerator Selection Guide (NVIDIA vs Neuron)

EKS GPU node strategies have traditionally been designed around NVIDIA GPUs (p/g series), but as of 2026, Trainium2/Inferentia2-based AWS custom accelerators have matured into production alternatives. Neuron stack details are covered in AWS Neuron Stack; this section only summarizes the selection criteria relevant to node strategy planning.

6.1 NVIDIA GPU vs AWS Neuron Decision Matrix

| Criteria | NVIDIA GPU (p5/p5en/p6/g6e) | AWS Neuron (trn2/inf2) |
| --- | --- | --- |
| Model Ecosystem Recency | Immediate support (new models Day-1) | AWS porting cycle delay (weeks to months) |
| Long-Term TCO | Higher (H100/H200/B200 Spot still expensive) | Favorable cost per token (per AWS data) |
| Capacity Availability | Tight depending on region/timing | Relatively easier to secure |
| Custom CUDA Kernels | Full support | Not supported (NEFF compilation required) |
| Quantization Formats | AWQ/GPTQ/GGUF extensive | BF16/FP16/FP8; AWQ/GPTQ limited |
| Observability Ecosystem | GPU Operator + DCGM mature | neuron-monitor + OSS exporter |
| Open-Source Serving | vLLM, SGLang, TRT-LLM rich | NxD Inference / vLLM Neuron / TGI Neuron |
| Bedrock Continuity | Unrelated | Same path as Bedrock internal stack |
| Hybrid (On-Premises) | Possible with Hybrid Node | EC2 only (on-premises not available) |

6.2 Selection Flow

  • Frontier (Latest Models) Layer: NVIDIA GPU (p5en/p6) — Rapid adoption of new models
  • Volume (High-Frequency Inference) Layer: Neuron (trn2/inf2) — Low-cost serving of stable models at scale
  • Edge/On-Premises: Hybrid Node + NVIDIA GPU — Neuron is EC2-only

For detailed Neuron SDK, Device Plugin, Karpenter NodePool, and inference framework selection (NxD Inference / vLLM Neuron / TGI Neuron), see AWS Neuron Stack.


7. Node Strategy Decision Flowchart

Decision Summary Table

| Question | Answer | Recommended Node Type | GPU Operator |
| --- | --- | --- | --- |
| GPU not needed | - | Auto Mode | Not needed |
| Simple GPU inference (no MIG) | - | Auto Mode GPU | Optional |
| MIG needed | - | Karpenter | Required |
| DRA needed | - | Managed Node Group | Required |
| Fractional GPU / Run:ai | - | Karpenter | Required |
| On-premises GPU | - | Hybrid Node | Required |
| Cost minimization (Spot acceptable) | - | Karpenter Spot | Required |
| Large-scale training (Gang Scheduling) | - | Karpenter + Run:ai | Required |
| P6e-GB200 | DRA required | Managed Node Group | Required |

GPU Stack and Monitoring

For detailed NVIDIA GPU software stack including GPU Operator, DCGM, MIG, Time-Slicing, KAI Scheduler, and Dynamo, see the dedicated document.

  • NVIDIA GPU Stack - GPU Operator, DCGM Exporter, MIG Manager, Dynamo, KAI Scheduler

GPU Resource Management

For GPU autoscaling strategies based on Karpenter, KEDA, and DRA, see:

Inference Engines

Hybrid Infrastructure

For EKS Hybrid Node registration of on-premises GPU servers, VPN/Direct Connect configuration, and GPU Operator installation, see:

Deployment and Security

For production deployment YAML, security policies (Pod Security Standards, NetworkPolicy, IAM), and troubleshooting guides for GPU workloads, see Reference Architecture.

Platform Architecture