17 docs tagged with "scope:tech"

EKS GPU Node Strategy

Optimal node strategies for GPU workloads across EKS Auto Mode, Karpenter, managed node groups (MNG), and Hybrid Nodes

GPU Infrastructure

EKS GPU node strategy, Karpenter·KEDA·DRA resource management, NVIDIA GPU stack, AWS Neuron stack

Inference Frameworks

vLLM·llm-d·MoE·NeMo — the AI framework layer for model serving, distributed inference, and fine-tuning on GPUs

Inference Optimization on EKS

EKS architecture overview for maximizing LLM inference performance — the starting point for vLLM, KV Cache-Aware Routing, Disaggregated Serving, LWS multi-node, and Hybrid Node integration

MoE Model Serving Concept Guide

Architecture concepts, distributed deployment strategies, and performance optimization principles for Mixture of Experts models

NeMo Framework

NVIDIA NeMo Framework distributed training, fine-tuning, and TensorRT-LLM conversion architecture

NVIDIA GPU Stack

Architecture and EKS integration for GPU Operator, DCGM, MIG, Time-Slicing, and Dynamo

Semantic Caching Strategy

LLM gateway-level semantic caching strategy and a comparison of implementation options (GPTCache, Redis Semantic Cache, Portkey, Helicone, Bifrost+Redis)

vLLM Model Serving

vLLM PagedAttention, parallelization strategies, Multi-LoRA, and hardware support architecture