GRPO/DPO Training Job
Production configuration for running NeMo-RL (GRPO) and TRL (DPO) training jobs with labeled preference datasets on Karpenter spot node pools, using Volcano gang scheduling.
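The configuration above can be sketched as a Volcano Job that requests gang scheduling (`minAvailable` set to the full worker count, so training pods start all together or not at all) and steers pods onto Karpenter-provisioned spot capacity via the `karpenter.sh/capacity-type: spot` node selector. The job name, namespace, container image, and resource sizes below are illustrative assumptions, not values taken from this document.

```yaml
# Sketch: gang-scheduled GRPO training job on spot capacity (illustrative values).
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: grpo-training              # hypothetical job name
spec:
  schedulerName: volcano           # hand pod placement to Volcano's gang scheduler
  minAvailable: 4                  # gang scheduling: all 4 workers admitted together
  queue: default
  tasks:
    - name: worker
      replicas: 4
      template:
        spec:
          nodeSelector:
            karpenter.sh/capacity-type: spot   # land on Karpenter spot node pools
          tolerations:
            - key: nvidia.com/gpu              # tolerate the common GPU node taint
              operator: Exists
              effect: NoSchedule
          containers:
            - name: trainer
              image: my-registry/nemo-rl-grpo:latest   # hypothetical image
              resources:
                limits:
                  nvidia.com/gpu: 8
          restartPolicy: Never
```

Setting `minAvailable` equal to the total replica count is what makes this a gang: a partial allocation (e.g. 3 of 4 workers while spot capacity churns) is never scheduled, which avoids distributed-training deadlock on interrupted spot nodes.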
NVIDIA NeMo Framework distributed training, fine-tuning, and TensorRT-LLM conversion architecture
A hybrid ML architecture that trains on SageMaker and serves on EKS