One doc tagged with "trl"

GRPO/DPO Training Job

Production configuration for running NeMo-RL (GRPO) and TRL (DPO) training jobs on labeled preference datasets, using Karpenter Spot node pools and Volcano gang scheduling.