Continuous Training Pipeline
Overview
The Continuous Training Pipeline is the implementation architecture of the Self-Improving Agent Loop: it automatically converts production inference traces into training data for continuous model improvement. The pipeline collects Langfuse OTel traces into an S3 Data Lake, scores trace quality with a Reward Labeler, and runs preference tuning with GRPO/DPO. Checkpoints that pass the Eval Gate are then rolled out to production gradually via Canary deployment.
Why Continuous Training
Traditional training methods rely on static datasets. However, production user feedback occurs continuously, and without incorporating it, models increasingly diverge from actual usage patterns over time.
| Challenge | Traditional Approach | Continuous Training |
|---|---|---|
| Data Collection | Manual labeling (monthly) | Automatic trace collection (real-time) |
| Feedback Integration | 3-6 months | 1 week |
| Quality Improvement | Wait for new dataset | Immediate user feedback integration |
| Cost | $10K/month labeling | Reward Model automation |
This document covers how to implement the 5-stage architecture of the Self-Improving Agent Loop on EKS. Refer to the design document for background and strategic decisions.
Before applying this pipeline to production traffic, agree at the organizational level on the scope, automation boundaries, data gates, and rollback criteria defined in ADR — Self-Improving Agent Loop Decision. Operate the Train and Deploy stages behind manual approval gates.
5-Stage Pipeline Flow
Key Concepts:
- Trace → Dataset: Convert Langfuse production inference logs into training data
- Reward Labeling: Score trace quality 0-1 with Ragas + LLM Judge
- GRPO/DPO: Use high-score traces as preferred (chosen) examples and low-score traces as non-preferred (rejected) ones (see the sketch after this list)
- Eval Gate: Verify quality threshold after training
- Canary → 100%: Gradual traffic increase, immediate rollback on regression
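To make the preference-tuning step concrete, the following is a minimal sketch of how reward-labeled traces can be turned into DPO-style preference pairs. The trace schema (`prompt`, `completion`, `reward`) and the 0.8/0.4 cut-offs are illustrative assumptions, not a fixed contract of the pipeline.

```python
from collections import defaultdict

def build_preference_pairs(scored_traces, high=0.8, low=0.4):
    """Turn reward-labeled traces into DPO records (prompt, chosen, rejected).

    `scored_traces` is an iterable of dicts with assumed keys:
    {"prompt": str, "completion": str, "reward": float}.
    """
    by_prompt = defaultdict(list)
    for trace in scored_traces:
        by_prompt[trace["prompt"]].append(trace)

    pairs = []
    for prompt, traces in by_prompt.items():
        chosen = [t for t in traces if t["reward"] >= high]
        rejected = [t for t in traces if t["reward"] <= low]
        # Only prompts with both a clearly good and a clearly bad
        # completion yield a usable preference pair.
        for good in chosen:
            for bad in rejected:
                pairs.append({
                    "prompt": prompt,
                    "chosen": good["completion"],
                    "rejected": bad["completion"],
                })
    return pairs
```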
Sub-Documents
📄️ Trace to Dataset
Load Langfuse OTel traces into S3 Parquet/Iceberg and automatically construct GRPO/DPO training datasets by labeling rewards with Ragas + LLM Judge Fleet.
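As a rough sketch of the materializer's storage step, the snippet below writes collected traces to S3 as Parquet partitioned by date, model, and consent flag. The bucket path and column names are placeholders; a production setup would write Glue-managed Iceberg tables rather than raw Parquet, and `pyarrow` needs `s3fs` (or an explicit S3 filesystem) to write to an `s3://` path.

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

def materialize_traces(traces: list[dict], root_path: str = "s3://example-trace-lake/traces"):
    """Persist collected traces as Parquet partitioned by date/model/consent.

    `traces` is a list of dicts with assumed keys:
    timestamp, model, user_consent, input, output, trace_id.
    """
    df = pd.DataFrame(traces)
    df["date"] = pd.to_datetime(df["timestamp"]).dt.date.astype(str)

    table = pa.Table.from_pandas(df)
    # Partitioning by user_consent lets downstream training jobs read
    # only the consented partitions.
    pq.write_to_dataset(
        table,
        root_path=root_path,
        partition_cols=["date", "model", "user_consent"],
    )
```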
📄️ GRPO/DPO Training
Production configuration for running NeMo-RL (GRPO) and TRL (DPO) training jobs with labeled preference datasets on Karpenter Spot node pools and Volcano Gang Scheduling.
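For the TRL (DPO) path, a minimal training sketch might look like the following. The model name, hyperparameters, and dataset path are placeholders, and exact argument names differ between TRL versions, so treat this as an outline of the job rather than a drop-in spec.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Placeholder base model; the real pipeline would use the current production checkpoint.
model_name = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference dataset with "prompt", "chosen", "rejected" columns,
# e.g. exported from the S3 data lake.
train_dataset = load_dataset("parquet", data_files="preferences.parquet", split="train")

config = DPOConfig(
    output_dir="dpo-checkpoints",
    beta=0.1,                        # strength of the KL penalty against the reference model
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,      # `tokenizer=` in older TRL releases
)
trainer.train()
```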
📄️ Evaluation & Rollout
Threshold verification of trained checkpoints, kgateway-based gradual Canary deployment, MLflow Registry version management, automatic rollback on regression, cost and quality KPI dashboard configuration.
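A simplified version of the gate-then-register logic is sketched below: the candidate checkpoint is promoted to the MLflow Registry only when every metric clears its threshold, and only registered versions enter the canary rollout. Metric names, thresholds, and the registry model name are illustrative.

```python
import mlflow

# Illustrative thresholds; align these with the ADR-defined Eval Gate.
THRESHOLDS = {"faithfulness": 0.85, "answer_relevancy": 0.80}

def eval_gate(scores: dict[str, float], run_id: str, model_name: str = "agent-llm") -> bool:
    """Register the candidate checkpoint only if every metric passes its threshold."""
    failed = {
        metric: scores.get(metric, 0.0)
        for metric, threshold in THRESHOLDS.items()
        if scores.get(metric, 0.0) < threshold
    }
    if failed:
        print(f"Eval Gate failed, skipping rollout: {failed}")
        return False

    # Promote the checkpoint logged under this MLflow run; the canary
    # rollout then picks up the newly registered version.
    mlflow.register_model(model_uri=f"runs:/{run_id}/model", name=model_name)
    return True
```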
- Trace → Dataset Materializer — Langfuse OTel collection, S3 Iceberg tables, Reward Labeler Fleet
- GRPO/DPO Training Job — NeMo-RL/TRL-based preference tuning with Karpenter Spot node pools
- Eval Gate · Registry · KPI — Threshold verification, Canary deployment, MLflow Registry, cost KPIs
Summary
The Continuous Training Pipeline automatically incorporates production feedback into model improvement through a 5-stage workflow:
- Trace → Dataset: Langfuse OTel → S3 Iceberg (partitioned by date/model/consent)
- Reward Labeling: Ragas + Qwen3-4B Judge Fleet (KServe + KEDA)
- GRPO/DPO Training: NeMo-RL or TRL (Karpenter Spot p5en.48xlarge × 3 nodes)
- Eval Gate: Threshold verification + Canary 5% → 25% → 100% (kgateway); see the control-loop sketch after this list
- Registry & Rollback: MLflow + Agent Versioning + automatic rollback
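The canary progression summarized above (5% → 25% → 100%, with rollback on regression) can be thought of as a small control loop. In the real pipeline the traffic weights live in a kgateway HTTPRoute managed by Argo Rollouts; the `set_canary_weight` and `canary_error_rate` helpers below are hypothetical stand-ins for that machinery, and the bake time and error threshold are illustrative.

```python
import time

CANARY_STEPS = [5, 25, 100]      # percent of traffic routed to the new model version
MAX_ERROR_RATE = 0.02            # illustrative regression trigger
BAKE_TIME_SECONDS = 15 * 60      # observation window per step

def set_canary_weight(percent: int) -> None:
    """Hypothetical hook: patch the kgateway HTTPRoute backend weights."""
    print(f"routing {percent}% of traffic to the canary")

def canary_error_rate() -> float:
    """Hypothetical hook: read the canary's error rate from monitoring."""
    return 0.0  # placeholder

def rollout() -> bool:
    for percent in CANARY_STEPS:
        set_canary_weight(percent)
        time.sleep(BAKE_TIME_SECONDS)
        if canary_error_rate() > MAX_ERROR_RATE:
            # Regression detected: send all traffic back to the stable model.
            set_canary_weight(0)
            return False
    return True
```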
Key Points:
- Cost Efficiency: Spot instances + bi-weekly iterations → ~$4K/month
- Quality Improvement: Target 1% monthly faithfulness increase
- Safety: Eval Gate + gradual Canary + automatic rollback
- ROI: Potential return of roughly 400% relative to training cost
Next Steps
- Self-Improving Agent Loop — Design architecture and strategy
- Custom Model Pipeline — SFT training prerequisites
- Cascade Routing Tuning — Post-deployment routing optimization
- Agent Versioning — Model/code/prompt synchronization
References
Official Documentation
- NVIDIA NeMo Framework — Large-scale model training and RLHF
- HuggingFace TRL — DPO/PPO reference implementation
- MLflow — Model registry and version management
- Gateway API — Canary traffic splitting
Papers & Technical Blogs
- GRPO paper (arXiv:2402.03300) — Group Relative Policy Optimization
- DPO paper (arXiv:2305.18290) — Direct Preference Optimization
Implementation Checklist
- Enable Langfuse OTel trace collection (add user_consent field)
- Configure S3 Data Lake + Glue Iceberg tables
- Deploy Reward Labeler Fleet (Qwen3-4B KServe + KEDA)
- Set up NeMo-RL or TRL training environment (Karpenter Spot node pool)
- Define Eval Gate thresholds (faithfulness >= 0.85)
- Configure Canary Deployment HTTPRoute + monitoring alerts
- Integrate MLflow Registry + Agent Versioning
- Automate rollback (Argo Rollouts)
- Build cost KPI dashboard (Grafana)
- Establish bi-weekly/monthly iteration schedule