AIDLC Evaluation Framework
Evaluation-driven Loop in the Agent/LLM Development Process: A Comparison of SWE-bench Verified, METR, Ragas, DeepEval, LangSmith, Braintrust, and AWS Labs aidlc-evaluator
Comprehensive governance documentation covering quality evaluation, operational playbooks, AI Gateway guardrails, compliance, and domain customization
RAG pipeline quality evaluation and continuous improvement using Ragas (see the evaluation sketch below)
Load Langfuse OTel traces into S3 as Parquet/Iceberg and automatically construct GRPO/DPO training datasets by labeling rewards with Ragas + LLM Judge Fleet (see the dataset-construction sketch below).
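For the Ragas-based RAG quality evaluation, a minimal sketch is shown below. It assumes the classic Ragas `evaluate()` API with `question`/`answer`/`contexts`/`ground_truth` columns (newer Ragas releases rename these fields) and that a judge LLM is available via the usual provider API key; the records themselves are illustrative.

```python
# Minimal sketch: scoring RAG pipeline outputs with Ragas.
# Requires a configured judge LLM (e.g. OPENAI_API_KEY in the environment).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# Illustrative evaluation records captured from the RAG pipeline.
records = {
    "question": ["What does the AI Gateway guardrail block?"],
    "answer": ["It blocks requests that violate the compliance policy."],
    "contexts": [["The AI Gateway rejects requests matching banned-topic rules."]],
    "ground_truth": ["Requests matching banned-topic compliance rules are rejected."],
}

result = evaluate(
    Dataset.from_dict(records),
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(result)  # per-metric scores used to drive the continuous-improvement loop
```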
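For the trace-to-training-data step, the sketch below shows one way the reward labeling and preference-pair construction could look. It assumes traces have already been exported from Langfuse to Parquet on S3, and `judge_score()` is a hypothetical stand-in for a Ragas metric or an LLM-judge call; the `prompt`/`completion` column names are illustrative, not the actual Langfuse export schema.

```python
# Minimal sketch: turn reward-labeled trace exports into DPO preference pairs.
import pandas as pd


def judge_score(prompt: str, completion: str) -> float:
    """Hypothetical reward labeler (e.g. a Ragas metric or an LLM-judge call)."""
    raise NotImplementedError


def build_dpo_pairs(parquet_uri: str) -> pd.DataFrame:
    # Reading from S3 requires s3fs, e.g. "s3://bucket/langfuse/traces/*.parquet".
    traces = pd.read_parquet(parquet_uri)

    # Label every completion with a scalar reward.
    traces["reward"] = [
        judge_score(p, c) for p, c in zip(traces["prompt"], traces["completion"])
    ]

    # For prompts with multiple completions, pair the best and worst as chosen/rejected.
    pairs = []
    for prompt, group in traces.groupby("prompt"):
        if len(group) < 2:
            continue
        ranked = group.sort_values("reward", ascending=False)
        pairs.append(
            {
                "prompt": prompt,
                "chosen": ranked.iloc[0]["completion"],
                "rejected": ranked.iloc[-1]["completion"],
            }
        )
    return pd.DataFrame(pairs)
```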