Prompt & Model Registry
Why Prompt Registry is Needed
Central repository managing prompt and model versions like code. Solves the following problems:
- Version Tracking: "It worked yesterday but is strange today" → track which prompts changed
- Environment-specific Deployment: Use different versions for staging/canary/production environments
- Rollback: Immediate recovery to previous version when issues occur
- Audit Evidence: Retain change history for financial/medical regulatory compliance
Langfuse Prompt
Langfuse is a self-hostable LLMOps platform. Prompt registry features:
- Version Control: Automatic version increment with each change (
v1,v2, ...) - Labels: Apply environment-specific labels like
production,staging,canary - Rollout Management: Applications reference only the label when a specific label is attached to a specific version (
get_prompt("financial-analysis", label="production")) - Diff View: Visualize changes between versions
- Access Log: Track which session used which prompt version
from langfuse import Langfuse
client = Langfuse()
# Query prompt version
prompt = client.get_prompt("financial-analysis", label="production")
print(prompt.version) # Example: 5
print(prompt.prompt) # Actual text
# Deploy new version
client.create_prompt(
name="financial-analysis",
prompt="You are a conservative investment advisor...",
labels=["staging"] # Deploy to staging first
)
# After validation
client.update_prompt_label("financial-analysis", version=6, label="production")
Advantages:
- Self-hosted, RBAC, S3+KMS backend capable
- Integrated with observability (automatically record prompt version in trace)
Disadvantages:
- Python SDK-centric (TypeScript SDK exists but with limited features)
- Simple UI (shows diff but no approval workflow)
PromptLayer
PromptLayer is a SaaS prompt registry.
- Version Tagging: Git-style tags (
v1.0,v1.1-alpha) - Visual Diff: Highlight word-level changes between versions
- A/B Experiments: Deploy two versions simultaneously and compare performance
- Analytics: Dashboard for latency, token usage, error rate by version
Advantages:
- Ready to use without installation
- Team collaboration features (comments, approval workflow)
Disadvantages:
- SaaS only (no on-premises)
- Data sovereignty issues (prompts stored on external servers)
Braintrust Prompts
Braintrust is an evaluation platform that also provides prompt management features.
- Playground: Write prompt → immediately evaluate with test set
- Versioning: Automatic version increment + commit message
- Dataset Integration: Automatically trigger evaluation run on prompt changes
- Experimentation: Side-by-side comparison of two prompt versions
Advantages:
- Evaluation and prompt management in one platform
- Automatically check quality regression on every change
Disadvantages:
- Strong in development/test stages rather than runtime deployment
- Weak production rollout features (requires direct implementation)
AWS Bedrock Prompt Management
AWS Bedrock provides Prompt Management features (GA November 2024).
- Prompt Versions: Create immutable versions with
CreatePromptVersionAPI - Alias: Connect aliases like
PROD,STAGINGto versions - IAM Integration: Set policies to allow only specific versions
- CloudTrail: Audit logs of who deployed which version and when
import boto3
bedrock = boto3.client('bedrock-agent')
# Create new version
response = bedrock.create_prompt_version(
promptIdentifier='arn:aws:bedrock:us-east-1:123456789012:prompt/fin-analysis',
description='Changed to conservative investment advisor style'
)
version_id = response['version']
# Update production alias
bedrock.update_prompt_alias(
promptIdentifier='arn:aws:bedrock:us-east-1:123456789012:prompt/fin-analysis',
aliasIdentifier='PROD',
promptVersion=version_id
)
Advantages:
- AWS native, IAM/CloudTrail/KMS integration
- Directly referenceable from Lambda, Step Functions
Disadvantages:
- Requires Bedrock model usage (Claude, Llama, etc.)
- Requires separate configuration from self-hosted LLM (vLLM, llm-d)
Comparison Table
| Feature | Langfuse | PromptLayer | Braintrust | Bedrock PM |
|---|---|---|---|---|
| Deployment Method | Self-hosted | SaaS | SaaS | AWS Managed |
| Version Control | ✅ | ✅ | ✅ | ✅ |
| Labels/Alias | ✅ | ✅ | ❌ | ✅ |
| Visual Diff | Basic | ✅ | ✅ | ❌ |
| Approval Workflow | ❌ | ✅ | ❌ | ❌(Implement with IAM) |
| A/B Experiments | Manual | ✅ | ✅ | Manual |
| Auto Evaluation Integration | Possible (trace-based) | ❌ | ✅ | Possible (Lambda) |
| Data Sovereignty | ✅ | ❌ | ❌ | ✅(Within region) |
| AIDLC Suitability | ⭐ High | Medium (SaaS) | High (Eval-focused) | High (Bedrock only) |
AIDLC Recommendation: Langfuse (self-hosted requirement) or AWS Bedrock PM (when using Bedrock). PromptLayer/Braintrust if SaaS is allowed.
Implementation Guide
Langfuse Self-hosted Deployment
# langfuse-values.yaml (Helm)
replicaCount: 2
postgresql:
enabled: true
auth:
password: "secure-password"
env:
- name: DATABASE_URL
value: "postgresql://user:pass@postgres:5432/langfuse"
- name: NEXTAUTH_SECRET
valueFrom:
secretKeyRef:
name: langfuse-secrets
key: nextauth-secret
- name: S3_BUCKET_NAME
value: "langfuse-prompts"
- name: S3_ENDPOINT
value: "https://s3.us-east-1.amazonaws.com"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2000m"
memory: "4Gi"
helm repo add langfuse https://langfuse.github.io/langfuse-k8s
helm install langfuse langfuse/langfuse -f langfuse-values.yaml
Bedrock Prompt Management Setup
import boto3
bedrock = boto3.client('bedrock-agent', region_name='us-east-1')
# 1. Create prompt
prompt_response = bedrock.create_prompt(
name='financial-analysis',
description='Financial analysis expert prompt',
variants=[{
'name': 'default',
'templateType': 'TEXT',
'templateConfiguration': {
'text': {
'text': 'You are a financial analysis expert...'
}
}
}]
)
prompt_arn = prompt_response['arn']
# 2. Create version
version_response = bedrock.create_prompt_version(
promptIdentifier=prompt_arn,
description='v1 initial version'
)
# 3. Create PROD alias
bedrock.create_prompt_alias(
promptIdentifier=prompt_arn,
name='PROD',
promptVersion='1'
)
References
- Langfuse Prompts: langfuse.com/docs/prompts
- PromptLayer: promptlayer.com
- Braintrust Prompts: braintrust.dev/docs/guides/prompts
- AWS Bedrock Prompt Management: AWS Documentation
Next Steps
Once you've built the prompt registry:
- Deployment Strategies — Select and implement Canary/Shadow/A-B strategies
- Governance & Automation — Build automatic regression detection and rollback system
- Evaluation Framework — Integrate Golden Dataset-based evaluation