Migration Execution Strategy

Phase 1: Planning & Assessment (2 weeks)

Objectives:

Inventory current NGINX Ingress resources
Evaluate candidate Gateway API implementations
Develop risk management plan

Tasks:

Current State Analysis

# Export all Ingress resources
kubectl get ingress -A -o yaml > nginx-ingress-inventory.yaml

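# Analyze annotation usage (Ingresses without annotations are skipped)
kubectl get ingress -A -o json | \
  jq -r '.items[].metadata.annotations // {} | keys[]' | \
  sort | uniq -c | sort -rn

# Count distinct TLS secrets per namespace (Ingresses without a tls block are skipped)
kubectl get ingress -A -o json | \
  jq -r '.items[] | .metadata.namespace as $ns | .spec.tls[]? | select(.secretName) | "\($ns)/\(.secretName)"' | \
  sort -u | wc -l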
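
The annotation counts are most useful once controller-specific annotations are separated out, since those have no direct field in the Ingress spec and generally need manual translation. The following is a minimal sketch, assuming the `nginx.ingress.kubernetes.io/` prefix used by the community ingress-nginx controller; the function name `count_nginx_annotations` is hypothetical.

```shell
# Hypothetical helper: reads annotation keys (one per line) on stdin and
# prints "<count> <key>" for keys carrying the ingress-nginx prefix,
# most frequently used first.
count_nginx_annotations() {
  grep '^nginx\.ingress\.kubernetes\.io/' | sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Usage (assumes kubectl and jq are available):
#   kubectl get ingress -A -o json \
#     | jq -r '.items[].metadata.annotations // {} | keys[]' \
#     | count_nginx_annotations
```

Annotations near the top of this list are the ones worth mapping to HTTPRoute filters or implementation-specific policies first.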

Technology Selection

Review Section 5 solution comparison
Conduct stakeholder interviews
Evaluate budget and operational capabilities

Risk Assessment

# Risk register example
risks:
  - id: R1
    description: "Traffic loss during migration"
    probability: Medium
    impact: Critical
    mitigation: "Blue-Green deployment, gradual traffic shift"

  - id: R2
    description: "Performance degradation"
    probability: Low
    impact: High
    mitigation: "Pre-migration benchmarking, rollback plan"

  - id: R3
    description: "TLS certificate management issues"
    probability: Medium
    impact: Medium
    mitigation: "Test in PoC, automate with cert-manager"
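
To decide which risks to address first, the probability and impact labels in the register can be turned into a simple score. This is only a sketch: the numeric scale (Low=1 through Critical=4) and the function names `risk_level` and `risk_score` are assumptions, not part of the register itself.

```shell
# Hypothetical scale (an assumption): Low=1, Medium=2, High=3, Critical=4.
risk_level() {
  case "$1" in
    Low) echo 1 ;;
    Medium) echo 2 ;;
    High) echo 3 ;;
    Critical) echo 4 ;;
    *) echo "unknown level: $1" >&2; return 1 ;;
  esac
}

# Score = probability x impact; higher scores are handled first.
risk_score() {
  local p i
  p=$(risk_level "$1") || return 1
  i=$(risk_level "$2") || return 1
  echo $((p * i))
}

risk_score Medium Critical   # R1: prints 8
risk_score Low High          # R2: prints 3
risk_score Medium Medium     # R3: prints 4
```

Under this scale R1 (traffic loss) ranks highest, which matches its Critical impact rating.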

Deliverables:

NGINX Ingress inventory spreadsheet
Technology selection decision document
Migration project plan
Risk register and mitigation strategies

Phase 2: PoC Environment Setup (2 weeks)

Objectives:

Validate selected solution in isolated environment
Develop migration scripts
Train team

Tasks:

Create PoC Cluster

# Create test EKS cluster
eksctl create cluster \
  --name gateway-api-poc \
  --region us-west-2 \
  --version 1.32 \
  --nodegroup-name poc-workers \
  --node-type m5.large \
  --nodes 2

Install Selected Solution (Example: Cilium)

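# Install Cilium with Gateway API support
# (the Gateway API CRDs must already be installed in the cluster)
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set gatewayAPI.enabled=true \
  --set kubeProxyReplacement=true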

Feature Validation

Migrate 3-5 representative Ingress resources to HTTPRoute
Test authentication, rate limiting, URL rewrite
Benchmark performance (NGINX Ingress baseline vs Gateway API)

Documentation

# PoC Report Template

## Test Summary
- **Duration**: 2026-01-15 to 2026-01-28
- **Solution**: Cilium Gateway API v1.19
- **Test Cases**: 12 (11 passed, 1 issue)

## Performance Results
| Metric | NGINX Ingress | Cilium Gateway | Improvement |
|--------|---------------|----------------|-------------|
| P95 Latency | 45ms | 12ms | 73% reduction |
| RPS (single instance) | 8,500 | 24,000 | 182% increase |

## Issues Encountered
1. **Issue**: Rate limiting configuration complexity
   **Resolution**: Created helper script (see scripts/rate-limit-helper.sh)

## Recommendation
✅ Proceed to Phase 3 (Parallel Operation)
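
The Improvement column in the report template is a percentage change relative to the NGINX baseline. A small sketch of that arithmetic, using the sample figures from the template; the function name `pct_change` is hypothetical.

```shell
# Percentage change from a baseline to a new value, rounded to a whole percent.
# Negative results are reductions, positive results are increases.
pct_change() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (new - base) / base * 100 }'
}

pct_change 45 12        # P95 latency: prints -73 (a 73% reduction)
pct_change 8500 24000   # RPS: prints 182 (a 182% increase)
```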

Deliverables:

PoC cluster running selected solution
Migration script templates
PoC test report
Team training materials

Phase 3: Parallel Operation (Staging) (3 weeks)

Objectives:

Deploy Gateway API alongside NGINX in staging
Validate with production-like traffic
Refine operational procedures

Tasks:

Deploy Gateway API in Staging

# GatewayClass
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: cilium
spec:
  controllerName: io.cilium/gateway-controller

---
# Gateway (parallel to NGINX)
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: staging-gateway
  namespace: gateway-system
spec:
  gatewayClassName: cilium
  listeners:
  - name: https
    protocol: HTTPS
    port: 8443  # Different port from NGINX (443)
    tls:
      certificateRefs:
      - name: staging-tls

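Traffic Splitting

# Weighted split: route 10% of requests to the backend under test.
# Note: this is a split, not mirroring; both backends serve live responses.
# True mirroring uses the RequestMirror filter where the implementation supports it.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: mirrored-route
spec:
  parentRefs:
  - name: staging-gateway
    namespace: gateway-system
  rules:
  - backendRefs:
    - name: app-service
      port: 80
      weight: 90  # 90% stays on the existing path
    - name: app-service-via-gateway
      port: 80
      weight: 10  # 10% to the Gateway API test path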
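
The route above divides live traffic by weight. Where the chosen implementation supports Gateway API's Extended `RequestMirror` filter, a copy of each request can instead be sent to the test backend while users still receive the primary backend's response. The following is a minimal sketch that renders such a route; the function name `render_mirror_route` is hypothetical, the Service names are reused from the example above, and `staging-gateway` in `gateway-system` is assumed as the parent Gateway.

```shell
# Hypothetical helper: prints an HTTPRoute that mirrors requests to a second
# backend. Responses from the mirror backend are ignored.
# Arguments: primary_service mirror_service
render_mirror_route() {
  local primary="$1" mirror="$2"
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: mirrored-route
spec:
  parentRefs:
  - name: staging-gateway        # assumed parent Gateway
    namespace: gateway-system
  rules:
  - filters:
    - type: RequestMirror
      requestMirror:
        backendRef:
          name: ${mirror}
          port: 80
    backendRefs:
    - name: ${primary}
      port: 80
EOF
}

# Usage (review the rendered manifest before applying it):
#   render_mirror_route app-service app-service-via-gateway | kubectl apply -f -
```

Mirroring suits read-only or idempotent endpoints best, since mirrored writes would be executed twice.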

Monitoring Setup

# Prometheus ServiceMonitor for Cilium
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cilium-gateway
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cilium
  endpoints:
  - port: prometheus
    interval: 30s

Deliverables:

Gateway API deployed in staging
7 days of parallel operation data
Monitoring dashboards configured
Incident response procedures documented

Phase 4: Gradual Production Migration (4 weeks)

Objectives:

Migrate production traffic with zero downtime
Monitor and validate each step
Maintain quick rollback capability

Tasks:

Week 1: Deploy Gateway API (0% traffic)

# Deploy Gateway API infrastructure
kubectl apply -f production/gatewayclass.yaml
kubectl apply -f production/gateway.yaml

# Verify readiness
# (Gateway API v1 reports the "Programmed" condition; "Ready" is not set)
kubectl wait --for=condition=Programmed gateway/production-gateway -n gateway-system --timeout=300s

Week 2: Canary Migration (10% traffic)

# HTTPRoute with 90/10 split
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-canary
spec:
  parentRefs:
  - name: production-gateway
    namespace: gateway-system  # required when the route lives in another namespace
  rules:
  - backendRefs:
    - name: api-service-nginx
      port: 80
      weight: 90  # NGINX Ingress
    - name: api-service-gateway
      port: 80
      weight: 10  # Gateway API

Monitoring:

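# Check resource usage of the backend pods
kubectl top pods -l app=api-service

# Rough error count from recent logs (last 1000 lines per pod)
kubectl logs -l app=api-service --tail=1000 | grep -c ERROR

# Latency check (requires a local curl-format.txt containing -w variables)
curl -s -o /dev/null -w "@curl-format.txt" https://api.example.com/health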
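
These checks are most useful when they feed an explicit go/no-go decision before each weight increase. The following is a minimal sketch of such a gate; the function name `canary_ok` and the default threshold of 0.5 percentage points are assumptions to be tuned per service.

```shell
# Hypothetical gate: exit 0 if the canary error rate exceeds the baseline
# error rate by no more than max_delta percentage points, exit 1 if it does,
# and exit 2 if either side saw no traffic (no decision possible).
# Arguments: baseline_errors baseline_total canary_errors canary_total [max_delta]
canary_ok() {
  awk -v be="$1" -v bt="$2" -v ce="$3" -v ct="$4" -v max="${5:-0.5}" 'BEGIN {
    if (bt == 0 || ct == 0) exit 2
    base = be / bt * 100
    canary = ce / ct * 100
    if (canary - base > max) exit 1
    exit 0
  }'
}

# Usage: 12 errors in 9000 baseline requests vs 2 errors in 1000 canary requests
#   canary_ok 12 9000 2 1000 && echo "proceed" || echo "hold or roll back"
```

Request and error counts would normally come from the metrics system rather than from log greps; the gate itself only needs the four numbers.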

Week 3: Increase to 50% traffic

# Update HTTPRoute weight
kubectl patch httproute api-canary --type=json \
  -p='[{"op": "replace", "path": "/spec/rules/0/backendRefs/0/weight", "value": 50},
       {"op": "replace", "path": "/spec/rules/0/backendRefs/1/weight", "value": 50}]'

# Monitor for 48 hours
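
Hand-editing both weights in the JSON patch at every step invites typos. One possible approach is to generate the patch from a single number so the two weights always sum to 100; the function name `weight_patch` is hypothetical, and the backendRef order (NGINX first, Gateway API second) is taken from the `api-canary` route above.

```shell
# Hypothetical helper: prints the JSON patch that sends the given percentage
# of traffic to the second backendRef (Gateway API) and the remainder to the
# first (NGINX). Accepts integers from 0 to 100 only.
weight_patch() {
  local gw="$1" nginx
  case "$gw" in
    0|100|[1-9]|[1-9][0-9]) ;;
    *) echo "weight must be an integer from 0 to 100" >&2; return 1 ;;
  esac
  nginx=$((100 - gw))
  printf '[{"op": "replace", "path": "/spec/rules/0/backendRefs/0/weight", "value": %d},\n' "$nginx"
  printf ' {"op": "replace", "path": "/spec/rules/0/backendRefs/1/weight", "value": %d}]\n' "$gw"
}

# Usage:
#   kubectl patch httproute api-canary --type=json -p="$(weight_patch 50)"
```

The same helper covers the 100% cutover (`weight_patch 100`) and a rollback (`weight_patch 0`).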

Week 4: Complete Migration (100% traffic)

# Update to 100% Gateway API
kubectl patch httproute api-canary --type=json \
  -p='[{"op": "replace", "path": "/spec/rules/0/backendRefs/0/weight", "value": 0},
       {"op": "replace", "path": "/spec/rules/0/backendRefs/1/weight", "value": 100}]'

# Monitor for 7 days before Phase 5

Rollback Plan:

#!/bin/bash
# rollback-to-nginx.sh: emergency rollback script
set -euo pipefail

echo "Rolling back to NGINX Ingress..."

# Revert traffic to 100% NGINX
kubectl patch httproute api-canary --type=json \
  -p='[{"op": "replace", "path": "/spec/rules/0/backendRefs/0/weight", "value": 100},
       {"op": "replace", "path": "/spec/rules/0/backendRefs/1/weight", "value": 0}]'

# Verify NGINX Ingress health
kubectl get ingress -A
# For a closer look at one resource, run manually:
#   kubectl describe ingress <ingress-name> -n <namespace>

echo "Rollback complete. Verify traffic flow."

Deliverables:

Week 1: Gateway API infrastructure deployed (0% traffic)
Week 2: Canary validated (10% traffic)
Week 3: Half migration validated (50% traffic)
Week 4: Full migration complete (100% traffic)
No P1/P2 incidents during migration

Phase 5: NGINX Decommission (1 week)

Objectives:

Safely remove NGINX Ingress Controller
Archive configurations for audit
Close migration project

Tasks:

Final Validation (Day 1-2)

# List HTTPRoutes (confirm every migrated host has a route)
kubectl get httproute -A

# Check for any remaining Ingress resources
kubectl get ingress -A

# Validate metrics
# - No increase in error rates
# - Latency within acceptable range
# - No customer complaints

Archive NGINX Configuration (Day 3)

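# Back up Ingress resources and the controller's ConfigMaps.
# Secrets are left out on purpose: they hold TLS private keys and belong in
# an encrypted backup, not in version control.
kubectl get ingress -A -o yaml > nginx-archive-ingress-$(date +%Y%m%d).yaml
kubectl get configmap -n ingress-nginx -o yaml > nginx-archive-configmaps-$(date +%Y%m%d).yaml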

# Store in version control
git add nginx-archive-*.yaml
git commit -m "Archive NGINX Ingress configuration before decommission"
git push

Delete NGINX Resources (Day 4-5)

# Delete NGINX Ingress Controller
helm uninstall nginx-ingress -n ingress-nginx

# Delete namespace
kubectl delete namespace ingress-nginx

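# Delete remaining Ingress resources
# (this removes every Ingress in the cluster; skip it if another Ingress controller is still in use)
kubectl delete ingress --all -A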

# Verify cleanup
kubectl get all -n ingress-nginx  # Should be empty
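
Because these deletions cannot be undone, it may help to gate them on the Day 3 archive actually existing. A minimal sketch, assuming the archive files follow the `nginx-archive-*.yaml` naming used above; the function name `archive_present` is hypothetical.

```shell
# Hypothetical guard: succeeds only if at least one non-empty file matching
# nginx-archive-*.yaml exists in the given directory (default: current one).
archive_present() {
  local dir="${1:-.}" f
  for f in "$dir"/nginx-archive-*.yaml; do
    [ -s "$f" ] && return 0
  done
  echo "no non-empty nginx-archive-*.yaml found in $dir" >&2
  return 1
}

# Usage: run the uninstall only when the archive check passes
#   archive_present . && helm uninstall nginx-ingress -n ingress-nginx
```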

Post-Migration Review (Day 6-7)

# Migration retrospective template
## What Went Well
- Gradual traffic migration prevented incidents
- Monitoring provided clear visibility
- Team training was effective

## What Could Be Improved
- PoC phase took longer than expected
- Need better automation for weight updates

## Action Items
- [ ] Document migration procedures for future clusters
- [ ] Create runbooks for common Gateway API issues
- [ ] Schedule quarterly training on Gateway API features

## Metrics
- **Total Duration**: 11 weeks (planned 12)
- **Incidents**: 0 P1/P2, 2 P3 (resolved in under 1hr)
- **Performance**: 15% latency improvement
- **Cost**: No change (Cilium open-source)

Deliverables:

NGINX Ingress fully decommissioned
Configuration archived in version control
Post-migration report
Team retrospective completed
