
Cross-Cluster Object Replication (HA) Architecture Guide

Written: 2026-03-24 | Updated: 2026-03-24 | Reading time: ~12 min

Reference Environment: EKS 1.32+, ArgoCD 2.13+, Flux v2.4+, Velero 1.15+

1. Overview

Relying on a single EKS cluster in production means a cluster failure brings down the entire service. Cross-Cluster Object Replication is a strategy that ensures high availability by consistently replicating Kubernetes objects (ConfigMaps, Secrets, RBAC, CRDs, NetworkPolicies, etc.) across multiple clusters.

Current State

EKS does not provide managed Cross-Cluster Object Replication. Therefore, you must implement it yourself by combining open-source tools and architecture patterns. This guide compares the pros and cons of each pattern and provides selection criteria based on workload types.

Scope of This Guide

| Included | Not Included |
| --- | --- |
| K8s object replication (ConfigMap, Secret, CRD, RBAC, etc.) | Application data replication (DB replicas) |
| GitOps-based declarative synchronization | Service mesh-based traffic routing |
| Stateful object backup/restore (Velero) | Storage layer replication (EBS, EFS) |
| DNS failover strategies | Application-level HA patterns |

2. Multi-Cluster Architecture Pattern Comparison

There are three core patterns for implementing Cross-Cluster Object Replication.

Pattern 1: API Proxy (Push Model)

A central routing layer directly proxies CRUD requests to each cluster's API Server.

  • How it works: Direct API calls from a central point to each cluster
  • Pros: Lightweight and intuitive
  • Cons: Credential security vulnerabilities, no multi-cluster Watch support, increasing connection complexity

Pattern 2: Multi-cluster Controller (Kubefed-style)

A central controller monitors each cluster's state via Informer-based List-Watch and synchronizes through CRDs.

  • How it works: Central controller monitors and synchronizes each cluster's state
  • Pros: Dynamic cluster discovery, federation policies
  • Cons: Watch event overflow beyond roughly 10 clusters, Informer cache size limits, plaintext credential storage risk

Kubefed Project Status

Kubefed (v2) is effectively in maintenance mode and receives no new feature development from its Kubernetes SIG. It is not recommended for new projects.

Pattern 3: Agent-based Pull

Agents in each cluster pull the desired state from a central source (Git or a hub cluster) and reconcile locally. This follows the same principle as the kubelet receiving Pod specs and running them locally.

  • How it works: Each cluster agent independently pulls the desired state and reconciles locally
  • Pros: High scalability, eventual consistency, local operation continues even during central failures
  • Cons: Requires agent deployment on all clusters
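
The pull-and-reconcile cycle described above can be sketched in a few lines. This is a minimal in-memory illustration, not how Flux or ArgoCD are implemented: the `ClusterAgent` class, the object keys, and the dictionaries standing in for Git and for the live cluster are all hypothetical.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ClusterAgent:
    """Hypothetical per-cluster agent; `actual` stands in for the live cluster."""
    name: str
    actual: dict = field(default_factory=dict)

    def reconcile(self, desired: dict) -> dict:
        """Converge local state toward `desired` and report what changed."""
        report = {"created": [], "updated": [], "pruned": []}
        for key in sorted(desired):
            if key not in self.actual:
                report["created"].append(key)
            elif self.actual[key] != desired[key]:
                report["updated"].append(key)  # drift: local differs from source
            else:
                continue  # already in sync
            self.actual[key] = copy.deepcopy(desired[key])
        for key in sorted(set(self.actual) - set(desired)):
            report["pruned"].append(key)  # exists locally but not in the source
            del self.actual[key]
        return report


# Desired state as it would be read from the central source (Git or hub).
desired = {
    "ConfigMap/default/app-config": {"data": {"LOG_LEVEL": "info"}},
    "NetworkPolicy/default/deny-all": {"spec": {"podSelector": {}}},
}

east = ClusterAgent("east")  # empty cluster
west = ClusterAgent("west", actual={
    "ConfigMap/default/app-config": {"data": {"LOG_LEVEL": "debug"}},  # drifted
    "Secret/default/stale": {"data": {}},                              # orphaned
})

east_report = east.reconcile(desired)
west_report = west.reconcile(desired)
print(east_report)
print(west_report)
```

Each agent reads the shared desired state but writes only to its own cluster, so a failure in one agent leaves the other untouched, and the central source never needs credentials for any cluster.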

Pattern Comparison Summary

| Aspect | API Proxy | Multi-cluster Controller | Agent-based Pull |
| --- | --- | --- | --- |
| Operation | Central → Cluster Push | Central Watch + CRD Sync | Cluster → Central Pull |
| Scalability | Low (proportional to connections) | Medium (~10 clusters) | High (hundreds of clusters) |
| Complexity | Low | High | Medium |
| Security | Weak (many credentials) | Weak (plaintext storage) | Strong (agent local permissions) |
| Fault Isolation | Low | Medium | High |
| Drift Detection | None | Partial | Built-in |
| Recommended For | PoC, small scale | Legacy environments | Production (recommended) |

Decision Flowchart


3. Implementation Options

Option A: GitOps Pull-Based Replication

This option uses a Git repository as the Single Source of Truth; GitOps agents in each cluster pull from it and reconcile independently.

Key Benefits:

  • Drift Detection: Automatically detects and recovers when cluster state differs from Git
  • Audit Trail: All change history is recorded as Git commits
  • Declarative Management: Define the desired state and let agents reconcile
  • Fault Isolation: An agent failure in one cluster does not affect others

Active-Active Configuration:

Both clusters independently pull from the same Git repo. DNS (Route 53) distributes traffic; if one cluster fails, the remaining cluster takes over all traffic as soon as Route 53 health checks stop routing to the failed cluster.

Active-Passive Configuration:

Only the Active cluster has its GitOps agent enabled. The Passive cluster keeps its agent in Suspended state, activating it during failover.
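
As one way to express the Active-Passive arrangement, the sketch below generates a Flux `Kustomization` per cluster and uses the `spec.suspend` field to keep the Passive side idle. Mapping "Suspended state" to `spec.suspend` is an assumption about how the guide intends this to be done with Flux; the object name `shared-objects`, the source name `platform-config`, and the `./clusters/<name>` path layout are hypothetical.

```python
import json


def flux_kustomization(cluster: str, active: bool) -> dict:
    """Build a Flux Kustomization manifest; passive clusters start suspended."""
    return {
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
        "kind": "Kustomization",
        "metadata": {"name": "shared-objects", "namespace": "flux-system"},
        "spec": {
            "interval": "5m",                  # how often the agent reconciles
            "path": f"./clusters/{cluster}",   # hypothetical repo layout
            "prune": True,                     # delete objects removed from Git
            "sourceRef": {"kind": "GitRepository", "name": "platform-config"},
            "suspend": not active,             # True pauses reconciliation
        },
    }


def promote(manifest: dict) -> dict:
    """Return a copy with reconciliation resumed, as done during failover."""
    promoted = json.loads(json.dumps(manifest))  # deep copy via JSON round-trip
    promoted["spec"]["suspend"] = False
    return promoted


active = flux_kustomization("cluster-a", active=True)
passive = flux_kustomization("cluster-b", active=False)

# Kubernetes accepts JSON manifests, so this output can be applied directly.
print(json.dumps(passive, indent=2))
```

During failover, the same effect can be achieved imperatively with `flux resume kustomization shared-objects` on the Passive cluster; committing the promoted manifest to Git keeps the change in the audit trail.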

Option B: ArgoCD Hub-and-Spoke Model

Install ArgoCD on a Management Cluster and deploy to multiple workload clusters via ApplicationSets.

HA Strategies:

| Strategy | Description | Suitable Scenario |
| --- | --- | --- |
| Active-Passive Mirroring | Deploy ArgoCD in two regions; the Passive instance keeps its controllers disabled and is scaled up manually during failover | Environments with low DR requirements |
| Active-Active Sync Windows | Two ArgoCD instances sync during non-overlapping time windows (Sync Windows feature) | Active-Active requiring conflict prevention |

ApplicationSets Generator

Using the Cluster generator of ArgoCD ApplicationSets, applications can be deployed automatically to all clusters registered with ArgoCD. When a new cluster is registered, replication to it starts automatically without additional configuration.
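
A minimal sketch of such an ApplicationSet, built as a Python dictionary and printed as JSON, might look like the following. The repository URL, path, application name, and label selector are hypothetical; the `{{name}}` and `{{server}}` placeholders are the parameters the Cluster generator fills in for each registered cluster.

```python
import json


def cluster_applicationset(repo_url, path, revision="main", match_labels=None):
    """Build an ApplicationSet that stamps out one Application per cluster."""
    generator = {"clusters": {}}  # empty selector: every registered cluster
    if match_labels:
        generator["clusters"]["selector"] = {"matchLabels": dict(match_labels)}
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": "shared-objects", "namespace": "argocd"},
        "spec": {
            "generators": [generator],
            "template": {
                "metadata": {"name": "{{name}}-shared-objects"},
                "spec": {
                    "project": "default",
                    "source": {
                        "repoURL": repo_url,
                        "targetRevision": revision,
                        "path": path,
                    },
                    "destination": {"server": "{{server}}", "namespace": "default"},
                    # selfHeal reverts manual changes; prune removes deleted objects
                    "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
                },
            },
        },
    }


appset = cluster_applicationset(
    "https://git.example.com/platform/config.git",  # hypothetical repository
    "base",
    match_labels={"env": "production"},              # hypothetical cluster label
)
print(json.dumps(appset, indent=2))
```

Restricting the generator with a label selector is optional; it is shown here because replicating to every registered cluster, including the management cluster itself, is often broader than intended.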

Option C: Custom Controller (MirrorController Pattern)

When fine-grained control over object replication is needed, develop a dedicated controller to manage synchronization between source and target clusters.

Use Cases:

  • Selective replication of only objects with specific Labels/Annotations
  • Object transformation during replication (e.g., Namespace changes, field modifications)
  • Custom conflict resolution logic
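
The first two use cases can be illustrated with a small, self-contained sketch of the selection and transformation steps such a controller might perform. It operates on plain dictionaries rather than a live API; the label key `replication.example.com/enabled`, the function names, and the namespace mapping are hypothetical.

```python
import copy

# Hypothetical opt-in label; a real controller would define its own key.
REPLICATE_LABEL = "replication.example.com/enabled"

# Server-populated metadata that must not be copied into another cluster.
SERVER_FIELDS = ("resourceVersion", "uid", "creationTimestamp",
                 "generation", "managedFields")


def should_replicate(obj: dict) -> bool:
    """Select only objects that carry the opt-in label."""
    labels = obj.get("metadata", {}).get("labels", {})
    return labels.get(REPLICATE_LABEL) == "true"


def transform(obj: dict, namespace_map: dict) -> dict:
    """Return a copy that is safe to create in the target cluster."""
    mirrored = copy.deepcopy(obj)
    meta = mirrored["metadata"]
    for name in SERVER_FIELDS:
        meta.pop(name, None)
    mirrored.pop("status", None)  # status is owned by the target cluster
    namespace = meta.get("namespace")
    if namespace in namespace_map:
        meta["namespace"] = namespace_map[namespace]
    return mirrored


def plan_replication(objects: list, namespace_map: dict) -> list:
    """Filter the source objects, then transform the ones selected."""
    return [transform(o, namespace_map) for o in objects if should_replicate(o)]


source_objects = [
    {"kind": "ConfigMap",
     "metadata": {"name": "app-config", "namespace": "team-a", "uid": "123",
                  "resourceVersion": "42", "labels": {REPLICATE_LABEL: "true"}},
     "data": {"LOG_LEVEL": "info"}},
    {"kind": "ConfigMap",
     "metadata": {"name": "local-only", "namespace": "team-a"},
     "data": {}},
]

planned = plan_replication(source_objects, {"team-a": "team-a-dr"})
print(planned)
```

Custom conflict resolution would sit after this step, comparing each planned object with what already exists in the target cluster before writing.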

Pros and Cons:

| Pros | Cons |
| --- | --- |
| Clear separation of concerns | Additional operational overhead |
| Reduced core logic complexity | Potential synchronization delays |
| Fine-grained replication policy control | Increased debugging complexity |
| Custom conflict resolution | Requires in-house development/maintenance |

4. Active-Active vs Active-Passive Decision

Comparison Table

| Aspect | Active-Active | Active-Passive |
| --- | --- | --- |
| Object Sync | Both clusters independently pull from same Git source | Only Active reconciles; Passive stands by |
| Failover Time | Near-zero (both already serving) | Minutes (Passive activation required) |
| Conflict Resolution | Write conflicts possible; prevention via Sync Windows needed | No conflicts (single writer) |
| Operational Complexity | High (object IDs, DNS, state synchronization) | Low (standard failover model) |
| Cost | High (full capacity on both sides) | Low (Passive can run at reduced capacity) |
| Suitable Scenario | Multi-region HA, global load balancing | DR, cost-sensitive HA |

5. Supporting Tool Stack

Object replication alone cannot achieve complete Cross-Cluster HA. Combine the following tools to build the full stack.

| Tool | Role | Notes |
| --- | --- | --- |
| Flux / ArgoCD | K8s object replication (GitOps) | Core replication mechanism |
| Route 53 | DNS-based failover/load balancing | Health Check + Failover Routing |
| Global Accelerator | Anycast IP-based global routing | For multi-region Active-Active |
| Velero | Stateful object backup/restore (API objects, PVs) | Combined with S3 Cross-Region Replication |
| External Secrets Operator | Secret synchronization | AWS Secrets Manager → both clusters |
| Crossplane / ACK | AWS resource definition sync | Manage IaC as K8s objects |
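
To make the Route 53 row concrete, the sketch below builds the pair of failover records (PRIMARY with a health check, SECONDARY without) as a change batch in the shape accepted by the Route 53 `ChangeResourceRecordSets` API. It only constructs the request body and makes no AWS call; the record name, target hostnames, and health check ID are hypothetical placeholders.

```python
import json


def failover_change_batch(record_name, primary_target, secondary_target,
                          primary_health_check_id, ttl=60):
    """Build a Route 53 change batch with PRIMARY and SECONDARY failover records."""

    def record(role, target, health_check_id=None):
        rrset = {
            "Name": record_name,
            "Type": "CNAME",
            "SetIdentifier": f"{role.lower()}-cluster",  # must be unique per record
            "Failover": role,                            # "PRIMARY" or "SECONDARY"
            "TTL": ttl,                                  # short TTL speeds up failover
            "ResourceRecords": [{"Value": target}],
        }
        if health_check_id:
            rrset["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "Active-Passive failover between two clusters",
        "Changes": [
            record("PRIMARY", primary_target, primary_health_check_id),
            record("SECONDARY", secondary_target),
        ],
    }


batch = failover_change_batch(
    "app.example.com.",                      # hypothetical record name
    "ingress.cluster-a.example.com",         # hypothetical Active endpoint
    "ingress.cluster-b.example.com",         # hypothetical Passive endpoint
    "11111111-2222-3333-4444-555555555555",  # placeholder health check ID
)
print(json.dumps(batch, indent=2))
```

With boto3, this dictionary would be passed as the `ChangeBatch` argument of `change_resource_record_sets`. Load balancer endpoints are usually published as alias records rather than CNAMEs; CNAME is used here only to keep the sketch short.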

Tool Combination Architecture


6. Current Limitations and Future Outlook

Several EKS multi-cluster management features are not yet available as managed services.

| Area | Current State | Alternative |
| --- | --- | --- |
| Managed ClusterSets | Not released | RAM (Resource Access Manager) for Cross-Account grouping |
| Built-in Cross-Cluster Replication | Not released | GitOps (Flux/ArgoCD) |
| Multi-Region EKS Cluster | Not released | Independent clusters per region + GitOps sync |
| Managed ArgoCD | In development | Self-managed ArgoCD installation |

Practical Approach

Until these features are released, combining GitOps with the supporting tool stack is the most mature and proven approach. About 10% of EKS customers have already adopted GitOps based on Flux or ArgoCD.


The following tool combinations are the final recommendations for eliminating single-cluster dependency.

| Purpose | Recommended Tool | Configuration |
| --- | --- | --- |
| K8s Object Replication | GitOps (Flux or ArgoCD) | Both clusters pull from the same Git repo |
| Stateful Data Protection | Velero + S3 Cross-Region Replication | Scheduled backup + cross-region replication |
| Secret Synchronization | External Secrets Operator | AWS Secrets Manager as shared source |
| DNS Failover | Route 53 Health Checks | Active-Active or Failover Routing |
| CRD/Custom Resources | Include in GitOps repo | Managed identically to standard K8s objects |
| AWS Resource Definitions | Crossplane or ACK | Sync IaC natively in K8s |
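
For the "Scheduled backup" entry, a Velero `Schedule` object is the usual building block. The sketch below generates one as JSON; the schedule name, cron expression, retention period, and namespace list are illustrative assumptions, and S3 Cross-Region Replication is configured separately on the backup bucket rather than in this manifest.

```python
import json


def velero_schedule(name, cron, namespaces, ttl="720h0m0s",
                    storage_location="default"):
    """Build a Velero Schedule that creates a backup on each cron tick."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "schedule": cron,                       # standard five-field cron
            "template": {                           # same fields as a Backup spec
                "includedNamespaces": list(namespaces),
                "ttl": ttl,                         # how long each backup is kept
                "storageLocation": storage_location,
            },
        },
    }


# Hypothetical policy: back up two namespaces daily at 02:00, keep for 30 days.
schedule = velero_schedule("daily-stateful-backup", "0 2 * * *",
                           ["payments", "orders"])
print(json.dumps(schedule, indent=2))
```

Keeping this manifest in the GitOps repo means both clusters receive the same backup policy through the replication mechanism already in place.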

Implementation Priority

  1. P0: Deploy GitOps agents + design Git repo structure
  2. P1: Configure External Secrets Operator + Route 53 Health Checks
  3. P2: Establish Velero backup policies + S3 Cross-Region Replication
  4. P3: AWS resource sync with Crossplane/ACK (as needed)


9. References