Infrastructure Setup¶

The GenAI on EKS Starter Kit provisions AWS infrastructure and Kubernetes components using Terraform. All infrastructure code is located in the terraform/ directory.

AWS Services¶

The following AWS services are provisioned:

VPC and Networking¶

VPC: One Virtual Private Cloud with private and public subnets
NAT Gateway: Single NAT gateway for outbound internet access from private subnets
Subnets: Public and private subnets across multiple availability zones

Amazon EKS¶

EKS Cluster: One EKS cluster in Auto Mode (default) or Standard Mode
EKS_MODE: Configure via environment variable
- auto (default): EKS Auto Mode with managed compute
- standard: Standard EKS with self-managed node groups

EKS Auto Mode

EKS Auto Mode automatically provisions and manages compute resources, simplifying cluster operations and reducing operational overhead.

Amazon EFS¶

EFS File System: Shared file system for:
- Caching Hugging Face models
- Persistent storage for AI workloads
- Compiled Neuron models for AWS Inferentia

The EFS file system is mounted to pods that require persistent storage.

AWS Certificate Manager (ACM)¶

Wildcard Certificate: Automatically provisioned for your domain
Domain Validation: Uses DNS validation via Route 53
Coverage: *.<DOMAIN> allows all subdomains

Domain Requirement

ACM certificate is only created when DOMAIN is configured. Without a domain, services use HTTP with multiple ALBs.

Kubernetes Components¶

The following Kubernetes components are deployed on the EKS cluster:

Ingress and Load Balancing¶

AWS Load Balancer Controller

Provisions shared Application Load Balancer (ALB)
Manages ingress resources
Integrates with ACM for TLS termination

Ingress Configuration

With Domain: Single shared ALB with HTTPS
Without Domain: Multiple ALBs with HTTP per service

DNS Management¶

ExternalDNS

Automatically creates Route 53 DNS records for services
Syncs with Kubernetes ingress resources
Manages records for public-facing services

Automatic DNS Records

When you deploy LiteLLM, ExternalDNS automatically creates litellm.<DOMAIN> pointing to the ALB.

Authentication¶

Ingress NGINX Controller

Provides HTTP Basic authentication for services
Used for services requiring access control:
- Qdrant vector database
- Milvus vector database
- Monitoring dashboards

Domain Limitation

Without a domain, only one service with Nginx Ingress basic auth can be exposed due to multiple ALB limitations.

Storage¶

EFS CSI Driver

Enables EFS file system mounting in pods
Provides persistent storage for AI models
Supports ReadWriteMany access mode

Storage Classes

Two storage classes are configured:

Storage Class	Type	Use Case
`ebs-gp3`	EBS GP3	Block storage for individual pods
`efs`	EFS	Shared storage across pods

# Example: Using EFS storage
persistentVolumeClaim:
  storageClass: efs
  size: 100Gi

Infrastructure Management¶

Terraform Workspaces¶

Multiple EKS clusters are supported using Terraform workspaces:

Workspace name is derived from REGION and EKS_CLUSTER_NAME
Each cluster has isolated Terraform state
Switch clusters by changing environment variables

Apply Infrastructure Changes¶

Apply Terraform changes manually:

./cli terraform apply

View Infrastructure State¶

Check current Terraform state:

./cli terraform show

Destroy Infrastructure¶

Remove all provisioned infrastructure:

./cli cleanup-infra

Destructive Operation

This destroys all AWS resources. Ensure you've backed up any data and uninstalled components first.

GPU Instance Configuration¶

Default Configuration¶

Instance Families: g6e, g6, g5g
Purchasing Options: Spot and On-Demand

Customization¶

Modify terraform/0-common.tf to change:

GPU instance families
Purchasing options (Spot vs On-Demand)
Instance size and count limits

# Example: terraform/0-common.tf
variable "gpu_instance_families" {
  default = ["g6e", "g6", "g5g"]
}

After changes, apply the configuration:

./cli terraform apply

Node Selectors

Model deployment manifests use nodeSelector to target specific instance families. Update manifests accordingly when changing instance families.

AWS Neuron and Inferentia Support¶

For LLM inference on AWS Inferentia 2 instances:

Enable Neuron in configuration:

// config.json or config.local.json
{
  "enableNeuron": true
}

Reinstall the component:
```
./cli llm-model vllm install
```
This builds and pushes the vLLM Neuron container image to ECR (20-30 minutes).
First deployment compiles the model (20-30 minutes)
Compiled models are cached on EFS for subsequent deployments

INT8 Quantization on inf2.xlarge¶

Models like Llama-3.1-8B-Instruct support INT8 quantization for single inf2.xlarge, but require inf2.8xlarge for compilation:

Deploy with compilation enabled (uses inf2.8xlarge)
Change "compile": true to "compile": false in config
Delete and redeploy (uses inf2.xlarge)

ECR Pull Through Cache¶

Cache external container images in your private ECR registry.

Default: Disabled¶

ECR Pull Through Cache is disabled by default to avoid storage costs.

When to Enable¶

Hitting Docker Hub rate limits
Require faster, more reliable pulls from within AWS
Organization policies require private registry storage

Configuration¶

Get Docker Hub credentials:
- Create account at hub.docker.com
- Generate access token at Security Settings
Get GitHub credentials:
- Generate Personal Access Token at GitHub Settings
- Requires read:packages scope

Configure in config.local.json:

{
  "terraform": {
    "vars": {
      "enable_ecr_pull_through_cache": true,
      "dockerhub_username": "your-username",
      "dockerhub_access_token": "your-token",
      "github_username": "your-username",
      "github_token": "your-token"
    }
  }
}

Apply changes:
```
./cli terraform apply
```

Supported Registries¶

vllm/* → Docker Hub
lmsysorg/* → Docker Hub
ollama/* → Docker Hub
huggingface/* → GitHub Container Registry

Manual Cleanup Required

terraform destroy removes cache rules but not cached repositories. Manually delete repositories starting with vllm/, lmsysorg/, ollama/, or huggingface/ from ECR.

Multi-Cluster Management¶

Provision and manage multiple EKS clusters:

Change environment variables in .env.local:

REGION=us-east-1
EKS_CLUSTER_NAME=genai-dev
DOMAIN=dev.example.com

Run setup:
```
./cli demo-setup
```

Terraform workspace and kubectl context automatically use the configured values.

Next Steps¶

Return to Quick Start to deploy components
Follow the Demo Walkthrough to explore features
Check Security Considerations for best practices