Control Plane Debugging

Published: 2026-04-21 | Updated: 2026-06-30 | 10 min read

Control Plane Log Types

The EKS control plane can send five log types to CloudWatch Logs.

EKS control plane log types (log group: /aws/eks/<cluster-name>/cluster)

Log Type	Component	Description	Log Stream Prefix
api	kube-apiserver	API request/response records	kube-apiserver-*
audit	kube-apiserver-audit	Audit log (who, what, when)	kube-apiserver-audit-*
authenticator	aws-iam-authenticator	IAM authentication events	authenticator-*
controllerManager	kube-controller-manager	Controller activity logs	kube-controller-manager-*
scheduler	kube-scheduler	Scheduling decisions and failures	kube-scheduler-*

Enabling Logs

# Enable all control plane logs
aws eks update-cluster-config \
  --region <region> \
  --name <cluster-name> \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'

Cost Optimization

Enabling all log types increases CloudWatch Logs cost. For production, enable audit and authenticator as a baseline and turn on the remaining types only while debugging.
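The baseline described above can be expressed as a logging payload that enables some types and explicitly disables the rest. The following is a minimal sketch, not an official tool: `build_logging_config` and `ALL_LOG_TYPES` are hypothetical names introduced here, and the sketch assumes the `--logging` flag accepts one enabled entry and one disabled entry in the same `clusterLogging` list.

```python
import json

# The five control plane log types listed earlier in this document.
ALL_LOG_TYPES = ["api", "audit", "authenticator", "controllerManager", "scheduler"]


def build_logging_config(enabled_types):
    """Build a JSON string for `aws eks update-cluster-config --logging`.

    Hypothetical helper: types in `enabled_types` are switched on,
    every other known type is switched off.
    """
    unknown = set(enabled_types) - set(ALL_LOG_TYPES)
    if unknown:
        raise ValueError(f"unknown log types: {sorted(unknown)}")

    enabled = [t for t in ALL_LOG_TYPES if t in enabled_types]
    disabled = [t for t in ALL_LOG_TYPES if t not in enabled_types]

    entries = []
    if enabled:
        entries.append({"types": enabled, "enabled": True})
    if disabled:
        entries.append({"types": disabled, "enabled": False})
    return json.dumps({"clusterLogging": entries})


if __name__ == "__main__":
    # Production baseline: audit + authenticator only.
    print(build_logging_config(["audit", "authenticator"]))
```

The printed string can be passed as the value of `--logging`. During an incident, calling the helper with all five types produces the same payload as the command in the Enabling Logs section.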

CloudWatch Logs Insights Queries

API Server Error (400+) Analysis

fields @timestamp, @message
| filter @logStream like /kube-apiserver-audit/
| filter responseStatus.code >= 400
| stats count(*) as request_count by responseStatus.code
| sort request_count desc

Authentication Failure Tracking

fields @timestamp, @message
| filter @logStream like /authenticator/
| filter @message like /error/ or @message like /denied/
| sort @timestamp desc

Detecting Changes to aws-auth ConfigMap

fields @timestamp, @message
| filter @logStream like /kube-apiserver-audit/
| filter objectRef.resource = "configmaps" and objectRef.name = "aws-auth"
| filter verb in ["update", "patch", "delete"]
| sort @timestamp desc

API Throttling Detection

fields @timestamp, @message
| filter @logStream like /kube-apiserver/
| filter @message like /throttle/ or @message like /rate limit/
| stats count() by bin(5m)

Unauthorized Access Attempts (Security Events)

fields @timestamp, @message
| filter @logStream like /kube-apiserver-audit/
| filter responseStatus.code = 403
| stats count(*) as denied_count by user.username
| sort denied_count desc
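For audit events that have been exported from CloudWatch Logs (for example, as one JSON object per line), the same "403 by user" aggregation can be reproduced offline. This is a rough sketch under that assumption: `count_forbidden_by_user` is a hypothetical helper, and it relies only on the `user.username` and `responseStatus.code` fields that the query above also uses.

```python
import json
from collections import Counter


def count_forbidden_by_user(audit_lines):
    """Count HTTP 403 audit events per user.username, most frequent first.

    `audit_lines` is an iterable of strings, each expected to hold one
    Kubernetes audit event as JSON. Lines that are not JSON objects are skipped.
    """
    counts = Counter()
    for line in audit_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # not an audit event; ignore
        if not isinstance(event, dict):
            continue
        status = event.get("responseStatus") or {}
        if status.get("code") == 403:
            user = event.get("user") or {}
            counts[user.get("username", "<unknown>")] += 1
    return counts.most_common()
```

A sudden spike for a single username usually points at a missing RBAC binding or a misconfigured client rather than an attack, so it is worth checking the denied verbs and resources before escalating.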

AuthN/AuthZ Debugging

IAM Authentication Check

# Check current IAM credentials
aws sts get-caller-identity

# Check cluster authentication mode
aws eks describe-cluster --name <cluster-name> \
  --query 'cluster.accessConfig.authenticationMode' --output text

aws-auth ConfigMap (CONFIG_MAP Mode)

# View aws-auth ConfigMap
kubectl describe configmap aws-auth -n kube-system

EKS Access Entries (API / API_AND_CONFIG_MAP Mode)

# Create an Access Entry
aws eks create-access-entry \
  --cluster-name <cluster-name> \
  --principal-arn arn:aws:iam::ACCOUNT:role/ROLE-NAME \
  --type STANDARD

# List Access Entries
aws eks list-access-entries --cluster-name <cluster-name>

IRSA (IAM Roles for Service Accounts) Debugging Checklist

# 1. Check ServiceAccount annotations
kubectl get sa <sa-name> -n <namespace> -o yaml

# 2. Check AWS environment variables inside the Pod
kubectl exec <pod-name> -n <namespace> -- env | grep AWS

# 3. Check the OIDC Provider
aws eks describe-cluster --name <cluster-name> \
  --query 'cluster.identity.oidc.issuer' --output text

# 4. Check the IAM Role Trust Policy for OIDC Provider ARN and conditions
aws iam get-role --role-name <role-name> \
  --query 'Role.AssumeRolePolicyDocument'

Common IRSA Mistakes

Typo in the role ARN on the ServiceAccount annotation
Mismatch of namespace/sa names in the IAM Role Trust Policy
OIDC Provider not linked with the cluster
Pod not configured to use the ServiceAccount (missing spec.serviceAccountName)
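The trust policy mismatch is the hardest of these to spot by eye, so it can help to check it mechanically. The sketch below assumes the trust policy returned by `aws iam get-role` has already been parsed into a Python dict and that the issuer URL comes from `aws eks describe-cluster`. `irsa_trust_problems` is a hypothetical helper, and its `StringLike` matching uses `fnmatchcase` as an approximation of IAM wildcard semantics.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM policy fields may be a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def irsa_trust_problems(policy, issuer_url, namespace, sa_name):
    """Return a list of problems found in an IRSA trust policy (empty if it looks OK)."""
    issuer = issuer_url.split("://", 1)[-1].rstrip("/")  # drop the https:// scheme
    expected_sub = f"system:serviceaccount:{namespace}:{sa_name}"
    sub_key = f"{issuer}:sub"
    problems = []

    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRoleWithWebIdentity" not in _as_list(stmt.get("Action")):
            continue
        principal = stmt.get("Principal")
        federated = _as_list(principal.get("Federated")) if isinstance(principal, dict) else []
        if not any(arn.endswith("oidc-provider/" + issuer) for arn in federated):
            continue  # statement trusts a different OIDC provider

        condition = stmt.get("Condition") or {}
        equals = _as_list((condition.get("StringEquals") or {}).get(sub_key))
        likes = _as_list((condition.get("StringLike") or {}).get(sub_key))
        if expected_sub in equals or any(fnmatchcase(expected_sub, p) for p in likes):
            return []  # this statement admits the ServiceAccount
        if not equals and not likes:
            problems.append(f"statement has no {sub_key} condition")
        else:
            problems.append(f"sub condition {equals + likes} does not match {expected_sub}")

    return problems or [f"no statement trusts OIDC provider {issuer}"]
```

An empty result only means the trust relationship looks plausible; the ServiceAccount annotation and `spec.serviceAccountName` from the list above still need to be checked separately.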

Service Account Token Expiration (HTTP 401 Unauthorized)

In Kubernetes 1.21+, service account tokens are valid for 1 hour by default and are rotated automatically by the kubelet. However, applications built on older client SDKs that lack token refresh logic can hit 401 Unauthorized errors in long-running workloads.

Symptoms:

After a certain period (typically 1 hour), Pods suddenly return HTTP 401 Unauthorized errors
After restart, operations work briefly and then 401 errors recur

Cause:

Projected Service Account Tokens expire after 1 hour by default
kubelet rotates tokens automatically, but applications that read the token file once and cache it will keep using the expired token

Minimum Required SDK Versions:

Language	SDK	Minimum Version
Go	client-go	v0.15.7+
Python	kubernetes	12.0.0+
Java	fabric8	5.0.0+

Token Refresh Verification

Verify that the SDK supports automatic token refresh. If not, the application must periodically re-read /var/run/secrets/kubernetes.io/serviceaccount/token.
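For an application that manages the token itself, periodic re-reading can look like the following minimal sketch. `RefreshingToken` is a hypothetical class, not part of any SDK, and the 60-second default for `max_age_seconds` is an arbitrary assumption chosen to be well below the 1-hour token lifetime.

```python
import time
from pathlib import Path

# Default projected token path, as given in the text above.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


class RefreshingToken:
    """Return the service account token, re-reading the file once it is stale.

    Hypothetical helper: the cached copy is reused for at most
    `max_age_seconds`, after which the file is read again so that a token
    rotated by the kubelet is picked up.
    """

    def __init__(self, path=TOKEN_PATH, max_age_seconds=60.0, clock=time.monotonic):
        self._path = Path(path)
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so the refresh logic can be tested
        self._token = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        if self._token is None or now - self._loaded_at >= self._max_age:
            self._token = self._path.read_text().strip()
            self._loaded_at = now
        return self._token
```

Calling `get()` right before building each `Authorization: Bearer` header, instead of reading the file once at startup, avoids the cached-token failure described in the Cause section.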

EKS Pod Identity Debugging

EKS Pod Identity is an alternative to IRSA that grants AWS IAM permissions to Pods with simpler configuration.

# Check Pod Identity Associations
aws eks list-pod-identity-associations --cluster-name $CLUSTER
aws eks describe-pod-identity-association --cluster-name $CLUSTER \
  --association-id $ASSOC_ID

# Check Pod Identity Agent status
kubectl get pods -n kube-system -l app.kubernetes.io/name=eks-pod-identity-agent
kubectl logs -n kube-system -l app.kubernetes.io/name=eks-pod-identity-agent --tail=50

Pod Identity debugging checklist:

Is the eks-pod-identity-agent Add-on installed?
Is the Pod's ServiceAccount linked to the correct association?
Does the IAM Role trust policy include the pods.eks.amazonaws.com service principal?
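The last checklist item can be verified against the parsed output of `aws iam get-role --query 'Role.AssumeRolePolicyDocument'`. The sketch below is an illustration only: `pod_identity_trust_problems` is a hypothetical helper, and it assumes the role needs both `sts:AssumeRole` and `sts:TagSession` for the `pods.eks.amazonaws.com` principal, as in the trust policy AWS documents for Pod Identity. Wildcard actions such as `sts:*` are not expanded.

```python
def _as_list(value):
    """IAM policy fields may be a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def pod_identity_trust_problems(policy):
    """Return problems in a Pod Identity trust policy (empty list if it looks OK)."""
    required = {"sts:AssumeRole", "sts:TagSession"}
    granted = set()

    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        services = _as_list(principal.get("Service")) if isinstance(principal, dict) else []
        if "pods.eks.amazonaws.com" in services:
            granted.update(_as_list(stmt.get("Action")))

    if not granted:
        return ["no Allow statement for the pods.eks.amazonaws.com service principal"]
    return [f"missing action {action}" for action in sorted(required - granted)]
```

If the function reports no problems but the Pod still lacks credentials, the agent logs and the association's namespace and ServiceAccount name are the next places to look.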

Pod Identity vs IRSA

Pod Identity is simpler to configure than IRSA and makes cross-account access easier. For new workloads, Pod Identity is recommended.

EKS Add-on Troubleshooting

# List Add-ons
aws eks list-addons --cluster-name <cluster-name>

# Detailed Add-on status
aws eks describe-addon --cluster-name <cluster-name> --addon-name <addon-name>

# Update Add-on (resolve conflicts: PRESERVE keeps existing configuration)
aws eks update-addon --cluster-name <cluster-name> --addon-name <addon-name> \
  --addon-version <version> --resolve-conflicts PRESERVE

Add-on	Common Error Patterns	Diagnosis	Resolution
CoreDNS	Pod CrashLoopBackOff, DNS timeout	`kubectl logs -n kube-system -l k8s-app=kube-dns`	Inspect ConfigMap; `kubectl rollout restart deployment coredns -n kube-system`
kube-proxy	Service unreachable, iptables errors	`kubectl logs -n kube-system -l k8s-app=kube-proxy`	Check DaemonSet image version; `kubectl rollout restart daemonset kube-proxy -n kube-system`
VPC CNI	Pod IP allocation failures, ENI errors	`kubectl logs -n kube-system -l k8s-app=aws-node`	Check IPAMD logs, ENI/IP limits (see Networking doc)
EBS CSI	PVC Pending, volume attach failures	`kubectl logs -n kube-system -l app.kubernetes.io/name=aws-ebs-csi-driver`	Check IRSA permissions, AZ alignment (see Storage doc)

Cluster Health Issue Codes

When diagnosing infrastructure-level problems with the EKS cluster itself, check cluster health.

# Check cluster health issues
aws eks describe-cluster --name $CLUSTER \
  --query 'cluster.health' --output json

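Cluster health issue codes

Issue Code	Cause	Resolution	Recoverability
SUBNET_NOT_FOUND	Cluster subnet was deleted	Attach a new subnet	⚠️ Conditionally recoverable
SECURITY_GROUP_NOT_FOUND	Cluster security group was deleted	Recreate the security group	⚠️ Conditionally recoverable
IP_NOT_AVAILABLE	Subnet has run out of IP addresses	Add or expand subnets	✅ Recoverable
VPC_NOT_FOUND	VPC was deleted	Recreate the cluster	❌ Unrecoverable
ASSUME_ROLE_ACCESS_DENIED	Cluster IAM Role permission problem	Fix the IAM policy	✅ Recoverable
KMS_KEY_DISABLED	KMS key used for Secrets encryption is disabled	Re-enable the KMS key	✅ Recoverable
KMS_KEY_NOT_FOUND	KMS key was deleted	None	❌ Unrecoverable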

Unrecoverable Issues

VPC_NOT_FOUND and KMS_KEY_NOT_FOUND are unrecoverable. The cluster must be recreated.
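When many clusters are checked at once, the issue codes can be triaged automatically. The following sketch assumes the JSON printed by the `describe-cluster --query 'cluster.health'` command above has the shape `{"issues": [{"code": ...}, ...]}`. `RECOVERABILITY`, `triage_health`, and `must_recreate` are hypothetical names, and the mapping simply restates the table in this section.

```python
import json

# Recoverability per issue code, restating the table in this section.
RECOVERABILITY = {
    "SUBNET_NOT_FOUND": "conditional",
    "SECURITY_GROUP_NOT_FOUND": "conditional",
    "IP_NOT_AVAILABLE": "recoverable",
    "VPC_NOT_FOUND": "unrecoverable",
    "ASSUME_ROLE_ACCESS_DENIED": "recoverable",
    "KMS_KEY_DISABLED": "recoverable",
    "KMS_KEY_NOT_FOUND": "unrecoverable",
}


def triage_health(health_json):
    """Map each reported issue code to its recoverability ('unknown' if unlisted)."""
    health = json.loads(health_json) or {}
    issues = health.get("issues") or []
    return [(i.get("code"), RECOVERABILITY.get(i.get("code"), "unknown")) for i in issues]


def must_recreate(health_json):
    """True if any reported issue is one the table marks as unrecoverable."""
    return any(level == "unrecoverable" for _, level in triage_health(health_json))
```

Codes that are not in the table come back as `unknown` rather than being guessed, so new issue codes surface for manual review.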

RBAC / Pod Identity Debugging

ServiceAccount → IAM Role Mapping Failure

Symptoms:

Pods receive AccessDenied or UnauthorizedOperation errors when calling AWS APIs
IRSA or Pod Identity is used but permissions are not applied

Diagnosis:

# 1. Check ServiceAccount annotation (IRSA)
kubectl get sa <service-account> -n <namespace> -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'

# 2. Check Pod Identity Association
aws eks list-pod-identity-associations --cluster-name $CLUSTER \
  | jq '.associations[] | select(.serviceAccount=="<service-account>")'

# 3. Check whether environment variables are injected into the Pod
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.serviceAccountName}'
kubectl exec <pod-name> -n <namespace> -- env | grep AWS

# 4. Check IAM Role Trust Policy
aws iam get-role --role-name <role-name> \
  --query 'Role.AssumeRolePolicyDocument' --output json

Resolution:

For IRSA:

# Add annotation to ServiceAccount
kubectl annotate serviceaccount <sa-name> -n <namespace> \
  eks.amazonaws.com/role-arn=arn:aws:iam::ACCOUNT:role/ROLE-NAME

# Pod restart required (annotations apply at Pod creation time)
kubectl rollout restart deployment/<deployment-name> -n <namespace>

For Pod Identity:

# Create Pod Identity Association
aws eks create-pod-identity-association \
  --cluster-name $CLUSTER \
  --namespace <namespace> \
  --service-account <service-account> \
  --role-arn arn:aws:iam::ACCOUNT:role/ROLE-NAME

Mixing aws-auth ConfigMap with EKS Access Entries

Problem:

EKS clusters running Kubernetes 1.23+ support the Access Entries API, which can replace the aws-auth ConfigMap
Using both mechanisms together can produce unexpected authentication behavior

Check authentication mode:

# Check cluster authentication mode
aws eks describe-cluster --name <cluster-name> \
  --query 'cluster.accessConfig.authenticationMode' --output text

Authentication mode options:

Mode	Description	Recommended Use
`CONFIG_MAP`	Uses only aws-auth ConfigMap (legacy)	EKS 1.22 and earlier
`API`	Uses only Access Entries API	New clusters (EKS 1.23+)
`API_AND_CONFIG_MAP`	Both mechanisms allowed (default)	During migration

Migration guide:

# 1. Back up current aws-auth ConfigMap contents
kubectl get configmap aws-auth -n kube-system -o yaml > aws-auth-backup.yaml

# 2. Convert ConfigMap entries to Access Entries
aws eks create-access-entry \
  --cluster-name <cluster-name> \
  --principal-arn arn:aws:iam::ACCOUNT:role/ROLE-NAME \
  --type STANDARD

# 3. Kubernetes RBAC mapping (as needed)
aws eks associate-access-policy \
  --cluster-name <cluster-name> \
  --principal-arn arn:aws:iam::ACCOUNT:role/ROLE-NAME \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy \
  --access-scope type=cluster

# 4. After validation, switch authentication mode to API
aws eks update-cluster-config \
  --name <cluster-name> \
  --access-config authenticationMode=API

Caveats When Changing Authentication Mode

Once the cluster is switched to API mode, the aws-auth ConfigMap is ignored and the authentication mode cannot be reverted. Migrate every IAM principal to Access Entries before switching.
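Before switching, it is worth confirming that nothing from the ConfigMap was left behind. The sketch below assumes the principal ARNs have already been collected from the aws-auth backup, and that `aws eks list-access-entries` prints JSON with an `accessEntries` list of principal ARNs. `unmigrated_principals` is a hypothetical helper name.

```python
import json


def unmigrated_principals(aws_auth_arns, list_access_entries_json):
    """Return ARNs present in aws-auth but missing from Access Entries, sorted.

    `aws_auth_arns`: iterable of role/user ARNs taken from the aws-auth backup.
    `list_access_entries_json`: raw JSON output of `aws eks list-access-entries`.
    """
    response = json.loads(list_access_entries_json) or {}
    migrated = set(response.get("accessEntries") or [])
    return sorted(set(aws_auth_arns) - migrated)
```

An empty list suggests every principal has an Access Entry. It does not prove that the associated access policies grant equivalent permissions, so a `kubectl auth can-i` spot check per principal is still advisable.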

Permission Validation via kubectl auth can-i

# Check whether the current user has permission on a specific resource
kubectl auth can-i create deployments --namespace=production
kubectl auth can-i delete pods --namespace=kube-system

# Check permissions for a specific ServiceAccount
kubectl auth can-i list secrets --as=system:serviceaccount:default:my-sa

# List all permissions (current user)
kubectl auth can-i --list

# List all permissions in a specific namespace
kubectl auth can-i --list --namespace=production

Diagnosing Missing Pod Identity Association

Symptoms:

Pod Identity Agent is running normally but Pods lack AWS permissions
Pod environment variables do not include AWS_CONTAINER_CREDENTIALS_FULL_URI

Diagnosis:

# 1. Check Pod Identity Agent status
kubectl get daemonset eks-pod-identity-agent -n kube-system
kubectl get pods -n kube-system -l app.kubernetes.io/name=eks-pod-identity-agent

# 2. Check Associations
aws eks list-pod-identity-associations --cluster-name $CLUSTER

# 3. Check Association for a specific ServiceAccount
aws eks list-pod-identity-associations --cluster-name $CLUSTER \
  | jq --arg ns "default" --arg sa "my-service-account" \
    '.associations[] | select(.namespace==$ns and .serviceAccount==$sa)'

# 4. Check Association details
aws eks describe-pod-identity-association \
  --cluster-name $CLUSTER \
  --association-id <assoc-id>

Resolution:

# Create Pod Identity Association
aws eks create-pod-identity-association \
  --cluster-name $CLUSTER \
  --namespace <namespace> \
  --service-account <service-account> \
  --role-arn arn:aws:iam::ACCOUNT:role/ROLE-NAME

# Restart Pod (Association applies at Pod creation time)
kubectl delete pod <pod-name> -n <namespace>

Related Documents

EKS Debugging Guide (Main) - Full debugging guide
Node Debugging - Node-level issue diagnosis
Workload Debugging - Pod and workload issue diagnosis
Networking Debugging - Network issue diagnosis
