
The Complete Guide to Amazon EKS: Kubernetes on AWS with Advanced Container Orchestration#

Amazon Elastic Kubernetes Service (EKS) is AWS's managed Kubernetes service: it runs, patches, and scales the Kubernetes control plane for you, so you can focus on your workloads. This guide covers everything from cluster setup to advanced deployment patterns and operational best practices.

Table of Contents#

  1. Introduction to EKS
  2. EKS Architecture
  3. Cluster Setup and Configuration
  4. Node Groups and Fargate
  5. Networking and Security
  6. Storage and Persistent Volumes
  7. Application Deployment
  8. Advanced Features
  9. Monitoring and Logging
  10. CI/CD Integration
  11. Best Practices
  12. Cost Optimization
  13. Troubleshooting

Introduction to EKS {#introduction}#

Amazon EKS is a fully managed Kubernetes service that provides a secure, reliable, and scalable way to run Kubernetes on AWS. It automatically manages the availability and scalability of the Kubernetes control plane nodes.

Key Benefits:#

  • Managed Control Plane: AWS manages Kubernetes masters
  • High Availability: Multi-AZ control plane deployment
  • Security: Integration with AWS IAM and VPC
  • Scalability: Automatic scaling and patching
  • AWS Integration: Native integration with AWS services
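
Several of these properties are easy to confirm directly from the API. The minimal sketch below (boto3, with a hypothetical cluster name) reads back the managed control plane's version, endpoint, status, and enabled log types:

import boto3

eks = boto3.client('eks')

# Inspect a cluster's AWS-managed control plane settings
# ('my-eks-cluster' is a placeholder; replace with your own)
cluster = eks.describe_cluster(name='my-eks-cluster')['cluster']
print(f"Kubernetes version: {cluster['version']}")
print(f"Endpoint: {cluster['endpoint']}")
print(f"Status: {cluster['status']}")
print(f"Platform version: {cluster['platformVersion']}")
# Control-plane log types currently enabled
for entry in cluster['logging']['clusterLogging']:
    print(f"Logging {entry['types']}: enabled={entry['enabled']}")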

EKS Architecture {#architecture}#

Understanding EKS Components#

import boto3
import json
from datetime import datetime

# Initialize EKS and EC2 clients
eks = boto3.client('eks')
ec2 = boto3.client('ec2')

def eks_architecture_overview():
    """
    Overview of EKS architecture components
    """
    architecture = {
        "control_plane": {
            "description": "Managed by AWS",
            "components": [
                "API Server",
                "etcd",
                "Controller Manager",
                "Scheduler"
            ],
            "features": [
                "Multi-AZ deployment",
                "Automatic patching",
                "Built-in monitoring",
                "99.95% SLA"
            ]
        },
        "data_plane": {
            "description": "Customer managed worker nodes",
            "options": [
                "EC2 Self-managed nodes",
                "EKS Managed node groups",
                "AWS Fargate"
            ],
            "networking": [
                "VPC integration",
                "Subnet placement",
                "Security groups",
                "Load balancers"
            ]
        },
        "add_ons": {
            "core": [
                "kube-proxy",
                "CoreDNS",
                "Amazon VPC CNI"
            ],
            "optional": [
                "AWS Load Balancer Controller",
                "Amazon EBS CSI Driver",
                "Amazon EFS CSI Driver",
                "Cluster Autoscaler"
            ]
        }
    }
    return architecture

print("EKS Architecture Overview:")
print(json.dumps(eks_architecture_overview(), indent=2))

Cluster Setup and Configuration {#cluster-setup}#

Creating an EKS Cluster with Python#

import boto3
import json
import time

class EKSClusterManager:
    def __init__(self):
        self.eks = boto3.client('eks')
        self.ec2 = boto3.client('ec2')
        self.iam = boto3.client('iam')

    def create_cluster_role(self, role_name):
        """
        Create IAM role for EKS cluster
        """
        trust_policy = {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "eks.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }
        try:
            response = self.iam.create_role(
                RoleName=role_name,
                AssumeRolePolicyDocument=json.dumps(trust_policy),
                Description='EKS Cluster Service Role'
            )
            # Attach required policies
            policies = [
                'arn:aws:iam::aws:policy/AmazonEKSClusterPolicy'
            ]
            for policy_arn in policies:
                self.iam.attach_role_policy(
                    RoleName=role_name,
                    PolicyArn=policy_arn
                )
            role_arn = response['Role']['Arn']
            print(f"EKS cluster role created: {role_arn}")
            return role_arn
        except Exception as e:
            print(f"Error creating cluster role: {e}")
            return None

    def create_node_group_role(self, role_name):
        """
        Create IAM role for EKS node group
        """
        trust_policy = {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "ec2.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }
        try:
            response = self.iam.create_role(
                RoleName=role_name,
                AssumeRolePolicyDocument=json.dumps(trust_policy),
                Description='EKS Node Group Service Role'
            )
            # Attach required policies
            policies = [
                'arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy',
                'arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy',
                'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly'
            ]
            for policy_arn in policies:
                self.iam.attach_role_policy(
                    RoleName=role_name,
                    PolicyArn=policy_arn
                )
            role_arn = response['Role']['Arn']
            print(f"EKS node group role created: {role_arn}")
            return role_arn
        except Exception as e:
            print(f"Error creating node group role: {e}")
            return None

    def create_cluster(self, cluster_name, cluster_role_arn, subnet_ids, security_group_ids=None):
        """
        Create EKS cluster
        """
        try:
            cluster_config = {
                'name': cluster_name,
                'version': '1.28',
                'roleArn': cluster_role_arn,
                'resourcesVpcConfig': {
                    'subnetIds': subnet_ids
                },
                # Enable all control plane log types
                'logging': {
                    'clusterLogging': [
                        {
                            'types': ['api', 'audit', 'authenticator', 'controllerManager', 'scheduler'],
                            'enabled': True
                        }
                    ]
                }
            }
            if security_group_ids:
                cluster_config['resourcesVpcConfig']['securityGroupIds'] = security_group_ids
            response = self.eks.create_cluster(**cluster_config)
            cluster_arn = response['cluster']['arn']
            print(f"EKS cluster creation initiated: {cluster_arn}")
            # Wait for cluster to be active
            self.wait_for_cluster_active(cluster_name)
            return cluster_arn
        except Exception as e:
            print(f"Error creating cluster: {e}")
            return None

    def wait_for_cluster_active(self, cluster_name, timeout=1800):
        """
        Wait for cluster to become active
        """
        print(f"Waiting for cluster {cluster_name} to become active...")
        start_time = time.time()
        while time.time() - start_time < timeout:
            try:
                response = self.eks.describe_cluster(name=cluster_name)
                status = response['cluster']['status']
                if status == 'ACTIVE':
                    print(f"Cluster {cluster_name} is now active!")
                    return True
                elif status == 'FAILED':
                    print(f"Cluster {cluster_name} creation failed!")
                    return False
                else:
                    print(f"Cluster status: {status}")
                    time.sleep(30)
            except Exception as e:
                print(f"Error checking cluster status: {e}")
                time.sleep(30)
        print(f"Timeout waiting for cluster {cluster_name} to become active")
        return False

    def create_managed_node_group(self, cluster_name, node_group_name, node_role_arn, subnet_ids, instance_types=None):
        """
        Create EKS managed node group
        """
        try:
            response = self.eks.create_nodegroup(
                clusterName=cluster_name,
                nodegroupName=node_group_name,
                scalingConfig={
                    'minSize': 1,
                    'maxSize': 10,
                    'desiredSize': 3
                },
                diskSize=20,
                instanceTypes=instance_types or ['t3.medium'],
                amiType='AL2_x86_64',
                nodeRole=node_role_arn,
                subnets=subnet_ids,
                remoteAccess={
                    'ec2SshKey': 'my-key-pair'  # Replace with your key pair
                },
                tags={
                    'Environment': 'production',
                    'ManagedBy': 'EKS'
                }
            )
            node_group_arn = response['nodegroup']['nodegroupArn']
            print(f"Node group creation initiated: {node_group_arn}")
            return node_group_arn
        except Exception as e:
            print(f"Error creating node group: {e}")
            return None

# Usage example
eks_manager = EKSClusterManager()

# Create IAM roles
cluster_role_arn = eks_manager.create_cluster_role('EKSClusterRole')
node_role_arn = eks_manager.create_node_group_role('EKSNodeGroupRole')

# Create cluster (replace with actual subnet IDs)
subnet_ids = ['subnet-12345678', 'subnet-87654321']
cluster_arn = None
if cluster_role_arn:
    cluster_arn = eks_manager.create_cluster('my-eks-cluster', cluster_role_arn, subnet_ids)

# Create managed node group
if cluster_arn and node_role_arn:
    node_group_arn = eks_manager.create_managed_node_group(
        'my-eks-cluster',
        'my-node-group',
        node_role_arn,
        subnet_ids,
        ['t3.medium', 't3.large']
    )

Cluster Configuration with eksctl#

# Create cluster configuration file
cat > cluster-config.yaml << EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: production-cluster
  region: us-east-1
  version: "1.28"

# VPC Configuration
vpc:
  id: vpc-12345678 # Replace with your VPC ID
  subnets:
    private:
      us-east-1a: { id: subnet-12345678 }
      us-east-1b: { id: subnet-87654321 }
    public:
      us-east-1a: { id: subnet-abcdef12 }
      us-east-1b: { id: subnet-fedcba21 }

# Node Groups
managedNodeGroups:
  - name: worker-nodes
    instanceType: t3.medium
    minSize: 2
    maxSize: 10
    desiredCapacity: 3
    privateNetworking: true
    ssh:
      allow: true
      publicKeyName: my-key-pair
    labels:
      role: worker
    tags:
      Environment: production
      NodeGroup: worker-nodes
    iam:
      withAddonPolicies:
        imageBuilder: true
        autoScaler: true
        externalDNS: true
        certManager: true
        appMesh: true
        appMeshPreview: true
        ebs: true
        fsx: true
        efs: true
        albIngress: true
        xRay: true
        cloudWatch: true

# Add-ons
addons:
  - name: vpc-cni
    version: latest
  - name: coredns
    version: latest
  - name: kube-proxy
    version: latest
  - name: aws-ebs-csi-driver
    version: latest

# CloudWatch Logging
cloudWatch:
  clusterLogging:
    enableTypes:
      - api
      - audit
      - authenticator
      - controllerManager
      - scheduler
    logRetentionInDays: 30
EOF

# Create cluster
eksctl create cluster -f cluster-config.yaml

# Update kubeconfig
aws eks update-kubeconfig --region us-east-1 --name production-cluster
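
Once eksctl finishes, it is worth confirming the cluster and its managed add-ons programmatically before deploying workloads. A minimal boto3 check (the cluster name is taken from the config above):

import boto3

eks = boto3.client('eks')

# Confirm the cluster is ACTIVE before deploying workloads
status = eks.describe_cluster(name='production-cluster')['cluster']['status']
print(f"Cluster status: {status}")

# List the managed add-ons installed by eksctl and their versions
for addon_name in eks.list_addons(clusterName='production-cluster')['addons']:
    addon = eks.describe_addon(clusterName='production-cluster', addonName=addon_name)['addon']
    print(f"{addon_name}: {addon['addonVersion']} ({addon['status']})")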

Node Groups and Fargate {#node-management}#

Managing Different Node Types#

import base64
import boto3

class EKSNodeManager:
    def __init__(self):
        self.eks = boto3.client('eks')
        self.ec2 = boto3.client('ec2')

    def create_fargate_profile(self, cluster_name, profile_name, execution_role_arn, subnet_ids):
        """
        Create Fargate profile for serverless pods
        """
        try:
            response = self.eks.create_fargate_profile(
                fargateProfileName=profile_name,
                clusterName=cluster_name,
                podExecutionRoleArn=execution_role_arn,
                subnets=subnet_ids,
                selectors=[
                    {
                        'namespace': 'fargate-namespace',
                        'labels': {
                            'compute-type': 'fargate'
                        }
                    },
                    {
                        'namespace': 'kube-system',
                        'labels': {
                            'k8s-app': 'coredns'
                        }
                    }
                ],
                tags={
                    'Environment': 'production',
                    'ComputeType': 'fargate'
                }
            )
            profile_arn = response['fargateProfile']['fargateProfileArn']
            print(f"Fargate profile created: {profile_arn}")
            return profile_arn
        except Exception as e:
            print(f"Error creating Fargate profile: {e}")
            return None

    def create_self_managed_nodes(self, cluster_name, node_group_name, instance_type='t3.medium'):
        """
        Create self-managed node group using launch template
        """
        # User data script that joins the node to the cluster on boot
        user_data_script = f"""#!/bin/bash
/etc/eks/bootstrap.sh {cluster_name}
yum update -y
yum install -y amazon-cloudwatch-agent
"""
        try:
            # Create launch template (UserData must be base64-encoded)
            response = self.ec2.create_launch_template(
                LaunchTemplateName=f"{node_group_name}-template",
                LaunchTemplateData={
                    'ImageId': 'ami-0c02fb55956c7d316',  # Replace with latest EKS-optimized AMI
                    'InstanceType': instance_type,
                    'KeyName': 'my-key-pair',
                    'SecurityGroupIds': ['sg-12345678'],  # Replace with appropriate security group
                    'UserData': base64.b64encode(user_data_script.encode('utf-8')).decode('ascii'),
                    'IamInstanceProfile': {
                        'Name': 'EKSNodeInstanceProfile'  # Replace with your instance profile
                    },
                    'TagSpecifications': [
                        {
                            'ResourceType': 'instance',
                            'Tags': [
                                {'Key': 'Name', 'Value': f'{node_group_name}-node'},
                                {'Key': f'kubernetes.io/cluster/{cluster_name}', 'Value': 'owned'},
                                {'Key': f'k8s.io/cluster-autoscaler/{cluster_name}', 'Value': 'owned'},
                                {'Key': 'k8s.io/cluster-autoscaler/enabled', 'Value': 'true'}
                            ]
                        }
                    ]
                }
            )
            template_id = response['LaunchTemplate']['LaunchTemplateId']
            print(f"Launch template created: {template_id}")

            # Create Auto Scaling Group
            autoscaling = boto3.client('autoscaling')
            autoscaling.create_auto_scaling_group(
                AutoScalingGroupName=f"{node_group_name}-asg",
                LaunchTemplate={
                    'LaunchTemplateId': template_id,
                    'Version': '$Latest'
                },
                MinSize=1,
                MaxSize=10,
                DesiredCapacity=3,
                VPCZoneIdentifier='subnet-12345678,subnet-87654321',  # Replace with your subnets
                Tags=[
                    {
                        'Key': 'Name',
                        'Value': f'{node_group_name}-asg',
                        'PropagateAtLaunch': True,
                        'ResourceId': f"{node_group_name}-asg",
                        'ResourceType': 'auto-scaling-group'
                    }
                ]
            )
            print(f"Auto Scaling Group created: {node_group_name}-asg")
            return template_id
        except Exception as e:
            print(f"Error creating self-managed nodes: {e}")
            return None

    def configure_cluster_autoscaler(self, cluster_name):
        """
        Deploy cluster autoscaler configuration
        """
        cluster_autoscaler_yaml = f"""
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.21.0  # pin to a release matching your cluster's minor version
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/{cluster_name}
            - --balance-similar-node-groups
            - --skip-nodes-with-system-pods=false
          env:
            - name: AWS_REGION
              value: us-east-1
          volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
      volumes:
        - name: ssl-certs
          hostPath:
            path: /etc/ssl/certs/ca-bundle.crt
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/cluster-autoscaler-role
"""
        return cluster_autoscaler_yaml

    def get_node_group_info(self, cluster_name):
        """
        Get information about all node groups
        """
        try:
            response = self.eks.list_nodegroups(clusterName=cluster_name)
            node_groups = []
            for ng_name in response['nodegroups']:
                ng_detail = self.eks.describe_nodegroup(
                    clusterName=cluster_name,
                    nodegroupName=ng_name
                )
                node_group_info = {
                    'name': ng_name,
                    'status': ng_detail['nodegroup']['status'],
                    'instance_types': ng_detail['nodegroup']['instanceTypes'],
                    'capacity': ng_detail['nodegroup']['scalingConfig'],
                    'ami_type': ng_detail['nodegroup']['amiType'],
                    'node_role': ng_detail['nodegroup']['nodeRole'],
                    'subnets': ng_detail['nodegroup']['subnets']
                }
                node_groups.append(node_group_info)
            return node_groups
        except Exception as e:
            print(f"Error getting node group info: {e}")
            return []

# Node management examples
node_manager = EKSNodeManager()

# Get node group information
node_groups = node_manager.get_node_group_info('production-cluster')
print("Node Groups:")
for ng in node_groups:
    print(f"  Name: {ng['name']}")
    print(f"  Status: {ng['status']}")
    print(f"  Instance Types: {ng['instance_types']}")
    print(f"  Capacity: {ng['capacity']}")
    print()

# Create Fargate profile
# fargate_profile = node_manager.create_fargate_profile(
#     'production-cluster',
#     'fargate-profile',
#     'arn:aws:iam::123456789012:role/EKSFargateRole',
#     ['subnet-12345678', 'subnet-87654321']
# )

# Deploy cluster autoscaler
autoscaler_yaml = node_manager.configure_cluster_autoscaler('production-cluster')
print("Cluster Autoscaler YAML:")
print(autoscaler_yaml)
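
Scaling an existing managed node group is a single API call; EKS propagates the change to the underlying Auto Scaling group. A brief sketch, reusing the node group created earlier (the names are assumptions):

import boto3

eks = boto3.client('eks')

# Resize a managed node group; EKS updates the backing Auto Scaling group
response = eks.update_nodegroup_config(
    clusterName='production-cluster',
    nodegroupName='my-node-group',  # hypothetical node group name
    scalingConfig={
        'minSize': 2,
        'maxSize': 12,
        'desiredSize': 5
    }
)
print(f"Update status: {response['update']['status']}")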

Application Deployment {#application-deployment}#

Kubernetes Deployment Examples#

from kubernetes import client, config

class KubernetesDeploymentManager:
    def __init__(self):
        # Load kube config
        try:
            config.load_kube_config()
            self.v1 = client.CoreV1Api()
            self.apps_v1 = client.AppsV1Api()
            self.networking_v1 = client.NetworkingV1Api()
            print("Kubernetes client initialized successfully")
        except Exception as e:
            print(f"Error initializing Kubernetes client: {e}")

    def create_namespace(self, namespace_name):
        """
        Create a Kubernetes namespace
        """
        namespace = client.V1Namespace(
            metadata=client.V1ObjectMeta(
                name=namespace_name,
                labels={
                    'name': namespace_name,
                    'managed-by': 'python-client'
                }
            )
        )
        try:
            response = self.v1.create_namespace(namespace)
            print(f"Namespace '{namespace_name}' created successfully")
            return response
        except Exception as e:
            print(f"Error creating namespace: {e}")

    def create_deployment(self, name, namespace, image, replicas=3, port=80):
        """
        Create a Kubernetes deployment
        """
        # Define deployment
        deployment = client.V1Deployment(
            metadata=client.V1ObjectMeta(
                name=name,
                namespace=namespace
            ),
            spec=client.V1DeploymentSpec(
                replicas=replicas,
                selector=client.V1LabelSelector(
                    match_labels={'app': name}
                ),
                template=client.V1PodTemplateSpec(
                    metadata=client.V1ObjectMeta(
                        labels={'app': name}
                    ),
                    spec=client.V1PodSpec(
                        containers=[
                            client.V1Container(
                                name=name,
                                image=image,
                                ports=[client.V1ContainerPort(container_port=port)],
                                resources=client.V1ResourceRequirements(
                                    requests={'cpu': '100m', 'memory': '128Mi'},
                                    limits={'cpu': '500m', 'memory': '512Mi'}
                                ),
                                liveness_probe=client.V1Probe(
                                    http_get=client.V1HTTPGetAction(
                                        path='/health',
                                        port=port
                                    ),
                                    initial_delay_seconds=30,
                                    period_seconds=10
                                ),
                                readiness_probe=client.V1Probe(
                                    http_get=client.V1HTTPGetAction(
                                        path='/ready',
                                        port=port
                                    ),
                                    initial_delay_seconds=5,
                                    period_seconds=5
                                )
                            )
                        ]
                    )
                )
            )
        )
        try:
            response = self.apps_v1.create_namespaced_deployment(
                namespace=namespace,
                body=deployment
            )
            print(f"Deployment '{name}' created in namespace '{namespace}'")
            return response
        except Exception as e:
            print(f"Error creating deployment: {e}")

    def create_service(self, name, namespace, port=80, target_port=80, service_type='ClusterIP'):
        """
        Create a Kubernetes service
        """
        service = client.V1Service(
            metadata=client.V1ObjectMeta(
                name=name,
                namespace=namespace
            ),
            spec=client.V1ServiceSpec(
                selector={'app': name},
                ports=[client.V1ServicePort(
                    port=port,
                    target_port=target_port,
                    protocol='TCP'
                )],
                type=service_type
            )
        )
        try:
            response = self.v1.create_namespaced_service(
                namespace=namespace,
                body=service
            )
            print(f"Service '{name}' created in namespace '{namespace}'")
            return response
        except Exception as e:
            print(f"Error creating service: {e}")

    def create_ingress(self, name, namespace, host, service_name, service_port=80):
        """
        Create a Kubernetes ingress
        """
        ingress = client.V1Ingress(
            metadata=client.V1ObjectMeta(
                name=name,
                namespace=namespace,
                annotations={
                    'kubernetes.io/ingress.class': 'alb',
                    'alb.ingress.kubernetes.io/scheme': 'internet-facing',
                    'alb.ingress.kubernetes.io/target-type': 'ip',
                    'alb.ingress.kubernetes.io/certificate-arn': 'arn:aws:acm:us-east-1:123456789012:certificate/12345678-1234-1234-1234-123456789012'
                }
            ),
            spec=client.V1IngressSpec(
                rules=[
                    client.V1IngressRule(
                        host=host,
                        http=client.V1HTTPIngressRuleValue(
                            paths=[
                                client.V1HTTPIngressPath(
                                    path='/',
                                    path_type='Prefix',
                                    backend=client.V1IngressBackend(
                                        service=client.V1IngressServiceBackend(
                                            name=service_name,
                                            port=client.V1ServiceBackendPort(number=service_port)
                                        )
                                    )
                                )
                            ]
                        )
                    )
                ]
            )
        )
        try:
            response = self.networking_v1.create_namespaced_ingress(
                namespace=namespace,
                body=ingress
            )
            print(f"Ingress '{name}' created in namespace '{namespace}'")
            return response
        except Exception as e:
            print(f"Error creating ingress: {e}")

    def create_horizontal_pod_autoscaler(self, name, namespace, deployment_name, min_replicas=2, max_replicas=10, cpu_target=70):
        """
        Create Horizontal Pod Autoscaler
        """
        hpa_yaml = f"""
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {name}
  namespace: {namespace}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {deployment_name}
  minReplicas: {min_replicas}
  maxReplicas: {max_replicas}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {cpu_target}
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
"""
        return hpa_yaml

# Deployment examples
k8s_manager = KubernetesDeploymentManager()

# Create application namespace
k8s_manager.create_namespace('my-app')

# Deploy web application
k8s_manager.create_deployment(
    name='web-app',
    namespace='my-app',
    image='nginx:1.21',
    replicas=3,
    port=80
)

# Create service
k8s_manager.create_service(
    name='web-app',
    namespace='my-app',
    port=80,
    target_port=80,
    service_type='ClusterIP'
)

# Create ingress
k8s_manager.create_ingress(
    name='web-app-ingress',
    namespace='my-app',
    host='myapp.example.com',
    service_name='web-app',
    service_port=80
)

# Generate HPA YAML
hpa_yaml = k8s_manager.create_horizontal_pod_autoscaler(
    'web-app-hpa',
    'my-app',
    'web-app'
)
print("Horizontal Pod Autoscaler YAML:")
print(hpa_yaml)
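
Note that these client calls return as soon as the API server accepts the objects, not when the pods are actually running. A small polling helper (a sketch using the same kubernetes client) waits for a Deployment's rollout to complete:

import time
from kubernetes import client, config

config.load_kube_config()
apps_v1 = client.AppsV1Api()

def wait_for_rollout(name, namespace, timeout=300):
    """Poll a Deployment until all desired replicas report ready."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = apps_v1.read_namespaced_deployment(name, namespace).status
        ready = status.ready_replicas or 0
        desired = status.replicas or 0
        if desired > 0 and ready == desired:
            print(f"{name}: {ready}/{desired} replicas ready")
            return True
        time.sleep(5)
    return False

wait_for_rollout('web-app', 'my-app')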

Helm Chart Deployment#

import subprocess
import yaml

class HelmManager:
    def __init__(self):
        self.helm_binary = 'helm'

    def add_repository(self, repo_name, repo_url):
        """
        Add Helm repository
        """
        try:
            subprocess.run([
                self.helm_binary, 'repo', 'add', repo_name, repo_url
            ], capture_output=True, text=True, check=True)
            print(f"Repository '{repo_name}' added successfully")
            return True
        except subprocess.CalledProcessError as e:
            print(f"Error adding repository: {e.stderr}")
            return False

    def update_repositories(self):
        """
        Update Helm repositories
        """
        try:
            subprocess.run([
                self.helm_binary, 'repo', 'update'
            ], capture_output=True, text=True, check=True)
            print("Repositories updated successfully")
            return True
        except subprocess.CalledProcessError as e:
            print(f"Error updating repositories: {e.stderr}")
            return False

    def install_chart(self, release_name, chart_name, namespace='default', values_file=None, set_values=None):
        """
        Install Helm chart
        """
        cmd = [self.helm_binary, 'install', release_name, chart_name,
               '--namespace', namespace, '--create-namespace']
        if values_file:
            cmd.extend(['--values', values_file])
        if set_values:
            for key, value in set_values.items():
                cmd.extend(['--set', f'{key}={value}'])
        try:
            subprocess.run(cmd, capture_output=True, text=True, check=True)
            print(f"Chart '{chart_name}' installed as release '{release_name}'")
            return True
        except subprocess.CalledProcessError as e:
            print(f"Error installing chart: {e.stderr}")
            return False

    def create_aws_load_balancer_controller_values(self):
        """
        Create values file for AWS Load Balancer Controller
        """
        values = {
            'clusterName': 'production-cluster',
            'serviceAccount': {
                'create': True,
                'annotations': {
                    'eks.amazonaws.com/role-arn': 'arn:aws:iam::123456789012:role/aws-load-balancer-controller-role'
                }
            },
            'region': 'us-east-1',
            'vpcId': 'vpc-12345678'
        }
        with open('aws-load-balancer-controller-values.yaml', 'w') as f:
            yaml.dump(values, f, default_flow_style=False)
        return 'aws-load-balancer-controller-values.yaml'

# Helm deployment examples
helm_manager = HelmManager()

# Add EKS chart repository
helm_manager.add_repository('eks', 'https://aws.github.io/eks-charts')

# Add Kubernetes dashboard
helm_manager.add_repository('kubernetes-dashboard', 'https://kubernetes.github.io/dashboard/')

# Update repositories
helm_manager.update_repositories()

# Install AWS Load Balancer Controller
values_file = helm_manager.create_aws_load_balancer_controller_values()
helm_manager.install_chart(
    'aws-load-balancer-controller',
    'eks/aws-load-balancer-controller',
    'kube-system',
    values_file=values_file
)

# Install Kubernetes Dashboard
helm_manager.install_chart(
    'kubernetes-dashboard',
    'kubernetes-dashboard/kubernetes-dashboard',
    'kubernetes-dashboard'
)

# Install Prometheus and Grafana for monitoring
helm_manager.add_repository('prometheus-community', 'https://prometheus-community.github.io/helm-charts')
helm_manager.update_repositories()

# Install kube-prometheus-stack
helm_manager.install_chart(
    'prometheus',
    'prometheus-community/kube-prometheus-stack',
    'monitoring',
    set_values={
        'grafana.adminPassword': 'admin123',
        'prometheus.prometheusSpec.retention': '30d'
    }
)
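
Since these wrappers shell out to the Helm CLI, the same approach can verify what was installed. A small sketch that parses `helm list` output as JSON:

import json
import subprocess

def list_releases(namespace=None):
    """Return installed Helm releases as parsed JSON."""
    cmd = ['helm', 'list', '--output', 'json']
    if namespace:
        cmd.extend(['--namespace', namespace])
    else:
        cmd.append('--all-namespaces')
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

for release in list_releases():
    print(f"{release['namespace']}/{release['name']}: {release['chart']} ({release['status']})")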

Monitoring and Logging {#monitoring-logging}#

Container Insights and CloudWatch Integration#

import boto3
import json

class EKSMonitoringManager:
    def __init__(self):
        self.cloudwatch = boto3.client('cloudwatch')
        self.logs = boto3.client('logs')

    def setup_container_insights(self, cluster_name, region='us-east-1'):
        """
        Set up Container Insights for EKS cluster
        """
        # CloudWatch Agent ConfigMap, ServiceAccount, and DaemonSet.
        # NOTE: literal JSON braces are doubled ({{ }}) because this is an f-string.
        cloudwatch_config = f"""
apiVersion: v1
kind: ConfigMap
metadata:
  name: cwagentconfig
  namespace: amazon-cloudwatch
data:
  cwagentconfig.json: |
    {{
      "logs": {{
        "metrics_collected": {{
          "kubernetes": {{
            "cluster_name": "{cluster_name}",
            "metrics_collection_interval": 60
          }}
        }},
        "force_flush_interval": 5
      }}
    }}
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/CloudWatchAgentServerRole
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
spec:
  selector:
    matchLabels:
      name: cloudwatch-agent
  template:
    metadata:
      labels:
        name: cloudwatch-agent
    spec:
      containers:
        - name: cloudwatch-agent
          image: amazon/cloudwatch-agent:1.247348.0b251780
          ports:
            - containerPort: 8125
              hostPort: 8125
              protocol: UDP
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 200m
              memory: 200Mi
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: K8S_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          volumeMounts:
            - name: cwagentconfig
              mountPath: /etc/cwagentconfig
            - name: rootfs
              mountPath: /rootfs
              readOnly: true
            - name: dockersock
              mountPath: /var/run/docker.sock
              readOnly: true
            - name: varlibdocker
              mountPath: /var/lib/docker
              readOnly: true
            - name: sys
              mountPath: /sys
              readOnly: true
            - name: devdisk
              mountPath: /dev/disk
              readOnly: true
      volumes:
        - name: cwagentconfig
          configMap:
            name: cwagentconfig
        - name: rootfs
          hostPath:
            path: /
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
        - name: varlibdocker
          hostPath:
            path: /var/lib/docker
        - name: sys
          hostPath:
            path: /sys
        - name: devdisk
          hostPath:
            path: /dev/disk
      terminationGracePeriodSeconds: 60
      serviceAccountName: cloudwatch-agent
"""
        return cloudwatch_config

    def create_custom_metrics_dashboard(self, cluster_name):
        """
        Create CloudWatch dashboard for EKS metrics
        """
        dashboard_body = {
            "widgets": [
                {
                    "type": "metric",
                    "x": 0,
                    "y": 0,
                    "width": 12,
                    "height": 6,
                    "properties": {
                        "metrics": [
                            ["ContainerInsights", "cluster_node_count", "ClusterName", cluster_name],
                            [".", "cluster_failed_node_count", ".", "."],
                        ],
                        "view": "timeSeries",
                        "stacked": False,
                        "region": "us-east-1",
                        "title": "Cluster Node Status",
                        "period": 300
                    }
                },
                {
                    "type": "metric",
                    "x": 12,
                    "y": 0,
                    "width": 12,
                    "height": 6,
                    "properties": {
                        "metrics": [
                            ["ContainerInsights", "pod_cpu_utilization", "ClusterName", cluster_name],
                            [".", "pod_memory_utilization", ".", "."],
                        ],
                        "view": "timeSeries",
                        "stacked": False,
                        "region": "us-east-1",
                        "title": "Pod Resource Utilization",
                        "period": 300
                    }
                },
                {
                    "type": "metric",
                    "x": 0,
                    "y": 6,
                    "width": 24,
                    "height": 6,
                    "properties": {
                        "metrics": [
                            ["ContainerInsights", "service_number_of_running_pods", "ClusterName", cluster_name]
                        ],
                        "view": "timeSeries",
                        "stacked": False,
                        "region": "us-east-1",
                        "title": "Running Pods",
                        "period": 300
                    }
                }
            ]
        }
        try:
            response = self.cloudwatch.put_dashboard(
                DashboardName=f'EKS-{cluster_name}-Overview',
                DashboardBody=json.dumps(dashboard_body)
            )
            print(f"Dashboard created for cluster {cluster_name}")
            return response
        except Exception as e:
            print(f"Error creating dashboard: {e}")

    def setup_log_aggregation(self, cluster_name):
        """
        Set up Fluent Bit for log aggregation
        """
        fluent_bit_config = f"""
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: amazon-cloudwatch
  labels:
    k8s-app: fluent-bit
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush                     5
        Log_Level                 info
        Daemon                    off
        Parsers_File              parsers.conf
        HTTP_Server               On
        HTTP_Listen               0.0.0.0
        HTTP_Port                 2020
        storage.path              /var/fluent-bit/state/flb-storage/
        storage.sync              normal
        storage.checksum          off
        storage.backlog.mem_limit 5M
    @INCLUDE application-log.conf
    @INCLUDE dataplane-log.conf
    @INCLUDE host-log.conf
  application-log.conf: |
    [INPUT]
        Name                tail
        Tag                 application.*
        Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*
        Path                /var/log/containers/*.log
        multiline.parser    docker, cri
        DB                  /var/fluent-bit/state/flb_container.db
        Mem_Buf_Limit       50MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Rotate_Wait         30
        storage.type        filesystem
        Read_from_Head      Off
    [FILTER]
        Name                kubernetes
        Match               application.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     application.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
        Labels              Off
        Annotations         Off
    [OUTPUT]
        Name                cloudwatch_logs
        Match               application.*
        region              us-east-1
        log_group_name      /aws/containerinsights/{cluster_name}/application
        log_stream_prefix   ${{kubernetes['pod_name']}}
        auto_create_group   true
        extra_user_agent    container-insights
  parsers.conf: |
    [PARSER]
        Name        apache
        Format      regex
        Regex       ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \\[(?<time>[^\\]]*)\\] "(?<method>\\S+)(?: +(?<path>[^\\"]*?)(?: +\\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\\"]*)" "(?<agent>[^\\"]*)")?$
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z
    [PARSER]
        Name        apache2
        Format      regex
        Regex       ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \\[(?<time>[^\\]]*)\\] "(?<method>\\S+)(?: +(?<path>[^ ]*) +\\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\\"]*)" "(?<agent>[^\\"]*)")?$
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: amazon-cloudwatch
  labels:
    k8s-app: fluent-bit
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit
  template:
    metadata:
      labels:
        k8s-app: fluent-bit
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
        - name: fluent-bit
          image: amazon/aws-for-fluent-bit:2.28.4
          imagePullPolicy: Always
          env:
            - name: AWS_REGION
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: cluster.region
            - name: CLUSTER_NAME
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: cluster.name
            - name: HTTP_SERVER
              value: "On"
            - name: HTTP_PORT
              value: "2020"
            - name: READ_FROM_HEAD
              value: "Off"
            - name: READ_FROM_TAIL
              value: "On"
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: HOSTNAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: CI_VERSION
              value: "k8s/1.3.12"
          resources:
            limits:
              cpu: 500m
              memory: 200Mi
            requests:
              cpu: 500m
              memory: 200Mi
          volumeMounts:
            - name: fluentbitstate
              mountPath: /var/fluent-bit/state
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
            - name: runlogjournal
              mountPath: /run/log/journal
              readOnly: true
            - name: dmesg
              mountPath: /var/log/dmesg
              readOnly: true
      terminationGracePeriodSeconds: 10
      volumes:
        - name: fluentbitstate
          hostPath:
            path: /var/fluent-bit/state
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
        - name: runlogjournal
          hostPath:
            path: /run/log/journal
        - name: dmesg
          hostPath:
            path: /var/log/dmesg
      serviceAccountName: fluent-bit
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        - operator: "Exists"
          effect: "NoExecute"
        - operator: "Exists"
          effect: "NoSchedule"
"""
        return fluent_bit_config

# Monitoring setup examples
monitoring_manager = EKSMonitoringManager()

# Set up Container Insights
container_insights_config = monitoring_manager.setup_container_insights('production-cluster')

# Create CloudWatch dashboard
monitoring_manager.create_custom_metrics_dashboard('production-cluster')

# Set up log aggregation
fluent_bit_config = monitoring_manager.setup_log_aggregation('production-cluster')

print("Container Insights Configuration:")
print(container_insights_config)
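
Container Insights metrics land in the `ContainerInsights` CloudWatch namespace, so standard alarms work on them. A minimal sketch that alerts when any node fails (the SNS topic ARN is a placeholder):

import boto3

cloudwatch = boto3.client('cloudwatch')

# Alarm when Container Insights reports any failed node
cloudwatch.put_metric_alarm(
    AlarmName='EKS-production-cluster-failed-nodes',
    Namespace='ContainerInsights',
    MetricName='cluster_failed_node_count',
    Dimensions=[{'Name': 'ClusterName', 'Value': 'production-cluster'}],
    Statistic='Maximum',
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:eks-alerts'],  # hypothetical SNS topic
    TreatMissingData='notBreaching'
)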

Best Practices {#best-practices}#

EKS Security and Operational Best Practices#

import boto3
import json

class EKSBestPractices:
    def __init__(self):
        self.eks = boto3.client('eks')

    def security_best_practices(self):
        """
        Implement EKS security best practices
        """
        security_practices = {
            'cluster_security': {
                'enable_endpoint_private_access': True,
                'disable_endpoint_public_access_if_possible': True,
                'restrict_public_access_cidrs': ['YOUR_OFFICE_IP/32'],
                'enable_cluster_logging': ['api', 'audit', 'authenticator', 'controllerManager', 'scheduler']
            },
            'node_security': {
                'use_latest_ami': 'Always use latest EKS-optimized AMI',
                'implement_pod_security_standards': 'Use Pod Security Standards',
                'resource_quotas': 'Implement resource quotas and limits',
                'network_policies': 'Use Kubernetes network policies'
            },
            'rbac_configuration': self.generate_rbac_examples(),
            'secrets_management': self.secrets_management_practices(),
            'container_security': self.container_security_practices()
        }
        return security_practices

    def generate_rbac_examples(self):
        """
        Generate RBAC configuration examples
        """
        rbac_configs = {
            'developer_role': """
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: developer
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps", "secrets"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: development
subjects:
  - kind: User
    name: developer@company.com
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
""",
            'readonly_role': """
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: readonly
rules:
  - apiGroups: [""]
    resources: ["*"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["*"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: readonly-binding
subjects:
  - kind: User
    name: readonly@company.com
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: readonly
  apiGroup: rbac.authorization.k8s.io
"""
        }
        return rbac_configs

    def secrets_management_practices(self):
        """
        Secrets management best practices
        """
        practices = {
            'external_secrets_operator': {
                'description': 'Use External Secrets Operator with AWS Secrets Manager',
                'example_yaml': """
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: default
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-secret
  namespace: default
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: database-secret
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: prod/database
        property: username
    - secretKey: password
      remoteRef:
        key: prod/database
        property: password
"""
            },
            'sealed_secrets': {
                'description': 'Use Sealed Secrets for GitOps workflows',
                'installation': 'kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.18.0/controller.yaml'
            }
        }
        return practices

    def container_security_practices(self):
        """
        Container security best practices
        """
        practices = {
            'pod_security_standards': """
apiVersion: v1
kind: Namespace
metadata:
  name: secure-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
""",
            'security_context_example': """
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: app
          image: myapp:latest
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
            capabilities:
              drop:
                - ALL
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: var-run
              mountPath: /var/run
      volumes:
        - name: tmp
          emptyDir: {}
        - name: var-run
          emptyDir: {}
""",
            'network_policy_example': """
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: web
      ports:
        - protocol: TCP
          port: 8080
"""
        }
        return practices

    def operational_best_practices(self):
        """
        Operational best practices for EKS
        """
        practices = {
            'resource_management': {
                'resource_quotas': """
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: development
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    persistentvolumeclaims: "10"
    services: "5"
    secrets: "10"
    configmaps: "10"
""",
                'limit_ranges': """
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range
  namespace: development
spec:
  limits:
    - default:
        memory: "512Mi"
        cpu: "500m"
      defaultRequest:
        memory: "256Mi"
        cpu: "100m"
      type: Container
"""
            },
            'cluster_maintenance': {
                'upgrade_strategy': 'Always test upgrades in non-production first',
                'backup_strategy': 'Regular etcd backups and configuration backups',
                'monitoring': 'Comprehensive monitoring and alerting setup',
                'logging': 'Centralized logging with retention policies'
            },
            'cost_optimization': {
                'right_sizing': 'Regular review of resource requests and limits',
                'spot_instances': 'Use spot instances for non-critical workloads',
                'cluster_autoscaler': 'Implement cluster autoscaler for dynamic scaling',
                'vertical_pod_autoscaler': 'Use VPA for automatic resource optimization'
            }
        }
        return practices

    def disaster_recovery_practices(self):
        """
        Disaster recovery best practices
        """
        dr_practices = {
            'backup_strategies': {
                'velero_backup': """
# Install Velero for cluster backup
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts/
helm install velero vmware-tanzu/velero \\
  --namespace velero \\
  --create-namespace \\
  --set-file credentials.secretContents.cloud=./credentials-velero \\
  --set configuration.provider=aws \\
  --set configuration.backupStorageLocation.bucket=my-backup-bucket \\
  --set configuration.backupStorageLocation.config.region=us-east-1 \\
  --set configuration.volumeSnapshotLocation.config.region=us-east-1 \\
  --set initContainers[0].name=velero-plugin-for-aws \\
  --set initContainers[0].image=velero/velero-plugin-for-aws:v1.5.0 \\
  --set initContainers[0].volumeMounts[0].mountPath=/target \\
  --set initContainers[0].volumeMounts[0].name=plugins
""",
                'automated_backups': """
# Schedule automated backups
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *" # Daily at 2 AM
  template:
    includedNamespaces:
      - production
      - staging
    storageLocation: default
    ttl: "720h" # 30 days retention
"""
            },
            'multi_region_setup': {
                'description': 'Set up clusters in multiple regions for high availability',
                'considerations': [
                    'Cross-region networking setup',
                    'Data replication strategy',
                    'DNS failover configuration',
                    'Application state management'
                ]
            }
        }
        return dr_practices

# Best practices implementation
best_practices = EKSBestPractices()

# Get security best practices
security_practices = best_practices.security_best_practices()
print("EKS Security Best Practices:")
print(json.dumps(security_practices, indent=2))

# Get operational best practices
operational_practices = best_practices.operational_best_practices()
print("\nEKS Operational Best Practices:")
print(json.dumps(operational_practices, indent=2, default=str))

# Get disaster recovery practices
dr_practices = best_practices.disaster_recovery_practices()
print("\nEKS Disaster Recovery Best Practices:")
print(json.dumps(dr_practices, indent=2))
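
One practice from the list above is directly auditable: endpoint access. A short boto3 sketch that flags clusters whose API endpoint is reachable from the public internet:

import boto3

eks = boto3.client('eks')

# Flag clusters with a publicly reachable API endpoint
for name in eks.list_clusters()['clusters']:
    vpc_config = eks.describe_cluster(name=name)['cluster']['resourcesVpcConfig']
    if vpc_config.get('endpointPublicAccess') and '0.0.0.0/0' in vpc_config.get('publicAccessCidrs', []):
        print(f"{name}: public endpoint open to the world - restrict publicAccessCidrs")
    elif vpc_config.get('endpointPublicAccess'):
        print(f"{name}: public endpoint restricted to {vpc_config['publicAccessCidrs']}")
    else:
        print(f"{name}: private endpoint only")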

Cost Optimization {#cost-optimization}#

EKS Cost Management Strategies#

import boto3
import json
from datetime import datetime, timedelta

class EKSCostOptimizer:
    def __init__(self):
        self.eks = boto3.client('eks')
        self.ce = boto3.client('ce')  # Cost Explorer
        self.ec2 = boto3.client('ec2')

    def analyze_eks_costs(self, cluster_name, start_date, end_date):
        """
        Analyze EKS cluster costs
        """
        try:
            # Note: RESOURCE_ID filtering requires resource-level data in Cost
            # Explorer; tag-based filters are often more practical in production.
            response = self.ce.get_cost_and_usage(
                TimePeriod={
                    'Start': start_date.strftime('%Y-%m-%d'),
                    'End': end_date.strftime('%Y-%m-%d')
                },
                Granularity='MONTHLY',
                Metrics=['BlendedCost', 'UsageQuantity'],
                GroupBy=[
                    {
                        'Type': 'DIMENSION',
                        'Key': 'USAGE_TYPE'
                    }
                ],
                Filter={
                    'And': [
                        {
                            'Dimensions': {
                                'Key': 'SERVICE',
                                'Values': ['Amazon Elastic Kubernetes Service']
                            }
                        },
                        {
                            'Dimensions': {
                                'Key': 'RESOURCE_ID',
                                'Values': [cluster_name]
                            }
                        }
                    ]
                }
            )
            cost_breakdown = {}
            for result in response['ResultsByTime']:
                for group in result['Groups']:
                    usage_type = group['Keys'][0]
                    cost = float(group['Metrics']['BlendedCost']['Amount'])
                    usage = float(group['Metrics']['UsageQuantity']['Amount'])
                    cost_breakdown[usage_type] = {
                        'cost': cost,
                        'usage': usage
                    }
            return cost_breakdown
        except Exception as e:
            print(f"Error analyzing EKS costs: {e}")
            return {}

    def optimize_node_groups(self, cluster_name):
        """
        Analyze and optimize node group configurations
        """
        try:
            response = self.eks.list_nodegroups(clusterName=cluster_name)
            optimizations = []
            for ng_name in response['nodegroups']:
                ng_detail = self.eks.describe_nodegroup(
                    clusterName=cluster_name,
                    nodegroupName=ng_name
                )
                nodegroup = ng_detail['nodegroup']
                recommendations = []
                # Check instance types
                instance_types = nodegroup['instanceTypes']
                if len(instance_types) == 1 and 't2.' in instance_types[0]:
                    recommendations.append("Consider using newer generation instances (t3, t4g)")
                # Check scaling configuration
                scaling = nodegroup['scalingConfig']
                if scaling['minSize'] == scaling['desiredSize']:
                    recommendations.append("Enable auto-scaling by setting minSize < desiredSize")
                # Check capacity type
                capacity_type = nodegroup.get('capacityType', 'ON_DEMAND')
                if capacity_type == 'ON_DEMAND':
                    recommendations.append("Consider using SPOT instances for cost savings")
                # Check disk size
                disk_size = nodegroup.get('diskSize', 20)
                if disk_size > 50:
                    recommendations.append("Large disk size - consider using separate EBS volumes")
                if recommendations:
                    optimizations.append({
                        'nodegroup_name': ng_name,
                        'current_config': {
                            'instance_types': instance_types,
                            'capacity_type': capacity_type,
                            'scaling_config': scaling,
                            'disk_size': disk_size
                        },
                        'recommendations': recommendations
                    })
            return optimizations
        except Exception as e:
            print(f"Error optimizing node groups: {e}")
            return []

    def implement_spot_instances(self, cluster_name, node_group_name):
        """
        Create spot instance node group
        """
        spot_node_group_config = {
            'clusterName': cluster_name,
            'nodegroupName': f"{node_group_name}-spot",
            'scalingConfig': {
                'minSize': 0,
                'maxSize': 10,
                'desiredSize': 3
            },
            'instanceTypes': ['t3.medium', 't3.large', 't3a.medium', 't3a.large'],
            'capacityType': 'SPOT',
            'amiType': 'AL2_x86_64',
            'nodeRole': 'arn:aws:iam::123456789012:role/NodeInstanceRole',
            'subnets': ['subnet-12345678', 'subnet-87654321'],
            'labels': {
                'node-type': 'spot',
                'cost-optimization': 'enabled'
            },
            'taints': [
                {
                    'key': 'spot-instance',
                    'value': 'true',
                    'effect': 'NO_SCHEDULE'
                }
            ],
            'tags': {
                'NodeType': 'Spot',
                'CostOptimization': 'Enabled'
            }
        }
        return spot_node_group_config

    def setup_cluster_autoscaler_with_cost_optimization(self, cluster_name):
        """
        Configure cluster autoscaler for cost optimization
        """
        autoscaler_config = f"""
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '8085'
    spec:
      priorityClassName: system-cluster-critical
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        fsGroup: 65534
      serviceAccountName: cluster-autoscaler
      containers:
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.21.0  # pin to a release matching your cluster's minor version
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 600Mi
            requests:
              cpu: 100m
              memory: 600Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/{cluster_name}
            - --balance-similar-node-groups
            - --skip-nodes-with-system-pods=false
            - --scale-down-enabled=true
            - --scale-down-delay-after-add=10m
            - --scale-down-unneeded-time=10m
            - --scale-down-delay-after-delete=10s
            - --scale-down-delay-after-failure=3m
            - --scale-down-utilization-threshold=0.5
            - --max-node-provision-time=15m
          env:
            - name: AWS_REGION
              value: us-east-1
            - name: AWS_STS_REGIONAL_ENDPOINTS
              value: regional
          volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
      volumes:
        - name: ssl-certs
          hostPath:
            path: /etc/ssl/certs/ca-bundle.crt
      nodeSelector:
        kubernetes.io/os: linux
"""
        return autoscaler_config

    def setup_vertical_pod_autoscaler(self):
        """
        Set up VPA for automatic resource optimization
        """
        vpa_examples = {
            'vpa_installation': """
# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/
./hack/vpa-install.sh
""",
            'vpa_example': """
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: web-app
        maxAllowed:
          cpu: 1
          memory: 2Gi
        minAllowed:
          cpu: 100m
          memory: 128Mi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
""",
            'vpa_monitoring': """
# Check VPA recommendations
kubectl describe vpa web-app-vpa

# Get current resource usage
kubectl top pods -n production

# View VPA status
kubectl get vpa -A
"""
        }
        return vpa_examples

    def generate_cost_optimization_report(self, cluster_name):
        """
        Generate comprehensive cost optimization report
        """
        end_date = datetime.utcnow()
        start_date = end_date - timedelta(days=30)
        report = {
            'cluster_name': cluster_name,
            'report_date': datetime.utcnow().isoformat(),
            'cost_analysis': self.analyze_eks_costs(cluster_name, start_date, end_date),
            'node_group_optimizations': self.optimize_node_groups(cluster_name),
            'recommendations': {
                'immediate': [
                    'Implement cluster autoscaler with cost-optimized settings',
                    'Enable spot instances for non-critical workloads',
                    'Right-size resource requests and limits',
                    'Set up VPA for automatic optimization'
                ],
                'medium_term': [
                    'Consider Fargate for irregular workloads',
                    'Implement pod disruption budgets',
                    'Use reserved instances for predictable workloads',
                    'Optimize storage costs with appropriate storage classes'
                ],
                'long_term': [
                    'Evaluate multi-arch (ARM-based) instances',
                    'Implement comprehensive monitoring for cost tracking',
                    'Consider cluster consolidation opportunities',
                    'Implement automated cost reporting and alerting'
                ]
            },
            'estimated_savings': {
                'spot_instances': '60-90% on compute costs',
                'right_sizing': '20-30% on resource costs',
                'cluster_autoscaler': '15-25% on unused capacity',
                'fargate_for_batch': '30-50% on sporadic workloads'
            }
        }
        return report

# Cost optimization examples
cost_optimizer = EKSCostOptimizer()

# Generate cost optimization report
report = cost_optimizer.generate_cost_optimization_report('production-cluster')
print("EKS Cost Optimization Report")
print("=" * 40)
print(json.dumps(report, indent=2, default=str))

# Get spot instance configuration
spot_config = cost_optimizer.implement_spot_instances('production-cluster', 'worker-nodes')
print("\nSpot Instance Node Group Configuration:")
print(json.dumps(spot_config, indent=2))

# Get cluster autoscaler configuration
autoscaler_config = cost_optimizer.setup_cluster_autoscaler_with_cost_optimization('production-cluster')
print("\nCluster Autoscaler Configuration:")
print(autoscaler_config)

# Get VPA setup
vpa_setup = cost_optimizer.setup_vertical_pod_autoscaler()
print("\nVertical Pod Autoscaler Setup:")
print(json.dumps(vpa_setup, indent=2))
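
To put concrete numbers behind the spot-instance recommendation, you can compare current spot prices for the candidate instance types. A brief sketch using the EC2 spot price history API:

import boto3

ec2 = boto3.client('ec2')

# Compare current spot prices for the instance types used in the spot node group
response = ec2.describe_spot_price_history(
    InstanceTypes=['t3.medium', 't3.large'],
    ProductDescriptions=['Linux/UNIX'],
    MaxResults=10
)
for price in response['SpotPriceHistory']:
    print(f"{price['InstanceType']} ({price['AvailabilityZone']}): ${price['SpotPrice']}/hr")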

Conclusion#

Amazon EKS provides a robust, managed Kubernetes platform that simplifies container orchestration on AWS. Key takeaways:

Essential Components:#

  • Managed Control Plane: AWS handles Kubernetes master components
  • Flexible Node Options: Choose between managed node groups, self-managed nodes, or Fargate
  • AWS Integration: Native integration with VPC, IAM, and other AWS services
  • Add-ons: Managed add-ons for core Kubernetes functionality

Advanced Capabilities:#

  • Multiple Compute Options: EC2, Fargate, and Spot instances
  • Comprehensive Networking: VPC CNI, load balancers, and network policies
  • Security Integration: IAM roles, Pod Security Standards, and secrets management
  • Observability: Container Insights, logging, and monitoring integration
  • Auto-scaling: Cluster autoscaler, HPA, and VPA for dynamic scaling

Best Practices:#

  • Implement comprehensive security controls with RBAC and Pod Security Standards
  • Use infrastructure as code for cluster and application management
  • Implement proper monitoring, logging, and alerting
  • Follow cost optimization strategies with spot instances and right-sizing
  • Maintain disaster recovery and backup strategies
  • Keep clusters and node groups updated with latest patches

Cost Optimization Strategies:#

  • Leverage spot instances for non-critical workloads (60-90% savings)
  • Implement cluster autoscaler for dynamic scaling
  • Use Vertical Pod Autoscaler for right-sizing
  • Consider Fargate for sporadic or batch workloads
  • Monitor and optimize resource requests and limits

EKS enables organizations to focus on application development while AWS manages the complexity of Kubernetes infrastructure, providing a secure, scalable, and cost-effective container orchestration platform.
