Guidance for Automated Provisioning of Amazon Elastic Kubernetes Service (EKS) using Terraform

This Guidance demonstrates how to provision EKS clusters, making it easier for customers to adopt Amazon Elastic Kubernetes Service (Amazon EKS) with required operational software. Use this Guidance with EKS Blueprints for Terraform Infrastructure as Code to quickly deploy and operate containerized workloads using a GitOps approach.

Architecture Diagram

Main Architecture
Please note: This is the main architecture. For add-on steps, see the add-on sections that follow.

Step 1
An administrator or DevOps user commits infrastructure as code (IaC) changes for the Amazon Elastic Kubernetes Service (Amazon EKS) blueprint to a Git repository.

Step 2
Deployment workflow is triggered.

Step 3
Terraform starts state reconciliation processes against target AWS environment.

Step 4
Required AWS Identity and Access Management (IAM) roles and policies are created.

Step 5
Customer’s Amazon Virtual Private Cloud (Amazon VPC) is deployed across Availability Zones (AZ).

Step 6
Subnets, route tables, Internet and NAT gateways, and other networking components are deployed into customer’s Amazon VPC.

Step 7
EKS control plane is deployed into managed Amazon VPC.

Step 8
EKS managed compute node group is deployed per Blueprint specification.

Step 9
EKS cluster is available. The Kubernetes (K8s) API is operational and accessible through the kubectl command line interface (CLI) tool and other API calls.
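Steps 4 through 8 above can be sketched in Terraform roughly as follows. This is a minimal illustration, not the Guidance's actual blueprint code: it assumes the community modules terraform-aws-modules/vpc/aws (5.x) and terraform-aws-modules/eks/aws (20.x), and the Region, cluster name, Kubernetes version, CIDR ranges, and node sizes are all placeholder values.

```hcl
# Sketch only. Region, names, versions, CIDRs, and sizes are assumptions.
provider "aws" {
  region = "us-west-2" # assumed Region
}

data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  name = "eks-blueprint-demo" # hypothetical cluster name
  # Assumes the Region offers at least three Availability Zones.
  azs = slice(data.aws_availability_zones.available.names, 0, 3)
}

# Steps 5-6: VPC, subnets, route tables, internet and NAT gateways.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = local.name
  cidr = "10.0.0.0/16"
  azs  = local.azs

  private_subnets = ["10.0.0.0/20", "10.0.16.0/20", "10.0.32.0/20"]
  public_subnets  = ["10.0.48.0/24", "10.0.49.0/24", "10.0.50.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true # one shared NAT gateway keeps the demo inexpensive
}

# Steps 4, 7, 8: IAM roles, EKS control plane, and a managed node group.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = local.name
  cluster_version = "1.30" # assumed Kubernetes version

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  cluster_endpoint_public_access           = true
  enable_cluster_creator_admin_permissions = true

  eks_managed_node_groups = {
    default = {
      instance_types = ["m5.large"]
      min_size       = 1
      max_size       = 3
      desired_size   = 2
    }
  }
}
```

After `terraform init` and `terraform apply` complete, step 9 would typically be verified by running `aws eks update-kubeconfig --region us-west-2 --name eks-blueprint-demo` followed by `kubectl get nodes`, using whatever Region and cluster name were actually configured.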

New Amazon VPC
Please note: This is an add-on to the main architecture.

For steps 1-9, see the Main Architecture section.

Step 10
Amazon EKS managed cluster add-ons: Kubernetes container network interface (CNI) plugin (VPC-CNI), CoreDNS, Amazon Elastic Block Store (Amazon EBS) Container Storage Interface (CSI) driver, and kube-proxy are deployed through AWS API calls.

Step 11
Terraform Helm provider is deployed.

Step 12
EKS external cluster add-ons including: Metrics server, Kubecost, Gatekeeper, Autoscaler, Cert-manager, and metrics for Amazon CloudWatch are deployed by Terraform Helm provider.
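As a rough sketch of steps 10 through 12, the configuration below installs the managed add-ons through the EKS API and then one external add-on (Metrics Server) through the Terraform Helm provider. It assumes an existing cluster whose name is passed in as the hypothetical variable `cluster_name`, and it uses the Helm provider 2.x block syntax. The Guidance's own add-on modules may be structured differently.

```hcl
# Sketch only. The variable name and the add-on selection are assumptions.
variable "cluster_name" {
  type        = string
  description = "Name of an existing EKS cluster, created elsewhere"
}

data "aws_eks_cluster" "this" {
  name = var.cluster_name
}

data "aws_eks_cluster_auth" "this" {
  name = var.cluster_name
}

# Step 10: EKS managed add-ons, deployed through AWS API calls.
# Note: the EBS CSI driver usually also needs an IAM role for its
# service account (service_account_role_arn), omitted here for brevity.
resource "aws_eks_addon" "managed" {
  for_each = toset(["vpc-cni", "coredns", "kube-proxy", "aws-ebs-csi-driver"])

  cluster_name = var.cluster_name
  addon_name   = each.key
}

# Step 11: Helm provider pointed at the cluster (Helm provider 2.x syntax).
provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}

# Step 12: one external add-on installed as a Helm release.
resource "helm_release" "metrics_server" {
  name       = "metrics-server"
  repository = "https://kubernetes-sigs.github.io/metrics-server/"
  chart      = "metrics-server"
  namespace  = "kube-system"
}
```

The other external add-ons listed in step 12 would follow the same `helm_release` pattern, each with its own chart repository and values.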

Argo CD Add-on
Please note: This is an add-on to the main architecture.

For steps 1-9, see the Main Architecture section.

Step 10
Amazon EKS managed cluster add-ons: VPC-CNI, CoreDNS, EBS CSI driver, and kube-proxy are deployed through AWS API calls.

Step 11
Terraform Helm provider is deployed and Argo CD is deployed by Terraform Helm provider.

Step 12
Amazon EKS external cluster add-ons: Prometheus, Karpenter, Kubernetes Event-driven Autoscaling (KEDA), Metrics server, Cert-manager, Fluent Bit, YuniKorn, Argo Rollouts, Traefik, and Vertical Pod Autoscaler are deployed through a GitOps bridge and Argo CD applications.

Step 13
aws-auth ConfigMap, sample application teams, service accounts, roles, and application workloads are deployed via API calls and Argo CD applications.
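One way to picture steps 11 through 13 is a Helm release that installs Argo CD, followed by an Argo CD Application that syncs add-on manifests from Git. The sketch below assumes the `helm` and `kubernetes` providers are already configured for the cluster. The Git repository URL, path, and application name are hypothetical, and the Guidance's GitOps bridge may wire this up differently.

```hcl
# Sketch only. Repository URL, path, and names are hypothetical.

# Step 11: Argo CD installed by the Terraform Helm provider.
resource "helm_release" "argocd" {
  name             = "argo-cd"
  repository       = "https://argoproj.github.io/argo-helm"
  chart            = "argo-cd"
  namespace        = "argocd"
  create_namespace = true
}

# Steps 12-13: an Argo CD Application that pulls manifests from Git
# and keeps the cluster in sync with them.
resource "kubernetes_manifest" "cluster_addons" {
  manifest = {
    apiVersion = "argoproj.io/v1alpha1"
    kind       = "Application"
    metadata = {
      name      = "cluster-addons"
      namespace = "argocd"
    }
    spec = {
      project = "default"
      source = {
        repoURL        = "https://github.com/example-org/eks-addons.git" # hypothetical
        targetRevision = "main"
        path           = "addons"
      }
      destination = {
        server    = "https://kubernetes.default.svc"
        namespace = "argocd"
      }
      syncPolicy = {
        automated = {
          prune    = true
          selfHeal = true
        }
      }
    }
  }

  depends_on = [helm_release.argocd]
}
```

Because `kubernetes_manifest` needs the Application custom resource definition to exist at plan time, a first run may have to apply the Helm release on its own (for example with `-target`) before the Application resource can be planned.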

Apache Spark Add-on
Please note: This is an add-on to the main architecture.

For steps 1-9, see the Main Architecture section.

Step 10
Amazon EKS managed cluster add-ons: VPC-CNI, CoreDNS, EBS CSI driver, and kube-proxy are deployed through AWS API calls.

Step 11
Terraform Helm provider is deployed.

Step 12
Amazon EKS external cluster add-ons: Apache Spark, YuniKorn, Karpenter, Metrics server, Fluent Bit, Prometheus, Grafana, Vertical Pod Autoscaler, Cert-manager, and metrics for CloudWatch are deployed.

Step 13
aws-auth ConfigMap, teams, service accounts, roles, and sample Spark application workloads are deployed.

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

This Guidance uses services that allow full visibility through monitoring and logging, providing businesses with reliable, stable, and dependable applications. Administrator/DevOps team users receive alerts from metrics defined in this Guidance to monitor the health of the workloads and minimize the impact from incidents.

The Implement Feedback Loops document describes how feedback loops are set up and how they work.

In the unlikely event of a complete AWS Region outage, the Guidance can be redeployed to another AWS Region with minor changes to the Terraform module configuration.
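Redeploying to another Region is simplest when the Region is a single input instead of a hard-coded value. The fragment below is a sketch of that idea. The variable name `region` and its default are assumptions, and the Guidance's modules may expose the Region differently.

```hcl
# Sketch only. The variable name and default Region are assumptions.
variable "region" {
  type        = string
  description = "AWS Region to deploy the Guidance into"
  default     = "us-west-2"
}

provider "aws" {
  region = var.region
}

# Availability Zones are looked up per Region, so none are hard-coded.
data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  # Assumes the chosen Region offers at least three Availability Zones.
  azs = slice(data.aws_availability_zones.available.names, 0, 3)
}
```

With this layout, a redeployment could be started with `terraform apply -var="region=us-east-1"`. Region-specific settings such as the Terraform state backend location would still need to be reviewed separately.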

Currently, Guidance deployment and configuration change workflows are initiated manually. In the future, they can be automated using GitHub repository push events that fire when IaC code is updated.

Read the Operational Excellence whitepaper

Security

The principle of least privilege is applied throughout this Guidance, limiting each resource’s access to only what is required. All resources in this Guidance, including data storage volumes, are deployed into private subnets. The EKS cluster is deployed in a separate VPC and can be accessed only through designated and protected endpoints front-ended by load balancers. External access to the EKS cluster API endpoint is enforced through HTTPS traffic; SSL certificates are attached to the ingress controllers. Pod security and network policies are used to allow or disallow specific traffic between pods or services running in different Kubernetes namespaces. The Amazon EKS managed control plane is an important resource to protect and is secured in its own VPC. Because core system and application data is stored on Amazon EBS, its security is assured by the overall infrastructure security on AWS as well as the EKS cluster security practices outlined in the best practices guides.

EKS Blueprints generate IAM Roles for Service Accounts (IRSA) and policies on the EKS cluster. AWS IAM Authenticator for Kubernetes uses a webhook to validate caller identities. When the aws-auth ConfigMap is deployed, it enables the AWS IAM Authenticator to validate the IAM identity (role) that is mapped to service accounts, providing fine-grained access control for cloud-native applications.
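To make the IRSA idea concrete, the sketch below builds an IAM role that only one Kubernetes service account can assume through the cluster's OIDC provider. It is written with plain AWS provider resources and is not the code EKS Blueprints generates. The two input variables and the role name are hypothetical, and the EBS CSI driver's controller service account is used only as an example.

```hcl
# Sketch only. Variable names and the role name are hypothetical.
variable "oidc_provider_arn" {
  type        = string
  description = "ARN of the cluster's IAM OIDC provider"
}

variable "oidc_provider_url" {
  type        = string
  description = "OIDC issuer URL of the cluster, without the https:// prefix"
}

# Trust policy: only the named service account may assume the role.
data "aws_iam_policy_document" "irsa_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [var.oidc_provider_arn]
    }

    condition {
      test     = "StringEquals"
      variable = "${var.oidc_provider_url}:sub"
      values   = ["system:serviceaccount:kube-system:ebs-csi-controller-sa"]
    }

    condition {
      test     = "StringEquals"
      variable = "${var.oidc_provider_url}:aud"
      values   = ["sts.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "ebs_csi" {
  name               = "ebs-csi-irsa-example"
  assume_role_policy = data.aws_iam_policy_document.irsa_trust.json
}

# Least privilege: attach only the managed policy this driver needs.
resource "aws_iam_role_policy_attachment" "ebs_csi" {
  role       = aws_iam_role.ebs_csi.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
}
```

The service account is then annotated with the role ARN (`eks.amazonaws.com/role-arn`), so pods using it receive temporary credentials for that role and nothing broader.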

As with any Kubernetes-based solution, it is highly recommended to apply AWS-recommended security patches to EKS clusters.

Read the Security whitepaper

Reliability

To support reliable workloads, application-level add-ons are deployed by this Guidance in the EKS cluster. Kubernetes microservices provide the advantages of loosely coupled dependencies, assurance of the required number of replicas, and service level internal and ingress-based external load balancing.

Because private subnets are spread across different Availability Zones, if a compute node in one Availability Zone fails, the Auto Scaling group attached to the Amazon EKS cluster spins up a replacement instance in another healthy Availability Zone.

Because component logs and metrics are key resources for troubleshooting the Guidance, Kubernetes authenticator and scheduler logs from the control plane are captured in CloudWatch. Additional log event integration for external systems is available through the Fluent Bit add-on. Metric-based monitoring and alerting are available through the Prometheus add-on. If any failure events occur, alerts are delivered to administrators and DevOps teams through various notification channels so that issues do not go undetected.
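Control plane logging of this kind is enabled per log type on the cluster resource. The fragment below is a sketch using the AWS provider's `aws_eks_cluster` resource. The cluster name, the retention period, and the two input variables are assumptions, and the Guidance may set the same option through a higher-level module.

```hcl
# Sketch only. Cluster name, retention, and variables are assumptions.
variable "cluster_role_arn" {
  type        = string
  description = "IAM role ARN assumed by the EKS control plane"
}

variable "subnet_ids" {
  type        = list(string)
  description = "Private subnet IDs for the cluster"
}

# Creating the log group up front lets Terraform manage its retention.
resource "aws_cloudwatch_log_group" "eks" {
  name              = "/aws/eks/example-cluster/cluster"
  retention_in_days = 30
}

resource "aws_eks_cluster" "this" {
  name     = "example-cluster"
  role_arn = var.cluster_role_arn

  # Send authenticator and scheduler logs to CloudWatch Logs.
  enabled_cluster_log_types = ["authenticator", "scheduler"]

  vpc_config {
    subnet_ids = var.subnet_ids
  }

  depends_on = [aws_cloudwatch_log_group.eks]
}
```

Other valid log types are `api`, `audit`, and `controllerManager`. Enabling more of them increases CloudWatch Logs ingestion and storage charges.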

Read the Reliability whitepaper

Performance Efficiency

Amazon EKS is an AWS-native service. This Guidance focuses on cost-efficient ways to deploy and configure it with selected resources so that users can achieve a reliable Kubernetes application platform with high availability and low operational costs. Optimization can be performed based on CPU and memory usage metrics as well as network traffic, input/output operations per second (IOPS), and other metrics. With this Guidance, users can quickly and automatically provision EKS clusters with customizable resource parameters tuned for performance.

The Amazon EKS architecture spans multiple Availability Zones for high availability. While some traffic will flow between subnets in different Availability Zones, its latency should not significantly affect performance.

Read the Performance Efficiency whitepaper

Cost Optimization

Automation and scalability are cost-saving features that this Guidance provides through Terraform and node Auto Scaling groups. Centralized administration is available through the AWS Management Console, the AWS Command Line Interface (AWS CLI), and the Argo CD GitOps add-on. These features allow for early detection and correction of defects in the design process, which reduces total development costs and schedule overruns.

Because the autoscaling minimum, maximum, and desired number of compute nodes, as well as their Amazon Elastic Compute Cloud (Amazon EC2) parameters, are highly configurable, resources are managed efficiently. Related metrics and alerts give administrators and DevOps teams insight into AWS resource utilization.
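The minimum, maximum, and desired node counts and the EC2 parameters mentioned above map onto a managed node group definition. The following is a minimal sketch using the AWS provider's `aws_eks_node_group` resource. The three input variables, the instance type, and the sizes are illustrative assumptions, not the Guidance's defaults.

```hcl
# Sketch only. Variables, instance type, and sizes are assumptions.
variable "cluster_name" {
  type        = string
  description = "Name of an existing EKS cluster"
}

variable "node_role_arn" {
  type        = string
  description = "IAM role ARN for the worker nodes"
}

variable "subnet_ids" {
  type        = list(string)
  description = "Private subnet IDs spread across Availability Zones"
}

resource "aws_eks_node_group" "default" {
  cluster_name    = var.cluster_name
  node_group_name = "default"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  # EC2 parameters for the nodes.
  instance_types = ["m5.large"]
  capacity_type  = "ON_DEMAND"

  # Bounds used by the underlying Auto Scaling group.
  scaling_config {
    min_size     = 1
    max_size     = 5
    desired_size = 2
  }

  # Let an autoscaler adjust desired_size without Terraform reverting it.
  lifecycle {
    ignore_changes = [scaling_config[0].desired_size]
  }
}
```

Keeping `min_size` low and letting an autoscaler raise `desired_size` only under load is one way to avoid paying for idle nodes.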

A significant factor in data transfer costs within EKS clusters is calls to Kubernetes services from external clients through Application Load Balancers. Data transfer costs for these service calls are attributable to communication between pods running in different AWS Availability Zones.

Read the Cost Optimization whitepaper

Sustainability

This Guidance provisions and deploys workloads on an Amazon EKS cluster located in the AWS Cloud, so there is no need to procure any physical hardware. Capacity providers and autoscaling groups keep virtual “hardware” provisioning to a minimum and scale it only as far as the workloads demand.

Every pod running on the Kubernetes platform, including the EKS cluster, consumes memory, CPU, I/O, and other resources. With performance-driven autoscaling enabled at the platform level (through Cluster Autoscaler) and at the application level (through the Horizontal Pod Autoscaler, or HPA), resource utilization and consumption adjust automatically in both directions. EKS cluster administrators and DevOps teams can monitor resource utilization through metrics and events and perform direct configuration updates where needed.

Data is accessed through Kubernetes services that are exposed with secure endpoints. Using Amazon EBS CSI drivers and related storage classes ensures loosely coupled, scalable, and efficient data access patterns.

Read the Sustainability whitepaper

Implementation Resources

A detailed guide is provided for you to experiment with and use within your AWS account. It examines each stage of building the Guidance, including deployment, usage, and cleanup.

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

Open implementation guide

Open sample code on GitHub: New VPC and Argo CD

Open sample code on GitHub: Apache Spark
