Creating a Kubernetes Cluster in the Cloud
Kubernetes has become the de facto standard for container orchestration, and many organizations are adopting this technology to manage their containers. However, setting up a Kubernetes cluster can be a challenging task. This is where Terraform comes into play.
Terraform is an open-source infrastructure-as-code (IaC) tool that lets you define infrastructure in declarative configuration files. In this article, we will show you how to use Terraform to create a Kubernetes cluster on AWS, GCP, and Azure.
Prerequisites
Before we dive into creating a Kubernetes cluster with Terraform, there are some prerequisites that need to be fulfilled:
- An account on AWS/GCP/Azure
- A working knowledge of Kubernetes
- A basic understanding of Terraform
Once these prerequisites have been met, we can proceed with creating our K8s Cluster.
Setting up the environment
First, make sure that you have installed all the necessary tools, such as kubectl, terraform, and the cloud-specific CLI for your platform (awscli, gcloud, or az).
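A quick sanity check is to ask each tool for its version; every command below should print a version string:
kubectl version --client
terraform version
aws --version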
The next step is to clone the Terraform Kubernetes provider repository from HashiCorp’s GitHub organization:
git clone https://github.com/hashicorp/terraform-provider-kubernetes.git
Once the clone completes, move into the cloned directory:
cd terraform-provider-kubernetes/
To build the provider, run:
go build -o ~/.terraform.d/plugins/<OS>_amd64/terraform-provider-kubernetes
Replace <OS> with your operating system, such as “linux” or “windows”. You may also need to update your .bashrc file.
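Building from source is optional: with Terraform 0.13 or later, declaring the provider in a required_providers block lets Terraform download it automatically from the registry (the version constraint below is just an example):
terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
  }
}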
Now let’s set up the variables.tf file, which holds configurable values for our k8s cluster, such as the number of worker nodes and their instance type.
Example variables.tf content:
variable "region" { default = "us-east-1" } variable "node_count" { default = 3 } variable "node_type" { default = "t2.micro" }
This file defines three variables: region specifies which region hosts our K8s master node; node_count sets how many worker nodes should run in each availability zone of that region; node_type defines which EC2 instance type the workers run on.
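These defaults can be overridden without editing the file, for example via a terraform.tfvars file (the values below are purely illustrative):
region     = "us-west-2"
node_count = 5
node_type  = "t3.medium"
The same values can also be passed on the command line with -var, e.g. terraform apply -var="node_count=5".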
Creating Infrastructure
We’ll start by writing main.tf, which contains the most important part of the Terraform configuration. Example configurations for each cloud follow.
AWS
provider "kubernetes" {} data "template_file" "userdata" { template = <<EOF #!/bin/bash sudo yum install docker -y && sudo systemctl enable docker.service && sudo systemctl start docker.service EOF } resource "random_pet" "name" { length = 4 } resource "aws_instance" "master-node" { ami = var.master_ami_id instance_type = var.instance_type subnet_id = data.aws_subnet_ids.default.ids[0] vpc_security_group_ids = [var.security_group_id] key_name = var.key_pair_name iam_instance_profile = aws_iam_instance_profile.kubelet_nodes.name associate_public_ip_address = true user_data = data.template_file.userdata.rendered tags = { Name = "${var.cluster-name}-master-${random_pet.name}", Role = "Master" } provisioner "remote-exec" { inline = [ "sudo swapoff -a" ] } } resource "aws_key_pair" "key_master_node" { public_key = file(var.public_key_path) key_name = var.key_pair_name depends_on = [null_resource.generate_ssh_keys] }
GCP
provider "google" { project = var.project_id region = var.region } data "template_file" "userdata" { template = <<EOF #!/bin/bash sudo apt-get update && sudo apt-get install -y docker.io EOF } resource "random_pet" "name" { length = 4 } resource "google_compute_instance" "master-node" { name = "${var.cluster_name}-master-${random_pet.name}" machine_type = var.instance_type zone = var.zone tags = ["kubernetes"] boot_disk { initialize_params { image = var.master_image_id } } network_interface { network = "default" } metadata_startup_script = data.template_file.userdata.rendered metadata = { role = "master" } provisioner "remote-exec" { inline = [ "sudo swapoff -a" ] } lifecycle { ignore_changes = [ "boot_disk" ] } } resource "google_compute_project_metadata_item" "ssh-keys" { key = "ssh-keys" value = var.public_key }
Azure
provider "azurerm" { features {} } data "template_file" "userdata" { template = <<EOF #!/bin/bash sudo apt-get update && sudo apt-get install -y docker.io EOF } resource "random_pet" "name" { length = 4 } resource "azurerm_network_interface" "master-node-nic" { name = "${var.cluster_name}-master-${random_pet.name}-nic" location = var.location resource_group_name = var.resource_group_name ip_configuration { name = "${var.cluster_name}-master-${random_pet.name}-ipconfig" subnet_id = azurerm_subnet.default.id private_ip_address_allocation = "dynamic" } } resource "azurerm_linux_virtual_machine" "master-node" { name = "${var.cluster_name}-master-${random_pet.name}" location = var.location resource_group_name = var.resource_group_name size = var.instance_type admin_username = var.admin_username network_interface_ids = [azurerm_network_interface.master-node-nic.id] os_disk { name = "${var.cluster_name}-master-${random_pet.name}-osdisk" caching = "ReadWrite" storage_account_type = "Standard_LRS" } source_image_reference { publisher = "Canonical" offer = "UbuntuServer" sku = "18.04-LTS" version = "latest" } custom_data = base64encode(data.template_file.userdata.rendered) tags = { Name = "${var.cluster_name}-master-${random_pet.name}" Role = "Master" } provisioner "remote-exec" { inline = [ "sudo swapoff -a" ] } } resource "azurerm_subnet" "default" { name = "${var.cluster_name}-subnet" resource_group_name = var.resource_group_name virtual_network_name = azurerm_virtual_network.default.name address_prefixes = ["10.0.2.0/24"] } resource "azurerm_virtual_network" "default" { name = "${var.cluster_name}-vnet" location = var.location resource_group_name = var.resource_group_name address_space = ["10.0.0.0/16"] }
In the example configurations above, we define several resources:
- The provider block tells Terraform which cloud API to authenticate against and use when creating, updating, and destroying resources (aws, google, or azurerm in the examples above).
- The template_file data source renders the user-data script (a short bash snippet) that installs and starts Docker when the instance boots.
- The random_pet resource generates a random name suffix of the given length, used to make resource names unique.
- The aws_instance "master-node" resource (and its google_compute_instance / azurerm_linux_virtual_machine counterparts) describes the VM that acts as the control-plane node, including its AMI/image, instance type, networking, security group, and SSH key settings.
- The remote-exec provisioner runs shell commands over SSH once the instance is up; here it disables swap, which Kubernetes requires.
- Finally, the aws_key_pair resource registers the public SSH key used to authenticate to the instances. A sketch of the supporting resources this block references but does not define appears below.
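The AWS snippet references a few supporting pieces that are not shown above: the subnet data source, the IAM instance profile, the worker security group, and the null_resource that generates the SSH keys, plus the extra variables they use. As a rough, assumed sketch of two of those pieces (the variable private_key_path and the exact key-generation command are not from the original config):
data "aws_vpc" "default" {
  default = true
}

# Note: newer AWS provider releases replace aws_subnet_ids with the aws_subnets data source.
data "aws_subnet_ids" "default" {
  vpc_id = data.aws_vpc.default.id
}

# Generate an SSH key pair locally if one does not already exist at the given path.
resource "null_resource" "generate_ssh_keys" {
  provisioner "local-exec" {
    command = "test -f ${var.private_key_path} || ssh-keygen -t rsa -b 4096 -f ${var.private_key_path} -N ''"
  }
}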
Similarly, the code snippets below could be used to create the worker node pools:
AWS
module "machine_pools" { source = "kubernets-machine-pool/aws" cluster-name = var.cluster-name count = var.node_pool_counts security-group-id = aws_security_group.worker_sg.id instance-type = var.instance_types subnet-id = element(data.aws_subnet_ids.private.ids, count.index) availability-zone-prefixes = ["${substr(data.aws_availability_zones.available.names[count.index], 0, -1)}"] } output "nodes_ips" { value = machine_pools.instances_private_ips_list description = "List of private IPs assigned to worker pools." } output "public_dns_names" { value = machine_pools.instances_public_dns_names_list description = "List of public DNS names assigned to worker pools." } output "private_dns_names" { value = machine_pools.instances_private_dns_names_list description = "List of private DNS names assigned to worker pools." }
GCP
module "machine_pools" { source = "GoogleCloudPlatform/k8s-engine/google//modules/node_pool" name = var.cluster-name node_count = var.node_pool_counts machine_type = var.instance_types subnetwork = element(data.google_compute_subnetwork.private.self_link, count.index) zone = data.google_compute_subnetwork.private.region auto_repair = true auto_upgrade = true } output "nodes_ips" { value = module.machine_pools.private_node_ips description = "List of private IPs assigned to worker pools." } output "public_dns_names" { value = module.machine_pools.public_node_ips description = "List of public DNS names assigned to worker pools." } output "private_dns_names" { value = module.machine_pools.private_node_ips description = "List of private DNS names assigned to worker pools." }
Azure
module "machine_pools" { source = "Azure/kubernetes/azurerm" resource_group_name = var.resource_group_name kubernetes_cluster_name = var.cluster_name agent_count = var.node_pool_counts agent_vm_size = var.instance_types agent_vnet_subnet_id = element(data.azurerm_subnet.private.id, count.index) agent_availability_zones = ["${substr(data.azurerm_availability_zones.available.names[count.index], 0, -1)}"] agent_os_disk_size_gb = 30 agent_storage_profile = "ManagedDisks" agent_os_type = "Linux" agent_auto_upgrade = true } output "nodes_ips" { value = module.machine_pools.private_agent_ips description = "List of private IPs assigned to worker pools." } output "public_dns_names" { value = module.machine_pools.public_agent_dns_names description = "List of public DNS names assigned to worker pools." } output "private_dns_names" { value = module.machine_pools.private_agent_dns_names description = "List of private DNS names assigned to worker pools." }
With the instance groups in place, the next step is to configure autoscaling so that capacity grows and shrinks with the workload automatically, without human intervention every time there is a surge in traffic.
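On AWS this could be done with an autoscaling group built from a launch template. The sketch below is only an illustration and is not wired into the node-pool module above; the worker_ami_id variable, the sizes, and the names are all assumptions:
resource "aws_launch_template" "worker" {
  name_prefix   = "${var.cluster_name}-worker-"
  image_id      = var.worker_ami_id # assumed variable holding the worker AMI
  instance_type = var.node_type
  user_data     = base64encode(data.template_file.userdata.rendered)
}

resource "aws_autoscaling_group" "workers" {
  name_prefix         = "${var.cluster_name}-workers-"
  min_size            = var.node_count
  desired_capacity    = var.node_count
  max_size            = var.node_count * 3
  vpc_zone_identifier = tolist(data.aws_subnet_ids.default.ids)

  launch_template {
    id      = aws_launch_template.worker.id
    version = "$Latest"
  }

  tag {
    key                 = "Role"
    value               = "Worker"
    propagate_at_launch = true
  }
}
A target-tracking aws_autoscaling_policy (for example, on average CPU utilization) can then be attached so the group grows and shrinks with demand.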
Conclusion
Terraform makes it easy for DevOps engineers and SRE teams to quickly spin up new clusters across multiple cloud providers (AWS, GCP, Azure, and others) without deep expertise in each platform. With a handful of reusable configurations, developers and sysadmins can deploy legacy application stacks and containerized environments alongside modern microservices architectures, backed by the Elasticsearch monitoring offered by SimpleOps.us, among other services.