Creating a Kubernetes Cluster in the Cloud
Kubernetes has become the de facto standard for container orchestration, and many organizations are adopting this technology to manage their containers. However, setting up a Kubernetes cluster can be a challenging task. This is where Terraform comes into play.
Terraform is an open-source infrastructure-as-code (IaC) tool that lets you define infrastructure in declarative configuration files. In this article, we will show you how to use Terraform to create a Kubernetes cluster on AWS, GCP, and Azure.
Prerequisites
Before we dive into creating a Kubernetes cluster with Terraform, there are some prerequisites that need to be fulfilled:
- An account on AWS/GCP/Azure
- A working knowledge of Kubernetes
- A basic understanding of Terraform
Once these prerequisites have been met, we can proceed with creating our K8s Cluster.
Setting up the environment
First, make sure that you have installed all the necessary tools, such as kubectl, terraform, and the cloud-specific CLI for your platform (awscli, gcloud, or az).
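A quick sanity check is to ask each tool for its version; every command below should print a version string:
kubectl version --client
terraform version
aws --version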
The next step is to clone the Terraform Kubernetes provider repository from HashiCorp’s GitHub organization:
git clone https://github.com/hashicorp/terraform-provider-kubernetes.git
Once the clone completes, move into the cloned directory:
cd terraform-provider-kubernetes/
To build the provider, run:
go build -o ~/.terraform.d/plugins/<OS>_amd64/terraform-provider-kubernetes
Replace <OS> with your operating system, such as “linux” or “windows”. You may also need to update your .bashrc file.
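Building from source is optional: with Terraform 0.13 or later, declaring the provider in a required_providers block lets Terraform download it automatically from the registry (the version constraint below is just an example):
terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
  }
}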
Now let’s set up the variables.tf file, which holds configurable values for our k8s cluster, such as the number of worker nodes and their instance type.
Example variables.tf content:
variable "region" { default = "us-east-1" } variable "node_count" { default = 3 } variable "node_type" { default = "t2.micro" }
This file defines three variables: region specifies which region hosts our K8s master node; node_count sets how many worker nodes should run in each availability zone of that region; node_type defines which EC2 instance type the workers run on.
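These defaults can be overridden without editing the file, for example via a terraform.tfvars file (the values below are purely illustrative):
region     = "us-west-2"
node_count = 5
node_type  = "t3.medium"
The same values can also be passed on the command line with -var, e.g. terraform apply -var="node_count=5".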
Creating Infrastructure
We’ll start by writing main.tf, which contains the most important part of the Terraform configuration. Example configurations for each cloud follow.
AWS
provider "kubernetes" {} data "template_file" "userdata" { template = <<EOF #!/bin/bash sudo yum install docker -y && sudo systemctl enable docker.service && sudo systemctl start docker.service EOF } resource "random_pet" "name" { length = 4 } resource "aws_instance" "master-node" { ami = var.master_ami_id instance_type = var.instance_type subnet_id = data.aws_subnet_ids.default.ids[0] vpc_security_group_ids = [var.security_group_id] key_name = var.key_pair_name iam_instance_profile = aws_iam_instance_profile.kubelet_nodes.name associate_public_ip_address = true user_data = data.template_file.userdata.rendered tags = { Name = "${var.cluster-name}-master-${random_pet.name}", Role = "Master" } provisioner "remote-exec" { inline = [ "sudo swapoff -a" ] } } resource "aws_key_pair" "key_master_node" { public_key = file(var.public_key_path) key_name = var.key_pair_name depends_on = [null_resource.generate_ssh_keys] }
GCP
provider "google" { project = var.project_id region = var.region } data "template_file" "userdata" { template = <<EOF #!/bin/bash sudo apt-get update && sudo apt-get install -y docker.io EOF } resource "random_pet" "name" { length = 4 } resource "google_compute_instance" "master-node" { name = "${var.cluster_name}-master-${random_pet.name}" machine_type = var.instance_type zone = var.zone tags = ["kubernetes"] boot_disk { initialize_params { image = var.master_image_id } } network_interface { network = "default" } metadata_startup_script = data.template_file.userdata.rendered metadata = { role = "master" } provisioner "remote-exec" { inline = [ "sudo swapoff -a" ] } lifecycle { ignore_changes = [ "boot_disk" ] } } resource "google_compute_project_metadata_item" "ssh-keys" { key = "ssh-keys" value = var.public_key }
Azure
provider "azurerm" { features {} } data "template_file" "userdata" { template = <<EOF #!/bin/bash sudo apt-get update && sudo apt-get install -y docker.io EOF } resource "random_pet" "name" { length = 4 } resource "azurerm_network_interface" "master-node-nic" { name = "${var.cluster_name}-master-${random_pet.name}-nic" location = var.location resource_group_name = var.resource_group_name ip_configuration { name = "${var.cluster_name}-master-${random_pet.name}-ipconfig" subnet_id = azurerm_subnet.default.id private_ip_address_allocation = "dynamic" } } resource "azurerm_linux_virtual_machine" "master-node" { name = "${var.cluster_name}-master-${random_pet.name}" location = var.location resource_group_name = var.resource_group_name size = var.instance_type admin_username = var.admin_username network_interface_ids = [azurerm_network_interface.master-node-nic.id] os_disk { name = "${var.cluster_name}-master-${random_pet.name}-osdisk" caching = "ReadWrite" storage_account_type = "Standard_LRS" } source_image_reference { publisher = "Canonical" offer = "UbuntuServer" sku = "18.04-LTS" version = "latest" } custom_data = base64encode(data.template_file.userdata.rendered) tags = { Name = "${var.cluster_name}-master-${random_pet.name}" Role = "Master" } provisioner "remote-exec" { inline = [ "sudo swapoff -a" ] } } resource "azurerm_subnet" "default" { name = "${var.cluster_name}-subnet" resource_group_name = var.resource_group_name virtual_network_name = azurerm_virtual_network.default.name address_prefixes = ["10.0.2.0/24"] } resource "azurerm_virtual_network" "default" { name = "${var.cluster_name}-vnet" location = var.location resource_group_name = var.resource_group_name address_space = ["10.0.0.0/16"] }
In the example configurations above, we define several resources:
- The provider block tells Terraform which cloud API to authenticate against and use when creating, updating, and destroying resources (aws, google, or azurerm in the examples above).
- The template_file data source renders the user-data script (a short bash snippet) that installs and starts Docker when the instance boots.
- The random_pet resource generates a random name suffix of the given length, used to make resource names unique.
- The aws_instance "master-node" resource (and its google_compute_instance / azurerm_linux_virtual_machine counterparts) describes the VM that acts as the control-plane node, including its AMI/image, instance type, networking, security group, and SSH key settings.
- The remote-exec provisioner runs shell commands over SSH once the instance is up; here it disables swap, which Kubernetes requires.
- Finally, the aws_key_pair resource registers the public SSH key used to authenticate to the instances. A sketch of the supporting resources this block references but does not define appears below.
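The AWS snippet references a few supporting pieces that are not shown above: the subnet data source, the IAM instance profile, the worker security group, and the null_resource that generates the SSH keys, plus the extra variables they use. As a rough, assumed sketch of two of those pieces (the variable private_key_path and the exact key-generation command are not from the original config):
data "aws_vpc" "default" {
  default = true
}

# Note: newer AWS provider releases replace aws_subnet_ids with the aws_subnets data source.
data "aws_subnet_ids" "default" {
  vpc_id = data.aws_vpc.default.id
}

# Generate an SSH key pair locally if one does not already exist at the given path.
resource "null_resource" "generate_ssh_keys" {
  provisioner "local-exec" {
    command = "test -f ${var.private_key_path} || ssh-keygen -t rsa -b 4096 -f ${var.private_key_path} -N ''"
  }
}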
Similarly, the code snippets below could be used to create the worker node pools:
AWS
module "machine_pools" { source = "kubernets-machine-pool/aws" cluster-name = var.cluster-name count = var.node_pool_counts security-group-id = aws_security_group.worker_sg.id instance-type = var.instance_types subnet-id = element(data.aws_subnet_ids.private.ids, count.index) availability-zone-prefixes = ["${substr(data.aws_availability_zones.available.names[count.index], 0, -1)}"] } output "nodes_ips" { value = machine_pools.instances_private_ips_list description = "List of private IPs assigned to worker pools." } output "public_dns_names" { value = machine_pools.instances_public_dns_names_list description = "List of public DNS names assigned to worker pools." } output "private_dns_names" { value = machine_pools.instances_private_dns_names_list description = "List of private DNS names assigned to worker pools." }
GCP
module "machine_pools" { source = "GoogleCloudPlatform/k8s-engine/google//modules/node_pool" name = var.cluster-name node_count = var.node_pool_counts machine_type = var.instance_types subnetwork = element(data.google_compute_subnetwork.private.self_link, count.index) zone = data.google_compute_subnetwork.private.region auto_repair = true auto_upgrade = true } output "nodes_ips" { value = module.machine_pools.private_node_ips description = "List of private IPs assigned to worker pools." } output "public_dns_names" { value = module.machine_pools.public_node_ips description = "List of public DNS names assigned to worker pools." } output "private_dns_names" { value = module.machine_pools.private_node_ips description = "List of private DNS names assigned to worker pools." }
Azure
module "machine_pools" { source = "Azure/kubernetes/azurerm" resource_group_name = var.resource_group_name kubernetes_cluster_name = var.cluster_name agent_count = var.node_pool_counts agent_vm_size = var.instance_types agent_vnet_subnet_id = element(data.azurerm_subnet.private.id, count.index) agent_availability_zones = ["${substr(data.azurerm_availability_zones.available.names[count.index], 0, -1)}"] agent_os_disk_size_gb = 30 agent_storage_profile = "ManagedDisks" agent_os_type = "Linux" agent_auto_upgrade = true } output "nodes_ips" { value = module.machine_pools.private_agent_ips description = "List of private IPs assigned to worker pools." } output "public_dns_names" { value = module.machine_pools.public_agent_dns_names description = "List of public DNS names assigned to worker pools." } output "private_dns_names" { value = module.machine_pools.private_agent_dns_names description = "List of private DNS names assigned to worker pools." }
With the instance groups in place, the next step is to configure autoscaling so that capacity grows and shrinks with the workload automatically, without human intervention every time there is a surge in traffic.
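On AWS this could be done with an autoscaling group built from a launch template. The sketch below is only an illustration and is not wired into the node-pool module above; the worker_ami_id variable, the sizes, and the names are all assumptions:
resource "aws_launch_template" "worker" {
  name_prefix   = "${var.cluster_name}-worker-"
  image_id      = var.worker_ami_id # assumed variable holding the worker AMI
  instance_type = var.node_type
  user_data     = base64encode(data.template_file.userdata.rendered)
}

resource "aws_autoscaling_group" "workers" {
  name_prefix         = "${var.cluster_name}-workers-"
  min_size            = var.node_count
  desired_capacity    = var.node_count
  max_size            = var.node_count * 3
  vpc_zone_identifier = tolist(data.aws_subnet_ids.default.ids)

  launch_template {
    id      = aws_launch_template.worker.id
    version = "$Latest"
  }

  tag {
    key                 = "Role"
    value               = "Worker"
    propagate_at_launch = true
  }
}
A target-tracking aws_autoscaling_policy (for example, on average CPU utilization) can then be attached so the group grows and shrinks with demand.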
Conclusion
Terraform makes it easy for DevOps engineers and SRE teams to quickly spin up new clusters across multiple cloud providers (AWS, GCP, Azure, and others) without deep expertise in each platform. With a handful of reusable configurations, developers and sysadmins can deploy legacy application stacks and containerized environments alongside modern microservices architectures, backed by the Elasticsearch monitoring offered by SimpleOps.us, among other services.