Beginner's guide to Kubernetes
The objective of this article is to provide an introduction to Kubernetes and a general understanding of how it works. We will look at its architectural concepts, the advantages of using Kubernetes, how to set up Kubernetes, and how to run commands on a local machine using Minikube.
What is Kubernetes?
Before we dive deep into Kubernetes, let's first understand the concept of containers. Containerization is the process of packaging an application's code and dependencies into a single image, and running that image in an isolated computing environment called a container.
Docker revolutionized containerization by packaging software into containers that can run in any environment. With containers, one can deploy applications onto multiple platforms like AWS, GCP, or DigitalOcean.
But challenges arise when the number of containers increases. It becomes difficult to deploy and maintain hundreds of containers across many servers or VM instances. This led to the development of container orchestration software.
An orchestrator's role is to coordinate a set of container workloads across a series of servers, or nodes. This includes making sure the containers have the right resources, keeping the correct number of containers available at all times, performing rolling deploys to prevent downtime, and more. Kubernetes is the most popular container orchestrator.
Kubernetes, also known as k8s, was created at Google, which released version 1.0 in 2015; it is now maintained by an open-source community under the Cloud Native Computing Foundation (CNCF), of which Google is also a part. Kubernetes provides a set of APIs and command-line tools to manage containers deployed across servers. It automates the deployment, scaling, and management of containers across clusters of hosts. This makes it a popular choice for hosting microservice-based systems, as it addresses many of their common concerns, such as configuration management, service discovery, and job scheduling.
Apart from the core Kubernetes implementation, there are many versions, or flavors, of Kubernetes provided by various cloud services and distributions that conform to the Kubernetes specification.
Let's look at the components of the Kubernetes architecture.
We communicate with Kubernetes through its API. The most popular tool for this is called kubectl (pronounced "cube control" or "cube CTL").
A single server in a Kubernetes cluster is known as a node. Workloads are assigned to the nodes by the control plane, also called the master. The various components of a node are:
- Container Runtime: The container runtime runs and manages the containers that make up an application. Docker was historically the most common runtime, but Kubernetes works with any runtime that implements the Container Runtime Interface (CRI), such as containerd or CRI-O.
- Kubelet: The kubelet is an agent that runs on each node and communicates with the Kubernetes control plane (master node). It also makes sure that the pods (the basic unit of deployment) assigned to its node are running and healthy, as described in each pod's specification (the PodSpec), which is usually written in YAML.
- Kube-proxy: A network proxy that runs on each node. It maintains the network rules that forward requests to the correct containers and make Services reachable by other components.
The control plane, also known as the master, is in charge of managing the Kubernetes cluster. It is made up of several components, each of which does a single job. These components make decisions that apply to the whole cluster, such as scheduling, and they detect and respond to cluster events. The various components of the control plane are:
- etcd: An important concern of a microservice architecture is that configuration data should be kept separate from the code and remain accessible. etcd is a persistent, lightweight, distributed key-value store that holds the configuration data representing the state of the cluster.
- API server: As mentioned, Kubernetes exposes a set of APIs. These APIs are served as JSON over HTTP by the control-plane component known as the API server. The API server is the front end of Kubernetes: it updates the state of API objects in etcd, which allows clients to configure workloads and containers across nodes.
- Scheduler: The scheduler is the control-plane component that assigns unscheduled workloads (pods) to nodes in the cluster. It tracks the resources available on each node and places each pod on a node that can accommodate it, spreading the workload across the cluster.
- Controller manager: Controllers are the components that continuously drive the current state of the cluster toward the state the user has declared. They create, update, and delete resources as needed to close that gap. Examples include the ReplicationController and ReplicaSet controllers, the Job controller, and the DaemonSet controller. The controller manager is the component that runs all these controllers.
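To make the scheduler's role more concrete, here is a toy placement function. It is a simplified sketch under our own assumptions, not how kube-scheduler is implemented: each node tracks only free CPU in millicores, each pod requests a fixed amount of CPU, and the scheduler first filters out nodes that cannot fit the pod and then scores the rest by free CPU. The real scheduler runs many filter and score plugins; the `Node` class and `schedule` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical, simplified node model: only free CPU is tracked.
    name: str
    free_cpu_millicores: int


def schedule(pod_cpu_millicores: int, nodes: List[Node]) -> Optional[str]:
    """Return the name of the node chosen for the pod, or None if none fits."""
    # Filtering: keep only nodes with enough free CPU for the pod's request.
    feasible = [n for n in nodes if n.free_cpu_millicores >= pod_cpu_millicores]
    if not feasible:
        return None  # in a real cluster the pod would stay Pending
    # Scoring: prefer the node with the most free CPU, which spreads the load.
    best = max(feasible, key=lambda n: n.free_cpu_millicores)
    # Reserve the CPU so the next placement sees the updated capacity.
    best.free_cpu_millicores -= pod_cpu_millicores
    return best.name


nodes = [Node("node-a", 1000), Node("node-b", 600)]
placements = [schedule(500, nodes) for _ in range(3)]
print(placements)
```

With these numbers the three pods land on node-a, node-b, and node-a, after which neither node has 500 millicores left and a fourth pod would not be placed.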
Kubernetes provides layers of abstraction over containers that make it possible to deploy, maintain, and scale applications. Users interact with the primitives provided by Kubernetes objects instead of communicating directly with containers. Let's take a look at the different types of objects available in Kubernetes.
A pod is made up of one or more containers running on a node that are managed together as a single application. It is the basic unit of deployment: containers are not deployed directly in Kubernetes, but always as part of a pod. Each pod is assigned a unique IP address known as the Pod IP. Containers inside the same pod can reach each other on localhost, but a container must use a Pod IP to reach a container in a different pod. Various operations are performed on pods by the controllers that are run by the controller manager.
When working with Kubernetes, we may have to replicate our pods and manage the copies for scaling purposes. For example, we may decide that we want three instances of our API running at all times. This is achieved with a ReplicaSet, which maintains the number of pod replicas declared by the user so that a stable set of identical pods is always available.
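The behavior of a ReplicaSet can be pictured as a control loop that compares the desired replica count with the pods that actually exist and then creates or deletes pods to close the gap. The sketch below models only that comparison. The `ReplicaSet` class and its sequential pod names are hypothetical; real pods get generated names and are created through the API server.

```python
import itertools
from typing import List


class ReplicaSet:
    """Toy model of the ReplicaSet control loop (not the real controller)."""

    def __init__(self, name: str, replicas: int) -> None:
        self.name = name
        self.replicas = replicas          # desired state
        self.pods: List[str] = []         # actual state
        self._next_id = itertools.count(1)

    def reconcile(self) -> None:
        # Too few pods: create new ones until the desired count is reached.
        while len(self.pods) < self.replicas:
            self.pods.append(f"{self.name}-{next(self._next_id)}")
        # Too many pods: delete the surplus.
        while len(self.pods) > self.replicas:
            self.pods.pop()


rs = ReplicaSet("my-api", replicas=3)
rs.reconcile()
print(rs.pods)  # ['my-api-1', 'my-api-2', 'my-api-3']

rs.pods.remove("my-api-1")  # simulate a pod failing
rs.reconcile()
print(rs.pods)  # ['my-api-2', 'my-api-3', 'my-api-4']
```

Because the loop only ever compares desired and actual state, the same code handles scaling up, scaling down, and replacing failed pods.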
A Service in Kubernetes exposes a set of pods that work together as a single network endpoint. The pods that belong to a Service are selected by key-value tags called labels. Kubernetes assigns the Service a stable IP address and a DNS name, and load balances traffic across the pods that match its label selector. Clients can discover Services through environment variables or through Kubernetes DNS. For example, backend pods can be grouped into a Service, with requests from the frontend load balanced among them.
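Label selection is simple to model: a pod belongs to a Service when every key-value pair in the Service's selector also appears in the pod's labels. The sketch below applies that rule to a few hypothetical pods and IP addresses, then spreads requests over the matching pods with round-robin. Round-robin is only an illustrative choice; kube-proxy's default iptables mode, for instance, picks a backend at random.

```python
import itertools
from typing import Dict, List

# Hypothetical pods, each with a Pod IP and a set of labels.
pods = [
    {"name": "backend-1", "ip": "10.0.0.11", "labels": {"app": "backend"}},
    {"name": "backend-2", "ip": "10.0.0.12", "labels": {"app": "backend"}},
    {"name": "frontend-1", "ip": "10.0.0.21", "labels": {"app": "frontend"}},
]


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    # Every key/value pair in the selector must be present in the pod's labels.
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector: Dict[str, str], pods: List[dict]) -> List[str]:
    # The Service's endpoints are the IPs of all pods matching its selector.
    return [p["ip"] for p in pods if matches(selector, p["labels"])]


backend_ips = endpoints({"app": "backend"}, pods)
# Round-robin over the matching pod IPs, as a stand-in for load balancing.
next_backend = itertools.cycle(backend_ips)
print([next(next_backend) for _ in range(4)])
```

The frontend pod is never selected, and requests alternate between the two backend IPs.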
Deployments are one of the most important objects in Kubernetes. A Deployment moves the actual state of its objects toward the state desired by the user at a controlled rate. By changing a Deployment's configuration, we can scale its ReplicaSets, roll out new versions of an application, and so on.
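The "controlled rate" of a Deployment is governed by its rolling-update settings, `maxSurge` and `maxUnavailable`. The following simulation is a deliberately simplified sketch that assumes `maxUnavailable` is 0 and that new pods become ready instantly. It records how many old-version and new-version pods exist after each step, which shows that capacity never drops below the desired replica count. The `rolling_update` function is a hypothetical helper, not part of Kubernetes.

```python
from typing import List, Tuple


def rolling_update(replicas: int, max_surge: int = 1) -> List[Tuple[int, int]]:
    """Simulate replacing `replicas` old pods with new ones.

    Assumes maxUnavailable = 0 and that new pods become ready immediately.
    Returns the (old, new) pod counts after each step.
    """
    old, new = replicas, 0
    steps = [(old, new)]
    while new < replicas:
        # Surge: start up to max_surge extra pods running the new version.
        added = min(max_surge, replicas - new)
        new += added
        steps.append((old, new))
        # Once the new pods are ready, retire the same number of old pods.
        old -= min(added, old)
        steps.append((old, new))
    return steps


for old, new in rolling_update(3):
    print(f"old={old} new={new}")
```

With three replicas and a surge of one, the total number of pods stays between 3 and 4 throughout the rollout.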
There are two ways to deploy pods: by using the command line or by using a YAML file. We will look at both in detail in a later section.
By default, the filesystem of a container in Kubernetes is ephemeral: when a container restarts, all the data written inside it is lost. This is a problem for applications that require persistent storage.
A volume provides storage for a pod's data that survives container restarts; persistent volumes go further and keep data even after the pod itself is gone. Volumes are defined in the pod configuration and mounted at a specific path within containers, and the same volume can be shared between all containers in the same pod.
One of the main challenges faced by a container orchestrator is preserving state. When a pod restarts, or when the application is scaled up or down, the state may need to be redistributed, and the ordering of instances matters. A database is an example of a stateful workload. The StatefulSet controller manages stateful applications and guarantees the uniqueness and ordering of the instances of a pod.
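StatefulSet pods get stable names of the form `<statefulset-name>-<ordinal>`, and with the default ordered policy they are created in ascending ordinal order and removed in descending order. The sketch below only computes that ordering of operations; it does not wait for each pod to become ready, as the real controller does. The `scale_statefulset` function and the `db` name are hypothetical.

```python
from typing import List


def scale_statefulset(name: str, current: int, desired: int) -> List[str]:
    """Return the pod operations, in order, to go from `current` to `desired` replicas.

    Pods are named <name>-<ordinal>. Scaling up creates the lowest missing
    ordinals first; scaling down removes the highest ordinals first.
    """
    if desired > current:
        return [f"create {name}-{i}" for i in range(current, desired)]
    # Walk downward from the highest existing ordinal to the first one to remove.
    return [f"delete {name}-{i}" for i in range(current - 1, desired - 1, -1)]


print(scale_statefulset("db", 0, 3))  # ['create db-0', 'create db-1', 'create db-2']
print(scale_statefulset("db", 3, 1))  # ['delete db-2', 'delete db-1']
```

Because names depend only on the ordinal, a replacement for a failed `db-1` comes back as `db-1`, which is what lets a database replica keep its identity.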
Usually, the node on which a pod runs is chosen by the scheduler. But some pods need to run on every node of the cluster, for use cases like log collection and storage services. This kind of scheduling is handled by a DaemonSet.
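A DaemonSet's goal can be expressed as a set difference: every node should have exactly one daemon pod, so nodes without one need a pod created, and pods on nodes that have left the cluster should be removed. The sketch below computes only that difference. The `daemonset_diff` helper and the `<daemonset>-<node>` naming are hypothetical; real DaemonSet pods get generated names, and node selectors or taints can restrict which nodes are eligible.

```python
from typing import Dict, List


def daemonset_diff(nodes: List[str], pods_by_node: Dict[str, str], ds_name: str) -> Dict[str, List[str]]:
    """Return the daemon pods to create and delete so each node runs exactly one."""
    # Nodes that do not yet run a daemon pod.
    to_create = [f"{ds_name}-{node}" for node in nodes if node not in pods_by_node]
    # Daemon pods whose node is no longer part of the cluster.
    to_delete = [pod for node, pod in pods_by_node.items() if node not in nodes]
    return {"create": to_create, "delete": to_delete}


result = daemonset_diff(
    nodes=["node-a", "node-b", "node-c"],
    pods_by_node={"node-a": "log-agent-node-a", "node-x": "log-agent-node-x"},
    ds_name="log-agent",
)
print(result)
```

Here node-b and node-c each receive a new log-agent pod, and the pod left behind on the departed node-x is deleted.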
Kubernetes Local Install
Because Kubernetes defines a set of APIs and expected behavior, there are multiple implementations, one of which is a tool called minikube. Minikube allows you to run a Kubernetes cluster on your local machine, which makes it easy to spin up complex services or even test your deployments locally. Before we start the installation, we can quickly look at some of the other options available.
Docker Desktop: If you are already using Docker Desktop, Kubernetes can be easily enabled in it by clicking the ‘Enable Kubernetes’ checkbox in the Kubernetes tab.
MicroK8s: MicroK8s is developed by Canonical. Although it was made for Ubuntu, it now supports various other Linux distributions as well as Windows and macOS.
Minikube is the option we will be using. Minikube runs a single-node Kubernetes cluster in a virtual machine (newer versions can also run it inside a Docker container).
Steps to Install Minikube
1. As a first step, make sure that Docker is properly installed on your system, as Docker is used for creating, managing, and controlling the containers.
2. Next, we need a virtual machine manager (hypervisor) on our system, such as VirtualBox, HyperKit, or KVM.
Install VirtualBox on a Mac using Homebrew: brew install --cask virtualbox
3. Next, make sure that the CLI tool kubectl is installed.
Download the latest version
Make the kubectl binary executable.
Move the binary into your PATH:
4. Now let's install Minikube
Add the Minikube executable to your PATH.
Everything should now be set up. Start Minikube with the command minikube start
Running Kubernetes Commands
Let us start by checking the version of kubectl with the command kubectl version
We can see that the command displays the versions of both client and server.
There are two ways to deploy a pod in Kubernetes:
- Via Commands
- Via YAML file
We will be making use of the Nginx image to build our pod.
Let's start by creating a deployment named my-nginx from the Nginx image: kubectl create deployment my-nginx --image=nginx
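You can see that we have created a deployment object named my-nginx. If the image is already present on the node, it can be reused; otherwise, it is pulled from the remote registry. Note that for an image without a tag, such as nginx, Kubernetes assumes the :latest tag and by default always pulls it.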
Now let's take a look at the pod that was created, using the command kubectl get pods
We can see that one pod is created for the deployment.
Let's look at the other objects that were created, using the command kubectl get all
We can see all the objects that were created: kubectl has created a Deployment object and a ReplicaSet along with the pod. Under the hood, the create command creates a Deployment object; the deployment controller, which is already running in the controller manager, then creates a ReplicaSet, and the ReplicaSet controller in turn creates the pods. Because we did not specify a replica count, the Deployment defaults to a single pod.
Now let's scale the number of replicas to 3: kubectl scale deployment my-nginx --replicas=3
We can see that our deployment got scaled.
Let’s take a look at the objects again.
Here we can see that 2 more pods are added, and the desired state of our deployment is changed to 3 pods.
Now let's see what happens if one of these pods gets deleted. Let's delete the first pod with kubectl delete pod <pod-name>, where <pod-name> is the name listed by kubectl get pods.
We can see that the specified pod got deleted. Let's see how this changed our deployment.
Even though the pod we specified got deleted, we can see a new pod got automatically created.
This shows that if a pod is destroyed in the cluster, the ReplicaSet controller creates a new one to bring the deployment back to its desired state.
Let's finish by deleting the deployment we created: kubectl delete deployment my-nginx
Via YAML file
Using commands is fine when we are learning and exploring Kubernetes, but in production the more effective way to create resources is through configuration files. YAML files can be written for different resource types, such as Pods, Services, and Deployments. For brevity, let's focus on a YAML file for a Deployment.
We will describe the desired state of the Kubernetes deployment in a YAML file, and kubectl will then create a deployment based on this file.
A sample YAML file is
There are four parts to this manifest:
- apiVersion: The version of the Kubernetes API that defines the object kind we specified (apps/v1 for a Deployment).
- kind: The Kubernetes resource type we want to create. In this case, it is Deployment.
- metadata: The metadata can differ between resource types. In our case, we specify the name of our deployment and the label for the pods.
- spec: The specification of the resource, such as the number of replicas, the container image, and the selector labels, is given in the spec section.
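Since kubectl also accepts JSON manifests, one way to see the four parts together is to build a Deployment manifest as a Python dictionary and print it as JSON, which is also valid YAML. This is a sketch under our own assumptions: the name my-nginx, the nginx image, three replicas, the app label, and the port are illustrative values, not the article's original file. The key detail it highlights is that the selector must match the labels on the pod template.

```python
import json


def nginx_deployment(name: str = "my-nginx", replicas: int = 3, image: str = "nginx") -> dict:
    """Build a Deployment manifest with the four top-level parts."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",          # API group/version for Deployments
        "kind": "Deployment",             # resource type to create
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": "nginx", "image": image, "ports": [{"containerPort": 80}]}
                    ]
                },
            },
        },
    }


manifest = nginx_deployment()
print(json.dumps(manifest, indent=2))
```

Saving the printed output to a file and passing that file to kubectl apply -f should create an equivalent deployment.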
Now let's create the deployment from the file using the apply command: kubectl apply -f <file-name>.yaml
We can see all the objects that were created.
Great! We can see that this is the same as the deployment we created using the create command.
Passing a directory instead of a single file, as in kubectl apply -f <directory>, applies all the YAML files in that folder.
Kubernetes is an orchestrator that helps scale applications by managing containers. Its main advantage is that it does not limit us to a single cloud or platform: many platforms provide first-class Kubernetes support, and there are many Kubernetes options to choose from, including cloud vendors and distributions such as Docker Enterprise, Rancher, OpenShift, Canonical, and VMware PKS. Because it is supported so widely, it has the widest adoption and the biggest community among container orchestrators.
It should be noted that not all solutions require container orchestration. Orchestration is designed to automate changes in applications that run at scale. Single-server applications that do not change very often do not normally need this kind of orchestration; instead, they can rely on the deployment and scaling features their cloud platform provides out of the box.
I hope this article was helpful in understanding the basic concepts of Kubernetes. There are many more capabilities in Kubernetes. Do check out the official documentation, and play with various available commands to fully leverage the system.