Kubernetes: The Technology Revolution

Kubernetes: The Technology Revolution


13 min read

It’s no secret that the popularity of running containerized applications has exploded over the past several years. Being able to iterate and release an application by provisioning its dependencies through code is a big win. According to Gartner, “More than 75% of global organizations will be running containerized applications in production” by 2022.

For organizations that operate at a massive scale, a single Linux container instance isn’t enough to satisfy all of their applications’ needs. It’s not uncommon for sufficiently complex applications, such as ones that communicate through microservices, to require multiple Linux containers that communicate with each other. That architecture introduces a new scaling problem: how do you manage all those individual containers? Developers will still need to take care of scheduling the deployment of containers to specific machines, managing the networking between them, growing the resources allocated under heavy load, and much more.

Here Enters Kubernetes , Kubernetes is an open-source container orchestration platform that automates many of the manual processes involved in deploying, managing, and scaling containerized applications.

In other words, you can cluster together groups of hosts running Linux containers, and Kubernetes helps you easily and efficiently manage those clusters.

Kubernetes clusters can span hosts across on-premise, public, private, or hybrid clouds. For this reason, Kubernetes is an ideal platform for hosting cloud-native applications that require rapid scaling, like real-time data streaming through Apache Kafka.

Kubernetes was originally developed and designed by engineers at Google. Google was one of the early contributors to Linux container technology and has talked publicly about how everything at Google runs in containers. (This is the technology behind Google’s cloud services.)

kubernetes share.jpg

In Kubernetes, several containers running the same application are grouped together. These containers act as replicas, and serve to load balance incoming requests. A container orchestrator, then, supervises these groups, ensuring that they are operating correctly.


A container orchestrator is essentially an administrator in charge of operating a fleet of containerized applications. If a container needs to be restarted or acquire more resources, the orchestrator takes care of it for you.

Let’s take a deeper look at all the specific components of Kubernetes that make this happen.


Container Image:-

A container image represents binary data that encapsulates an application and all its software dependencies (Think in terms of a blueprint). Container images are executable software bundles that can run standalone and that make very well-defined assumptions about their runtime environment.


Each container that you run is repeatable; the standardization from having dependencies included means that you get the same behavior wherever you run it. Containers decouple applications from underlying host infrastructure. This makes deployment easier in different cloud or OS environments. Containers represent a runtime instance of a container image.


A Kubernetes pod is a group of containers, and is the smallest unit that Kubernetes administers. Pods have a single IP address that is applied to every container within the pod. Containers in a pod share the same resources such as memory and storage. This allows the individual Linux containers inside a pod to be treated collectively as a single application, as if all the containerized processes were running together on the same host in more traditional workloads. It’s quite common to have a pod with only a single container, when the application or service is a single process that needs to run. But when things get more complicated, and multiple processes need to work together using the same shared data volumes for correct operation, multi-container pods ease deployment configuration compared to setting up shared resources between containers on your own.


Kubernetes deployments define the scale at which you want to run your application by letting you set the details of how you would like pods replicated on your Kubernetes nodes. Deployments describe the number of desired identical pod replicas to run and the preferred update strategy used when updating the deployment. Kubernetes will track pod health, and will remove or add pods as needed to bring your application deployment to the desired state.


The lifetime of an individual pod cannot be relied upon; everything from their IP addresses to their very existence are prone to change. Kubernetes doesn’t treat its pods as unique, long-running instances; if a pod encounters an issue and dies, it’s Kubernetes’ job to replace it so that the application doesn’t experience any downtime.

A service is an abstraction over the pods, and essentially, the only interface the various application consumers interact with. As pods are replaced, their internal names and IPs might change. A service exposes a single machine name or IP address mapped to pods whose underlying names and numbers are unreliable. A service ensures that, to the outside network, everything appears to be unchanged.

Nodes :-

A Kubernetes node manages and runs pods; it’s the machine (whether virtualized or physical) that performs the given work. Just as pods collect individual containers that operate together, a node collects entire pods that function together. When you’re operating at scale, you want to be able to hand work over to a node whose pods are free to take it.


A cluster is all of the above components put together as a single unit.

Kubernetes components:-

With a general idea of how Kubernetes is assembled, it’s time to take a look at the various software components that make sure everything runs smoothly. Both the control plane and individual worker nodes have three main components each.

Control plane:-

API Server:-

The API server exposes a REST interface to the Kubernetes cluster. All operations against pods, services, and so forth, are executed programmatically by communicating with the endpoints provided by it.


The scheduler is responsible for assigning work to the various nodes. It keeps watch over the resource capacity and ensures that a worker node’s performance is within an appropriate threshold.

Controller manager:-

The controller-manager is responsible for making sure that the shared state of the cluster is operating as expected. More accurately, the controller manager oversees various controllers which respond to events (e.g., if a node goes down).

Worker node components:-


A Kubelet tracks the state of a pod to ensure that all the containers are running. It provides a heartbeat message every few seconds to the control plane. If a replication controller does not receive that message, the node is marked as unhealthy.

Kube proxy:-

The Kube proxy routes traffic coming into a node from the service. It forwards requests for work to the correct containers.


etcd is a distributed key-value store that Kubernetes uses to share information about the overall state of a cluster. Additionally, nodes can refer to the global configuration data stored there to set themselves up whenever they are regenerated.

How Kubernetes is helping Pinterest:-


Pinterest has become a household name, with more than 200 million active monthly users and 100 billion objects saved. Underneath the hood, there are 1,000 microservices running and hundreds of thousands of data jobs.

With such growth came layers of infrastructure and diverse set-up tools and platforms for the different workloads, resulting in an inconsistent and complex end-to-end developer experience, and ultimately less velocity to get to production. So in 2016, the company launched a roadmap toward a new compute platform, led by the vision of having the fastest path from an idea to production, without making engineers worry about the underlying infrastructure.

The first phase involved moving to Docker. "Pinterest has been heavily running on virtual machines, on EC2 instances directly, for the longest time," says Micheal Benedict, Product Manager for the Cloud and the Data Infrastructure Group. "To solve the problem around packaging software and not make engineers own portions of the fleet and those kinds of challenges, we standardized the packaging mechanism and then moved that to the container on top of the VM. Not many drastic changes. We didn't want to boil the ocean at that point."

In July 2017, after an eight-week evaluation period, the team chose Kubernetes over other orchestration platforms. "Kubernetes lacked certain things at the time—for example, we wanted Spark on Kubernetes," says Benedict. "But we realized that the dev cycles we would put in to even try building that is well worth the outcome, both for Pinterest as well as the community. We've been in those conversations in the Big Data SIG. We realized that by the time we get to productionizing many of those things, we'll be able to leverage what the community is doing."

At the beginning of 2018, the team began onboarding its first use case into the Kubernetes system: Jenkins workloads. They ramped up the cluster, and working with a team of four people, got the Jenkins Kubernetes cluster ready for production. By the end of Q1 2018, the team successfully migrated Jenkins Master to run natively on Kubernetes and also collaborated on the Jenkins Kubernetes Plugin to manage the lifecycle of workers.

"We are in the position to run things at scale, in a public cloud environment, and test things out in way that a lot of people might not be able to do." — MICHEAL BENEDICT, PRODUCT MANAGER FOR THE CLOUD AND THE DATA INFRASTRUCTURE GROUP AT PINTEREST

After years of being a cloud native pioneer, Pinterest is eager to share its ongoing journey. "We are in the position to run things at scale, in a public cloud environment, and test things out in way that a lot of people might not be able to do," says Benedict. "We're in a great position to contribute back some of those learnings."