
Proactive Scaling for Kubernetes Clusters


This post is part of our Scaling Kubernetes Series. Register to watch live or access the recording, and check out our other posts in this series:

When your cluster runs low on resources, the Cluster Autoscaler provisions a new node and adds it to the cluster. If you're already a Kubernetes user, you might have noticed that creating and adding a node to the cluster takes several minutes.

During this time, your app can easily be overwhelmed with connections because it cannot scale further.

Screenshot showing expected scaling based on requests per second (RPS) versus the actual scaling plateau that occurs while relying just on the Cluster Autoscaler.
It might take several minutes to provision a virtual machine. During this time, you might not be able to scale your apps.

How can you fix the long waiting time?

Proactive scaling, or:

  • understanding how the cluster autoscaler works and maximizing its usefulness;
  • using the Kubernetes scheduler to assign pods to a node; and
  • provisioning worker nodes proactively to avoid poor scaling.

If you prefer to read the code for this tutorial, you can find it on the LearnK8s GitHub.

How the Cluster Autoscaler Works in Kubernetes

The Cluster Autoscaler doesn't look at memory or CPU availability when it triggers autoscaling. Instead, it reacts to events and checks for any unschedulable pods. A pod is unschedulable when the scheduler can't find a node that can accommodate it.
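As a side note, unschedulable pods are easy to spot: they sit in the Pending phase. A generic way to list them (not specific to this tutorial) is with a field selector:

bash
$ kubectl get pods --all-namespaces --field-selector=status.phase=Pending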

Let's test this by creating a cluster.

bash
$ linode-cli lke cluster-create \
 --label learnk8s \
 --region eu-west \
 --k8s_version 1.23 \
 --node_pools.count 1 \
 --node_pools.type g6-standard-2 \
 --node_pools.autoscaler.enabled enabled \
 --node_pools.autoscaler.max 10 \
 --node_pools.autoscaler.min 1

$ linode-cli lke kubeconfig-view "insert cluster id here" --text | tail +2 | base64 -d > kubeconfig

Note the following details:

  • each node has 4GB of memory and 2 vCPUs (i.e. `g6-standard-2`);
  • there's a single node in the cluster; and
  • the cluster autoscaler is configured to grow from 1 to 10 nodes.

You can verify that the installation is successful with:

bash
$ kubectl get pods -A --kubeconfig=kubeconfig

Exporting the kubeconfig file with an environment variable is usually more convenient.

You can do so with:

bash
$ export KUBECONFIG=${PWD}/kubeconfig
$ kubectl get pods

Excellent!

Deploying an Application
Let's deploy an application that requires 1GB of memory and 250m* of CPU.
Note: m = a thousandth of a core, so 250m = 25% of a CPU core

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
        - name: podinfo
          image: stefanprodan/podinfo
          ports:
            - containerPort: 9898
          resources:
            requests:
              memory: 1G
              cpu: 250m

You can submit the resource to the cluster with:

bash
$ kubectl apply -f podinfo.yaml

As soon as you do that, you might notice a few things. First, three pods are almost immediately running, and one is pending.

Diagram showing three pods active on one node, and a pending pod outside of that node.

And then:

  • after a few minutes, the autoscaler creates an extra node; and
  • the fourth pod is deployed in the new node.
Diagram showing three pods on one node, and the fourth pod deployed into a new node.
Eventually, the fourth pod is deployed into a new node.

Why is the fourth pod not deployed in the first node? Let's dig into allocatable resources.

Allocatable Resources in Kubernetes Nodes

Pods deployed in your Kubernetes cluster consume memory, CPU, and storage resources.

However, on the same node, the operating system and the kubelet also require memory and CPU.

In a Kubernetes worker node, memory and CPU are divided into:

  1. Resources needed to run the operating system and system daemons such as SSH, systemd, etc.
  2. Resources needed to run Kubernetes agents such as the kubelet, the container runtime, node problem detector, etc.
  3. Resources available to Pods.
  4. Resources reserved for the eviction threshold.
Resources allocated and reserved in a Kubernetes node, consisting of 1. Eviction threshold; 2. Memory and CPU left to pods; 3. Memory and CPU reserved to the kubelet; 4. Memory and CPU reserved to the OS
Resources allocated and reserved in a Kubernetes node.

If your cluster runs a DaemonSet such as kube-proxy, you should further reduce the available memory and CPU.
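If you're curious how much of a node is actually left for pods, compare the Capacity and Allocatable figures reported by the kubelet. A quick, generic check (replace `<node-name>` with whatever `kubectl get nodes` shows for your cluster):

bash
$ kubectl get nodes
$ kubectl describe node <node-name>   # compare the Capacity and Allocatable sections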

So let's lower the requests to make sure that all pods can fit into a single node:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
spec:
  replicas: 4
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
        - name: podinfo
          image: stefanprodan/podinfo
          ports:
            - containerPort: 9898
          resources:
            requests:
              memory: 0.8G # <- lower memory
              cpu: 200m    # <- lower CPU

You can amend the deployment with:

bash
$ kubectl apply -f podinfo.yaml

Selecting the right amount of CPU and memory to optimize your instances can be tricky. The Learnk8s tool calculator might help you do this more quickly.

You fixed one issue, but what about the time it takes to create a new node?

Sooner or later, you will have more than four replicas. Do you really have to wait a few minutes before the new pods are created?

The short answer is yes.

Linode has to create a virtual machine from scratch, provision it, and connect it to the cluster. The process could easily take more than two minutes.

But there's an alternative.

You could proactively create already-provisioned nodes when you need them.

For example: you could configure the autoscaler to always have one spare node. When the pods are deployed in the spare node, the autoscaler can proactively create more. Unfortunately, the autoscaler doesn't have this built-in functionality, but you can easily recreate it.

You can create a pod that has requests equal to the resources of the node:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 1
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      containers:
        - name: pause
          image: k8s.gcr.io/pause
          resources:
            requests:
              cpu: 900m
              memory: 3.8G

You can submit the resource to the cluster with:

bash
kubectl apply -f placeholder.yaml

This pod does absolutely nothing.

Diagram showing how a placeholder pod is used to secure all the resources on the node.
A placeholder pod is used to secure all the resources on the node.

It just keeps the node fully occupied.

The next step is to make sure that the placeholder pod is evicted as soon as there's a workload that needs scaling.

For that, you can use a PriorityClass.

yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
description: "Priority class used by overprovisioning."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 1
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning # <--
      containers:
        - name: pause
          image: k8s.gcr.io/pause
          resources:
            requests:
              cpu: 900m
              memory: 3.8G

And resubmit it to the cluster with:

bash
kubectl apply -f placeholder.yaml

Now the setup is complete.

You might need to wait a bit for the autoscaler to create the node, but at this point you should have two nodes:

  1. A node with 4 pods.
  2. Another with a placeholder pod.
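You can double-check this layout with standard kubectl commands; the `-o wide` flag shows which node each pod landed on:

bash
$ kubectl get nodes
$ kubectl get pods -o wide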

What happens when you scale the deployment to 5 replicas? Will you have to wait for the autoscaler to create a new node?

Let's test it with:

bash
kubectl scale deployment/podinfo --replicas=5

You should observe:

  1. The fifth pod is created immediately, and it's in the Running state in less than 10 seconds.
  2. The placeholder pod was evicted to make space for the pod.
Diagram showing how the placeholder pod is evicted to make space for regular pods.
The placeholder pod is evicted to make space for regular pods.

And then:

  1. The cluster autoscaler noticed the pending placeholder pod and provisioned a new node.
  2. The placeholder pod is deployed in the newly created node.
Diagram showing how the pending pod triggers the cluster autoscaler that creates a new node.
The pending pod triggers the cluster autoscaler, which creates a new node.

Why proactively create a single node when you could have more?

You can scale the placeholder pod to multiple replicas. Each replica will pre-provision a Kubernetes node ready to accept standard workloads. However, these nodes still count against your cloud bill while sitting idle and doing nothing. So, you should be careful not to create too many of them.
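For example, assuming the `overprovisioning` Deployment defined earlier, keeping two spare nodes warm is just a matter of scaling it:

bash
$ kubectl scale deployment/overprovisioning --replicas=2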

Combining the Cluster Autoscaler with the Horizontal Pod Autoscaler

To understand this technique's implications, let's combine the cluster autoscaler with the Horizontal Pod Autoscaler (HPA). The HPA is designed to increase the replicas in your deployments.

As your application receives more traffic, you could have the autoscaler adjust the number of replicas to handle more requests.

When the pods exhaust all available resources, the cluster autoscaler triggers the creation of a new node so that the HPA can continue creating more replicas.

Let's test this by creating a new cluster:

bash
$ linode-cli lke cluster-create \
 --label learnk8s-hpa \
 --region eu-west \
 --k8s_version 1.23 \
 --node_pools.count 1 \
 --node_pools.type g6-standard-2 \
 --node_pools.autoscaler.enabled enabled \
 --node_pools.autoscaler.max 10 \
 --node_pools.autoscaler.min 3

$ linode-cli lke kubeconfig-view "insert cluster id here" --text | tail +2 | base64 -d > kubeconfig-hpa

You can verify that the installation is successful with:

bash
$ kubectl get pods -A --kubeconfig=kubeconfig-hpa

Exporting the kubeconfig file with an environment variable is more convenient.

You can do so with:

bash
$ export KUBECONFIG=${PWD}/kubeconfig-hpa
$ kubectl get pods

Excellent!

Let's use Helm to install Prometheus and scrape metrics from the deployments.
You can find instructions on how to install Helm on its official website.

bash
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm install prometheus prometheus-community/prometheus

Kubernetes offers the HPA as a controller to increase and decrease replicas dynamically.

Unfortunately, the HPA has a few drawbacks (see the sketch after this list):

  1. It doesn't work out of the box. You need to install a Metrics Server to aggregate and expose the metrics.
  2. You can't use PromQL queries out of the box.
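For reference, this is a minimal sketch of what a plain CPU-based HPA looks like with the standard `autoscaling/v2` API (the `my-app` Deployment name here is just a placeholder); it still needs a Metrics Server and can't read Prometheus metrics, which is the gap KEDA fills:

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80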

Fortunately, you can use KEDA, which extends the HPA controller with some extra features (including reading metrics from Prometheus).

KEDA is an autoscaler made of three components:

  • A Scaler
  • A Metrics Adapter
  • A Controller
Diagram showing KEDA architecture
KEDA architecture.

You can install KEDA with Helm:

bash
$ helm repo add kedacore https://kedacore.github.io/charts
$ helm install keda kedacore/keda

Now that Prometheus and KEDA are installed, let's create a deployment.

For this experiment, you'll use an app designed to handle a fixed number of requests per second.

Each pod can process at most ten requests per second. If a pod receives an 11th request, it will leave the request pending and process it later.

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
spec:
  replicas: 4
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
      annotations:
        prometheus.io/scrape: "true"
    spec:
      containers:
        - name: podinfo
          image: learnk8s/rate-limiter:1.0.0
          imagePullPolicy: Always
          args: ["/app/index.js", "10"]
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: 0.9G
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo
spec:
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: podinfo

You can submit the resource to the cluster with:

bash
$ kubectl apply -f rate-limiter.yaml

To generate some traffic, you'll use Locust.

The following YAML definition creates a distributed load testing cluster:

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: locust-script
data:
  locustfile.py: |-
    from locust import HttpUser, task, between

    class QuickstartUser(HttpUser):
        @task
        def hello_world(self):
            self.client.get("/", headers={"Host": "example.com"})
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: locust
spec:
  selector:
    matchLabels:
      app: locust-primary
  template:
    metadata:
      labels:
        app: locust-primary
    spec:
      containers:
        - name: locust
          image: locustio/locust
          args: ["--master"]
          ports:
            - containerPort: 5557
              name: comm
            - containerPort: 5558
              name: comm-plus-1
            - containerPort: 8089
              name: web-ui
          volumeMounts:
            - mountPath: /home/locust
              name: locust-script
      volumes:
        - name: locust-script
          configMap:
            name: locust-script
---
apiVersion: v1
kind: Service
metadata:
  name: locust
spec:
  ports:
    - port: 5557
      name: communication
    - port: 5558
      name: communication-plus-1
    - port: 80
      targetPort: 8089
      name: web-ui
  selector:
    app: locust-primary
  type: LoadBalancer
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: locust
spec:
  selector:
    matchLabels:
      app: locust-worker
  template:
    metadata:
      labels:
        app: locust-worker
    spec:
      containers:
        - name: locust
          image: locustio/locust
          args: ["--worker", "--master-host=locust"]
          volumeMounts:
            - mountPath: /home/locust
              name: locust-script
      volumes:
        - name: locust-script
          configMap:
            name: locust-script

You can submit it to the cluster with:

bash
$ kubectl apply -f locust.yaml

Locust reads the following locustfile.py, which is stored in a ConfigMap:

py
from locust import HttpUser, task, between

class QuickstartUser(HttpUser):

    @task
    def hello_world(self):
        self.client.get("/")

The file doesn't do anything special apart from making a request to a URL. To connect to the Locust dashboard, you need the IP address of its load balancer.

You can retrieve it with the following command:

bash
$ kubectl get service locust -o jsonpath="{.status.loadBalancer.ingress[0].ip}"

Open your browser and enter that IP address.

Excellent!

There's one piece missing: the Horizontal Pod Autoscaler.
The KEDA autoscaler wraps the Horizontal Pod Autoscaler with a specific object called a ScaledObject.

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    kind: Deployment
    name: podinfo
  minReplicaCount: 1
  maxReplicaCount: 30
  cooldownPeriod: 30
  pollingInterval: 1
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server
        metricName: connections_active_keda
        query: |
          sum(increase(http_requests_total{app="podinfo"}[60s]))
        threshold: "480" # 8rps * 60s

KEDA bridges the metrics collected by Prometheus and feeds them to Kubernetes.

Finally, it creates a Horizontal Pod Autoscaler (HPA) with those metrics.

You can manually inspect the HPA with:

bash
$ kubectl get hpa
$ kubectl describe hpa keda-hpa-podinfo

You can submit the object with:

bash
$ kubectl apply -f scaled-object.yaml

It's time to test whether the scaling works.

In the Locust dashboard, launch an experiment with the following settings:

Gif of screen recording that demonstrates scaling with pending pods using autoscaler.
Combining the cluster and horizontal pod autoscaler.

The number of replicas is growing!

Excellent! But did you notice something?

After the deployment scales to eight pods, it has to wait a few minutes before more pods are created in the new node.

During this period, the requests per second stagnate because the current eight replicas can only handle ten requests each (about 80 requests per second in total).

Let’s scale down and repeat the experiment:

bash
kubectl scale deployment/podinfo --replicas=4 # or wait for the autoscaler to remove the pods

This time, let’s overprovision the node with the placeholder pod:

yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
description: "Priority class used by overprovisioning."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 1
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: k8s.gcr.io/pause
          resources:
            requests:
              cpu: 900m
              memory: 3.9G

You can submit it to the cluster with:

bash
kubectl apply -f placeholder.yaml

Open the Locust dashboard and repeat the experiment with the following settings:

Combining the cluster and horizontal pod autoscaler with overprovisioning.

This time, new nodes are created in the background, and the requests per second increase without flattening. Great job!

Let's recap what you learned in this post:

  • the cluster autoscaler doesn't observe CPU or memory consumption. Instead, it monitors pending pods;
  • you can create a pod that uses the total memory and CPU available to provision a Kubernetes node proactively;
  • Kubernetes nodes have resources reserved for the kubelet, operating system, and eviction threshold; and
  • you can combine Prometheus with KEDA to scale your pods with a PromQL query.

Want to follow along with our Scaling Kubernetes webinar series? Register to get started, and learn more about using KEDA to scale Kubernetes clusters to zero.
