This post is part of our Scaling Kubernetes Series. Register to watch live or access the recording, and check out our other posts in this series:
When your cluster runs low on resources, the Cluster Autoscaler provisions a new node and adds it to the cluster. If you're already a Kubernetes user, you might have noticed that creating and adding a node to the cluster takes several minutes.
During this time, your app can easily be overwhelmed with connections because it can't scale further.

How can you fix the long waiting time?
Proactive scaling, or:
- understanding how the cluster autoscaler works and maximizing its usefulness;
- using the Kubernetes scheduler to assign pods to a node; and
- provisioning worker nodes proactively to avoid poor scaling.
If you prefer to read the code for this tutorial, you can find it on the LearnK8s GitHub.
How the Cluster Autoscaler Works in Kubernetes
The Cluster Autoscaler doesn't look at memory or CPU availability when it triggers autoscaling. Instead, it reacts to events and checks for any unschedulable pods. A pod is unschedulable when the scheduler can't find a node that can accommodate it.
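One way to spot unschedulable pods yourself is to list pods stuck in the Pending phase and inspect their events (the pod name below is just a placeholder):
bash
$ kubectl get pods --field-selector=status.phase=Pending
$ kubectl describe pod <pending-pod-name>
The Events section of the describe output usually shows a FailedScheduling message explaining why no node could accommodate the pod.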
Let's test this by creating a cluster.
bash
$ linode-cli lke cluster-create \
    --label learnk8s \
    --region eu-west \
    --k8s_version 1.23 \
    --node_pools.count 1 \
    --node_pools.type g6-standard-2 \
    --node_pools.autoscaler.enabled enabled \
    --node_pools.autoscaler.max 10 \
    --node_pools.autoscaler.min 1
$ linode-cli lke kubeconfig-view "insert cluster id here" --text | tail +2 | base64 -d > kubeconfig
You should pay attention to the following details:
- each node has 4GB of memory and 2 vCPUs (i.e. `g6-standard-2`);
- there's a single node in the cluster; and
- the cluster autoscaler is configured to grow from 1 to 10 nodes.
You can verify that the installation is successful with:
bash
$ kubectl get pods -A --kubeconfig=kubeconfig
Exporting the kubeconfig file with an environment variable is usually more convenient.
You can do so with:
bash
$ export KUBECONFIG=${PWD}/kubeconfig
$ kubectl get pods
Wonderful!
Deploying an Application
Let's deploy an application that requires 1GB of memory and 250m* of CPU.
*Note: m = thousandth of a core, so 250m = 25% of a CPU core.
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
spec:
  replicas: 4
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
        - name: podinfo
          image: stefanprodan/podinfo
          ports:
            - containerPort: 9898
          resources:
            requests:
              memory: 1G
              cpu: 250m
You can submit the resource to the cluster with:
bash
$ kubectl apply -f podinfo.yaml
As soon as you do that, you might notice a few things. First, three pods are almost immediately running, and one is pending.

And then:
- after a few minutes, the autoscaler creates an extra node; and
- the fourth pod is deployed in the new node.

Why is the fourth pod not deployed in the first node? Let's dig into allocatable resources.
Allocatable Resources in Kubernetes Nodes
Pods deployed in your Kubernetes cluster consume memory, CPU, and storage resources.
However, on the same node, the operating system and the kubelet also require memory and CPU.
In a Kubernetes worker node, memory and CPU are divided into:
- Resources needed to run the operating system and system daemons such as SSH, systemd, etc.
- Resources needed to run Kubernetes agents such as the kubelet, the container runtime, node problem detector, etc.
- Resources available to Pods.
- Resources reserved for the eviction threshold.

If your cluster runs a DaemonSet such as kube-proxy, you should further reduce the available memory and CPU.
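If you want to see how much of a node is actually left for pods, you can compare the Capacity and Allocatable figures Kubernetes reports for it (the node name below is just a placeholder):
bash
$ kubectl get nodes
$ kubectl describe node <node-name>
The gap between Capacity and Allocatable is what is set aside for the operating system, the Kubernetes agents, and the eviction threshold.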
So let's lower the requirements to make sure that all pods can fit into a single node:
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
spec:
  replicas: 4
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
        - name: podinfo
          image: stefanprodan/podinfo
          ports:
            - containerPort: 9898
          resources:
            requests:
              memory: 0.8G # <- lower memory
              cpu: 200m # <- lower CPU
You can amend the deployment with:
bash
$ kubectl apply -f podinfo.yaml
Selecting the right amount of CPU and memory to optimize your instances can be tricky. The Learnk8s calculator tool might help you do this more quickly.
You fixed one issue, but what about the time it takes to create a new node?
Sooner or later, you will have more than four replicas. Do you really have to wait a few minutes before the new pods are created?
The short answer is yes.
Linode has to create a virtual machine from scratch, provision it, and connect it to the cluster. The process could easily take more than two minutes.
But there's an alternative.
You could proactively create already-provisioned nodes so they are ready when you need them.
For example: you could configure the autoscaler to always have one spare node. When pods are deployed in the spare node, the autoscaler can proactively create more. Unfortunately, the autoscaler doesn't have this built-in functionality, but you can easily recreate it.
You can create a pod with requests equal to the resources of a full node:
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 1
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      containers:
        - name: pause
          image: k8s.gcr.io/pause
          resources:
            requests:
              cpu: 900m
              memory: 3.8G
You can submit the resource to the cluster with:
bash
kubectl apply -f placeholder.yaml
This pod does absolutely nothing.

It just keeps the node fully occupied.
The next step is to make sure that the placeholder pod is evicted as soon as there's a workload that needs scaling.
For that, you can use a Priority Class.
yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
description: "Priority class used by overprovisioning."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 1
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning # <--
      containers:
        - name: pause
          image: k8s.gcr.io/pause
          resources:
            requests:
              cpu: 900m
              memory: 3.8G
And resubmit it to the cluster with:
bash
kubectl apply -f placeholder.yaml
Now the setup is complete.
You might need to wait a bit for the autoscaler to create the node, but at this point you should have two nodes:
- A node with 4 pods.
- Another with a placeholder pod.
What happens when you scale the deployment to 5 replicas? Will you have to wait for the autoscaler to create a new node?
Let's test with:
bash
kubectl scale deployment/podinfo --replicas=5
You should observe:
- The fifth pod is created immediately, and it's in the Running state in less than 10 seconds.
- The placeholder pod was evicted to make space for the new pod.

And then:
- The cluster autoscaler noticed the pending placeholder pod and provisioned a new node.
- The placeholder pod is deployed in the newly created node.

Why proactively create a single node when you could have more?
You can scale the placeholder pod to multiple replicas. Each replica will pre-provision a Kubernetes node ready to accept standard workloads. However, those nodes still count towards your cloud bill while sitting idle and doing nothing, so you should be careful not to create too many of them.
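For example, assuming you kept the placeholder deployment defined above, scaling it to two replicas keeps two spare nodes warm instead of one:
bash
$ kubectl scale deployment/overprovisioning --replicas=2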
Combining the Cluster Autoscaler with the Horizontal Pod Autoscaler
To understand the implications of this technique, let's combine the cluster autoscaler with the Horizontal Pod Autoscaler (HPA). The HPA is designed to increase the replicas in your deployments.
As your application receives more traffic, you could have the autoscaler adjust the number of replicas to handle more requests.
When the pods exhaust all available resources, the cluster autoscaler will trigger the creation of a new node so that the HPA can continue creating more replicas.
Let's test this by creating a new cluster:
bash
$ linode-cli lke cluster-create \
    --label learnk8s-hpa \
    --region eu-west \
    --k8s_version 1.23 \
    --node_pools.count 1 \
    --node_pools.type g6-standard-2 \
    --node_pools.autoscaler.enabled enabled \
    --node_pools.autoscaler.max 10 \
    --node_pools.autoscaler.min 3
$ linode-cli lke kubeconfig-view "insert cluster id here" --text | tail +2 | base64 -d > kubeconfig-hpa
You can verify that the installation is successful with:
bash
$ kubectl get pods -A --kubeconfig=kubeconfig-hpa
Exporting the kubeconfig file with an environment variable is more convenient.
You can do so with:
bash
$ export KUBECONFIG=${PWD}/kubeconfig-hpa
$ kubectl get pods
Wonderful!
Let's use Helm to install Prometheus and scrape metrics from the deployments.
You can find the instructions on how to install Helm on the official website.
bash
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm install prometheus prometheus-community/prometheus
Kubernetes offers the Horizontal Pod Autoscaler (HPA) as a controller to increase and decrease replicas dynamically.
Unfortunately, the HPA has a few drawbacks:
- It doesn't work out of the box. You need to install a Metrics Server to aggregate and expose the metrics.
- You can't use PromQL queries out of the box.
Fortunately, you can use KEDA, which extends the HPA controller with some extra features (including reading metrics from Prometheus).
KEDA is an autoscaler made of three components:
- A Scaler
- A Metrics Adapter
- A Controller

You can install KEDA with Helm:
bash
$ helm repo add kedacore https://kedacore.github.io/charts
$ helm install keda kedacore/keda
Now that Prometheus and KEDA are installed, let's create a deployment.
For this experiment, you'll use an app designed to handle a fixed number of requests per second.
Each pod can process at most ten requests per second. If the pod receives an eleventh request, it will leave the request pending and process it later.
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
spec:
  replicas: 4
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
      annotations:
        prometheus.io/scrape: "true"
    spec:
      containers:
        - name: podinfo
          image: learnk8s/rate-limiter:1.0.0
          imagePullPolicy: Always
          args: ["/app/index.js", "10"]
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: 0.9G
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo
spec:
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: podinfo
You can submit the resource to the cluster with:
bash
$ kubectl apply -f rate-limiter.yaml
To generate some traffic, you'll use Locust.
The following YAML definition creates a distributed load-testing cluster:
yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: locust-script
data:
  locustfile.py: |-
    from locust import HttpUser, task, between

    class QuickstartUser(HttpUser):

      @task
      def hello_world(self):
        self.client.get("/", headers={"Host": "example.com"})
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: locust
spec:
  selector:
    matchLabels:
      app: locust-primary
  template:
    metadata:
      labels:
        app: locust-primary
    spec:
      containers:
        - name: locust
          image: locustio/locust
          args: ["--master"]
          ports:
            - containerPort: 5557
              name: comm
            - containerPort: 5558
              name: comm-plus-1
            - containerPort: 8089
              name: web-ui
          volumeMounts:
            - mountPath: /home/locust
              name: locust-script
      volumes:
        - name: locust-script
          configMap:
            name: locust-script
---
apiVersion: v1
kind: Service
metadata:
  name: locust
spec:
  ports:
    - port: 5557
      name: communication
    - port: 5558
      name: communication-plus-1
    - port: 80
      targetPort: 8089
      name: web-ui
  selector:
    app: locust-primary
  type: LoadBalancer
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: locust
spec:
  selector:
    matchLabels:
      app: locust-worker
  template:
    metadata:
      labels:
        app: locust-worker
    spec:
      containers:
        - name: locust
          image: locustio/locust
          args: ["--worker", "--master-host=locust"]
          volumeMounts:
            - mountPath: /home/locust
              name: locust-script
      volumes:
        - name: locust-script
          configMap:
            name: locust-script
You can submit it to the cluster with:
bash
$ kubectl apply -f locust.yaml
Locust reads the following locustfile.py, which is stored in a ConfigMap:
py
from locust import HttpUser, task, between

class QuickstartUser(HttpUser):

    @task
    def hello_world(self):
        self.client.get("/")
The file doesn't do anything special apart from making a request to a URL. To connect to the Locust dashboard, you need the IP address of its load balancer.
You can retrieve it with the following command:
bash
$ kubectl get service locust -o jsonpath="{.status.loadBalancer.ingress[0].ip}"
Open your browser and enter that IP address.
Wonderful!
There's one piece missing: the Horizontal Pod Autoscaler.
The KEDA autoscaler wraps the Horizontal Pod Autoscaler with a specific object called a ScaledObject.
yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    kind: Deployment
    name: podinfo
  minReplicaCount: 1
  maxReplicaCount: 30
  cooldownPeriod: 30
  pollingInterval: 1
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server
        metricName: connections_active_keda
        query: |
          sum(increase(http_requests_total{app="podinfo"}[60s]))
        threshold: "480" # 8rps * 60s
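As a rough mental model (assuming the default averaging behaviour of the HPA that KEDA creates), the desired replica count is the value returned by the query divided by the threshold, rounded up: if the query reported around 2,400 requests over the last 60 seconds, that would translate to 2400 / 480 = 5 replicas. The numbers here are purely illustrative.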
KEDA bridges the metrics collected by Prometheus and feeds them to Kubernetes.
Finally, it creates a Horizontal Pod Autoscaler (HPA) with those metrics.
You can manually inspect the HPA with:
bash
$ kubectl get hpa
$ kubectl describe hpa keda-hpa-podinfo
You can submit the object with:
bash
$ kubectl apply -f scaled-object.yaml
It's time to test whether the scaling works.
In the Locust dashboard, launch an experiment with the following settings:

The number of replicas is growing!
Excellent! But did you notice?
After the deployment scales to 8 pods, it has to wait a few minutes before more pods are created in the new node.
In this interval, the requests per second stagnate because the current eight replicas can only handle ten requests each.
Let's scale down and repeat the experiment:
bash
kubectl scale deployment/podinfo --replicas=4 # or wait for the autoscaler to remove pods
This time, let's overprovision the node with the placeholder pod:
yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
description: "Priority class used by overprovisioning."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 1
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: k8s.gcr.io/pause
          resources:
            requests:
              cpu: 900m
              memory: 3.9G
You can submit it to the cluster with:
bash
kubectl apply -f placeholder.yaml
Open the Locust dashboard and repeat the experiment with the following settings:

This time, new nodes are created in the background, and the requests per second increase without flattening. Great job!
Let's recap what you learned in this post:
- the cluster autoscaler doesn't track CPU or memory consumption. Instead, it monitors pending pods;
- you can create a pod that uses the total memory and CPU available to provision a Kubernetes node proactively;
- Kubernetes nodes have resources reserved for the kubelet, operating system, and eviction threshold; and
- you can combine Prometheus with KEDA to scale your pods with a PromQL query.
Want to follow along with our Scaling Kubernetes webinar series? Register to get started, and learn more about using KEDA to scale Kubernetes clusters to zero.