- Kubernetes Cluster
- Application Deployed into the Cluster (See Part 2)
The first time we deploy workloads to a Kubernetes cluster, we can't be certain about the resources required and how the requirements may change depending on user traffic, external dependencies, etc.
Horizontal Pod Autoscaling (HPA) helps to ensure that workloads function consistently in different situations. It's also a way to manage costs and ensure that resources aren't wasted during low traffic.
The HPA automatically scales pods up and down based on some of the metrics mentioned below:
- Actual Resource Usage: When a Pod's CPU and/or memory usage crosses a specified target. The target can be expressed as a raw value or as a percentage of the Pod's resource requests.
- Custom Metrics: It can also scale based on metrics reported by a Kubernetes object in the cluster, such as request rate or disk writes per second. This is helpful if the app is vulnerable to network or disk bottlenecks.
- External Metrics: Based on the metrics from an app or service external to the cluster.
How Does It Work?
A Horizontal Pod Autoscaler periodically (every 15 seconds by default) checks the average CPU/memory usage of the pods it targets and compares it against the target threshold. If usage is above the threshold, replicas are added; if it is below, replicas are removed, always within the configured minimum and maximum.
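The comparison described above can be made concrete. The Kubernetes documentation gives the core rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). Below is a minimal sketch of that arithmetic in shell, using integer math only. The function name desired_replicas is made up for illustration, and the real controller also applies a tolerance, the min/max bounds, and stabilization, all of which this sketch leaves out.

```shell
#!/bin/sh
# Illustrative only: approximates the HPA scaling rule
#   desired = ceil(current_replicas * current_metric / target_metric)
# desired_replicas is a hypothetical helper, not part of kubectl.
desired_replicas() {
  current_replicas=$1   # pods currently running
  current_metric=$2     # observed average, e.g. CPU utilization in percent
  target_metric=$3      # target from the HPA spec, in the same unit
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (current_replicas * current_metric + target_metric - 1) / target_metric ))
}

desired_replicas 2 90 50   # 2 pods at 90% against a 50% target: prints 4
desired_replicas 4 20 50   # 4 pods at 20% against a 50% target: prints 2
```

In the first call the pods are running well above target, so the rule asks for more replicas; in the second they are well below, so it asks for fewer.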
It accomplishes this using data from the metrics server, a monitoring component maintained by the Kubernetes project. The metrics server collects the resource usage of each pod and node in the cluster from the kubelets, keeps only the most recent values in memory (it does not persist them in etcd), and serves them to the HPA on request through the Metrics API.
Installing Metrics Server
The metrics server can be installed using the YAML(s) provided below. You can save and deploy them using the following command:
kubectl apply -f <File-Name>.yaml
The ServiceAccount is the identity Metrics Server uses for all operations it performs in the cluster.
The ClusterRole will provide the necessary permissions.
The RoleBinding binds the ServiceAccount to the ClusterRole we created.
A ClusterRoleBinding does the same job as a RoleBinding. The main difference is that it applies cluster-wide, whereas a RoleBinding is bound to a single namespace.
The following is the YAML manifest for the Metrics Server Deployment.
The APIService registers the Metrics API (metrics.k8s.io) with the Kubernetes API server and routes it to Metrics Server. This is the main point of communication for metrics-related consumers such as the HPA, the VPA, and kubectl top.
Once all of the above are installed, we can start using the metrics collected by Metrics Server. It can take up to about 5 minutes after installation for Metrics Server to start up and begin reporting metrics.
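Rather than waiting a fixed amount of time, readiness can be checked directly. The sketch below assumes Metrics Server runs as a Deployment named metrics-server in the kube-system namespace and is registered as the APIService v1beta1.metrics.k8s.io, which is what the upstream manifests use; your manifests may differ. The function name check_metrics_server is hypothetical.

```shell
#!/bin/sh
# Assumption: Metrics Server runs as deployment "metrics-server" in the
# "kube-system" namespace, as in the upstream manifests. Adjust if yours differs.
check_metrics_server() {
  # Block until the deployment has rolled out, or give up after 5 minutes.
  kubectl -n kube-system rollout status deployment/metrics-server --timeout=300s || return 1
  # The APIService should report AVAILABLE=True once metrics are being served.
  kubectl get apiservice v1beta1.metrics.k8s.io
}
```

Calling check_metrics_server after the apply step should return once the rollout completes and then show the APIService status.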
The Metrics Server will monitor the pods and nodes and update their metrics every 15 seconds. Its output for both pods and nodes can be viewed using the following commands:
kubectl top node
kubectl top po
Deploying the Auto Scaler
The Horizontal Pod Autoscaler (HPA) takes input from data provided by the Metrics Server to scale applications up and down according to the thresholds mentioned in the YAML Manifest.
Save and deploy the HPA YAML provided below.
This is the YAML definition for the HPA. Its key fields are:
scaleTargetRef: Identifies the deployment to be scaled
minReplicas: Minimum number of pods to be kept
maxReplicas: Maximum number of pods to be deployed
targetCPUUtilizationPercentage: The target average CPU utilization across the pods, as a percentage of the CPU each pod requests. When average utilization rises above this value, new replicas are added; when it falls below, replicas are removed.
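For reference, a manifest using these fields might look like the sketch below. It is not the manifest from the original tutorial: the names my-app and my-app-hpa are placeholders, and the replica counts and the 50% target are example values. It uses the autoscaling/v1 API, which is the version that exposes targetCPUUtilizationPercentage.

```shell
#!/bin/sh
# Writes an illustrative HPA manifest to hpa.yaml.
# "my-app" and "my-app-hpa" are placeholder names; the replica counts and
# the 50% target are example values, not taken from the original tutorial.
cat > hpa.yaml <<'EOF'
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:            # the workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1             # never scale below this
  maxReplicas: 5             # never scale above this
  targetCPUUtilizationPercentage: 50   # average CPU, relative to requests
EOF
```

The resulting file can be deployed with kubectl apply -f hpa.yaml. Because utilization is measured relative to CPU requests, the containers in the target deployment need CPU requests set for the HPA to compute a value.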
We can check for deployed HPAs by using the following command:
kubectl get hpa
Simulating User Traffic
To simulate user traffic, we can use different tools depending on the technology used by the application. Apache AB can be used for HTTP/HTTPS-based Web applications, and Artillery can be used for socket.io-based Web applications.
Detailed guides on using these tools are linked below:
apt-get install apache2-utils
We'll use Apache AB to simulate traffic:
ab -n 1000 -c 100 https://localhost:3000/
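This command sends a total of 1000 requests to the specified endpoint, keeping up to 100 requests in flight at a time. Note that -c sets the concurrency level, not a rate in requests per second.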
Let's check the status of the HPA.
Now let's check how many new replicas have been added.
After Apache AB finishes the test, we can check the status of the HPA.
According to the HPA status, the applications should have scaled down now.
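The checks described above can be run from a second terminal during and after the load test. A minimal sketch follows; the function name hpa_status and the default names my-app-hpa and my-app are placeholders to be replaced with your own resources.

```shell
#!/bin/sh
# Placeholder names: replace my-app-hpa / my-app with your own resources.
hpa_status() {
  # TARGETS shows current/target utilization; REPLICAS shows the pod count.
  kubectl get hpa "${1:-my-app-hpa}"
  # Ready vs. desired replicas of the deployment being scaled.
  kubectl get deployment "${2:-my-app}"
}
```

To follow changes continuously instead of polling, kubectl get hpa --watch streams each update as it happens.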
So in this way, we can scale our applications up and down according to user load.
Another component, the Vertical Pod Autoscaler (VPA), works by increasing a pod's CPU/RAM requests and limits. This is very different from the HPA, which provisions more pods to handle the requests rather than increasing the amount of CPU/RAM provided to each pod.
Application Not Scaling
- The application you've deployed may not be receiving the requests being sent due to a misconfigured service.
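- The HPA threshold may have been set too high, so the load either never reaches it or takes a long time to do so.
- HPA doesn't work instantly. Scaling up usually starts within a minute or so of the load increasing, while scaling down is deliberately delayed (5 minutes by default) to avoid flapping.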
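The causes listed above can usually be told apart with a few read-only commands. This is a sketch under assumptions: diagnose_hpa is a hypothetical helper, and my-app-hpa and my-app are placeholder names for the HPA and the Service in front of the application.

```shell
#!/bin/sh
# Placeholder names: my-app-hpa (the HPA) and my-app (the Service).
diagnose_hpa() {
  hpa=${1:-my-app-hpa}
  svc=${2:-my-app}
  # Conditions and events explain why the HPA did or did not scale.
  kubectl describe hpa "$hpa"
  # An empty ENDPOINTS column suggests the Service selector matches no pods.
  kubectl get endpoints "$svc"
  # If this fails, the HPA has no metrics to act on.
  kubectl top pods
}
```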
Apache AB Load Test Error
You may face an error regarding an invalid URL. Check the URL and port (copying and pasting from the browser works best).
Make sure the URL ends with a /. Otherwise, ab will not recognize it.
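The trailing slash check can be automated with a small wrapper. The helper below, with_path, is hypothetical and only handles the simple case of a URL that has a scheme and host but no path.

```shell
#!/bin/sh
# Hypothetical helper: appends "/" when the URL has a scheme and host but no
# path, since ab rejects URLs such as http://localhost:3000 as invalid.
with_path() {
  case "$1" in
    *://*/*) echo "$1" ;;     # already has a path component
    *)       echo "$1/" ;;    # host only: add the root path
  esac
}

with_path "http://localhost:3000"    # prints http://localhost:3000/
with_path "http://localhost:3000/"   # prints http://localhost:3000/
```

It could then be used inline, for example ab -n 1000 -c 100 "$(with_path "$URL")", where URL is a variable you set yourself.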
We've now installed Metrics Server in the Cluster, deployed Horizontal Pod Autoscaler and successfully load-tested the application using Apache AB.
By the end of this three-part tutorial, you should be able to dockerize a web application, deploy it into Kubernetes, perform load testing and implement autoscaling mechanisms.
It's recommended to test the limits of a single pod before deploying an HPA so that you know the load at which application performance becomes bottlenecked by a lack of resources.
Doing so makes the scaling resource-efficient and cost-effective.
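As a rough illustration of how a single-pod measurement can feed into the HPA settings, the sketch below estimates an upper replica bound from two numbers: the request rate one pod sustains before degrading (ab reports this as "Requests per second") and the peak rate you expect. The helper name replicas_for_peak and the figures 900 and 200 are invented for the example.

```shell
#!/bin/sh
# Hypothetical sizing helper: how many pods are needed to serve a peak load
# if one pod sustains per_pod_rps before performance degrades.
replicas_for_peak() {
  peak_rps=$1       # expected peak requests per second (your estimate)
  per_pod_rps=$2    # measured single-pod capacity from a load test
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (peak_rps + per_pod_rps - 1) / per_pod_rps ))
}

replicas_for_peak 900 200   # prints 5
```

A value like this is only a starting point for maxReplicas; it ignores headroom and bottlenecks outside the pod, such as a shared database.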
That's it for this series on Dockerizing and Deploying to Kubernetes. Stay tuned for more! Subscribe and leave a comment below.