This is the fourth blog in our series where we dive deep into the CNCF Sandbox. Our last entry in the series showcased using Ambassador’s Telepresence v2 to:

  • enable a team-based local development workflow
  • locally develop applications against a remote Kubernetes cluster
  • debug applications within a Kubernetes cluster

This time, Telepresence will be making a sneaky guest appearance as we explore and demonstrate using Kubernetes Event-driven Autoscaling (KEDA) to autoscale workloads (e.g. applications) within a Kubernetes cluster.

With KEDA, we’ll be able to easily trigger the automatic scaling of a workload (up or down) using events/data from various vendors, databases, messaging systems, CI/CD tooling and more. Examples include RabbitMQ, PostgreSQL, MongoDB, AWS SQS queues, Azure Storage Queues, etc., but for this blog I’ve decided to go with Prometheus and Redis due to how simple they make setting up the demos.

The list of event sources can be found here.

HPA and KEDA

Kubernetes does offer a built-in solution for autoscaling in the form of the Horizontal Pod Autoscaler (HPA). However, it lacks certain features and has several limitations. KEDA extends the HPA, enhancing it with additional functionality and resolving those limitations. As you use KEDA you may notice an HPA object in your Kubernetes cluster, created by KEDA.

Under the hood, KEDA fetches data from an event source and exposes it to Kubernetes and the HPA. Once the HPA has the data from KEDA, it autoscales the target workload.

  • You may read about these details in this CNCF Blog by members of the Alibaba Cloud team.
  • You can find more information around the HPA here.

KEDA Use Cases: External/Custom Metrics & Scaling Jobs

The typical use case for autoscaling would be to scale up if an application has received a sudden spike in web traffic and to scale down when the amount of web traffic is low enough to save costs and resources. CPU and memory metrics are the typical indicators used to determine traffic levels.

However, there are cases where something other than the amount of web traffic is affecting the performance of an application. In that case, you’ll want to use external or custom metrics (e.g. from Prometheus).

An example is an application in charge of processing items in a list or queue, where performance is measured by how quickly each item can be processed and how quickly the list/queue can be emptied. Unfortunately, CPU and memory metrics aren’t the best indicators for preventing a list/queue from growing too large. Instead, KEDA can be used to create (i.e. scale up) a new Kubernetes Job each time a new item is added to the list/queue (i.e. an event is triggered).

KEDA CRDs: ScaledObjects vs ScaledJobs

KEDA comes with two CRDs called ScaledObjects and ScaledJobs.

  • The ScaledObject is used for scaling a Kubernetes Deployment, StatefulSet or custom resource.
  • The ScaledJob is used to run and scale Kubernetes Jobs.

Note: The Underlying HPA Object

One noticeable difference between the two is that deploying a ScaledObject will also result in an HPA object being created to handle the autoscaling of the workload. Deploying a ScaledJob object will NOT result in an HPA object being created; instead, the ScaledJob contains a Job specification that is used to create a new Kubernetes Job each time the defined event is triggered.
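
A quick way to see this difference for yourself, once KEDA and one of its custom resources are deployed, is to list both resource types alongside any HPA objects (the namespace below is a placeholder):

  kubectl get scaledobjects,scaledjobs,horizontalpodautoscalers --namespace <NAMESPACE>

A deployed ScaledObject will have a corresponding HPA object in the output; a deployed ScaledJob will not.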

Note: HPA’s Custom and External Metrics Limitation

The HPA does support using external and custom metrics for autoscaling, so you don’t necessarily have to use KEDA if you want external and custom metrics. However, a few requirements must be met to enable the HPA to do so. These requirements include:

  • enabling the API aggregation layer
  • registering the custom.metrics.k8s.io and external.metrics.k8s.io APIs
  • unsetting --horizontal-pod-autoscaler-use-rest-clients or setting it to true

You’ll typically want your cluster administrator(s) to set those up. You can find more details about these requirements here. We also wrote a blog about how you can use the HPA with Prometheus. I’d recommend reading it and comparing it with how the same thing can be done with KEDA.

A problem you’ll find with this approach is that, despite being able to choose from a variety of metric adapters to fetch external and custom metrics, only one metrics server can be run inside a cluster, meaning you’ll only be able to use one metric adapter.

Fortunately, this is one of the limitations solved by KEDA.

Note: Scaling Custom Resources

If using the ScaledObject to autoscale a custom resource, the object’s Custom Resource Definition (CRD) must define the /scale subresource otherwise KEDA will not be able to scale the custom resource. You can confirm if the CRD defines the /scale subresource by running kubectl get crd CRD_NAME -o yaml > CRD_SPEC.yaml and checking if .spec.versions[X].subresources.scale exists:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
...
spec:
  ...
  versions:
    - name: v1
      ...
      subresources:
        scale:
          ...
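
If you’d rather not write the spec out to a file, a rough equivalent is to query the field directly with jsonpath; the output will be empty if the /scale subresource is not defined:

  kubectl get crd CRD_NAME -o jsonpath='{.spec.versions[*].subresources.scale}'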

An example of where you might choose to scale a custom resource is if you’re using a service built around the Kubernetes Operator Pattern. This pattern often involves monitoring a custom resource that declares how many replicas the Operator should create and manage. The flow is usually as follows:

  1. Deploy the Operator (e.g. in the form of a Deployment).
  2. Deploy the custom resource object with the number of replicas declared.
  3. The Operator detects the custom resource and examines its contents.
  4. The Operator creates the workloads based on the specification declared in the custom resource object, including the number of replicas.

In this situation, if you just autoscaled the workload (a Deployment or StatefulSet) created by the Operator, the Operator would simply scale the workload back to the number of replicas still declared in the monitored custom resource. There’d likely be a back and forth, with the underlying HPA attempting to scale the Deployment up and the Operator scaling it back down. This is why you want the number of replicas in the custom resource itself to be autoscaled.

However, as noted above, the CRD of the custom resource monitored by the Operator must define the /scale subresource.

You can find more information:

  • around the /scale subresource here.
  • about the Operator Pattern here.

Note: Scaling to or from Zero

This is mainly for ScaledObjects. If you want to set the starting or minimum replica count to 0, you need to enable the HPAScaleToZero feature gate. If this feature gate is not enabled and you set the minimum replica count in the ScaledObject to 0, KEDA will create an HPA object with a minimum replica of 1.

Note: at the time of writing, the HPAScaleToZero feature gate has been in alpha since Kubernetes version 1.16.
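
Feature gates are enabled via flags on the control plane components (primarily the kube-apiserver for this gate), so this is mainly an option for clusters you control. As a rough sketch, on a local minikube cluster it could be enabled at start-up like this (minikube is just an example, not something the demos require):

  minikube start --feature-gates=HPAScaleToZero=true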

A possible alternative to enabling the HPAScaleToZero feature gate is to use a ScaledJob which starts from 0 (i.e. no Jobs) and always resets back to 0 once all Kubernetes Jobs are finished.

You can find more information around feature gates and how to enable them here.

Demos

We will demonstrate the following three things:

  1. A Kubernetes Deployment being autoscaled based on Prometheus metrics.
  2. A Kubernetes Deployment being autoscaled after a Redis list reaches a certain length.
  3. A Kubernetes Job being created when an item is added to a Redis list.

The first two will involve using KEDA’s ScaledObject and the last one will use KEDA’s ScaledJob.

Before going into the actual demos, I’d like to provide some details about how they were set up so that you can reproduce them locally and follow the basic flow each one will go through.

First, I installed KEDA using the official helm chart into a namespace called keda-demo within the Kubernetes cluster. You can find the installation instructions in the official documentation here. At the time of writing, version 2.3 was used.
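
For reference, the installation roughly came down to the following commands; the release name is my own choice, and the chart repository URL should be double-checked against the linked documentation:

  helm repo add kedacore https://kedacore.github.io/charts
  helm repo update
  kubectl create namespace keda-demo
  helm install keda kedacore/keda --namespace keda-demo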

Second, each demo will go through the following general flow:

  1. The event sources and target workloads will be deployed.
  2. A KEDA CRD object is deployed which contains the autoscaling configurations, including which events will trigger the target to be autoscaled.
  3. The event will be manually triggered.
  4. The autoscaling of the target workload will be observed.

Resources used in these demos are found and documented in this repository.

Demo #1 - KEDA ScaledObjects: Autoscaling with Prometheus Metrics

This demo will showcase a basic example of how someone can set up an application to be autoscaled based on metrics collected by Prometheus.

Prerequisites

For this demo to work, you’ll need:

  1. a Prometheus server
  2. an application that exposes metrics that can be scraped by Prometheus
  3. an application that’ll be the target for autoscaling

I’d recommend the following:

  1. Install the Prometheus Operator into your Kubernetes cluster and have it configured to look for ServiceMonitors in the keda-demo namespace. You can use the community helm chart to install it.
  2. For the application that exports Prometheus metrics, I’ve chosen the open-source PodInfo application. I installed it into the keda-demo namespace using the helm chart, which you can find here. I made sure to deploy it with the ServiceMonitor enabled: helm install podinfo --namespace keda-demo podinfo/podinfo --version 5.2.1 --set serviceMonitor.enabled=true --set serviceMonitor.interval=15s
  3. Any Deployment object with 1 replica to act as the autoscaling target (see the sketch after this list).
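
For items 1 and 3, a rough sketch of one way to set them up is shown below; the kube-prometheus-stack chart, the release names and the nginx image are my own choices rather than anything the demo depends on:

  # Prometheus Operator via the community kube-prometheus-stack chart,
  # configured to pick up ServiceMonitors regardless of their labels
  helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
  helm install prom prometheus-community/kube-prometheus-stack --namespace keda-demo \
    --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false

  # a placeholder Deployment with 1 replica to act as the autoscaling target
  kubectl create deployment target-workload --image=nginx --namespace keda-demo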

Showcase

With the above setup, I can produce the demo below:

In the demo, I created a ScaledObject object that:

  1. Has a target workload to autoscale.

    # prom-scaledobject.yaml
    spec:
      scaleTargetRef:
        apiVersion:    apps/v1
        kind:          Deployment
        name:          target-workload
    
  2. Has a Prometheus server to query, along with a PromQL query and a threshold value that determines how high the query result has to be before the target workload is scaled up.

    # prom-scaledobject.yaml
    spec:
      ...
      triggers:
      - type: prometheus
        metadata:
          serverAddress: http://<prometheus-host>:9090
          metricName: promhttp_metric_handler_200_requests_total
          query: increase(promhttp_metric_handler_requests_total{namespace="keda-demo", code="200"}[30s])
          threshold: '3'
    

    More details about the Prometheus trigger can be found here.

  3. Configures the underlying HPA object to allow only one pod to be created or deleted within a 3-second period.

    # prom-scaledobject.yaml
    spec:
      ...
      advanced:
        horizontalPodAutoscalerConfig:
          behavior:
            scaleDown:
              stabilizationWindowSeconds: 30
              policies:
              - type: Pods
                value: 1
                periodSeconds: 3
            scaleUp:
              stabilizationWindowSeconds: 0
              policies:
              - type: Pods
                value: 1
                periodSeconds: 3
    

You can find the manifest file used in the demo here.

With the scaling infrastructure for the target workload set up and deployed, I repeatedly ran the curl command against the PodInfo service to increase the value outputted by the Prometheus query, thus triggering the target workload to be scaled up.

If the value outputted by the Prometheus query dropped below the threshold, the target workload would eventually be scaled back down.

Note: since I deployed PodInfo with no Ingress, I used Telepresence to connect my local workstation to the Kubernetes cluster, allowing me to curl the PodInfo service at <SERVICE_NAME>.<NAMESPACE>:<PORT_NUMBER>. If you don’t want to use Telepresence, the alternative is to exec into a pod that can curl the PodInfo service or to deploy an Ingress for the application.
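
As a concrete example, the load generation can be as simple as a small loop from the workstation; the port and the /metrics path below are based on the PodInfo chart defaults and may differ in your setup:

  telepresence connect
  # hit the metrics endpoint repeatedly so the increase() query climbs above the threshold
  for i in $(seq 1 50); do
    curl -s http://podinfo.keda-demo:9898/metrics > /dev/null
  done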

Demo #2 - KEDA ScaledObjects: Autoscaling with a Redis List

This demo will showcase a basic example of how someone can set up an application to be autoscaled based on the length of a Redis list.

Prerequisites

For this demo to work, you’ll need to deploy:

  • A Kubernetes Deployment or Pod object with a redis container.
  • A Kubernetes Service that is configured as a load balancer for the above redis container.
  • Any Deployment object with 1 replica.
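
A minimal way to satisfy these prerequisites is sketched below; the image tags and the nginx-based target are my own choices, but the Service name and namespace need to match the address used in the trigger further down:

  # redis plus a Service resolvable at redis.keda-demo.svc.cluster.local
  kubectl create deployment redis --image=redis:6.2 --namespace keda-demo
  kubectl expose deployment redis --port 6379 --namespace keda-demo

  # a placeholder Deployment with 1 replica to act as the autoscaling target
  kubectl create deployment target-workload --image=nginx --namespace keda-demo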

Showcase

With the above setup, I can produce the demo below:

I created a ScaledObject object that was almost identical to the one used for the Prometheus demo above but with a different trigger (note, it is possible to include multiple triggers in the same ScaledObject manifest).

  # redis-scaledobject.yaml
  triggers:
  - type: redis
    metadata:
      address: redis.keda-demo.svc.cluster.local:6379
      listName: mylist
      listLength: "3"

You can find the manifest file used in the demo here.

You’ll see the target workload being scaled up after I added enough items to the Redis list to push its length above the value set for listLength. After removing items from the list to drop its length below the set listLength, the target workload will eventually be scaled down.
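
One way to add and remove items from your workstation is to run redis-cli inside the redis Deployment; the commands below assume it is called redis, as in the prerequisites sketch above:

  # push items until the list is longer than listLength
  kubectl exec deploy/redis --namespace keda-demo -- redis-cli RPUSH mylist a b c d
  # check the current list length
  kubectl exec deploy/redis --namespace keda-demo -- redis-cli LLEN mylist
  # pop items off again so the workload can scale back down
  kubectl exec deploy/redis --namespace keda-demo -- redis-cli LPOP mylist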

Demo #3 - KEDA ScaledJobs: Autoscaling Jobs with a Redis List

This demo will showcase a basic example of how someone can set up Kubernetes Jobs to be created and scaled as a Redis list is populated.

Prerequisites

In order for this demo to work, you’ll need to deploy:

  • A Kubernetes Deployment or Pod object with a redis container.
  • A Kubernetes Service that is configured as a load balancer for the above redis container.

Showcase

With the above setup, I can produce the demo below:

This is the trigger being used for the ScaledJob:

  triggers:
  - type: redis
    metadata:
      address: redis.keda-demo.svc.cluster.local:6379
      listName: myotherlist
      listLength: "1"

As shown in the demo, new Kubernetes Jobs are repeatedly created while an item is in the list. Once the item is removed, no more Jobs are created. In practice, each Job would consume an item and remove it from the list, creating the effect of one Job handling one item. Two example use cases are processing messaging queues and running long-running executions in parallel as items/data come in.
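
Reproducing this is straightforward: pushing a single item onto the list is enough to see a Job appear (again assuming a redis Deployment in the keda-demo namespace):

  kubectl exec deploy/redis --namespace keda-demo -- redis-cli RPUSH myotherlist item-1
  # watch the Jobs being created
  kubectl get jobs --namespace keda-demo --watch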

The specifications for all jobs being created are declared within the ScaledJob object.

spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    activeDeadlineSeconds: 30
    backoffLimit: 6
    template:
      spec:
        containers:
          - image: alpine:3.13.5
            name: alpine
            command: ['echo', 'hello world']
        restartPolicy: Never

You can find the manifest file used in the demo here.

That’s all for this blog on KEDA. The KEDA website says it is “Application autoscaling made simple” and I agree. With the event sources already in place, the only thing I had to do to enable autoscaling was to deploy KEDA and then a single KEDA custom resource object.

I’m looking forward to using KEDA in the future and integrating its autoscaling functionality with our internal services.

Need help running Kubernetes?

Get in touch and see how we can help you.

Contact Us