Autoscaling workloads with KEDA and the Prometheus Scaler
Walking through the process of learning how to use a KEDA scaler for autoscaling Kubernetes workloads
Published on: Jun 9, 2025
Last updated on: Jun 12, 2025
This blog is part of our KEDA series; we recommend reading the rest of the posts in the series:
- Introducing Kubernetes Event-driven Autoscaling (KEDA)
- Getting Started with Autoscaling in Kubernetes with KEDA
- Autoscaling workloads with KEDA and the Prometheus Scaler
KEDA’s Prometheus Scaler
In the list of KEDA’s built-in scalers we see that KEDA has a Prometheus Scaler, meaning KEDA has built-in support for using metrics/events from Prometheus as a source to trigger autoscaling operations. I want to create a proof of concept showing that an autoscaling setup built on KEDA can fetch metrics from Prometheus and automatically scale a workload.
In this blog post I will demonstrate using KEDA with Prometheus metrics to automatically scale a workload in a Kubernetes cluster. To do this, my proof of concept will scale a front-end workload based on the amount of traffic it is receiving.
To create this proof of concept, we are going to need:
- A Kubernetes cluster using Kubernetes version v1.29 or higher
- Prometheus
- KEDA
- An example workload that exposes Prometheus metrics
- Prometheus configured to collect the metrics exposed by that workload
Prerequisite Services: Installation
First, I use the kind (Kubernetes in Docker) tool to create a local Kubernetes cluster with the following command:
kind create cluster
To verify that kind created a Kubernetes cluster compatible with KEDA, I run kubectl version to check the version of the Kubernetes cluster.
Then, I add the following Helm repositories:
helm repo add kedacore https://kedacore.github.io/charts
helm repo add podinfo https://stefanprodan.github.io/podinfo
helm repo add prometheus-community \
  https://prometheus-community.github.io/helm-charts
helm repo update
Now, I install the kube-prometheus-stack Helm chart with the Prometheus Operator configured to use ServiceMonitors in any namespace with the following command:
helm install kube-prom-stack prometheus-community/kube-prometheus-stack \
  --create-namespace --namespace obs-system \
  --version 70.4.2 \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
Then, I install the KEDA Helm chart with the following command:
helm install keda kedacore/keda \
  --create-namespace --namespace keda \
  --version 2.17.1
Finally, I install the podinfo Helm chart with its ServiceMonitor enabled, so the Prometheus Operator can find it and configure Prometheus to scrape metrics from the podinfo app, with the following command:
helm install podinfo podinfo/podinfo \
  --create-namespace --namespace podinfo \
  --version 6.8.0 \
  --set serviceMonitor.enabled=true
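Before moving on, it's worth a quick sanity check that the chart actually created the ServiceMonitor and that the podinfo pods are running. A couple of commands like the following should be enough (the ServiceMonitor CRD comes from the kube-prometheus-stack install above):
# Confirm the ServiceMonitor exists for the Prometheus Operator to pick up
kubectl get servicemonitors -n podinfo
# Confirm the podinfo pods are up and exposing their metrics endpoint
kubectl get pods -n podinfo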
ScaledObject or ScaledJob
Now that I have all the required services installed, I need to create the configurations that KEDA will use to automatically scale my target workload, the podinfo application.
I first need to determine whether I want to use KEDA's ScaledObject [1] or ScaledJob [2]. Since the podinfo Helm chart I deployed came with a Kubernetes Deployment configured with 1 pod replica by default, the ScaledObject seems to be the appropriate choice.
If my Kubernetes application included a controller for running jobs, or its sole responsibility were processing queues, I would consider re-designing the application's Kubernetes architecture by removing that controller and having KEDA serve as the controller for running the queue-processing jobs [3].
Creating a ScaledObject
Now, I want to create the manifest file for a Kubernetes custom resource I'm not familiar with. To create a properly configured ScaledObject, I will need to refer to the ScaledObject specification [1] and the Kubernetes documentation for configuring scaling behaviour in HorizontalPodAutoscaler objects.
Defining the Target and Scaling Behaviour
After reviewing the documentation, I set my autoscaling target:
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: podinfo
…and configure the podinfo application’s scaling behaviour:
spec:
pollingInterval: 3
cooldownPeriod: 30
minReplicaCount: 1
maxReplicaCount: 5
advanced:
horizontalPodAutoscalerConfig:
behavior:
scaleDown:
stabilizationWindowSeconds: 30
policies:
- type: Pods
value: 1
periodSeconds: 3
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Pods
value: 1
periodSeconds: 3
For the scaling strategy, I choose to define HPA configurations in the ScaledObject manifest. If I wanted to define a more advanced scaling strategy, I would make use of KEDA's scaling modifiers feature, sketched below.
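For illustration only (this proof of concept doesn't need it), here is a minimal sketch of what a scaling-modifiers configuration could look like; the trigger names trig_one and trig_two and the target value are hypothetical:
spec:
  advanced:
    scalingModifiers:
      # Combine two named triggers into a single composite metric
      formula: "trig_one + trig_two"
      target: "50"
      metricType: AverageValue
  triggers:
    - type: prometheus
      name: trig_one
      # serverAddress, query, threshold, etc. omitted for brevity
    - type: prometheus
      name: trig_two
      # serverAddress, query, threshold, etc. omitted for brevity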
Defining the Scale Trigger(s)
Now, to complete the manifest file for my ScaledObject, I need to configure how the scaling operations can be triggered with one or more scalers. To understand how I can configure Prometheus as a trigger, I go back to the Prometheus scaler documentation and see I need to provide the following information in my manifest file.
Address of the Prometheus server
Since my proof of concept has Prometheus and KEDA running in the same cluster, I can use Prometheus's internal DNS record in the Kubernetes cluster, which follows the format <serviceName>.<namespaceName>.svc.cluster.local.
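To fill in that format, I need the Prometheus Service name and namespace; one way to look them up is:
# List the Services created by the kube-prometheus-stack release in the obs-system namespace
kubectl get svc -n obs-system
# The Prometheus Service is kube-prom-stack-kube-prome-prometheus, listening on port 9090,
# which gives the serverAddress used in the trigger configuration below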
Prometheus query
The Prometheus query returns the value that will be evaluated against a threshold every time a scaling decision needs to be made. To determine which query I should use, I reviewed the metrics exposed by my application, ran a kubectl port-forward command to access the Prometheus UI via localhost, and executed Prometheus queries until I found one I was happy to use for scaling. In this proof of concept, I decided to have the scaling decision determined by how many HTTP requests the podinfo application received within a certain time period.
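For reference, that port-forward might look like this, assuming the Service name from the kube-prometheus-stack install above:
# Expose the Prometheus UI on localhost:9090 for ad-hoc query testing
kubectl -n obs-system port-forward svc/kube-prom-stack-kube-prome-prometheus 9090:9090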
Thresholds
The threshold or activationThreshold is the value the result of the Prometheus query must equal or exceed in order for the number of pod replicas to scale up automatically. Since my podinfo installation starts off with 1 replica and I don't plan on scaling my application to zero, I don't need to set the activationThreshold, so I only set the value of the threshold.
After gathering the information I need, I add the parameters for my Prometheus trigger to complete the manifest file for the ScaledObject:
spec:
triggers:
- type: prometheus
metadata:
serverAddress: http://kube-prom-stack-kube-prome-prometheus.obs-system.svc.cluster.local:9090
query: sum(increase(http_requests_total{namespace="podinfo", container="podinfo", status="200"}[1m]))
threshold: '30'
An example of the full manifest file can be found in LiveWyer’s Labs repository.
Deploying the ScaledObject
Now that I have a manifest file for my ScaledObject, I deploy it in the same namespace as the configured scale target with the following command:
kubectl apply -f scaledobject.yaml -n podinfo
KEDA’s Generated HPA Object
After deploying my ScaledObject, I check to see if I configured it correctly by checking the status of the ScaledObject.
$ kubectl get scaledobject -n podinfo
NAME SCALETARGETKIND SCALETARGETNAME MIN MAX READY ACTIVE FALLBACK PAUSED TRIGGERS AUTHENTICATIONS AGE
scale-podinfo apps/v1.Deployment podinfo 1 5 True True False Unknown prometheus 13s
Checking the logs of the KEDA operator pod, I can see it successfully detected the newly deployed ScaledObject…
2025-04-16T14:46:25Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"scale-podinfo","namespace":"podinfo"}, "namespace": "podinfo", "name": "scale-podinfo", "reconcileID": "c9d25bd1-5de3-43d3-920f-0fee1f3a3793"}
2025-04-16T14:46:25Z INFO Adding Finalizer for the ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"scale-podinfo","namespace":"podinfo"}, "namespace": "podinfo", "name": "scale-podinfo", "reconcileID": "c9d25bd1-5de3-43d3-920f-0fee1f3a3793"}
2025-04-16T14:46:25Z INFO KubeAPIWarningLogger metadata.finalizers: "finalizer.keda.sh": prefer a domain-qualified finalizer name including a path (/) to avoid accidental conflicts with other finalizer writers
2025-04-16T14:46:25Z INFO Detected resource targeted for scaling {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"scale-podinfo","namespace":"podinfo"}, "namespace": "podinfo", "name": "scale-podinfo", "reconcileID": "c9d25bd1-5de3-43d3-920f-0fee1f3a3793", "resource": "apps/v1.Deployment", "name": "podinfo"}
2025-04-16T14:46:25Z INFO Creating a new HPA {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"scale-podinfo","namespace":"podinfo"}, "namespace": "podinfo", "name": "scale-podinfo", "reconcileID": "c9d25bd1-5de3-43d3-920f-0fee1f3a3793", "HPA.Namespace": "podinfo", "HPA.Name": "keda-hpa-scale-podinfo"}
2025-04-16T14:46:25Z INFO Initializing Scaling logic according to ScaledObject Specification {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"scale-podinfo","namespace":"podinfo"}, "namespace": "podinfo", "name": "scale-podinfo", "reconcileID": "c9d25bd1-5de3-43d3-920f-0fee1f3a3793"}
…and created an HPA object. If the ScaledObject is misconfigured, KEDA may not create an HPA object at all; for example, if I had misconfigured the target reference, an HPA object would not have been created.
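If the HPA object is missing, a reasonable place to start debugging is the ScaledObject's events and the KEDA operator logs (the deployment name below assumes the defaults from the KEDA Helm chart):
# Events and status conditions usually point at the misconfiguration
kubectl describe scaledobject scale-podinfo -n podinfo
# The KEDA operator logs report reconciliation errors
kubectl logs -n keda deploy/keda-operator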
With the kubectl tree plugin [4], I ran a command targeting the created ScaledObject to see its relationship with the created HPA object.
$ kubectl tree scaledobject scale-podinfo -n podinfo
NAMESPACE NAME READY REASON AGE
podinfo ScaledObject/scale-podinfo True ScaledObjectReady 28m
podinfo └─HorizontalPodAutoscaler/keda-hpa-scale-podinfo - 28m
Just in case, I also check the status of the HPA object created by KEDA.
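One way to read that status back is to fetch the generated HPA object directly; the name keda-hpa-scale-podinfo comes from the KEDA operator logs above:
kubectl get hpa keda-hpa-scale-podinfo -n podinfo -o yaml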
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
...
name: keda-hpa-scale-podinfo
status:
conditions:
- lastTransitionTime: "2025-04-16T14:46:40Z"
message: recommended size matches current size
reason: ReadyForNewScale
status: "True"
type: AbleToScale
- lastTransitionTime: "2025-04-16T14:46:40Z"
message: 'the HPA was able to successfully calculate a replica count from external
metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name:
scale-podinfo,},MatchExpressions:[]LabelSelectorRequirement{},})'
reason: ValidMetricFound
status: "True"
type: ScalingActive
- lastTransitionTime: "2025-04-16T14:46:40Z"
message: the desired count is within the acceptable range
reason: DesiredWithinRange
status: "False"
type: ScalingLimited
The status message in the HPA object, "the HPA was able to successfully calculate a replica count from external metric", indicates that the Kubernetes HPA controller was able to fetch the metrics from the KEDA Metrics Server.
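To go one step further and confirm the KEDA metrics adapter is serving the metric, it can be queried through the external metrics API directly; the metric name s0-prometheus and the label selector come from the HPA status above:
# List the external metric APIs registered by KEDA
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1
# Fetch the current value of the metric backing the HPA (note the URL-encoded label selector)
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/podinfo/s0-prometheus?labelSelector=scaledobject.keda.sh%2Fname%3Dscale-podinfo"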
Testing the Scale Triggers
Now that my application and autoscaling infrastructure are in place, I want to verify that my workload will be autoscaled as advertised based on the changing results returned by the Prometheus query I defined in my ScaledObject. To run the test with my setup:
I open a terminal and run a watch command:
kubectl get pods -n podinfo -w
I open another terminal and run the following command to allow me to access the podinfo application at localhost:8080:
kubectl -n podinfo port-forward deploy/podinfo 8080:9898
I open another terminal and run the following command, which will continuously access my podinfo application every second:
while :; do curl localhost:8080; sleep 1; done
While the looped command is running, I monitor my first terminal to see if the number of pod replicas increases and, if it does, I observe how quickly it scales the number of replicas from the minReplicaCount to the maxReplicaCount that I declared in my ScaledObject.
After waiting a while, I check the logs of the KEDA operator and I don't see any scale-up events in the logs, because KEDA has the Kubernetes HPA controller handle the autoscaling using the HPA object KEDA created.
After some time, I stop the curl loop and observe the scale-down behaviour.
Now that I have a working ScaledObject, I would personally want to move the ScaledObject into a Helm chart.
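As a rough sketch of what that could look like (the chart layout and values paths below are hypothetical, not part of this proof of concept), the ScaledObject would become a template whose tuning knobs live in values.yaml:
# templates/scaledobject.yaml (hypothetical Helm template)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: {{ .Release.Name }}-scaledobject
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ .Release.Name }}
  minReplicaCount: {{ .Values.autoscaling.minReplicaCount }}
  maxReplicaCount: {{ .Values.autoscaling.maxReplicaCount }}
  triggers:
    - type: prometheus
      metadata:
        serverAddress: {{ .Values.autoscaling.prometheus.serverAddress }}
        query: {{ .Values.autoscaling.prometheus.query | quote }}
        threshold: {{ .Values.autoscaling.prometheus.threshold | quote }}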
Wrapping Up
That wraps up the demo of the proof of concept I created to autoscale an example workload with Prometheus metrics via KEDA's Prometheus scaler.
You can find the manifest files and more specific instructions to re-create my proof of concept demo in LiveWyer’s Lab repository.
If you already have experience writing Kubernetes manifest files and deploying them with kubectl apply, then the jump to writing a Kubernetes manifest file for KEDA custom resources should be relatively straightforward. The trickiest part would be learning the specification for KEDA's custom resources, but the specifications can be found on the KEDA website.
In regards to this proof of concept with Prometheus, in practice or in production you should consider having KEDA securely access the Prometheus server. For example, a Prometheus server may:
- Be exposed with Kubernetes Ingress
- Have authentication in front of it
In which case, you will have to make use of one of the methods KEDA provides to manage authentication flows, such as a TriggerAuthentication resource (sketched below).
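As an illustration, if the Prometheus server sat behind basic authentication, one option would be a TriggerAuthentication backed by a Kubernetes Secret; the Secret name and keys below are hypothetical:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: prometheus-basic-auth        # hypothetical name
spec:
  secretTargetRef:
    - parameter: username
      name: prometheus-credentials   # hypothetical Secret holding the credentials
      key: username
    - parameter: password
      name: prometheus-credentials
      key: password
The Prometheus trigger would then set authModes: "basic" in its metadata and reference this object through authenticationRef.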
In the next part of this blog series, I will be demonstrating how to have KEDA authenticate against a Redis server and scale jobs (starting from 0 jobs) based on the length of a Redis list.
Footnote
This blog is part of our KEDA series; we recommend reading the rest of the posts in the series:
- Introducing Kubernetes Event-driven Autoscaling (KEDA)
- Getting Started with Autoscaling in Kubernetes with KEDA
- Autoscaling workloads with KEDA and the Prometheus Scaler