Leveraging Kubernetes HPA and Prometheus Adapter
Learn how to scale Kubernetes deployments using Prometheus metrics with the Horizontal Pod Autoscaler (HPA)
Published on: May 15, 2025
Last updated on: May 16, 2025

This blog is part of our Horizontal Pod Autoscaling series; we recommend reading the rest of the posts in the series:
- Introduction to Horizontal Pod Autoscaling in Kubernetes
- How to use Custom & External Metrics for Kubernetes HPA
- Set up Kubernetes scaling via Prometheus & Custom Metrics
- Leveraging Kubernetes HPA and Prometheus Adapter
Go beyond CPU and Memory: Autoscale Your Workloads with Meaningful, Application-Level Metrics
Kubernetes Horizontal Pod Autoscaler (HPA) is one of those quietly brilliant features, often underused, but essential once you’ve experienced its full potential. At a glance, it seems simple: it automatically adjusts the number of pods in a deployment based on resource usage. Most commonly, it works with CPU or memory.
But here's the thing: CPU and memory rarely tell the full story, especially in modern, cloud-native applications. If your backend processes jobs from a queue, or you handle varying workloads like video uploads or payment processing, those resource metrics can be misleading, or worse, dangerously slow to react.
In this post, we will explore how to scale more intelligently using custom and external metrics via Prometheus and Prometheus Adapter. You’ll learn how to scale based on signals that actually matter to your users, such as queue depth, request rate, or even metrics from third-party services like AWS SQS.
A Brief Refresher: Custom vs External Metrics
Before we dive into implementation, let's set the scene. Kubernetes HPA supports three types of metrics (sketched briefly after this list):
- Resource metrics: such as CPU and memory usage. These are the default and come from the Metrics Server.
- Custom metrics: internal metrics from your application, exposed via Prometheus and surfaced to the HPA through the Custom Metrics API.
- External metrics: signals from outside your Kubernetes cluster, such as cloud queues or payment systems, integrated via the External Metrics API.
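To make the distinction concrete, here is a minimal sketch of how each type appears under spec.metrics in an autoscaling/v2 HPA. The metric names http_requests_per_second and queue_messages_ready are hypothetical placeholders, not metrics defined in this post:

metrics:
  - type: Resource              # resource metric from the Metrics Server
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods                  # custom metric served by the Custom Metrics API
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  - type: External              # external metric served by the External Metrics API
    external:
      metric:
        name: queue_messages_ready
      target:
        type: Value
        value: "500"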
In this article, we will be focusing on the latter two: custom and external metrics. If you’d like a deeper dive into how these APIs work behind the scenes, we’ve covered them previously in this Horizontal Pod Autoscaling series.
For now, let's look at how you can make use of them in practice.
Example 1: Using Custom Metrics to Scale on Internal Queue Length
Imagine you have a microservice that consumes jobs from a queue. You probably already have a metric that tracks queue depth, so why not scale on that?
Let's say you're using Python. With the prometheus_client library, exposing a custom metric is straightforward. Here's a small example:
from prometheus_client import start_http_server, Gauge
import random
import time

# Gauge tracking the current number of jobs waiting in the queue
queue_length = Gauge('custom_queue_length', 'Length of job queue')

def get_queue_length():
    # Stand-in for a real lookup against Redis, RabbitMQ, or another broker
    return random.randint(0, 100)

if __name__ == '__main__':
    # Serve metrics at http://<pod-ip>:8000/metrics
    start_http_server(8000)
    while True:
        queue_length.set(get_queue_length())
        time.sleep(5)
This script exposes the metric custom_queue_length on port 8000, updating it every 5 seconds. In reality, you'd fetch the real value from Redis, RabbitMQ, or your job broker of choice.
Once your application exposes metrics, you need to ensure Prometheus knows where to find them. Assuming you're using kube-prometheus-stack, you can configure a ServiceMonitor like so:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: queue-worker-monitor
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: queue-worker
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: metrics
      interval: 15s
This tells Prometheus to scrape metrics from your service every 15 seconds. Be sure your deployment and service expose the metrics endpoint correctly.
Make sure your Prometheus instance is configured to select ServiceMonitors with the release: prometheus label (this is the default in kube-prometheus-stack).
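For reference, here is a minimal sketch of a Service that the ServiceMonitor above would select, assuming the worker container serves metrics on port 8000 as in the Python example. The names and namespace are illustrative, not taken from a real deployment:

apiVersion: v1
kind: Service
metadata:
  name: queue-worker
  namespace: default
  labels:
    app: queue-worker        # matched by the ServiceMonitor's selector
spec:
  selector:
    app: queue-worker        # must match your Deployment's pod labels
  ports:
    - name: metrics          # must match the ServiceMonitor's endpoint port name
      port: 8000
      targetPort: 8000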
Prometheus Adapter acts as a bridge between Prometheus and the Kubernetes metrics APIs. You’ll need to add a rule so it knows how to expose your metric:
rules:
  custom:
    - seriesQuery: 'custom_queue_length'
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: "custom_queue_length"
        as: "queue_length"
      metricsQuery: 'avg(custom_queue_length) by (namespace)'
This maps your raw Prometheus metric (custom_queue_length) to a Kubernetes-compatible custom metric (queue_length), which the HPA can use.
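If you installed Prometheus Adapter via its Helm chart, this rule typically sits in the chart's values next to the Prometheus connection settings. A hedged values.yaml sketch, where the Prometheus URL is an assumption that depends on how kube-prometheus-stack was installed:

prometheus:
  url: http://prometheus-operated.monitoring.svc   # assumption: adjust to your Prometheus Service
  port: 9090
rules:
  custom:
    - seriesQuery: 'custom_queue_length'
      # ...the rest of the rule shown above...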
Now that the metric is exposed, you can plug it into a HorizontalPodAutoscaler definition:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: queue_length
        target:
          type: AverageValue
          averageValue: "30"
This HPA will scale your worker pods to maintain an average queue length of 30 jobs per pod, providing much more relevant responsiveness than CPU usage alone.
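Under the hood, the HPA uses its usual ratio calculation: desiredReplicas = ceil(currentReplicas × currentMetricValue ÷ targetValue). For example, with 4 workers and an average queue_length of 60, the HPA would request ceil(4 × 60 / 30) = 8 replicas.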
Example 2: Using External Metrics to Scale on AWS SQS Depth
Sometimes the metric you care about lives outside Kubernetes. Say you're using AWS SQS for job queuing: how do you scale based on the number of messages in that queue?
Enter external metrics and tools like YACE (Yet Another CloudWatch Exporter). YACE pulls metrics from AWS and exposes them in Prometheus format. As with Prometheus Adapter and Prometheus itself, YACE can be installed using an official Helm chart.
To track SQS queue depth, you might configure it like this:
config: |-
  apiVersion: v1alpha1
  sts-region: eu-west-2
  discovery:
    exportedTagsOnMetrics:
      AWS/SQS:
        - Name
    jobs:
      - type: AWS/SQS
        regions:
          - eu-west-2
        searchTags:
          - key: Environment
            value: production
        metrics:
          - name: ApproximateNumberOfMessagesVisible
            statistics:
              - Average
            period: 60
This setup pulls queue length from any SQS queues tagged for your production environment. Once deployed in your cluster, YACE exposes those metrics on a Prometheus-compatible endpoint.
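As in Example 1, Prometheus still needs to scrape YACE itself. A ServiceMonitor along the following lines works; the label selector, namespace, and port name here are assumptions that depend on how the YACE chart names its Service, so check them against your install:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: yace-monitor
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: yet-another-cloudwatch-exporter   # assumption: verify your chart's Service labels
  namespaceSelector:
    matchNames:
      - monitoring
  endpoints:
    - port: http          # assumption: verify the Service's port name
      interval: 60s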
Next, define a rule to surface the metric to the External Metrics API:
rules:
  external:
    - seriesQuery: 'aws_sqs_approximate_number_of_messages_visible_average{queue_name="video-processing"}'
      name:
        matches: ".*"
        as: "external_queue_depth"
      metricsQuery: 'avg(aws_sqs_approximate_number_of_messages_visible_average{queue_name="video-processing"})'
This maps the SQS queue size to a Kubernetes-compatible external metric (external_queue_depth), which the HPA can use.
This rule belongs in the Prometheus Adapter configuration, typically under the rules.external section of its Helm chart values, which the chart renders into the adapter's ConfigMap.
Now you can define an autoscaler that responds to that external queue size:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: transcoder-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: transcoder
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: external_queue_depth
        target:
          type: Value
          value: "50"
This HPA will scale the transcoder pods until the SQS queue depth drops below 50 messages.
Combining Custom and External Metrics for Smarter Scaling
In real-world systems, relying on a single metric for scaling can be too simplistic. For example, queue length might indicate backlog, but request rate helps contextualise how fast work is arriving. By combining both, you can trigger scaling only when load is high and demand is sustained.
Kubernetes allows you to define multiple metrics in the HPA spec. The deployment will scale based on whichever metric requires the highest number of replicas. In addition, you can define both metric types in a single autoscaler, allowing Kubernetes to react to internal signals and external demand.
Here's an updated HPA definition that scales based on both queue_length and external_queue_depth:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: transcoder-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: transcoder
  minReplicas: 2
  maxReplicas: 25
  metrics:
    - type: Pods
      pods:
        metric:
          name: queue_length
        target:
          type: AverageValue
          averageValue: "30"
    - type: External
      external:
        metric:
          name: external_queue_depth
        target:
          type: Value
          value: "50"
Kubernetes will evaluate both metrics and scale based on whichever one requires more replicas. This helps you respond more intelligently to different kinds of pressure, whether it originates inside your cluster or out in the cloud.
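For example, if the current queue_length calls for 6 replicas but external_queue_depth calls for 9, the HPA sets the deployment to 9.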
This is especially powerful when:
- Internal metrics may be delayed (such as queue size not rising fast enough)
- External systems signal future load (such as SQS buildup)
- You want more stable, responsive scaling during traffic spikes (see the behavior sketch after this list)
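On that last point, the autoscaling/v2 behavior field gives you finer control over how quickly the HPA scales up and down. A hedged sketch with purely illustrative values, to be tuned for your own workload:

# Nested under spec: in the HPA definition
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0      # react to spikes immediately
    policies:
      - type: Percent
        value: 100                     # at most double the replica count per minute
        periodSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 300    # wait 5 minutes before scaling down
    policies:
      - type: Pods
        value: 2                       # remove at most 2 pods per minute
        periodSeconds: 60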
Closing Thoughts
Autoscaling with custom and external metrics opens up a whole new level of responsiveness and efficiency in Kubernetes. Rather than relying on blunt signals like CPU, you can scale your systems based on real indicators of demand, whether those come from inside your app or the wider ecosystem it operates in.
It does require some additional setup: exposing metrics, configuring Prometheus Adapter, and tuning your autoscalers. But the benefits are well worth the investment.
By aligning autoscaling with business logic and actual user behaviour, you’ll end up with systems that:
- Scale faster when they’re needed most
- Stay lean when traffic dips
- Deliver better performance without unnecessary cost
If you’re looking to improve resilience, optimise cost, or build systems that scale organically with user demand, custom & external metrics are an essential part of your Kubernetes toolkit.