
Scaling OTel Collectors on CPU? Here’s the Smarter Way with KEDA

Pipeline · Jun 10, 2025

Brace yourself, because you’ve been doing autoscaling wrong. If you’re an SRE or DevOps engineer still relying on CPU or memory-based HPA to scale your telemetry pipeline, it’s time for a wake-up call. In today’s high-volume observability world, CPU metrics are a terrible proxy for pipeline health. In this post, I’ll explain why the traditional approach fails and how a bold new solution – autoscaling with KEDA and the OpenTelemetry Collector using real telemetry metrics – can save your bacon. We’ll dive into the Sawmills KEDA Scaler Exporter that we just released, an OpenTelemetry Collector exporter that hooks custom metrics directly into KEDA’s scaling decisions. Buckle up: “cattle not pets” applies to your collectors too, and we’re about to treat them right.

The Folly of CPU-Based Autoscaling for Telemetry

Let’s start with a controversial statement: if you’re scaling your OpenTelemetry Collector (or any service) based solely on CPU or memory, you’re doing it wrong. Sure, CPU and RAM were the easy buttons: they’re built-in, generic signals. But guess what? They often don’t reflect the reality of your workload. Especially for telemetry pipelines and modern apps, CPU usage can lie.

Not All Bottlenecks Are CPU. Your app or collector might be I/O-bound or rate-limited elsewhere. Imagine a surge in traffic where your app spends most of its time waiting on DB calls or network I/O. The CPU stays moderate while users are experiencing high latency. The HPA twiddles its thumbs because CPU is fine, even though your app is on fire. The result? Slow responses and unhappy users, all while your autoscaler snoozes.

Queues Backing Up. Many telemetry systems and background processors rely on internal queues. Your OpenTelemetry Collector could be drowning in a backlog of spans or metrics, or a message consumer might have a miles-long queue, but if the processing is efficient the CPU stays low. From HPA’s perspective, “No problem!” – until data loss or latency explodes. CPU doesn’t show that your collector’s pipeline is months behind on data or that it’s dropping telemetry on the floor.

In a real telemetry pipeline, throughput and latency matter far more than CPU%. A collector instance could be at 50% CPU yet be bottlenecked by a slow export or a stuck downstream, causing queues to pile up. The HPA won’t scale it out until CPU burns, which might be too late. This mismatch means your autoscaling is reactive at best, and dangerously blind at worst. We need to scale on signals that actually indicate stress in the pipeline: backlog, ingestion rate, or processing latency, not just host resource usage.

Why Telemetry Pipelines Demand Smarter Scaling

Telemetry pipelines are unique beasts. They’re the central nervous system of your observability stack, handling firehoses of data (metrics, logs, traces) and juggling bursts and lulls. Scaling them properly is critical. If your collector chokes, your entire monitoring stack goes blind when you need it most. Here’s why the old metrics don’t cut it and what better signals look like:

  • Backpressure is the Red Alert: In an OpenTelemetry Collector, data flows through receivers -> processors -> exporters. When exporters can’t keep up (say, your backend is slow or down), the collector’s queues start filling. Queue length and processing latency shoot up as early indicators of trouble. By the time CPU rises, you’ve likely already lost data or increased end-to-end latency. We should scale out as soon as those queues build or processing time spikes, not after the CPU finally maxes out.
  • Telemetry ≠ Steady Load: Observability traffic can be bursty and latency-sensitive. Think of a sudden error spike generating tons of logs, or a periodic batch job flooding your metrics pipeline. CPU-based HPA with a 60s stabilization might take minutes to respond, an eternity in outage terms. We need scaling that reacts within seconds to surges, not minutes. The traditional pull model (Metrics Server or Prometheus scraping) adds delay: e.g. Prometheus scrapes every 1m, KEDA polls metrics every 30s – worst case you wait ~90s to respond. That’s too slow! In contrast, a push/event-driven signal can trigger scaling almost immediately on surge.
  • Scaling on Real SLOs: Ultimately, we care about service quality metrics – e.g. request latency, error rate, throughput – not just resource utilization. If your 95th percentile latency for requests goes beyond 500ms (your SLO), you’d want to add more pods before users suffer. If your collector’s ingestion rate is 100k spans/second and climbing, you might add another collector instance before it starts dropping data. These are the metrics that truly matter for scaling decisions, not CPU idle percentages.

Bottom line: Scaling on CPU/memory alone is like driving blindfolded – you’ll eventually crash. We need to scale on the metrics that reflect actual demand and strain on the system. This is where OpenTelemetry and KEDA team up to save the day.

Meet KEDA + OpenTelemetry: Smarter Autoscaling (No Prometheus Required)

So what’s the answer? KEDA (Kubernetes Event-Driven Autoscaling) + OpenTelemetry Collector metrics. This combo lets you scale anything, including the OTel Collector itself, based on custom telemetry signals. And you can do it without the clunky Prometheus middleman in the loop. That’s right: no need to maintain a Prometheus just to feed your autoscaler, and no waiting for scrape intervals. We’re cutting out the middleman and going straight to the source of truth: the metrics already flowing through your OTel pipeline.

How does it work? KEDA supports external scalers, which are basically gRPC interfaces that KEDA can call to get metrics for scaling. Instead of polling a metrics API server or running a PromQL query, KEDA can ask any service that implements its protocol: “Hey, got a metric for me? Should we scale?” This is a perfect match for OpenTelemetry, which is all about collecting and exporting metrics. Why not have the collector itself supply metrics to KEDA? Enter the Sawmills KEDA Scaler Exporter.

Sawmills KEDA Scaler Exporter: Tying OTel Metrics to KEDA’s Brain

The Sawmills KEDA Scaler Exporter is a new component (an exporter in OTel terms) that glues OpenTelemetry metrics directly into KEDA’s external scaler interface. Essentially, it turns your OpenTelemetry Collector into a KEDA-aware metrics provider. Here’s what happens under the hood:

  • OpenTelemetry Collector receives metrics (from your apps, services, or the collector’s own internal metrics). This could be via any receiver – OTLP, Prometheus scrape, etc. – depending on what you want to scale on. (A sketch of feeding the collector’s own metrics back into the pipeline follows this list.)
  • Those metrics flow through any processors (aggregations, filters, etc. if you have them). You might, for example, calculate a rolling 95th percentile latency using an OTel metrics aggregator, or just pass through raw counters/gauges.
  • The Sawmills exporter, at the end of the pipeline, grabs these relevant metrics and stores them in-memory for a short period. It’s essentially a lightweight, embedded time-series store inside the collector. By default it might keep, say, 1 minute of data points – enough to compute things like rates or percentiles over a recent window. (No heavy TSDB or external store needed – we’re keeping this lean and mean.)
  • The exporter opens a gRPC server and registers as a KEDA external scaler. This means it implements KEDA’s external scaler API – providing functions for KEDA to call like GetMetricSpec() and GetMetrics(). The Collector (with Sawmills exporter) listens on a port (e.g. gRPC on some endpoint) for KEDA’s requests.
  • KEDA is configured to use this external scaler. In your Kubernetes ScaledObject, you specify an external trigger pointing to the OTel Collector’s scaler service. For example, you might say: use the external scaler at otel-collector-service:31333, with metric name “otelcol_queue_utilization” and target value 0.8 (80%). Or perhaps “http_request_latency_p95” with target 300 (ms). Whenever KEDA’s control loop runs, it will call the Collector’s scaler API to fetch the latest value of that metric.
  • Autoscaling decisions are made on real metrics. KEDA treats that metric value just like any other. If the value says “95th percentile latency = 450ms” and your target was 300ms, KEDA will declare scaling needed (e.g., scale out). If the value drops below threshold, KEDA will scale in. You can define these triggers for any metric the Collector knows about – QPS, error count, queue length, you name it. KEDA essentially offloads the heavy lifting to the Collector: Collector collects & computes, KEDA triggers scaling.
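On that first point, a common pattern is to feed the collector’s own internal telemetry (queue sizes, accepted/refused data points, and so on) back into its own metrics pipeline so the kedascaler exporter can see it. Here’s a minimal sketch, assuming your collector build includes the Prometheus receiver and exposes its internal metrics on the default Prometheus endpoint (port 8888; the exact address depends on your service.telemetry settings):

# Sketch: scrape the collector's own internal metrics into the pipeline.
# Port 8888 and the available metric names depend on your collector version/config.
receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: otelcol-internal
        scrape_interval: 10s
        static_configs:
        - targets: ["localhost:8888"]   # the collector's internal telemetry endpoint
service:
  pipelines:
    metrics/internal:
      receivers: [prometheus]
      exporters: [kedascaler]           # kedascaler exporter configured as in the quickstart below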

No more guesswork. We’re now scaling on the same signals we monitor: if telemetry indicates stress, we add capacity. This aligns autoscaling with SLOs and real system behavior. And because it’s event-driven/push-based, the reaction time is dramatically better than the old scrape-and-poll model. KEDA gets metric updates almost instantly when the Collector receives them, so you can scale in seconds rather than minutes. In practice, this could mean preventing an observability outage by adding a collector pod at the first sign of backlog, or maintaining low latency for users by scaling out an app before CPU even notices a bump.

Real-World Autoscaling Examples (Finally Scaling on What Matters!)

Let’s get concrete. What kind of magic can you do with OTel + KEDA that you couldn’t (easily) before? Here are two real-world examples to spark your imagination:

  • Example 1: Scale Based on 95th Percentile Latency. You run a user-facing API service and you care about keeping 95th percentile response time under, say, 250ms. With the OpenTelemetry collector gathering request latency metrics from your service, you can feed the p95 latency directly into KEDA. For instance, configure a trigger: if http.server.duration{quantile=0.95} > 250ms for the last 1 minute, scale out. This way, the moment your high-end latency starts creeping up (indicating the service is struggling), KEDA will add pods proactively, before users really feel it. Contrast this with CPU-based scaling: your CPU might be only 50% while latency is already 500ms due to DB waits – HPA would stay asleep and your users would suffer. Scaling on latency is scaling on your SLA, which is ultimately what you actually care about.
  • Example 2: Scale Collectors on Ingestion Rate (QPS). Suppose your telemetry pipeline handles metrics from thousands of IoT devices, and the volume can fluctuate massively (day/night cycles, etc.). You can configure the collector’s Sawmills exporter to track the ingestion rate – e.g., number of metric data points or spans received per second. Then set up KEDA to scale the Collector deployment based on that rate. If ingestion jumps from 10k/sec to 50k/sec, KEDA will quickly spin up more collector replicas to handle the load. No more dropped data or skyrocketing processing delays. In our experiments, we’ve scaled Collector deployments horizontally based on incoming request count rates with great success – any metric an application or collector exposes can drive scaling. For example, one team used the number of incoming HTTP requests (a counter from the app, exported via OTel) to autoscale their web backend pods in near-real-time as traffic spiked. The same can be done for any pipeline: if 1000 messages/sec is your comfortable per-Collector throughput, set that as the target – when one instance starts seeing 1200/sec, boom, add a new one. When it drops to 200/sec at night, scale down and save resources. All automatic, all driven by the actual workload. (A sketch of such a trigger follows below.)
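To make Example 2 concrete, here’s a minimal sketch of the KEDA side, assuming the Sawmills exporter publishes an ingestion-rate metric under a name like otelcol_received_data_points_per_second (the metric name and target value are illustrative, not the exporter’s canonical names):

# Illustrative ScaledObject for Example 2 – metric name and numbers are assumptions.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: otel-collector-ingest-rate
spec:
  scaleTargetRef:
    kind: Deployment
    name: otel-collector                 # the collector deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
  - type: external
    metadata:
      scalerAddress: otel-collector.default.svc.cluster.local:31333
      metricName: "otelcol_received_data_points_per_second"   # hypothetical metric key
      targetValue: "1000"                # comfortable per-replica throughput

The same shape covers Example 1: point scaleTargetRef at the API deployment and swap in a p95 latency metric with a 250ms target.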

These are just two scenarios. The possibilities are endless. Error rates, queue depths, GC pause times, custom business metrics – if you can emit it to the OTel Collector, you can probably scale on it. You’ve now flipped the script: instead of scaling on generic signals and hoping it correlates to performance, you scale directly on performance metrics. It’s like going from using a Ouija board to a high-precision sensor.

Quickstart: Enabling the KEDA Scaler Exporter (It’s Ridiculously Easy)

Alright, you’re sold on the concept (hopefully!). But you might be thinking, “This sounds complex to set up.” Actually, it’s pretty straightforward. The Sawmills KEDA Scaler Exporter slots right into your Collector config, and if you’re using Helm or the OTel Operator, it’s a few lines of config. No custom code, no separate microservice to maintain (the exporter is part of the collector).

Here’s a sample of how you can enable it:

# values.yaml snippet for OpenTelemetry Collector (Helm chart)
config:
  receivers:
    otlp:
      protocols:
        grpc: {}        # receiving metrics via OTLP
  processors:
    batch: {}           # (optional) batch processor for efficiency
  exporters:
    logging: {}         # just for demo (print metrics to log)
    kedascaler:         # enable the KEDA scaler exporter
      endpoint: "0.0.0.0:31333"   # gRPC endpoint where KEDA will connect
      # (Optional config like retention period can go here, defaults are sensible)
  service:
    pipelines:
      metrics:
        receivers: [otlp]
        processors: [batch]
        exporters: [logging, kedascaler]


That’s it! We added the kedascaler exporter with an endpoint (here port 31333). The Helm chart (or OTel Operator CR) will deploy the Collector with this config. The collector will start up and begin listening on 31333 for KEDA’s gRPC calls.

On the KEDA side, you’d create a ScaledObject that references this external scaler. For example:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: otel-collector-scaledobject
spec:
  scaleTargetRef:
    kind: Deployment
    name: otel-collector       # the deployment to scale
  triggers:
  - type: external
    metadata:
      scalerAddress: otel-collector.default.svc.cluster.local:31333
      metricName: "otelcol_queue_utilization"    # metric key exposed by the scaler
      targetValue: "0.8"


In the above, we tell KEDA there’s an external scaler at the Collector’s address. We specify which metric to use (the name depends on what the Collector exporter provides or how you configure it) and the target value. KEDA will handle the rest, polling the metric and driving the HPA under the hood. No Prometheus, no custom adapters, just direct metric-driven scaling.

Note: The exact syntax for the trigger might vary (the exporter could support multiple metrics or a query language to pick one). In some implementations, you might even write a small query (similar to PromQL) in the trigger metadata to select or aggregate a metric. The Sawmills exporter follows the OpenTelemetry Collector’s philosophy – it’s flexible. For instance, if you wanted a 5-minute rate or a percentile, you could configure the exporter or use an OTel processor to compute that metric and then expose it. The key point is: enabling this is not a heavy lift. If you already run the Collector, it’s a minor configuration change to supercharge it with autoscaling powers.
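For illustration only, a trigger with an inline query might look something like this; the query key and its syntax are hypothetical and depend entirely on what the exporter actually supports, so treat it as a shape, not a reference:

# Hypothetical trigger metadata – the "query" key and its syntax are assumptions
# for illustration; check the exporter's documentation for the real options.
triggers:
- type: external
  metadata:
    scalerAddress: otel-collector.default.svc.cluster.local:31333
    metricName: "http_request_latency_p95"
    query: "p95(http.server.duration[1m])"   # hypothetical query syntax
    targetValue: "300"                       # milliseconds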

Architecture Deep Dive (For the Curious)

Let’s briefly recap the architecture in plain terms, because it’s worth understanding how elegant this is:

  • Receiver (Input): The Collector can ingest metrics from anywhere – your app, other collectors, Prometheus scrape, etc. In many setups, you might deploy collectors as agents on nodes or as a central service. No matter what, make sure the metrics you care about (e.g. your app’s latency, or the collector’s own throughput) are coming into a pipeline.
  • In-Memory Metric Storage: The KEDA scaler exporter keeps a short-term history of metrics. This is crucial for calculating things like rates (you need previous values) or percentiles over a time window. It’s essentially an embedded time-series database in memory, but very lightweight. By default, many use ~1 minute retention – enough to see trends without hogging memory. Think of it as a rolling window of data points that continuously updates.
  • gRPC Scaler API (Output): The exporter opens up a gRPC service endpoint that implements KEDA’s External Scaler API. This includes methods to report the current metric values and whether the scaler is “active.” (Active just means “should we be scaling at all or can we scale to zero when idle.”) When KEDA calls, the exporter calculates the latest value (or runs the query you gave it) against its in-memory data and returns it. It’s essentially pull-based from KEDA’s perspective, but the data was pushed into the collector, so we get the best of both worlds – push speed with controlled pull frequency.
  • Kubernetes HPA under the hood: It’s worth noting that KEDA isn’t replacing HPA; it’s augmenting it. KEDA will create or manage an HPA object for your deployment, feeding it these custom metrics. This means all the usual HPA behaviors (cooldowns, min/max replicas, etc.) still apply – just that the signal is now your custom metric. You can still combine this with CPU if you want (multiple triggers), but often the custom metric is enough. (See the sketch just below.)
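As a sketch of that last point, a single ScaledObject can carry the external trigger alongside a plain CPU trigger, plus the usual replica bounds; KEDA folds all of it into the HPA it manages (values here are examples, not recommendations):

# Illustrative ScaledObject combining a telemetry trigger with a CPU safety net.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: otel-collector-combined
spec:
  scaleTargetRef:
    name: otel-collector                 # kind defaults to Deployment
  minReplicaCount: 2                     # familiar HPA-style bounds still apply
  maxReplicaCount: 15
  triggers:
  - type: external                       # telemetry-driven signal from the Collector
    metadata:
      scalerAddress: otel-collector.default.svc.cluster.local:31333
      metricName: "otelcol_queue_utilization"
      targetValue: "0.8"
  - type: cpu                            # optional resource-based safety net;
    metricType: Utilization              # requires CPU requests set on the pods
    metadata:
      value: "75"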

To visualize it: the Collector with Sawmills exporter is like a wise advisor sitting next to the KEDA operator. Instead of KEDA only looking at generic signals, it asks the advisor “how stressed is the system really?” and the advisor (Collector) says “here’s the real metric you care about.” KEDA then uses that wisdom to scale accurately. The whole system is event-driven, streaming, and closes the observability-to-action loop in a way that finally makes your autoscaling as smart as your monitoring.

Looking Ahead: Smarter Autoscaling and How You Can Contribute

We’re just getting started. The integration of OpenTelemetry with KEDA via the Sawmills exporter opens up a new world of fine-grained, intelligent autoscaling – and there’s plenty of room to grow:

  • Advanced Metrics & Aggregations: Today, you might scale on one metric at a time or a simple windowed query. Future enhancements could allow more complex conditions – e.g. scaling on 95th percentile latency AND error rate simultaneously (multi-metric triggers), or using predictive algorithms to scale before a metric crosses the line. The groundwork is there: we have the data in OTel and the mechanism in KEDA, so community contributions could bring richer scaling logic (maybe even AI/ML driven scaling based on metric patterns – the sky’s the limit).
  • Wider Signal Support: So far we’ve focused on metrics, but imagine scaling on trace or log signals. For example, scale up if the rate of error logs goes above X, or if tracing shows an average of 2+ retries per span in the last 5 minutes (indicating downstream slowness). OpenTelemetry has all that data. The current exporter is metric-centric (because KEDA expects numeric metrics), but there’s potential to derive metrics from other signals. Converters or processors could turn trace data into metrics that feed the autoscaler. Ideas like “scale if checkout service trace latency > 2s” could become reality.
  • Better Multi-Instance Coordination: As we deploy collectors in clusters, an interesting challenge is aggregating metrics across multiple instances. If you have N collector pods all exporting their own “current QPS,” KEDA might need the sum or max. Currently, you might push all metrics to one scaler service or use a Service that KEDA queries (which load-balances to one instance). Future work could introduce a small coordinator or allow the exporter to report cluster-wide metrics (maybe via gossip or an upstream aggregator). Contributors are welcome to tackle how to make scaling decisions when metrics are sharded across pods – a fun problem to solve!
  • Polish and Usability: The Sawmills KEDA Scaler Exporter is a new tool – documentation, tutorials, and best practices are evolving. We encourage you to try it out and give feedback. Found a rough edge in configuration? Have an idea to support a new metric type or transform? Jump in! The OpenTelemetry and KEDA communities are vibrant and would love your help. By contributing, you’re not just improving a single project – you’re pushing forward the state of the art in cloud automation.

One thing is clear: telemetry-driven autoscaling is the future. We’re moving towards a world where scaling decisions are directly tied to user experience and system reliability, not just low-level resource stats. It’s an exciting time to be an SRE/DevOps engineer because you finally have the tools to make autoscaling truly intelligent and reactive to what you care about (not what the kernel cares about).

So go ahead – rip out that old CPU-based autoscaling policy that’s been limping along. Deploy the KEDA scaler exporter, and let your telemetry pipeline scale itself on its own terms. You’ll wonder how you ever lived without it. And as you do, share your stories, join the community discussions, and consider contributing improvements. This is a new frontier, and we’re all building it together.

In summary: You’ve probably been autoscaling your OpenTelemetry Collector the wrong way – but now there’s a right way. It’s bold, it’s opinionated, and it works. Embrace the “metrics that matter” mindset, and your autoscaling will never be the same. Happy scaling!

Amir Jakoby is the Co-founder & CTO of Sawmills, a smart telemetry management platform that helps businesses identify and solve telemetry cost, quality, and availability issues in seconds. Schedule a demo to see Sawmills in action.