Prometheus Cardinality: Why Your Active Series Keep Growing

Observability
Jun 23, 2026

The first sign is almost never a dashboard. It is a pager. A Prometheus replica gets OOMKilled, restarts, replays its write-ahead log, and falls behind on scrapes. You look at prometheus_tsdb_head_series and it has doubled in a week, even though request volume is flat. Nobody shipped more traffic. Someone shipped one more label.

That is the shape of almost every cardinality incident. It does not arrive as a gradual climb you can plan around. It arrives as a step change tied to a single deploy, where an engineer added user_id or pod_uid or a raw URL path to a metric that was previously bounded, and the multiplication did the rest. The cost shows up everywhere downstream: head-block memory, index size, query latency, and, if you ship those metrics to a vendor billing on active series, the invoice your CFO asks about next quarter.

This article is about the mechanics. Where active series actually come from, how to find the label driving the explosion in under a minute, and the three places you can cut cardinality without losing the signal you page on. The goal is not to scare you off labels. Labels are the whole point of dimensional metrics. The goal is to put each piece of data where its cost economics make sense.

A metric is a name. A series is what you pay for

The single most common mistake in cardinality discussions is using "metric" and "series" interchangeably. They are not the same thing, and the difference is the whole cost model.

A metric is a name, like http_server_request_duration_seconds. A time series is a unique combination of that name plus a specific set of label values. Two samples belong to the same series if and only if they share the same metric name and identical label values. Change any label value and you have created a second series. Prometheus stores and indexes per series, and vendors that bill on active series charge per series, not per metric and not per data point.

This is why ingest volume in gigabytes and series count diverge so sharply. You can have a low-traffic service emitting a handful of samples per scrape that still produces millions of series, because the damage is in the label combinations, not the sample rate. Prometheus holds the most recent data in an in-memory head block, backed by a write-ahead log and flushed to two-hour persistent blocks on disk, with a separate inverted index mapping label sets to series IDs. The Prometheus storage documentation describes this engine, and the practical takeaway is that head-block memory scales with active series count. Field reports and vendor guidance converge on roughly 3 to 4 KB of RAM per active series, so a million active series is several gigabytes of head block before you run a single query.
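To make that rule of thumb concrete, here is a minimal sketch that turns an active-series count into a rough head-block memory range. The 3 to 4 KB per series figure is the approximation quoted above, not a measured constant, and the function name is purely illustrative.

```python
# Rough head-block memory estimate from an active-series count.
# The per-series cost is the article's rule of thumb (an assumption),
# so treat the result as an order-of-magnitude guide only.
BYTES_PER_SERIES_LOW = 3 * 1024
BYTES_PER_SERIES_HIGH = 4 * 1024
GIB = 1024 ** 3


def head_memory_gib(active_series: int) -> tuple[float, float]:
    """Return the (low, high) estimated head-block memory in GiB."""
    return (
        active_series * BYTES_PER_SERIES_LOW / GIB,
        active_series * BYTES_PER_SERIES_HIGH / GIB,
    )


low, high = head_memory_gib(1_000_000)
print(f"1M active series: roughly {low:.1f} to {high:.1f} GiB of head block")
```

Real usage depends on label lengths, churn, and Prometheus version, so calibrate the constants against your own process memory before using this for capacity planning.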

The number to watch is prometheus_tsdb_head_series. Track it over time. Sudden growth without a matching traffic increase is the signature of a cardinality bomb, and it is the one chart that will tell you a bad label landed before the OOMKill does.
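As a rough illustration of what "sudden growth" means in practice, the sketch below scans periodic readings of prometheus_tsdb_head_series and flags any interval where the count jumps by more than a threshold. The sample data, the 20% threshold, and the function name are all assumptions chosen for the example; in production this check would more naturally live in an alerting rule.

```python
# Flag step changes in a sequence of head-series readings.
# The 20% default threshold is an illustrative assumption, not a recommendation.
def find_steps(
    samples: list[tuple[str, int]], max_growth: float = 0.2
) -> list[tuple[str, float]]:
    """Return (label, fractional growth) for each reading that grew too fast."""
    steps = []
    for (_, prev), (label, cur) in zip(samples, samples[1:]):
        if prev > 0:
            growth = (cur - prev) / prev
            if growth > max_growth:
                steps.append((label, growth))
    return steps


# Hypothetical daily readings: flat, then a step on Thursday.
daily = [
    ("mon", 1_000_000),
    ("tue", 1_010_000),
    ("wed", 1_020_000),
    ("thu", 2_050_000),
    ("fri", 2_060_000),
]
steps = find_steps(daily)
for day, growth in steps:
    print(f"{day}: head series grew {growth:.0%} since the previous reading")
```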

The math: how one label turns 3,600 series into 180 million

Cardinality for a metric is approximately the product of the cardinalities of its labels, bounded by which combinations actually occur in production. Not every theoretical combination appears, but the upper bound is what you are gambling against.

Take a reasonable latency metric with four labels:

service.name          30 services
http.request.method    5 methods
http.response.status   8 status codes
environment            3 environments

The theoretical maximum is 30 × 5 × 8 × 3 = 3,600 series. Every metrics stack handles that comfortably. Now an engineer adds user_id to "break latency down by customer," and your service has 50,000 active users:

3,600 × 50,000 = 180,000,000 series

That is the explosion, in one label, in one pull request. It passed code review because a single extra label looks harmless in a diff. Nobody multiplied through. This is the combinatorics that make cardinality such a routine and predictable failure, and Brian Brazil's point in "Cardinality is key" still holds: the cost of a label is not its presence, it is the product of its distinct values against everything else.
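The multiplication nobody did in review is short enough to script. This sketch computes the theoretical upper bound from the per-label cardinalities in the example above; the helper name is hypothetical, and real series counts will be lower wherever combinations never occur.

```python
from math import prod


def max_series(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on series for one metric: the product of label cardinalities."""
    return prod(label_cardinalities.values())


labels = {
    "service.name": 30,
    "http.request.method": 5,
    "http.response.status": 8,
    "environment": 3,
}
before = max_series(labels)
after = max_series({**labels, "user_id": 50_000})
print(f"before: {before:,} series, after adding user_id: {after:,} series")
```

A check like this fits in a pull request template or a lint step: if a new label's expected cardinality pushes the product past a budget, the diff gets a second look.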

The reason engineers reach for these labels is legitimate. They want to slice latency by user, session, or request to debug a specific complaint. That instinct is correct. The placement is wrong. Request-level identity is forensic detail, and forensic detail belongs in traces and logs, not in metric labels that exist to aggregate behavior across populations.

Finding the cardinality before it finds you

You do not need to guess which metric is the offender. Prometheus ships the answer.

The fastest path is the TSDB status endpoint, documented in the Prometheus HTTP API reference. Hit it directly:

curl -s 'http://localhost:9090/api/v1/status/tsdb?limit=20' | jq

The response lists seriesCountByMetricName, labelValueCountByLabelName, and seriesCountByLabelValuePair. Those three fields tell you, in order, which metric names own the most series, which labels carry the most distinct values, and which exact label-value pairs are doing the damage. That is usually enough to name the culprit in one call.
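If you would rather script the triage than read raw JSON, a minimal sketch along these lines pulls the top entries from each of the three fields. It assumes the documented response shape, with the statistics nested under a top-level data key as lists of name and value pairs; the sample payload and its numbers are invented for illustration.

```python
import json
from urllib.request import urlopen

FIELDS = (
    "seriesCountByMetricName",
    "labelValueCountByLabelName",
    "seriesCountByLabelValuePair",
)


def fetch_tsdb_status(base_url: str, limit: int = 20) -> dict:
    """Fetch the TSDB status document from a Prometheus server."""
    with urlopen(f"{base_url}/api/v1/status/tsdb?limit={limit}") as resp:
        return json.load(resp)


def summarize(status: dict, top: int = 3) -> dict[str, list[tuple[str, int]]]:
    """Return the largest (name, value) entries for each cardinality field."""
    data = status["data"]
    return {
        field: sorted(
            ((entry["name"], entry["value"]) for entry in data.get(field, [])),
            key=lambda pair: pair[1],
            reverse=True,
        )[:top]
        for field in FIELDS
    }


# Invented payload in the shape the endpoint returns, for offline illustration.
sample = {
    "status": "success",
    "data": {
        "seriesCountByMetricName": [
            {"name": "http_server_request_duration_seconds_bucket", "value": 900000},
            {"name": "up", "value": 120},
        ],
        "labelValueCountByLabelName": [
            {"name": "user_id", "value": 50000},
            {"name": "le", "value": 20},
        ],
        "seriesCountByLabelValuePair": [
            {"name": "service_name=checkout", "value": 400000},
        ],
    },
}
summary = summarize(sample)
for field, entries in summary.items():
    print(field, entries)
```

Against a live server, replace the sample with fetch_tsdb_status("http://localhost:9090").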

When you want to explore interactively, PromQL does the same job. To rank metrics by series count:

topk(10, count by (__name__)({__name__=~".+"}))

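To check whether a suspect label is exploding a specific metric, count its distinct values (replace label_name with the label you want to check). The inner count by returns one result per label value, and the outer count collapses that to the number of distinct values:

count(count by (label_name)(http_server_request_duration_seconds_bucket))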

Run these against a single replica, not a federated view, and run them off-peak if the instance is already memory-pressured, because a count by over a high-cardinality metric is itself an expensive query. These two queries plus the TSDB status endpoint belong in every platform team's runbook. They turn "the metrics stack is unhealthy and we don't know why" into "metric X grew because label Y started carrying request IDs after the 14:00 deploy."

The labels that blow up, and where they belong instead

Cardinality bombs are not exotic. The same handful of labels cause most incidents, and each one has a correct home that is not a metric label.

The decision rule is simple: anything that identifies a single instance of a request, transaction, user, or short-lived object does not belong as a metric label.

  • user_id, customer_id: cardinality equals your user count, which grows without bound. Move to trace attributes or structured logs.
  • request_id, trace_id, span_id: cardinality equals request count. These belong in logs, correlated through trace_id, not on metrics.
  • session_id, cart_id, order_id, invoice_id: cardinality equals the count of business objects. Logs or traces.
  • pod.uid, container_id: these churn on every restart, so even stable traffic rotates the series set continuously. Keep pod-level identity in resource attributes where it is bounded by pod count, and drop the UID variants.
  • error_message: free-form strings mint a new series per error variant. Move the text to a log attribute and keep a bounded reason label on the metric instead.
  • url.full or raw paths with embedded IDs: every unique URL is a new value. Use the templated http.route (/users/{id}) instead of /users/8675309.

The Prometheus project says the same thing more tersely in its metric and label naming practices: keep label cardinality bounded, and remember that every label value combination is a new series. The reframe that helps platform teams most is that the bad label was rarely bad data. user_id is valuable. It was bad as a metric label. The fix is to move it, not delete it.
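For the raw-path case in the list above, the ideal source of http.route is the web framework's own route template. Where that is not available, a fallback along the following lines can bound the label. This is a sketch under simple assumptions: it only recognizes all-digit and UUID-shaped segments, and the {id} placeholder and function name are illustrative.

```python
import re

# Matches a canonical 8-4-4-4-12 hexadecimal UUID.
_UUID = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def template_path(path: str) -> str:
    """Replace ID-like path segments with a placeholder and drop the query string."""
    segments = path.split("?", 1)[0].split("/")
    return "/".join(
        "{id}" if seg.isdigit() or _UUID.match(seg) else seg for seg in segments
    )


print(template_path("/users/8675309"))
print(template_path("/users/8675309/orders/42?page=2"))
print(template_path("/healthz"))
```

Slugs, hashes, and other opaque identifiers would slip through this heuristic, which is one more reason to prefer the framework-provided route when you can get it.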

Histograms: the cardinality multiplier hiding in your latency metrics

Latency histograms are where cardinality quietly multiplies. A classic Prometheus histogram is not one series. With N buckets and L label combinations it produces N × L bucket series, plus 2 × L for the _sum and _count. A latency histogram with 20 fixed buckets across 200 service-and-method combinations is 4,000 bucket series before you add a single business label. Add one ill-placed label and the histogram multiplies it across every bucket.

This is why a single bad label feels so much worse on a histogram than on a counter. You are not multiplying one series, you are multiplying the entire bucket set.
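The arithmetic is easy to check for your own histograms. The sketch below applies the N × L + 2 × L formula from above; the function name is hypothetical, and the 50-value label in the second call is an invented example of one modest extra dimension.

```python
def classic_histogram_series(buckets: int, label_combinations: int) -> int:
    """Series for a classic histogram: one per bucket, plus _sum and _count."""
    return buckets * label_combinations + 2 * label_combinations


baseline = classic_histogram_series(buckets=20, label_combinations=200)
with_extra_label = classic_histogram_series(buckets=20, label_combinations=200 * 50)
print(f"baseline: {baseline:,} series")
print(f"with one 50-value label added: {with_extra_label:,} series")
```

The baseline comes to 4,400 series in total: the 4,000 bucket series from the example plus 400 for _sum and _count.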

The structural fix is the native histogram, introduced as an experimental feature in Prometheus 2.40, which maps closely to the OpenTelemetry ExponentialHistogram data type. Instead of fixed buckets defined at instrumentation time, it uses adaptive exponential bucketing, which represents the same latency distribution in far fewer series, commonly 3 to 10 times fewer for typical request-latency shapes. The catch is that it is not automatic. Existing classic-histogram instrumentation has to be migrated, and your downstream has to support the format. But for any team where latency histograms dominate the series count, this is the highest-leverage structural change available.

Three places to cut cardinality, in order of leverage

Once you know the offender, you have three places to cut it. They are not equivalent, and the order matters.

First, fix it at the instrumentation point. Replace high-cardinality identifiers with templates at the SDK: http.route instead of url.full, a bounded reason enum instead of error_message. This produces the cleanest metrics and the lowest series count, because the bad value never gets created. It is also the slowest to roll out, because it requires a code change in every emitting service. This is the early-binding principle Grafana describes in its guide to managing high cardinality: the earlier you bind the decision, the cheaper it is.
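As one way to picture the bounded reason enum, the sketch below maps free-form error text onto a small fixed set of values before it is used as a label. The patterns and reason names are hypothetical; the point is only that the output set is closed, with everything unrecognized falling into "other".

```python
import re

# Hypothetical pattern table: order matters, first match wins.
_REASONS = [
    (re.compile(r"timed? ?out|deadline exceeded", re.IGNORECASE), "timeout"),
    (re.compile(r"connection (refused|reset)", re.IGNORECASE), "connection"),
    (re.compile(r"permission|unauthori[sz]ed|forbidden", re.IGNORECASE), "auth"),
]


def error_reason(message: str) -> str:
    """Collapse a free-form error message into a bounded reason label."""
    for pattern, reason in _REASONS:
        if pattern.search(message):
            return reason
    return "other"


print(error_reason("dial tcp 10.0.0.7:5432: connection refused"))
print(error_reason("context deadline exceeded (request 9f3a)"))
print(error_reason("unexpected EOF while parsing row 18231"))
```

The full message still goes to the log line or span event, where per-request detail is cheap; only the reason reaches the metric.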

Second, drop or relabel at the collector. When you cannot change the source quickly, strip the label in the pipeline before export. In Prometheus scrape configs this is metric_relabel_configs. If you run the OpenTelemetry Collector in front of your backend, the transform processor does it with one statement:

processors:
  transform/drop_user_id:
    metric_statements:
      - context: datapoint
        statements:
          - delete_key(attributes, "user_id")

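This processor sits in the metrics pipeline and runs before export, so the offending label never reaches the backend's billing meter or your remote-write target. It is the most reversible control you have, which makes it the right emergency brake when a bad label lands in production at 2am. One caveat: deleting an attribute can leave several data points with identical attribute sets, so confirm that your backend tolerates the duplicates or aggregate them before export.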

Third, aggregate before storage. When you genuinely need the metric but not the dimension, pre-aggregate over the high-cardinality label with a Prometheus recording rule or the Collector's metricstransform processor, collapsing per-instance series into the dimensions you actually query. Sampling a metric to control cardinality is a last resort and almost never correct for SLO-critical signals, because it degrades exactly the density you need during an incident.
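Conceptually, aggregating over a label means summing every series that differs only in that label, which is what a recording rule built on sum without (user_id) would do at each evaluation. The sketch below illustrates the collapse on a few invented samples; the data structure and function name are assumptions for the example, not how Prometheus or the Collector implement it.

```python
from collections import defaultdict

Labels = dict[str, str]


def aggregate_without(
    samples: list[tuple[Labels, float]], drop: set[str]
) -> dict[tuple, float]:
    """Sum sample values across series that differ only in the dropped labels."""
    totals: dict[tuple, float] = defaultdict(float)
    for labels, value in samples:
        # The remaining labels, sorted, form the identity of the output series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        totals[key] += value
    return dict(totals)


# Invented counter values at a single instant.
samples = [
    ({"service": "checkout", "method": "GET", "user_id": "u1"}, 3.0),
    ({"service": "checkout", "method": "GET", "user_id": "u2"}, 5.0),
    ({"service": "checkout", "method": "POST", "user_id": "u1"}, 2.0),
]
collapsed = aggregate_without(samples, {"user_id"})
for key, value in collapsed.items():
    print(dict(key), value)
```

Three input series become two, and the totals per service and method are preserved. Summation suits counters and histogram buckets; gauges may call for a different aggregation such as max or average.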

Why static relabel rules decay

Here is the uncomfortable part. Every control above is a snapshot. You drop user_id today, and the rule holds until next sprint, when a new service ships with tenant_id on a metric, or a framework upgrade starts auto-instrumenting a route with embedded IDs, or an incident gets debug labels added that nobody removes afterward.

Telemetry is not static. New services appear, traffic patterns shift, developers add labels, and the metric cardinality keeps exploding no matter how good your last cleanup was. If your only control plane is a pile of hand-maintained metric_relabel_configs and recording rules, you are depending on a human to notice every new source of waste and react to it, forever. That is precisely the kind of cross-cutting, never-finished work that loses to feature delivery every time, which is the same dynamic that makes the Datadog bill keep growing even after a big cleanup quarter. The labels come from developers. The bill lands on the platform team. That dead zone is where runaway cardinality lives.

How Sawmills approaches this

Sawmills is the operator that runs the cardinality controls described above continuously, inside the guardrails your platform team sets. It watches the active-series picture across your pipeline, identifies the metric and the label driving an explosion the way the TSDB status endpoint would, and applies the drop, relabel, or aggregation in the pipeline before the data reaches Prometheus or your vendor's billing meter. Your dashboards, alerts, and PromQL stay exactly where they are.

Cardinality is not a project, it is a permanent control problem. A new tenant_id label that multiplies series count overnight does not wait for your next audit. Sawmills catches the multiplication when it happens, attributes it to the service that introduced it, and lets developers self-serve the fix in Slack or Teams within the policy the platform team defined.

For a deeper treatment of the failure modes, our guide to high cardinality in metrics covers the causes and solutions in detail.

If your prometheus_tsdb_head_series chart has a step in it that you cannot explain, that is the exact problem Sawmills was built to catch before it pages you. Schedule a demo to see it find the label driving your active-series count against a pipeline like yours.