Kubernetes log management at scale: where the volume comes from and where to stop it

Observability

Jun 2026

Most teams treat Kubernetes log management as a collection problem. Pick a DaemonSet, point it at /var/log/pods, ship everything to a backend, build a few dashboards, done. That works until the cluster grows past a few hundred pods. Then the same setup starts producing a bill that nobody can explain and a query experience that gets slower every quarter.

The reason is simple. Kubernetes makes it trivial to emit logs and expensive to keep them. Every pod writes to stdout. The container runtime captures it. A node-local collector reads it. A gateway forwards it. A backend indexes it. At no point in that chain does anything ask whether the log line deserved to exist. Collection scales linearly with pod count. Cost scales with volume times retention times indexing overhead. Those are not the same curve, and the gap between them is where most Kubernetes observability budgets disappear.

This post is about closing that gap. Not by switching backends, and not by writing one more filter rule the week the bill spikes. By deciding what logs should exist, where they should be processed, and which tier they belong in, as a property of the telemetry lifecycle rather than a cleanup chore.

Where Kubernetes actually puts logs, and why the path matters

Before talking about volume, it helps to be precise about the mechanics.

Kubernetes itself does not store logs. Your application writes to stdout and stderr. The container runtime (containerd or CRI-O) captures those streams and writes them to disk on the node, typically under /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/. There is also /var/log/containers/, which holds symlinks back to the pod path. Most modern collectors read directly from /var/log/pods so they can extract pod metadata from the path and avoid chasing symlinks.
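As a rough illustration of why collectors prefer the pod path, here is a minimal Python sketch that recovers pod metadata from a path in the layout described above. The function name, the `PodLogSource` type, and the example file name `0.log` are assumptions made for this sketch, not part of any collector's API.

```python
import re
from typing import NamedTuple, Optional

# Layout from the text: /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/<file>
# Namespace and pod names cannot contain underscores, so "_" works as a separator.
POD_LOG_PATH = re.compile(
    r"^/var/log/pods/"
    r"(?P<namespace>[^_/]+)_(?P<pod>[^_/]+)_(?P<uid>[^/]+)/"
    r"(?P<container>[^/]+)/"
    r"(?P<file>[^/]+)$"
)


class PodLogSource(NamedTuple):
    """Metadata recovered from the path alone (hypothetical helper type)."""
    namespace: str
    pod: str
    uid: str
    container: str


def parse_pod_log_path(path: str) -> Optional[PodLogSource]:
    """Return pod metadata for a /var/log/pods path, or None if it does not match."""
    m = POD_LOG_PATH.match(path)
    if m is None:
        return None
    return PodLogSource(m["namespace"], m["pod"], m["uid"], m["container"])


# Example path; the pod name, UID, and file name are made up for illustration.
example = parse_pod_log_path(
    "/var/log/pods/payments_checkout-7d9f8b6c5-x2k4q_"
    "0a1b2c3d-1111-2222-3333-444455556666/app/0.log"
)
```

A path under /var/log/containers/ does not match this pattern, which is the symlink-chasing case the pod path avoids.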

The container runtime writes these files, but with CRI runtimes such as containerd and CRI-O it is the kubelet that rotates them, by default at 10 MiB per file with five files retained per container (the containerLogMaxSize and containerLogMaxFiles kubelet settings), though cluster operators change those defaults. When a file rotates, the collector has to detect the inode change, continue from the new file, and avoid duplicating or dropping lines in the transition. This is the single most common place that production log pipelines quietly fail. Under a volume spike, the collector falls behind, rotation happens faster than it can keep up, and you lose lines exactly when you need them most.
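To make the spike failure mode concrete, the sketch below estimates how far a collector can lag before unread data is rotated away. It assumes 10 MiB per file and five retained files as illustrative values (check your own cluster's settings), and it ignores details such as collectors that keep an old file handle open, so treat the result as a rough bound rather than a guarantee.

```python
MIB = 1024 * 1024


def seconds_until_overwritten(
    write_rate_bytes_per_s: float,
    max_file_bytes: int = 10 * MIB,  # assumed rotation size
    max_files: int = 5,              # assumed number of retained files
) -> float:
    """Rough upper bound on tolerable collector lag for one container.

    The container keeps at most max_file_bytes * max_files on disk, so a
    collector that falls further behind than this window starts losing lines.
    """
    if write_rate_bytes_per_s <= 0:
        return float("inf")
    return (max_file_bytes * max_files) / write_rate_bytes_per_s


steady = seconds_until_overwritten(50 * 1024)  # a container writing 50 KiB/s
spike = seconds_until_overwritten(5 * MIB)     # the same container at 5 MiB/s
```

Under these assumptions the steady case leaves about 17 minutes of slack, and the spike leaves about 10 seconds, which is why loss concentrates in the moments of highest volume.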

None of this is, by itself, a reason to log less. It is a reason to understand that “just collect everything” has a failure mode that gets worse precisely under load. The volume question and the reliability question are the same question.

The collection architecture decision

There are three canonical ways to deploy a collector in Kubernetes, and the choice shapes both cost and what processing is even possible.

A DaemonSet runs one collector pod per node. It reads local pod logs off the node filesystem, scrapes kubelet metrics, and accepts OTLP from local pods. It is the cheapest pattern for log collection at volume because logs never cross the network just to reach a collector. The constraint is that each node runs its own instance, and tail-based decisions that need a full picture (like trace sampling) cannot happen here because the relevant data is scattered across nodes.

A sidecar runs a collector container inside each application pod. Strong isolation, per-app config, clean RBAC story. The cost is that every pod carries its own collector's CPU and memory, so collector overhead grows with pod count rather than node count, and every pod spec needs the sidecar. Appropriate for high-isolation workloads like payment processors, overkill for general logging.

A gateway is a standalone collector deployment, usually a few replicas behind a Service, that receives telemetry from agents and does centralized work: filtering, transformation, redaction, routing to backends. It becomes a critical-path service, so it has to be scaled and load-balanced, but it is the one place where you can enforce policy once instead of per node.

For most teams the answer is DaemonSet plus gateway:

[Apps] -> [DaemonSet agent (per node)] -> [Gateway (cluster)] -> [Backend(s)]

Agents do node-local collection. The gateway is where log volume policy lives. This matters for the rest of this post, because every reduction technique below has a natural home in that topology, and putting it in the wrong place either does not work or does not save what you think it saves.

Where the cost actually lives

“More logs means more cost” is true and useless. The useful version is specific about which logs and which cost dimension.

The biggest contributors, in rough order of how often they surprise people:

First, success-path lifecycle logs. The classic pattern is a service that logs “starting payment,” “calling provider,” “provider returned,” “processing response,” and “finished payment” for every single request. Five lines per request, none of which classify a failure or change what an operator would do. At a few hundred requests per second this is on the order of a hundred million lines a day (300 requests per second times 5 lines times 86,400 seconds is about 130 million) that exist only to confirm that normal things happened normally.

Second, kube-probe and health-check traffic. Liveness and readiness probes hit endpoints constantly, and if the app logs requests, those probes generate a steady baseline of logs that carry zero diagnostic value.

Third, debug logs that escaped to production. A service ships with DEBUG enabled “temporarily” during a rollout and nobody turns it off. Debug volume is often an order of magnitude above info.

Fourth, retention defaults. Most vendors tier storage: a hot, searchable, expensive tier and colder, cheaper, slower tiers. The common misconfiguration is everything at 30-day hot retention “to be safe,” which multiplies the hot-tier cost across log classes that nobody queries after day two. Different teams setting different retentions without coordination means the platform team pays for the maximum.
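The retention effect is easy to see with a small worked example. The sketch below compares "everything hot for 30 days" with a tiered layout at steady state. All volumes and per-GB prices are invented for illustration, and `LogClass` and `steady_state_daily_cost` are hypothetical names, not a vendor's pricing model.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical prices per GB stored per day, in arbitrary currency units.
HOT_PER_GB_DAY = 0.10
COLD_PER_GB_DAY = 0.01


@dataclass(frozen=True)
class LogClass:
    name: str
    gb_per_day: float   # ingest volume for this class
    hot_days: int       # days kept in the searchable tier
    cold_days: int = 0  # additional days kept in the cheaper tier


def steady_state_daily_cost(classes: Iterable[LogClass]) -> float:
    """Daily storage cost once retention windows are full.

    At steady state each tier holds gb_per_day * days of data, and every
    stored GB is billed at that tier's per-day price.
    """
    return sum(
        c.gb_per_day * (c.hot_days * HOT_PER_GB_DAY + c.cold_days * COLD_PER_GB_DAY)
        for c in classes
    )


# Same ingest volume in both layouts; only the retention policy differs.
everything_hot = [
    LogClass("error", 20, hot_days=30),
    LogClass("info", 300, hot_days=30),
    LogClass("debug", 680, hot_days=30),
]
tiered = [
    LogClass("error", 20, hot_days=30),
    LogClass("info", 300, hot_days=2, cold_days=28),
    LogClass("debug", 680, hot_days=0, cold_days=7),
]

cost_everything_hot = steady_state_daily_cost(everything_hot)
cost_tiered = steady_state_daily_cost(tiered)
```

With these made-up numbers the all-hot layout costs about 3,000 units per day against about 252 for the tiered one, with identical ingest. The absolute figures mean nothing; the point is that retention policy alone moves the bill by an order of magnitude.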

Notice that none of these are fixed by a cheaper backend. They are fixed by deciding what should exist and where it should live. That decision has a natural sequence.

Reduction techniques, in order of effectiveness

Order matters here because the cheapest log to manage is the one that never enters the pipeline.

Drop low-value logs at the source. Health checks, kube-probe traffic, success-case lifecycle noise. Drop these before they enter any pipeline. The DaemonSet agent is the ideal place, because then they never cross the network or hit the gateway.
Sample high-volume, low-value logs. Chatty info-level logs from a known-noisy service can often be kept at 5 to 10 percent without losing the signal, since the hundredth identical line tells you nothing the first did not.
Reduce content size. Drop verbose fields, truncate long strings, strip metadata nobody queries. Smaller events, lower ingest.
Route by value to different retention tiers. Error logs to a hot tier with reasonable retention, info logs to a warm tier with short retention, debug to archive only or nowhere. This is gateway work, using namespace or severity to decide the destination.
Drop entire classes per environment. DEBUG should never reach production storage. That is a rule, not a judgment call made per service.
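The first three steps can be sketched as one function applied to each record, in the order listed. This is a minimal Python illustration, not collector code: the record shape (a dict with `path`, `user_agent`, `service`, `severity`, `request_id`, `body`), the service name `catalog`, and every constant are assumptions chosen for the example.

```python
import hashlib
from typing import Optional

DROP_PATHS = {"/healthz", "/readyz"}   # probe endpoints, assumed
NOISY_SERVICES = {"catalog"}           # hypothetical known-noisy service
SAMPLE_PERCENT = 10                    # keep roughly 10 percent
DROP_FIELDS = {"request_headers"}      # hypothetical field nobody queries
MAX_BODY_CHARS = 512
TRUNCATION_MARK = "...[truncated]"


def keep_sample(key: str, percent: int) -> bool:
    """Deterministic sampling: the same key always gets the same decision."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def reduce_record(rec: dict) -> Optional[dict]:
    """Return the reduced record, or None if it should be dropped."""
    # Step 1: drop low-value records outright.
    if rec.get("path") in DROP_PATHS or "kube-probe" in rec.get("user_agent", ""):
        return None

    # Step 2: sample chatty info logs from known-noisy services.
    # Keying on request_id keeps or drops all lines of one request together.
    if rec.get("service") in NOISY_SERVICES and rec.get("severity") == "info":
        key = rec.get("request_id") or rec.get("body", "")
        if not keep_sample(key, SAMPLE_PERCENT):
            return None

    # Step 3: shrink what survives by removing fields and truncating the body.
    out = {k: v for k, v in rec.items() if k not in DROP_FIELDS}
    body = out.get("body", "")
    if len(body) > MAX_BODY_CHARS:
        out["body"] = body[:MAX_BODY_CHARS] + TRUNCATION_MARK
    return out
```

Hash-based sampling is used here instead of a random draw so that a replayed record gets the same decision; that is a design choice for the sketch, and a random sampler would reduce volume equally well.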

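A filter processor on the OpenTelemetry Collector that drops health-check noise looks like this, assuming an earlier parsing step has already populated the http.target attribute on each record. It belongs on the agent so the dropped lines never travel: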

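processors:
  filter/drop_health:
    # Log and skip a condition that fails to evaluate instead of failing the batch.
    error_mode: ignore
    logs:
      # A log record is dropped when ANY condition below evaluates to true.
      log_record:
        - 'attributes["http.target"] == "/healthz"'
        - 'attributes["http.target"] == "/readyz"'
        - 'IsMatch(body, ".*kube-probe.*")'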

The point is not the specific syntax. It is that the rule has an owner, lives in version control, and applies to every service, instead of being rediscovered the next time someone notices the bill.

Severity is the lever almost everyone gets wrong

Most of the reduction techniques above depend on one thing being correct: severity has to mean something operational, not emotional. If severity is assigned based on how worried the developer felt, every routing and sampling rule built on top of it is unreliable.

The model worth enforcing is small:

debug: temporary or local diagnostic detail, not for production at scale
info:  meaningful lifecycle or business event, not a problem
warn:  unexpected or degraded behavior that was handled
error: failed operation affecting correctness, UX, or an SLO
fatal: process or service cannot continue safely

Under this model a declined credit card is info, because a decline is a valid business outcome, not a system failure. A provider timeout that the system retried and recovered from is warn. A database write failure during checkout is error. When severity is consistent, “route errors to hot, info to warm, debug to archive” becomes a safe automatic rule. When it is not, you are routing noise and signal into the same buckets and the cost reduction is a coin flip.
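Once severity is trustworthy, the routing rule really is a lookup. The sketch below encodes "errors to hot, info to warm, debug to archive" plus the rule that debug never reaches production storage. Two choices are assumptions of this sketch, since the text does not specify them: warn goes to the hot tier, and unrecognized severities are kept in the hot tier rather than dropped. The tier names and the `route` function are hypothetical.

```python
from typing import Optional

# Destination tier per severity. The "warn" mapping is an assumption.
TIER_BY_SEVERITY = {
    "info": "warm",
    "warn": "hot",
    "error": "hot",
    "fatal": "hot",
}


def route(severity: str, environment: str) -> Optional[str]:
    """Return the storage tier for a record, or None to drop it."""
    sev = severity.lower()
    if sev == "debug":
        # Debug never reaches production storage; elsewhere it is archive-only.
        return None if environment == "production" else "archive"
    # Assumption: an unrecognized severity is kept hot so nothing is lost silently.
    return TIER_BY_SEVERITY.get(sev, "hot")


# The examples from the text, classified under the model above.
declined_card = route("info", "production")    # valid business outcome
provider_retry = route("warn", "production")   # handled degradation
checkout_write = route("error", "production")  # failed operation
```

The function is only as good as its input: if a declined card is logged as error, it lands in the hot tier and the savings disappear, which is the coin flip described above.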

This is why log management in Kubernetes is not really a collector configuration problem. The collector executes whatever policy you give it. The hard part is having a policy that the logs themselves are structured to support, which is a code-time and standards decision, not a pipeline-time one.

Tooling: collectors are the easy part

When people ask which log management tool to use for Kubernetes, they usually mean the collector. Fluent Bit and Vector are strong in log-specific, high-throughput node paths. The OpenTelemetry Collector is the better fit when logs are one signal among metrics and traces and you want one vendor-neutral architecture across many teams. A common hybrid runs Vector or Fluent Bit agents for edge parsing and filtering on high-volume node logs, then an OTel Collector gateway for standardized routing and export.

But the collector choice is the part you make once and rarely revisit. The work that actually consumes the platform team is everything after install: figuring out which services are burning the budget this month, deciding what to drop or sample or reroute, validating that a drop rule does not delete the one line you need during an incident, and rolling that out across services that come and go every day. That work does not finish. It regenerates every time a team ships a new service with default instrumentation.

How Sawmills approaches this

Sawmills, the agentic telemetry operator, treats Kubernetes log management as a lifecycle problem rather than a pipeline one. It analyzes the telemetry flowing through your cluster in real time, identifies the volume that is not earning its keep, and applies the drop, sample, route, and retention policies your DevOps team defines, continuously, within the guardrails you set. The platform team owns the strategy. Developers self-serve changes in Slack or Teams without filing a ticket against the collector config.

Built on OpenTelemetry, Sawmills runs alongside the collectors and backends you already operate, so it makes Datadog, Splunk, or your Grafana stack cheaper and the data cleaner rather than replacing them. The result is that the cost reduction you were chasing manually stops being a quarterly fire drill and becomes a property of how telemetry moves through the cluster.
