Demystifying DataDog Logs Pricing

Originally posted on Obics.io

Understanding DataDog pricing can feel like it requires a PhD in the subject. Some would say that’s on purpose. It’s a shame that such a wonderful product comes with such an unpleasant side. So, in this post, we’ll demystify exactly how DataDog bills for its log offering, which optimizations help and which don’t, and the fine-print gotchas to look out for.

Plans, discounts, and overage

For a big company, DataDog will always have a discounted plan, signed after careful negotiations. For example, Acme Inc. might be ingesting 1 petabyte of logs per year and paying around $1 million/year. But that is heavily discounted: the self-serve price for the same volume would be about $4 million. If Acme Inc. ingests more than its 1-petabyte quota, it will pay about 50% more than the negotiated rate for the extra volume, which is still less than half of the self-serve rate.
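To make the arithmetic concrete, here is a minimal sketch that uses only the round numbers from the Acme Inc. example. It assumes decimal units (1 PB = 1,000,000 GB) and that the committed amount is paid in full even when usage is under quota. The function names are made up for illustration.

```python
# Round numbers from the Acme Inc. example; decimal units assumed (1 PB = 1,000,000 GB).
GB_PER_PB = 1_000_000


def rate_per_gb(annual_cost_usd: float, annual_volume_pb: float) -> float:
    """Effective all-in price per GB for a given yearly bill and volume."""
    return annual_cost_usd / (annual_volume_pb * GB_PER_PB)


negotiated = rate_per_gb(1_000_000, 1)  # $1.00 per GB
self_serve = rate_per_gb(4_000_000, 1)  # $4.00 per GB
overage = negotiated * 1.5              # 50% above the negotiated rate


def annual_bill(volume_pb: float, quota_pb: float = 1.0) -> float:
    """Committed amount plus overage for anything above the quota (assumed model)."""
    committed = quota_pb * GB_PER_PB * negotiated
    extra_gb = max(0.0, volume_pb - quota_pb) * GB_PER_PB
    return committed + extra_gb * overage


print(f"overage rate: ${overage:.2f}/GB ({overage / self_serve:.1%} of self-serve)")
print(f"bill at 1.2 PB: ${annual_bill(1.2):,.0f}")
```

With these numbers the overage rate is $1.50 per GB against a self-serve equivalent of $4.00 per GB, so going over quota costs more than staying within it but is still far cheaper than having no contract at all.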

Understanding how DataDog charges for overage and self-serve usage gives us a baseline for understanding negotiated contracts. At the end of the day, a yearly contract just bundles a fixed volume of logs with a discount.

DataDog splits their billing into two parts: Ingestion and Indexing.

Ingestion is charged per GB of logs at a fixed rate, like $0.10 per GB, which is their documented starting price.
Indexing is charged per event. For example, $1.70 per million events on a yearly commitment, or $2.55 per million events on demand.
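The two-part bill can be sketched as a small function. The rates are the list prices quoted above. The function name, the plan keys, and the example workload are illustrative assumptions, not anything DataDog publishes.

```python
# List rates quoted in this post; negotiated contracts will be lower.
INGEST_USD_PER_GB = 0.10
INDEX_USD_PER_MILLION_EVENTS = {"annual": 1.70, "on_demand": 2.55}


def log_bill(ingested_gb: float, indexed_events: int, plan: str = "annual") -> dict:
    """Split a log bill into its ingestion (per GB) and indexing (per event) parts."""
    ingestion = ingested_gb * INGEST_USD_PER_GB
    indexing = indexed_events / 1_000_000 * INDEX_USD_PER_MILLION_EVENTS[plan]
    return {"ingestion": ingestion, "indexing": indexing, "total": ingestion + indexing}


# Hypothetical workload: 1,000 GB ingested, 1 billion events indexed.
bill = log_bill(ingested_gb=1_000, indexed_events=1_000_000_000)
print({name: round(usd, 2) for name, usd in bill.items()})
```

For this hypothetical workload the ingestion part is about $100 and the indexing part about $1,700, which shows how quickly the per-event charge dominates.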

As mentioned earlier, when you have a plan that you’ve negotiated with a sales rep, those rates will be much lower.

This mismatch between billing by volume and billing by event count determines which optimization techniques actually work. For example, if you use pipeline processors to shrink a big message, e.g. by dropping fields, you don’t actually save anything. The ingestion charge is calculated from the original raw payload, and indexing isn’t based on volume anyway.
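A quick per-event calculation shows why message size matters so little. This sketch uses the list rates quoted earlier and assumes decimal gigabytes. The helper names are hypothetical.

```python
INGEST_USD_PER_GB = 0.10              # list ingestion rate quoted earlier
INDEX_USD_PER_MILLION_EVENTS = 1.70   # list indexing rate, yearly commitment
BYTES_PER_GB = 1_000_000_000          # decimal GB assumed


def cost_per_event(raw_bytes: int) -> tuple[float, float]:
    """Return (ingestion, indexing) cost in USD for one event of the given raw size."""
    ingestion = raw_bytes / BYTES_PER_GB * INGEST_USD_PER_GB
    indexing = INDEX_USD_PER_MILLION_EVENTS / 1_000_000  # independent of size
    return ingestion, indexing


def break_even_bytes() -> float:
    """Raw event size at which the ingestion charge equals the indexing charge."""
    return INDEX_USD_PER_MILLION_EVENTS / 1_000_000 / INGEST_USD_PER_GB * BYTES_PER_GB


ingestion, indexing = cost_per_event(500)  # a 500-byte log line
print(f"500-byte event: ingestion ${ingestion:.2e}, indexing ${indexing:.2e}")
print(f"break-even size: {break_even_bytes():,.0f} bytes")
```

At these rates the break-even point is around 17 KB per event. Most log lines are far smaller than that, so the per-event indexing charge outweighs the per-GB ingestion charge, and trimming fields after ingestion touches neither.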

Tiers: Hot, Warm, and Cold

DataDog has tiering, also known as hot/cold storage. There are actually four tiers, the fourth being a recent addition. In each tier, you are charged the same per-GB ingestion rate, but the indexing price varies.

Standard tier

The default tier is the Standard tier. It’s the most expensive tier, but it gives the best user experience. Queries are fastest for logs in this tier, you can build dashboards and monitors directly from logs, and you can use anomaly detection. Because it’s the most expensive tier and is intended for incident investigations, you’ll want shorter retention here than in the other tiers.

Flex tier

The Flex tier is meant as a cheaper alternative for longer retention. You can still query your logs, albeit more slowly.

However, you can’t create monitors or use anomaly detection for logs in this tier. You can create dashboards, but those will be slower and can easily run into performance problems.

Flex is not necessarily going to be cheaper than Standard, because it has a higher minimum retention period. That means that for queryable logs, there’s a minimum price that you can’t work around just by switching tiers.

Archive tier

The Archive tier is the cheapest. In this tier you store the logs in your own cheap storage, like AWS S3. The tradeoff is that the logs are not directly queryable. You can “rehydrate” Archive tier logs into the Standard tier, but then you pay a “penalty” cost for the rehydration, in addition to the indexing cost of the Standard tier. All in all, it’s very useful for compliance and long-term regulatory retention.

Across all tiers, you still pay a relatively large ingestion fee in DataDog. For comparison, ingesting to AWS S3 is 5-10x cheaper.

BYOC tier

If you’ve been following the observability space, you may know of startups like Groundcover that offer the Bring Your Own Cloud (BYOC) approach. The idea is simple: drastically reduce cost by storing logs in cheap storage (like S3) inside your own cloud account, so the observability vendor doesn’t charge an extra margin on it.

DataDog noticed that once customers reach a certain scale, say $10M+ annual spend, they wise up and start looking for cheaper alternatives. At first, DataDog offered its own BYOC option only to the largest customers that were about to churn, an “invite only” kind of deal, but now they seem to be offering it to everyone, or at least to all enterprise customers.

Common optimizations

There are two types of tools DataDog offers to optimize log costs.

1. Tier selection

If you need long retention for backup or for compliance, it should be obvious to use Flex or Archive instead of Standard.

Setting this up is fairly straightforward in DataDog’s Log Indexes settings.

2. Exclusion Filters

The second big tool is Exclusion Filters.

You can filter logs by their attributes, like when service name is X or severity > WARN. You can either stop indexing the matching logs entirely or sample them at any percentage you choose. But if you sample logs this way, the exclusion filter is going to “add holes” to your requests and transactions. It’s better to do head sampling at the origin, so that you capture full traces.
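One common way to sample at the origin without leaving holes is to make the keep/drop decision from a deterministic hash of a trace or request ID, so every log line of a request shares the same fate. The sketch below is a generic illustration, not DataDog’s API. The `trace_id` field name and the `keep_trace` helper are assumptions.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Decide once per trace: the same ID always yields the same keep/drop decision."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # roughly uniform in [0, 2**64)
    return bucket < int(sample_rate * 2**64)


# Hypothetical log records; "trace_id" is an assumed field name.
logs = [
    {"trace_id": "req-1", "msg": "request started"},
    {"trace_id": "req-1", "msg": "db query"},
    {"trace_id": "req-2", "msg": "request started"},
    {"trace_id": "req-2", "msg": "request finished"},
    {"trace_id": "req-3", "msg": "request started"},
]

# Keep roughly half of the traces, but never half of a trace.
kept = [log for log in logs if keep_trace(log["trace_id"], 0.5)]
print(sorted({log["trace_id"] for log in kept}))
```

Because the decision depends only on the ID, every service that handles the request reaches the same answer without coordination. Random per-line sampling can’t guarantee that.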

A common optimization pattern is to:

Generate metrics from logs
Use exclusion filters to stop indexing entirely. Log-generated metrics don’t need indexing.

If all you need is aggregated data, you can build dashboards and monitors directly from metrics instead of logs. In most cases, indexing costs are 80–90% of the total cost, so this is usually a great optimization.
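To see why aggregated data doesn’t need the individual events, here is a local sketch of what a count-type metric derived from logs boils down to. It is not DataDog’s implementation. The field names and the `logs_to_metric` helper are made up for illustration.

```python
from collections import Counter


def logs_to_metric(logs, tag_keys=("service", "status")):
    """Count log events per tag combination, similar in spirit to a log-based count metric."""
    counts = Counter()
    for log in logs:
        tags = tuple(log.get(key, "unknown") for key in tag_keys)
        counts[tags] += 1
    return counts


# Hypothetical log records with assumed field names.
logs = [
    {"service": "checkout", "status": "error", "msg": "card declined"},
    {"service": "checkout", "status": "error", "msg": "timeout"},
    {"service": "checkout", "status": "info", "msg": "order placed"},
    {"service": "search", "status": "info", "msg": "query served"},
]

metric = logs_to_metric(logs)
for tags, count in sorted(metric.items()):
    print(tags, count)
```

Once the counts exist, a dashboard or monitor on the checkout error rate only needs those numbers, not the original messages, which is why the logs behind them can be excluded from indexing.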

Optimizing by reducing data

All the optimizations above assume that the incoming data is inevitable. But as any engineer working long enough in a large org knows, most logs are redundant.

They often include:

Duplicated info
Legacy debug logs
Repetitiveness
Useless data

You can spend some time on a “cleanup”, but in truth it’s hard to make it a priority. There’s always some new feature that engineering is working on. The redundant logs become technical debt that just keeps piling up.

That’s where Obics comes in.

We handle the toil of finding duplicate and redundant telemetry data, surfacing it for review, and fixing it directly in the source code itself. Contact us and give it a try.