
Pricing models & commitments

The same virtual machine, doing the same work, can cost you three very different amounts depending only on how you agree to pay for it. This is the single biggest lever on the compute meter you met in the last lesson, and it's almost free money — the discounts are large and the work is mostly analysis, not engineering. This lesson teaches the three pricing models, how to combine them, and the two mistakes that turn this lever into a liability.

The same compute, three prices

There are three fundamental ways to buy a unit of cloud compute. Picture them as a spectrum from "maximum flexibility, maximum price" to "maximum savings, minimum flexibility."

[Figure: the three pricing models. On-demand: pay full rate, zero commitment. Commitment: ~30–60% off for a 1–3 yr usage promise. Spot / preemptible: ~60–90% off, but can be reclaimed.]

On-demand: the flexible default

On-demand is the baseline you've been using all along: you pay the full published rate (typically billed by the second or the hour), commit to nothing, and can start or stop whenever you like. It's the right choice for unpredictable, short-lived, or brand-new workloads where you don't yet know your steady-state usage. Its virtue is flexibility; its cost is that it's the most expensive way to pay for anything you run continuously. Anything you run 24/7 on on-demand is leaving a large discount on the table.

Commitment discounts: a promise in exchange for a discount

If you know you'll run a baseline amount of compute for a long time, you can promise the provider that usage up front in exchange for a steep discount, typically 30–60% off for a one- or three-year commitment. The programs go by different names on each cloud, but they are flavors of the same idea:

  • Reserved Instances (RIs) — the original form: you commit to a specific instance type in a specific region for 1 or 3 years. Biggest discount, least flexible (you're locked to that shape).
  • Savings Plans (AWS) — you commit to a steady dollars-per-hour of spend rather than a specific instance. More flexible: the discount automatically applies across instance families/sizes as your fleet changes. Slightly smaller discount than a matched RI.
  • Committed Use Discounts (CUDs) (GCP) and Reservations (Azure) — the equivalent commitment programs on the other clouds.

The durable idea: a commitment is a financial instrument, not a technical one. You're not reserving a physical machine — you're pre-buying usage at a discount. You still pay even if you don't use it, which is the whole risk.
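The "you still pay even if you don't use it" risk can be put into numbers. The following is a minimal sketch, assuming a flat discount and a commitment that is billed for every hour of the term; the function names and the 50% figure are illustrative and not taken from any provider's price list.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of a commitment you must actually use to match on-demand cost."""
    return 1.0 - discount


def effective_discount(discount: float, utilization: float) -> float:
    """Discount actually realized on the usage you ran.

    The committed cost per *used* unit is (1 - discount) / utilization of the
    on-demand rate. A negative result means the commitment cost more than
    simply paying on-demand would have.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 - (1.0 - discount) / utilization


# A 50%-off commitment only pays for itself above 50% utilization.
print(break_even_utilization(0.5))
# Fully used, it delivers the headline discount.
print(effective_discount(0.5, 1.0))
# Used a quarter of the time, each used hour costs double the on-demand rate.
print(effective_discount(0.5, 0.25))
```

Under these assumptions, a deeper discount lowers the break-even point, which is why long commitments tolerate some idle time but not a workload that disappears.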

The two numbers that govern commitments are coverage and utilization — and confusing them is the most common FinOps mistake.

  • Coverage = what fraction of your eligible usage is covered by a commitment (vs paid at full on-demand rate). Low coverage means you're overpaying on usage you could have discounted.
  • Utilization = what fraction of your commitment you actually used. Low utilization means you committed to more than you needed and are paying for idle promises.
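The two definitions above can be sketched as a short calculation. This is a simplified illustration that counts usage in instance-hours and assumes one flat commitment; providers' own reports may measure in dollars or normalized units instead, and the function name is hypothetical.

```python
def coverage_and_utilization(hourly_usage, committed_units):
    """Return (coverage, utilization) for a flat commitment over a usage series.

    hourly_usage: eligible units (e.g. instances) running in each hour.
    committed_units: units per hour the commitment pays for, used or not.
    """
    total_usage = sum(hourly_usage)
    total_committed = committed_units * len(hourly_usage)
    # In each hour the commitment can only absorb usage up to its own size.
    covered = sum(min(u, committed_units) for u in hourly_usage)
    coverage = covered / total_usage if total_usage else 0.0
    utilization = covered / total_committed if total_committed else 0.0
    return coverage, utilization


# One day: 10 instances for 14 hours, 25 instances for 10 hours.
usage = [10] * 14 + [25] * 10

# Commit to the floor: fully utilized, but only part of usage is covered.
print(coverage_and_utilization(usage, 10))
# Commit to the peak: everything is covered, but much of the commitment idles.
print(coverage_and_utilization(usage, 25))
```

Committing to the floor of 10 gives 100% utilization with roughly 62% coverage; committing to the peak of 25 gives 100% coverage with 65% utilization.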

:::tip How to think about coverage vs utilization
You want high utilization (don't waste what you bought) and reasonably high coverage (discount what you can), but they pull against each other. The safe target is to commit only to your steady-state baseline (the floor of usage that's always on) and leave the variable top of your demand on on-demand or spot. Commit the valley, flex the peak. That keeps utilization near 100% (you always use the baseline) while still covering most spend.
:::

:::warning The "buy commitments blindly" trap
The classic failure is buying a 3-year RI for a workload that gets re-architected in 6 months, or committing to last quarter's peak instead of this quarter's floor. Now you pay for a promise nothing is using: utilization craters and the "discount" becomes a loss. Never buy a commitment without first analyzing your steady-state baseline and how stable it is. Tools like AWS Cost Explorer's recommendations or third parties model this; the discipline (commit the durable floor only) is what matters.
:::
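One rough way to analyze the baseline and its stability is to look at the floor of usage week by week. The sketch below assumes you already have hourly instance counts exported from your billing or monitoring data; the function names, the weekly window, and the 10% tolerance are all hypothetical choices, not a recommendation from any tool.

```python
def steady_state_floor(hourly_usage, window=168):
    """Split usage into windows (default 168 hours = one week) and return
    the minimum of each window plus the overall floor."""
    if not hourly_usage:
        raise ValueError("need at least one usage sample")
    floors = [
        min(hourly_usage[i:i + window])
        for i in range(0, len(hourly_usage), window)
    ]
    return floors, min(floors)


def is_stable(floors, tolerance=0.1):
    """Treat the floor as stable if no window's floor sits more than
    `tolerance` (an assumed 10% threshold) below the highest one."""
    return min(floors) >= (1 - tolerance) * max(floors)


# Three weeks of data; in week three the workload was re-architected.
usage = [10] * 168 + [10] * 100 + [25] * 68 + [4] * 168
floors, floor = steady_state_floor(usage)
print(floors, floor, is_stable(floors))
```

A floor that keeps dropping between windows is a signal to commit less, or for a shorter term, than last quarter's numbers would suggest.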

Spot / preemptible: cheap, but it can vanish

Spot instances (AWS), also called preemptible (GCP) or Spot VMs (Azure), sell the provider's spare capacity at a huge discount, often 60–90% off on-demand. The catch: the provider can reclaim the instance at any time with little warning (about 2 minutes on AWS, as little as 30 seconds on GCP and Azure) when it needs the capacity back for a full-price customer. You're renting the empty seats on the plane: dirt cheap, but you can be bumped.

This makes spot perfect for fault-tolerant, interruptible work and a trap for anything that can't survive a sudden shutdown.

Safe for spot (interruption is a non-event):

  • Batch jobs, data processing, CI/CD build runners, ML training (checkpoint and resume)
  • Stateless web workers behind a load balancer, if losing one replica just sheds a little capacity
  • Anything that can checkpoint progress and restart

Unsafe for spot (interruption causes harm):

  • A single-instance stateful database
  • A long job with no checkpointing (you lose all progress)
  • Anything where losing the node loses data or breaks a user's in-flight request
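What separates the two lists is mostly whether the work can checkpoint and resume. Here is a minimal sketch of that pattern, assuming a local JSON file as the checkpoint store; on a real spot node the checkpoint would need to live on storage that survives the instance. The function and parameter names are illustrative.

```python
import json
import os


def run_batch(items, checkpoint_path, process, should_stop=lambda: False):
    """Process items in order, persisting the index of the next item to do.

    Returns True when all items are done, False if stopped early.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for i in range(start, len(items)):
        if should_stop():
            return False  # everything before item i is already checkpointed
        process(items[i])
        # Write to a temp file, then rename, so a kill mid-write
        # never leaves a corrupt checkpoint behind.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"next_index": i + 1}, f)
        os.replace(tmp, checkpoint_path)
    return True
```

If the node dies between `process` and the checkpoint write, that one item runs again on resume, so `process` should be safe to repeat. Without the checkpoint, the same interruption would restart the job from item zero.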

Handling interruption is the engineering part: listen for the provider's termination notice, stop accepting new work, drain or checkpoint in-flight work, and let the scheduler replace the node elsewhere. In Kubernetes (Chapter 4), node-autoscalers like Karpenter and tools like Spot.io automate this — they run a pool partly on spot, catch the reclaim notice, cordon and drain the doomed node, and launch a replacement, so a fault-tolerant workload barely notices. Designing workloads to expect node loss is what makes the 60–90% discount usable.
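The sequence described here (notice, stop taking work, drain or checkpoint) might look like the following worker loop. This is a sketch only: `notice_pending` is a hypothetical callable standing in for however your platform surfaces the termination notice, and no real provider API is called.

```python
import queue


def run_worker(jobs, handle, notice_pending, checkpoint):
    """Work through a job queue until it is empty or a reclaim notice arrives.

    jobs: queue.Queue of pending work items.
    handle: callable that processes one item.
    notice_pending: callable returning True once termination has been
        announced (hypothetical stand-in for the provider's notice).
    checkpoint: callable given the list of unfinished items to persist
        or requeue elsewhere.
    """
    while True:
        if notice_pending():
            # Stop accepting new work and hand back everything unfinished.
            leftover = []
            while True:
                try:
                    leftover.append(jobs.get_nowait())
                except queue.Empty:
                    break
            checkpoint(leftover)
            return "drained"
        try:
            item = jobs.get_nowait()
        except queue.Empty:
            return "finished"
        handle(item)
```

The notice is checked between items, so each item must finish well inside the warning window; longer items would need to checkpoint internally as well.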

Combining the three: the layered strategy

A mature cost posture doesn't pick one model — it layers all three to match the shape of demand:

[Figure: demand layers. Spiky peak → spot (fault-tolerant) or on-demand. Variable daytime load → on-demand. Always-on baseline → commitments.]

  • The always-on baseline (the floor of usage that never goes to zero) → commitments, for the big steady discount.
  • The variable middle (normal daytime fluctuation) → on-demand, for flexibility.
  • The interruptible peak and batch work → spot, for the deepest discount.

This is the durable strategy: commit the valley, flex the middle, spot the fault-tolerant peak. The exact discount percentages and program names will date quickly; the layering logic will not.
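As a rough illustration of the layering, the sketch below assigns each hour's demand to one of the three models. It assumes demand is already split into interruption-sensitive service units and fault-tolerant batch units, and that unused commitment is not applied to batch work; the function name and the sample numbers are hypothetical.

```python
def layer_demand(service_demand, batch_demand, committed_units):
    """Assign each hour's demand to commitment, on-demand, and spot layers.

    service_demand: interruption-sensitive units needed per hour.
    batch_demand: fault-tolerant units needed per hour.
    committed_units: flat commitment size, ideally the service floor.
    Returns a list of dicts, one per hour.
    """
    plan = []
    for service, batch in zip(service_demand, batch_demand):
        committed_used = min(service, committed_units)
        plan.append({
            "commitment": committed_used,            # the valley
            "on_demand": service - committed_used,   # the variable middle
            "spot": batch,                           # the fault-tolerant peak
        })
    return plan


# Four sample hours: night (with batch), night, day, day.
service = [10, 10, 25, 25]
batch = [40, 0, 0, 0]
for hour in layer_demand(service, batch, committed_units=min(service)):
    print(hour)
```

Setting `committed_units` to the minimum of the service demand is the "commit the valley" rule in code form: the commitment is fully used in every hour.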

A worked example

A team runs a steady 10 web servers around the clock, scaling up to 25 during business hours, and runs a nightly batch job on 40 nodes for data processing. On pure on-demand, suppose that's $30k/month. Restructure it:

  • 10 baseline servers → 3-year commitment at ~50% off → those 10 now cost ~$7.5k instead of $15k.
  • The +15 daytime servers → on-demand (they come and go) → unchanged.
  • The 40-node nightly batch → spot at ~70% off (it checkpoints, so a reclaim just resumes) → a fraction of its old cost.

Same workload, same performance, roughly 40% off the bill — and zero new features written. The only "work" was analyzing the demand shape and matching each layer to the right pricing model. That ratio of savings-to-effort is why pricing-model strategy is the first lever a FinOps practice pulls.
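The arithmetic behind that figure can be checked in a few lines. Only the $30k total, the $15k baseline, and the two discount rates come from the example; how the remaining $15k divides between daytime servers and the batch job is not stated, so the split below is an assumption chosen for illustration.

```python
# On-demand monthly cost per layer. The daytime/batch split is assumed;
# only the baseline ($15k) and the total ($30k) come from the example.
on_demand = {"baseline": 15_000, "daytime": 8_500, "batch": 6_500}
discount = {"baseline": 0.50, "daytime": 0.00, "batch": 0.70}

new_cost = {k: on_demand[k] * (1 - discount[k]) for k in on_demand}
before = sum(on_demand.values())
after = sum(new_cost.values())
savings_pct = 100 * (before - after) / before

print(f"before ${before:,.0f}, after ${after:,.0f}, saved {savings_pct:.0f}%")
```

With this split the bill drops to about $17,950, close to 40% off. A larger batch share would push the saving higher, since spot carries the deepest discount.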

Common pitfalls

  • Over-committing. Buying a long commitment for a workload that won't last that long — utilization collapses and the discount becomes a loss. Commit the stable floor only.
  • Under-covering. Running an obviously steady 24/7 fleet entirely on-demand for years, leaving 30–60% on the table out of inertia.
  • Spot where it isn't safe. Putting a stateful single-instance database on spot and losing data when it's reclaimed. Match spot to fault-tolerant work only.
  • No interruption handling. Using spot but ignoring the termination notice, so reclaims cause hard failures instead of graceful drains.
  • Confusing coverage with utilization. Optimizing one while ignoring the other; you need both healthy.

Why it matters

The same compute has three prices: on-demand (flexible, full price — right for the unpredictable middle), commitments (RIs / Savings Plans / CUDs — 30–60% off for promising your steady-state floor, governed by coverage and utilization), and spot/preemptible (60–90% off the provider's spare capacity, reclaimable anytime — only for fault-tolerant, interruption-handling workloads). Mature teams layer all three to the shape of demand: commit the valley, flex the middle, spot the peak. The discounts are huge and the work is mostly analysis — but buying commitments blindly or running stateful work on spot turns the lever into a liability. Next: shrinking the meter itself by matching resource size to real demand — rightsizing.

Where this leads: commitments discount whatever you run, so they pay off most after you've rightsized — never commit to an over-provisioned fleet.
