Architecting for cost

The cheapest dollar is the one a good design never spends. Every lever so far — discounts, rightsizing, allocation, the loop — reacts to an architecture that already exists. This final concept lesson moves cost upstream into the design itself, where it's cheapest to influence. The core idea: cost is a non-functional requirement. Just as you design for latency, availability, and security, you design for cost — and the choices you made in earlier chapters were all cost choices, whether you priced them or not.

Cost as a non-functional requirement

A non-functional requirement (NFR) is a property of how a system behaves rather than what feature it delivers — performance, reliability, security, scalability. Cost belongs on that list. A design that meets every functional spec but costs 5× what it should is a failed design, the same way one that's too slow or insecure is. Treating cost as an NFR means: at design time, you estimate what an architecture will cost and weigh it against alternatives, before you build — not after the bill arrives.

This reframes earlier chapters. Every "which option?" decision you learned to make on technical grounds also has a cost axis. Let's make the big ones explicit.

The recurring cost trade-offs

Serverless vs containers vs VMs

You met this compute ladder in Chapter 2. Its cost shape:

Serverless bills per request × duration and scales to zero, so at low or spiky volume it's often the cheapest by far (you pay nothing when idle). But its per-unit price is high, so at sustained high volume a constantly busy function can cost more than a reserved VM doing the same work.
Containers/VMs have a lower per-unit rate but bill for running time whether busy or not, so they win at steady, high utilization (especially with commitments).

The durable rule: serverless for spiky/low/unpredictable, reserved compute for steady/high. The exact crossover point moves as prices change, so estimate it per workload rather than memorizing a number, but the shape is permanent: scale-to-zero wins when idle, reserved wins when busy.
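One way to estimate that crossover for a given workload is to price both models over a month and solve for the request volume where they meet. The sketch below does this under loud assumptions: every rate in it (the per-GB-second price, the per-request fee, the VM's flat monthly price) is a hypothetical placeholder, not any provider's real price, and it ignores free tiers and commitment discounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerlessPricing:
    """Hypothetical rates; substitute your provider's current prices."""
    per_gb_second: float         # price of holding 1 GB of memory for 1 second
    per_million_requests: float  # flat invocation fee per million requests


def cost_per_request(avg_duration_s: float, memory_gb: float,
                     p: ServerlessPricing) -> float:
    """Price of one invocation: duration x memory, plus the invocation fee."""
    return (avg_duration_s * memory_gb * p.per_gb_second
            + p.per_million_requests / 1_000_000)


def serverless_monthly_cost(requests: int, avg_duration_s: float,
                            memory_gb: float, p: ServerlessPricing) -> float:
    """Scales linearly with requests; zero requests costs zero."""
    return requests * cost_per_request(avg_duration_s, memory_gb, p)


def breakeven_requests(vm_monthly_cost: float, avg_duration_s: float,
                       memory_gb: float, p: ServerlessPricing) -> float:
    """Monthly request count at which serverless equals the VM's flat cost."""
    return vm_monthly_cost / cost_per_request(avg_duration_s, memory_gb, p)


# Placeholder numbers for illustration only.
pricing = ServerlessPricing(per_gb_second=0.00002, per_million_requests=0.20)
crossover = breakeven_requests(vm_monthly_cost=60.0, avg_duration_s=0.2,
                               memory_gb=0.5, p=pricing)
print(f"Serverless is cheaper below about {crossover:,.0f} requests/month")
```

Below the printed volume the scale-to-zero model wins; above it the flat-rate VM does. Rerun the estimate whenever prices or the workload's duration and memory profile change.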

Managed vs self-hosted

From Chapter 2's "managed by default": a managed database/queue/cache costs a premium per unit over running the same software yourself on a raw VM. Architecting for cost makes the premium a conscious trade: you're buying back the operational time (patching, backups, failover) the provider handles. For most teams that's worth it — engineer-hours cost more than the premium. But at large scale, or with deep in-house expertise, self-hosting a high-volume component can be a major saving. The point isn't "always managed" or "always self-host" — it's price the premium and decide on purpose.
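"Price the premium" can be as simple as one subtraction. The following is a minimal sketch, assuming you can estimate the managed bill, the raw infrastructure bill for self-hosting, and the engineer time that self-hosting would consume; the function name and all figures are hypothetical.

```python
def self_hosting_net_saving(managed_monthly: float,
                            self_hosted_infra_monthly: float,
                            ops_hours_per_month: float,
                            engineer_hourly_cost: float) -> float:
    """Positive: self-hosting is cheaper. Negative: the managed premium pays for itself."""
    premium = managed_monthly - self_hosted_infra_monthly
    ops_cost = ops_hours_per_month * engineer_hourly_cost
    return premium - ops_cost


# Hypothetical small team: the premium is smaller than the ops time it buys back.
small = self_hosting_net_saving(800, 500, ops_hours_per_month=10,
                                engineer_hourly_cost=90)

# Hypothetical high-volume component: the premium dwarfs the ops time.
large = self_hosting_net_saving(40_000, 22_000, ops_hours_per_month=60,
                                engineer_hourly_cost=90)

print(f"small team: {small:+,.0f}/month, large scale: {large:+,.0f}/month")
```

The model leaves out risk (an outage on a self-run database has its own cost), so treat the result as one input to the decision rather than the decision itself.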

Storage tiers and lifecycle

From Storage: object storage has tiers that trade a lower storage price for a higher retrieval cost. The cost-aware design move is a lifecycle policy: automatically transition data to colder tiers as it ages (hot → cool → archive) and delete it when it expires. For example, logs stay hot for 30 days, cool for a year, archived after that, and are deleted at the end of retention. Without lifecycle rules, everything sits in the expensive hot tier forever, which is one of the most common quiet wastes.
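The log example can be sketched as a small age-to-tier rule plus a rough cost comparison. This is an illustration, not a provider's lifecycle API: the 30-day and one-year thresholds come from the text, while the five-year retention period and the per-GB prices are assumptions chosen only to show the shape. Retrieval charges are ignored.

```python
from typing import Optional

HOT_DAYS = 30          # from the text: hot for 30 days
COOL_DAYS = 365        # from the text: then cool for a year
RETENTION_DAYS = 1825  # assumption: 5-year retention (the text does not fix this)

# Hypothetical per-GB monthly storage prices; only their ordering matters.
PRICE_PER_GB_MONTH = {"hot": 0.023, "cool": 0.010, "archive": 0.002}


def tier_for_age(age_days: int) -> Optional[str]:
    """Tier an object of this age belongs in, or None once it should be deleted."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= RETENTION_DAYS:
        return None
    if age_days < HOT_DAYS:
        return "hot"
    if age_days < HOT_DAYS + COOL_DAYS:
        return "cool"
    return "archive"


def steady_state_monthly_cost(gb_per_day: float, with_lifecycle: bool) -> float:
    """Monthly storage cost once RETENTION_DAYS of daily data have accumulated.

    Without a lifecycle policy the same data is priced entirely at the hot
    rate (and in reality keeps growing, since nothing is ever deleted).
    """
    total = 0.0
    for age in range(RETENTION_DAYS):
        tier = tier_for_age(age) if with_lifecycle else "hot"
        total += gb_per_day * PRICE_PER_GB_MONTH[tier]
    return total


managed = steady_state_monthly_cost(gb_per_day=10, with_lifecycle=True)
unmanaged = steady_state_monthly_cost(gb_per_day=10, with_lifecycle=False)
print(f"with lifecycle: {managed:,.2f}/month, all hot: {unmanaged:,.2f}/month")
```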

Data locality and egress

From lesson 9.1: egress and cross-AZ/region transfer are charged, and they hide. This is where architecture matters most, because transfer cost is a function of topology:

Keep chatty services in the same AZ where high availability allows — two services gossiping across AZs pay per GB both ways.
Put compute near its data — pulling terabytes across regions to process them is pure egress waste; move the compute to the data instead.
Use a CDN (Chapter 1's edge) to serve repeated content from cache, cutting origin egress dramatically.
Mind the NAT gateway — route a private service's heavy traffic so it doesn't pay NAT per-GB processing on top of egress.

A design that ignores data-transfer topology can spend more moving bytes around than on the compute that uses them — and it's invisible until you trace it.
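Tracing transfer cost starts with listing each flow, the path it crosses, and the volume it carries. The sketch below models that under stated assumptions: the `Flow` type, the flow names, the volumes, and every per-GB rate are hypothetical, and the cross-AZ rate is doubled to reflect being charged in both directions.

```python
from dataclasses import dataclass, replace

# Hypothetical per-GB transfer rates; real rates vary by provider and region.
RATE_PER_GB = {
    "same_az": 0.00,
    "cross_az": 0.02,      # assumption: 0.01 charged in each direction
    "cross_region": 0.02,
    "internet": 0.09,
}


@dataclass(frozen=True)
class Flow:
    name: str
    gb_per_month: float
    path: str  # key into RATE_PER_GB


def flow_cost(flow: Flow) -> float:
    return flow.gb_per_month * RATE_PER_GB[flow.path]


def transfer_cost(flows: list[Flow]) -> float:
    """Total monthly transfer cost for a topology."""
    return sum(flow_cost(f) for f in flows)


def ranked(flows: list[Flow]) -> list[tuple[str, float]]:
    """Flows sorted by cost, most expensive first, to show where to look."""
    return sorted(((f.name, flow_cost(f)) for f in flows),
                  key=lambda item: item[1], reverse=True)


spread_out = [
    Flow("api <-> cache chatter", 50_000, "cross_az"),
    Flow("nightly data pull", 20_000, "cross_region"),
    Flow("user downloads", 5_000, "internet"),
]

# Same traffic after co-locating the chatty pair and moving compute to its data.
co_located = [
    replace(spread_out[0], path="same_az"),
    replace(spread_out[1], path="same_az"),
    spread_out[2],
]

print(ranked(spread_out))
print(f"{transfer_cost(spread_out):,.2f} -> {transfer_cost(co_located):,.2f} per month")
```

The byte volumes are identical in both lists; only the paths differ, which is the sense in which transfer cost is a function of topology.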

Autoscaling and scale-to-zero

The idle-is-the-enemy rule, designed in: build systems that match capacity to demand automatically (HPA/Karpenter from rightsizing) and, where possible, scale to zero when there's no load — serverless by default for bursty work, scheduled shutdown for non-prod. The architectural choice to make a component able to scale to zero (stateless, fast-starting) is itself a cost decision made at design time.
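For the scheduled-shutdown case, the saving follows directly from the share of the week an environment actually runs. This is a minimal sketch, assuming compute is billed purely by running time; the function names and the example schedule are illustrative, and charges that continue while an instance is stopped (such as attached disks) are not modeled.

```python
HOURS_PER_WEEK = 24 * 7


def running_fraction(days_per_week: int, hours_per_day: float) -> float:
    """Share of the week an environment is up under a fixed schedule."""
    if not (0 <= days_per_week <= 7 and 0 <= hours_per_day <= 24):
        raise ValueError("schedule out of range")
    return days_per_week * hours_per_day / HOURS_PER_WEEK


def scheduled_monthly_cost(always_on_monthly: float, days_per_week: int,
                           hours_per_day: float) -> float:
    """Compute cost if the environment only runs during the schedule."""
    return always_on_monthly * running_fraction(days_per_week, hours_per_day)


# Hypothetical non-prod environment: up 12 hours a day on weekdays only.
fraction = running_fraction(days_per_week=5, hours_per_day=12)
cost = scheduled_monthly_cost(1_000.0, days_per_week=5, hours_per_day=12)
print(f"running {fraction:.0%} of the week, compute cost {cost:,.2f}/month")
```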

A worked example: two designs for the same feature

A team needs a nightly report-generation feature. Two architectures meet the functional spec identically:

Design A (cost-blind). A pair of large VMs running 24/7, each with an oversized boot disk; report output written to the hot storage tier and kept forever; the VMs in a different region from the source data, pulling it across regions each night. Functionally perfect. Cost: high baseline compute (idle 23 hours a day), growing hot storage, and a fat cross-region egress bill.

Design B (cost-aware). The job runs as a serverless/batch task that spins up only at night and scales to zero the rest of the day; output lands in hot storage with a lifecycle rule to archive after 30 days and delete after a year; the compute runs in the same region as the source data, so there's no cross-region egress. Same report, same SLA.

Design B can easily cost a fraction of Design A — and the only difference is design decisions made before any code shipped: scale-to-zero compute, storage lifecycle, and data locality. No discount, no rightsizing pass, no cleanup. That's the leverage of treating cost as an NFR: you avoid the waste instead of removing it later.
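To see how the three decisions add up, the two designs can be priced side by side. The sketch below is a back-of-the-envelope model, not a measurement: every rate, data volume, and job duration is a hypothetical placeholder, and the batch capacity is deliberately priced higher per hour than the VM to reflect serverless's higher per-unit rate.

```python
HOURS_PER_MONTH = 730
NIGHTS_PER_MONTH = 30

# All figures below are hypothetical placeholders.
VM_HOURLY = 0.40             # one large always-on VM
BATCH_HOURLY = 0.60          # on-demand batch capacity, pricier per hour
JOB_HOURS_PER_NIGHT = 1.0
SOURCE_GB_PER_NIGHT = 200    # data read by the job each night
REPORT_GB_PER_NIGHT = 5      # output written each night
CROSS_REGION_PER_GB = 0.02
HOT_PER_GB_MONTH = 0.023
ARCHIVE_PER_GB_MONTH = 0.002


def design_a_monthly(months_running: int) -> dict[str, float]:
    """Two 24/7 VMs, output kept hot forever, source data pulled across regions."""
    stored_gb = REPORT_GB_PER_NIGHT * NIGHTS_PER_MONTH * months_running
    return {
        "compute": 2 * VM_HOURLY * HOURS_PER_MONTH,
        "storage": stored_gb * HOT_PER_GB_MONTH,
        "egress": SOURCE_GB_PER_NIGHT * NIGHTS_PER_MONTH * CROSS_REGION_PER_GB,
    }


def design_b_monthly(months_running: int) -> dict[str, float]:
    """Nightly batch task, archive after 30 days, delete after a year, same region."""
    monthly_gb = REPORT_GB_PER_NIGHT * NIGHTS_PER_MONTH
    hot_gb = monthly_gb * min(months_running, 1)
    archive_gb = monthly_gb * max(min(months_running, 12) - 1, 0)
    return {
        "compute": BATCH_HOURLY * JOB_HOURS_PER_NIGHT * NIGHTS_PER_MONTH,
        "storage": hot_gb * HOT_PER_GB_MONTH + archive_gb * ARCHIVE_PER_GB_MONTH,
        "egress": 0.0,  # compute runs next to its data
    }


a = design_a_monthly(months_running=12)
b = design_b_monthly(months_running=12)
print(f"Design A: {sum(a.values()):,.2f}/month  {a}")
print(f"Design B: {sum(b.values()):,.2f}/month  {b}")
```

Design A's storage line keeps growing with `months_running`, while Design B's levels off after a year because expired output is deleted.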

:::tip The cost-aware design checklist

At design review, ask the five questions:

1. Spiky or steady: serverless-to-zero or reserved compute?
2. Managed premium worth it here, or self-host?
3. Is there a storage lifecycle policy, or will data rot in the hot tier?
4. Where does data move: any avoidable cross-AZ/region egress, is compute near its data, is there a CDN?
5. Can this scale to zero when idle?

Pricing the design before building is the highest-leverage FinOps move there is, and tools like Infracost (from the loop) put that estimate right in the pull request.

:::

Common pitfalls

Sustained heavy serverless. Using scale-to-zero serverless for a constantly busy workload where a reserved VM would be cheaper. Match the model to the load shape.
No storage lifecycle. Everything in the hot tier forever because nobody set a transition/expiry rule.
Egress-blind topology. Cross-AZ chatter and cross-region data pulls that cost more than the compute, never traced.
Managed premium on autopilot. Paying managed premiums at a scale where self-hosting a hot component would save a lot — without ever pricing the alternative.
Cost considered only after launch. Designing purely for features, then reacting to the bill, instead of treating cost as an NFR at design time.

Why it matters

Cost is a non-functional requirement you design for, like latency or security — and every earlier-chapter choice was secretly a cost choice. Serverless/scale-to-zero wins for spiky/idle work; reserved compute wins for steady/high. Managed buys back ops time at a premium — price it, decide on purpose. Storage lifecycle policies move data hot→cool→archive→delete instead of rotting in the expensive tier. Data topology — keeping chatter local, compute near data, and CDNs in front — controls the egress costs that hide. Designing for cost avoids waste before it's spent, which is far cheaper than removing it later; Infracost brings that estimate into the PR. With this, the FinOps loop closes — visibility, optimization, governance, and design all reinforcing each other. Take the checkpoint to lock it in.

Where this connects: this loops back to the FinOps lifecycle (design is the most upstream point of Optimize) and forward to MLOps/LLMOps, where GPU and inference costs make cost-aware architecture non-optional.

Next: Chapter 9 checkpoint →
