Architecting for cost
The cheapest dollar is the one a good design never spends. Every lever so far — discounts, rightsizing, allocation, the loop — reacts to an architecture that already exists. This final concept lesson moves cost upstream into the design itself, where it's cheapest to influence. The core idea: cost is a non-functional requirement. Just as you design for latency, availability, and security, you design for cost — and the choices you made in earlier chapters were all cost choices, whether you priced them or not.
Cost as a non-functional requirement
A non-functional requirement (NFR) is a property of how a system behaves rather than what feature it delivers — performance, reliability, security, scalability. Cost belongs on that list. A design that meets every functional spec but costs 5× what it should is a failed design, the same way one that's too slow or insecure is. Treating cost as an NFR means: at design time, you estimate what an architecture will cost and weigh it against alternatives, before you build — not after the bill arrives.
This reframes earlier chapters. Every "which option?" decision you learned to make on technical grounds also has a cost axis. Let's make the big ones explicit.
The recurring cost trade-offs
Serverless vs containers vs VMs
You met this compute ladder in Chapter 2. Its cost shape:
- Serverless bills per request × duration and scales to zero — so at low or spiky volume it's often the cheapest by far (you pay nothing when idle). But its per-unit price is high, so at sustained high volume a constantly-busy function can cost more than a reserved VM doing the same work.
- Containers/VMs have a lower per-unit rate but bill for running time whether busy or not, so they win at steady, high utilization (especially with commitments).
The durable rule: serverless for spiky, low, or unpredictable load; reserved compute for steady, high load. The exact crossover point depends on current pricing, so any specific number goes stale and is worth re-estimating per workload. The shape (scale-to-zero wins when idle, reserved wins when busy) is permanent.
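One way to estimate that crossover is to price a single request and divide the VM's monthly cost by it. The sketch below assumes a simple serverless billing model (GB-seconds plus a per-request fee) and an always-on VM that can absorb the same load. Every price and workload number in it is a made-up placeholder, not a quote from any provider.

```python
def serverless_monthly_cost(requests, avg_seconds, memory_gb,
                            price_per_gb_second, price_per_million_requests):
    """Serverless bill: requests x duration x memory, plus a per-request fee."""
    compute = requests * avg_seconds * memory_gb * price_per_gb_second
    request_fee = requests / 1_000_000 * price_per_million_requests
    return compute + request_fee


def break_even_requests(vm_monthly_cost, avg_seconds, memory_gb,
                        price_per_gb_second, price_per_million_requests):
    """Monthly request volume at which serverless costs the same as the VM."""
    cost_per_request = (avg_seconds * memory_gb * price_per_gb_second
                        + price_per_million_requests / 1_000_000)
    return vm_monthly_cost / cost_per_request


# Hypothetical numbers for illustration only; substitute your own rates.
PRICE_PER_GB_SECOND = 0.0000167
PRICE_PER_MILLION_REQUESTS = 0.20
VM_MONTHLY_COST = 60.0   # assumed always-on VM sized for the same load
AVG_SECONDS = 0.2        # assumed average request duration
MEMORY_GB = 0.5          # assumed function memory size

crossover = break_even_requests(VM_MONTHLY_COST, AVG_SECONDS, MEMORY_GB,
                                PRICE_PER_GB_SECOND, PRICE_PER_MILLION_REQUESTS)
print(f"Serverless is cheaper below about {crossover:,.0f} requests/month")
```

Below the printed volume the scale-to-zero model wins; above it the flat-rate VM does. Rerun the estimate whenever prices or the load shape change.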
Managed vs self-hosted
From Chapter 2's "managed by default": a managed database/queue/cache costs a premium per unit over running the same software yourself on a raw VM. Architecting for cost makes the premium a conscious trade: you're buying back the operational time (patching, backups, failover) the provider handles. For most teams that's worth it — engineer-hours cost more than the premium. But at large scale, or with deep in-house expertise, self-hosting a high-volume component can be a major saving. The point isn't "always managed" or "always self-host" — it's price the premium and decide on purpose.
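"Price the premium" can be as simple as the arithmetic below: the premium is the managed price minus the raw infrastructure price, and the offset is the engineer time self-hosting would consume. This is a rough sketch. The function name and all figures are hypothetical, and it ignores harder-to-price factors such as outage risk and on-call load.

```python
def self_host_saving(managed_monthly, self_hosted_infra_monthly,
                     ops_hours_per_month, engineer_hourly_cost):
    """Monthly saving from self-hosting.

    Positive: self-hosting is cheaper. Negative: managed is cheaper.
    """
    premium = managed_monthly - self_hosted_infra_monthly
    ops_cost = ops_hours_per_month * engineer_hourly_cost
    return premium - ops_cost


# Hypothetical small team: a modest premium, but ops time dominates.
small = self_host_saving(managed_monthly=500, self_hosted_infra_monthly=300,
                         ops_hours_per_month=10, engineer_hourly_cost=100)

# Hypothetical large deployment: the premium dwarfs the ops time.
large = self_host_saving(managed_monthly=50_000, self_hosted_infra_monthly=30_000,
                         ops_hours_per_month=40, engineer_hourly_cost=100)

print(f"small team: {small:+,}/month, large deployment: {large:+,}/month")
```

With these assumed inputs the small team loses money by self-hosting and the large deployment saves, which is the pattern the paragraph above describes.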
Storage tiers and lifecycle
From Storage: object storage has tiers that trade storage price against retrieval cost. The cost-aware design move is a lifecycle policy: automatically transition data to colder tiers as it ages (hot → cool → archive) and delete it when it expires. For example, logs stay hot for 30 days, move to cool for a year, are archived after that, and are deleted at the end of retention. Without lifecycle rules, everything sits in the expensive hot tier forever, which is one of the most common quiet wastes.
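To see what a lifecycle policy is worth, compare the steady-state storage bill with and without it. The sketch below assumes a constant daily ingest, so the data held in each tier is simply daily volume times days spent there. The per-GB prices, the 10 GB/day ingest, and the three-year retention are illustrative assumptions, and retrieval and transition fees are deliberately left out.

```python
def steady_state_monthly_cost(daily_gb, tiers):
    """Monthly storage cost once the pipeline is full.

    tiers: list of (days_in_tier, price_per_gb_month) pairs.
    """
    return sum(daily_gb * days * price for days, price in tiers)


# Hypothetical per-GB-month prices; check your provider's current rates.
HOT, COOL, ARCHIVE = 0.023, 0.0125, 0.004

DAILY_GB = 10           # assumed log volume
RETENTION_DAYS = 1095   # assumed three-year retention

# Everything stays hot for the whole retention period.
all_hot = steady_state_monthly_cost(DAILY_GB, [(RETENTION_DAYS, HOT)])

# Hot for 30 days, cool for a year, archive for the remainder.
with_lifecycle = steady_state_monthly_cost(
    DAILY_GB,
    [(30, HOT), (365, COOL), (RETENTION_DAYS - 30 - 365, ARCHIVE)],
)

print(f"all hot: ${all_hot:,.2f}/month, with lifecycle: ${with_lifecycle:,.2f}/month")
```

If nothing is ever deleted, the all-hot figure keeps growing instead of levelling off, so the real gap is wider than this steady-state comparison suggests.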
Data locality and egress
From lesson 9.1: egress and cross-AZ/region transfer are charged, and they hide. This is where architecture matters most, because transfer cost is a function of topology:
- Keep chatty services in the same AZ where high availability allows — two services gossiping across AZs pay per GB both ways.
- Put compute near its data — pulling terabytes across regions to process them is pure egress waste; move the compute to the data instead.
- Use a CDN (Chapter 1's edge) to serve repeated content from cache, cutting origin egress dramatically.
- Mind the NAT gateway — route a private service's heavy traffic so it doesn't pay NAT per-GB processing on top of egress.
A design that ignores data-transfer topology can spend more moving bytes around than on the compute that uses them — and it's invisible until you trace it.
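Tracing transfer cost can start as a small table of flows: who talks to whom, how many GB per month, and over which kind of path. The sketch below ranks flows by monthly cost. The `Flow` type, the flow names, the volumes, and the per-GB rates are all hypothetical; real rates vary by provider, region, and direction.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    gb_per_month: float
    path: str  # key into the rate table below


# Hypothetical per-GB rates, for illustration only.
RATES_PER_GB = {
    "same_az": 0.00,
    "cross_az": 0.02,        # assumed: both directions summed
    "cross_region": 0.02,
    "internet_egress": 0.09,
}


def transfer_bill(flows, rates=RATES_PER_GB):
    """Return (flow name, monthly cost) pairs, most expensive first."""
    costs = [(f.name, f.gb_per_month * rates[f.path]) for f in flows]
    return sorted(costs, key=lambda item: item[1], reverse=True)


flows = [
    Flow("api -> cache (same AZ)", 50_000, "same_az"),
    Flow("api -> search (cross AZ)", 50_000, "cross_az"),
    Flow("nightly ETL pull (cross region)", 20_000, "cross_region"),
    Flow("origin -> users (no CDN)", 10_000, "internet_egress"),
]

for name, cost in transfer_bill(flows):
    print(f"{name:35s} ${cost:,.2f}")
```

The useful output is the ranking rather than the dollar figures: it shows which flow to move into the same AZ, relocate next to its data, or put behind a CDN first.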
Autoscaling and scale-to-zero
The idle-is-the-enemy rule, designed in: build systems that match capacity to demand automatically (HPA/Karpenter from rightsizing) and, where possible, scale to zero when there's no load — serverless by default for bursty work, scheduled shutdown for non-prod. The architectural choice to make a component able to scale to zero (stateless, fast-starting) is itself a cost decision made at design time.
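The payoff of scheduled shutdown is easy to estimate: an environment that runs only during working hours pays for a fraction of the week. A minimal sketch, assuming compute is billed purely for running time and using a hypothetical 12-hours-a-day, 5-days-a-week schedule (attached storage usually keeps billing while instances are stopped and is not modelled here):

```python
HOURS_PER_WEEK = 24 * 7


def running_fraction(hours_per_day, days_per_week):
    """Share of the week an environment is up under a fixed schedule."""
    return (hours_per_day * days_per_week) / HOURS_PER_WEEK


def scheduled_monthly_cost(always_on_monthly, hours_per_day, days_per_week):
    """Compute cost when the environment runs only on the schedule."""
    return always_on_monthly * running_fraction(hours_per_day, days_per_week)


# Hypothetical non-prod environment costing 1,000/month if left on 24/7.
cost = scheduled_monthly_cost(1_000, hours_per_day=12, days_per_week=5)
print(f"scheduled: ${cost:,.2f}/month instead of $1,000.00")
```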
A worked example: two designs for the same feature
A team needs a nightly report-generation feature. Two architectures meet the functional spec identically:
Design A (cost-blind). A pair of large VMs running 24/7, each with an oversized boot disk; report output written to the hot storage tier and kept forever; the VMs in a different region from the source data, pulling it across regions each night. Functionally perfect. Cost: high baseline compute (idle 23 hours a day), growing hot storage, and a fat cross-region egress bill.
Design B (cost-aware). The job runs as a serverless/batch task that spins up only at night and scales to zero the rest of the day; output lands in hot storage with a lifecycle rule to archive after 30 days and delete after a year; the compute runs in the same region as the source data, so there's no cross-region egress. Same report, same SLA.
Design B can easily cost a fraction of Design A — and the only difference is design decisions made before any code shipped: scale-to-zero compute, storage lifecycle, and data locality. No discount, no rightsizing pass, no cleanup. That's the leverage of treating cost as an NFR: you avoid the waste instead of removing it later.
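To make the gap concrete, the two designs can be priced side by side. The lesson gives no figures, so every number below (hourly rates, data volumes, per-GB prices) is an assumption chosen only to show the shape of the comparison. The oversized boot disks in Design A are left out, and storage is measured one year in.

```python
HOURS_PER_MONTH = 730
NIGHTS_PER_MONTH = 30


def design_a_monthly(vm_hourly, stored_gb, hot_price,
                     nightly_pull_gb, cross_region_price):
    """Two always-on VMs, all output in the hot tier, nightly cross-region pull."""
    return {
        "compute": 2 * vm_hourly * HOURS_PER_MONTH,
        "storage": stored_gb * hot_price,
        "transfer": nightly_pull_gb * NIGHTS_PER_MONTH * cross_region_price,
    }


def design_b_monthly(job_hourly, job_hours_per_night,
                     hot_gb, archive_gb, hot_price, archive_price):
    """Nightly batch job, lifecycle-managed output, compute in the data's region."""
    return {
        "compute": job_hourly * job_hours_per_night * NIGHTS_PER_MONTH,
        "storage": hot_gb * hot_price + archive_gb * archive_price,
        "transfer": 0.0,  # same region as the source data
    }


# Hypothetical inputs: 5 GB of output per night, 200 GB of source data read nightly.
design_a = design_a_monthly(vm_hourly=0.40, stored_gb=365 * 5, hot_price=0.023,
                            nightly_pull_gb=200, cross_region_price=0.02)
# Design B pays a higher hourly rate, but only for one assumed hour per night.
design_b = design_b_monthly(job_hourly=0.80, job_hours_per_night=1,
                            hot_gb=30 * 5, archive_gb=335 * 5,
                            hot_price=0.023, archive_price=0.004)

for label, costs in (("Design A", design_a), ("Design B", design_b)):
    print(label, {k: round(v, 2) for k, v in costs.items()},
          "total:", round(sum(costs.values()), 2))
```

Even with Design B paying twice the hourly compute rate, the idle hours, hot-tier growth, and cross-region pulls dominate Design A's bill under these assumptions.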
:::tip The cost-aware design checklist
At design review, ask the five questions: (1) Spiky or steady: serverless-to-zero or reserved compute? (2) Managed premium worth it here, or self-host? (3) Is there a storage lifecycle policy, or will data rot in the hot tier? (4) Where does data move: any avoidable cross-AZ/region egress, is compute near its data, is there a CDN? (5) Can this scale to zero when idle? Pricing the design before building is the highest-leverage FinOps move there is, and tools like Infracost (from the loop) put that estimate right in the pull request.
:::
Common pitfalls
- Sustained heavy serverless. Using scale-to-zero serverless for a constantly-busy workload where a reserved VM would be cheaper. Match the model to the load shape.
- No storage lifecycle. Everything in the hot tier forever because nobody set a transition/expiry rule.
- Egress-blind topology. Cross-AZ chatter and cross-region data pulls that cost more than the compute, never traced.
- Managed premium on autopilot. Paying managed premiums at a scale where self-hosting a hot component would save a lot — without ever pricing the alternative.
- Cost considered only after launch. Designing purely for features, then reacting to the bill, instead of treating cost as an NFR at design time.
Why it matters
Cost is a non-functional requirement you design for, like latency or security — and every earlier-chapter choice was secretly a cost choice. Serverless/scale-to-zero wins for spiky/idle work; reserved compute wins for steady/high. Managed buys back ops time at a premium — price it, decide on purpose. Storage lifecycle policies move data hot→cool→archive→delete instead of rotting in the expensive tier. Data topology — keeping chatter local, compute near data, and CDNs in front — controls the egress costs that hide. Designing for cost avoids waste before it's spent, which is far cheaper than removing it later; Infracost brings that estimate into the PR. With this, the FinOps loop closes — visibility, optimization, governance, and design all reinforcing each other. Take the checkpoint to lock it in.
Where this connects: this loops back to the FinOps lifecycle (design is the most upstream point of Optimize) and forward to MLOps/LLMOps, where GPU and inference costs make cost-aware architecture non-optional.
Next: Chapter 9 checkpoint →