Caching & CDNs

Every primitive so far has done work on each request — compute runs code, a database executes a query, storage fetches an object. Caching is the primitive that lets you skip the work by keeping a ready-made copy of an answer close to whoever needs it, so the next identical request is served from the copy instead of recomputed from scratch. It's one of the highest-leverage tools in the cloud: a well-placed cache can make a system feel ten times faster, shrink its bill, and let it survive traffic spikes that would otherwise melt the database. It also introduces the single trickiest problem in the field — keeping those copies from going stale. This lesson builds both halves from the ground up.

Why caching exists: latency, load, and cost

A cache is a fast, temporary store that holds copies of data so future requests for that data are served quicker. The reason to bother comes down to three pressures that show up in every real system:

Latency. Recomputing an answer takes time — a database query might take 50 ms, an external API call 300 ms, rendering a page 100 ms. If the answer rarely changes, paying that cost on every request is waste the user feels. Reading the same answer from an in-memory cache can take well under a millisecond.
Load. A popular item (the homepage, a trending product, a celebrity's profile) can be requested thousands of times per second. Sending every one of those identical reads to the database can pin it at 100% utilization and starve everyone else. A cache absorbs the repeated reads, so the database sees only the first one and the occasional refresh.
Cost. Every database query, every byte served from your origin servers, every call to a metered API costs money. Serving from a cache instead is dramatically cheaper per request — and a cache is often the difference between needing one database and needing ten.

The core insight is locality: real traffic is wildly uneven. A small fraction of your data ("hot" data) accounts for the overwhelming majority of requests, while most data ("cold" data) is rarely touched. Caching exploits this — keep the hot data close and fast, and let the cold data sit in the slow, cheap, authoritative store. The authoritative store — the database or origin that holds the real, current data — is called the origin or the source of truth; the cache is always just a disposable copy in front of it.

Three terms describe how well a cache is working:

A cache hit is when the requested data is in the cache and gets served from it (fast, cheap).
A cache miss is when it isn't, so the request falls through to the origin (slow), and usually the answer gets stored in the cache on the way back.
The hit ratio is the fraction of requests that are hits. A cache with a 95% hit ratio means the origin only handles 1 in 20 requests — a 20× reduction in load. Hit ratio is the headline health metric of any cache.
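The arithmetic behind the hit ratio can be sketched in a few lines. This is a minimal illustration, not a measurement: the function names are hypothetical, and the 1 ms hit and 50 ms miss figures are assumptions borrowed from the latency examples above.

```python
def origin_requests_per_sec(total_rps: float, hit_ratio: float) -> float:
    """Requests per second that miss the cache and reach the origin."""
    return total_rps * (1.0 - hit_ratio)


def average_latency_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Average latency: hits and misses weighted by how often each occurs."""
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


if __name__ == "__main__":
    # Assumed workload: 10,000 requests/sec, 95% hit ratio.
    print(round(origin_requests_per_sec(10_000, 0.95)))   # about 500 reach the origin
    # Assumed costs: 1 ms for a hit, 50 ms for a miss.
    print(round(average_latency_ms(0.95, 1.0, 50.0), 2))  # about 3.45 ms on average
```

Note how sensitive the origin is to the last few points of hit ratio: going from 95% to 99% cuts origin traffic by another factor of five.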

The cache hierarchy: many caches, not one

"The cache" is rarely a single thing. A request to a modern web app passes through a chain of caches, each closer to the user than the last, each catching what it can before the request travels further. From the user inward:

Browser cache — the user's own device keeps copies of files (images, scripts, stylesheets) it recently downloaded, so a repeat visit re-uses them with no network call at all. The fastest cache is the one that requires no request.
CDN / edge cache — a network of servers spread across the globe (more on this next) holds copies physically close to users, so content is served from a nearby city instead of your distant origin.
Application / in-memory cache — a fast key-value store (typically Redis) running right next to your application servers, holding the results of expensive computations and database queries.
Database — the authoritative source of truth at the bottom. Every cache above it exists to keep traffic off it.

The further out a request is satisfied, the faster and cheaper it is — and the less of your infrastructure it touches. The art of performance work is pushing each piece of data to the outermost cache that can safely hold it.

CDNs: caching at the edge of the network

A CDN (Content Delivery Network) is a globally distributed fleet of cache servers, called edge servers or points of presence (PoPs), that sit between your users and your origin and serve cached copies of your content from a location physically near each user. The problem it solves is the speed of light: a user in Tokyo fetching from a server in Virginia pays an unavoidable delay of well over 100 ms per round trip, and a single page load usually needs several round trips. Put a copy on an edge server in Tokyo and each round trip drops to a few milliseconds, and your origin never sees the request at all.
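A back-of-the-envelope calculation shows why distance sets a floor on latency. The sketch below assumes light travels through optical fiber at about 200,000 km/s (roughly two-thirds of its speed in a vacuum) and uses an approximate great-circle distance of 11,000 km between Tokyo and Virginia; real cable routes are longer and add routing and processing delays, so actual round trips are slower than this lower bound.

```python
# Assumption: signal speed in optical fiber, roughly two-thirds of c.
FIBER_SPEED_KM_PER_S = 200_000


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time imposed by distance alone."""
    return 2 * distance_km * 1000 / FIBER_SPEED_KM_PER_S


if __name__ == "__main__":
    # Assumed distances: ~11,000 km to a distant origin, ~50 km to a nearby edge.
    print(min_round_trip_ms(11_000))  # 110.0 ms before any server work happens
    print(min_round_trip_ms(50))      # 0.5 ms to an edge server in the same region
```

No amount of server tuning removes the first number; only moving the copy closer does.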

What a CDN caches

Static content: files that are the same for every user, such as images, video, CSS, JavaScript, fonts, and downloads. This is the classic, no-brainer CDN win; static assets are made to be edge-cached.
Cacheable dynamic content: content generated by your app that's nonetheless identical across many users for a short window, such as a product page, an article, or an API response that lists today's headlines. If a thousand users would all get the same generated HTML, the origin can generate it once (in response to a single CDN request) and the CDN can serve the rest from the edge.
Not cacheable at the edge: user-specific or private content, such as a logged-in user's dashboard, their cart, or anything personalized or sensitive. (This is a critical pitfall we return to below.)

How the CDN knows what (and how long) to cache: cache-control headers

The origin tells caches what they may do with a response using HTTP cache-control headers — instructions attached to the response. The ones to know:

Cache-Control: public, max-age=3600 — "anyone may cache this for 3600 seconds (one hour)." The max-age is the TTL (time to live): how long the copy is considered fresh before it must be re-checked.
Cache-Control: private — "this is for one user; a shared cache like a CDN must not store it." This is the header that keeps personalized content off shared caches.
Cache-Control: no-store — "never cache this at all" (for truly sensitive or always-fresh data).
ETag / Last-Modified — a fingerprint or timestamp the cache can use to ask the origin "has this changed?" and get a cheap "no, still good" (a 304 Not Modified) instead of re-downloading.

These headers are the contract between your origin and every cache between it and the user. Setting them correctly is most of the work of using a CDN well.
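As a rough sketch of how an origin might apply these headers, the code below picks a Cache-Control value from two flags and answers a conditional request with 304 when the client's ETag still matches. Everything here is illustrative: the function names and flags are hypothetical, the ETag is simply a truncated SHA-256 of the body, and the comparison ignores details that real HTTP servers handle (weak validators, multiple ETags in one If-None-Match header).

```python
import hashlib
from typing import Dict, Optional, Tuple


def cache_control(personalized: bool = False, sensitive: bool = False,
                  max_age: int = 3600) -> str:
    """Choose a Cache-Control value for a response (hypothetical policy)."""
    if sensitive:
        return "no-store"          # never cached anywhere
    if personalized:
        return "private"           # browser may cache, shared caches may not
    return f"public, max-age={max_age}"


def make_etag(body: bytes) -> str:
    """Fingerprint the body; ETag values are quoted strings."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes,
            if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), using 304 when the client's copy is current."""
    etag = make_etag(body)
    headers = {"Cache-Control": cache_control(), "ETag": etag}
    if if_none_match == etag:
        return 304, headers, b""   # "no, still good": nothing to re-download
    return 200, headers, body


if __name__ == "__main__":
    status, headers, _ = respond(b"<h1>Product 42</h1>")
    print(status, headers["Cache-Control"])
    status, _, payload = respond(b"<h1>Product 42</h1>", if_none_match=headers["ETag"])
    print(status, len(payload))
```

Checking the sensitive flag first is deliberate in this sketch: if a response is both personalized and sensitive, the stricter directive wins.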

Origin shielding

When an edge server has a miss, it fetches from your origin. With hundreds of edge servers worldwide, a brand-new or just-expired item can send many edges to your origin at once. Origin shielding designates one mid-tier cache that all edges fetch through, so your origin sees roughly one request per item per expiry, no matter how many edges need it. It's the CDN's built-in defense against its own cache misses stampeding your origin, the same thundering-herd problem we'll meet again with application caches.

Application caching patterns

Closer in, your application code caches the results of expensive work — usually database queries — in a fast in-memory store like Redis. How the cache and database stay in sync defines three patterns, and the tradeoffs between them are a staple interview topic.

Cache-aside (lazy loading) — the default

In cache-aside (also called lazy loading), the application manages the cache explicitly, and data is loaded into it only when first requested (hence "lazy"). On every read:

Check the cache for the key.
Hit → return the cached value. Done.
Miss → read from the database, write the value into the cache (with a TTL), then return it.

Writes go straight to the database and either delete or update the matching cache entry. This is the most common pattern because it's simple, and the cache only ever holds data that's actually been requested. Its weaknesses: the first request for any item is always a miss (a "cold" cache is slow until it warms up), and there's a window where the cache and database can disagree.
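The read and write paths above can be sketched as follows. This is a minimal illustration under stated assumptions: `TTLCache` is a tiny in-process stand-in for a store such as Redis, `ProductService` and its `database` dict are hypothetical, and the write path deletes the cache entry rather than updating it.

```python
import time


class TTLCache:
    """In-process stand-in for an external cache; entries expire after a TTL."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:   # expired: treat as a miss
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)


class ProductService:
    """Cache-aside: the application checks the cache, then falls back to the DB."""

    def __init__(self, database, cache, ttl_seconds=300):
        self.database = database          # hypothetical dict standing in for a DB
        self.cache = cache
        self.ttl_seconds = ttl_seconds
        self.db_reads = 0                 # counts how often reads reach the DB

    def get_product(self, product_id):
        key = f"product:{product_id}"
        cached = self.cache.get(key)
        if cached is not None:            # hit: done
            return cached
        self.db_reads += 1                # miss: read from the database...
        row = dict(self.database[product_id])
        self.cache.set(key, row, self.ttl_seconds)  # ...and fill the cache
        return row

    def update_product(self, product_id, **fields):
        self.database[product_id].update(fields)      # write goes to the DB
        self.cache.delete(f"product:{product_id}")    # then invalidate the copy
```

Deleting on write (instead of updating the cached value) keeps the write path simple: the next read repopulates the entry from the database, so the cache never holds a value the database did not produce.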

Write-through — consistency over write speed

In write-through, every write goes to the cache and the database together (synchronously) before the write is acknowledged. The cache is always populated with the latest data, so reads are almost always hits and rarely stale. The cost: writes are slower (you pay for two writes), and you cache data that may never be read.
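A minimal write-through sketch, using plain dicts as stand-ins for both the cache and the database (the class and attribute names are hypothetical). One assumption worth flagging: this version writes the database first and the cache second, so a failed database write never leaves the cache ahead of the source of truth; other orderings exist.

```python
class WriteThroughStore:
    """Every write updates the database and the cache before it is acknowledged."""

    def __init__(self):
        self.db = {}       # stand-in for the authoritative database
        self.cache = {}    # stand-in for the in-memory cache
        self.db_reads = 0

    def write(self, key, value):
        self.db[key] = value       # 1. durable write to the source of truth
        self.cache[key] = value    # 2. cache updated in the same operation
        return True                # acknowledged only after both succeed

    def read(self, key):
        if key in self.cache:      # written keys are already cached
            return self.cache[key]
        self.db_reads += 1         # only never-written or evicted keys land here
        value = self.db[key]
        self.cache[key] = value
        return value
```

The cost described above is visible in `write`: each call does two stores, and a key that is written but never read still occupies cache memory.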

Write-back (write-behind) — write speed over durability

In write-back (or write-behind), writes go to the cache immediately and are flushed to the database later, asynchronously, in batches. Writes feel instant and the database load smooths out — great for write-heavy workloads. The danger: if the cache fails before flushing, those writes are lost. Write-back trades durability for speed and is used only where some data loss is tolerable.
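The same idea for write-back, again with dicts as stand-ins and hypothetical names. Writes touch only the cache and mark the key dirty; a separate `flush` (which a real system would run on a timer or in a background worker) copies dirty entries to the database in one batch. The `crash` method is included only to make the durability risk concrete.

```python
class WriteBackStore:
    """Writes are acknowledged from the cache and flushed to the database later."""

    def __init__(self):
        self.db = {}          # stand-in for the authoritative database
        self.cache = {}       # stand-in for the in-memory cache
        self.dirty = set()    # keys written to the cache but not yet to the DB
        self.db_batches = 0

    def write(self, key, value):
        self.cache[key] = value   # fast path: memory only
        self.dirty.add(key)

    def flush(self):
        """Persist all dirty keys in one batch; returns how many were written."""
        flushed = len(self.dirty)
        for key in self.dirty:
            self.db[key] = self.cache[key]
        if flushed:
            self.db_batches += 1
        self.dirty.clear()
        return flushed

    def crash(self):
        """Simulate losing the cache before a flush: unflushed writes vanish."""
        self.cache.clear()
        self.dirty.clear()
```

Two writes to the same key between flushes cost the database a single write, which is where the load smoothing comes from. Anything written after the last flush is gone if the cache dies, which is the tradeoff.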

Pattern	Read path	Write path	Best for	Main risk
Cache-aside	Lazy: populate on miss	DB write + invalidate cache	General-purpose default	Cold-start misses; brief staleness
Write-through	Almost always a hit	Write cache + DB together	Read-heavy, freshness matters	Slower writes; caches unread data
Write-back	Almost always a hit	Write cache now, DB later	Write-heavy, loss-tolerant	Data loss if cache dies before flush

:::tip Start with cache-aside
For the vast majority of applications, cache-aside with a sensible TTL is the right first choice. It's the simplest to reason about, only caches data that's actually used, and its failure mode (an occasional miss falling through to the database) is gentle. Reach for write-through or write-back only when a specific consistency or write-throughput need justifies the extra complexity.
:::

TTLs: putting an expiry on every copy

A cached copy is only correct until the underlying data changes — so you almost never cache something forever. A TTL (time to live) is an expiry stamped on each cache entry: after the TTL elapses, the entry is considered stale and is dropped (or re-validated) on next access, forcing a fresh read from the origin. The TTL is your simplest, most robust freshness knob, and choosing it is a direct tradeoff:

Short TTL → fresher data, lower hit ratio (more origin load), so you pay more in latency and load to stay current.
Long TTL → higher hit ratio (less origin load, faster), but users may see stale data for longer.

You tune the TTL to how tolerant the data is of being slightly old. A stock ticker might use a 1-second TTL; a product description that changes monthly can happily cache for hours. The honest truth is that for a huge amount of data, a short TTL is the entire invalidation strategy — you accept a few seconds of staleness and let the clock fix everything. That sidesteps the hard problem we turn to now.
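The TTL tradeoff is easy to see in a toy simulation. The sketch below assumes a deliberately simple workload, one request per second for a single key, and counts how many requests are hits for a given TTL. The function name and the workload are illustrative assumptions, not a model of real traffic.

```python
def simulate_hit_ratio(ttl_seconds: int, total_seconds: int = 100) -> float:
    """One request per second for one key; returns the fraction served as hits."""
    expires_at = None   # no cached copy yet
    hits = 0
    for now in range(total_seconds):
        if expires_at is not None and now < expires_at:
            hits += 1                        # copy is still fresh
        else:
            expires_at = now + ttl_seconds   # miss: refetch and restamp the TTL
    return hits / total_seconds


if __name__ == "__main__":
    for ttl in (1, 2, 10, 50):
        print(ttl, simulate_hit_ratio(ttl))
    # TTL 1  -> 0.0  (every request finds the copy already expired)
    # TTL 2  -> 0.5
    # TTL 10 -> 0.9
    # TTL 50 -> 0.98
```

Under this workload the worst-case staleness equals the TTL, so each step up in hit ratio is paid for with a longer window in which users may see old data.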

The hard part: invalidation

There's a famous joke in computing: "There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors." Cache invalidation — removing or updating a cached copy when the underlying data changes, so nobody is served a stale answer — earns its place on that list. The difficulty is fundamental: the data lives in (at least) two places now, and the moment the origin changes, every cached copy of the old value is wrong until something fixes it.

You have three broad strategies, roughly in increasing order of freshness:

TTL expiry (passive). Don't actively invalidate at all — just let entries expire. Simple and bulletproof, at the cost of a staleness window equal to the TTL. The right default for most data.
Active invalidation on write. When you update the database, also explicitly delete (or update) the matching cache entry, so the next read misses and re-fetches fresh. Far fresher than TTL alone, but now your write path must know every cache key the change affects, and getting that mapping wrong leaves a stale entry serving until its TTL expires, or forever if it has none. (CDNs expose this as a purge or cache invalidation API: "drop the cached copy of /products/42 everywhere, now.")
Versioned keys (cache busting). Instead of invalidating, change the key so the old entry is simply never requested again. Static assets do this by embedding a content hash in the filename — app.9f3c1.js — so deploying new code references a brand-new URL and the old cached file is harmlessly ignored. No invalidation needed; the key change is the invalidation.
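Versioned keys are simple enough to sketch directly. The helper below is hypothetical (real build tools do this for you) and assumes SHA-256 truncated to eight hex characters as the content hash; the point is only that the name is a pure function of the content, so changed content always produces a new URL.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename: str, content: bytes, digest_chars: int = 8) -> str:
    """Embed a content hash in the filename, e.g. app.js -> app.<hash>.js."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


if __name__ == "__main__":
    old = versioned_name("static/app.js", b"console.log('v1');")
    new = versioned_name("static/app.js", b"console.log('v2');")
    print(old)
    print(new)
    print(old != new)  # True: the new deploy references a different URL
```

Because the old URL is never requested again, files named this way can be served with a very long max-age without any risk of staleness.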

:::warning Invalidation is where caching bugs live
The overwhelming majority of caching bugs are stale-data bugs: someone updated the database but forgot to invalidate one of the caches holding the old value, and users see wrong data with no error in the logs. When in doubt, prefer the strategies that can't stay stale silently for long (a short TTL or versioned keys) over hand-maintained active invalidation, which is correct only as long as every write path remembers every affected key.
:::

Stampedes and the thundering herd

A subtle, dangerous failure: a popular key's cache entry expires, and in the instant before anything re-populates it, thousands of concurrent requests all miss at once and all stampede the database to recompute the same value simultaneously. This is the cache stampede (or thundering herd), and it can take down a database that was perfectly healthy a second earlier — the cache that was protecting it suddenly stops, all at once, for the single hottest item. The standard defenses:

Request coalescing (single-flight) — when many requests miss the same key together, let only one go to the origin to recompute; the rest wait and share its result.
Locking — the first miss takes a short lock to recompute and refill; others briefly serve the stale value or wait.
Jittered / staggered TTLs — add a small random offset to TTLs so a batch of items written together don't all expire on the exact same tick and stampede in unison.
Early/background refresh — refresh a hot entry before it expires, so it's never actually absent.

The lesson: the moment a cache becomes load-bearing, you must plan for what happens when its hottest entries expire. A naïve cache makes the common case fast and the expiry case catastrophic.
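Two of the defenses above, request coalescing and jittered TTLs, can be sketched with the standard library. This is a single-process illustration using threads and a per-key lock; the class and function names are hypothetical, expiry is omitted to keep the coalescing logic visible, and a multi-server deployment would need a distributed equivalent of the lock.

```python
import random
import threading


class SingleFlightCache:
    """On a miss, only one caller per key runs the loader; the rest wait and share."""

    def __init__(self, loader):
        self._loader = loader          # function that recomputes a value (e.g. a DB query)
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock() # protects _key_locks and the loads counter
        self.loads = 0                 # how many times the loader actually ran

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key):
        if key in self._values:        # fast path: already cached
            return self._values[key]
        with self._lock_for(key):      # concurrent misses queue up here
            if key in self._values:    # another caller filled it while we waited
                return self._values[key]
            value = self._loader(key)
            with self._guard:
                self.loads += 1
            self._values[key] = value
            return value


def jittered_ttl(base_seconds: float, spread: float = 0.1, rng=random) -> float:
    """Randomize a TTL by +/- spread so entries written together expire apart."""
    return base_seconds * (1 + rng.uniform(-spread, spread))
```

The second check inside the lock is what turns a thousand simultaneous misses into one loader call: every waiter re-checks the cache after acquiring the lock and finds the value the first caller stored.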

A traced example: cache-aside, a CDN hit, and an expiry

Let's trace one product page through the hierarchy to make the moving parts concrete.

Request 1 — cold, an all-miss path. Alice in Tokyo opens /products/42.

Her browser cache is empty for this URL → miss. The request goes to the CDN.
The CDN edge in Tokyo has never seen /products/42 → miss. It fetches from the origin (through origin shielding).
Your app server runs the handler. It checks Redis for key product:42 → miss. So it queries the database (50 ms), gets the product, writes it into Redis with TTL=300s, and renders the HTML.
The app returns the page with Cache-Control: public, max-age=60. The CDN stores it at the Tokyo edge for 60 s; Alice's browser caches it too.

Total: slow (database + render + a trans-Pacific origin trip), but every cache along the way is now warm.

Request 2 — warm, a CDN hit. Bob, also in Tokyo, opens /products/42 four seconds later.

His browser has never visited → browser miss. Request goes to the CDN.
The Tokyo edge has a fresh copy (4 s old, TTL 60 s) → HIT. Bob gets the page in a few milliseconds. Your origin, app server, Redis, and database are never touched.

That's the payoff: the second-through-thousandth viewer in that 60-second window is served entirely from the edge.

Request 3 — the CDN entry expires. Ninety seconds in, the edge copy's 60 s TTL has elapsed.

Carol opens /products/42. The Tokyo edge entry is stale/expired → miss. The edge re-fetches from the origin.
At the app server, the Redis entry for product:42 is 90 s old but its TTL was 300 s → still a HIT. The app skips the database, renders from cached data, and returns the page.
The edge re-stores the fresh response for another 60 s.

Notice the layered TTLs working together: the short CDN TTL (60 s) keeps edge data current, while the longer Redis TTL (300 s) means even a CDN re-fetch usually still avoids the database. Carol's request cost one origin round-trip and zero database queries.
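The three requests can be replayed with a small simulation. It models only the two shared layers (the CDN edge and Redis) with the TTLs from the trace, and leaves out the browser cache because each visitor here is a different person; the class name and the returned labels are illustrative.

```python
CDN_TTL_SECONDS = 60      # from Cache-Control: public, max-age=60
REDIS_TTL_SECONDS = 300   # TTL the app sets on product:42


class LayeredCaches:
    """Tracks expiry times for one URL at the edge and one key in Redis."""

    def __init__(self):
        self.cdn_expires_at = None
        self.redis_expires_at = None

    def request(self, now: int) -> str:
        """Return which layer ultimately supplied the data for a request at time `now`."""
        if self.cdn_expires_at is not None and now < self.cdn_expires_at:
            return "cdn"                               # edge hit: origin untouched
        # Edge miss: the request reaches the app server, which checks Redis.
        if self.redis_expires_at is not None and now < self.redis_expires_at:
            served_by = "redis"                        # app-cache hit: no DB query
        else:
            served_by = "database"                     # full miss: query and refill
            self.redis_expires_at = now + REDIS_TTL_SECONDS
        self.cdn_expires_at = now + CDN_TTL_SECONDS    # edge stores the fresh response
        return served_by


if __name__ == "__main__":
    caches = LayeredCaches()
    print(caches.request(0))    # database  (Alice: cold, all-miss path)
    print(caches.request(4))    # cdn       (Bob: edge copy is 4 s old)
    print(caches.request(90))   # redis     (Carol: edge expired, Redis still fresh)
```

Extending the timeline shows the pattern repeating: the database is consulted only when both layers have expired at the moment of a request.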

Mapping onto today's products

With the concepts in hand, the catalog is just labels. The patterns are durable; the product names below are a snapshot and will change, while the concepts won't.

Layer	AWS	GCP	Azure	Cross-cloud / open
CDN (edge cache)	CloudFront	Cloud CDN	Azure CDN / Front Door	Cloudflare, Fastly, Akamai
In-memory cache (managed)	ElastiCache	Memorystore	Azure Cache for Redis	self-managed Redis / Memcached
Cache engine	Redis or Memcached	Redis or Memcached	Redis	Redis, Memcached

Two engine choices worth knowing by name, because the decision recurs everywhere:

Memcached: a bare, blazing-fast in-memory key-value cache. Simple, multi-threaded, no persistence, no rich data types. Great when you want a pure, dumb, fast cache and nothing more.
Redis: also an in-memory store, but with rich data structures (lists, sets, sorted sets, hashes), optional persistence, pub/sub, atomic operations, and clustering. It's the default modern choice: it covers nearly every Memcached use case and adds far more, which is why the managed services above mostly center on it.

The honest rule: reach for a managed cache service (ElastiCache / Memorystore / Azure Cache for Redis) for the same reason you reach for a managed database — the provider carries the operational weight of running, patching, and replicating it. And default to Redis unless you have a specific reason to want Memcached's stripped-down simplicity.

Common pitfalls: when caching hurts

Caching is so useful that it gets over-applied, and a misused cache is worse than none — it serves wrong answers fast. The recurring traps:

Stale data served as truth. The defining caching bug: the origin changed, a cache still holds the old value, and users see stale data with no error anywhere. It hides because nothing crashes. Defend with conservative TTLs, disciplined invalidation, or versioned keys — and never cache data whose staleness causes real harm (a bank balance, an inventory count at checkout) without a deliberate, short freshness window.
Caching user-specific or private data in a shared cache. The most dangerous mistake: caching a personalized response (Alice's dashboard, her cart, a response containing her PII) in a shared cache like a CDN, so Bob's request hits Alice's cached page and is served her private data. This is a real, recurring security incident. Mark personalized responses Cache-Control: private (or no-store), and only edge-cache content that's genuinely identical for all users.
Caching to paper over a slow query. A cache is not a fix for a missing database index or an O(n²) query — it just hides the slowness until the cache misses, at which point the original problem returns at the worst moment (under load, during a stampede). Fix the underlying query first; cache to handle scale, not to excuse inefficiency. A cache as a crutch for a slow origin is a time bomb.
Caching things that barely repeat. A cache only helps data that's read far more than it's written, with real locality. Caching data that's read once, or written as often as read, spends memory and complexity (and invalidation risk) for a near-zero hit ratio. Measure the hit ratio; if it's low, the cache isn't earning its keep.
Ignoring the stampede. Adding a cache without planning for hot-key expiry (request coalescing, jittered TTLs) gives you a system that's fast right up until its busiest item expires and a thundering herd takes down the database. If the cache is load-bearing, the stampede defense is not optional.

The throughline: a cache is a copy, and every copy can be wrong. Add caching deliberately, cache only what's safe and repeatedly-read, give every copy an expiry, and have a plan for keeping it honest.

Why it matters

Caching lets a system skip work by keeping a ready-made copy of an answer close to whoever needs it — cutting latency, shielding the origin from load, and slashing cost by exploiting the fact that a little hot data drives most traffic. It works as a hierarchy: browser → CDN/edge (caching static and cacheable-dynamic content near users, governed by cache-control headers and TTLs, with origin shielding against miss storms) → application/in-memory cache (Redis, via cache-aside by default, or write-through / write-back when their tradeoffs fit) → the database as source of truth. Every copy needs an expiry (TTL) and a story for invalidation — the genuinely hard problem — plus a defense against the thundering herd when hot keys expire. These map onto CloudFront / Cloud CDN / Cloudflare and ElastiCache / Memorystore / Azure Cache for Redis (mostly Redis, sometimes Memcached). Used well, caching is one of the biggest levers you have; used carelessly, it serves stale or private data fast — so cache only what's safe and repeatedly read, never as a crutch for a slow query. This is the primitive that makes systems fast and survivable under load, and it reappears as a core tool of scaling in Chapter 11. It pairs with the databases it protects and slots into the service map as a specialization of storage and networking.

You can now make a system dramatically faster and cheaper by serving from copies — while keeping those copies honest with TTLs, careful invalidation, and a stampede defense. One primitive remains: the permission system that gates every action, including every read of every cache and origin.
