Compute: VMs, containers & serverless

Compute is the primitive that runs your code — the rented processing power that actually executes your application. There are three dominant ways to get it, and they form a ladder where each rung hands more of the operational burden to the provider and gives you a little less control in exchange. Knowing the three, and when each fits, is one of the most consequential skills in cloud engineering.

The ladder: VM → container → serverless

The three options differ in how much you manage versus how much the provider manages — the same trade-off as the IaaS/PaaS ladder from Chapter 1, applied specifically to running code.

Virtual machines: a whole computer to yourself

A virtual machine (VM) is a complete, virtualized computer — its own operating system, its own disk, full administrative access. From Chapter 1 you know it's a slice of a physical host carved out by virtualization. When you launch a VM, you get a blank (or pre-imaged) Linux or Windows box, and everything above the hardware is yours: install the OS packages, the runtime, your app; handle patching, scaling, and uptime.

You control: the entire OS and everything on it. Maximum flexibility — run literally anything.
You're responsible for: patching the OS, securing it, scaling it, keeping it alive. Maximum operational burden.
Best for: legacy apps that expect a full OS, workloads needing special OS-level configuration, or when you simply want total control. It's the lowest-level, most general option.

Brand names: EC2 (AWS), Compute Engine (GCP), Virtual Machines (Azure).

Containers: your app and its dependencies, packaged

A container packages your application together with everything it needs to run — the runtime, libraries, and dependencies — into a single, portable unit, but without bundling a whole operating system. Containers on the same host share the host's OS kernel, which makes them dramatically lighter and faster to start than VMs (seconds or less, versus minutes), while still being isolated from each other.

The crucial property is "it runs the same everywhere." Because the container carries its own dependencies, the bug-causing phrase "but it works on my machine" largely disappears: the container that ran on your laptop runs identically in the cloud. Containers are the foundation of modern deployment and the whole subject of Chapter 4.

You control: what goes inside the container (your app + its deps). You don't manage a full guest OS per app.
You're responsible for: the container image, and (without extra tooling) where containers run and how they scale — which is exactly the problem Kubernetes solves in Chapter 4.
Best for: modern applications, microservices, anything you want to be portable and to scale by running many identical copies.

Serverless functions: just your code

Serverless (specifically, Functions as a Service) is the far end of the ladder: you upload only your code — a single function — and the provider runs it on demand, automatically handling all the servers, scaling, and capacity. You don't choose a machine size or keep anything running; when an event arrives (an HTTP request, a file upload, a scheduled timer), the provider spins up your function, runs it, and tears it down. You pay only for the milliseconds it actually runs and nothing when it's idle.

"Serverless" is a slight misnomer — there are servers, you just never see or manage them.

You control: only your function's code and its configuration.
You're responsible for: almost nothing operationally. The provider scales from zero to thousands of concurrent runs automatically, within the platform's concurrency limits.
Best for: event-driven work, spiky or unpredictable traffic, glue code between services, and anything where you'd rather not run a server at all.
Watch out for: the cold start — the first invocation after idleness has to initialize, adding latency; long-running or stateful work fits poorly; and per-invocation pricing can surprise you at very high, steady volume.

Brand names: Lambda (AWS), Cloud Functions, now branded Cloud Run functions (GCP), Azure Functions (Azure).
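To make "you upload only your code" concrete, here is a minimal sketch of what such a function can look like. The `handler(event, context)` signature follows the Lambda-style Python convention; the event shape (`{"key": ...}`), the function names, and the thumbnail naming scheme are all hypothetical, chosen only for illustration. The business logic sits in a plain function and the handler only adapts the event to it, which is one common habit rather than a requirement.

```python
import json
import posixpath


def make_thumbnail_key(object_key: str) -> str:
    """Plain business logic: no provider types, so it can be tested anywhere."""
    root, ext = posixpath.splitext(object_key)
    return f"{root}_thumb{ext}"


def handler(event, context):
    """Provider-facing entry point (Lambda-style signature).

    Assumption: the event is a dict with a "key" field. Real provider
    payloads are shaped differently and vary by trigger.
    """
    thumb_key = make_thumbnail_key(event["key"])
    return {"statusCode": 200, "body": json.dumps({"thumbnail": thumb_key})}


if __name__ == "__main__":
    # Local invocation with a fake event; no provider involved.
    print(handler({"key": "photos/cat.jpg"}, None))
```

Nothing here chooses a machine size or starts a server; in a real deployment the provider would call `handler` once per incoming event.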

The trade-off in one table

| | Virtual Machine | Container | Serverless |
| --- | --- | --- | --- |
| You manage | The whole OS + app | The app + its deps | Just the function code |
| Provider manages | Hardware, hypervisor | + the host OS | + everything; scales to zero |
| Startup time | Minutes | Seconds | Milliseconds (after warm-up) |
| Scaling | Manual / you configure | Orchestrated (e.g. K8s) | Automatic, to zero |
| Pay for | The VM while it's on | The hosts running containers | Only execution time |
| Control | Most | Medium | Least |

A decision rule

You don't have to agonize. A serviceable default heuristic:

Reach for serverless when the workload is event-driven, spiky, or small, and you don't want to operate anything. It's often the cheapest and simplest place to start.
Reach for containers when you have a real application, especially several services, that you want portable and scalable — the mainstream choice for modern apps, leading naturally into Kubernetes.
Reach for VMs when you need full OS control, you're lifting an existing/legacy app as-is, or a workload doesn't fit the other two (special hardware, long-lived stateful processes).
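The heuristic above can be written down as a small function, which makes the order of the checks explicit. This is only a sketch: the `Workload` fields and the choice to test the VM conditions first (treating them as hard constraints) are assumptions made for the example, not part of the rule as stated.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical flags, one per phrase in the heuristic.
    event_driven_or_spiky: bool = False
    small: bool = False
    wants_zero_ops: bool = False
    needs_full_os_control: bool = False
    legacy_lift_and_shift: bool = False
    special_hardware_or_long_lived_state: bool = False


def choose_compute(w: Workload) -> str:
    """Return "vm", "serverless", or "container" for a workload."""
    # Assumption: VM conditions are hard constraints, so they win first.
    if (w.needs_full_os_control
            or w.legacy_lift_and_shift
            or w.special_hardware_or_long_lived_state):
        return "vm"
    # Serverless: event-driven, spiky, or small, and nothing to operate.
    if (w.event_driven_or_spiky or w.small) and w.wants_zero_ops:
        return "serverless"
    # Everything else is "a real application": the container default.
    return "container"


if __name__ == "__main__":
    print(choose_compute(Workload(event_driven_or_spiky=True, wants_zero_ops=True)))
    print(choose_compute(Workload(legacy_lift_and_shift=True)))
    print(choose_compute(Workload()))
```

Real decisions weigh more factors than six booleans, so treat the output as a starting point for discussion.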

:::tip Durable vs dated
The three models (VM, container, serverless) and their trade-offs are durable; this ladder has been stable for years and structures how the whole industry thinks about compute. The specific products, pricing, and limits (cold-start times, max function duration, instance families) are dated and change often, so look them up when you build.
:::

:::note "Serverless containers": the line is blurring
Newer services (like AWS Fargate or Google Cloud Run) let you run containers with serverless operational characteristics: you provide a container, and the provider runs and scales it without you managing hosts. Don't let this confuse the mental model: it's the container packaging with the serverless operations model. The two axes (how you package vs how much you operate) are increasingly independent.
:::

When serverless is the wrong choice

Serverless is genuinely magical for the right workload, and exactly that magic makes it frequently over-adopted — the mirror image of the Kubernetes mistake. "No servers to manage" is so attractive that teams reach for it reflexively, then hit walls that were predictable from the start. Here are the cases where serverless is the wrong tool, with the reasoning, because knowing the limits is what separates a serviceable default from a costly default.

Latency-sensitive paths and the cold start. When a function has been idle, the provider has nothing running for it; the next request must spin up a fresh execution environment (load the runtime, your code, and dependencies) before a single line runs. That cold start typically adds tens to hundreds of milliseconds, and can reach a second or more for heavy runtimes and large dependency bundles. For background and event-driven work, nobody notices. For a user-facing request on a tight latency budget, an occasional half-second stall is a real defect. You can mitigate it (provisioned concurrency on Lambda, minimum instances on Cloud Functions / Cloud Run, "always-ready" instances on Azure Functions), but paying to keep functions warm quietly re-introduces the always-on server you were trying to avoid.
Vendor lock-in. A serverless function is rarely just your code — it's your code wired into one provider's event sources, triggers, IAM, and deployment model. Move from Lambda to Cloud Functions and the handler signature, the event payload shapes, and the surrounding glue all change. This is lock-in, and it's deeper than it looks because the architecture around the function (which managed services trigger it, how it's permissioned) is provider-specific too. Mitigate it by keeping your business logic in plain, portable functions and treating the provider's handler as a thin adapter, or by adopting a portability layer (the Serverless Framework, SST, or running containers on Cloud Run / Fargate so the unit stays portable even if the operations are managed). You rarely eliminate lock-in here; you decide consciously how much to accept.
The execution-time ceiling. Functions are designed for short, bursty work and the platform enforces it: a single invocation has a hard maximum duration (15 minutes for Lambda at the time of writing; Cloud Functions and Azure Functions have their own caps, and HTTP-triggered paths are often shorter still). A job that legitimately needs to run for an hour (a large batch transform, a long video encode, a slow data migration) simply cannot complete in one invocation. You can sometimes chunk the work and chain invocations, but that's added complexity papering over a tool mismatch. Long-running jobs belong on a container or VM (or a purpose-built batch service), not a function.
Statefulness. A function is stateless and ephemeral by design — it spins up, runs, and is torn down, and anything held in its memory or local disk vanishes. That's a feature for scaling to zero, but it means serverless is the wrong home for anything that must keep state between requests: an in-memory cache you want to reuse, a long-lived WebSocket or streaming connection, a stateful game-server session. You can push state into an external store (a database, a cache like Redis, object storage), but if the workload is fundamentally about holding live state, you're fighting the model — reach for a long-lived container or VM instead.
The high-volume cost crossover. Per-invocation pricing is a bargain when traffic is spiky or low, because you pay nothing while idle. But the same metered pricing inverts at high, steady volume: a function billed per request and per millisecond, run constantly at scale, can cost substantially more than a right-sized always-on server or container fleet doing the same work, because an always-on machine amortizes its fixed cost across a saturated workload while serverless keeps charging per unit. There's a crossover point — below it serverless is cheaper, above it always-on compute wins — and steady high-throughput services often sit well past it. (This per-unit-vs-committed economics theme returns in Chapter 9, FinOps.)
Local development and debugging friction. Because the runtime, the event sources, and the IAM all live in the provider's cloud, faithfully reproducing a function on your laptop is awkward. Emulators (AWS SAM's local mode, Azure Functions Core Tools, Google's Functions Framework) help but never perfectly match production triggers and permissions, so a class of bugs only appears once deployed. Step-through debugging across a chain of event-driven functions is harder than attaching a debugger to one long-running process. The developer-experience tax is real and worth weighing for anything non-trivial.
Concurrency and connection-pool gotchas. Serverless scales out by running many independent copies of your function concurrently — and each copy opens its own connections. Point a few thousand concurrent invocations at a traditional database and each grabs a connection: you exhaust the database's connection limit almost instantly, because functions can't share a pool the way a single long-lived app process can. The fixes are real engineering (a managed connection proxy like RDS Proxy, a serverless/HTTP-native database, or capping function concurrency), but the gotcha bites teams who assumed "infinitely scalable functions" meant "infinitely scalable system." Anything downstream with a hard concurrency limit is a constraint serverless will happily blow past.
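The cost crossover described above is simple arithmetic, sketched below. All prices and the flat always-on cost in the example are placeholders, not real provider rates, and the billing model (a per-request fee plus a fee per GB-second of execution) is a simplified assumption. The sketch also assumes the always-on fleet can absorb the load at a fixed monthly cost.

```python
def serverless_monthly_cost(requests, avg_duration_s, memory_gb,
                            price_per_million_requests, price_per_gb_second):
    """Metered cost: a per-request fee plus a fee for execution time."""
    request_cost = requests / 1_000_000 * price_per_million_requests
    compute_cost = requests * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost


def crossover_requests(always_on_monthly_cost, avg_duration_s, memory_gb,
                       price_per_million_requests, price_per_gb_second):
    """Monthly request count at which serverless matches the always-on cost."""
    cost_per_request = serverless_monthly_cost(
        1, avg_duration_s, memory_gb,
        price_per_million_requests, price_per_gb_second)
    if cost_per_request <= 0:
        raise ValueError("per-request cost must be positive")
    return always_on_monthly_cost / cost_per_request


if __name__ == "__main__":
    # Placeholder numbers only; look up current prices before relying on this.
    prices = dict(avg_duration_s=0.1, memory_gb=1.0,
                  price_per_million_requests=1.0, price_per_gb_second=0.00001)
    breakeven = crossover_requests(always_on_monthly_cost=100.0, **prices)
    print(f"break-even at about {breakeven:,.0f} requests per month")
```

Below the break-even volume the metered model is cheaper; above it, the fixed-cost fleet wins, provided it is actually kept busy.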

A decision heuristic

Reach for serverless when the work is event-driven, spiky or low-volume, short-lived (well under the duration cap), stateless, and latency-tolerant — glue code, webhook handlers, scheduled jobs, image-thumbnailing, light APIs. It is often the cheapest and simplest place to start.
Avoid serverless when you have a latency-critical hot path, a long-running or batch job, genuinely stateful work, sustained high-volume traffic (past the cost crossover), a strict no-lock-in mandate, or a downstream system (like a classic relational database) that can't absorb a flood of concurrent connections. Those workloads want a container or VM.

:::tip Trace this scenario
A team moves a steady, high-traffic checkout API onto Lambda for the "no servers" appeal. Three problems surface at once: the first request after a quiet stretch stalls on a cold start, hurting tail latency on a user-facing path; under load, thousands of concurrent functions each open a Postgres connection and exhaust the database's connection limit; and at the end of the month the per-invocation bill on constant traffic comes in above what a small always-on container fleet would have cost. None of these are bugs; they're the serverless model meeting a workload it doesn't fit. The right move is a container service (Cloud Run / Fargate or a small Kubernetes deployment) for the steady hot path, leaving serverless for the genuinely spiky, event-driven edges.
:::
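The connection problem in this scenario can be checked with back-of-the-envelope arithmetic. The sketch below assumes each concurrent invocation holds its own database connections and that the database enforces a fixed connection limit; the function names and all numbers are hypothetical.

```python
def rejected_connections(concurrent_invocations: int,
                         db_max_connections: int,
                         connections_per_invocation: int = 1) -> int:
    """Connections the database would refuse at a given concurrency."""
    needed = concurrent_invocations * connections_per_invocation
    return max(0, needed - db_max_connections)


def safe_concurrency_cap(db_max_connections: int,
                         reserved_for_other_clients: int = 0,
                         connections_per_invocation: int = 1) -> int:
    """Largest function concurrency that stays inside the connection limit."""
    usable = max(0, db_max_connections - reserved_for_other_clients)
    return usable // connections_per_invocation


if __name__ == "__main__":
    # Hypothetical: 3000 concurrent invocations against a 100-connection limit.
    print(rejected_connections(3000, 100))
    # Keep 10 connections free for admin tools and other services.
    print(safe_concurrency_cap(100, reserved_for_other_clients=10))
```

Capping function concurrency at the computed value is one of the fixes mentioned earlier; a connection proxy changes the arithmetic instead by letting many invocations share fewer database connections.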

Why it matters

Cloud compute comes in three flavors on one ladder: virtual machines (a whole OS, most control, most to manage), containers (your app + dependencies, portable, the modern default), and serverless functions (just your code, the provider runs everything and scales to zero). Each step up the ladder trades control for less operational burden. The decision rule — serverless for event-driven and spiky, containers for real portable apps, VMs for full control or legacy — covers the vast majority of choices you'll face. Compute is where code runs; next we cover where its data rests.

Where this leads: the container rung opens directly into Chapter 4: Containers & Kubernetes, where "where do my containers run and how do they scale?" gets its full answer.

Next: Storage: object, block & file →
