
Progressive delivery & rollback

Every deploy is a risk: the new version might be broken in a way your tests didn't catch. Progressive delivery is the discipline of limiting and observing that risk — instead of swapping everyone to the new version at once, you expose it gradually, watch real metrics, and automatically abort if it misbehaves. This lesson covers the four strategies (blue-green, canary, rolling, feature flags), how automated metric/SLO analysis drives an automatic rollback, and why in a GitOps world rollback is just reverting a commit.

The problem: a deploy is a bet

When you replace version 1.4.0 with 1.5.0, you're betting that 1.5.0 is healthy. If you swap all traffic at once and the bet is wrong, every user hits the bug simultaneously. Progressive delivery reframes the deploy: don't bet everything at once — expose the new version to a small slice, measure, and widen only if it's healthy. The strategies differ in how they slice and shift traffic.

The four strategies

Rolling deployment

Replace old instances with new ones a few at a time until all are new. This is Kubernetes' default for a Deployment (Chapter 4): it brings up some new pods, removes some old ones, and repeats. It is simple and built in, but during the rollout both versions serve real traffic, and it doesn't watch application metrics to decide whether to continue. As long as the new pods pass their readiness checks, it just proceeds.

[Diagram: rolling deployment, from all v1 to mostly v1 with some v2.]
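To make the "a few at a time" mechanics concrete, here is a minimal sketch of a rolling replacement as a pure simulation. It is a simplified model, not Kubernetes' actual algorithm (which is governed by settings such as `maxSurge` and `maxUnavailable`); the function name `rolling_update` and the `batch_size` parameter are hypothetical, chosen for illustration.

```python
def rolling_update(replicas: int, batch_size: int) -> list[tuple[int, int]]:
    """Return the (old, new) instance counts after each step of a rolling update.

    Hypothetical model: each step starts one batch of new instances and
    retires the same number of old ones. No metrics are consulted.
    """
    if replicas < 0 or batch_size < 1:
        raise ValueError("replicas must be >= 0 and batch_size must be >= 1")
    old, new = replicas, 0
    history = [(old, new)]
    while old > 0:
        step = min(batch_size, old)
        new += step  # bring up a batch of v2 instances
        old -= step  # remove the same number of v1 instances
        history.append((old, new))
    return history


if __name__ == "__main__":
    for old, new in rolling_update(replicas=5, batch_size=2):
        print(f"v1={old} v2={new}")
```

Every intermediate step has both versions running side by side, and nothing in the loop asks whether v2 is behaving well. That gap is what the metric-driven strategies later in this lesson fill.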

Blue-green deployment

Run two complete environments: blue (current, live) and green (the new version, fully deployed but receiving no traffic). You verify green in isolation, then flip all traffic from blue to green at once. If anything's wrong, you flip straight back to blue — instant rollback, because blue is still running, untouched.

[Diagram: all users → BLUE (v1, live); flip.]
  • Pro: instant rollback (flip back), and you test green fully before any user sees it.
  • Con: double the resources during the switch, and the cutover is still all-at-once (everyone moves together).
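The blue-green cutover boils down to a single pointer that says which environment is live. A minimal sketch, assuming a hypothetical `BlueGreenRouter` class (real setups flip a load balancer, service selector, or DNS record instead):

```python
from dataclasses import dataclass


@dataclass
class BlueGreenRouter:
    """Hypothetical router: both environments keep running, one is live."""
    blue: str            # version deployed in the blue environment
    green: str           # version deployed in the green environment
    live: str = "blue"   # which environment currently receives all traffic

    def route(self) -> str:
        """Return the version that serves every request right now."""
        return self.blue if self.live == "blue" else self.green

    def flip(self) -> None:
        """Move all traffic to the other environment in one step."""
        self.live = "green" if self.live == "blue" else "blue"


if __name__ == "__main__":
    router = BlueGreenRouter(blue="1.4.0", green="1.5.0")
    router.flip()           # cut over: everyone moves to green together
    print(router.route())
    router.flip()           # rollback: blue was never touched
    print(router.route())
```

Rollback is the same operation as the cutover, which is why it is instant. It is also why the cutover is all-at-once: there is no state in which only some users see green.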

Canary deployment

Named after the canary in a coal mine: send a small percentage of real traffic (say 5%) to the new version while everyone else stays on the old one. Watch the canary's metrics. Healthy? Widen to 25%, then 50%, then 100%. Unhealthy? Abort and send everyone back to the old version. The new version proves itself on real traffic before it's trusted with all of it.

[Diagram: 100% traffic splits into 95% → v1 and 5% → v2 (canary); watch metrics; if healthy → 25% → 50% → 100%; if unhealthy, abort → 0%, back to v1.]

Canary is the safest general-purpose strategy because it limits the blast radius (only a small slice is exposed to a bad version) and makes the decision on real production signal.
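The traffic split itself can be sketched as a weighted choice per request. This is an illustrative simulation only: the function `split_traffic` is hypothetical, and in practice the weighting is done by a service mesh, ingress, or load balancer rather than application code.

```python
import random


def split_traffic(requests: int, canary_percent: int, rng: random.Random) -> dict[str, int]:
    """Route each simulated request to v2 with probability canary_percent / 100."""
    counts = {"v1": 0, "v2": 0}
    for _ in range(requests):
        # rng.random() is in [0.0, 1.0), so 0 sends nothing to v2 and 100 sends everything.
        version = "v2" if rng.random() * 100 < canary_percent else "v1"
        counts[version] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the simulation is repeatable
    for percent in (5, 25, 50, 100):
        print(percent, split_traffic(10_000, percent, rng))
```

At the 5% step, a bad v2 affects roughly one request in twenty instead of all of them. That is the blast-radius limit in numbers.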

Feature flags

A different axis entirely. A feature flag (or feature toggle) wraps new behavior in a runtime switch: the new code ships to production off, then you turn it on for a chosen audience — a few internal users, then 1% of customers, then everyone — without redeploying. This decouples deploy (ship the code) from release (expose the behavior). It's also what lets unfinished work merge to main safely (lesson 5.2): merge it behind a flag that's off. Flags are how you do canary-style exposure at the feature level rather than the deployment level.

:::note Deploy ≠ release
Feature flags make the crucial distinction concrete: deploying is putting code on the servers; releasing is turning the behavior on for users. With flags they're separate events: you can deploy at 9 a.m. and release at noon, to 1% of users, and dial back instantly if metrics dip. This decoupling is one of the most powerful risk-reduction ideas in delivery.
:::
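A feature flag can be sketched as a small runtime check in front of the new code path. The example below is a minimal illustration, not the API of any particular flag service: the `FeatureFlag` class, its fields, and the `checkout` function are all hypothetical. It hashes the user ID so that a given user gets a stable answer as the percentage is widened.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical runtime switch; the code ships with the flag off."""
    name: str
    enabled: bool = False                               # master switch, off by default
    allowlist: set[str] = field(default_factory=set)    # e.g. internal users
    rollout_percent: int = 0                            # share of remaining users, 0 to 100

    def is_on_for(self, user_id: str) -> bool:
        if not self.enabled:
            return False
        if user_id in self.allowlist:
            return True
        # Stable bucket in [0, 100): the same user always lands in the same bucket.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100
        return bucket < self.rollout_percent


def checkout(user_id: str, flag: FeatureFlag) -> str:
    """Both code paths are deployed; the flag decides which one is released."""
    return "new checkout" if flag.is_on_for(user_id) else "old checkout"


if __name__ == "__main__":
    flag = FeatureFlag(name="new-checkout")
    print(checkout("alice", flag))      # deployed but not released
    flag.enabled = True
    flag.allowlist = {"alice"}          # release to internal users first
    print(checkout("alice", flag))
    flag.rollout_percent = 1            # then to about 1% of everyone else
```

Changing `rollout_percent` or `enabled` is a configuration change, not a redeploy, which is what lets you dial exposure up or back instantly.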

Automating the decision: metric analysis and automatic abort

Canary and blue-green are only as good as the decision about whether the new version is healthy — and a human staring at dashboards doesn't scale and is too slow. The modern practice is automated analysis: define what "healthy" means as measurable thresholds, and let the system promote or abort on its own.

You define success in terms of SLOs (Service Level Objectives — target levels for things like error rate and latency; covered fully in Chapter 6) and key metrics: error rate, p99 latency, CPU, request success ratio. During a canary, the controller continuously compares the new version's metrics against these thresholds (and often against the old version's baseline). If the canary regresses — error rate spikes, latency blows past the SLO — the controller automatically aborts: it shifts traffic back to the old version without waiting for a human.

[Diagram: canary at 5% → automated analysis (error rate < 1%? p99 latency < …) → pass: widen traffic; fail: AUTO-ABORT, rollback to v1.]
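The promote-or-abort decision can be sketched as a loop over traffic steps with a threshold check at each one. This is a simplified illustration, not how Argo Rollouts or Flagger are implemented: the names `Metrics`, `Thresholds`, `is_healthy`, and `run_canary` are hypothetical, the `observe` callback stands in for a real metrics query, and the 1% error-rate and 500 ms latency limits are assumed values.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Metrics:
    error_rate: float        # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float    # 99th-percentile latency in milliseconds


@dataclass(frozen=True)
class Thresholds:
    max_error_rate: float
    max_p99_latency_ms: float


def is_healthy(metrics: Metrics, limits: Thresholds) -> bool:
    """Healthy means every metric is strictly under its threshold."""
    return (metrics.error_rate < limits.max_error_rate
            and metrics.p99_latency_ms < limits.max_p99_latency_ms)


def run_canary(steps: Sequence[int],
               observe: Callable[[int], Metrics],
               limits: Thresholds) -> tuple[str, int]:
    """Widen traffic step by step; abort to 0% on the first failed check.

    Returns (outcome, final canary traffic percent).
    """
    for percent in steps:
        # At this point a real controller would shift `percent` of traffic
        # to the canary, wait, and then query its metrics.
        if not is_healthy(observe(percent), limits):
            return ("aborted", 0)    # send everyone back to the old version
    return ("promoted", 100)


if __name__ == "__main__":
    limits = Thresholds(max_error_rate=0.01, max_p99_latency_ms=500.0)  # assumed values

    def regresses_under_load(percent: int) -> Metrics:
        # Fake observation: errors spike once the canary takes 25% of traffic.
        return Metrics(error_rate=0.05 if percent >= 25 else 0.002, p99_latency_ms=180.0)

    print(run_canary([5, 25, 50, 100], regresses_under_load, limits))
```

No human is in the loop: the same thresholds that define "healthy" also decide, at each step, whether traffic widens or goes back to v1.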

Two tools implement exactly this on Kubernetes, paired with the GitOps controllers from lesson 5.5:

  • Argo Rollouts — Replaces the standard Deployment with a Rollout object that natively does canary and blue-green with automated metric analysis and automatic abort. Pairs with Argo CD.
  • Flagger — Automates canary/blue-green with metric analysis on top of a service mesh or ingress; pairs with Flux.

This is the deployment metrics feedback loop: the pipeline doesn't just ship — it watches the consequences and acts on them. (It closes fully in lesson 5.7 with DORA, and depends entirely on the observability of Chapter 6.)

Rollback: revert the commit

When a deploy goes wrong, rollback is returning to the last known-good version. Progressive delivery makes rollback cheap in two ways:

  • During the rollout, the strategies give you fast escape hatches: blue-green flips back to blue instantly; canary aborts to 0%; automated analysis does it for you.
  • After the fact, in a GitOps world, rollback is reverting the Git commit that changed the desired state (lesson 5.5). Because the desired state — including the exact immutable image digest — lives in Git, "go back to 1.4.0" is a git revert, and the in-cluster agent reconciles production back to the previous, known-good state. Immutable artifacts (lesson 5.3) are what make this guaranteed identical: rolling back to a digest gives byte-for-byte the version that worked.
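The GitOps rollback path can be sketched with a toy model of a repository and a reconciling agent. Everything here is hypothetical and heavily simplified: `DesiredStateRepo` stands in for a Git repo that stores a single value (the image digest to run), `reconcile` stands in for the in-cluster agent, and the digest strings are placeholders rather than real digests.

```python
from dataclasses import dataclass, field


@dataclass
class DesiredStateRepo:
    """Toy stand-in for a Git repo holding one value: the image digest to run."""
    commits: list[str] = field(default_factory=list)

    def commit(self, digest: str) -> None:
        self.commits.append(digest)

    def head(self) -> str:
        return self.commits[-1]

    def revert_head(self) -> None:
        """Like `git revert`: add a new commit that restores the state before HEAD."""
        if len(self.commits) < 2:
            raise ValueError("no earlier state to revert to")
        self.commits.append(self.commits[-2])


def reconcile(repo: DesiredStateRepo, cluster: dict[str, str]) -> bool:
    """Make the cluster match HEAD; return True if something changed."""
    if cluster.get("image") == repo.head():
        return False
    cluster["image"] = repo.head()
    return True


if __name__ == "__main__":
    repo, cluster = DesiredStateRepo(), {}
    repo.commit("sha256:digest-of-1.4.0")   # placeholder digest, known good
    repo.commit("sha256:digest-of-1.5.0")   # placeholder digest, turns out bad
    reconcile(repo, cluster)
    repo.revert_head()                      # rollback is just another commit
    reconcile(repo, cluster)
    print(cluster["image"], len(repo.commits))
```

History is never rewritten: the revert is a third commit, so the audit trail shows both the bad deploy and the rollback. Because the value being restored is a digest, the agent brings back exactly the image that ran before.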

:::tip Durable vs dated
Blue-green, canary, rolling, feature flags, automated metric-based abort, and "rollback = revert a commit" are durable risk-reduction strategies. The tools (Argo Rollouts, Flagger) and the exact metric APIs are the parts that will date. The enduring idea: never bet all traffic at once on an unproven version; expose gradually, measure on real signal, and make rollback fast and guaranteed-identical.
:::

Common pitfalls

  • All-at-once deploys with no rollback plan. Swapping everyone to an unproven version means a bad release hits every user at once, with no fast way back. At minimum use canary or blue-green.
  • Canary with no automated analysis. A canary nobody measures is just a slow all-at-once deploy. Wire metric thresholds and automatic abort, or the canary buys you little.
  • No automatic abort on SLO regression. Relying on a human to notice and react is too slow at 3 a.m. Define healthy as thresholds and let the controller abort the rollout on its own.
  • Mutable tags break rollback. If you deployed :latest, "roll back to the previous image" is ambiguous, because the tag may have since moved to a different image. Immutable digests make rollback exact (lesson 5.3).
  • Confusing deploy with release. Without feature flags, shipping code is exposing it. Decouple them so you can release gradually and dial back instantly.
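The mutable-tag pitfall from the list above is easy to reproduce in a toy model. The `Registry` class below is a hypothetical stand-in for an image registry, and the digest strings are placeholders:

```python
class Registry:
    """Toy image registry: a tag is a mutable pointer to a digest."""

    def __init__(self) -> None:
        self._tags: dict[str, str] = {}

    def push(self, tag: str, digest: str) -> None:
        self._tags[tag] = digest   # pushing an existing tag silently moves it

    def resolve(self, tag: str) -> str:
        return self._tags[tag]


def demo() -> tuple[str, str]:
    """Return what a rollback gets when recorded by tag vs by digest."""
    registry = Registry()
    registry.push("latest", "sha256:good")             # placeholder digest
    recorded_tag = "latest"                            # deploy recorded by tag
    recorded_digest = registry.resolve("latest")       # deploy recorded by digest
    registry.push("latest", "sha256:bad")              # a newer build moves the tag
    return registry.resolve(recorded_tag), recorded_digest


if __name__ == "__main__":
    by_tag, by_digest = demo()
    print("rollback by tag    ->", by_tag)
    print("rollback by digest ->", by_digest)
```

Rolling back "to the tag you deployed" fetches whatever the tag points at now, which is the bad build. The recorded digest still names the image that worked.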

Why it matters

A deploy is a bet that the new version is healthy, and progressive delivery keeps that bet small: rolling replaces instances gradually, blue-green keeps the old environment ready for an instant flip-back, canary exposes a small traffic slice and widens only if it stays healthy, and feature flags decouple deploying code from releasing behavior. The decision is automated: controllers like Argo Rollouts and Flagger compare the new version's metrics against SLO thresholds and automatically abort on regression — the deployment metrics feedback loop in action. And rollback is cheap and guaranteed identical: flip back during a rollout, or in GitOps simply revert the commit, letting the agent reconcile production back to the last known-good immutable digest. One risk remains across this whole chapter — the credentials and secrets the pipeline itself wields — which we close next, along with the metrics that tell you if all of this is working.
