Pipelines: CI/CD/CT for models
Chapter 5 taught CI/CD: every code change flows automatically through build → test → deploy, with no human hand-carrying artifacts. ML keeps all of that and adds a third leg that ordinary software simply doesn't have — because in ML, new data (not just new code) is a reason to ship a new model. This lesson is the "automate the model lifecycle" layer, and it introduces continuous training (CT): retraining on a trigger.
What an ML pipeline actually is
A trained model isn't produced by one script; it's the output of a sequence of steps that must run in order, reliably and repeatably: ingest → validate data → build features → train → evaluate (gate) → register → deploy.
This sequence is a pipeline, and — exactly like Infrastructure as Code (Chapter 3) and CI/CD (Chapter 5) — you define it as code so it's versioned, reviewable, and reproducible rather than a human running notebooks by hand. Two steps deserve special attention because they're the quality gates.
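To make "pipeline as code" concrete, here is a minimal sketch in plain Python. Every name in it (`Context`, `run_pipeline`, the toy steps, the three hard-coded rows) is a hypothetical stand-in for illustration; a real pipeline would read from actual storage and hand these steps to an orchestrator.

```python
from typing import Any, Callable

# A shared context dict carries artifacts from one step to the next.
Context = dict[str, Any]
Step = Callable[[Context], Context]


def ingest(ctx: Context) -> Context:
    # Stand-in for reading from a warehouse or object store.
    rows = [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0}, {"x": 3.0, "y": 6.0}]
    return {**ctx, "rows": rows}


def validate_data(ctx: Context) -> Context:
    # Quality gate: fail loudly instead of training on nothing.
    if not ctx["rows"]:
        raise ValueError("no rows ingested")
    return ctx


def build_features(ctx: Context) -> Context:
    return {**ctx, "features": [(r["x"], r["y"]) for r in ctx["rows"]]}


def train(ctx: Context) -> Context:
    # Toy "model": least-squares slope for y = slope * x.
    num = sum(x * y for x, y in ctx["features"])
    den = sum(x * x for x, _ in ctx["features"])
    return {**ctx, "model": {"slope": num / den}}


# The pipeline itself is just data: an ordered, reviewable list of steps.
PIPELINE: list[Step] = [ingest, validate_data, build_features, train]


def run_pipeline(steps: list[Step]) -> Context:
    ctx: Context = {"completed": []}
    for step in steps:
        ctx = step(ctx)
        ctx["completed"] = ctx["completed"] + [step.__name__]
    return ctx


result = run_pipeline(PIPELINE)
```

Because the step order lives in a versioned list rather than in someone's notebook history, a change to the pipeline shows up as a diff that can be reviewed like any other code change.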
Data validation: garbage in, garbage model
In normal CI you test code. In ML you must also test data, because the data feeds the model. Data validation is an automated step that checks incoming data against expectations before it's allowed to train a model: are columns present, are value ranges sane, is the share of missing values normal, does the distribution roughly match what's expected? A schema change or a broken upstream feed can silently produce a terrible model; data validation catches it at the gate, before you've wasted a GPU-hour or shipped a bad model. Skipping data validation is a classic ML pipeline gap — the model trains "successfully" on broken data and you find out in production.
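The checks described above can be sketched as a single function that returns a list of problems, so the pipeline can refuse to train when the list is non-empty. This is an illustrative sketch, not a specific library's API: `validate_batch`, its parameters, and the thresholds are all assumptions, and the distribution comparison is left out to keep it short.

```python
def validate_batch(rows, required_columns, value_ranges, max_missing_share):
    """Return a list of human-readable problems; an empty list means the batch passes."""
    if not rows:
        return ["batch is empty"]

    problems = []
    for col in required_columns:
        # Schema check: the column must exist in every row.
        if any(col not in row for row in rows):
            problems.append(f"missing column: {col}")
            continue

        values = [row[col] for row in rows]

        # Missing-value check: share of None values must stay under the limit.
        missing_share = sum(v is None for v in values) / len(values)
        if missing_share > max_missing_share:
            problems.append(f"too many missing values in {col}: {missing_share:.0%}")

        # Range check: present values must fall inside the expected bounds.
        if col in value_ranges:
            low, high = value_ranges[col]
            if any(v is not None and not (low <= v <= high) for v in values):
                problems.append(f"out-of-range values in {col}")

    return problems


good_batch = [{"age": 34, "income": 52000.0}, {"age": 51, "income": 78000.0}]
bad_batch = [{"age": -5, "income": None}, {"age": 40, "income": None}]

expectations = {
    "required_columns": ["age", "income"],
    "value_ranges": {"age": (0, 120)},
    "max_missing_share": 0.1,
}

good_problems = validate_batch(good_batch, **expectations)
bad_problems = validate_batch(bad_batch, **expectations)
```

A pipeline step would raise an error (and stop before training) whenever the returned list is non-empty.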
Evaluation as a gate
After training, the pipeline evaluates the new model on a held-out test set and only promotes it (to the registry, lesson 10.2) if it clears a quality bar — and ideally beats the current production model. This is the model-world equivalent of "tests must pass before deploy." We make this gate concrete for LLMs in lesson 10.6, but the principle is universal: no model reaches production without passing an automated quality gate.
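As a rough sketch, the gate reduces to a small decision function. The name `should_promote`, the single-number score, and the choice to make "beats production" a hard requirement (the text only calls it ideal) are assumptions for illustration.

```python
from typing import Optional


def should_promote(
    candidate_score: float,
    quality_bar: float,
    production_score: Optional[float] = None,
) -> bool:
    """Decide whether a newly trained model may be registered.

    Scores are assumed to be "higher is better" metrics measured on
    the same held-out test set.
    """
    # Gate 1: the absolute quality bar.
    if candidate_score < quality_bar:
        return False
    # Gate 2 (assumed strict here): must beat the current production model,
    # if there is one.
    if production_score is not None and candidate_score <= production_score:
        return False
    return True
```

For example, `should_promote(0.91, quality_bar=0.85, production_score=0.89)` passes both gates, while a candidate scoring 0.88 against the same production model clears the bar but is still rejected.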
CI, CD, and the new leg: CT
ML operations is often summarized as CI/CD/CT:
- CI (continuous integration) — test the code and the pipeline on every change (plus data validation): does the pipeline run, do unit tests pass, is the data sane?
- CD (continuous delivery/deployment) — automatically deliver the trained, registered model to serving infrastructure (the serving patterns in lesson 10.5).
- CT (continuous training) — the ML-specific addition: automatically retrain and redeploy the model on a trigger, because models drift (10.1) and new data improves them.
The durable new idea: in software, only a code change triggers a new release. In ML, new data or detected drift is also a reason to ship — so retraining is automated and triggered, not a manual annual ritual.
What triggers continuous training?
CT pipelines fire on triggers, and naming them is the point:
- Schedule — retrain nightly/weekly so the model keeps up with fresh data. Simple, predictable, the common default.
- Data drift / quality degradation — your monitoring (lesson 10.6) detects the input distribution has drifted or accuracy has dropped, and automatically kicks off retraining. This is the feedback loop that directly fixes the "silent degradation" failure from 10.1.
- New data volume — enough new labeled data has accumulated to be worth retraining.
That drift-triggered loop — monitor → detect drift → retrain → redeploy → monitor — is the closed loop that keeps a production model honest. It is the single biggest thing ordinary CI/CD lacks, and the direct antidote to treating a model deploy as "done."
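The three triggers can be sketched as one function that names every reason to retrain. Everything here is hypothetical: the `MonitorSnapshot` fields, the single `drift_score` number, and the default thresholds are placeholders for whatever your monitoring actually reports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MonitorSnapshot:
    last_trained_at: datetime
    drift_score: float        # assumed: higher means more input drift
    new_labeled_rows: int     # labeled rows collected since the last training run


def retrain_reasons(
    snapshot: MonitorSnapshot,
    now: datetime,
    max_age: timedelta = timedelta(days=7),
    drift_threshold: float = 0.2,
    min_new_rows: int = 10_000,
) -> list[str]:
    """Return the names of all triggers that fired; empty means no retrain."""
    reasons = []
    if now - snapshot.last_trained_at >= max_age:
        reasons.append("schedule")
    if snapshot.drift_score > drift_threshold:
        reasons.append("drift")
    if snapshot.new_labeled_rows >= min_new_rows:
        reasons.append("new_data")
    return reasons


now = datetime(2024, 1, 10)
snapshot = MonitorSnapshot(
    last_trained_at=datetime(2024, 1, 8),
    drift_score=0.35,
    new_labeled_rows=1_200,
)
reasons = retrain_reasons(snapshot, now)
```

Returning the reasons rather than a bare boolean keeps the trigger visible in logs, so a later reader can tell whether a given retrain was routine or drift-driven.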
Orchestrating the pipeline
Something has to run these multi-step pipelines on schedule, handle failures and retries, and pass artifacts between steps. That's an orchestrator: the same generic idea you have already seen, a system that runs ordered steps reliably. The common tools:
- Airflow — the long-standing, widely-used workflow orchestrator (define pipelines as code; schedule and monitor them).
- Dagster and Prefect — modern alternatives with better data-awareness and developer experience.
- Kubeflow — Kubernetes-native ML pipelines specifically, running each step as a container on your cluster (Chapter 4).
They differ in ergonomics, but the role is identical: reliably run the ordered, retried, scheduled steps of an ML pipeline. Pick by team and ecosystem, not hype.
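To show the role rather than any one tool, here is a toy sketch of what an orchestrator does at its core: run steps in order, retry failures, and pass each step's artifact to the next. It deliberately uses none of the real Airflow, Dagster, Prefect, or Kubeflow APIs; `run_with_retries`, `orchestrate`, and `flaky_train` are hypothetical names, and scheduling is omitted.

```python
import time


def run_with_retries(step, artifact, max_attempts=3, delay_seconds=0.0):
    """Call step(artifact), retrying on any exception up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step(artifact)
        except Exception:
            if attempt == max_attempts:
                raise  # out of retries: surface the failure
            time.sleep(delay_seconds)


def orchestrate(steps, artifact=None, max_attempts=3):
    """Run steps in order, feeding each step's output into the next."""
    for step in steps:
        artifact = run_with_retries(step, artifact, max_attempts=max_attempts)
    return artifact


# Demo: a training step that fails twice with a transient error, then succeeds.
attempts = {"count": 0}


def load_data(_):
    return [1, 2, 3]


def flaky_train(data):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient failure")
    return {"model": "trained", "rows": len(data)}


artifact = orchestrate([load_data, flaky_train])
```

Real orchestrators add what this sketch lacks: scheduling, persisted run history, per-step isolation, and a UI for monitoring.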
Training vs serving infrastructure: two different machines
A theme worth making explicit: training and serving are different workloads with different infrastructure, and confusing them wastes money.
| | Training | Serving |
|---|---|---|
| Shape | Bursty, long-running batch jobs | Always-on, request-driven |
| Goal | Throughput — finish the job | Low latency per request |
| Compute | Big GPUs, can tolerate interruption | Right-sized, must stay responsive |
| Cost lever | Spot/preemptible GPUs (Ch. 9): cheap, interruptible | Autoscaling + scale-to-zero (lessons 10.4–10.5) |
| Tolerates failure? | Yes — checkpoint and resume | No — it's serving users |
Training is a job: kick it off, let it run for hours on cheap interruptible hardware, checkpoint so it can resume if preempted. Serving is a service: it must answer now, scale with traffic, and (because GPUs are expensive) ideally scale to zero when idle. The next two lessons are exactly these two halves — GPU compute, then serving.
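The "checkpoint so it can resume" idea can be sketched with a toy training loop. This is an assumption-laden illustration: `train_with_checkpoints`, the JSON checkpoint format, and the `stop_after` parameter (which simulates a spot preemption) are all hypothetical, and the "weight update" is a stand-in for real training.

```python
import json
import os


def train_with_checkpoints(total_steps, checkpoint_path, stop_after=None):
    """Run a toy training loop that survives interruption.

    Returns (state, finished). stop_after simulates being preempted
    after that many steps in this run.
    """
    state = {"step": 0, "weight": 0.0}

    # Resume from the last checkpoint if one exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            return state, False  # simulated preemption

        state["weight"] += 0.5  # stand-in for a real gradient update
        state["step"] += 1
        steps_this_run += 1

        # Write to a temp file, then rename, so a crash mid-write
        # never leaves a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

    return state, True
```

Calling the function a second time with the same `checkpoint_path` picks up at the saved step instead of starting from zero, which is what makes cheap interruptible hardware usable for long jobs.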
:::tip Durable vs dated
CI/CD/CT, data validation, eval gates, and the drift→retrain loop are durable. The orchestrators (Airflow, Dagster, Prefect, Kubeflow) are dated: they trade places constantly. Learn to recognize "this is a pipeline-as-code that needs scheduling, retries, and gates"; which tool runs it is an implementation detail.
:::
Why it matters
An ML pipeline is the model lifecycle as code — ingest, validate data, build features, train, gate on evaluation, register, deploy — versioned and automated rather than hand-run in notebooks. ML keeps CI and CD but adds CT (continuous training): because models drift and new data helps, retraining is triggered automatically — by schedule, by detected drift, or by new data — closing the monitor → retrain → redeploy loop that ordinary CI/CD has no equivalent for and that directly fixes silent model decay. Orchestrators (Airflow, Dagster, Prefect, Kubeflow) reliably run those steps. And training and serving are genuinely different infrastructure — bursty interruptible jobs on cheap spot GPUs versus always-on low-latency services — which sets up the next two lessons. First, the resource that dominates every ML budget: GPUs.