Multi-tenancy & self-service guardrails
Self-service (7.3) and reconciling abstractions (7.5) hand real power to developers: spin up environments, provision databases, deploy anywhere, anytime. That power has a dark side. Unbounded self-service is a security and cost hole, not a feature. This lesson is about the guardrails that make self-service safe — isolation between tenants, and limits on what any one tenant can do — and it closes the chapter's big strategic question: build or buy the platform.
The problem: many tenants, one platform
A platform serves many teams on shared infrastructure. In platform language, each team (or app, or environment) is a tenant, and running many of them on shared infrastructure is multi-tenancy. Without deliberate design, multi-tenancy creates three dangers:
- The noisy neighbor. One team's runaway job consumes all the CPU/memory, starving everyone else on the shared cluster.
- The blast radius. One team's mistake or breach can reach another team's data or workloads, because nothing isolates them.
- The cost bomb. Self-service with no limits means someone spins up a giant cluster "to test something," forgets it, and the bill explodes.
The two tools that contain these are isolation (keep tenants apart) and guardrails (limit what each can do). You need both.
Isolation: keeping tenants apart
Isolation is a spectrum from cheap-and-soft to expensive-and-hard. Pick per workload by how sensitive it is:
- Namespace isolation (soft). A namespace is Kubernetes' built-in logical partition — a way to carve one cluster into named slices so each team's objects are grouped and separable. It's cheap and the default first step, but it's soft: tenants still share one control plane and one node pool, so the isolation is logical, not bulletproof.
- vCluster (virtual cluster). vCluster gives each tenant their own virtual Kubernetes control plane running inside a shared host cluster. Tenants get a much stronger illusion of a private cluster — their own API objects, versions, and CRDs — without the cost of a real separate cluster. It's the popular middle ground when namespaces are too soft but full clusters are too expensive.
- Separate clusters (hard). A dedicated cluster per tenant is the strongest isolation: tenants share neither a control plane nor nodes. It is also the most expensive and operationally heavy option. Reserve it for the highest-sensitivity or strictest-compliance tenants.
The durable rule: match isolation strength to risk. Don't give every dev team a separate cluster (wasteful); don't put two regulated, sensitive tenants in the same soft namespace (dangerous).
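The "match isolation strength to risk" rule can be sketched as a small decision function. This is a minimal illustration, not a prescribed algorithm: the `Tenant` fields `regulated` and `needs_own_crds` are hypothetical flags invented for the example, and a real platform would likely weigh more signals than these two.

```python
from dataclasses import dataclass
from enum import Enum


class Isolation(Enum):
    NAMESPACE = "namespace"                # soft: shared control plane and nodes
    VCLUSTER = "vcluster"                  # middle: virtual control plane per tenant
    SEPARATE_CLUSTER = "separate-cluster"  # hard: dedicated cluster


@dataclass(frozen=True)
class Tenant:
    name: str
    regulated: bool       # hypothetical flag: strict compliance or highly sensitive data
    needs_own_crds: bool  # hypothetical flag: wants its own CRDs or API versions


def choose_isolation(tenant: Tenant) -> Isolation:
    """Pick the cheapest isolation level that still matches the tenant's risk."""
    if tenant.regulated:
        return Isolation.SEPARATE_CLUSTER
    if tenant.needs_own_crds:
        return Isolation.VCLUSTER
    return Isolation.NAMESPACE


if __name__ == "__main__":
    for t in (
        Tenant("dev-sandbox", regulated=False, needs_own_crds=False),
        Tenant("ml-platform", regulated=False, needs_own_crds=True),
        Tenant("payments", regulated=True, needs_own_crds=False),
    ):
        print(t.name, "->", choose_isolation(t).value)
```

The checks run from most to least restrictive, so a tenant that is both regulated and wants its own CRDs still lands on a separate cluster.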
Guardrails: bounding what each tenant can do
Isolation keeps tenants apart; guardrails cap what each can do inside its space. Three layers, and you want all three:
- Quotas (cost & capacity limits). A resource quota caps how much CPU and memory, and how many objects, a tenant may consume. This directly defuses the noisy-neighbor and cost-bomb dangers: a team can self-serve up to its quota and no further, so one team can't starve the cluster or run up an unbounded bill.
- RBAC (who can do what). RBAC — Role-Based Access Control — grants permissions by role, on the principle of least privilege: each tenant can act only within its own scope and can't touch another tenant's resources. RBAC is what turns "self-service" into "self-service within your blast radius," not "self-service across the whole company."
- Policy-as-code (rules enforced automatically). Policy-as-code expresses organizational rules as code that's checked automatically — "no public databases," "every workload must set resource limits," "all images from the approved registry," "everything must be tagged for cost." A policy engine evaluates each self-service request against these rules and rejects anything that violates them, before it's created. This is the automated gatekeeper that lets you remove the human gatekeeper without losing control — exactly the 7.3 goal of staying off the critical path while staying safe.
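To make the three layers concrete, here is a minimal sketch of an admission check that runs RBAC, quota, and policy rules against a self-service request before anything is created. Everything in it is an assumption for illustration: the `Request` shape, the in-memory tables (`QUOTA_CPU`, `USED_CPU`, `ALLOWED_NAMESPACES`), the registry prefix, and the `cost-center` label are hypothetical, and a real platform would delegate these checks to the cluster's own quota, RBAC, and policy-engine mechanisms rather than hand-rolled Python.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Request:
    tenant: str     # who is asking
    namespace: str  # where the resource would be created
    kind: str       # e.g. "Database" or "Deployment"
    cpu: float      # requested CPU cores
    spec: dict = field(default_factory=dict)


# Hypothetical per-tenant state; a real platform would read this from the cluster.
QUOTA_CPU = {"team-a": 8.0}
USED_CPU = {"team-a": 6.0}
ALLOWED_NAMESPACES = {"team-a": {"team-a-dev", "team-a-prod"}}
APPROVED_REGISTRY = "registry.example.com/"


def check_rbac(req: Request) -> Optional[str]:
    """Least privilege: a tenant may act only inside its own namespaces."""
    if req.namespace not in ALLOWED_NAMESPACES.get(req.tenant, set()):
        return f"rbac: {req.tenant} may not act in {req.namespace}"
    return None


def check_quota(req: Request) -> Optional[str]:
    """Capacity limit: current usage plus the request must fit the quota."""
    limit = QUOTA_CPU.get(req.tenant, 0.0)
    used = USED_CPU.get(req.tenant, 0.0)
    if used + req.cpu > limit:
        return f"quota: {used + req.cpu} CPU would exceed limit {limit}"
    return None


def check_policy(req: Request) -> Optional[str]:
    """Organizational rules as code; returns the first rule violated."""
    if req.kind == "Database" and req.spec.get("public", False):
        return "policy: no public databases"
    image = req.spec.get("image")
    if image is not None and not image.startswith(APPROVED_REGISTRY):
        return "policy: image must come from the approved registry"
    if "cost-center" not in req.spec.get("labels", {}):
        return "policy: everything must be tagged for cost"
    return None


def admit(req: Request) -> List[str]:
    """Return all layer violations; an empty list means the request may proceed."""
    checks = (check_rbac, check_quota, check_policy)
    return [msg for msg in (check(req) for check in checks) if msg is not None]


if __name__ == "__main__":
    ok = Request("team-a", "team-a-dev", "Deployment", 1.0,
                 {"image": "registry.example.com/app:1",
                  "labels": {"cost-center": "42"}})
    bad = Request("team-a", "team-b-prod", "Database", 4.0, {"public": True})
    print(admit(ok))
    print(admit(bad))
```

Collecting every violation instead of stopping at the first is a design choice in this sketch: the developer sees all the reasons a request was rejected in one round trip, which keeps the feedback loop short without putting a human back on the critical path.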
:::tip Self-service without guardrails is just an open door
The most dangerous version of platform engineering is "we made everything self-service" with no quotas, RBAC, or policy. That's not empowerment; it's an unguarded door to your security and your cloud bill. Bounded self-service is the only kind worth shipping: developers move fast inside guardrails they can't see until they hit them, and the platform sleeps at night. Guardrails are also how the secure-by-default golden path from 7.2 is actually enforced.
:::
The strategic decision: build vs buy an IDP
Zoom out. Should your organization build its platform (assemble Backstage, Crossplane, your own modules and Operators) or buy one (an IDP product like Humanitec, a portal like Port, or PaaS-style tooling like Qovery)? This is the chapter's biggest decision, and the common way to get it wrong is to default — "build, for control" or "buy, for ease" — without matching the choice to the org.
Think in terms of TCO — Total Cost of Ownership — the full lifetime cost, not the sticker price:
| | Build | Buy |
|---|---|---|
| Up-front | Months to stand up | Days to weeks |
| Ongoing | Years to maintain — a permanent team owns it | Vendor maintains; you pay subscription |
| Fit | Exactly your needs | Their opinions; you adapt to fit |
| Control | Total | Bounded by the product |
| Best when | Large org, unusual needs, platform is a differentiator | Smaller org, common needs, speed matters more than fit |
The honest framing most teams underweight: build is months to set up and years to maintain. A built platform is not a project you finish; it's a product you staff forever. If your needs are common and your org is mid-sized, buying is often the rational choice even though building feels more "real." Conversely, a large org with genuinely unusual constraints — where the platform is a competitive advantage — may find no product fits, and building is right. The decision is org size × constraint uniqueness × TCO, not ideology. (Many mature platforms are a blend: buy the portal, build the golden-path internals — or vice versa.)
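A rough way to see why "years to maintain" dominates is to put both options on the same lifetime horizon. The sketch below is a deliberately simplified model, and every number in it is a made-up placeholder, not a benchmark or a vendor price: plug in your own team size, salaries, and quotes. It also covers only money; fit and control from the table above are not priced in.

```python
def build_tco(years: int, team_size: int, engineer_cost_per_year: int) -> int:
    """Build: a permanent team owns the platform for its whole lifetime.

    The months of initial setup are included in the team's first year.
    """
    return years * team_size * engineer_cost_per_year


def buy_tco(years: int, subscription_per_year: int,
            integration_engineers: int, engineer_cost_per_year: int) -> int:
    """Buy: subscription plus the staff who still integrate and operate it."""
    staff = integration_engineers * engineer_cost_per_year
    return years * (subscription_per_year + staff)


if __name__ == "__main__":
    # All inputs are illustrative placeholders (hypothetical values).
    YEARS = 5
    ENGINEER_COST = 150_000

    build = build_tco(YEARS, team_size=4, engineer_cost_per_year=ENGINEER_COST)
    buy = buy_tco(YEARS, subscription_per_year=200_000,
                  integration_engineers=1, engineer_cost_per_year=ENGINEER_COST)

    print(f"build over {YEARS} years: {build:,}")
    print(f"buy over {YEARS} years:   {buy:,}")
```

With these placeholder inputs the model favors buying, but the result flips easily: a subscription priced per seat in a large org, or a product that needs heavy adaptation to fit, can push the buy side above the build side. The point of the exercise is to compare lifetime totals rather than up-front effort.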
:::note This connects back to the whole chapter
Build-vs-buy is the same trade-off you met for portals in 7.4 (Backstage vs Port), generalized to the entire platform. And whichever you choose, the 7.1 law still rules: a platform nobody adopts fails regardless of whether you built or bought it. Tooling never rescues a platform built without its customers.
:::
Common pitfalls
- Unbounded self-service. No quotas, RBAC, or policy turns self-service into a security and cost hole. Always ship bounded self-service.
- One isolation level for everything. Namespaces for sensitive regulated tenants (too soft) or separate clusters for every dev team (too wasteful). Match isolation to risk.
- Manual policy enforcement. Relying on humans to remember "no public databases" doesn't scale and reintroduces the gatekeeper. Encode rules as policy-as-code so they're enforced automatically.
- Defaulting on build-vs-buy. Choosing "build for control" or "buy for ease" reflexively, without TCO and org-fit analysis. Build is years of maintenance; price that in.
- Forgetting cost is a guardrail. Tagging and quotas are part of safety, not just finance — see Chapter 9 (FinOps). Unlabeled, unlimited self-service is a budget incident waiting to happen.
Why it matters
Self-service is only safe when it's bounded: multi-tenancy must combine isolation — matched to risk along the namespace → vCluster → separate-cluster spectrum — with guardrails: quotas (defuse noisy-neighbor and cost bombs), RBAC (least-privilege, so self-service stays inside each tenant's blast radius), and policy-as-code (an automated gatekeeper that rejects unsafe requests before they're created, replacing the human gatekeeper without losing control). The chapter's strategic capstone is build vs buy, decided by org size × constraint uniqueness × TCO — remembering that build is months to set up and years to maintain — never by reflex. You now have the full arc: why platforms exist, golden paths, self-service and its team model, the portal, the reconciliation primitive, and the guardrails that keep it all safe. Lock it in with the checkpoint.