Databases: managed SQL & NoSQL
Storage keeps raw bytes; a database keeps structured data you need to query and update reliably: users, orders, messages, the living state of your application. Two questions dominate this primitive: should you use SQL or NoSQL, and should you run the database yourself or let the cloud manage it? This lesson answers both from the ground up.
What a database adds over plain storage
You could store your data as files in object storage. But the moment you ask "give me all users in Germany who signed up last week, sorted by spend," files fall apart — you'd have to read and scan everything yourself. A database is software purpose-built to store, query, update, and protect structured data efficiently: it indexes data for fast lookups, enforces consistency, and handles many users reading and writing at once without corrupting anything. That's the value over raw storage.
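As a minimal sketch of that difference, the example below runs the "users in Germany" question against a tiny table, using Python's built-in `sqlite3` module as a stand-in for a real database server. The table layout, index name, sample rows, and the hard-coded cutoff date for "last week" are all assumptions made up for illustration.

```python
import sqlite3

# In-memory SQLite database: a stand-in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        country   TEXT NOT NULL,
        signed_up TEXT NOT NULL,   -- ISO date, so string comparison orders correctly
        spend     REAL NOT NULL
    )
    """
)
# The index lets the engine jump to matching rows instead of scanning every row.
conn.execute("CREATE INDEX idx_users_country_signup ON users (country, signed_up)")

conn.executemany(
    "INSERT INTO users (name, country, signed_up, spend) VALUES (?, ?, ?, ?)",
    [
        ("Anna", "DE", "2024-05-06", 120.0),
        ("Ben", "DE", "2024-05-08", 340.5),
        ("Chloe", "FR", "2024-05-07", 999.0),
        ("Dirk", "DE", "2024-03-01", 50.0),
    ],
)
conn.commit()

# "All users in Germany who signed up since the cutoff, sorted by spend."
rows = conn.execute(
    "SELECT name, spend FROM users "
    "WHERE country = ? AND signed_up >= ? "
    "ORDER BY spend DESC",
    ("DE", "2024-05-05"),
).fetchall()

print(rows)  # [('Ben', 340.5), ('Anna', 120.0)]
```

With files in object storage, the filtering, sorting, and indexing in that one query would all be code you write and maintain yourself.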
SQL (relational) databases
A relational database organizes data into tables of rows and columns, like a spreadsheet, and tables can reference each other (an orders table points at a users table). You query it with SQL (Structured Query Language), a decades-old, near-universal language for asking precise questions of your data (SELECT * FROM users WHERE country = 'DE').
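To make "tables can reference each other" concrete, here is a small sketch using Python's built-in `sqlite3` module. The `users` and `orders` column names and the sample rows are assumptions for illustration; note that SQLite only enforces foreign keys after the `PRAGMA` shown, whereas engines such as PostgreSQL enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces references only when asked

conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),  -- each order points at a user
        total   REAL NOT NULL
    )
    """
)
conn.execute("INSERT INTO users (id, name, country) VALUES (1, 'Anna', 'DE'), (2, 'Chloe', 'FR')")
conn.execute("INSERT INTO orders (user_id, total) VALUES (1, 20.0), (1, 35.5), (2, 10.0)")
conn.commit()

# A join follows the reference from orders back to users.
totals = conn.execute(
    "SELECT u.name, SUM(o.total) "
    "FROM users AS u JOIN orders AS o ON o.user_id = u.id "
    "GROUP BY u.id ORDER BY u.name"
).fetchall()
print(totals)  # [('Anna', 55.5), ('Chloe', 10.0)]

# The database refuses an order that points at a user who does not exist.
try:
    conn.execute("INSERT INTO orders (user_id, total) VALUES (99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    conn.rollback()
    orphan_rejected = True
print(orphan_rejected)  # True
```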
Its defining strength is strong consistency and structure, captured by the acronym ACID (Atomicity, Consistency, Isolation, Durability) — a set of guarantees ensuring that transactions either fully happen or fully don't, and that the data is never left half-updated. When two people transfer money at the same time, ACID is what stops the money from being created or destroyed. You also define a schema up front: the exact columns and types each table has.
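The money-transfer case can be sketched as follows, again with Python's built-in `sqlite3` module standing in for a production engine. The `accounts` table, the `transfer` helper, and the balances are hypothetical. The sketch illustrates atomicity only (a failed transfer leaves no half-applied update); it does not exercise concurrent access.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # If this debit violates the CHECK constraint, the credit above is undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # schema rule: no overdrafts
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

first_ok = transfer(conn, "alice", "bob", 30)    # valid: both rows change
second_ok = transfer(conn, "alice", "bob", 500)  # overdraft: neither row changes

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(first_ok, second_ok, balances)  # True False {'alice': 70, 'bob': 80}
```

The total across both accounts is 150 before and after, including after the failed attempt: no money was created or destroyed.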
- Best for: data with clear structure and relationships, and anything where correctness is non-negotiable — financial records, orders, inventory, user accounts. Most applications start here, and most should.
- Trade-off: the rigid schema and strong guarantees make it harder to scale a single database across many machines (horizontal scaling) than NoSQL.
Brand names (managed): RDS / Aurora (AWS), Cloud SQL / AlloyDB (GCP), Azure SQL (Azure). The underlying engines are usually PostgreSQL or MySQL, durable open-source engines worth knowing by name (Azure SQL is the exception: it is built on Microsoft SQL Server).
NoSQL (non-relational) databases
NoSQL ("not only SQL") is an umbrella for databases that don't use the rigid table-and-relationship model, trading some of SQL's guarantees for flexibility and massive scale. The common types:
- Document stores keep data as flexible JSON-like documents (no fixed schema). Great when records vary in shape. (e.g. MongoDB, DynamoDB, Firestore)
- Key-value stores are a giant, ultra-fast dictionary: store a value by a key, fetch it by the key. Great for caching and simple lookups. (e.g. Redis, DynamoDB)
- Wide-column and graph stores serve more specialized shapes (huge tables, or networks of relationships).
NoSQL databases typically scale horizontally with ease — spread across many machines to handle enormous volume and traffic — and let you change the data's shape without migrations. The cost is that many relax strong consistency (offering "eventual consistency," where a write may take a moment to appear everywhere) and push the job of enforcing structure into your application.
- Best for: very large scale, high write throughput, flexible/changing data shapes, caching, real-time feeds, and simple high-speed lookups.
- Trade-off: weaker built-in consistency guarantees and no rich cross-table SQL joins, so your application code has to do more.
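The sketch below illustrates the "flexible shape, but your application enforces structure" trade-off. It uses a plain Python dictionary as a stand-in for a document or key-value store; it is not the API of MongoDB, DynamoDB, or any real product, and the `put_user` helper and its required fields are hypothetical.

```python
from typing import Any

# Stand-in for a document/key-value store: key -> JSON-like document.
store: dict[str, dict[str, Any]] = {}

REQUIRED_FIELDS = {"name", "country"}  # assumed rule, enforced by the app, not the store

def put_user(store: dict[str, dict[str, Any]], key: str, doc: dict[str, Any]) -> None:
    """Save a document after checking the fields the application relies on."""
    missing = REQUIRED_FIELDS - doc.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    store[key] = doc

# Two records with different shapes live side by side; no migration needed.
put_user(store, "user:1", {"name": "Anna", "country": "DE"})
put_user(
    store,
    "user:2",
    {"name": "Ben", "country": "DE", "tags": ["beta"], "address": {"city": "Berlin"}},
)

# Lookup by key is the fast path.
print(store["user:2"]["address"]["city"])  # Berlin

# Without the check in put_user, nothing would stop an incomplete document.
try:
    put_user(store, "user:3", {"name": "Chloe"})
except ValueError as err:
    print(err)  # missing fields: ['country']
```

In a relational database, the `NOT NULL` column in the schema would do the job of that validation check.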
The honest decision rule
Beginners often think NoSQL is "newer and therefore better." It isn't better — it's different, with a different trade-off. A reliable rule:
- Default to a managed SQL (relational) database — specifically PostgreSQL. It's flexible enough for the vast majority of apps, gives you strong consistency for free, and the structure prevents whole categories of bugs. Most successful products run on relational databases for their entire lives.
- Reach for NoSQL when you have a specific reason: extreme scale a single relational DB can't handle, genuinely schema-less data, a need for a fast cache (key-value), or a workload that's a natural document/graph shape.
- You'll often use both — a relational database as the source of truth plus a key-value store (like Redis) as a cache in front of it.
"Managed" — and why you almost always want it
Whichever shape you pick, you face one more choice: run the database engine yourself on a VM, or use a managed database service. A managed database is one where the cloud provider operates the database for you — they handle installation, patching, backups, replication across availability zones, failover when a server dies, and scaling the underlying hardware. You just get a connection string and use it.
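For a sense of what a connection string carries, the sketch below takes one apart with Python's standard library. The exact format varies by provider and driver; the URL here is a made-up example in the common PostgreSQL URI style, and `DATABASE_URL` is an assumed environment variable name, not something a specific provider mandates.

```python
import os
from urllib.parse import parse_qs, urlsplit

# Hypothetical connection string; real ones come from your provider's console or secrets store.
EXAMPLE_URL = "postgresql://app_user:s3cret@db.example.internal:5432/appdb?sslmode=require"

def parse_database_url(url):
    """Split a URI-style connection string into the pieces a driver needs."""
    parts = urlsplit(url)
    return {
        "engine": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "options": {key: values[0] for key, values in parse_qs(parts.query).items()},
    }

# Prefer the environment so credentials stay out of source code; fall back to the example.
settings = parse_database_url(os.environ.get("DATABASE_URL", EXAMPLE_URL))
print(settings["host"], settings["port"], settings["database"])
```

In practice you would usually hand the whole string to your database driver rather than parse it yourself; the point is that host, credentials, and database name are all the application needs to know.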
This is the shared-responsibility model (Chapter 1) at work: managed databases shift the heavy operational burden — backups, patching, high availability — onto the provider, leaving you to focus on your data and queries. Running a production database yourself means you own backups, you handle the 3 a.m. failover, you patch security holes. That is a large, easy-to-get-wrong job.
:::tip Default to managed
Unless you have a strong, specific reason, use a managed database. Self-hosting a production database is a serious operational commitment (backups, replication, failover, patching) that the provider handles better, more reliably, and often more cheaply than you can. The whole point of the cloud is to rent away exactly this kind of toil. (Managed databases do cost more per hour than a raw VM, but the premium is almost always worth it.)
:::
Why it matters
Databases store the structured, queryable, living state of your app. SQL (relational) databases use tables and the SQL language, give you strong ACID consistency and an enforced schema, and are the right default for most applications — especially PostgreSQL. NoSQL databases trade those guarantees for flexibility and horizontal scale, in document, key-value, and other shapes; reach for them when you have a specific scale, shape, or caching need. And almost always, choose a managed database so the provider carries the operational weight of backups, patching, and failover. Compute runs, storage keeps, databases query — now we need the wiring that connects them and decides who can reach them.