Relational Database Design & Normalization (1NF to 3NF)
Database Normalization
Normalization is the process of organizing data to reduce redundancy and improve data integrity. The goal is for each fact to be stored in exactly one place. Without it, you run into "Update Anomalies": if a customer's address is copied onto every order row, changing that address means editing every one of those rows (1,000 of them for a customer with 1,000 orders), and any row you miss now contradicts the rest.
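The update anomaly described above can be reproduced in a few lines. This is a minimal sketch using Python's built-in `sqlite3` module; the table `orders_flat`, its columns, and the sample data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized (hypothetical) table: the address is copied onto every order row.
conn.execute("""
    CREATE TABLE orders_flat (
        order_id         INTEGER PRIMARY KEY,
        customer_name    TEXT NOT NULL,
        customer_address TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO orders_flat (customer_name, customer_address) VALUES (?, ?)",
    [("Ada", "1 Old Street")] * 1000,
)

# A correct address change has to rewrite every order that carries a copy.
cur = conn.execute(
    "UPDATE orders_flat SET customer_address = ? WHERE customer_name = ?",
    ("2 New Road", "Ada"),
)
rows_touched = cur.rowcount

# An update that misses some rows (here: only the first 400 orders are changed)
# leaves the same customer with two conflicting addresses.
conn.execute(
    "UPDATE orders_flat SET customer_address = ? WHERE order_id <= 400",
    ("3 Other Lane",),
)
distinct_addresses = conn.execute(
    "SELECT COUNT(DISTINCT customer_address) FROM orders_flat "
    "WHERE customer_name = 'Ada'"
).fetchone()[0]

print(rows_touched, distinct_addresses)
```

Nothing in this schema stops the second, partial update from succeeding, which is the integrity problem normalization is meant to remove.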
1. The Three Golden Rules
- First Normal Form (1NF): No repeating groups. Every column must contain atomic (single) values (e.g., don't store "Skills" as a comma-separated string).
- Second Normal Form (2NF): Must be in 1NF + every non-key column must depend on the *entire* Primary Key, not just part of it. This only comes into play when the Primary Key is composite (made of more than one column).
- Third Normal Form (3NF): Must be in 2NF + no "Transitive Dependencies". If non-key Column A depends on another non-key Column B, and B depends on the Key, then A doesn't belong in this table; it belongs in a separate table keyed by B.
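To make the three rules concrete, here is one possible schema that satisfies all of them, again sketched with Python's built-in `sqlite3`. The employee/project domain and every table and column name are assumptions chosen for illustration; the SQL comments mark which rule each table addresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 3NF: dept_name depends on dept_id, not directly on employee_id,
    -- so it lives in its own table instead of on every employee row.
    CREATE TABLE departments (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        dept_id     INTEGER NOT NULL REFERENCES departments(dept_id)
    );
    -- 1NF: one skill per row replaces a comma-separated skills column.
    CREATE TABLE employee_skills (
        employee_id INTEGER NOT NULL REFERENCES employees(employee_id),
        skill       TEXT NOT NULL,
        PRIMARY KEY (employee_id, skill)
    );
    -- 2NF: project_name depends only on project_id, which is just one part
    -- of the (employee_id, project_id) key below, so it moves to projects.
    CREATE TABLE projects (
        project_id   INTEGER PRIMARY KEY,
        project_name TEXT NOT NULL
    );
    CREATE TABLE assignments (
        employee_id  INTEGER NOT NULL REFERENCES employees(employee_id),
        project_id   INTEGER NOT NULL REFERENCES projects(project_id),
        hours_worked INTEGER NOT NULL,  -- depends on the whole key
        PRIMARY KEY (employee_id, project_id)
    );
""")

conn.execute("INSERT INTO departments VALUES (10, 'Engineering')")
conn.execute("INSERT INTO employees VALUES (1, 'Ada', 10)")
conn.execute("INSERT INTO projects VALUES (7, 'Billing Rewrite')")
conn.execute("INSERT INTO assignments VALUES (1, 7, 12)")

# Split a raw, non-atomic value into one row per skill.
raw_skills = "Python, SQL"
conn.executemany(
    "INSERT INTO employee_skills (employee_id, skill) VALUES (?, ?)",
    [(1, s.strip()) for s in raw_skills.split(",")],
)

skills = [row[0] for row in conn.execute(
    "SELECT skill FROM employee_skills WHERE employee_id = 1 ORDER BY skill"
)]

# Each fact is stored once; JOINs reassemble the full picture on demand.
report = conn.execute("""
    SELECT e.name, d.dept_name, p.project_name, a.hours_worked
    FROM assignments a
    JOIN employees e   ON e.employee_id = a.employee_id
    JOIN departments d ON d.dept_id = e.dept_id
    JOIN projects p    ON p.project_id = a.project_id
""").fetchone()

print(skills, report)
```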
2. Why normalize?
It saves **Space** (no repeated names/addresses) and protects **Integrity**. If you update a Category Name in the Categories table, the change shows up for all 1 million products in that category at once, because each product row stores only the CategoryID.
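A scaled-down sketch of the category example, using Python's built-in `sqlite3`: five products stand in for the 1 million, and the table and column names are assumptions. The rename changes exactly one row, yet every product sees the new name through the JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        category_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE products (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        category_id INTEGER NOT NULL REFERENCES categories(category_id)
    );
""")
conn.execute("INSERT INTO categories VALUES (1, 'Gadgets')")
conn.executemany(
    "INSERT INTO products (name, category_id) VALUES (?, 1)",
    [(f"product-{i}",) for i in range(5)],
)

# Rename the category: one row changes, no product rows are rewritten.
cur = conn.execute(
    "UPDATE categories SET name = 'Electronics' WHERE category_id = 1"
)
rows_changed = cur.rowcount

# Products only hold category_id, so the JOIN picks up the new name for all.
names_seen = {row[0] for row in conn.execute("""
    SELECT c.name
    FROM products p
    JOIN categories c ON c.category_id = p.category_id
""")}

print(rows_changed, names_seen)
```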
4. Interview Mastery
Q: "When is it acceptable to DE-normalize a database?"
Architect Answer: "Normalization favors **Writing** (OLTP); De-normalization favors **Reading** (OLAP/Reporting). In high-performance reporting systems, JOINing 20 tables can be too slow. We intentionally duplicate data (e.g., storing the 'CustomerName' directly in the 'Order' table) to speed up SELECT queries. However, you must accept the risk of 'Data Staleness', where the copy drifts from the source, and handle it via background synchronization or triggers."
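The trigger option from the answer above might look like the following. This is only a sketch using Python's built-in `sqlite3`; the table names `customers` and `orders`, the column `customer_name`, and the trigger name `sync_customer_name` are hypothetical, and trigger syntax differs between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER NOT NULL REFERENCES customers(customer_id),
        customer_name TEXT NOT NULL  -- deliberate duplicate for JOIN-free reads
    );
    -- Keep the duplicated name in step with its source of truth.
    CREATE TRIGGER sync_customer_name
    AFTER UPDATE OF name ON customers
    BEGIN
        UPDATE orders
        SET customer_name = NEW.name
        WHERE customer_id = NEW.customer_id;
    END;
""")

conn.execute("INSERT INTO customers VALUES (1, 'Acme Ltd')")
conn.executemany(
    "INSERT INTO orders (customer_id, customer_name) VALUES (?, ?)",
    [(1, "Acme Ltd"), (1, "Acme Ltd")],
)

# Renaming the customer fires the trigger, which rewrites the copies.
conn.execute("UPDATE customers SET name = 'Acme Corp' WHERE customer_id = 1")

# The reporting query reads the name straight from orders, with no JOIN.
order_names = [row[0] for row in conn.execute(
    "SELECT customer_name FROM orders ORDER BY order_id"
)]

print(order_names)
```

The trade-off is visible here: reads get simpler, but every customer rename now costs one extra write per order, which is the same work the normalized design avoided.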