Why normalization matters for your data model
Normalization is the discipline of organizing a database so that each fact is stored in exactly one place. When the same fact lives in two rows or two tables, updating it means updating it in multiple places — and eventually, someone will forget one.
The result is inconsistent data: two rows describing the same customer with different cities, or a product price that differs across invoice records. Normalization eliminates these anomalies by design rather than by discipline.
Mental Model
Normalization answers: where does this fact belong? If customer email appears in three tables, it belongs in exactly one — customers. Every other table should reference it by foreign key. One source of truth, many references.
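The "one source of truth, many references" idea can be exercised directly. A minimal sketch in SQLite (table and column names here are illustrative, not from the schemas discussed later): email lives only in customers, orders reference it by foreign key, and a single UPDATE is visible through every join.

```python
import sqlite3

# Illustrative schema: email is stored exactly once, in customers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        email TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id)
    );
    INSERT INTO customers VALUES (1, 'ada@example.com');
    INSERT INTO orders VALUES (101, 1), (102, 1);
""")

# One UPDATE in one place; every order sees the change via the join.
conn.execute(
    "UPDATE customers SET email = 'ada@new.example' WHERE customer_id = 1"
)
rows = conn.execute("""
    SELECT o.order_id, c.email
    FROM orders o JOIN customers c USING (customer_id)
    ORDER BY o.order_id
""").fetchall()
print(rows)  # both orders now show 'ada@new.example'
```

Had the email been copied into each order row, the same change would have required touching every copy, and missing one would produce exactly the inconsistency described above.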
First Normal Form: eliminate repeating groups and ensure atomicity
A table is in First Normal Form (1NF) when every column contains atomic (indivisible) values and there are no repeating groups. The most common 1NF violations are storing multiple values in a single column — like comma-separated tags — or using numbered columns like item_1, item_2, item_3 to store a list.
The fix is always the same: one value per cell, one fact per row. Move repeating groups into their own table and join them back with a foreign key.
-- BEFORE 1NF: repeating groups as columns
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer VARCHAR,
  item_1 VARCHAR, -- repeating groups!
  item_2 VARCHAR,
  item_3 VARCHAR,
  qty_1 INT,
  qty_2 INT,
  qty_3 INT
);
-- Problem: item_4 requires a schema change; NULL padding wastes storage.
-- AFTER 1NF: one fact per row, no repeating groups
CREATE TABLE orders (
  order_id INT,
  customer VARCHAR, -- still depends only on order_id; 2NF addresses this
  item_name VARCHAR,
  quantity INT,
  PRIMARY KEY (order_id, item_name)
);
-- Now adding a fourth item is just a new row, not a schema change.
Second Normal Form: remove partial dependencies
A table is in Second Normal Form (2NF) when it is already in 1NF and every non-key column depends on the entire primary key — not just part of it. Partial dependencies only appear in tables with composite primary keys.
The classic example: an order_items table with a composite key of (order_id, product_id), where product_name depends only on product_id and not on order_id. That attribute belongs in a products table, not in order_items.
-- BEFORE 2NF: product_name depends on product_id only
-- (partial dependency on a composite PK)
CREATE TABLE order_items (
  order_id INT,
  product_id INT,
  product_name VARCHAR, -- depends only on product_id!
  quantity INT,
  PRIMARY KEY (order_id, product_id)
);
-- AFTER 2NF: product attributes moved to their own table
-- (products must be created first so the foreign key can reference it)
CREATE TABLE products (
  product_id INT PRIMARY KEY,
  product_name VARCHAR,
  unit_price NUMERIC(10,2)
);
CREATE TABLE order_items (
  order_id INT,
  product_id INT REFERENCES products(product_id),
  quantity INT,
  PRIMARY KEY (order_id, product_id)
);
-- Now renaming a product is one UPDATE in one table, not thousands.
Third Normal Form: eliminate transitive dependencies
A table is in Third Normal Form (3NF) when it is in 2NF and no non-key column depends on another non-key column. This is called a transitive dependency. The most common example: storing city and state alongside zip code in a customers table, when city and state are actually determined by zip.
If you update a customer's zip code but forget to update their city and state, the record becomes internally inconsistent. The fix is to move the transitive dependency into its own lookup table.
-- BEFORE 3NF: city and state depend on zip, not customer_id
CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name VARCHAR,
  email VARCHAR,
  zip CHAR(5),
  city VARCHAR, -- depends on zip, not customer_id!
  state CHAR(2) -- depends on zip, not customer_id!
);
-- AFTER 3NF: transitive dependency extracted to zip_codes
-- (zip_codes must be created first so the foreign key can reference it)
CREATE TABLE zip_codes (
  zip CHAR(5) PRIMARY KEY,
  city VARCHAR,
  state CHAR(2)
);
CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name VARCHAR,
  email VARCHAR,
  zip CHAR(5) REFERENCES zip_codes(zip)
);
-- Updating a city for a zip code now fixes every customer at once.
When to intentionally denormalize for analytics
Normalization is the right default for transactional databases where write correctness matters. For analytical workloads, deliberate denormalization often improves query performance by reducing joins at read time — this is exactly what dimensional modeling and wide tables do.
The key word is deliberate. A denormalized analytical table is a design decision, not a mistake. Document why it is denormalized, what the source-of-truth table is, and how updates flow from the normalized model to the denormalized one.
Watch Out
Denormalization is a deliberate trade-off, not a shortcut. If you denormalize without documenting it, the next engineer will normalize it — and break every query pattern that depended on the wide format.
- Denormalize when join cost dominates query runtime on large fact tables.
- Always maintain a normalized source and derive denormalized outputs from it.
- Document the denormalization decision in the schema's data dictionary.
- Use dbt to automate the derivation so the denormalized table stays in sync.
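The "maintain a normalized source, derive the wide table" pattern can be sketched in a few lines. This is an illustrative SQLite example, not a dbt model: rebuild_wide_orders stands in for whatever job or model materializes the denormalized output, and the table names are hypothetical.

```python
import sqlite3

# Normalized source of truth; the wide table is derived, never edited by hand.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE order_items (
        order_id INTEGER,
        product_id INTEGER REFERENCES products(product_id),
        quantity INTEGER,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO products VALUES (1, 'widget');
    INSERT INTO order_items VALUES (500, 1, 3);
""")

def rebuild_wide_orders(conn):
    # Re-derive the denormalized table from the normalized model, the way
    # a scheduled job (or a dbt model) would materialize it.
    conn.executescript("""
        DROP TABLE IF EXISTS wide_order_items;
        CREATE TABLE wide_order_items AS
        SELECT oi.order_id, oi.product_id, p.product_name, oi.quantity
        FROM order_items oi JOIN products p USING (product_id);
    """)

rebuild_wide_orders(conn)

# A rename happens once, in the normalized model; the wide table is rebuilt.
conn.execute("UPDATE products SET product_name = 'gadget' WHERE product_id = 1")
rebuild_wide_orders(conn)
row = conn.execute("SELECT product_name FROM wide_order_items").fetchone()
print(row)  # ('gadget',)
```

Because updates flow one way, from the normalized tables into the derived one, the wide format never becomes a second source of truth that can drift out of sync.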