Define grain before columns
Dimensional modeling gets easier when the team forces itself to define grain first. Grain answers the most important question in the model: what does one row represent?
Without a grain statement, teams blur events and entities together, overpack facts with descriptive attributes, and end up rebuilding marts when reporting questions expand.
Mental Model
Answer this question in one sentence before creating any columns: 'One row in this table represents one ___.' If you need more than one sentence, the model has multiple grains and will cause problems downstream.
Facts measure events, dimensions describe them
A useful dimensional model keeps a clear line between measurable activity and reusable descriptive context. Fact tables hold the measurable event. Dimensions provide consistent business context for slicing and grouping.
That sounds simple, but most modeling mistakes come from crossing those responsibilities: dimensions that carry metrics, facts that absorb too much descriptive text, or marts that hide multiple grains in one table.
- Put additive and semi-additive metrics in facts.
- Keep dimension attributes descriptive and stable.
- Model many-to-many edge cases intentionally, not implicitly.
The four dimensions almost every model needs
Regardless of domain, most analytical fact tables need the same four dimensional anchors: date, customer (or user), product (or item), and location. These four dimensions cover the who, what, when, and where of nearly every business event.
Designing them as conformed dimensions — shared across marts — means analysts can combine facts from different domains without writing joins that break on mismatched keys.
table dim_date {
date_id int [pk]
full_date date
year int
quarter int
month int
month_name varchar
day_of_week varchar
is_weekend boolean
is_holiday boolean
}
table dim_customer {
customer_id int [pk]
full_name varchar
email varchar [unique]
segment varchar // SMB, Mid-Market, Enterprise
region varchar
signup_date date
}
table dim_product {
product_id int [pk]
product_name varchar
category varchar
subcategory varchar
unit_cost decimal
is_active boolean
}
table dim_location {
location_id int [pk]
city varchar
state varchar
country varchar
region varchar
}- Conformed date dimensions unlock time-series analysis across every fact.
- Never embed date attributes directly in a fact — always join to dim_date.
- Reuse the same surrogate keys across fact tables for clean cross-domain joins.
Review joins as a product decision
Join paths are not just implementation details. They define what downstream teams can safely ask from the warehouse. If joins are ambiguous, analytics work becomes fragile no matter how polished the SQL looks.
A schema review should therefore validate join structure explicitly: which dimensions connect to which facts, whether role-playing dimensions are clearly named, and whether path choices match the business story the mart is meant to tell.
Keep documentation generated from the same model
Dimensional models age poorly when docs are maintained separately. The quickest route to drift is to describe the warehouse in one place and change it in another.
The better pattern is to keep review, documentation, and handoff outputs tied to the same schema source. When a fact grain changes or a dimension expands, the change propagates through the workflow instead of creating parallel truths.
Pro Tip
Docs that live in a separate system from the schema are docs that lie. Generate your data dictionary from the same DSL or model definition that produces your DDL. When the schema changes, the docs change with it.