
How to model a star schema in dbt

A practical workflow for moving from raw source tables to reviewable fact and dimension models in dbt — with examples.

Start from the business event, not the table list

The fastest way to derail a star schema is to begin by mirroring source systems. In dbt, start with the business event you want to measure instead: orders placed, invoices paid, sessions started, claims submitted.

That event becomes the fact table candidate. Once the event is clear, the rest of the model becomes easier to judge: which dimensions describe the event, which attributes belong in the fact, and which transformations belong upstream in staging or intermediate models.

Mental Model

Think of the fact table as a business ledger and each dimension as a lookup table. Every row in the ledger records a transaction. Every dimension row enriches that transaction with context — who, what, where, when.

  • Write the fact grain in one sentence before you create any models.
  • List the measures the business expects from that grain.
  • Separate descriptive attributes from additive metrics early.
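One lightweight way to make that checklist reviewable is to write the grain sentence into the model's schema.yml description, so it ships alongside the code. The model and column names below are hypothetical:

```yaml
# models/marts/schema.yml: hypothetical model and column names
version: 2

models:
  - name: fct_orders
    description: >
      Grain: one row per product per order.
      Additive measures: quantity, total_amount.
    columns:
      - name: quantity
        description: "Additive measure: safe to sum across any dimension."
      - name: order_status
        description: "Descriptive attribute, not a measure."
```

Reviewers can now challenge the grain sentence directly in the pull request, before any SQL is written against it.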

Use staging models to normalize warehouse noise

Raw tables usually contain naming drift, duplicated status logic, mixed timestamp semantics, and denormalized blobs that make dimensional modeling harder than it needs to be. dbt staging models are where you clean that noise.

Rename columns into warehouse-friendly terms, standardize types, and push source-specific cleanup into the staging layer. Your dimensional models should read like analytics products, not ingestion artifacts.

models/staging/stg_orders.sql
-- Staging: rename, retype, clean. No business logic here.
with source as (
    select * from {{ source('raw', 'orders') }}
),
renamed as (
    select
        order_id,
        customer_id,
        product_id,
        store_id,
        cast(order_date  as date)          as order_date,
        cast(quantity    as integer)       as quantity,
        cast(unit_price  as numeric(10,2)) as unit_price,
        lower(trim(status))                as order_status
    from source
    where order_id is not null
)
select * from renamed
  • Keep staging close to the source — avoid business metrics there.
  • Normalize keys and timestamps before dimensional modeling starts.
  • Document field meaning in the same repo where models are reviewed.
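Those staging guarantees can be enforced rather than just documented. Here is a sketch using dbt's built-in generic tests, assuming the stg_orders model above; the accepted status values are illustrative:

```yaml
# models/staging/schema.yml
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique      # one row per order at this grain
          - not_null
      - name: order_status
        tests:
          - accepted_values:
              # illustrative value list; replace with your source's statuses
              values: ['placed', 'shipped', 'cancelled']
```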

Design conformed dimensions before you optimize marts

A star schema scales better when dimensions are reusable across facts. Customer, product, account, and date are common examples. If each mart invents its own version, reviews get noisy and definitions drift across the warehouse.

In dbt, conformed dimensions also reduce duplicate logic. Shared dimensions feed multiple marts while preserving a single, reviewable source of truth for descriptive attributes.

A star schema: one central fact table surrounded by dimension tables. The fact table holds measurable events; dimension tables hold descriptive context.
  • Prefer shared dimensions when the entity definition is stable across domains.
  • Keep slowly changing history intentional, not accidental.
  • Use surrogate keys only when the modeling problem genuinely requires them.
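To make the reuse concrete, here is a sketch of a fact model consuming a conformed dimension. The model names (stg_orders, dim_customer) are assumptions following the stg_/dim_ convention above; swap in your own, and prefer a dbt relationships test over the inner join if you don't want to silently drop rows:

```sql
-- models/marts/fct_orders.sql
-- Grain: one row per product per order.
with orders as (
    select * from {{ ref('stg_orders') }}
),
customers as (
    select * from {{ ref('dim_customer') }}
)
select
    orders.order_id,
    orders.customer_id,
    orders.product_id,
    orders.order_date,
    orders.quantity,
    orders.unit_price,
    orders.quantity * orders.unit_price as total_amount
from orders
-- the inner join enforces referential integrity at build time;
-- orders with an unknown customer are dropped here
inner join customers
    on orders.customer_id = customers.customer_id
```

Because dim_customer is shared, a second mart (say, fct_sessions) can `ref` the same dimension and inherit the same definitions without duplicating logic.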

Express your schema in SchemaStruct DSL before writing SQL

Before writing any dbt SQL, expressing your star schema in a structured DSL lets the whole team review the design as a diagram. You can catch grain mismatches, missing foreign keys, and naming inconsistencies at design time instead of during code review.

SchemaStruct parses the DSL into a live diagram. Once the design is agreed, the DSL exports directly to SQL DDL for any target dialect — Snowflake, PostgreSQL, or Databricks.

schema.schemastruct — DSL example
table dim_customer {
  customer_id  int     [pk]
  full_name    varchar
  email        varchar [unique]
  region       varchar
  segment      varchar
  note: "Conformed — shared across fct_orders and fct_sessions"
}

table dim_product {
  product_id   int     [pk]
  product_name varchar
  category     varchar
  sku          varchar [unique]
  unit_cost    decimal
}

table dim_date {
  date_id      int     [pk]
  full_date    date
  year         int
  quarter      int
  month        int
  day_of_week  varchar
  is_weekend   boolean
}

table fct_orders {
  order_id     int     [pk]
  customer_id  int     [ref: > dim_customer.customer_id]
  product_id   int     [ref: > dim_product.product_id]
  date_id      int     [ref: > dim_date.date_id]
  quantity     int
  unit_price   decimal
  total_amount decimal
}
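The export target shapes the exact DDL, but for the fact table above it might look roughly like this PostgreSQL sketch (illustrative, shown with dbt's customary fct_ prefix, not SchemaStruct's literal output):

```sql
-- Illustrative PostgreSQL DDL for the orders fact table
create table fct_orders (
    order_id     integer primary key,
    customer_id  integer not null references dim_customer (customer_id),
    product_id   integer not null references dim_product (product_id),
    date_id      integer not null references dim_date (date_id),
    quantity     integer,
    unit_price   numeric(10,2),
    total_amount numeric(10,2)
);
```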

Review the model before you ship SQL

Teams often treat dbt as the first review surface, but that forces reviewers to reason through the implementation before agreeing on the structure. A better approach is to review the schema itself first: grain, joins, dimensions, and delivery outputs.

That is where a visual review layer compounds. Once the star schema is agreed, the dbt implementation becomes a handoff step instead of the place where design ambiguity is discovered.

Pro Tip

Ship the schema review, not just the SQL review. Agreeing on grain and joins before writing dbt models cuts downstream rework by catching design issues at the cheapest possible moment — before any SQL exists.