You are here
Data Warehouse Design
This is my cheat sheet for dimensional modelling design techniques of a data warehouse (DWH) according to Kimball/Ross.
Dimensional Design Process (p. 38 ff.)
- Select the business process.
- Declare the grain (what a single fact table row represents).
- Identify the dimensions.
- Identify the facts.
Dimension Tables (p. 46 ff., p. 62 ff.)
who, what, where, when, why and how
- Dimension tables are entry point to the fact tables.
- Every dimension table has a single primary key (PK) column.
- Dimension tables are usually wide, flat denormalized tables.
- Use Surrogate Keys (synthetic Primary Keys) instead of Natural Keys (NK).
- Centipede facts should be avoided (year_dim + month_dim + day_dim vs date_dim).
- Resist the normalization urges and denormalize instead into a flattened row for simplicity and speed!
- Avoid cryptic abbreviations, use meaningful words instead.
- NULL comes from not yet populated attributes. Avoid NULL on dimensions use "Dummy strings" instead.
- A dimension can have different roles (purchase_date_fk, delivery_date_fk).
- Junk dimension.
- Avoid Snowflake Dimensions use a flattened denormalized dimension instead.
- Degenerated Dimensions (dimension with PK only).
- Outrigger dimension: Dimension to Dimension Reference (use sparsely) typically done over fact.
Slowly Changing Dimension (SCD)
- Type 0: Retain original (attribute values never change)
- Type 1: Overwrite (attribute values is overwritten, most recent, destroy history)
- Type 2: Add new row (3 additional columns: row_effective_date, row_expiration_date, current_row_indicator)
- Type 3: Add new attribute (alternate reality, infrequently used)
- Type 4: Add mini-dimension (group of attributes rapidly changes or frequently used attributes in multi-million-row dimensions)
- Type 5: Add mini-dimension and type 1 outrigger (preserve historical attribute values)
- Type 6: Add Type 1 attributes to type 2 dimensions
- Type 7: Dual type 1 and type 2 dimensions
Fact Tables (p. 41 ff. p. 58 ff.)
Measurements that result from a business process event. Consists of (numeric) business measures and foreign keys.
- Additive (all dimensions)
- semi-additive (some dimensions)
- non-additive (no dimensions) facts
NULL on facts OK but NOT OK of Foreign Keys
If two measures do not mean the same thing, they should be labeled differently.
Types of fact tables
- Transaction fact tables (atomic)
- Periodic Snapshot Fact Tables (summary of period, a zero insert on no activity)
- Accumulating Snapshot Fact Tables (summary of process)
- Factless Fact Tables (no metrics, analyze what did NOT happen)
- Aggregate Fact Tables or OLAP Cubes (solely performance)
- Consolidated Fact Tables (2 processes in 1 table)
- Never join 2 fact tables, merge results of 2 queries instead
- Facts: Basic unit + factor
tags: