Star schema is a dimensional modeling pattern optimized for analytics.
You place one central fact table (events/measures) and surround it with denormalized dimension tables (context).
Interview framing:
- Think in terms of business process first (orders, trips, watches)
- Then define fact grain (one row per what?)
- Then attach dimensions that answer who/what/when/where/how
Star schema is popular in product companies because BI queries keep the same simple shape as data grows: filter on dimension attributes, join to the fact table on keys, and aggregate measures.
                    dim_date
                        |
dim_customer --- fact_order_line --- dim_product
                     /     \
           dim_store       dim_promotion
Fact Table: fact_order_line (grain: one row per order line item)
----------------------------------------------------------------
order_line_id (PK)
order_id
date_key (FK)
customer_key (FK)
product_key (FK)
store_key (FK)
promotion_key (FK)
quantity
unit_price
discount_amount
line_amount
Dimension: dim_customer
-----------------------
customer_key (PK)
customer_id (natural id)
customer_name
city
state
country
customer_tier
signup_date
is_prime
Dimension: dim_product
----------------------
product_key (PK)
product_id
product_name
brand
category
sub_category
list_price
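One way to make the table listings above concrete is to write them as DDL. The sketch below uses Python's built-in sqlite3 module purely for illustration. The column types, the NOT NULL constraints, and the helper name create_star are assumptions, not part of the model as given. dim_date, dim_store, and dim_promotion are left out because their columns are not listed.

```python
import sqlite3

# Hypothetical DDL for the three tables listed above; types are assumptions.
DDL = """
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,  -- surrogate key
    customer_id   TEXT NOT NULL,        -- natural id from the source system
    customer_name TEXT,
    city          TEXT,
    state         TEXT,
    country       TEXT,
    customer_tier TEXT,
    signup_date   TEXT,
    is_prime      INTEGER               -- 0/1 flag
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- surrogate key
    product_id   TEXT NOT NULL,
    product_name TEXT,
    brand        TEXT,
    category     TEXT,
    sub_category TEXT,
    list_price   REAL
);
-- Grain: one row per order line item.
CREATE TABLE fact_order_line (
    order_line_id   INTEGER PRIMARY KEY,
    order_id        INTEGER NOT NULL,
    date_key        INTEGER NOT NULL,
    customer_key    INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    product_key     INTEGER NOT NULL REFERENCES dim_product(product_key),
    store_key       INTEGER,
    promotion_key   INTEGER,
    quantity        INTEGER,
    unit_price      REAL,
    discount_amount REAL,
    line_amount     REAL
);
"""


def create_star(conn):
    """Create the fact table and the two listed dimensions."""
    conn.executescript(DDL)


conn = sqlite3.connect(":memory:")
create_star(conn)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)  # ['dim_customer', 'dim_product', 'fact_order_line']
```

The fact table carries only keys and numeric measures. Every descriptive attribute lives in a dimension.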
Business question:
“Top 20 products by net revenue among Prime customers in Tier-1 cities over the last 90 days.”
With star schema:
- Query fact_order_line
- Join dim_customer + dim_product + dim_date
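- Filter on is_prime, city, and date; aggregate line_amount per product; sort and keep the top 20
This is typically much faster than running the same query against a deeply normalized OLTP schema, where each descriptive attribute may sit several joins away.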
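The business question above could be answered with a query along these lines. This is a minimal sketch using Python's sqlite3 module with cut-down tables. It makes several assumptions: dim_date has a full_date column, the Tier-1 city list ('Mumbai', 'Delhi') is a made-up placeholder because the model has no city-tier column, line_amount is already net of discounts, and the "as of" date is passed in so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, city TEXT, is_prime INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_order_line (
    order_line_id INTEGER PRIMARY KEY, date_key INTEGER,
    customer_key INTEGER, product_key INTEGER, line_amount REAL);

INSERT INTO dim_date VALUES
    (20240101, '2024-01-01'), (20240601, '2024-06-01'), (20240615, '2024-06-15');
INSERT INTO dim_customer VALUES (1, 'Mumbai', 1), (2, 'Mumbai', 0), (3, 'Smalltown', 1);
INSERT INTO dim_product VALUES (10, 'Kettle'), (11, 'Toaster');
INSERT INTO fact_order_line VALUES
    (1, 20240601, 1, 10,  50.0),  -- counted
    (2, 20240615, 1, 11,  80.0),  -- counted
    (3, 20240615, 1, 10,  20.0),  -- counted
    (4, 20240601, 2, 11, 999.0),  -- excluded: customer is not Prime
    (5, 20240601, 3, 11, 999.0),  -- excluded: city not in the assumed Tier-1 list
    (6, 20240101, 1, 11, 999.0);  -- excluded: older than 90 days
""")

# Filter on dimension attributes, join on surrogate keys, aggregate the measure.
TOP_PRODUCTS_SQL = """
SELECT p.product_name, SUM(f.line_amount) AS net_revenue
FROM fact_order_line AS f
JOIN dim_customer AS c ON c.customer_key = f.customer_key
JOIN dim_product  AS p ON p.product_key  = f.product_key
JOIN dim_date     AS d ON d.date_key     = f.date_key
WHERE c.is_prime = 1
  AND c.city IN ('Mumbai', 'Delhi')            -- placeholder Tier-1 list
  AND d.full_date >  date(:as_of, '-90 days')
  AND d.full_date <= :as_of
GROUP BY p.product_key, p.product_name
ORDER BY net_revenue DESC
LIMIT 20
"""

top_products = conn.execute(TOP_PRODUCTS_SQL, {"as_of": "2024-06-30"}).fetchall()
print(top_products)  # [('Toaster', 80.0), ('Kettle', 70.0)]
```

Every join is a single-hop equality join from the fact table to a dimension, which is what keeps the query shape predictable.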
When to use:
- Read-heavy analytical workloads
- BI dashboards, slicing/dicing, ad-hoc SQL
- Need understandable model for analysts and product teams
- Need conformed dimensions across multiple facts
When not to use:
- Pure transactional OLTP system
- Heavy row-level updates with strict write latency
- Extremely sparse semi-structured raw ingestion without curated model
Pros:
- Fewer joins; simpler SQL
- Better query performance for aggregates
- Friendly for BI tools and self-serve analytics
- Easy to communicate to non-DB engineers
Cons:
- Data redundancy in dimensions
- ETL/ELT complexity for SCD and surrogate keys
- Requires disciplined grain definition
- Potential storage overhead
Common mistakes:
- Wrong grain (e.g., order-level fact when product analysis needs line-level)
- Fact table stores descriptive attributes (city/product_name) instead of keys
- Missing surrogate keys in dimensions
- Joining fact-to-fact directly (causes row explosion and double counting)
- Over-snowflaking dimensions and losing star simplicity
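The fact-to-fact join problem is easiest to see with numbers. The sketch below is illustrative only: fact_shipment and its columns are hypothetical and not part of the model above. It joins two facts directly on order_id, then repeats the calculation after aggregating each fact to a common grain first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_order_line (
    order_line_id INTEGER PRIMARY KEY, order_id INTEGER, line_amount REAL);
-- Hypothetical second fact, at a grain of one row per shipment.
CREATE TABLE fact_shipment (
    shipment_id INTEGER PRIMARY KEY, order_id INTEGER, shipping_cost REAL);

-- One order with two lines and two shipments.
INSERT INTO fact_order_line VALUES (1, 100, 30.0), (2, 100, 70.0);
INSERT INTO fact_shipment   VALUES (1, 100, 5.0),  (2, 100, 8.0);
""")

# Direct fact-to-fact join: 2 lines x 2 shipments = 4 rows, so both sums double.
naive = conn.execute("""
SELECT SUM(l.line_amount), SUM(s.shipping_cost)
FROM fact_order_line AS l
JOIN fact_shipment   AS s ON s.order_id = l.order_id
""").fetchone()

# Safer: roll each fact up to the shared grain (order_id) before joining.
safe = conn.execute("""
WITH lines AS (
    SELECT order_id, SUM(line_amount) AS revenue
    FROM fact_order_line GROUP BY order_id
),
ships AS (
    SELECT order_id, SUM(shipping_cost) AS shipping
    FROM fact_shipment GROUP BY order_id
)
SELECT SUM(lines.revenue), SUM(ships.shipping)
FROM lines JOIN ships ON ships.order_id = lines.order_id
""").fetchone()

print(naive)  # (200.0, 26.0)
print(safe)   # (100.0, 13.0)
```

The true totals are 100.0 revenue and 13.0 shipping. The direct join inflates both because each row of one fact is repeated once per matching row of the other.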
Performance tips:
- Partition large facts by date (or event_date)
- Cluster/sort by high-cardinality join keys where engine supports it
- Keep frequently filtered (hot) dimension columns in narrow tables so filter scans stay cheap
- Use integer surrogate keys for joins
- Build summary tables/materialized views for repeated KPI queries
- Watch for skewed keys (e.g., huge “unknown” bucket)
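Two of the tips above, summary tables and skew checks, can be sketched in a few lines. This is an illustration under assumptions, not a tuning recipe: SQLite has no materialized views, so a plain CREATE TABLE ... AS SELECT stands in for one; the table name agg_daily_product_revenue is made up; and customer_key = -1 is assumed to be the "unknown" dimension member.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_order_line (
    order_line_id INTEGER PRIMARY KEY, date_key INTEGER,
    customer_key INTEGER, product_key INTEGER, line_amount REAL);
INSERT INTO fact_order_line VALUES
    (1, 20240601, -1, 10, 10.0),
    (2, 20240601, -1, 10, 15.0),
    (3, 20240601,  7, 11, 40.0),
    (4, 20240602, -1, 10,  5.0);

-- Summary table: pre-aggregate the fact to one row per (day, product)
-- so repeated KPI queries do not rescan line-level data.
CREATE TABLE agg_daily_product_revenue AS
SELECT date_key, product_key,
       SUM(line_amount) AS revenue,
       COUNT(*)         AS line_count
FROM fact_order_line
GROUP BY date_key, product_key;
""")

daily = conn.execute("""
SELECT date_key, product_key, revenue
FROM agg_daily_product_revenue
ORDER BY date_key, product_key
""").fetchall()
print(daily)  # [(20240601, 10, 25.0), (20240601, 11, 40.0), (20240602, 10, 5.0)]

# Skew check: share of fact rows per join key, largest first.
skew = conn.execute("""
SELECT customer_key,
       COUNT(*) * 1.0 / (SELECT COUNT(*) FROM fact_order_line) AS row_share
FROM fact_order_line
GROUP BY customer_key
ORDER BY row_share DESC
""").fetchall()
print(skew)  # [(-1, 0.75), (7, 0.25)]
```

Here the assumed "unknown" key holds 75% of the fact rows, which is the kind of hot key that can unbalance joins on a distributed engine.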
Interview questions:
- Why is star schema usually faster for analytics than 3NF?
- Why are dimensions denormalized in star schema?
- What is a conformed dimension?
- You modeled fact_orders at order header grain, but product-level conversion analysis is now required. What changes?
- Revenue dashboard is slow despite star schema. What 5 things do you check first?
- Design a star schema for Uber rides and explain grain.
- Design a watch-time analytics star schema for Netflix with device and title dimensions.
- Explain how Prime membership history affects the design of Amazon's customer dimension.
- How do you handle late-arriving dimensions?
- When would you introduce aggregate fact tables?
- How do you avoid double counting in multi-join scenarios?