A junk dimension groups multiple low-cardinality flags/indicators into one compact dimension table.
Instead of storing many repetitive yes/no/status columns on every fact row, the fact table stores a single junk_key that points to one dimension row per distinct combination of those values.
Interview framing:
- Junk dimensions reduce fact width and repetition
- Useful for miscellaneous attributes that do not deserve standalone dimensions
- Common in clickstream, order events, support tickets
Before (flags stored directly on the fact):
fact_order_event
----------------
event_id
date_key
customer_key
is_gift
is_expedited
payment_risk_flag
coupon_applied
channel_type
...
After (flags moved to a junk dimension):
fact_order_event
----------------
event_id
date_key
customer_key
junk_key
event_amount
dim_junk_order_flags
--------------------
junk_key
is_gift
is_expedited
payment_risk_flag
coupon_applied
channel_type
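The move from the first fact_order_event layout to the second can be sketched as a single pass: collect each row's flag combination, give every distinct combination one junk_key, and replace the flag columns on the fact with that key. This is a minimal in-memory sketch assuming events arrive as Python dicts; the sample values (such as "low" for payment_risk_flag) are hypothetical, and keys are numbered in first-seen order, so they are only reproducible if the input order is.

```python
from typing import Dict, List, Tuple

# Flag columns moved off fact_order_event into dim_junk_order_flags.
JUNK_COLUMNS = ("is_gift", "is_expedited", "payment_risk_flag",
                "coupon_applied", "channel_type")


def build_junk_dimension(events: List[dict]) -> Tuple[Dict[int, dict], List[dict]]:
    """Deduplicate flag combinations, assign junk_keys, and slim the fact rows."""
    combo_to_key: Dict[tuple, int] = {}
    slim_facts: List[dict] = []
    for event in events:
        combo = tuple(event[col] for col in JUNK_COLUMNS)
        # First time a combination appears, give it the next integer key.
        if combo not in combo_to_key:
            combo_to_key[combo] = len(combo_to_key) + 1
        # Keep every non-flag column and replace the flags with one junk_key.
        slim = {k: v for k, v in event.items() if k not in JUNK_COLUMNS}
        slim["junk_key"] = combo_to_key[combo]
        slim_facts.append(slim)
    # One dimension row per distinct combination, keyed by junk_key.
    dim = {key: dict(zip(JUNK_COLUMNS, combo)) for combo, key in combo_to_key.items()}
    return dim, slim_facts


# Hypothetical sample events: events 1 and 2 share the same flags.
flags_a = {"is_gift": True, "is_expedited": False, "payment_risk_flag": "low",
           "coupon_applied": False, "channel_type": "web"}
flags_b = {**flags_a, "is_expedited": True, "channel_type": "app"}
events = [
    {"event_id": 1, "date_key": 20240101, "customer_key": 7, "event_amount": 25.0, **flags_a},
    {"event_id": 2, "date_key": 20240101, "customer_key": 9, "event_amount": 12.5, **flags_a},
    {"event_id": 3, "date_key": 20240102, "customer_key": 7, "event_amount": 40.0, **flags_b},
]
dim, facts = build_junk_dimension(events)
print(len(dim), [f["junk_key"] for f in facts])  # 2 [1, 1, 2]
```

Three fact rows collapse onto two junk rows because events 1 and 2 share a combination; in a warehouse the same idea is usually a SELECT DISTINCT over the flag columns followed by a join back to the fact.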
Checkout example, low-cardinality order attributes:
- gift flag
- coupon flag
- fraud risk bucket
- checkout platform (web/app)
These are grouped into a junk dimension so the order fact carries one junk_key instead of four flag columns, and every report decodes a given flag combination from the same dimension row.
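When every attribute has a small, known domain, the junk dimension can be pre-populated with the full Cartesian product instead of being built from observed data. The sketch below assumes hypothetical domains for the four checkout attributes (in particular the three fraud-risk buckets and the web/app platform values); the real value lists come from the source system.

```python
from itertools import product
from typing import Dict, List, Sequence

# Hypothetical domains for the checkout attributes above; the real fraud-risk
# buckets and platform values depend on the source system.
CHECKOUT_DOMAINS: Dict[str, Sequence] = {
    "gift_flag": (False, True),
    "coupon_flag": (False, True),
    "fraud_risk_bucket": ("low", "medium", "high"),
    "checkout_platform": ("web", "app"),
}


def prepopulate_junk_rows(domains: Dict[str, Sequence]) -> List[dict]:
    """Enumerate every combination (Cartesian product) and number it from 1."""
    names = list(domains)
    combos = product(*(domains[name] for name in names))
    return [{"junk_key": key, **dict(zip(names, combo))}
            for key, combo in enumerate(combos, start=1)]


rows = prepopulate_junk_rows(CHECKOUT_DOMAINS)
print(len(rows))  # 24, i.e. 2 * 2 * 3 * 2
print(rows[0])
```

Pre-populating keeps the dimension complete before any fact arrives, but the numbering depends on the declared order of attributes and values, so adding a new value later shifts keys unless new combinations are appended rather than regenerated.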
Ride-hailing example, trip-level flags:
- pooled ride indicator
- surge-applied flag
- airport-pickup flag
Streaming example, playback flags:
- autoplay
- subtitles enabled
- HDR enabled
- connection quality bucket
Use a junk dimension when:
- Multiple unrelated low-cardinality attributes exist
- The attributes are frequently used together in filters or grouping
- You need to reduce repeated columns in large fact tables
Avoid a junk dimension (or keep the attribute out of it) when:
- An attribute has high cardinality (it should be a separate dimension)
- The attributes describe a clear business entity that deserves its own dimension
- The number of combinations grows too large to manage
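The criteria above can be screened with simple arithmetic: any attribute with many distinct values is pulled out as its own dimension, and the worst-case junk row count for the rest is the product of their cardinalities. This is a sketch under stated assumptions: the thresholds (50 values per attribute, 10,000 junk rows) are arbitrary placeholders, and the count of four connection-quality buckets is hypothetical.

```python
from math import prod
from typing import Dict, List, Tuple


def review_junk_candidates(
    cardinalities: Dict[str, int],
    max_attr_values: int = 50,
    max_rows: int = 10_000,
) -> Tuple[List[str], int, bool]:
    """Screen attributes for a junk dimension.

    Returns (attributes to move to their own dimension,
             worst-case junk row count for the remaining attributes,
             whether that worst case fits under max_rows).
    """
    standalone = sorted(a for a, n in cardinalities.items() if n > max_attr_values)
    kept = [n for a, n in cardinalities.items() if a not in standalone]
    upper_bound = prod(kept)  # every combination of the kept attributes
    return standalone, upper_bound, upper_bound <= max_rows


# Playback flags; the 4 connection-quality buckets are a hypothetical count.
playback = {"autoplay": 2, "subtitles_enabled": 2, "hdr_enabled": 2,
            "connection_quality_bucket": 4}
print(review_junk_candidates(playback))  # ([], 32, True)
print(review_junk_candidates({**playback, "device_id": 1_000_000}))  # (['device_id'], 32, True)
print(review_junk_candidates({f"flag_{i}": 2 for i in range(25)}))  # ([], 33554432, False)
```

The bound is a worst case: real data often uses far fewer combinations, so exceeding it is a prompt to look at observed counts or split the attributes into two junk dimensions, not proof that the design fails.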
Benefits:
- Reduces fact table width
- Cleaner schema with one FK instead of many flags
- Better reusability and governance of flag combinations
- Can improve compression and query ergonomics
Trade-offs:
- Harder to understand without documentation
- Combination cardinality can grow quickly, up to the product of each attribute's distinct-value count
- ETL must maintain deterministic combination-to-key mapping
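One way to keep the combination-to-key mapping deterministic across loads is to treat existing keys as immutable and append unseen combinations after the current maximum key, sorting them first so a rerun of the same batch issues the same keys. This sketch assumes the persisted mapping (in practice, the junk dimension table itself) is loaded before each run; hashing the normalized combination into a key is a common alternative that is not shown here.

```python
from typing import Dict, Hashable, Iterable, Tuple

Combo = Tuple[Hashable, ...]  # one value per junk attribute, fixed column order


def assign_junk_keys(existing: Dict[Combo, int],
                     batch: Iterable[Combo]) -> Dict[Combo, int]:
    """Return an updated combination -> junk_key map.

    Keys already issued never change; unseen combinations get new keys
    starting after the current maximum.
    """
    mapping = dict(existing)  # leave the caller's map untouched
    next_key = max(mapping.values(), default=0) + 1
    # Sort unseen combinations so rerunning the same batch yields the same keys.
    unseen = sorted(set(batch).difference(mapping), key=repr)
    for combo in unseen:
        mapping[combo] = next_key
        next_key += 1
    return mapping


existing = {(True, "web"): 1, (False, "web"): 2}
batch = [(False, "app"), (True, "web"), (True, "app"), (False, "app")]
updated = assign_junk_keys(existing, batch)
print(updated[(True, "web")], updated[(False, "app")], updated[(True, "app")])  # 1 3 4
```

Because duplicates inside a batch collapse in the set and known combinations are skipped, the function is idempotent: applying it again to its own output with the same batch changes nothing.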
Common mistakes:
- Putting high-cardinality attributes in a junk dimension
- Not deduplicating combinations, so several junk rows carry identical flag values
- Letting one junk dimension grow across unrelated domains
- Missing an “unknown/default” junk member
- Not documenting attribute semantics for analysts
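The unknown/default mistake can be avoided by seeding every junk dimension with a reserved all-unknown row and normalizing raw values before lookup, so missing or invalid inputs map to "unknown" instead of producing a null foreign key or a near-duplicate row. In this sketch the reserved key 0, the "unknown" label, and the two-attribute domain are hypothetical choices.

```python
from typing import Dict, Tuple

UNKNOWN = "unknown"
UNKNOWN_KEY = 0  # hypothetical reserved key; -1 is another common choice

# Hypothetical allowed values per attribute, in fixed column order.
DOMAINS: Dict[str, Tuple] = {
    "gift_flag": (True, False),
    "checkout_platform": ("web", "app"),
}


def normalize(raw: dict) -> Tuple:
    """Map each attribute to an allowed value, or to UNKNOWN if missing/invalid."""
    combo = []
    for attr, allowed in DOMAINS.items():
        value = raw.get(attr)
        if isinstance(value, str):
            value = value.strip().lower()  # " Web " and "web" become one value
        combo.append(value if value in allowed else UNKNOWN)
    return tuple(combo)


def seed_dimension() -> Dict[Tuple, int]:
    """Every junk dimension starts with the reserved all-unknown member."""
    return {tuple(UNKNOWN for _ in DOMAINS): UNKNOWN_KEY}


def resolve_key(raw: dict, mapping: Dict[Tuple, int]) -> int:
    """Return the junk_key for a raw record, adding unseen combinations."""
    combo = normalize(raw)
    if combo not in mapping:
        mapping[combo] = max(mapping.values()) + 1
    return mapping[combo]


mapping = seed_dimension()
print(resolve_key({}, mapping))  # 0
print(resolve_key({"gift_flag": True, "checkout_platform": " Web "}, mapping))  # 1
print(resolve_key({"gift_flag": "yes", "checkout_platform": "app"}, mapping))  # 2
```

A record with no usable flags lands on the reserved row, while a partially valid record gets its own row with only the bad attribute marked unknown, which keeps the known information queryable.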
Best practices:
- Keep junk dimensions small and index them on junk_key
- Deduplicate combinations before key generation
- Monitor cardinality growth; split if necessary
- Take advantage of dictionary encoding and compression in columnar storage
- Materialize decoded views (fact joined to the junk dimension) so BI users see readable attributes
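Cardinality monitoring can be a post-load check that compares junk row counts between loads and flags any attribute with too many distinct values, the usual cause of the "100 to 2M" growth raised in the interview questions. This is a hedged sketch: the 2x growth limit and 50-value attribute limit are placeholders, and the leaked session_id column is a hypothetical example.

```python
from typing import Dict, List


def cardinality_alerts(prev_rows: int, curr_rows: int,
                       distinct_per_attr: Dict[str, int],
                       max_growth: float = 2.0,
                       max_attr_values: int = 50) -> List[str]:
    """Flag suspicious junk-dimension growth between two loads."""
    alerts = []
    # Sudden row growth usually means a new or un-normalized attribute value.
    if prev_rows > 0 and curr_rows / prev_rows > max_growth:
        alerts.append(f"junk rows grew {curr_rows / prev_rows:.1f}x "
                      f"({prev_rows} -> {curr_rows})")
    # A single attribute with many distinct values belongs in its own dimension.
    for attr, n in sorted(distinct_per_attr.items()):
        if n > max_attr_values:
            alerts.append(f"{attr}: {n} distinct values; consider a separate dimension")
    return alerts


# Hypothetical load where a session_id column leaked into the junk dimension.
for alert in cardinality_alerts(100, 2_000_000, {"channel_type": 3, "session_id": 1_900_000}):
    print(alert)
```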
Interview questions:
- What is a junk dimension and why use it?
- How is a junk dimension different from a degenerate dimension?
- What attributes should not go into a junk dimension?
- Your fact has 25 boolean flags. What design do you propose?
- Junk dimension cardinality grew from 100 to 2M. What went wrong?
- Analysts complain junk_key is opaque. How do you improve usability?
- Design a junk dimension for Amazon checkout event flags.
- Design junk flags for Uber trip quality/risk indicators.
- Design playback junk attributes for Netflix viewing events.
- How do you generate stable junk keys in ETL?
- When do you split one junk dimension into two?
- How do you test unknown/default flag handling?