🧩 Junk Dimension

1) Concept Explanation

A junk dimension groups multiple low-cardinality flags/indicators into one compact dimension table.
Instead of storing many repetitive yes/no/status columns in a fact table, store a single junk_key.

Interview framing:

Junk dimensions reduce fact width and repetition
Useful for miscellaneous attributes that do not deserve standalone dimensions
Common in clickstream, order events, support tickets

2) Text-Based Diagrams

2.1 Without junk dimension

fact_order_event
----------------
event_id
date_key
customer_key
is_gift
is_expedited
payment_risk_flag
coupon_applied
channel_type
...

2.2 With junk dimension

fact_order_event
----------------
event_id
date_key
customer_key
junk_key
event_amount

dim_junk_order_flags
--------------------
junk_key
is_gift
is_expedited
payment_risk_flag
coupon_applied
channel_type

3) Real-World Use Case

Amazon checkout analytics

Low-cardinality order attributes:

gift flag
coupon flag
fraud risk bucket
checkout platform (web/app)

These are grouped into a junk dimension to simplify fact design and improve model consistency.

Uber

Trip-level flags:

pooled ride indicator
surge-applied flag
airport-pickup flag

Netflix

Playback flags:

autoplay
subtitles enabled
HDR enabled
connection quality bucket

4) When to Use / When NOT to Use

Use when

Multiple unrelated low-cardinality attributes exist
Attributes are frequently used together in filters/grouping
Need to reduce repeated columns in large facts

Avoid when

Attribute has high cardinality (should be separate dimension)
Attributes represent clear business entity requiring its own dimension
Combination explosion becomes too large/unmanageable

5) Advantages & Disadvantages

Advantages

Reduces fact table width
Cleaner schema with one FK instead of many flags
Better reusability and governance of flag combinations
Can improve compression and query ergonomics

Disadvantages

Harder to understand without documentation
Combination cardinality can grow quickly
ETL must maintain deterministic combination-to-key mapping

6) Common Mistakes

Putting high-cardinality attributes in junk dimension
No deduping of combinations (duplicate rows with same flags)
Letting one junk dimension grow across unrelated domains
Missing “unknown/default” junk member
Not documenting attribute semantics for analysts

7) Performance Considerations

Keep junk dimensions small and indexed by junk_key
Deduplicate combinations before key generation
Monitor cardinality growth; split if necessary
Use dictionary encoding/compression benefits
Materialize common decoded views for BI ease

8) 🔥 Interview Questions

Conceptual

What is a junk dimension and why use it?
How is junk dimension different from degenerate dimension?
What attributes should not go into a junk dimension?

Scenario-based

Your fact has 25 boolean flags. What design do you propose?
Junk dimension cardinality grew from 100 to 2M. What went wrong?
Analysts complain junk_key is opaque. How do you improve usability?

Product-based

Design junk dimension for Amazon checkout event flags.
Design junk flags for Uber trip quality/risk indicators.
Design playback junk attributes for Netflix viewing events.

Follow-ups

How do you generate stable junk keys in ETL?
When do you split one junk dimension into two?
How do you test unknown/default flag handling?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🧩 Junk Dimension

1) Concept Explanation

2) Text-Based Diagrams

2.1 Without junk dimension

2.2 With junk dimension

3) Real-World Use Case

Amazon checkout analytics

Uber

Netflix

4) When to Use / When NOT to Use

Use when

Avoid when

5) Advantages & Disadvantages

Advantages

Disadvantages

6) Common Mistakes

7) Performance Considerations

8) 🔥 Interview Questions

Conceptual

Scenario-based

Product-based

Follow-ups

FilesExpand file tree

junk_dimension.md

Latest commit

History

junk_dimension.md

File metadata and controls

🧩 Junk Dimension

1) Concept Explanation

2) Text-Based Diagrams

2.1 Without junk dimension

2.2 With junk dimension

3) Real-World Use Case

Amazon checkout analytics

Uber

Netflix

4) When to Use / When NOT to Use

Use when

Avoid when

5) Advantages & Disadvantages

Advantages

Disadvantages

6) Common Mistakes

7) Performance Considerations

8) 🔥 Interview Questions

Conceptual

Scenario-based

Product-based

Follow-ups