Database Normalization is the process of organizing data in a relational database to reduce redundancy, prevent anomalies, and ensure data integrity.
The goal is to structure tables in a way that:
- Avoids storing duplicate or inconsistent data
- Supports reliable insert, update, and delete operations
- Improves data accuracy
- Makes the relational model clear and optimized
Normalization breaks a large, unstructured table into smaller, well-defined tables connected by relationships (foreign keys).
Duplicate data wastes space and increases the risk of inconsistency.
Bad table design leads to faulty behavior:

- Insert anomaly: data cannot be inserted because other, unrelated data is required
- Update anomaly: data is updated in one place but not others, leaving the database inconsistent
- Delete anomaly: deleting a record accidentally removes other useful information
A well-normalized design stores each fact once and only once, makes the schema easy to update, extend, and understand, and reduces the risk of ambiguity and conflicting values.
Normalization mainly fixes three anomalies:
Update anomaly: occurs when the same data is duplicated and needs multiple updates.
Example:
| Student | Course | Instructor | InstructorPhone |
|---|---|---|---|
| John | DBMS | Prof. Y | 99988 |
| Mary | DBMS | Prof. Y | 99988 |
If Prof. Y changes phone number → must update in multiple rows.
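The update anomaly can be made concrete with a small sketch using Python's built-in `sqlite3` module. The table and column names are illustrative, chosen to mirror the example above:

```python
import sqlite3

# In-memory database with the denormalized table from the example above.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE enrollment (
        student TEXT, course TEXT, instructor TEXT, instructor_phone TEXT
    )
""")
con.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?, ?)",
    [("John", "DBMS", "Prof. Y", "99988"),
     ("Mary", "DBMS", "Prof. Y", "99988")],
)

# Changing Prof. Y's phone requires touching every duplicated row;
# missing even one row would leave the database inconsistent.
cur = con.execute(
    "UPDATE enrollment SET instructor_phone = ? WHERE instructor = ?",
    ("55511", "Prof. Y"),
)
rows_touched = cur.rowcount  # two rows changed for a single fact
phones = {row[0] for row in con.execute("SELECT instructor_phone FROM enrollment")}
```

The single fact "Prof. Y's phone number" required two row updates because it is stored twice.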
Insert anomaly: occurs when data cannot be inserted because other required fields are missing.
Example:
A student cannot be added unless they enroll in a course.
Delete anomaly: deleting one piece of data removes other valuable information.
Example: If the only student enrolled in DBMS drops out → course info is lost.
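The delete anomaly is easy to reproduce with the same illustrative single-table design, where course information lives only on enrollment rows:

```python
import sqlite3

# Single-table design: the instructor fact exists only on enrollment rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE enrollment (student TEXT, course TEXT, instructor TEXT)")
con.execute("INSERT INTO enrollment VALUES ('John', 'DBMS', 'Prof. Y')")

# John is the only DBMS student; dropping him also deletes the only
# record of who teaches DBMS (the delete anomaly).
con.execute("DELETE FROM enrollment WHERE student = 'John'")
remaining = con.execute(
    "SELECT instructor FROM enrollment WHERE course = 'DBMS'"
).fetchall()
# remaining is now empty: the instructor fact is gone
```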
Below are the standard normal forms used in database design.
First Normal Form (1NF):

- Each column must contain atomic (indivisible) values
- No repeating groups or arrays
- Each row must be unique
| Student | Subjects |
|---|---|
| John | Math, Physics |
Split multi-valued columns into separate rows:
| Student | Subject |
|---|---|
| John | Math |
| John | Physics |
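The 1NF split above is just flattening a multi-valued column into one row per value. A minimal sketch, using the sample data from the table:

```python
# Turning a multi-valued "Subjects" column into atomic rows (1NF).
raw = [("John", "Math, Physics")]

atomic = [
    (student, subject.strip())       # strip() removes the space after the comma
    for student, subjects in raw
    for subject in subjects.split(",")
]
# atomic == [("John", "Math"), ("John", "Physics")]
```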
Second Normal Form (2NF):

- Must already satisfy 1NF
- No partial dependency: a non-key column must not depend on only part of a composite primary key

Partial dependencies can only arise when the table has a composite primary key.
Composite key: (Student, Course)
| Student | Course | Instructor |
|---|---|---|
| John | DBMS | Prof. Y |
Issue: Instructor depends only on Course (partial dependency).
Split the table:

Table 1: Enrollment

| Student | Course |
|---|---|
| John | DBMS |

Table 2: Course Info

| Course | Instructor |
|---|---|
| DBMS | Prof. Y |
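The 2NF decomposition can be sketched in SQLite (table names are illustrative): the instructor moves into a table keyed by course, so it is stored once per course rather than once per enrollment, and a join reconstructs the original wide rows.

```python
import sqlite3

# 2NF decomposition: Instructor depends only on Course, so it gets its
# own table keyed by Course.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE course_info (course TEXT PRIMARY KEY, instructor TEXT);
    CREATE TABLE enrollment (
        student TEXT,
        course  TEXT REFERENCES course_info(course),
        PRIMARY KEY (student, course)
    );
    INSERT INTO course_info VALUES ('DBMS', 'Prof. Y');
    INSERT INTO enrollment VALUES ('John', 'DBMS'), ('Mary', 'DBMS');
""")

# A join reconstructs the original wide table without duplication.
rows = con.execute("""
    SELECT e.student, e.course, c.instructor
    FROM enrollment e JOIN course_info c ON e.course = c.course
    ORDER BY e.student
""").fetchall()
```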
Third Normal Form (3NF):

- Must already satisfy 2NF
- No transitive dependencies: non-key columns should not depend on other non-key columns
| Student | City | Zipcode |
|---|---|---|
Issue: City depends on Zipcode, not on Student.
Split the tables:

Table 1: Students

| Student | Zipcode |
|---|---|

Table 2: Locations

| Zipcode | City |
|---|---|
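A sketch of the 3NF decomposition, with illustrative sample data ('10001' / 'New York' are assumptions, not from the example): once City lives in a locations table keyed by Zipcode, renaming a city is a single-row update no matter how many students share the zipcode.

```python
import sqlite3

# 3NF decomposition: City depends on Zipcode, so it moves to a
# locations table; students keep only the zipcode.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE locations (zipcode TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE students (
        student TEXT PRIMARY KEY,
        zipcode TEXT REFERENCES locations(zipcode)
    );
    INSERT INTO locations VALUES ('10001', 'New York');
    INSERT INTO students VALUES ('John', '10001'), ('Mary', '10001');
""")

# Renaming a city touches exactly one row, however many students live there.
cur = con.execute("UPDATE locations SET city = 'NYC' WHERE zipcode = '10001'")
rows_touched = cur.rowcount
```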
Boyce-Codd Normal Form (BCNF):

- A stronger version of 3NF
- Every determinant must be a candidate key
| Course | Instructor | Room |
|---|---|---|
Assume one instructor teaches multiple courses but always in the same room. Then:
Instructor → Room, but Instructor is not a candidate key.
Split into:

Instructor Rooms:

| Instructor | Room |
|---|---|

Course Info:

| Course | Instructor |
|---|---|
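After the BCNF split, the dependency Instructor → Room is captured structurally: Instructor is the key of its own table. A small sketch with assumed sample data ('R101', a second 'SQL' course):

```python
# BCNF decomposition as plain mappings: the FD Instructor -> Room holds
# because Instructor is the key of its own table.
instructor_rooms = {"Prof. Y": "R101"}                # Instructor -> Room
course_info = {"DBMS": "Prof. Y", "SQL": "Prof. Y"}   # Course -> Instructor

# Reconstruct the original (Course, Instructor, Room) rows by lookup.
rows = [
    (course, instructor, instructor_rooms[instructor])
    for course, instructor in sorted(course_info.items())
]
# Every course taught by Prof. Y automatically shows the same room;
# the room is stored once and cannot become inconsistent.
```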
Fourth Normal Form (4NF):

- Must already satisfy BCNF
- Must have no non-trivial multi-valued dependencies
A student can have multiple hobbies and multiple phone numbers.
| Student | Hobby | Phone |
|---|---|---|
This mixes two independent multi-valued attributes.
Split into:

Student Hobbies:

| Student | Hobby |
|---|---|

Student Phones:

| Student | Phone |
|---|---|
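Why the single-table design is wasteful can be shown by counting rows; the hobby and phone values below are assumed sample data. Because hobbies and phones are independent, one table must hold their full cross product:

```python
# 4NF sketch: hobbies and phones are independent multi-valued facts.
hobbies = [("John", "Chess"), ("John", "Tennis")]
phones = [("John", "111"), ("John", "222")]

# The single-table design needs every hobby x phone combination per student:
single_table = [
    (s, hobby, phone)
    for s, hobby in hobbies
    for t, phone in phones
    if s == t
]
# 2 hobbies x 2 phones -> 4 rows, while the two split tables state the
# same 4 independent facts in 2 + 2 rows.
```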
Fifth Normal Form (5NF):

- No join dependencies
- Data should not require complex joins to reconstruct valid information

Rarely needed in practice; it appears mainly in specialized designs where every last redundancy must be eliminated.
| Normal Form | Requirement | Purpose |
|---|---|---|
| 1NF | Atomic values, no repeating groups | Removes multi-valued attributes |
| 2NF | No partial dependency | Eliminates redundancy in composite keys |
| 3NF | No transitive dependency | Avoids indirect dependency errors |
| BCNF | Determinant must be a key | Stronger version of 3NF |
| 4NF | No multi-valued dependencies | Avoids independent repeating groups |
| 5NF | No join dependency | Eliminates redundancy from join dependencies |
| 6NF | No non-trivial join dependencies under any condition | Used in temporal databases |
Normalize when:

- Data integrity is critical
- Writes are frequent
- You run OLTP systems (banking, finance, transactions)
- Avoiding anomalies is important
Denormalize when:

- Reads dominate (e.g., 90% reads, 10% writes)
- You want faster query performance
- You use distributed systems where joins are expensive
- Caching or precomputation is used
Most real-world systems use normalized writes + denormalized reads through caching, materialized views, and search engines.
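One lightweight version of "normalized writes, denormalized reads" can be sketched with a SQL view (in a production system this might instead be a materialized view, cache, or search index; the schema here is illustrative):

```python
import sqlite3

# Base tables stay normalized; a view serves wide, join-free-looking reads.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE course_info (course TEXT PRIMARY KEY, instructor TEXT);
    CREATE TABLE enrollment (
        student TEXT, course TEXT,
        PRIMARY KEY (student, course)
    );
    INSERT INTO course_info VALUES ('DBMS', 'Prof. Y');
    INSERT INTO enrollment VALUES ('John', 'DBMS');

    -- Read side: the join is defined once, not repeated in every query.
    CREATE VIEW enrollment_wide AS
        SELECT e.student, e.course, c.instructor
        FROM enrollment e JOIN course_info c ON e.course = c.course;
""")

wide = con.execute("SELECT * FROM enrollment_wide").fetchall()
```

Writes still target the normalized base tables, so update anomalies cannot occur, while readers query the precomposed wide shape.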
Typically normalized:

- Banking systems
- Inventory management
- ERP systems
Typically denormalized:

- Social media feeds
- E-commerce product catalogs
- Analytics dashboards
1. What problem does normalization solve?
Normalization solves data redundancy, prevents anomalies (insert, update, delete), and ensures data integrity by organizing data into well-structured tables.
2. Difference between 3NF and BCNF?
3NF allows some functional dependencies where non-key attributes depend on other non-key attributes, while BCNF requires that every determinant is a candidate key, eliminating all such dependencies.
3. What is a functional dependency?
A functional dependency is a relationship between attributes in which the value of one attribute or set of attributes (the determinant) uniquely determines the value of another attribute; the most common case is a primary key determining the non-key attributes.
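The definition "X determines Y" can be checked mechanically: any two rows that agree on X must also agree on Y. A minimal, hypothetical helper over plain tuples (the zipcode/city sample data is assumed):

```python
def holds(rows, x, y):
    """Return True if column x functionally determines column y."""
    seen = {}
    for row in rows:
        # setdefault records the first y seen for this x; any later
        # mismatch means two rows agree on x but not on y.
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True

rows = [
    ("10001", "New York"),
    ("10002", "New York"),
    ("94105", "SF"),
]
zip_determines_city = holds(rows, 0, 1)   # each zipcode maps to one city
city_determines_zip = holds(rows, 1, 0)   # "New York" maps to two zipcodes
```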
4. What is partial dependency?
A partial dependency occurs when a non-key attribute is dependent on part of a composite primary key, rather than the whole key. This violates the rules of 2NF.
5. What is transitive dependency?
A transitive dependency occurs when a non-key attribute depends on another non-key attribute, rather than directly on the primary key. This violates the rules of 3NF.
6. Why don’t we normalize to 5NF always?
5NF is rarely used in practice because it can lead to excessive complexity and performance issues. Most applications do not require the level of normalization that 5NF provides, and it can complicate queries and data retrieval.
