Skip to content

Latest commit

 

History

History
340 lines (193 loc) · 8.26 KB

File metadata and controls

340 lines (193 loc) · 8.26 KB

Database Normalization

Definition

Database Normalization is the process of organizing data in a relational database to reduce redundancy, prevent anomalies, and ensure data integrity.

The goal is to structure tables in a way that:

  • Avoids storing duplicate or inconsistent data

  • Supports reliable insert, update, and delete operations

  • Improves data accuracy

  • Makes relational models clear and optimized

Normalization breaks a large, unstructured table into smaller, well-defined tables connected by relationships (foreign keys).


Goals of Normalization

Normalization aims to create a database structure that:

1. Minimizes redundancy

Duplicate data wastes space and increases risk of inconsistency.

2. Prevents anomalies

Bad table design leads to faulty behaviors:

  • Insert anomaly: Cannot insert data because other unrelated data is required

  • Update anomaly: Updating data in one place but not others → inconsistent database

  • Delete anomaly: Deleting a record removes additional useful information accidentally

3. Guarantees data integrity

Each fact must be stored once and only once.

4. Improves maintainability

Makes schema easy to update, extend, and understand.

5. Enhances query reliability

Reduces risk of ambiguity and conflicting values.


Data Anomalies (Problems Normalization Fixes)

Normalization mainly fixes three anomalies:

1. Update Anomaly

Occurs when the same data is duplicated and needs multiple updates.

Example:

Student Course Instructor InstructorPhone
John DBMS Prof. Y 99988
Mary DBMS Prof. Y 99988

If Prof. Y changes phone number → must update in multiple rows.


2. Insert Anomaly

Occurs when data cannot be inserted because other fields are missing.

Example:

A student cannot be added unless they enroll in a course.


3. Delete Anomaly

Deleting one piece of data removes valuable information.

Example: If the only student enrolled in DBMS drops out → course info is lost.


Types of Normal Forms

Normal Forms

Below are the official normalization steps (Normal Forms) used in database design.

1. First Normal Form (1NF)

Rule

  • Each column must contain atomic (indivisible) values

  • No repeating groups or arrays

  • Each row must be unique

Example (Not in 1NF)

Student Subjects
John Math, Physics

1NF Conversion

Split multi-valued columns into separate rows:

Student Subject
John Math
John Physics

2. Second Normal Form (2NF)

Rule

  • Must already satisfy 1NF

  • No partial dependency → A non-key column must not depend on part of a composite primary key

When It Applies

Only applies if the table has a composite primary key.

Example (Not in 2NF)

Composite key: (Student, Course)

Student Course Instructor
John DBMS Prof. Y

Issue: Instructor depends only on Course (partial dependency).

2NF Conversion

Split the table:

Table 1: Enrollment

Student Course

Table 2: Course Info | Course | Instructor |


3. Third Normal Form (3NF)

Rule

  • Must already satisfy 2NF
  • No transitive dependencies → Non-key columns should not depend on other non-key columns.

Example (Not in 3NF)

Student City Zipcode

Issue: City depends on Zipcode, not on Student.

3NF Conversion

Split tables:

Table 1: Students | Student | Zipcode |

Table 2: Locations | Zipcode | City |


Boyce–Codd Normal Form (BCNF)

Rule

  • A stronger version of 3NF
  • Every determinant must be a candidate key

Example (Not in BCNF)

Course Instructor Room

Assume one instructor teaches multiple courses but always in the same room. Then:

Instructor → Room but Instructor is not a primary key.

BCNF Conversion

Split into:

Instructor Rooms | Instructor | Room |

Course Info | Course | Instructor |


Fourth Normal Form (4NF)

Rule

  • Must have no multi-valued dependencies

Example (Not in 4NF)

A student can have multiple hobbies and multiple phone numbers.

Student Hobby Phone

This mixes two independent multi-valued attributes.

4NF Conversion

Split into:

Student Hobbies | Student | Hobby |

Student Phones | Student | Phone |


Fifth Normal Form (5NF)

Rule

  • No join dependencies

  • Data should not require complex joins to reconstruct valid information

Used in very high-end OLTP systems to eliminate every redundancy.


Normalization Summary Table

Normal Form Requirement Purpose
1NF Atomic values, no repeating groups Removes multi-valued attributes
2NF No partial dependency Eliminates redundancy in composite keys
3NF No transitive dependency Avoids indirect dependency errors
BCNF Determinant must be a key Stronger version of 3NF
4NF No multi-valued dependencies Avoids independent repeating groups
5NF No join dependency Eliminates over-decomposition problems
6NF No non-trivial join dependencies under any condition Used in temporal databases

When to Normalize vs Denormalize

Normalize when:

  • Data integrity is critical
  • Writes are frequent
  • OLTP systems (banking, finance, transactions)
  • Avoiding anomalies is important

Denormalize when:

  • Reads dominate (e.g., 90% reads, 10% writes)
  • You want faster query performance
  • Using distributed systems where joins are expensive
  • Caching or precomputation is used

Most real-world systems use normalized writes + denormalized reads through caching, materialized views, and search engines.


Real-World Examples

Highly Normalized

  • Banking systems
  • Inventory management
  • ERP systems

Less Normalized / Denormalized

  • Social media feeds
  • E-commerce product catalog
  • Analytics dashboards

Common Interview Questions

1. What problem does normalization solve?

Normalization solves data redundancy, prevents anomalies (insert, update, delete), and ensures data integrity by organizing data into well-structured tables.

-2. Difference between 3NF and BCNF?

3NF allows some functional dependencies where non-key attributes depend on other non-key attributes, while BCNF requires that every determinant is a candidate key, eliminating all such dependencies.

3. What is a functional dependency?

A functional dependency is a relationship between two attributes, typically between a primary key and a non-key attribute, where the value of one attribute (the determinant) uniquely determines the value of another attribute.

4. What is partial dependency?

A partial dependency occurs when a non-key attribute is dependent on part of a composite primary key, rather than the whole key. This violates the rules of 2NF.

5. What is transitive dependency?

A transitive dependency occurs when a non-key attribute depends on another non-key attribute, rather than directly on the primary key. This violates the rules of 3NF.

6. Why don’t we normalize to 5NF always?

5NF is rarely used in practice because it can lead to excessive complexity and performance issues. Most applications do not require the level of normalization that 5NF provides, and it can complicate queries and data retrieval.


Additional Resources