Database Normalization is the process of organizing data in a relational database to reduce redundancy, prevent anomalies, and ensure data integrity.
The goal is to structure tables in a way that:
- Avoids storing duplicate or inconsistent data
- Supports reliable insert, update, and delete operations
- Improves data accuracy
- Makes the relational model clear and optimized
Normalization breaks a large, unstructured table into smaller, well-defined tables connected by relationships (foreign keys).
Duplicate data wastes space and increases the risk of inconsistency.
Bad table design leads to faulty behavior:

- Insert anomaly: data cannot be inserted because other, unrelated data is required
- Update anomaly: data is updated in one place but not others, leaving the database inconsistent
- Delete anomaly: deleting a record accidentally removes other useful information
A well-normalized design stores each fact once and only once, makes the schema easy to update, extend, and understand, and reduces the risk of ambiguity and conflicting values.
Normalization mainly fixes three anomalies:
Update anomaly: occurs when the same data is duplicated and needs multiple updates.
Example:
| Student | Course | Instructor | InstructorPhone |
|---|---|---|---|
| John | DBMS | Prof. Y | 99988 |
| Mary | DBMS | Prof. Y | 99988 |
If Prof. Y changes phone number → must update in multiple rows.
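The update anomaly can be made concrete with a small sketch using Python's built-in `sqlite3` module. The table and column names are illustrative, chosen to mirror the example above:

```python
import sqlite3

# In-memory database with the denormalized table from the example above.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE enrollment (
        student TEXT, course TEXT, instructor TEXT, instructor_phone TEXT
    )
""")
con.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?, ?)",
    [("John", "DBMS", "Prof. Y", "99988"),
     ("Mary", "DBMS", "Prof. Y", "99988")],
)

# Changing Prof. Y's phone requires touching every duplicated row;
# missing even one row would leave the database inconsistent.
cur = con.execute(
    "UPDATE enrollment SET instructor_phone = ? WHERE instructor = ?",
    ("55511", "Prof. Y"),
)
rows_touched = cur.rowcount  # two rows changed for a single fact
phones = {row[0] for row in con.execute("SELECT instructor_phone FROM enrollment")}
```

The single fact "Prof. Y's phone number" required two row updates because it is stored twice.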
Insert anomaly: occurs when data cannot be inserted because other required fields are missing.
Example:
A student cannot be added unless they enroll in a course.
Delete anomaly: deleting one piece of data removes other valuable information.
Example: If the only student enrolled in DBMS drops out → course info is lost.
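The delete anomaly is easy to reproduce with the same illustrative single-table design, where course information lives only on enrollment rows:

```python
import sqlite3

# Single-table design: the instructor fact exists only on enrollment rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE enrollment (student TEXT, course TEXT, instructor TEXT)")
con.execute("INSERT INTO enrollment VALUES ('John', 'DBMS', 'Prof. Y')")

# John is the only DBMS student; dropping him also deletes the only
# record of who teaches DBMS (the delete anomaly).
con.execute("DELETE FROM enrollment WHERE student = 'John'")
remaining = con.execute(
    "SELECT instructor FROM enrollment WHERE course = 'DBMS'"
).fetchall()
# remaining is now empty: the instructor fact is gone
```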
Below are the standard normal forms used in database design.
First Normal Form (1NF):

- Each column must contain atomic (indivisible) values
- No repeating groups or arrays
- Each row must be unique
| Student | Subjects |
|---|---|
| John | Math, Physics |
Split multi-valued columns into separate rows:
| Student | Subject |
|---|---|
| John | Math |
| John | Physics |
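The 1NF split above is just flattening a multi-valued column into one row per value. A minimal sketch, using the sample data from the table:

```python
# Turning a multi-valued "Subjects" column into atomic rows (1NF).
raw = [("John", "Math, Physics")]

atomic = [
    (student, subject.strip())       # strip() removes the space after the comma
    for student, subjects in raw
    for subject in subjects.split(",")
]
# atomic == [("John", "Math"), ("John", "Physics")]
```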
Second Normal Form (2NF):

- Must already satisfy 1NF
- No partial dependency: a non-key column must not depend on only part of a composite primary key

Partial dependencies can only arise when the table has a composite primary key.
Composite key: (Student, Course)
| Student | Course | Instructor |
|---|---|---|
| John | DBMS | Prof. Y |
Issue: Instructor depends only on Course (partial dependency).
Split the table:

Table 1: Enrollment

| Student | Course |
|---|---|
| John | DBMS |

Table 2: Course Info

| Course | Instructor |
|---|---|
| DBMS | Prof. Y |
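The 2NF decomposition can be sketched in SQLite (table names are illustrative): the instructor moves into a table keyed by course, so it is stored once per course rather than once per enrollment, and a join reconstructs the original wide rows.

```python
import sqlite3

# 2NF decomposition: Instructor depends only on Course, so it gets its
# own table keyed by Course.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE course_info (course TEXT PRIMARY KEY, instructor TEXT);
    CREATE TABLE enrollment (
        student TEXT,
        course  TEXT REFERENCES course_info(course),
        PRIMARY KEY (student, course)
    );
    INSERT INTO course_info VALUES ('DBMS', 'Prof. Y');
    INSERT INTO enrollment VALUES ('John', 'DBMS'), ('Mary', 'DBMS');
""")

# A join reconstructs the original wide table without duplication.
rows = con.execute("""
    SELECT e.student, e.course, c.instructor
    FROM enrollment e JOIN course_info c ON e.course = c.course
    ORDER BY e.student
""").fetchall()
```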
Third Normal Form (3NF):

- Must already satisfy 2NF
- No transitive dependencies: non-key columns should not depend on other non-key columns
| Student | City | Zipcode |
|---|---|---|
Issue: City depends on Zipcode, not on Student.
Split the tables:

Table 1: Students

| Student | Zipcode |
|---|---|

Table 2: Locations

| Zipcode | City |
|---|---|
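A sketch of the 3NF decomposition, with illustrative sample data ('10001' / 'New York' are assumptions, not from the example): once City lives in a locations table keyed by Zipcode, renaming a city is a single-row update no matter how many students share the zipcode.

```python
import sqlite3

# 3NF decomposition: City depends on Zipcode, so it moves to a
# locations table; students keep only the zipcode.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE locations (zipcode TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE students (
        student TEXT PRIMARY KEY,
        zipcode TEXT REFERENCES locations(zipcode)
    );
    INSERT INTO locations VALUES ('10001', 'New York');
    INSERT INTO students VALUES ('John', '10001'), ('Mary', '10001');
""")

# Renaming a city touches exactly one row, however many students live there.
cur = con.execute("UPDATE locations SET city = 'NYC' WHERE zipcode = '10001'")
rows_touched = cur.rowcount
```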
Boyce-Codd Normal Form (BCNF):

- A stronger version of 3NF
- Every determinant must be a candidate key
| Course | Instructor | Room |
|---|---|---|
Assume one instructor teaches multiple courses but always in the same room. Then:
Instructor → Room, but Instructor is not a candidate key.
Split into:

Instructor Rooms:

| Instructor | Room |
|---|---|

Course Info:

| Course | Instructor |
|---|---|
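After the BCNF split, the dependency Instructor → Room is captured structurally: Instructor is the key of its own table. A small sketch with assumed sample data ('R101', a second 'SQL' course):

```python
# BCNF decomposition as plain mappings: the FD Instructor -> Room holds
# because Instructor is the key of its own table.
instructor_rooms = {"Prof. Y": "R101"}                # Instructor -> Room
course_info = {"DBMS": "Prof. Y", "SQL": "Prof. Y"}   # Course -> Instructor

# Reconstruct the original (Course, Instructor, Room) rows by lookup.
rows = [
    (course, instructor, instructor_rooms[instructor])
    for course, instructor in sorted(course_info.items())
]
# Every course taught by Prof. Y automatically shows the same room;
# the room is stored once and cannot become inconsistent.
```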
Fourth Normal Form (4NF):

- Must already satisfy BCNF
- Must have no non-trivial multi-valued dependencies
A student can have multiple hobbies and multiple phone numbers.
| Student | Hobby | Phone |
|---|---|---|
This mixes two independent multi-valued attributes.
Split into:

Student Hobbies:

| Student | Hobby |
|---|---|

Student Phones:

| Student | Phone |
|---|---|
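Why the single-table design is wasteful can be shown by counting rows; the hobby and phone values below are assumed sample data. Because hobbies and phones are independent, one table must hold their full cross product:

```python
# 4NF sketch: hobbies and phones are independent multi-valued facts.
hobbies = [("John", "Chess"), ("John", "Tennis")]
phones = [("John", "111"), ("John", "222")]

# The single-table design needs every hobby x phone combination per student:
single_table = [
    (s, hobby, phone)
    for s, hobby in hobbies
    for t, phone in phones
    if s == t
]
# 2 hobbies x 2 phones -> 4 rows, while the two split tables state the
# same 4 independent facts in 2 + 2 rows.
```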
Fifth Normal Form (5NF):

- No join dependencies
- Data should not require complex joins to reconstruct valid information

Rarely needed in practice; it appears mainly in specialized designs where every last redundancy must be eliminated.
| Normal Form | Requirement | Purpose |
|---|---|---|
| 1NF | Atomic values, no repeating groups | Removes multi-valued attributes |
| 2NF | No partial dependency | Eliminates redundancy in composite keys |
| 3NF | No transitive dependency | Avoids indirect dependency errors |
| BCNF | Determinant must be a key | Stronger version of 3NF |
| 4NF | No multi-valued dependencies | Avoids independent repeating groups |
| 5NF | No join dependency | Eliminates redundancy from join dependencies |
| 6NF | No non-trivial join dependencies under any condition | Used in temporal databases |
Normalize when:

- Data integrity is critical
- Writes are frequent
- You run OLTP systems (banking, finance, transactions)
- Avoiding anomalies is important
Denormalize when:

- Reads dominate (e.g., 90% reads, 10% writes)
- You want faster query performance
- You use distributed systems where joins are expensive
- Caching or precomputation is used
Most real-world systems use normalized writes + denormalized reads through caching, materialized views, and search engines.
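One lightweight version of "normalized writes, denormalized reads" can be sketched with a SQL view (in a production system this might instead be a materialized view, cache, or search index; the schema here is illustrative):

```python
import sqlite3

# Base tables stay normalized; a view serves wide, join-free-looking reads.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE course_info (course TEXT PRIMARY KEY, instructor TEXT);
    CREATE TABLE enrollment (
        student TEXT, course TEXT,
        PRIMARY KEY (student, course)
    );
    INSERT INTO course_info VALUES ('DBMS', 'Prof. Y');
    INSERT INTO enrollment VALUES ('John', 'DBMS');

    -- Read side: the join is defined once, not repeated in every query.
    CREATE VIEW enrollment_wide AS
        SELECT e.student, e.course, c.instructor
        FROM enrollment e JOIN course_info c ON e.course = c.course;
""")

wide = con.execute("SELECT * FROM enrollment_wide").fetchall()
```

Writes still target the normalized base tables, so update anomalies cannot occur, while readers query the precomposed wide shape.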
Typically normalized:

- Banking systems
- Inventory management
- ERP systems
Typically denormalized:

- Social media feeds
- E-commerce product catalogs
- Analytics dashboards
1. What problem does normalization solve?
Normalization solves data redundancy, prevents anomalies (insert, update, delete), and ensures data integrity by organizing data into well-structured tables.
2. Difference between 3NF and BCNF?
3NF allows some functional dependencies where non-key attributes depend on other non-key attributes, while BCNF requires that every determinant is a candidate key, eliminating all such dependencies.
3. What is a functional dependency?
A functional dependency is a relationship between attributes in which the value of one attribute or set of attributes (the determinant) uniquely determines the value of another attribute; the most common case is a primary key determining the non-key attributes.
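The definition "X determines Y" can be checked mechanically: any two rows that agree on X must also agree on Y. A minimal, hypothetical helper over plain tuples (the zipcode/city sample data is assumed):

```python
def holds(rows, x, y):
    """Return True if column x functionally determines column y."""
    seen = {}
    for row in rows:
        # setdefault records the first y seen for this x; any later
        # mismatch means two rows agree on x but not on y.
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True

rows = [
    ("10001", "New York"),
    ("10002", "New York"),
    ("94105", "SF"),
]
zip_determines_city = holds(rows, 0, 1)   # each zipcode maps to one city
city_determines_zip = holds(rows, 1, 0)   # "New York" maps to two zipcodes
```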
4. What is partial dependency?
A partial dependency occurs when a non-key attribute is dependent on part of a composite primary key, rather than the whole key. This violates the rules of 2NF.
5. What is transitive dependency?
A transitive dependency occurs when a non-key attribute depends on another non-key attribute, rather than directly on the primary key. This violates the rules of 3NF.
6. Why don’t we normalize to 5NF always?
5NF is rarely used in practice because it can lead to excessive complexity and performance issues. Most applications do not require the level of normalization that 5NF provides, and it can complicate queries and data retrieval.
