Commit 55d3698

clean up the normalization chapter
1 parent 762fb8a commit 55d3698

1 file changed (+3 -77 lines changed)

book/30-schema-design/055-normalization.md

Lines changed: 3 additions & 77 deletions
@@ -179,9 +179,7 @@ DataJoint takes a different, more intuitive approach to normalization that emerg
 
 > **Every relation (table) must represent a well-defined entity type, and all attributes in that table must describe that entity type directly and only that entity type.**
 
-This principle applies to both:
-1. **Schema design** (the structure of your database tables)
-2. **Query results** (the relations returned by queries)
+This principle guides schema design, ensuring that each table represents a coherent entity type.
 
 ### What This Means in Practice
 
@@ -1225,78 +1223,6 @@ class StudentMajor(dj.Manual):
 
 The `declaration_date` is not a property of the student or the major—it's a property of the relationship between them.
 
-## Normalization in Query Results
-
-A unique aspect of DataJoint's normalization principle is that it applies to **query results**, not just schema design.
-
-### Normalized Query Results
-
-When you execute a query in DataJoint, the result should also represent a well-defined entity type:
-
-**Example: Well-normalized query**
-
-```python
-# Query result represents "students with their enrollment count"
-Student.aggr(Enrollment, enrollment_count='COUNT(*)')
-```
-
-Result:
-```
-┌────────────┬──────┬──────────────────┐
-│ student_id │ name │ enrollment_count │
-├────────────┼──────┼──────────────────┤
-│ 1          │ Alice│ 3                │
-│ 2          │ Bob  │ 5                │
-└────────────┴──────┴──────────────────┘
-```
-
-**Entity type**: "Students with enrollment statistics"
-**All attributes**: Describe the student (including computed statistics about that student)
-
-### Why This Matters
-
-Traditional SQL allows queries that mix unrelated entity types:
-
-**Potentially confusing SQL query**:
-```sql
-SELECT
-s.student_id,
-s.name,
-c.course_id,
-c.title,
-d.dept_name,
-p.professor_name
-FROM student s
-JOIN enrollment e ON s.student_id = e.student_id
-JOIN course c ON e.course_id = c.course_id
-JOIN department d ON c.dept_id = d.dept_id
-JOIN professor p ON c.professor_id = p.professor_id
-```
-
-**Question**: What entity type does this result represent?
-- Students? No, multiple rows per student
-- Courses? No, multiple rows per course
-- Enrollments? Closer, but also includes department and professor info
-
-**Answer**: This is a denormalized join mixing multiple entity types. It's valid SQL but doesn't represent a clear entity type.
-
-### DataJoint Encourages Entity-Focused Queries
-
-DataJoint's query language encourages you to think about what entity type you're querying:
-
-```python
-# Clear: Querying students
-Student & 'name LIKE "A%"'
-
-# Clear: Querying enrollments with student and course info
-Enrollment * Student * Course
-
-# Clear: Students with their enrollment count (still student entities)
-Student.aggr(Enrollment, enrollment_count='COUNT(*)')
-```
-
-Each query result has a clear primary key (the entity being queried) and all attributes describe that entity.
-
 ## Practical Normalization Guidelines
 
 When designing DataJoint schemas, follow these practical rules:
@@ -1720,7 +1646,7 @@ class Friendship(dj.Manual):
 - **Method**: Design tables to represent one entity type each
 - **Era**: Post-ER model (leverages conceptual clarity)
 - **Focus**: Semantic meaning of entities and relationships
-- **Unique extension**: Applies to both schema design AND query results
+- **Key principles**: Immutability of tuples, schemas as workflows, permanent vs. changeable attributes
 
 ### The Unified Principle
 
@@ -1731,7 +1657,7 @@ When this principle is followed:
 - Insertion anomalies are eliminated (entities can exist independently)
 - Deletion anomalies are eliminated (deleting one entity doesn't affect others)
 - Schema structure is clear (one entity type per table)
-- Queries are meaningful (results represent coherent entity types)
+- Data integrity is maintained through immutable tuples and explicit dependencies
 
 ### Practical Application
 