Commit f92eb3c

refine the discussion of DataJoint's new paradigm
1 parent 1bc2df8 commit f92eb3c

1 file changed: +40 -24 lines changed

book/20-concepts/03-workflows.md

Lines changed: 40 additions & 24 deletions
Original file line number | Diff line number | Diff line change
@@ -2,37 +2,43 @@
22

33
## The Entity-Workflow Model: A New Paradigm for Relational Databases
44

5-
This book introduces a revolutionary approach to relational database design and implementation: the **Entity-Workflow Model** embodied by DataJoint.
6-
While traditional database education focuses on E.F. Codd's mathematical foundations and the Entity-Relationship Model, the Entity-Workflow Model represents a fundamental evolution that addresses the limitations of both approaches.
5+
The relational data model, while powerful, offers considerable semantic flexibility that can be both a blessing and a curse. This flexibility has led to the development of distinct conceptual frameworks for understanding and applying relational principles in database design and operations. While these approaches share common underlying constructs (tables, data types, primary keys, foreign keys, etc.), they operate on fundamentally different semantics that lead to distinct approaches to database design, data manipulation, and query formation.
76

8-
## The Evolution of Relational Database Thinking
7+
This book introduces a paradigm shift in how we think about relational database design and implementation: the **Entity-Workflow Model**. This model is embodied by DataJoint and affects how we think about database design, data manipulation, and query formation.
98

109
To understand the significance of the Entity-Workflow Model, we must first examine the two dominant paradigms that preceded it and their inherent limitations.
1110

12-
### The Mathematical Foundation: Codd's Predicate Calculus Approach
11+
## The Mathematical Foundation: Codd's Predicate Calculus Approach
1312

14-
Edgar F. Codd's relational model, rooted in **predicate calculus** and **set theory**, treats relations as mathematical predicates—statements about variables that can be determined to be true or false.
13+
### Core Concepts
14+
The **mathematical view** of the relational model, championed by Edgar F. Codd, is rooted in **predicate calculus**, **first-order logic**, and **set theory**. This approach treats relations as mathematical predicates—statements about variables that can be determined to be true or false.
1515

16-
#### Core Concepts
16+
**Relation as Predicate**: In the mathematical view of relational databases, a table (relation) represents a logical predicate; it contains the complete set of all facts (propositions) that make the predicate true. For example, the table "EmployeeProject" represents the predicate "Employee $x$ works on Project $y$."
1717

18-
**Relation as Predicate**: A table represents a logical predicate containing all facts that make the predicate true.
18+
**Tuple as Proposition**: Each row (tuple) is a specific set of attribute values that asserts a true proposition for the predicate. For example, if a table's predicate is "Employee $x$ works on Project $y$," the row `(Alice, P1)` asserts the truth: "Employee Alice works on Project P1."
1919

20-
**Functional Dependencies**: The foundation of normalization theory, where attribute `A` functionally determines attribute `B` (written `A → B`).
20+
**Functional Dependencies between Attributes**: The core concept is the functional dependency: attribute `A` functionally determines attribute `B` (written `A → B`) if knowing the value of `A` allows you to determine the unique value of `B`. For example, the attribute `department` functionally determines the attribute `department_chair` because knowing the department name allows you to determine the unique name of the department chair. Functional dependencies are helpful for reasoning about the structure of the database and for performing queries.
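The functional dependency `A → B` described above can be checked against sample rows. The following is a minimal sketch, not part of DataJoint: the helper `holds` and the `faculty` rows are hypothetical names and data invented for illustration. Note that sample data can only refute a dependency; whether one truly holds is a statement about the domain.

```python
def holds(rows, a, b):
    """Return True if no two rows agree on attribute `a` but differ on `b`.

    `rows` is a list of dicts; `a` and `b` are attribute names.
    (Hypothetical helper for illustration only.)
    """
    seen = {}  # maps each value of `a` to the first value of `b` seen with it
    for row in rows:
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False  # same `a`, different `b`: the dependency is violated
    return True


# Invented sample data: every department has exactly one chair.
faculty = [
    {"name": "Alice", "department": "Physics", "department_chair": "Curie"},
    {"name": "Bob", "department": "Physics", "department_chair": "Curie"},
    {"name": "Carol", "department": "Biology", "department_chair": "Darwin"},
]

fd_dept_chair = holds(faculty, "department", "department_chair")
fd_chair_name = holds(faculty, "department_chair", "name")
print(fd_dept_chair)  # department → department_chair is consistent with the rows
print(fd_chair_name)  # department_chair → name is violated (Curie: Alice and Bob)
```

Here `department → department_chair` is consistent with the rows, while `department_chair → name` is not, because two faculty members share the same chair.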
2121

22-
**Normalization Principle**: "Every non-key attribute must depend on the key, the whole key, and nothing but the key."
22+
The database can then be viewed as a collection of predicates and a minimal complete set of true propositions from which all other true propositions can be derived. Data queries are viewed as logical inferences using the rules of predicate calculus. *Relational algebra* and *relational calculus* provide a set of operations that can be used to perform these inferences. Under the Closed World Assumption (CWA), the database is assumed to contain all true propositions and all other propositions are assumed to be false. CWA is a simplifying assumption that allows us to reason about the data in the database in a more precise way.
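The ideas in this paragraph can be sketched with plain Python sets. This is an illustrative sketch only: the relation `project_manager`, the function `works_on`, and all data values are assumptions made up for the example (only the "EmployeeProject" predicate comes from the text). Each set is a relation, each tuple is a true proposition, membership testing applies the Closed World Assumption, and a join followed by a projection derives new propositions.

```python
# Predicate: "Employee x works on Project y" (each tuple is a true proposition).
employee_project = {("Alice", "P1"), ("Bob", "P1"), ("Bob", "P2")}

# Hypothetical second predicate: "Project y is managed by z".
project_manager = {("P1", "Dana"), ("P2", "Eli")}


def works_on(employee, project):
    """Closed World Assumption: a proposition absent from the relation is false."""
    return (employee, project) in employee_project


# Query as inference: join on the shared project, then project it away,
# deriving the predicate "Employee x works under manager z".
works_under = {
    (employee, manager)
    for (employee, project) in employee_project
    for (managed_project, manager) in project_manager
    if project == managed_project
}

print(works_on("Alice", "P1"))  # a stored proposition
print(works_on("Alice", "P2"))  # not stored, so assumed false under CWA
print(sorted(works_under))
```

The derived relation `works_under` is never stored; it follows from the stored propositions, which is the sense in which a query is a logical inference.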
2323

24-
#### Limitations
24+
The question of database design is then to choose a minimal complete set of true propositions from which all other true propositions can be effectively derived. This is the problem of *database normalization*, a collection of design principles, called *normal forms*, that ensure data integrity and maintainability and make databases more amenable to analysis and inference.
2525

26-
While mathematically rigorous, Codd's approach suffers from several practical limitations:
26+
SQL, the primary language for defining and querying relational databases, is based on the mathematical semantics of the relational model. In practice, however, even the most experienced database programmers rarely rely on these semantics directly. Educational materials typically use more intuitive design methodologies and then teach how to translate the conceptual design into SQL.
27+
28+
### Limitations
29+
30+
While mathematically rigorous, Codd's mathematical semantics approach suffers from several practical limitations:
2731

28-
- **No Diagramming Notation**: Designers must work with abstract dependency analysis without visual representations
2932
- **Abstract Reasoning**: Requires thinking in terms of predicates and functional dependencies rather than intuitive domain concepts
30-
- **Implementation Gap**: SQL's mathematical semantics don't align with how people naturally conceptualize domains
3133
- **Learning Curve**: Demands mastery of formal mathematical concepts that don't map to real-world thinking
34+
- **No Diagramming Notation**: Designers must work with abstract dependency analysis without visual representations
3235

33-
### The Entity-Relationship Revolution: Chen's Domain Modeling
36+
As a result, even the most proficient database programmers are rarely aware of the mathematical semantics of the relational model and design principles such as Codd's normal forms. They rely on more intuitive design methodologies and then translate their designs into SQL.
3437

35-
Peter Chen's Entity-Relationship Model (1976) revolutionized database design by shifting from abstract mathematical concepts to concrete domain modeling: @10.1145/320434.320440, @10.1007/978-3-642-59412-0_17
38+
## The Entity-Relationship Revolution
39+
40+
Today, the most common approaches to database design are based on the Entity-Relationship Model (ERM), which shifts the focus from abstract mathematical reasoning to concrete domain modeling.
41+
Introduced by MIT professor Peter Chen [@10.1145/320434.320440], the ERM models the domain of interest as a collection of entities and relationships between them [@10.1007/978-3-642-59412-0_17].
3642

3743
```{figure} ../images/PChen.jpeg
3844
:name: Peter Chen
@@ -41,20 +47,30 @@ Peter Chen's Entity-Relationship Model (1976) revolutionized database design by
4147
Peter Chen, born in 1943, Taiwanese-American computer scientist, inventor of the Entity-Relationship Model.
4248
```
4349

44-
#### Core Concepts
50+
### Core Concepts
51+
The key insight of the ERM is that the relational model can be viewed through the lens of entities and relationships between them.
52+
Tables in the database represent either well-defined entity sets of the same type or relationships between entities, mapping the database schema to the domain of interest.
53+
Foreign keys between tables define the cardinality and optionality of the relationships between entity sets.
4554

46-
**Entity Set**: An unordered collection of identifiable items that share the same attributes and are distinguished by a primary key.
55+
One of ERM's most significant contributions is the ability to visualize the database schema as an Entity-Relationship Diagram (ERD).
56+
Several different notations have been developed, including Chen's original notation with rectangles and diamonds as well as Crow's Foot notation (see below).
4757

48-
**Entity Normalization Principle**: "Each table represents exactly one well-defined entity type, identified by the table's primary key."
58+
```{figure} ../images/employee-project-erd.svg
59+
:align: center
60+
```
4961

50-
#### Achievements
62+
```{mermaid}
63+
---
64+
title: Crow's Foot notation.
65+
---
66+
erDiagram
67+
EMPLOYEE }o--o{ PROJECT : assigned-to
68+
```
69+
Entity-relationship diagram in [Crow's Foot notation](https://mermaid.js.org/syntax/entityRelationshipDiagram.html).
70+
:::
5171

52-
ERM brought significant advances:
53-
- **Comprehensive Diagramming**: Visual ERDs with multiple notation styles
54-
- **Intuitive Design**: Maps naturally to how people think about domains
55-
- **Dominant Paradigm**: Became the standard for conceptual database design
5672

57-
#### Persistent Limitations
73+
### Limitations
5874

5975
Despite its success, ERM still suffers from fundamental gaps:
6076
