Commit de48fe5

reorganize Concepts
1 parent acadf96 commit de48fe5

3 files changed

+203
-173
lines changed

book/20-concepts/02-relational.md

Lines changed: 202 additions & 0 deletions
@@ -0,0 +1,202 @@
1+
# Relational Model
2+
3+
## Origins of Relational Theory
4+
5+
Relations are a key concept in mathematics, representing how elements of one set are connected to elements of another set. When the elements of two sets are related to each other, this forms a *second-order* or *binary* relation. Higher orders are also possible: third, fourth, and $n^{th}$ order relations.
6+
7+
If you are conversant in Set Theory, then an $n^{th}$ order relation is formally defined as a subset of a Cartesian product of $n$ sets.
8+
Many useful operations can be modeled as operations on such relations.
9+
10+
Imagine two sets: one set representing clinics and another representing animal species.
11+
A relation between these two sets would indicate, for example, which clinics treat which species.
12+
13+
```{figure} ../images/relations.png
14+
:name: relations
15+
:width: 75 %
16+
:alt: mathematical relations
17+
18+
Binary relations are mappings of elements of one set to elements of another set. Higher-order relations map elements of three, four, or more sets.
19+
```
20+
21+
This diagram illustrates two different relations between "Clinics" and "Species."
22+
On the left side, the relation shows Clinic 1 is connected to "goat," Clinic 2 is connected to "dog," and Clinic 3 is connected to both "goat" and "cow."
23+
On the right side, the relation changes: Clinic 1 is now connected to "dog," "horse," and "goat," Clinic 2 is connected to "dog" and "horse," and Clinic 3 is connected to "goat."
24+
Each line connecting elements of the two sets is called a **tuple** and represents an ordered pairing of values from the corresponding domains.
25+
A relation can then be thought of as a set of tuples.
26+
The number of tuples in a relation is called its *cardinality* and the number of domains participating in the relation is its *order*.
27+
This diagram shows binary relations.
28+
Relations can be binary, ternary, or of higher orders.
29+
30+
Mathematically, a relation between two sets $A$ (e.g., *clinics*) and $B$ (e.g., *species*) is a subset of their Cartesian product $A \times B$.
31+
This means the relation is a collection of ordered pairs $(a, b)$, where each $a$ is an element from set $A$, and each $b$ is an element from set $B$.
32+
In the context of the diagram, each pair represents a specific connection, such as (Clinic 1, Dog) or (Clinic 3, Cow).
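
The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming the element names below (they are chosen for the example and are not read from the figure): the Cartesian product holds every possible pair, and a relation is any subset of it.

```python
from itertools import product

# Two domains (sets); the element names are illustrative assumptions.
clinics = {"Clinic 1", "Clinic 2", "Clinic 3"}
species = {"dog", "goat", "cow", "horse"}

# The Cartesian product A x B: every possible ordered pair (a, b).
cartesian = set(product(clinics, species))

# A binary relation is any subset of that product.
treats = {
    ("Clinic 1", "dog"),
    ("Clinic 3", "goat"),
    ("Clinic 3", "cow"),
}

assert treats <= cartesian        # the relation is a subset of A x B
cardinality = len(treats)         # number of tuples in the relation
order = len(next(iter(treats)))   # number of participating domains
print(len(cartesian), cardinality, order)  # 12 3 2
```

Using a Python `set` of tuples mirrors the mathematical definition directly: duplicates are impossible and the tuples have no inherent ordering.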
33+
34+
These relations are not fixed and can change depending on the context or criteria, as shown by the two different values in the diagram. The flexibility and simplicity of relations make them a powerful tool for representing and analyzing connections in various domains.
35+
36+
The concept of relations has a rich history that dates back to the mid-19th century. The groundwork for relational theory was first laid by Augustus De Morgan, an English mathematician and logician, who introduced early ideas related to relations in his work on logic and algebra. De Morgan's contributions were instrumental in setting the stage for the formalization of relations in mathematics.
37+
38+
```{figure} ../images/demorgan.jpg
39+
:name: Augustus De Morgan
40+
:width: 300px
41+
42+
[Augustus De Morgan](https://en.wikipedia.org/wiki/Augustus_De_Morgan) (1806-1871) developed the original fundamental concepts of relational theory, including operations on relations.
43+
```
44+
45+
The development of relational theory as a formal mathematical framework is largely credited to Georg Cantor, a German mathematician, in the late 19th century. Cantor is known as the father of set theory, which is the broader mathematical context in which relations are defined. His work provided a rigorous foundation for understanding how sets (collections of objects) interact with each other through relations.
46+
47+
Cantor's set theory introduced the idea that relations could be seen as subsets of Cartesian products, where the Cartesian product of two sets $A$ and $B$ is the set of all possible ordered pairs $(a, b)$ where $a$ is from $A$ and $b$ is from $B$. This formalization allowed for the systematic study of relations and their properties, leading to the development of modern mathematical logic, database theory, and many other fields.
48+
49+
```{figure} ../images/georg_cantor.jpg
50+
:name: Georg Cantor
51+
:width: 300px
52+
53+
[Georg Cantor](https://en.wikipedia.org/wiki/Georg_Cantor) (1845-1918) reframed relations in the context of Set Theory.
54+
```
55+
56+
## Mathematical Foundations
57+
Relational theory is not just a mathematical curiosity; it is a powerful tool that underpins many important concepts in mathematics and computer science. The ability to describe and analyze how different objects are connected is fundamental to many areas of study.
58+
59+
One of the most important applications of relational theory is in the concept of **functions**. A function is a specific type of relation where each element in the domain (the first set) is associated with exactly one element in the codomain (the second set). Functions are essential in nearly every area of mathematics, from calculus to linear algebra.
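
As a rough sketch, the "exactly one" condition can be checked mechanically. The helper `is_function` and the `located_in` relation with its city names are hypothetical and serve only to illustrate the definition.

```python
def is_function(relation, domain):
    """Return True if every element of `domain` is the first component
    of exactly one pair in `relation`."""
    counts = {a: 0 for a in domain}
    for a, _ in relation:
        if a not in counts:
            return False  # the pair refers to an element outside the domain
        counts[a] += 1
    return all(n == 1 for n in counts.values())


clinics = {"Clinic 1", "Clinic 2", "Clinic 3"}

# Hypothetical relation: each clinic is located in exactly one city.
located_in = {
    ("Clinic 1", "Austin"),
    ("Clinic 2", "Boston"),
    ("Clinic 3", "Austin"),
}

# Clinic 3 is paired with two species, so this relation is not a function.
treats = {
    ("Clinic 1", "dog"),
    ("Clinic 2", "horse"),
    ("Clinic 3", "goat"),
    ("Clinic 3", "cow"),
}

print(is_function(located_in, clinics))  # True
print(is_function(treats, clinics))      # False
```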
60+
61+
Relational theory is a superset of **graph theory**, where relationships between objects can be visualized as graphs.
62+
A directed graph can be thought of as a binary relation between a set of vertices and the same set of vertices again.
63+
Each tuple in such a relation represents an edge in the graph.
64+
Graph theory helps in understanding complex networks, such as social networks, computer networks, and even biological networks.
65+
Thus theorems discovered or proven in relational theory also apply to graphs.
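
To make the correspondence concrete, here is a small sketch in which a directed graph is stored as a binary relation on its own vertex set. The vertex names and the helpers `successors` and `compose` are assumptions made for illustration.

```python
# A directed graph as a binary relation: a subset of vertices x vertices.
vertices = {"a", "b", "c", "d"}
edges = {("a", "b"), ("b", "c"), ("a", "c")}


def successors(edges, v):
    """Vertices reachable from `v` by following a single edge."""
    return {dst for src, dst in edges if src == v}


def compose(r, s):
    """Relational composition: (x, z) whenever (x, y) is in r and (y, z) is in s."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


print(sorted(successors(edges, "a")))  # ['b', 'c']
print(compose(edges, edges))           # {('a', 'c')}: the only two-step path is a -> b -> c
```

Composing the edge relation with itself yields the pairs of vertices connected by a path of length two, which is one example of a general operation on relations carrying over to graphs.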
66+
67+
Relational theory also extends to concepts like **equivalence relations** and **order relations**. Equivalence relations partition a set into disjoint subsets called equivalence classes, while order relations arrange elements in a sequence. These concepts are fundamental in areas such as algebra, topology, and analysis.
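
The partitioning behavior of an equivalence relation can be sketched as follows. The helper `equivalence_classes` is hypothetical, and it assumes that the supplied `related` test really is reflexive, symmetric, and transitive; congruence modulo 3 is used as the example.

```python
def equivalence_classes(elements, related):
    """Partition `elements` into classes under the equivalence test `related`.

    Assumes `related` is a true equivalence relation, so comparing against
    one representative of each class is sufficient.
    """
    classes = []
    for x in elements:
        for cls in classes:
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})  # x starts a new class
    return classes


# Congruence modulo 3 on the integers 0..6.
classes = equivalence_classes(range(7), lambda a, b: a % 3 == b % 3)
print(classes)  # [{0, 3, 6}, {1, 4}, {2, 5}]
```

The resulting classes are disjoint and together cover the whole input set, which is exactly what it means for them to form a partition.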
68+
69+
Relational theory has been shown to be deeply interconnected with **first-order logic** and **predicate calculus** at the foundations of mathematics and logic.
70+
Relational theory, which focuses on the study of relations between elements of sets, forms the basis for the predicates used in first-order logic and predicate calculus.
71+
In first-order logic, predicates represent relations, and the logical statements describe how these relations interact.
72+
The equivalence between relational theory and first-order logic was notably formalized by Alfred Tarski in the 1930s.
73+
Tarski demonstrated that every relation can be described by a formula in first-order logic, establishing a profound connection between these mathematical frameworks that has since underpinned much of modern theoretical computer science and logic.
74+
75+
## Relational Algebra and Calculus
76+
77+
**Relational algebra** is a set of operations that can be used to transform relations in a formal way.
78+
It provides the foundation for querying relational databases, allowing us to combine, modify, and retrieve data stored in tables (relations).
79+
80+
Examples of relational operators:
81+
82+
- **Selection (σ):** Selects tuples from a relation that satisfy a given condition.
83+
- **Projection (π):** Selects specific attributes from a relation.
84+
- **Union (∪):** Combines the tuples from two relations into a single relation.
85+
- **Set Difference (−):** Returns the tuples that are in one relation but not in another.
86+
- **Cartesian Product (×):** Combines every tuple from one relation with every tuple from another.
87+
- **Rename (ρ):** Renames the attributes of a relation.
88+
- **Join (⨝):** Combines related tuples from two relations based on a common attribute.
89+
90+
Such operators together represent an algebra: ways to transform relations into other relations.
91+
Some operators are binary, i.e. they accept two relations as inputs to produce another relation as output.
92+
The operators are *algebraically closed*, i.e. the operators take relations as inputs and produce relations as outputs.
93+
This means elementary operators can be combined in sophisticated ways to compose complex expressions.
94+
**Algebraic closure** is an important concept behind the expressive power of relational operators.
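
A toy sketch of a few operators shows how closure enables composition. It is not how any database engine or DataJoint implements them: the functions `relation`, `select`, `project`, and `natural_join` are hypothetical, each tuple is stored as a `frozenset` of (attribute, value) pairs, and the `diet` relation is invented for the example.

```python
def relation(rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def select(rel, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in rel if predicate(dict(t))}


def project(rel, attrs):
    """Projection: keep only the named attributes; duplicates collapse automatically."""
    return {frozenset((k, v) for k, v in t if k in attrs) for t in rel}


def natural_join(r, s):
    """Join: combine tuples that agree on all attributes they have in common."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[k] == du[k] for k in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out


treats = relation([
    {"clinic": "Clinic 1", "species": "dog"},
    {"clinic": "Clinic 2", "species": "horse"},
    {"clinic": "Clinic 3", "species": "goat"},
    {"clinic": "Clinic 3", "species": "cow"},
])
# Hypothetical second relation, not taken from the text.
diet = relation([
    {"species": "dog", "diet": "omnivore"},
    {"species": "horse", "diet": "herbivore"},
    {"species": "goat", "diet": "herbivore"},
    {"species": "cow", "diet": "herbivore"},
])

# Every operator returns a relation, so the calls nest freely.
result = project(
    select(natural_join(treats, diet), lambda t: t["diet"] == "herbivore"),
    {"clinic"},
)
print(sorted(dict(t)["clinic"] for t in result))  # ['Clinic 2', 'Clinic 3']
```

Because each function both accepts and returns a set of tuples, the output of the join can be passed straight to selection and then to projection without any conversion step.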
95+
96+
To illustrate a relational operator, let’s consider the **union operator (∪)** using the two relational values from the diagram. The union of these two relations would combine all the connections from both diagrams into a single relation.
97+
98+
In the first value (left side of the diagram), we have the following connections:
99+
100+
- Clinic 1 → Dog
101+
- Clinic 2 → Horse
102+
- Clinic 3 → Goat, Cow
103+
104+
In the second value (right side of the diagram), the connections are:
105+
106+
- Clinic 1 → Dog, Cat, Goat
107+
- Clinic 2 → Dog, Horse
108+
- Clinic 3 → Goat
109+
110+
The union of these two relations would include all the connections:
111+
112+
- Clinic 1 → Dog, Cat, Goat
113+
- Clinic 2 → Dog, Horse
114+
- Clinic 3 → Goat, Cow
115+
116+
This operation effectively merges the connections from both sets of values, providing a comprehensive view of all possible relations between clinics and species.
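
The same union can be reproduced with Python sets, using the connections exactly as listed above. This is only a sketch of the operator's behavior on sets of tuples.

```python
# First relational value (connections as listed in the text).
first = {
    ("Clinic 1", "Dog"),
    ("Clinic 2", "Horse"),
    ("Clinic 3", "Goat"),
    ("Clinic 3", "Cow"),
}

# Second relational value.
second = {
    ("Clinic 1", "Dog"),
    ("Clinic 1", "Cat"),
    ("Clinic 1", "Goat"),
    ("Clinic 2", "Dog"),
    ("Clinic 2", "Horse"),
    ("Clinic 3", "Goat"),
}

union = first | second       # tuples present in either relation
difference = first - second  # tuples present only in the first relation

print(len(first), len(second), len(union))  # 4 6 7
print(difference)                           # {('Clinic 3', 'Cow')}
```

The union has 7 tuples rather than 10 because the three tuples shared by both relations appear only once in the result.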
117+
118+
Relational algebra, with its powerful operators, allows us to query and manipulate data in a structured and efficient way, forming the backbone of modern database systems. By understanding and applying these operators, we can perform complex data analysis and retrieval tasks with precision and clarity.
119+
120+
Another formal language for deriving new relations, either from scratch or from other relations, is **relational calculus**.
121+
Rather than using relational operators, it relies on a *set-building notation* to generate relations.
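
Python's set comprehensions closely resemble set-building notation, so they can serve as an informal stand-in for relational calculus. The sketch below assumes the clinic and species values listed for the second relation in the union example; the variable names are illustrative.

```python
treats = {
    ("Clinic 1", "Dog"),
    ("Clinic 1", "Cat"),
    ("Clinic 1", "Goat"),
    ("Clinic 2", "Dog"),
    ("Clinic 2", "Horse"),
    ("Clinic 3", "Goat"),
}

# { c | (c, s) in treats and s = "Goat" }: derived from an existing relation.
goat_clinics = {c for (c, s) in treats if s == "Goat"}

# { (x, y) | x, y in {0, 1, 2, 3} and x < y }: built from scratch.
less_than = {(x, y) for x in range(4) for y in range(4) if x < y}

print(sorted(goat_clinics))  # ['Clinic 1', 'Clinic 3']
print(len(less_than))        # 6
```

Each comprehension states the condition that member tuples must satisfy rather than a sequence of operators to apply, which is the essential difference from the algebraic style.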
122+
123+
:::{note}
124+
The query notation of the SQL programming language combines concepts from both relational algebra and relational calculus.
125+
However, DataJoint's query language is based purely on relational algebra.
126+
:::
127+
128+
## Relational Database Model
129+
The **relational data model** is the brainchild of the British-American mathematician and engineer [Edgar F. Codd](https://amturing.acm.org/award_winners/codd_1000892.cfm), whose work earned him the prestigious Turing Award in 1981.
130+
131+
Working at IBM, Codd explored the possibility of translating the mathematical rigor of relational theory into a powerful system for large-scale data management and operation [@10.1145/362384.362685].
132+
133+
```{figure} ../images/Ted-Codd.jpg
134+
:name: Edgar F. Codd
135+
:width: 300px
136+
137+
[Edgar F. Codd](https://en.wikipedia.org/wiki/Edgar_F._Codd) (1923-2003) revolutionized database theory and practice by translating the mathematical theory of relations to data management and operations.
138+
```
139+
140+
Codd's model was derived from relational theory but differed sufficiently in its basic definitions to give birth to a new type of algebra.
141+
The relational data model gave mathematicians a rigorous theory for optimizing data organization and storage and for constructing queries.
142+
Through the 1970s, before relational databases became practical, theorists derived fundamental rules for rigorous data organization and queries from first principles using mathematical proofs and derivations.
143+
For this reason, early work on relational databases has an abstract academic feel to it with rather simple toy examples: the ubiquitous employees/departments, products/orders, and students/courses.
144+
The design principles were expressed through rigorous but rather abstract rules, the **normal forms** [@10.1145/358024.358054].
145+
146+
The relational data model is one of the most powerful and precise ways to store and manage structured data.
147+
At its core, this model organizes all data into tables (representing mathematical relations), where each table consists of rows (representing mathematical *tuples*) and columns (often called *attributes*).
148+
149+
The relational model is built on several key principles, including:
150+
151+
- **Data Representation:** All data is represented in the form of simple tables, with each table having a unique name and a well-defined structure.
152+
- **Domain Constraints:** Each column in a table is associated with a specific domain (or *datatype*, a set of possible values), ensuring that the data entered is valid.
153+
- **Uniqueness Constraints:** Each row in a table must be unique, which is enforced through a primary key.
154+
- **Referential Constraints:** Relationships between tables must remain consistent, which is enforced through foreign keys.
155+
- **Declarative Queries:** The model allows users to write queries that specify *what* data they want rather than *how* the database will retrieve it.
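
The principles above can be seen in action with Python's built-in `sqlite3` module. This is a minimal sketch, assuming a hypothetical two-table schema (`clinic` and `treats`) and a hypothetical helper `rejected`. Note that SQLite enforces foreign keys only when the pragma is switched on, and it checks declared column types more loosely than most database servers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: named tables, typed columns, primary and foreign keys.
conn.executescript("""
CREATE TABLE clinic (
    clinic_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE treats (
    clinic_id INTEGER NOT NULL REFERENCES clinic (clinic_id),
    species   TEXT NOT NULL,
    PRIMARY KEY (clinic_id, species)
);
""")

conn.execute("INSERT INTO clinic VALUES (1, 'Clinic 1')")
conn.execute("INSERT INTO treats VALUES (1, 'dog')")


def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Uniqueness constraint: the same (clinic_id, species) row cannot be entered twice.
duplicate_rejected = rejected("INSERT INTO treats VALUES (1, 'dog')")
# Referential constraint: clinic 99 does not exist in the clinic table.
orphan_rejected = rejected("INSERT INTO treats VALUES (99, 'cow')")

# Declarative query: states what is wanted, not how to retrieve it.
rows = conn.execute(
    "SELECT name, species FROM clinic JOIN treats USING (clinic_id)"
).fetchall()

print(duplicate_rejected, orphan_rejected, rows)  # True True [('Clinic 1', 'dog')]
```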
156+
157+
The most common way to interact with relational databases is through the Structured Query Language (SQL).
158+
SQL is a language specifically designed to define, manipulate, and query data within relational databases.
159+
It includes sublanguages for defining data structure, manipulating data, and querying data.
160+
161+
When speaking with database programmers and computer scientists, you will often run into different terminologies.
162+
Practical database programmers speak of tables and rows while theoretical data modelers may describe the same things as *relations* and *tuples*.
163+
164+
:::{table} The difference in terminology used in relational theory and relational databases.
165+
:widths: auto
166+
:align: center
167+
| Relational Theory | Database Programming & SQL | Description |
168+
|:--|:--|:--|
169+
| **Relation** | **Table** | A set of tuples that share the same attributes from each of the domains in the relation. |
170+
| **Domain** | **Data Type** | The set of permissible values that an attribute can take in any tuple of a relation. |
171+
| **Attribute** | **Column** | The positional values in the tuples of a relation, drawn from their corresponding domain. |
172+
| **Attribute value** | **Field** | The positional value in a specific tuple. |
173+
| **Tuple** | **Record** or **Row** | A single element of a relation, containing a value for each attribute. |
174+
:::
175+
176+
## Exercises
177+
178+
1. Extend the binary relation `Clinic-Species` to a higher order, e.g. a ternary relation.
179+
180+
:::{hint} Possible solution
181+
182+
Add a third domain, `Treatment`, for the treatments that clinics offer for each species.
183+
This will allow forming a ternary relation `Clinic-Species-Treatment`.
184+
185+
Now think of yet another way to extend the relation to a higher order.
186+
:::
187+
188+
2. Imagine that you have two binary relations: `Clinic-Species` and `Species-Treatment`.
189+
How can these two binary relations be joined into a ternary relation: `Clinic-Species-Treatment`?
190+
What would the rules be for forming this result?
191+
What will be the cardinality (number of tuples) of the result?
192+
193+
3. Imagine that we decide to remove the domain `Species` from the relation `Clinic-Species-Treatment`, producing a new binary relation `Clinic-Treatment`.
194+
What would be the rules for this operation?
195+
How would the cardinality (number of elements) change in the result?
196+
197+
4. Work through the example of a database model in Chen's ER notation in @10.1093/jamia/ocx033.
198+
What are its entities and relationships? Explain what operations this database supports.
199+
200+
5. Work through the example of a multiplayer online role-playing game database model in Chen's ER notation listed on the [ERM Wikipedia page](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model#Crow's_foot_notation).
201+
202+
6. Learn to create diagrams in Crow's Foot notation using Mermaid: https://mermaid.js.org/syntax/entityRelationshipDiagram.html