
Commit 839f963

update the foreign key chapter based on the lecture.
1 parent 2a9ac72 commit 839f963

1 file changed

+84
-71
lines changed

book/30-schema-design/030-foreign-keys.ipynb

Lines changed: 84 additions & 71 deletions
Original file line number | Diff line number | Diff line change
@@ -11,29 +11,22 @@
1111
"cell_type": "markdown",
1212
"metadata": {},
1313
"source": [
14-
"# Referential integrity\n",
14+
"# Modeling Relationships with Foreign Keys\n",
1515
"\n",
16-
"**Referential Integrity** is the guarantee made by the data management process that the entitites represented in the database remain correctly associated and mutually consistent and that relationships between them remain accurate.\n",
16+
"While **entity integrity** ensures that each record uniquely represents a real-world entity, **referential integrity** ensures that the *relationships between* these entities are valid and consistent. It's a guarantee that you won't have an employee assigned to a non-existent department or a task associated with a deleted project.\n",
1717
"\n",
18-
"Referential integrity is predicated on entity integrity. \n",
19-
"Without entity integrity, referential integrity cannot be properly defined nor enforced.\n",
18+
"Crucially, **referential integrity is impossible without entity integrity**. You must first have a reliable way to identify unique entities before you can define their relationships.\n",
2019
"\n",
21-
"# Foreign keys\n",
22-
"In relational databases, referential integrity is defined and enforced by the means of *foreign keys*, which establishes a reltionship between the *child table* that contains the foreign key and the *parent table* that is referenced by the foreign key. \n",
20+
"In relational databases, these relationships are established and enforced using **foreign keys**. A foreign key creates a link between a **child table** (the one with the reference) and a **parent table** (the one being referenced). Think of `Employee` as the child and `Title` as the parent; an employee must have a valid, existing title.\n",
2321
"\n",
24-
"A **foreign key** is a column or several columns in the child table referencing the primary key column(s) in the parent table.\n",
22+
"A foreign key is a column (or set of columns) in the child table that refers to the primary key of the parent table. In DataJoint, a foreign key *always* references a parent's primary key, which is a highly recommended practice for clarity and consistency.\n",
2523
"\n",
26-
"In DataJoint, the foreign *always* references the primary key of the parent table and that's the only way foreign keys are used. \n",
27-
"However, more generally in SQL and relational theory, foreign keys can reference other sets of columns.\n",
28-
"Such uses are esoteric and we avoid using them. \n",
29-
"All foreign key references in this book will reference the primary key of the parent table.\n",
30-
"\n",
31-
"In the following example, we create a foreign key between an employee and their work title. We first define a lookup table `Title` that lists all possible titles and the table `Employee` that containts references `Title`."
24+
"In the following example, we define the parent table `Title` and the child table `Employee`, which references `Title`."
3225
]
3326
},
3427
{
3528
"cell_type": "code",
36-
"execution_count": 1,
29+
"execution_count": 3,
3730
"metadata": {},
3831
"outputs": [
3932
{
@@ -42,14 +35,6 @@
4235
"text": [
4336
"Exception reporting mode: Minimal\n"
4437
]
45-
},
46-
{
47-
"name": "stderr",
48-
"output_type": "stream",
49-
"text": [
50-
"[2024-10-21 04:07:03,488][INFO]: Connecting root@localhost:3306\n",
51-
"[2024-10-21 04:07:03,506][INFO]: Connected root@localhost:3306\n"
52-
]
5338
}
5439
],
5540
"source": [
@@ -77,8 +62,6 @@
7762
" (\"HR-Mgr\", \"Human Resources Manager\")\n",
7863
" ]\n",
7964
"\n",
80-
"\n",
81-
"\n",
8265
"@schema\n",
8366
"class Employee(dj.Manual):\n",
8467
" definition = \"\"\"\n",
@@ -101,46 +84,46 @@
10184
},
10285
{
10386
"cell_type": "code",
104-
"execution_count": 2,
87+
"execution_count": 4,
10588
"metadata": {},
10689
"outputs": [
10790
{
10891
"data": {
10992
"image/svg+xml": [
110-
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"83pt\" height=\"113pt\" viewBox=\"0.00 0.00 83.25 113.12\">\n",
111-
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 109.12)\">\n",
112-
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-109.12 79.25,-109.12 79.25,4 -4,4\"/>\n",
93+
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"84pt\" height=\"114pt\" viewBox=\"0.00 0.00 84.00 114.00\">\n",
94+
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 110)\">\n",
95+
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-110 80,-110 80,4 -4,4\"/>\n",
11396
"<!-- Title -->\n",
11497
"<g id=\"node1\" class=\"node\">\n",
11598
"<title>Title</title>\n",
116-
"<g id=\"a_node1\"><a xlink:title=\"title_code           \r------------------------------\rfull_title           \r\">\n",
117-
"<polygon fill=\"#000000\" fill-opacity=\"0.125490\" stroke=\"none\" points=\"56.5,-105.12 18.75,-105.12 18.75,-70.56 56.5,-70.56 56.5,-105.12\"/>\n",
118-
"<text text-anchor=\"start\" x=\"26.75\" y=\"-85.72\" font-family=\"arial\" text-decoration=\"underline\" font-size=\"10.00\">Title</text>\n",
99+
"<g id=\"a_node1\"><a xlink:title=\"title_code           &#13;------------------------------&#13;full_title           &#13;\">\n",
100+
"<polygon fill=\"#000000\" fill-opacity=\"0.125490\" stroke=\"transparent\" points=\"57,-106 19,-106 19,-71 57,-71 57,-106\"/>\n",
101+
"<text text-anchor=\"start\" x=\"27\" y=\"-87\" font-family=\"arial\" text-decoration=\"underline\" font-size=\"10.00\">Title</text>\n",
119102
"</a>\n",
120103
"</g>\n",
121104
"</g>\n",
122105
"<!-- Employee -->\n",
123106
"<g id=\"node2\" class=\"node\">\n",
124107
"<title>Employee</title>\n",
125-
"<g id=\"a_node2\"><a xlink:title=\"person_id            \r------------------------------\rfirst_name           \rlast_name            \r→ Title\r\">\n",
126-
"<polygon fill=\"#00ff00\" fill-opacity=\"0.188235\" stroke=\"#00ff00\" stroke-opacity=\"0.188235\" points=\"75.25,-34.56 0,-34.56 0,0 75.25,0 75.25,-34.56\"/>\n",
127-
"<text text-anchor=\"start\" x=\"8\" y=\"-14.01\" font-family=\"arial\" text-decoration=\"underline\" font-size=\"12.00\" fill=\"darkgreen\">Employee</text>\n",
108+
"<g id=\"a_node2\"><a xlink:title=\"person_id            &#13;------------------------------&#13;first_name           &#13;last_name            &#13;→ Title&#13;\">\n",
109+
"<polygon fill=\"#00ff00\" fill-opacity=\"0.188235\" stroke=\"#00ff00\" stroke-opacity=\"0.188235\" points=\"76,-35 0,-35 0,0 76,0 76,-35\"/>\n",
110+
"<text text-anchor=\"start\" x=\"8\" y=\"-15.4\" font-family=\"arial\" text-decoration=\"underline\" font-size=\"12.00\" fill=\"darkgreen\">Employee</text>\n",
128111
"</a>\n",
129112
"</g>\n",
130113
"</g>\n",
131114
"<!-- Title&#45;&gt;Employee -->\n",
132115
"<g id=\"edge1\" class=\"edge\">\n",
133116
"<title>Title-&gt;Employee</title>\n",
134-
"<path fill=\"none\" stroke=\"#000000\" stroke-width=\"0.75\" stroke-dasharray=\"5,2\" stroke-opacity=\"0.250980\" d=\"M37.62,-70.59C37.62,-59.82 37.62,-45.73 37.62,-34.89\"/>\n",
117+
"<path fill=\"none\" stroke=\"#000000\" stroke-width=\"0.75\" stroke-dasharray=\"5,2\" stroke-opacity=\"0.250980\" d=\"M38,-70.8C38,-59.95 38,-45.87 38,-35.05\"/>\n",
135118
"</g>\n",
136119
"</g>\n",
137120
"</svg>"
138121
],
139122
"text/plain": [
140-
"<datajoint.diagram.Diagram at 0x7fe136962e50>"
123+
"<datajoint.diagram.Diagram at 0xffff5ba52660>"
141124
]
142125
},
143-
"execution_count": 2,
126+
"execution_count": 4,
144127
"metadata": {},
145128
"output_type": "execute_result"
146129
}
@@ -156,6 +139,23 @@
156139
"The parent table `Title` is above and the child table `Employee` is below."
157140
]
158141
},
142+
{
143+
"cell_type": "markdown",
144+
"metadata": {},
145+
"source": [
146+
"```{admonition} A Logical Constraint, not a Physical Pointer\n",
147+
":class: tip\n",
148+
"\n",
149+
"A key idea in the relational model is that a foreign key is **not a physical pointer** to a location on disk. Instead, it is a **logical constraint** that the database checks whenever rows are inserted, updated, or deleted.\n",
150+
"\n",
151+
"When you try to insert a row into a child table, the database doesn't follow a pre-existing \"link.\" It performs a search on the parent table to see if a record with a matching primary key exists. If a match is found, the insert is allowed; otherwise, it is rejected.\n",
152+
"\n",
153+
"This is fundamentally different from other data models like HDF5, where data is often linked by direct pointers or paths [^1]. The logical nature of foreign keys gives relational databases their flexibility and integrity.\n",
154+
"\n",
155+
"[^1]: The HDF Group. \"HDF5 User's Guide: Groups and Links\". [https://docs.hdfgroup.org/hdf5/develop/H5.intro.html#intro-groups](https://docs.hdfgroup.org/hdf5/develop/H5.intro.html#intro-groups)\n",
156+
"```"
157+
]
158+
},
159159
{
160160
"cell_type": "markdown",
161161
"metadata": {},
@@ -174,7 +174,7 @@
174174
},
175175
{
176176
"cell_type": "code",
177-
"execution_count": 3,
177+
"execution_count": 5,
178178
"metadata": {},
179179
"outputs": [
180180
{
@@ -261,7 +261,7 @@
261261
" (Total: 0)"
262262
]
263263
},
264-
"execution_count": 3,
264+
"execution_count": 5,
265265
"metadata": {},
266266
"output_type": "execute_result"
267267
}
@@ -274,16 +274,16 @@
274274
"cell_type": "markdown",
275275
"metadata": {},
276276
"source": [
277+
"### Effect 2: Inserts into the Child Table are Restricted\n",
277278
"\n",
278-
"### Effect 2: Inserts into the child table are restricted unless there is a match in the parent table \n",
279-
"When inserting a new row into the child table, the database ensures that the foreign key value **must match a primary key** in the parent table. If no matching parent row exists, the insert is rejected, preventing **orphaned records** in the child table.\n",
279+
"A foreign key ensures that no \"orphaned\" records are created. An insert into the child table is only permitted if the foreign key value corresponds to an existing primary key in the parent table.\n",
280280
"\n",
281-
"For example, let's try inserting two employees. The first will use an existing title where as the other will use a new title."
281+
"The rule is simple: **Inserts are restricted in the child, not the parent.** You can always add new job titles, but you cannot add an employee with a `title_code` that doesn't exist in the `Title` table."
282282
]
283283
},
284284
{
285285
"cell_type": "code",
286-
"execution_count": 4,
286+
"execution_count": 6,
287287
"metadata": {},
288288
"outputs": [],
289289
"source": [
@@ -293,15 +293,15 @@
293293
},
294294
{
295295
"cell_type": "code",
296-
"execution_count": 5,
296+
"execution_count": 7,
297297
"metadata": {},
298298
"outputs": [
299299
{
300300
"ename": "IntegrityError",
301301
"evalue": "Cannot add or update a child row: a foreign key constraint fails (`company`.`employee`, CONSTRAINT `employee_ibfk_1` FOREIGN KEY (`title_code`) REFERENCES `#title` (`title_code`) ON DELETE RESTRICT ON UPDATE CASCADE)",
302302
"output_type": "error",
303303
"traceback": [
304-
"\u001b[0;31mIntegrityError\u001b[0m\u001b[0;31m:\u001b[0m Cannot add or update a child row: a foreign key constraint fails (`company`.`employee`, CONSTRAINT `employee_ibfk_1` FOREIGN KEY (`title_code`) REFERENCES `#title` (`title_code`) ON DELETE RESTRICT ON UPDATE CASCADE)\n"
304+
"\u001b[31mIntegrityError\u001b[39m\u001b[31m:\u001b[39m Cannot add or update a child row: a foreign key constraint fails (`company`.`employee`, CONSTRAINT `employee_ibfk_1` FOREIGN KEY (`title_code`) REFERENCES `#title` (`title_code`) ON DELETE RESTRICT ON UPDATE CASCADE)\n"
305305
]
306306
}
307307
],
@@ -314,20 +314,40 @@
314314
"cell_type": "markdown",
315315
"metadata": {},
316316
"source": [
317-
"### Effect 3. Deletes from the parent table are restricted for rows that have matching children \n",
318-
"A parent record cannot be deleted if it is referenced by any child records. This restriction prevents **broken relationships** between tables. The only way to delete the parent is to first delete or update the dependent child records, or to use a **cascading delete** that removes both parent and child rows.\n",
317+
"### Effect 3: Deletes from the Parent Table are Restricted\n",
319318
"\n",
320-
"- **Cascading Delete Option**: With cascading delete enabled, deleting a parent row automatically removes all its associated child rows.\n",
319+
"To prevent broken relationships, a parent record cannot be deleted if any child records still refer to it.\n",
321320
"\n",
321+
"The rule is the inverse of the insert rule: **Deletes are restricted in the parent, not the child.** You can always delete an employee, but you cannot delete a title if it is still assigned to an employee.\n",
322322
"\n",
323-
"Deleting from `Title` will generate a warning that children in `Employee` will be deleted too. Such cascading will go down many levels through the hierarchy."
323+
"With the `ON DELETE RESTRICT` rule shown in the constraint above, a plain SQL `DELETE` on a referenced title would fail with a constraint error. DataJoint's `delete`, however, performs a **cascading delete**: it reports the dependent child records that would be deleted along with the parent and, by default, asks for confirmation before committing. The cascade can extend through many levels of a deep hierarchy."
324324
]
325325
},
326326
{
327327
"cell_type": "code",
328-
"execution_count": null,
328+
"execution_count": 8,
329329
"metadata": {},
330-
"outputs": [],
330+
"outputs": [
331+
{
332+
"name": "stderr",
333+
"output_type": "stream",
334+
"text": [
335+
"[2025-09-18 14:31:21,418][INFO]: Deleting 1 rows from `company`.`employee`\n",
336+
"[2025-09-18 14:31:21,422][INFO]: Deleting 7 rows from `company`.`#title`\n",
337+
"[2025-09-18 14:31:29,056][WARNING]: Delete cancelled\n"
338+
]
339+
},
340+
{
341+
"data": {
342+
"text/plain": [
343+
"7"
344+
]
345+
},
346+
"execution_count": 8,
347+
"metadata": {},
348+
"output_type": "execute_result"
349+
}
350+
],
331351
"source": [
332352
"Title.delete()"
333353
]
@@ -336,35 +356,28 @@
336356
"cell_type": "markdown",
337357
"metadata": {},
338358
"source": [
339-
"### Effect 4. Restrict Updates to the Foreign Key and the Referenced Primary Key \n",
340-
"Foreign keys **restrict updates** on both the child and parent tables to maintain data consistency.\n",
341-
"\n",
342-
"DataJoint does not support updates of primary key values since such updates have the potential for breaking down referential integrity.\n",
343-
"Normal data manipulations are performed by deletes and inserts. However SQL and relational theory more generally supports such operations. \n",
344-
"\n",
345-
"- **Updates in the Parent Table**: If the primary key of a parent record is updated, all dependent child records must be updated to maintain referential integrity. However, unless **cascading updates** are configured, these updates are blocked to prevent inconsistency.\n",
359+
"### Effect 4: Updates to Referenced Keys are Restricted\n",
346360
"\n",
347-
"- **Updates in the Child Table**: Similarly, updating the foreign key value in a child record is restricted to ensure that the new value matches a valid parent row.\n",
361+
"To maintain referential integrity, updates to a parent's primary key or a child's foreign key are restricted.\n",
348362
"\n",
349-
"- **Cascading Update Option**: With cascading updates enabled, changes to a parents primary key will automatically propagate to the related child records.\n",
363+
"In general relational theory, databases can be configured to handle this with **cascading updates**, where changing a parent's primary key automatically propagates that change to all child records.\n",
350364
"\n",
351-
"\n",
352-
"### Effect 5. Create an index in the child table for fast searches on the foreign key \n",
353-
"To optimize performance, the database **automatically creates an index** on the foreign key column(s) in the child table. This index allows the database to efficiently find child records that refer to a specific parent row, improving query performance during joins and lookups."
365+
"However, DataJoint does not support updating primary key values, because such updates risk breaking referential integrity in complex scientific workflows. The preferred and safer pattern in DataJoint is to **delete the old record and insert a new one** with the updated information."
354366
]
355367
},
356368
{
357369
"cell_type": "markdown",
358370
"metadata": {},
359371
"source": [
360-
"# Summary \n",
361-
"Foreign keys ensure **referential integrity** by imposing constraints on how data is added, modified, and deleted across related tables. The five key effects are:\n",
372+
"# Summary\n",
373+
"\n",
374+
"Foreign keys ensure referential integrity by linking a child table to a parent table. This link imposes five key effects:\n",
362375
"\n",
363-
"1. **Schema Design**: The parents primary key becomes part of the child table as a foreign key. \n",
364-
"2. **Insert Restriction**: Inserts into the child table are blocked unless a matching parent row exists. \n",
365-
"3. **Delete Restriction**: Deleting a parent row is blocked unless dependent child rows are handled (or cascading delete is enabled). \n",
366-
"4. **Update Restriction**: Updates to the primary key in the parent table and foreign key in the child table are restricted to maintain consistency, unless cascading updates are explicitly allowed.\n",
367-
"5. **Performance Optimization**: An index on the foreign key in the child table ensures fast searches and efficient joins. \n"
376+
"1. **Schema Embedding**: The parent's primary key is added as columns to the child table.\n",
377+
"2. **Insert Restriction**: A row cannot be added to the **child** if its foreign key doesn't match a primary key in the **parent**.\n",
378+
"3. **Delete Restriction**: A row cannot be deleted from the **parent** if it is still referenced by any rows in the **child**.\n",
379+
"4. **Update Restriction**: Updates to the primary and foreign keys are restricted to prevent inconsistencies.\n",
380+
"5. **Performance Optimization**: An index is automatically created on the foreign key in the child table to speed up searches and joins."
368381
]
369382
}
370383
],
@@ -384,7 +397,7 @@
384397
"name": "python",
385398
"nbconvert_exporter": "python",
386399
"pygments_lexer": "ipython3",
387-
"version": "3.11.10"
400+
"version": "3.13.2"
388401
},
389402
"orig_nbformat": 4
390403
},
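
Note on the insert and delete restrictions described in this chapter: they can be reproduced outside DataJoint with a minimal sketch. The example below is not part of the commit. It assumes Python's built-in `sqlite3` module in place of the MySQL backend the notebook uses, the column names are taken from the `Title` and `Employee` diagram tooltips, and the employee names, the `BizDev` title code, and the function name `demo_foreign_key_effects` are hypothetical.

```python
import sqlite3


def demo_foreign_key_effects():
    """Reproduce the insert and delete restrictions in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")

    # Parent table: one row per job title.
    conn.execute(
        "CREATE TABLE title (title_code TEXT PRIMARY KEY, full_title TEXT NOT NULL)"
    )
    # Child table: every employee must reference an existing title.
    conn.execute(
        """
        CREATE TABLE employee (
            person_id  INTEGER PRIMARY KEY,
            first_name TEXT NOT NULL,
            last_name  TEXT NOT NULL,
            title_code TEXT NOT NULL,
            FOREIGN KEY (title_code) REFERENCES title (title_code)
                ON DELETE RESTRICT ON UPDATE CASCADE
        )
        """
    )

    conn.execute("INSERT INTO title VALUES ('HR-Mgr', 'Human Resources Manager')")
    # Hypothetical employee with a title that exists: accepted.
    conn.execute("INSERT INTO employee VALUES (1, 'Mark', 'Sommers', 'HR-Mgr')")

    results = {}

    # Insert restriction: 'BizDev' (hypothetical) is not in the parent table.
    try:
        conn.execute("INSERT INTO employee VALUES (2, 'Brenda', 'Means', 'BizDev')")
        results["orphan_insert_rejected"] = False
    except sqlite3.IntegrityError:
        results["orphan_insert_rejected"] = True

    # Delete restriction: the title is still referenced by employee 1.
    try:
        conn.execute("DELETE FROM title WHERE title_code = 'HR-Mgr'")
        results["parent_delete_rejected"] = False
    except sqlite3.IntegrityError:
        results["parent_delete_rejected"] = True

    # Removing the child first makes the parent delete legal.
    conn.execute("DELETE FROM employee WHERE person_id = 1")
    conn.execute("DELETE FROM title WHERE title_code = 'HR-Mgr'")
    results["titles_left"] = conn.execute("SELECT COUNT(*) FROM title").fetchone()[0]

    conn.close()
    return results


if __name__ == "__main__":
    print(demo_foreign_key_effects())
```

Unlike DataJoint's `delete`, plain SQL with `ON DELETE RESTRICT` does not cascade, so the sketch deletes the child row by hand before removing the parent.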
