Commit 39b9008

Enhance the Normalization chapter
1 parent 1c2e07a commit 39b9008

1 file changed, +10 -57 lines changed

book/30-schema-design/055-normalization.ipynb

Lines changed: 10 additions & 57 deletions
@@ -285,22 +285,23 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Practical Example: Animal Research Lab\n",
+"## DataJoint's Workflow Perspective\n",
 "\n",
-"Let's apply these principles to design a schema for tracking mice in a research lab.\n"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"### ❌ Poor Design (Violates Normalization)\n"
+"A fundamental insight underlying DataJoint's normalization approach: **databases are workflows where downstream data depends on the integrity of upstream data**.\n",
+"\n",
+"This workflow-centric view fundamentally shapes normalization principles and explains why DataJoint emphasizes immutability and avoidance of updates.\n"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
+"## Practical Example: Animal Research Lab\n",
+"\n",
+"Let's apply these principles to design a schema for tracking mice in a research lab.\n",
+"\n",
+"### ❌ Poor Design (Violates Normalization)\n",
+"\n",
 "```python\n",
 "@schema\n",
 "class Mouse(dj.Manual):\n",
@@ -317,17 +318,6 @@
 "```"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"## DataJoint's Workflow Perspective\n",
-"\n",
-"A fundamental insight underlying DataJoint's normalization approach: **databases are workflows where downstream data depends on the integrity of upstream data**.\n",
-"\n",
-"This workflow-centric view fundamentally shapes normalization principles and explains why DataJoint emphasizes immutability and avoidance of updates.\n"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -423,43 +413,6 @@
 "The dependency chain is **explicit** and **enforced**.\n"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"### How Workflow Thinking Leads to Normalization Principles\n",
-"\n",
-"The workflow perspective directly motivates DataJoint's normalization principles:\n",
-"\n",
-"**1. Immutability (INSERT/DELETE, not UPDATE)**\n",
-"- **Why**: Updates hide broken dependencies in the workflow\n",
-"- **Workflow view**: Upstream data is \"input\" to downstream computations—changing input invalidates output\n",
-"- **Solution**: DELETE forces explicit handling of all dependent data\n",
-"\n",
-"**2. Separate Changeable Attributes (Rule 3)**  \n",
-"- **Why**: Time-varying properties represent different states in the workflow\n",
-"- **Workflow view**: Each state is a distinct input that produces distinct outputs\n",
-"- **Solution**: Model states as separate records (INSERTs), not updates to existing records\n",
-"\n",
-"**3. Entities with Only Intrinsic Properties (Rules 1 & 2)**\n",
-"- **Why**: Properties of different entities are at different nodes in the dependency graph\n",
-"- **Workflow view**: Each entity type represents a distinct stage or data type in the pipeline\n",
-"- **Solution**: Separate entity types into separate tables to make dependencies explicit\n",
-"\n",
-"### The Key Insight\n",
-"\n",
-"> **In workflow-centric databases, referential integrity isn't just about preventing orphaned records—it's about ensuring computational validity.**\n",
-"\n",
-"Foreign keys don't just link data; they represent **data provenance**:\n",
-"- \"This result was computed FROM this input\"\n",
-"- \"This analysis is BASED ON this measurement\"  \n",
-"- \"This conclusion DEPENDS ON this observation\"\n",
-"\n",
-"When you UPDATE input data but leave outputs unchanged, you break the provenance chain. The outputs claim to be based on inputs that no longer exist (in their original form).\n",
-"\n",
-"**DataJoint's normalization principles ensure that data dependencies remain explicit and enforceable, making workflows scientifically reproducible and computationally sound.**\n"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
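
The cell removed in the last hunk argues that an UPDATE to upstream data leaves downstream results claiming a provenance that no longer exists, while a DELETE forces the dependents to be handled. That argument can be sketched outside DataJoint. The following is a minimal illustration using Python's standard `sqlite3` module, not DataJoint's API. The table names `measurement` and `result`, the "doubled" computation, and the use of `ON DELETE CASCADE` to stand in for explicit handling of dependents are all assumptions made for this sketch.

```python
import sqlite3


def build_db():
    """Create a tiny upstream/downstream pair (hypothetical tables)."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE measurement (id INTEGER PRIMARY KEY, value REAL NOT NULL)"
    )
    # result is "computed FROM" measurement: the foreign key records provenance.
    conn.execute(
        "CREATE TABLE result ("
        " measurement_id INTEGER PRIMARY KEY"
        " REFERENCES measurement(id) ON DELETE CASCADE,"
        " doubled REAL NOT NULL)"
    )
    conn.execute("INSERT INTO measurement VALUES (1, 10.0)")
    conn.execute("INSERT INTO result VALUES (1, 20.0)")  # 2 * 10.0
    return conn


def stale_results(conn):
    """Return ids of results that no longer match their upstream input."""
    rows = conn.execute(
        "SELECT r.measurement_id FROM result r"
        " JOIN measurement m ON m.id = r.measurement_id"
        " WHERE r.doubled != 2 * m.value"
    ).fetchall()
    return [row[0] for row in rows]


# Path A: UPDATE the input in place. The foreign key is still satisfied,
# so nothing flags that the downstream result is now wrong.
conn_update = build_db()
conn_update.execute("UPDATE measurement SET value = 12.0 WHERE id = 1")
stale_after_update = stale_results(conn_update)

# Path B: DELETE, then INSERT the corrected input. The delete removes the
# dependent result, so it must be recomputed before it can exist again.
conn_delete = build_db()
conn_delete.execute("DELETE FROM measurement WHERE id = 1")
conn_delete.execute("INSERT INTO measurement VALUES (1, 12.0)")
remaining_results = conn_delete.execute("SELECT COUNT(*) FROM result").fetchone()[0]
stale_after_delete = stale_results(conn_delete)
```

In path A the stale result survives and still points at its input; in path B no result exists until it is recomputed from the new input, which is the behavior the removed cell attributes to the INSERT/DELETE discipline.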
