Commit 469a64d

reorder chapters in Design
1 parent 202e70d commit 469a64d

3 files changed: +195 -56 lines changed

Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
---
title: Data Integrity
date: 2025-10-31
authors:
- name: Dimitri Yatsenko
---

# Why Data Integrity Matters

Imagine a neuroscience lab where recording sessions are tracked in a database. Without proper safeguards, you might encounter:
- An experiment record pointing to a non-existent mouse
- Two different experiments claiming the same unique identifier
- A recording session missing its timestamp
- Concurrent processes writing conflicting data

Each scenario represents a failure of **data integrity**: the database's ability to maintain accurate, consistent, and reliable data that faithfully represents reality.

```{card} The Challenge
**Data Integrity** is the ability of a database to define, express, and enforce rules for valid data states and transformations.
^^^

Scientific databases face unique challenges:
- **Multiple users** entering data concurrently
- **Long-running experiments** generating data over months or years
- **Complex relationships** between experimental entities
- **Evolving protocols** requiring schema updates
- **Collaborative teams** with different data entry practices

Without robust integrity mechanisms, these challenges lead to:
- Invalid or incomplete data entry
- Loss of data during updates
- Unwarranted alteration of historical records
- Misidentification or mismatch of experimental subjects
- Data duplication across tables
- Broken references between related datasets
```

# From Real-World Rules to Database Constraints

The core challenge of database design is translating organizational rules into enforceable constraints. Consider a simple example:

**Lab Rule:** "Each mouse must have a unique ID, and every recording session must reference a valid mouse."

**Database Implementation:**
- Mouse table with **primary key** constraint (entity integrity)
- RecordingSession table with **foreign key** to Mouse (referential integrity)
- Mouse ID **cannot be null** (completeness)
- Recording timestamp **must be of datetime type** (domain integrity)

Relational databases excel at expressing and enforcing such rules through **integrity constraints**: declarative rules that the database automatically enforces.
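
The lab rule above can be sketched in plain SQL. This is a minimal illustration, assuming SQLite through Python's built-in `sqlite3` module as a stand-in database. The table and column names (`mouse`, `recording_session`, `recorded_at`) and the `try_insert` helper are hypothetical, and this is not DataJoint syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE mouse (
    mouse_id INTEGER NOT NULL PRIMARY KEY   -- entity integrity + completeness
);
CREATE TABLE recording_session (
    session_id  INTEGER NOT NULL PRIMARY KEY,
    mouse_id    INTEGER NOT NULL
        REFERENCES mouse (mouse_id),        -- referential integrity
    recorded_at TEXT NOT NULL               -- SQLite stores datetimes as text
);
""")

conn.execute("INSERT INTO mouse (mouse_id) VALUES (1)")
conn.execute(
    "INSERT INTO recording_session VALUES (1, 1, '2025-10-31 09:00:00')"
)


def try_insert(sql):
    """Return None on success, or the constraint error message on rejection."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# A session that points to a mouse that does not exist is rejected.
orphan_error = try_insert(
    "INSERT INTO recording_session VALUES (2, 99, '2025-10-31 10:00:00')"
)
# A second mouse with an ID already in use is rejected.
duplicate_error = try_insert("INSERT INTO mouse (mouse_id) VALUES (1)")
```

Both invalid inserts fail inside the database engine, so the rule holds no matter which program issues the statement.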

# Types of Data Integrity Constraints

This section introduces six fundamental types of integrity constraints. Each will be covered in detail in subsequent chapters, with DataJoint implementation examples.

## 1. Domain Integrity
**Ensures values are within valid ranges and types.**

Domain integrity restricts attribute values to predefined valid sets using:
- **Data types**: `int`, `float`, `varchar`, `date`, `enum`
- **Range constraints**: `unsigned`, `decimal(10,2)`
- **Pattern matching**: Regular expressions for formatted strings

**Example:** Recording temperature must be between 20 and 25 °C.

**Covered in:** [Tables](015-table.ipynb): Data type specification
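
One way to express the temperature rule is a SQL `CHECK` constraint. The sketch below assumes SQLite via Python's `sqlite3` module. The `recording` table and `temperature_c` column are hypothetical names, and DataJoint may express the same rule differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE recording (
        recording_id  INTEGER PRIMARY KEY,
        temperature_c REAL NOT NULL
            CHECK (temperature_c BETWEEN 20 AND 25)  -- valid domain
    )
""")

# A value inside the allowed range is accepted.
conn.execute("INSERT INTO recording VALUES (1, 22.5)")

# A value outside the range is rejected by the database itself.
try:
    conn.execute("INSERT INTO recording VALUES (2, 31.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM recording").fetchone()[0]
```

After this runs, only the in-range recording is stored.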

---

## 2. Completeness
**Guarantees required data is present.**

Completeness prevents missing values that could invalidate analyses:
- **Required fields** cannot be left empty (non-nullable)
- **Default values** provide sensible fallbacks
- **NOT NULL constraints** enforce data presence

**Example:** Every experiment must have a start date.

**Covered in:**
- [Tables](015-table.ipynb): Required vs. optional attributes
- [Default Values](020-default-values.ipynb): Handling optional data
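
A required field and a default value can be sketched together. This example assumes SQLite via Python's `sqlite3` module, and the `experiment` table with its `start_date` and `notes` columns is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE experiment (
        experiment_id INTEGER PRIMARY KEY,
        start_date    TEXT NOT NULL,             -- required, no default
        notes         TEXT NOT NULL DEFAULT ''   -- optional, falls back to ''
    )
""")

# Omitting `notes` is fine: the default fills it in.
conn.execute(
    "INSERT INTO experiment (experiment_id, start_date) VALUES (1, '2025-10-31')"
)
notes = conn.execute(
    "SELECT notes FROM experiment WHERE experiment_id = 1"
).fetchone()[0]

# Omitting `start_date` violates NOT NULL and the insert is rejected.
try:
    conn.execute("INSERT INTO experiment (experiment_id) VALUES (2)")
    missing_date_rejected = False
except sqlite3.IntegrityError:
    missing_date_rejected = True
```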

---

## 3. Entity Integrity
**Each entity has a unique, reliable identifier.**

Entity integrity ensures a one-to-one mapping between database records and real-world entities:
- **Primary keys** uniquely identify each row
- **Uniqueness constraints** prevent duplicates
- **Identification strategies** (auto-increment, UUIDs, natural keys)

**Example:** Each mouse has exactly one unique ID.

**Covered in:**
- [Primary Keys](025-primary-key.md): Identification strategies
- [UUID](030-uuid.ipynb): Universally unique identifiers
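
The sketch below combines two of the identification strategies listed above: a UUID as a surrogate primary key and a uniqueness constraint on a natural identifier. It assumes SQLite via Python's `sqlite3` module. The `ear_tag` column and the `add_mouse` helper are hypothetical illustrations, not part of any library.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mouse (
        mouse_uuid TEXT PRIMARY KEY,      -- surrogate identifier
        ear_tag    TEXT NOT NULL UNIQUE   -- natural identifier used in the lab
    )
""")


def add_mouse(ear_tag):
    """Insert a mouse under a freshly generated UUID and return that UUID."""
    mouse_uuid = str(uuid.uuid4())
    conn.execute("INSERT INTO mouse VALUES (?, ?)", (mouse_uuid, ear_tag))
    return mouse_uuid


first = add_mouse("A-001")
second = add_mouse("A-002")

# Registering the same physical animal twice is rejected by UNIQUE.
try:
    add_mouse("A-001")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The UUID alone would not stop the same animal from being entered twice. The uniqueness constraint on the real-world identifier is what ties each record to exactly one mouse.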

---

## 4. Referential Integrity
**Relationships between entities remain consistent.**

Referential integrity maintains logical associations across tables:
- **Foreign keys** link related records
- **Cascade operations** propagate changes
- **Referential constraints** prevent orphaned records

**Example:** A recording session cannot reference a non-existent mouse.

**Covered in:**
- [Foreign Keys](035-foreign-keys.ipynb): Cross-table relationships
- [Relationships](050-relationships.ipynb): Dependency patterns
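
Cascade operations can be illustrated with SQL's `ON DELETE CASCADE`. The sketch assumes SQLite via Python's `sqlite3` module with hypothetical table names. DataJoint's own cascading deletes are covered in later chapters and may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE mouse (mouse_id INTEGER PRIMARY KEY);
CREATE TABLE recording_session (
    session_id INTEGER PRIMARY KEY,
    mouse_id   INTEGER NOT NULL
        REFERENCES mouse (mouse_id) ON DELETE CASCADE
);
INSERT INTO mouse VALUES (1), (2);
INSERT INTO recording_session VALUES (10, 1), (11, 1), (12, 2);
""")

# Deleting mouse 1 also removes its sessions, so no orphans remain.
conn.execute("DELETE FROM mouse WHERE mouse_id = 1")
remaining = conn.execute(
    "SELECT session_id FROM recording_session ORDER BY session_id"
).fetchall()
```

Only the session that belongs to mouse 2 survives the delete.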

---

## 5. Compositional Integrity
**Complex entities remain complete with all parts.**

Compositional integrity ensures multi-part entities are never partially stored:
- **Transactions** bundle multiple operations
- **Atomicity** guarantees all-or-nothing completion
- **Part tables** maintain parent-child relationships

**Example:** An imaging session's metadata and all acquired frames are stored together or not at all.

**Covered in:**
- [Part Tables](055-part-tables.ipynb): Hierarchical compositions
- [Transactions](../operations/045-transactions.ipynb): Atomic operations
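
The all-or-nothing behavior can be sketched with a transaction. This example assumes SQLite via Python's `sqlite3` module, where using the connection as a context manager commits on success and rolls back on an exception. The tables and the `store_session` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE imaging_session (
    session_id INTEGER PRIMARY KEY,
    scanner    TEXT NOT NULL
);
CREATE TABLE frame (
    session_id     INTEGER NOT NULL REFERENCES imaging_session (session_id),
    frame_idx      INTEGER NOT NULL,
    mean_intensity REAL NOT NULL,
    PRIMARY KEY (session_id, frame_idx)
);
""")


def store_session(session_id, scanner, intensities):
    """Insert a session and all of its frames as one atomic unit."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO imaging_session VALUES (?, ?)", (session_id, scanner)
        )
        conn.executemany(
            "INSERT INTO frame VALUES (?, ?, ?)",
            [(session_id, i, value) for i, value in enumerate(intensities)],
        )


store_session(1, "scope-A", [0.1, 0.2, 0.3])

# The second frame is invalid (NULL intensity), so the whole session,
# including its metadata row and first frame, is rolled back.
try:
    store_session(2, "scope-A", [0.4, None, 0.6])
except sqlite3.IntegrityError:
    pass

sessions = conn.execute("SELECT COUNT(*) FROM imaging_session").fetchone()[0]
frames = conn.execute("SELECT COUNT(*) FROM frame").fetchone()[0]
```

Only the first session and its three frames remain. Nothing from the failed session is left behind.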

---

## 6. Consistency
**All users see the same valid data state.**

Consistency provides a unified view during concurrent access:
- **Isolation levels** control transaction visibility
- **Locking mechanisms** prevent conflicting updates
- **ACID properties** guarantee reliable state transitions

**Example:** Two researchers inserting experiments simultaneously don't create duplicates.

**Covered in:**
- [Concurrency](../operations/050-concurrency.ipynb): Multi-user operations
- [Transactions](../operations/045-transactions.ipynb): ACID guarantees
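
Transaction visibility can be sketched with two connections to the same database file. This assumes SQLite via Python's `sqlite3` module with its default transaction handling. A multi-user server such as MySQL offers configurable isolation levels, which this small example does not cover. The names `writer`, `reader`, and `visible_rows` are hypothetical.

```python
import os
import sqlite3
import tempfile

# Two connections to one file stand in for two users of a shared database.
db_path = os.path.join(tempfile.mkdtemp(), "lab.db")
writer = sqlite3.connect(db_path)
reader = sqlite3.connect(db_path)

writer.execute("CREATE TABLE experiment (experiment_id INTEGER PRIMARY KEY)")
writer.commit()


def visible_rows(conn):
    """Count the experiment rows this connection can currently see."""
    # fetchall() runs the query to completion so no read lock lingers.
    return conn.execute("SELECT COUNT(*) FROM experiment").fetchall()[0][0]


# The INSERT implicitly opens a transaction on the writer connection.
writer.execute("INSERT INTO experiment VALUES (1)")
before_commit = visible_rows(reader)  # uncommitted row is not visible

writer.commit()
after_commit = visible_rows(reader)   # committed row is visible

writer.close()
reader.close()
```

The reader never sees a half-finished change. It sees the state either before the transaction or after it.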

# The Power of Declarative Constraints

Unlike application-level validation (checking rules in Python code), database constraints are:

1. **Always enforced**: Cannot be bypassed by any application
2. **Automatically checked**: No developer implementation needed
3. **Concurrent-safe**: Work correctly with multiple users
4. **Self-documenting**: Schema explicitly declares rules
5. **Performance-optimized**: Database engine enforces efficiently

**Example Contrast:**

```python
# Application-level (fragile): each application must remember to call this
def validate_mouse_id(mouse_id, existing_mice):
    if mouse_id not in existing_mice:
        raise ValueError("Invalid mouse ID")
# Can be bypassed by other applications

# Database-level (robust)
# RecordingSession.mouse → FOREIGN KEY → Mouse.mouse_id
# Automatically enforced for all applications
```

# DataJoint's Approach to Integrity

DataJoint builds on SQL's integrity mechanisms with additional features:

- **Automatic foreign keys** from table dependencies
- **Cascading deletes** that respect data pipelines
- **Transaction management** for atomic operations
- **Schema validation** catching errors before database creation
- **Entity relationships** expressed in intuitive Python syntax

As you progress through the following chapters, you'll see how DataJoint implements each integrity type through concise, expressive table declarations.

---

```{admonition} Next Steps
:class: tip

Now that you understand *why* integrity matters, the following chapters show *how* to implement each constraint type:

1. **[Tables](015-table.ipynb)**: Basic structure with domain integrity
2. **[Primary Keys](025-primary-key.md)**: Entity integrity through unique identification
3. **[Foreign Keys](035-foreign-keys.ipynb)**: Referential integrity across tables

Each chapter builds on these foundational integrity concepts.
```

book/30-database-design/040-integrity.md

Lines changed: 0 additions & 56 deletions
This file was deleted.
