Commit a0ade0e

enhance the Data Models chapter
1 parent b848ca9 commit a0ade0e

1 file changed

+42
-7
lines changed

book/20-concepts/00-models.md

Lines changed: 42 additions & 7 deletions
@@ -17,8 +17,9 @@ A data model is defined by considering the following key aspects:
1717
* What mechanisms exist to enforce the structure and rules governing valid data interactions?
1818
```
1919

20-
Innovations in data models have spurred progress by creating new mental tools for us to think about data and to communicate with machines and with each other.
21-
Scientists and engineers who become well-versed in effective data models can collaborate more efficiently because they share a common conceptual framework.
20+
Innovations in data models have spurred progress by creating new mental tools for us to think about data and to communicate with machines and with each other. Scientists and engineers who become well-versed in effective data models can collaborate more efficiently because they share a common conceptual framework.
21+
22+
**This book focuses on DataJoint**, a data model that reinterprets relational database theory through the lens of **computational workflows**. While rooted in classical relational concepts, DataJoint introduces new constructs and semantics specifically designed for scientific computing, where tracking provenance and maintaining computational validity are as important as storing and querying data.
2223

2324
:::{hint} What data models do you already know?
2425
Before moving forward, take a moment to consider the different data models you're already familiar with. Perhaps you've worked with a spreadsheet, a database, or a programming language but didn't know that they were distinct data models.
@@ -134,7 +135,10 @@ DataFrames support a wide range of operations, making them a powerful tool for d
134135
DataFrames have become an essential tool in modern data analysis, providing a structured yet flexible way to handle and manipulate data. Their ability to work with heterogeneous data types, combined with a rich set of operations, makes them ideal for tasks ranging from simple data exploration to complex data transformations and machine learning preparation. Whether in Python, R, or Julia, DataFrames have become a cornerstone of data science workflows.
135136

136137
## Example: Relational Data Model
137-
The rest of this book is about the relational data model and we introduce it properly in following sections.
138+
139+
The **relational data model**, introduced by Edgar F. Codd in 1970, revolutionized data management by organizing data into tables (relations) with well-defined relationships. This model emphasizes data integrity, consistency, and powerful query capabilities through a formal mathematical foundation.
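As a minimal sketch of what "well-defined relationships" and "data integrity" mean in practice, the snippet below uses Python's built-in `sqlite3` module. The `subject` and `session` tables and their columns are hypothetical names chosen for illustration; they do not come from this book.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Two hypothetical tables: every session must refer to an existing subject.
conn.execute(
    "CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, species TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE session (
        session_id INTEGER PRIMARY KEY,
        subject_id INTEGER NOT NULL REFERENCES subject(subject_id)
    )
    """
)
conn.execute("INSERT INTO subject VALUES (1, 'mouse')")
conn.execute("INSERT INTO session VALUES (10, 1)")

# A session that points at a nonexistent subject violates referential integrity.
try:
    conn.execute("INSERT INTO session VALUES (11, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A declarative query combines the two tables through their shared attribute.
rows = conn.execute(
    "SELECT s.session_id, subj.species "
    "FROM session AS s JOIN subject AS subj USING (subject_id)"
).fetchall()
```

The database itself, rather than application code, refuses the inconsistent row. That is the kind of guarantee the relational model provides.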
140+
141+
The rest of this book focuses on the relational model, specifically through **DataJoint**, a modern interpretation that extends classical relational theory to explicitly support computational workflows. We introduce these concepts properly in the following sections.
138142

139143
## Example: Document Data Model (JSON and BSON)
140144

@@ -386,11 +390,42 @@ Structured models, which come with predefined schemas, allow the organization of
386390
As studies progress and new insights are gained, schemas can be adjusted to reflect the emerging structure and logic of the study.
387391
This approach not only ensures consistency and integrity but also simplifies data sharing and publication.
388392

389-
## DataJoint supports structured databases in research
393+
However, traditional structured databases, designed primarily for business transactions, don't naturally capture the **computational workflows** central to scientific research. Science isn't just about storing data—it's about transforming raw observations into analyzed results through defined processing steps. This is where DataJoint's reinterpretation of relational databases becomes essential.
394+
395+
## Example: DataJoint—Relational Databases as Computational Workflows
396+
397+
**DataJoint** represents a distinctive reinterpretation of the relational data model, specifically designed for scientific computing. While built on Codd's relational theory, DataJoint introduces a fundamentally different perspective: **databases as computational workflows that mix manual and automated steps**.
398+
399+
### Key Innovation: Workflows, Not Just Storage
400+
401+
Traditional relational databases focus on **storing and retrieving data**. DataJoint extends this to explicitly model **computational pipelines** with provenance tracking and computational validity. What makes DataJoint unique is treating the **schema itself as the workflow specification**.
402+
403+
```
404+
Session (manual)
405+
406+
Recording (imported from instruments)
407+
408+
FilteredRecording (computed automatically)
409+
410+
SpikeSorting (computed automatically)
411+
412+
NeuronStatistics (computed automatically)
413+
```
414+
415+
This isn't just documentation—it's the actual dependency structure enforced by the database. When upstream data changes, downstream results **must** be recomputed or deleted. There's no way to silently invalidate the pipeline.
416+
This transforms ad-hoc research workflows into **rigorous, reproducible scientific operations**, bridging the gap between exploratory science and production-grade data management.
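The idea of an enforced dependency structure can be sketched in plain Python. This is a toy model, not the DataJoint API. It assumes the five tables above form a single linear chain, and `UPSTREAM`, `downstream_of`, and `delete_session` are hypothetical names introduced only for this illustration.

```python
# Toy model of dependency enforcement; NOT the DataJoint API.
# Each table maps to its direct upstream table (assumed linear chain).
UPSTREAM = {
    "Session": None,
    "Recording": "Session",
    "FilteredRecording": "Recording",
    "SpikeSorting": "FilteredRecording",
    "NeuronStatistics": "SpikeSorting",
}


def downstream_of(table):
    """Return every table that depends, directly or transitively, on `table`."""
    result = []
    frontier = [table]
    while frontier:
        current = frontier.pop()
        children = [t for t, parent in UPSTREAM.items() if parent == current]
        result.extend(children)
        frontier.extend(children)
    return result


def delete_session(rows, session_id):
    """Delete one session and cascade the delete to every dependent table."""
    for table in ["Session"] + downstream_of("Session"):
        rows[table].discard(session_id)


# Each table holds the set of session ids for which it has a row.
rows = {table: {1, 2} for table in UPSTREAM}
delete_session(rows, 1)
```

After the call, no table retains a row for session 1, so no computed result can outlive the data it was derived from.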
417+
418+
### DataJoint as a New Data Model
419+
420+
While rooted in relational theory, DataJoint constitutes a **distinct data model** because it:
421+
422+
1. **Redefines the purpose**: From data storage → workflow specification
423+
2. **Introduces new constructs**: Table types with computational roles (Manual, Imported, Computed, Lookup)
424+
3. **Changes operational semantics**: Immutability as the default, UPDATE as exception
425+
4. **Adds new operations**: `populate()` for automatic computation
426+
5. **Enforces new constraints**: Computational validity, not just relational consistency
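To make the idea behind item 4 concrete, here is a minimal plain-Python sketch of automatic computation: a computed table fills in results only for upstream entries that do not yet have one. It is a toy stand-in under simplifying assumptions, not DataJoint's implementation. The `ComputedTable` class, its `make` callback, and the `recording` data are hypothetical.

```python
# Toy stand-in for the populate idea; NOT the DataJoint API.
class ComputedTable:
    def __init__(self, upstream, make):
        self.upstream = upstream  # dict mapping key -> upstream value
        self.make = make          # function computing a result from one upstream value
        self.rows = {}            # dict mapping key -> computed result

    def populate(self):
        """Compute results for upstream keys that have no result yet."""
        pending = [key for key in self.upstream if key not in self.rows]
        for key in pending:
            self.rows[key] = self.make(self.upstream[key])
        return pending  # the keys that were computed in this call


# Hypothetical upstream data: recording id -> samples.
recording = {1: [2.0, 4.0], 2: [1.0, 3.0, 5.0]}
mean_signal = ComputedTable(recording, make=lambda samples: sum(samples) / len(samples))

first = mean_signal.populate()   # computes both existing recordings
recording[3] = [10.0]            # new upstream data arrives
second = mean_signal.populate()  # computes only the new recording
```

Calling `populate()` repeatedly is safe in this sketch: existing results are left alone and only the missing ones are computed.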
390427

391-
DataJoint is dedicate to the idea that data discipline must start early in science projects, even in fast-evolving phases of research, for explicitly structuring data to continuously maintain data integrity and consistency.
392-
Structureed data is essential for effective collaboration while still allowing the data to adapt quickly as the project progresses.
393-
By adopting structured data models that are flexible enough to evolve, scientists can enjoy the best of both worlds—retaining the freedom to explore and experiment while ensuring that their data remains organized, consistent, and ready for dissemination.
428+
DataJoint demonstrates that data discipline can start early in research projects, even during fast-evolving exploratory phases. By providing structured, workflow-aware data management that can evolve alongside the science, DataJoint offers researchers the best of both worlds: the freedom to explore and the rigor to ensure their findings remain valid and reproducible.
394429

395430
# Exercises
396431
