book/20-concepts/00-models.md
Lines changed: 42 additions & 7 deletions
@@ -17,8 +17,9 @@ A data model is defined by considering the following key aspects:
17
17
* What mechanisms exist to enforce the structure and rules governing valid data interactions?
18
18
```
19
19
20
-
Innovations in data models have spurred progress by creating new mental tools for us to think about data and to communicate with machines and with each other.
21
-
Scientists and engineers who become well-versed in effective data models can collaborate more efficiently because they share a common conceptual framework.
20
+
Innovations in data models have spurred progress by creating new mental tools for us to think about data and to communicate with machines and with each other. Scientists and engineers who become well-versed in effective data models can collaborate more efficiently because they share a common conceptual framework.
21
+
22
+
**This book focuses on DataJoint**, a data model that reinterprets relational database theory through the lens of **computational workflows**. While rooted in classical relational concepts, DataJoint introduces new constructs and semantics specifically designed for scientific computing, where tracking provenance and maintaining computational validity are as important as storing and querying data.
22
23
23
24
:::{hint} What data models do you already know?
24
25
Before moving forward, take a moment to consider the different data models you're already familiar with. Perhaps you've worked with a spreadsheet, a database, or a programming language but didn't know that they were distinct data models.
@@ -134,7 +135,10 @@ DataFrames support a wide range of operations, making them a powerful tool for d
134
135
DataFrames have become an essential tool in modern data analysis, providing a structured yet flexible way to handle and manipulate data. Their ability to work with heterogeneous data types, combined with a rich set of operations, makes them ideal for tasks ranging from simple data exploration to complex data transformations and machine learning preparation. Whether in Python, R, or Julia, DataFrames have become a cornerstone of data science workflows.
135
136
136
137
## Example: Relational Data Model
137
-
The rest of this book is about the relational data model and we introduce it properly in following sections.
138
+
139
+
The **relational data model**, introduced by Edgar F. Codd in 1970, revolutionized data management by organizing data into tables (relations) with well-defined relationships. This model emphasizes data integrity, consistency, and powerful query capabilities through a formal mathematical foundation.
140
+
141
+
The rest of this book focuses on the relational model, specifically through **DataJoint**, a modern interpretation that extends classical relational theory to explicitly support computational workflows. We introduce these concepts properly in the following sections.
138
142
139
143
## Example: Document Data Model (JSON and BSON)
140
144
@@ -386,11 +390,42 @@ Structured models, which come with predefined schemas, allow the organization of
386
390
As studies progress and new insights are gained, schemas can be adjusted to reflect the emerging structure and logic of the study.
387
391
This approach not only ensures consistency and integrity but also simplifies data sharing and publication.
388
392
389
-
## DataJoint supports structured databases in research
393
+
However, traditional structured databases, designed primarily for business transactions, don't naturally capture the **computational workflows** central to scientific research. Science isn't just about storing data—it's about transforming raw observations into analyzed results through defined processing steps. This is where DataJoint's reinterpretation of relational databases becomes essential.
394
+
395
+
## Example: DataJoint—Relational Databases as Computational Workflows
396
+
397
+
**DataJoint** represents a distinctive reinterpretation of the relational data model, specifically designed for scientific computing. While built on Codd's relational theory, DataJoint introduces a fundamentally different perspective: **databases as computational workflows that mix manual and automated steps**.
398
+
399
+
### Key Innovation: Workflows, Not Just Storage
400
+
401
+
Traditional relational databases focus on **storing and retrieving data**. DataJoint extends this to explicitly model **computational pipelines** with provenance tracking and computational validity. What makes DataJoint unique is treating the **schema itself as the workflow specification**.
402
+
403
+
```
404
+
Session (manual)
405
+
↓
406
+
Recording (imported from instruments)
407
+
↓
408
+
FilteredRecording (computed automatically)
409
+
↓
410
+
SpikeSorting (computed automatically)
411
+
↓
412
+
NeuronStatistics (computed automatically)
413
+
```
414
+
415
+
This isn't just documentation—it's the actual dependency structure enforced by the database. When upstream data changes, downstream results **must** be recomputed or deleted. There's no way to silently invalidate the pipeline.
416
+
This transforms ad-hoc research workflows into **rigorous, reproducible scientific operations**, bridging the gap between exploratory science and production-grade data management.
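
The dependency enforcement described above can be illustrated with a plain relational analogy. The following is a minimal sketch using Python's built-in `sqlite3` module, not DataJoint itself; the `session` and `recording` tables and their columns are hypothetical stand-ins for two steps of the pipeline. It shows only the underlying relational mechanism (foreign keys with cascading deletes), on the assumption that this is a reasonable approximation of how a downstream result is tied to its upstream data.

```python
import sqlite3

# Hypothetical two-step pipeline: `recording` depends on `session`.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE session (
        session_id INTEGER PRIMARY KEY
    );
    CREATE TABLE recording (
        session_id INTEGER PRIMARY KEY
            REFERENCES session (session_id) ON DELETE CASCADE,
        n_samples INTEGER NOT NULL
    );
""")

conn.execute("INSERT INTO session VALUES (1)")
conn.execute("INSERT INTO recording VALUES (1, 30000)")

# A downstream row cannot exist without its upstream row.
try:
    conn.execute("INSERT INTO recording VALUES (2, 30000)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting upstream data removes the dependent downstream rows as well.
conn.execute("DELETE FROM session WHERE session_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM recording").fetchone()[0]

print(orphan_rejected, remaining)
```

In this sketch the database itself, rather than a naming convention or a lab notebook, refuses to hold a recording whose session is missing, and it discards recordings whose session has been deleted.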
417
+
418
+
### DataJoint as a New Data Model
419
+
420
+
While rooted in relational theory, DataJoint constitutes a **distinct data model** because it:
421
+
422
+
1. **Redefines the purpose**: From data storage → workflow specification
423
+
2. **Introduces new constructs**: Table types with computational roles (Manual, Imported, Computed, Lookup)
424
+
3. **Changes operational semantics**: Immutability as the default, UPDATE as exception
425
+
4. **Adds new operations**: `populate()` for automatic computation
426
+
5. **Enforces new constraints**: Computational validity, not just relational consistency
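
To make the idea of automatic computation in the list above more concrete, here is a toy sketch in plain Python. It is not DataJoint's API: the dictionaries stand in for tables, and the `populate` function, the `mean` helper, and the `recording` and `mean_signal` names are all hypothetical. It assumes only that a populate-style operation computes downstream entries for upstream keys that do not have one yet.

```python
# Toy stand-ins for tables: dicts keyed by primary key (hypothetical, not DataJoint objects).
recording = {1: [1.0, 2.0, 3.0], 2: [4.0, 6.0]}  # upstream data
mean_signal = {}                                  # downstream computed results


def populate(upstream, downstream, compute):
    """Compute a downstream entry for every upstream key that lacks one."""
    missing = [key for key in upstream if key not in downstream]
    for key in missing:
        downstream[key] = compute(upstream[key])
    return missing  # the keys that were computed in this call


def mean(samples):
    """Average of a non-empty list of numbers."""
    return sum(samples) / len(samples)


first = populate(recording, mean_signal, mean)   # computes both existing keys
second = populate(recording, mean_signal, mean)  # nothing left to compute
recording[3] = [10.0]                            # new upstream data arrives
third = populate(recording, mean_signal, mean)   # computes only the new key

print(first, second, third, mean_signal)
```

Repeated calls are idempotent in this sketch: work is done only for upstream entries that have no downstream result, which is the property that lets a pipeline catch up as new data arrives.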
390
427
391
-
DataJoint is dedicate to the idea that data discipline must start early in science projects, even in fast-evolving phases of research, for explicitly structuring data to continuously maintain data integrity and consistency.
392
-
Structureed data is essential for effective collaboration while still allowing the data to adapt quickly as the project progresses.
393
-
By adopting structured data models that are flexible enough to evolve, scientists can enjoy the best of both worlds—retaining the freedom to explore and experiment while ensuring that their data remains organized, consistent, and ready for dissemination.
428
+
DataJoint demonstrates that data discipline can start early in research projects, even during fast-evolving exploratory phases. By providing structured, workflow-aware data management that can evolve alongside the science, DataJoint offers researchers the best of both worlds: the freedom to explore and the rigor to ensure their findings remain valid and reproducible.