Lines changed: 10 additions & 5 deletions
--- a/book/00-introduction/00-purpose.md
+++ b/book/00-introduction/00-purpose.md
@@ -2,13 +2,21 @@
 title: Purpose
 ---

+## What is DataJoint?
+
+**DataJoint is a computational database language and platform that enables scientists to design, implement, and manage data operations for research by unifying data structures and analysis code.** It provides data integrity, automated computation, reproducibility, and seamless collaboration through a relational database approach that coordinates relational databases, code repositories, and object storage.
+
+## Who This Book Is For
+
+Scientists and engineers working with data-intensive research—neuroscience, machine learning, bioinformatics, or any field where data complexity demands rigor. We assume you know Python but have never touched databases. By the end of this book, you'll be fluent in both DataJoint and SQL, building robust data workflows using relational databases, Python code, Git, and object storage.
+
 ## Why This Book Exists

 Most research starts with scripts, spreadsheets, and folder structures—an approach that works until it doesn't. For small projects with a single researcher, these ad-hoc methods suffice. But as data grows and teams expand, the cracks appear: lost data, irreproducible results, and pipelines that break whenever priorities shift.

 This reality hit hard during **MICrONS (Machine Intelligence from Cortical Networks)**[@10.1038/s41586-025-08790-w], a nine-year effort to map brain circuitry that generated petabytes of data from electron microscopy, neurophysiology, and behavior. Traditional methods collapsed under this complexity. The project demanded something better: a framework that could maintain data integrity, track computational provenance, and enable a large team to collaborate effectively.

-That framework was **DataJoint**—a tool that brings the rigor of relational databases to the dynamic, evolving world of scientific research. This book teaches you to build the same kind of robust, scalable data workflows, whether you're processing terabytes or gigabytes, working solo or in a team.
+That framework was **DataJoint**—the tool that brings the rigor of relational databases to the dynamic, evolving world of scientific research. This book teaches you to build the same kind of robust, scalable data workflows, whether you're processing terabytes or gigabytes, working solo or in a team.

 ```{admonition} Key Innovation
 DataJoint treats computational dependencies as a first-class feature of the database. You define not just data structures, but entire processing pipelines—from raw inputs through intermediate steps to final results. Every computation is trackable, reproducible, and automatically managed. [@10.48550/arXiv.1807.11104]
@@ -27,7 +35,7 @@ Traditional databases store and retrieve data. DataJoint does that too, but it a

 This workflow perspective shapes everything:

-**Schema as Map**: Your database diagram becomes a visual flowchart showing exactly how data moves from raw inputs to final results. Dependencies are explicit, not hidden in scattered scripts.
+**Schema as a Map**: Your database diagram becomes a visual flowchart showing exactly how data moves from raw inputs to final results. Dependencies are explicit, not hidden in scattered scripts.

 **Intelligent Diagrams**: Different table types get distinct visual styles. One glance tells you what's manual, what's automatic, and how everything connects.

@@ -51,9 +59,6 @@ DataJoint emphasizes clarity: your database structure should directly reflect yo

 This book provides the skills to transform research operations: from fragile scripts to robust, queryable, collaborative systems. Not because you need enterprise-scale infrastructure, but because clear thinking and good design make science better.

-## Who This Book Is For
-
-Scientists and engineers working with data-intensive research—neuroscience, machine learning, bioinformatics, or any field where data complexity demands rigor. We assume you know Python but have never touched databases. By the end, you'll be fluent in both DataJoint and SQL.

 ## DataJoint and SQL: Two Languages, One Foundation
Lines changed: 1 addition & 1 deletion
--- a/book/00-introduction/20-prerequisites.md
+++ b/book/00-introduction/20-prerequisites.md
@@ -1,6 +1,6 @@
 # Prerequisites and Essential Skills

-To get the most out of this course, you should be comfortable with a set of tools that form the bedrock of modern data science.
+This book teaches DataJoint and SQL for scientific data workflows. To get the most out of this course, you should be comfortable with a set of tools that form the bedrock of modern data science.
 While we will focus on database principles, we assume a working knowledge of the following.
 If you're new to these, we highly recommend exploring MIT's ["The Missing Semester of Your CS Education"](https://missing.csail.mit.edu/) to get up to speed.
Lines changed: 1 addition & 1 deletion
--- a/book/00-introduction/49-connect.ipynb
+++ b/book/00-introduction/49-connect.ipynb
@@ -23,7 +23,7 @@
 "- password: `devpass`\n",
 "\n",
 "\n",
-"These These credentials are not secret since this database is not exposed to the external world.\n",
+"These credentials are not secret since this database is not exposed to the external world.\n",
 "\n",
 "In this container environment, the user credentials are set in as the environment variables `DJ_HOST`, `DJ_USER`, and `DJ_PASS` and that's what the DataJoint client library will use to connect to the database.\n"
Lines changed: 6 additions & 0 deletions
--- a/book/20-concepts/00-databases.md
+++ b/book/20-concepts/00-databases.md
@@ -85,3 +85,9 @@ Since its meteoric rise between 2008 and 2015, the term "NoSQL" has gradually fa
 The website [DB-Engines Ranking](https://db-engines.com/en/ranking) tracks the popularity of various database management systems. While the relational data model continues to dominate, many popular databases now support multiple data models, allowing for deviations from strict relational structures.

 Notably, the two most popular open-source relational databases, MySQL (along with its sister MariaDB) and PostgreSQL, remain at the forefront of this evolving landscape.
+
+## Preview: DataJoint and This Book
+
+This book focuses on **DataJoint**, a framework that extends relational databases specifically for scientific workflows. DataJoint builds on the solid foundation of relational theory while adding capabilities essential for research: automated computation, data provenance, and reproducibility.
+
+We'll first introduce relational database concepts and operations, then show how DataJoint transforms these concepts into a powerful tool for scientific computing. By the end, you'll understand both the mathematical foundations and their practical application to your research.