Commit 9b815b7

reorder chapters in intro and concepts
1 parent a21a6b1 commit 9b815b7

7 files changed

+497
-7
lines changed

book/00-introduction/00-purpose.md

Lines changed: 10 additions & 5 deletions
@@ -2,13 +2,21 @@
 title: Purpose
 ---

+## What is DataJoint?
+
+**DataJoint is a computational database language and platform that enables scientists to design, implement, and manage data operations for research by unifying data structures and analysis code.** It provides data integrity, automated computation, reproducibility, and seamless collaboration through a relational database approach that coordinates relational databases, code repositories, and object storage.
+
+## Who This Book Is For
+
+Scientists and engineers working with data-intensive research—neuroscience, machine learning, bioinformatics, or any field where data complexity demands rigor. We assume you know Python but have never touched databases. By the end of this book, you'll be fluent in both DataJoint and SQL, building robust data workflows using relational databases, Python code, Git, and object storage.
+
 ## Why This Book Exists

 Most research starts with scripts, spreadsheets, and folder structures—an approach that works until it doesn't. For small projects with a single researcher, these ad-hoc methods suffice. But as data grows and teams expand, the cracks appear: lost data, irreproducible results, and pipelines that break whenever priorities shift.

 This reality hit hard during **MICrONS (Machine Intelligence from Cortical Networks)** [@10.1038/s41586-025-08790-w], a nine-year effort to map brain circuitry that generated petabytes of data from electron microscopy, neurophysiology, and behavior. Traditional methods collapsed under this complexity. The project demanded something better: a framework that could maintain data integrity, track computational provenance, and enable a large team to collaborate effectively.

-That framework was **DataJoint**, a tool that brings the rigor of relational databases to the dynamic, evolving world of scientific research. This book teaches you to build the same kind of robust, scalable data workflows, whether you're processing terabytes or gigabytes, working solo or in a team.
+That framework was **DataJoint**, the tool that brings the rigor of relational databases to the dynamic, evolving world of scientific research. This book teaches you to build the same kind of robust, scalable data workflows, whether you're processing terabytes or gigabytes, working solo or in a team.

 ```{admonition} Key Innovation
 DataJoint treats computational dependencies as a first-class feature of the database. You define not just data structures, but entire processing pipelines—from raw inputs through intermediate steps to final results. Every computation is trackable, reproducible, and automatically managed. [@10.48550/arXiv.1807.11104]
@@ -27,7 +35,7 @@ Traditional databases store and retrieve data. DataJoint does that too, but it a

 This workflow perspective shapes everything:

-**Schema as Map**: Your database diagram becomes a visual flowchart showing exactly how data moves from raw inputs to final results. Dependencies are explicit, not hidden in scattered scripts.
+**Schema as a Map**: Your database diagram becomes a visual flowchart showing exactly how data moves from raw inputs to final results. Dependencies are explicit, not hidden in scattered scripts.

 **Intelligent Diagrams**: Different table types get distinct visual styles. One glance tells you what's manual, what's automatic, and how everything connects.

@@ -51,9 +59,6 @@ DataJoint emphasizes clarity: your database structure should directly reflect yo

 This book provides the skills to transform research operations: from fragile scripts to robust, queryable, collaborative systems. Not because you need enterprise-scale infrastructure, but because clear thinking and good design make science better.

-## Who This Book Is For
-
-Scientists and engineers working with data-intensive research—neuroscience, machine learning, bioinformatics, or any field where data complexity demands rigor. We assume you know Python but have never touched databases. By the end, you'll be fluent in both DataJoint and SQL.

 ## DataJoint and SQL: Two Languages, One Foundation

book/00-introduction/20-prerequisites.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Prerequisites and Essential Skills

-To get the most out of this course, you should be comfortable with a set of tools that form the bedrock of modern data science.
+This book teaches DataJoint and SQL for scientific data workflows. To get the most out of this course, you should be comfortable with a set of tools that form the bedrock of modern data science.
 While we will focus on database principles, we assume a working knowledge of the following.
 If you're new to these, we highly recommend exploring MIT's ["The Missing Semester of Your CS Education"](https://missing.csail.mit.edu/) to get up to speed.

book/00-introduction/01-history.md renamed to book/00-introduction/25-history.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 title: DataJoint History
 ---

+DataJoint's evolution from lab tool to commercial platform reflects its growing role in scientific research.

 ```{image} ../images/cave-art.jpg
 ---

book/00-introduction/49-connect.ipynb

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
 "- password: `devpass`\n",
 "\n",
 "\n",
-"These These credentials are not secret since this database is not exposed to the external world.\n",
+"These credentials are not secret since this database is not exposed to the external world.\n",
 "\n",
 "In this container environment, the user credentials are set in as the environment variables `DJ_HOST`, `DJ_USER`, and `DJ_PASS` and that's what the DataJoint client library will use to connect to the database.\n"
 ]
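
The notebook cell touched by this hunk says the client reads its connection settings from `DJ_HOST`, `DJ_USER`, and `DJ_PASS`. As a rough illustration of that mapping (not code from the commit), the sketch below collects those three variables into a dictionary keyed by the DataJoint settings `database.host`, `database.user`, and `database.password`. The helper name `connection_settings_from_env` and the `ENV_TO_CONFIG` table are hypothetical, introduced only for this example, and the sketch uses the standard library alone so it runs without a database.

```python
import os

# Environment variable -> connection setting.
# The variable names come from the notebook text; the helper and this
# table are illustrative assumptions, not part of the commit.
ENV_TO_CONFIG = {
    "DJ_HOST": "database.host",
    "DJ_USER": "database.user",
    "DJ_PASS": "database.password",
}


def connection_settings_from_env(env=None):
    """Return connection settings read from environment variables.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    missing = [name for name in ENV_TO_CONFIG if name not in env]
    if missing:
        # Fail early and name every variable that is absent.
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[name] for name, key in ENV_TO_CONFIG.items()}
```

Calling `connection_settings_from_env()` inside the container would return the three settings as a plain dictionary. With the `datajoint` package installed, each entry could then be assigned into `dj.config` before connecting, although in the environment the notebook describes the library is said to pick the variables up on its own.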

book/20-concepts/00-databases.md

Lines changed: 6 additions & 0 deletions
@@ -85,3 +85,9 @@ Since its meteoric rise between 2008 and 2015, the term "NoSQL" has gradually fa
 The website [DB-Engines Ranking](https://db-engines.com/en/ranking) tracks the popularity of various database management systems. While the relational data model continues to dominate, many popular databases now support multiple data models, allowing for deviations from strict relational structures.

 Notably, the two most popular open-source relational databases, MySQL (along with its sister MariaDB) and PostgreSQL, remain at the forefront of this evolving landscape.
+
+## Preview: DataJoint and This Book
+
+This book focuses on **DataJoint**, a framework that extends relational databases specifically for scientific workflows. DataJoint builds on the solid foundation of relational theory while adding capabilities essential for research: automated computation, data provenance, and reproducibility.
+
+We'll first introduce relational database concepts and operations, then show how DataJoint transforms these concepts into a powerful tool for scientific computing. By the end, you'll understand both the mathematical foundations and their practical application to your research.
