book/00-introduction/00-purpose.md (+14 -1: 14 additions & 1 deletion)
@@ -13,6 +13,12 @@ This book is a comprehensive guide to *DataJoint for Python* — a specialized f
This book aims to introduce database programming for data science and scientific computing, using DataJoint as the central tool. DataJoint can be viewed as a data model, a database programming language, and a software framework all in one. Throughout the book, you will learn a rigorous database design methodology, which also serves as a general introduction to relational database programming, albeit with a unique perspective. To support this learning, code examples are provided in both DataJoint and SQL (Structured Query Language), the most common language for interacting with relational databases.

+The principles in this book aren't just academic; they were forged in the crucible of one of the most ambitious data challenges in modern science. My journey began in computer science, but a mid-career PhD in neuroscience led me to confront the reality of large-scale scientific data.
+
+In projects like **MICrONS (Machine Intelligence from Cortical Networks)**[@10.1038/s41586-025-08790-w], a nine-year effort to map a piece of the brain, we faced a deluge of data from electron microscopy, neurophysiology, and animal behavior. The traditional approach of managing data with scripts, spreadsheets, and ad-hoc folder structures simply collapsed under this complexity. It was slow, error-prone, and made collaboration nearly impossible.
+
+This challenge led directly to the creation of **DataJoint**, a tool designed to bring the rigor of relational databases to the dynamic and evolving world of scientific research.
+
# The Art of Programming

Programming is often thought of as a way to communicate with computers, but it is, more importantly, the art of thinking clearly and communicating precisely with other humans (and now AI agents too) about our intent and approach. Different programming paradigms offer different tools for this communication. While generating valid and efficient instructions for machines is crucial, the primary goal is to write code that humans can easily read, understand, and extend. This is especially important in dynamic, collaborative projects that evolve over time.
@@ -47,10 +53,17 @@ One of the unique advantages of using DataJoint is that practitioners can become
To make this book a comprehensive introduction to databases, we will also teach the equivalent SQL concepts and syntax alongside DataJoint. Throughout the chapters, you'll find executable examples and clear explanations of how SQL and DataJoint work together. As a result, not only will you learn how to use DataJoint effectively, but you'll also gain a solid foundation in SQL programming.

# The Birth of SciOps
+Throughout this book, our goal is to learn how to implement **Scientific Operations (SciOps)**.
+This is the practice of building reliable, efficient, and scalable data workflows.
+Most research begins at "Level 1" maturity with ad-hoc processes.
+By applying the principles of database design, we can progress toward automated, shareable, and eventually AI-enabled pipelines that accelerate discovery.
+
DataJoint plays a crucial role in the operation of scientific projects, fitting into a broader process that coordinates efforts in data acquisition, processing, analysis, visualization, sharing, and publishing. It acts as a foundational building block, helping to transform research labs into efficient data generation machines. DataJoint’s unique strength lies in its ability to dynamically manage the entire data pipeline, including the evolving data structure, code, software dependencies, and collaborative interactions.

Recognizing the need for more structured and scalable approaches in scientific research, we recently partnered with other neuroinformatics leaders to define a roadmap for enhancing operations in neuroscience projects. This roadmap is designed to guide research teams from ad hoc processes toward automated and scalable collaborations, enabling them to tackle more significant and complex problems while collaborating more broadly. The ultimate goal is to achieve closed-loop studies that seamlessly integrate human ingenuity with AI efficiency [@10.48550/arXiv.2401.00077].

+This book provides the foundational database skills to build that ladder, moving your research from fragile scripts to a robust, queryable, and collaborative scientific enterprise.
+
# Focus on Neuroscience

While the tools and concepts in this book are applicable to any computationally intensive field, DataJoint has its roots and most widespread applications in systems neuroscience. The development of DataJoint was closely tied to the needs of neuroscience research, and much of the support for this work has come from neuroscience-focused funding sources.
@@ -66,4 +79,4 @@ In this book, we will explore the profound impact of AI on database schema desig
I, Dimitri Yatsenko, am the principal author of this book, although some of the text is carried over from prior documentation written by our broader team, so my role is both as author and editor.
I welcome and appreciate your contributions, whether as a reviewer or as a contributor.
All contributions will be gratefully acknowledged.
-Please feel free to contact me directly or submit an issue in the book's GitHub repository.
+Please feel free to contact me directly or submit an issue in the book's GitHub repository.
To get the most out of this course, you should be comfortable with a set of tools that form the bedrock of modern data science.
+While we will focus on database principles, we assume a working knowledge of the following.
+If you're new to these, we highly recommend exploring MIT's ["The Missing Semester of Your CS Education"](https://missing.csail.mit.edu/) to get up to speed.
+
+### Command-Line Proficiency
+
+You'll frequently interact with systems through a terminal or shell. You don't need to be a guru, but you should know how to navigate directories (`cd`), list files (`ls`), and run basic commands. The shell is the universal language for automation and remote computing.
+
+### Python Fundamentals
+
+Python is our language for interacting with databases. You should understand variables, data types (strings, integers, lists, dictionaries), loops, and functions. We will touch on more advanced concepts like decorators, but a solid foundation is key.
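As a rough self-check for the Python prerequisites listed above, the following minimal sketch touches each concept in turn: variables, basic data types, a loop, a function, and a decorator. It is an editorial illustration only, not taken from the book; the names `timed`, `summarize`, and `sessions`, as well as the sample data, are hypothetical.

```python
import functools
import time


def timed(func):
    """Decorator (hypothetical helper): report how long the wrapped call takes."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f} s")
        return result
    return wrapper


@timed
def summarize(sessions):
    """Count how many sessions each subject has."""
    counts = {}                      # dictionary: subject name -> count
    for session in sessions:         # loop over a list of dictionaries
        subject = session["subject"]  # string value
        counts[subject] = counts.get(subject, 0) + 1  # integer arithmetic
    return counts


# Made-up example data: a list of dictionaries with string and integer values.
sessions = [
    {"subject": "mouse_a", "duration_min": 42},
    {"subject": "mouse_b", "duration_min": 35},
    {"subject": "mouse_a", "duration_min": 50},
]

counts = summarize(sessions)
print(counts)
```

If every line here reads naturally, including what `@timed` does to `summarize`, the Python background assumed by this section is likely in place.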
+
+### Git and GitHub
+
+In collaborative science and software, version control is non-negotiable. We expect you to have a GitHub account and be familiar with the basic workflow: `clone`, `add`, `commit`, and `push`. This is how you'll manage your code and assignments.
+
+### Jupyter Notebooks
+
+This textbook itself is built using Jupyter. You should know how to launch, navigate, and run code within Jupyter Notebooks or JupyterLab. The concept of "literate programming" (mixing executable code, text, and results) is central to reproducible science.