This repository was archived by the owner on Jan 21, 2026. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathcompute.tex
More file actions
72 lines (58 loc) · 4.25 KB
/
compute.tex
File metadata and controls
72 lines (58 loc) · 4.25 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
\subsection*{Entering and computing data}
DataJoint distinguishes between \emph{manual} and \emph{automated} base relations.
In the entity relationship diagram (Fig.\ \ref{schema}\,B), manual base relations are displayed as green nodes whereas automated base relations are displayed in red.
Manual base relations contain data entered by the experimenter or by acquisition software.
They store data that are derived from external sources and are typically at the head of the dependency hierarchy.
Users commonly edit manual relations directly in the form of a spreadsheet using third-party interfaces such as MySQL Workbench, Navicat, SequelPro, HeidiSQL, and others.
Automated base relations are filled automatically from MATLAB or Python with the help of their \matlab{populate} method.
For example, Figures \ref{power-m} and \ref{power-py} list the complete implementation of the \matlab{Power} base relation, which computes the average power of the LFP signal for various frequency bands in our schema (Fig.\ \ref{schema}\,B).
\begin{figure*}
\ifthenelse{\boolean{plosone}}{
\begin{adjustwidth}{-2.25in}{0in}
}{}
\inputminted[frame=single,linenos=true]{matlab}{Power.m}
\caption{The class for the base relation \matlab{Power} in MATLAB.}
\label{power-m}
\ifthenelse{\boolean{plosone}}{
\end{adjustwidth}
}{}
\end{figure*}
\begin{figure*}
\ifthenelse{\boolean{plosone}}{
\begin{adjustwidth}{-2.25in}{0in}
}{}
\inputminted[frame=single,linenos=true]{python}{power.py}
\caption{The class for the base relation \matlab{Power} in Python.}
\label{power-py}
\ifthenelse{\boolean{plosone}}{
\end{adjustwidth}
}{}
\end{figure*}
Execution of the following commands will fill \matlab{Power} for all available data.
\begin{minted}{python}
rel = Power()
rel.populate()
\end{minted}
The \matlab{populate} method always ``knows'' what needs to be computed using the base relation's dependencies.
It compares the contents of the base relation to those of its immediate neighbors upstream in the dependency hierarchy.
The job list is defined as the join of the immediate upstream neighbors of the populated relation \emph{minus} the population itself.
For example, \matlab{Power} depends on \matlab{LFP} and \matlab{FrequencyBand}.
Then the restricted join
\mint{matlab}|missing = LFP() * FrequencyBand() - Power()|
will express all combinations of tuples in \matlab{LFP} and \matlab{FrequencyBand} for which \matlab{Power} does not yet have any entries.
Each tuple in \matlab{missing} specifies an isolated job to be performed for \matlab{Power}.
This logic is implemented internally and is provided here only to help users understand what happens under the hood of a \matlab{populate} call.
When \matlab{rel.populate()} is called, it executes \matlab{rel.makeTuples(key)} (in MATLAB) or \matlab{rel._make_tuples(key)} (in Python) for the primary key value of each tuple in \matlab{missing}.
Users specify the computation of new tuples for each item in \matlab{missing} using the \emph{make-tuples} callback method (Figures \ref{power-m} or \ref{power-py}),
which consists of three parts:
\begin{enumerate}
\item fetch the required data from other relations upstream in the dependency hierarchy, always restricting by the argument \matlab{key},
\item use fetched data to compute attributes of the relation that are missing in \matlab{key},
\item create new tuples that combine the newly computed attributes and \matlab{key} and submit them to the database using the \matlab{insert} method.
\end{enumerate}
Each \emph{make-tuples} call runs inside an isolated \emph{transaction} so that its results do not become visible to other processes until the entire call completes successfully.
If an error occurs during a \emph{make-tuples} call, any partially populated data are discarded and never become visible to downstream computations.
The \matlab{populate} method has several options to control its behavior.
In particular, it has the option of using the built-in \emph{job reservation} process to enable efficient distributed execution.
With job reservation enabled, users simply execute \matlab{populate} on multiple computers to run the \emph{make-tuples} jobs in parallel without conflicts.
Please refer to the online documentation for \matlab{populate} options and various techniques for customizing the processing chains.