
Commit 3260b96

add data operations sections
1 parent e3ee752 commit 3260b96

File tree

10 files changed

+466
-66
lines changed


book/30-schema-design/010-table.ipynb

Lines changed: 4 additions & 8 deletions
@@ -6,22 +6,20 @@
66
"source": [
77
"# Create a Table\n",
88
"\n",
9-
"In DataJoint, declaring individual tables is the foundational step in building your data pipeline. Each table corresponds to a specific entity or data structure that you want to model within your database. This tutorial will guide you through the basics of declaring individual tables, covering essential components like primary keys, attributes, and basic definitions."
9+
"Declaring individual tables is the foundational step in building your data pipeline. Each table corresponds to a specific entity or data structure that you want to model within your database. This tutorial will guide you through the basics of declaring individual tables, covering essential components like primary keys, attributes, and basic definitions."
1010
]
1111
},
1212
{
1313
"cell_type": "markdown",
1414
"metadata": {},
1515
"source": [
1616
"# Schema Declaration\n",
17-
"Before declaring tables, you need to declare a schema which is a namespace for your tables, giving it a unique name.\n",
18-
"\n",
19-
"The schema groups related tables together and avoids naming conflicts."
17+
"As described in the previous section, we must first declare a schema object that creates the database schema, a namespace within the current database. Let's define the schema named `\"tutorial\"`."
2018
]
2119
},
2220
{
2321
"cell_type": "code",
24-
"execution_count": 1,
22+
"execution_count": null,
2523
"metadata": {},
2624
"outputs": [
2725
{
@@ -35,9 +33,7 @@
3533
],
3634
"source": [
3735
"import datajoint as dj\n",
38-
"\n",
39-
"# Define the schema\n",
40-
"schema = dj.Schema('my_schema')"
36+
"schema = dj.Schema('tutorial')"
4137
]
4238
},
4339
{
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"---\n",
8+
"title: Schema Examples\n",
9+
"date: 2025-01-11\n",
10+
"authors:\n",
11+
" - name: Dimitri Yatsenko\n",
12+
"---\n",
13+
"\n",
14+
"In this section, we present several well-designed schemas, populated with data that are used in examples throughout the book."
15+
]
16+
},
17+
{
18+
"cell_type": "markdown",
19+
"metadata": {},
20+
"source": []
21+
}
22+
],
23+
"metadata": {
24+
"language_info": {
25+
"name": "python"
26+
}
27+
},
28+
"nbformat": 4,
29+
"nbformat_minor": 2
30+
}

book/35-example-designs/000-example-designs.md

Lines changed: 0 additions & 8 deletions
This file was deleted.
Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Insert \n",
8+
"\n",
9+
"(This is an AI-generated placeholder -- to be updated soon.)\n",
10+
"\n",
11+
"DataJoint provides two primary commands for adding data to tables: `insert` and `insert1`. Both commands are essential for populating tables while ensuring data integrity, but they are suited for different scenarios depending on the quantity and structure of the data being inserted.\n",
12+
"\n",
13+
"## Overview of `insert1`\n",
14+
"\n",
15+
"The `insert1` command is used for adding a single row of data to a table. It expects a dictionary where each key corresponds to a table attribute and the associated value represents the data to be inserted.\n",
16+
"\n",
17+
"### Syntax\n",
18+
"\n",
19+
"```python\n",
20+
"<Table>.insert1(data, ignore_extra_fields=False)\n",
21+
"```\n",
22+
"\n",
23+
"### Parameters\n",
24+
"\n",
25+
"1. **`data`**: A dictionary representing a single row of data, with keys matching the table's attributes.\n",
26+
"2. **`ignore_extra_fields`** *(default: False)*:\n",
27+
" - If `True`, attributes in the dictionary that are not part of the table schema are ignored.\n",
28+
" - If `False`, the presence of extra fields will result in an error.\n",
29+
"\n",
30+
"### Example\n",
31+
"\n",
32+
"```python\n",
33+
"import datajoint as dj\n",
34+
"\n",
35+
"schema = dj.Schema('example_schema')\n",
36+
"\n",
37+
"@schema\n",
38+
"class Animal(dj.Manual):\n",
39+
" definition = \"\"\"\n",
40+
" animal_id: int # Unique identifier for the animal\n",
41+
" ---\n",
42+
" species: varchar(64) # Species of the animal\n",
43+
" age: int # Age of the animal in years\n",
44+
" \"\"\"\n",
45+
"\n",
46+
"# Insert a single row into the Animal table\n",
47+
"Animal.insert1({\n",
48+
" 'animal_id': 1,\n",
49+
" 'species': 'Dog',\n",
50+
" 'age': 5\n",
51+
"})\n",
52+
"```\n",
53+
"\n",
54+
"### Key Points\n",
55+
"\n",
56+
"- `insert1` is ideal for inserting a single, well-defined record.\n",
57+
"- It ensures clarity when adding individual entries, reducing ambiguity in debugging.\n",
58+
"\n",
59+
"## Overview of `insert`\n",
60+
"\n",
61+
"The `insert` command is designed for batch insertion, allowing multiple rows to be added in a single operation. It accepts a list of dictionaries, where each dictionary represents a single row of data.\n",
62+
"\n",
63+
"### Syntax\n",
64+
"\n",
65+
"```python\n",
66+
"<Table>.insert(data, ignore_extra_fields=False, skip_duplicates=False)\n",
67+
"```\n",
68+
"\n",
69+
"### Parameters\n",
70+
"\n",
71+
"1. **`data`**: A list of dictionaries, where each dictionary corresponds to a row of data to insert.\n",
72+
"2. **`ignore_extra_fields`** *(default: False)*:\n",
73+
" - If `True`, any extra keys in the dictionaries are ignored.\n",
74+
" - If `False`, extra keys result in an error.\n",
75+
"3. **`skip_duplicates`** *(default: False)*:\n",
76+
" - If `True`, rows with duplicate primary keys are skipped.\n",
77+
" - If `False`, duplicate rows trigger an error.\n",
78+
"\n",
79+
"### Example\n",
80+
"\n",
81+
"```python\n",
82+
"# Insert multiple rows into the Animal table\n",
83+
"Animal.insert([\n",
84+
" {'animal_id': 2, 'species': 'Cat', 'age': 3},\n",
85+
" {'animal_id': 3, 'species': 'Rabbit', 'age': 2}\n",
86+
"])\n",
87+
"```\n",
88+
"\n",
89+
"### Key Points\n",
90+
"\n",
91+
"- `insert` is efficient for adding multiple records in a single operation.\n",
92+
"- Use `skip_duplicates=True` to gracefully handle re-insertions of existing data.\n",
93+
"\n",
94+
"## Best Practices\n",
95+
"\n",
96+
"1. **Use `insert1` for Single Rows**: Prefer `insert1` when working with individual entries to maintain clarity.\n",
97+
"2. **Validate Data Consistency**: Ensure the input data adheres to the schema definition.\n",
98+
"3. **Batch Insert for Performance**: Use `insert` for larger datasets to minimize database interactions.\n",
99+
"4. **Handle Extra Fields Carefully**: Use `ignore_extra_fields=False` to detect unexpected keys.\n",
100+
"5. **Avoid Duplicates**: Use `skip_duplicates=True` when re-inserting known data to avoid errors.\n",
101+
"\n",
102+
"## Summary\n",
103+
"\n",
104+
"- Use `insert1` for single-row insertions and `insert` for batch operations.\n",
105+
"- Both commands enforce schema constraints and maintain the integrity of the database.\n",
106+
"- Proper use of these commands ensures efficient, accurate, and scalable data entry into your DataJoint pipeline.\n"
107+
]
108+
}
109+
],
110+
"metadata": {
111+
"language_info": {
112+
"name": "python"
113+
}
114+
},
115+
"nbformat": 4,
116+
"nbformat_minor": 2
117+
}
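The duplicate-handling and extra-field semantics described in the insert section can be sketched in plain Python. This is a conceptual model only, not DataJoint's actual implementation (the real `insert` issues SQL against the database server); the `SimpleTable` class and its attribute names are invented for illustration.

```python
# Conceptual sketch of insert semantics (NOT DataJoint's implementation):
# a table is modeled as a dict keyed by primary key, and insert() enforces
# the declared attribute set, rejects or skips duplicates, and optionally
# ignores extra fields -- mirroring ignore_extra_fields and skip_duplicates.

class SimpleTable:
    def __init__(self, primary_key, attributes):
        self.primary_key = primary_key      # e.g. ('animal_id',)
        self.attributes = set(attributes)   # all declared attributes
        self.rows = {}                      # primary-key tuple -> row dict

    def insert1(self, row, ignore_extra_fields=False):
        self.insert([row], ignore_extra_fields=ignore_extra_fields)

    def insert(self, rows, ignore_extra_fields=False, skip_duplicates=False):
        for row in rows:
            extra = set(row) - self.attributes
            if extra and not ignore_extra_fields:
                raise ValueError(f"unknown attributes: {extra}")
            clean = {k: v for k, v in row.items() if k in self.attributes}
            key = tuple(clean[k] for k in self.primary_key)
            if key in self.rows:
                if skip_duplicates:
                    continue                # silently skip re-insertions
                raise ValueError(f"duplicate primary key: {key}")
            self.rows[key] = clean


animals = SimpleTable(('animal_id',), ['animal_id', 'species', 'age'])
animals.insert1({'animal_id': 1, 'species': 'Dog', 'age': 5})
animals.insert(
    [{'animal_id': 1, 'species': 'Dog', 'age': 5},   # duplicate -> skipped
     {'animal_id': 2, 'species': 'Cat', 'age': 3}],
    skip_duplicates=True)
print(len(animals.rows))  # 2
```

The same mental model explains why `skip_duplicates=True` makes batch re-insertions idempotent: rows whose primary key already exists are simply passed over rather than raising an error.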
Lines changed: 121 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,121 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Delete\n",
8+
"\n",
9+
"(This is an AI-generated placeholder to be edited.)\n",
10+
"\n",
11+
"The `delete` command in DataJoint provides a robust mechanism for removing data from tables. It ensures that deletions respect the dependency structure defined by the relational schema, preserving the integrity of your database. This command is powerful and should be used with a clear understanding of its effects on downstream dependencies.\n",
12+
"\n",
13+
"## Overview of `delete`\n",
14+
"\n",
15+
"The `delete` command removes entries from a table. When executed, it ensures that all dependent data in downstream tables is also removed, unless explicitly restricted.\n",
16+
"\n",
17+
"### Syntax\n",
18+
"\n",
19+
"```python\n",
20+
"<Table>.delete(safemode=True, quick=False)\n",
21+
"```\n",
22+
"\n",
23+
"### Parameters\n",
24+
"\n",
25+
"1. **`safemode`** *(default: True)*:\n",
26+
" - If `True`, prompts the user for confirmation before deleting any data.\n",
27+
" - If `False`, proceeds with deletion without prompting.\n",
28+
"2. **`quick`** *(default: False)*:\n",
29+
" - If `True`, accelerates deletion by skipping certain checks, such as confirming dependencies.\n",
30+
" - Use this option cautiously as it bypasses safety mechanisms.\n",
31+
"\n",
32+
"## Example Usage\n",
33+
"\n",
34+
"### Deleting Specific Entries\n",
35+
"\n",
36+
"To delete specific rows based on a condition:\n",
37+
"\n",
38+
"```python\n",
39+
"import datajoint as dj\n",
40+
"\n",
41+
"schema = dj.Schema('example_schema')\n",
42+
"\n",
43+
"@schema\n",
44+
"class Animal(dj.Manual):\n",
45+
" definition = \"\"\"\n",
46+
" animal_id: int # Unique identifier for the animal\n",
47+
" ---\n",
48+
" species: varchar(64) # Species of the animal\n",
49+
" age: int # Age of the animal in years\n",
50+
" \"\"\"\n",
51+
"\n",
52+
"# Insert example data\n",
53+
"Animal.insert([\n",
54+
" {'animal_id': 1, 'species': 'Dog', 'age': 5},\n",
55+
" {'animal_id': 2, 'species': 'Cat', 'age': 3},\n",
56+
"])\n",
57+
"\n",
58+
"# Delete rows where species is 'Cat'\n",
59+
"(Animal & {'species': 'Cat'}).delete()\n",
60+
"```\n",
61+
"\n",
62+
"### Deleting All Entries\n",
63+
"\n",
64+
"To delete all entries from a table:\n",
65+
"\n",
66+
"```python\n",
67+
"Animal.delete()\n",
68+
"```\n",
69+
"\n",
70+
"### Using `safemode`\n",
71+
"\n",
72+
"By default, `safemode=True` will prompt the user for confirmation before deletion. To bypass the prompt:\n",
73+
"\n",
74+
"```python\n",
75+
"Animal.delete(safemode=False)\n",
76+
"```\n",
77+
"\n",
78+
"## Dependency Management\n",
79+
"\n",
80+
"One of the key features of `delete` is its handling of dependencies. When deleting data, DataJoint ensures that:\n",
81+
"\n",
82+
"1. **Downstream Data is Removed**: Any dependent entries in other tables are recursively deleted to maintain referential integrity.\n",
83+
"2. **Deletion is Acyclic**: The dependency graph is traversed in topological order to avoid cyclic deletion issues.\n",
84+
"\n",
85+
"### Restricting Deletions\n",
86+
"\n",
87+
"To delete specific entries while preserving others:\n",
88+
"\n",
89+
"```python\n",
90+
"(Animal & {'animal_id': 1}).delete()\n",
91+
"```\n",
92+
"\n",
93+
"In this example, only the entry with `animal_id=1` is deleted, and other rows remain intact.\n",
94+
"\n",
95+
"## Best Practices\n",
96+
"\n",
97+
"1. **Use `safemode=True`**: Always use `safemode` when testing or in uncertain situations to prevent accidental data loss.\n",
98+
"2. **Test Deletion Queries**: Before running `delete`, test your restrictions with `fetch` to ensure you are targeting the correct data.\n",
99+
"3. **Be Cautious with `quick=True`**: Use the `quick` parameter sparingly, as it skips important safety checks.\n",
100+
"4. **Understand Dependencies**: Review your schema's dependency structure to anticipate the cascading effects of deletions.\n",
101+
"\n",
102+
"## Summary\n",
103+
"\n",
104+
"The `delete` command is a powerful tool for managing data lifecycle in a DataJoint pipeline. By respecting dependencies and offering safety mechanisms, it ensures that data deletions are controlled and consistent. Proper use of this command helps maintain the integrity and cleanliness of your database.\n",
105+
"\n"
106+
]
107+
},
108+
{
109+
"cell_type": "markdown",
110+
"metadata": {},
111+
"source": []
112+
}
113+
],
114+
"metadata": {
115+
"language_info": {
116+
"name": "python"
117+
}
118+
},
119+
"nbformat": 4,
120+
"nbformat_minor": 2
121+
}
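The cascading deletion described in the delete section can be sketched with a toy dependency graph. This is a conceptual illustration with made-up table and variable names, not DataJoint's implementation — the real system derives the dependency graph from foreign-key definitions and issues cascading SQL deletes.

```python
# Conceptual sketch of cascading delete (NOT DataJoint's implementation):
# `children` maps each table to its direct dependents; deleting keys from
# a table first removes the matching rows from every downstream table,
# preserving referential integrity.

tables = {
    'Animal':  {(1,): {'species': 'Dog'}, (2,): {'species': 'Cat'}},
    'Session': {(1, 'a'): {}, (2, 'b'): {}},  # first key element references Animal
}
children = {'Animal': ['Session'], 'Session': []}

def cascade_delete(table, keys):
    """Delete rows matching `keys`, descending into dependents first."""
    for child in children[table]:
        # a child row depends on a parent row if its key starts with the parent's key
        child_keys = [k for k in tables[child]
                      if any(k[:len(p)] == p for p in keys)]
        cascade_delete(child, child_keys)
    for k in keys:
        tables[table].pop(k, None)

cascade_delete('Animal', [(2,)])
print(tables['Animal'])   # only animal 1 remains
print(tables['Session'])  # session (2, 'b') removed along with its parent
```

The depth-first traversal mirrors the topological ordering mentioned above: dependents are always removed before the rows they reference.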

book/40-operations/50-update-one.ipynb renamed to book/40-operations/030-updates.ipynb

Lines changed: 3 additions & 2 deletions
@@ -4,8 +4,9 @@
44
"cell_type": "markdown",
55
"metadata": {},
66
"source": [
7-
"# `update1`\n",
8-
"## Updating values in a table"
7+
"# Updates\n",
8+
"\n",
9+
"## Updating existing rows"
910
]
1011
},
1112
{

book/40-operations/009-Transactions.ipynb renamed to book/40-operations/040-transactions.ipynb

Lines changed: 45 additions & 1 deletion
@@ -6,7 +6,51 @@
66
"source": [
77
"# Transactions\n",
88
"\n",
9-
"Some sequences of operations must be performed carefully with isolation from outside interventions and must not be left incomplete.\n",
9+
"Databases are not merely storage systems; they should accurately represent an enterprise's current state.\n",
10+
"This means that all users, irrespective of their interactions, should view and engage with the same data simultaneously, seeing the results of each other's interactions without breaking data integrity.\n",
11+
"This principle is known as **data consistency**.\n",
12+
"\n",
13+
"```{card} Data Consistency\n",
14+
"**Data Consistency:** A database's capability to present a singular, valid, and current version of its data to all users, even during concurrent access and modifications.\n",
15+
"Successful read queries should reflect the database's most recent state, while successful writes should immediately influence all subsequent read actions.\n",
16+
"```\n",
17+
"The underlying data may be distributed and true consistency may be deferred, but the system must still present a single, coherent state to all of its users.\n",
18+
"\n",
19+
"Understanding data consistency becomes clearer when examining its breaches.\n",
20+
"For instance, during early morning hours, I've observed my bank's website displaying the previous day's pending transactions, but the account balance doesn't reflect these changes until a couple of hours later.\n",
21+
"This discrepancy between transaction views and account balances exemplifies data inconsistency.\n",
22+
"Fortunately, such inconsistencies, in this case, seem to be confined to the web interface, as the system eventually reaches a consistent state.\n",
23+
"\n",
24+
"Ensuring data consistency is straightforward in certain scenarios.\n",
25+
"By avoiding conditions that might compromise it, consistency is preserved.\n",
26+
"For example, if only one party generates data and the rest merely access it, the likelihood of conflicts leading to inconsistency is minimal.\n",
27+
"Delayed queries still provide a consistent, albeit older, state.\n",
28+
"This is typical in scientific projects, where one lab produces data while others analyze it.\n",
29+
"\n",
30+
"Complexities arise when multiple entities, be they human or digital, access and modify data simultaneously.\n",
31+
"Maintaining consistency amidst such concurrent interactions becomes challenging.\n",
32+
"To achieve this, databases might temporarily limit access for some users during another's transaction or force users to resolve discrepancies before data integration.\n",
33+
"\n",
34+
"Modern relational databases adhere to the **ACID model** to maintain consistency:\n",
35+
"\n",
36+
"```{card} ACID Model for Database Transactions\n",
37+
"- **A**tomic\n",
38+
"- **C**onsistent\n",
39+
"- **I**solated\n",
40+
"- **D**urable\n",
41+
"```\n",
42+
"\n",
43+
"Ensuring consistency becomes notably challenging in geographically dispersed systems with distributed data storage, especially when faced with slow or intermittent network connections.\n",
44+
"Historically, it was believed that data systems spanning vast areas couldn't maintain consistency.\n",
45+
"The **CAP Theorem** suggested that in such systems, there's an irreconcilable trade-off between system responsiveness (availability) and data consistency.\n",
46+
"\n",
47+
"Traditional relational database systems, like Oracle, MySQL, and others, maintained strong consistency but weren't tailored for distributed setups. This limitation spurred the rise of **NoSQL** in the 2000s and 2010s, emphasizing responsiveness in distributed systems, albeit with weaker consistency.\n",
48+
"\n",
49+
"However, recent advancements have bridged this gap. Modern distributed systems, like Spanner and CockroachDB, leverage data replication and consensus algorithms (e.g., Paxos, Raft) to offer high availability while maintaining strict consistency.\n",
50+
"\n",
51+
"DataJoint adheres to the classic ACID consistency model, leveraging serializable transactions or the master-part relationship, detailed further in the \"Transactions\" section.\n",
52+
"\n",
53+
"Some sequences of operations must be performed carefully with isolation from outside interventions and must not be left incomplete.\n",
1054
"\n",
1155
"- A = Atomic\n",
1256
"- C = Consistent\n",

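The atomicity and consistency properties of the ACID model can be demonstrated with Python's built-in `sqlite3` module. This is a generic illustration using an invented `account` table, not DataJoint's transaction API: a two-statement transfer either commits as a whole or is rolled back as a whole when a constraint fails.

```python
import sqlite3

# Atomicity demo: two updates form one transaction; when the second update
# violates a constraint, the first is rolled back and the database keeps
# its prior consistent state.
conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INT CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    with conn:  # opens a transaction: commit on success, rollback on error
        conn.execute(
            "UPDATE account SET balance = balance + 200 WHERE name = 'bob'")
        conn.execute(  # drives alice's balance negative -> CHECK fails
            "UPDATE account SET balance = balance - 200 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    pass  # the whole transfer is undone, including bob's credit

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'alice': 100, 'bob': 50} -- unchanged
```

Note that Bob's credit succeeded on its own, yet it does not survive: atomicity guarantees that other users never observe the half-completed transfer.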