|
4 | 4 | "cell_type": "markdown", |
5 | 5 | "metadata": {}, |
6 | 6 | "source": [ |
7 | | - "# Schema Modules\n", |
8 | | - "\n", |
9 | | - "\n" |
| 7 | + "# Multi-Schema Designs" |
10 | 8 | ] |
11 | 9 | }, |
12 | 10 | { |
|
17 | 15 | } |
18 | 16 | }, |
19 | 17 | "source": [ |
20 | | - "A large database schema can be composed of multiple modules. We often call each module a schema " |
| 18 | + "# Defining Complex Databases with Multiple Schemas in DataJoint\n", |
| 19 | + "\n", |
| 20 | + "In DataJoint, defining **multiple schemas across separate Python modules** helps keep large, complex projects well-organized, modular, and maintainable. As a best practice, define each schema in a **dedicated Python module**. Under this convention, every module holds **only one `schema` object**, and modules for **downstream schemas import the modules of upstream schemas** so that table dependencies resolve correctly. This layout improves code clarity, makes version control easier, and simplifies collaboration across teams.\n", |
| 21 | + "\n", |
| 22 | + "\n", |
| 23 | + "## 1. Why Use Multiple Schemas in Separate Modules?\n", |
| 24 | + "\n", |
| 25 | + "Using multiple schemas across separate modules offers the following benefits:\n", |
| 26 | + "\n", |
| 27 | + "1. **Modularity and Code Organization**: Each module contains only the tables relevant to a specific schema, making the codebase easier to manage and navigate.\n", |
| 28 | + "2. **Clear Boundaries Between Schemas**: Ensures a separation of concerns, where each schema focuses on a specific aspect of the pipeline (e.g., acquisition, processing, analysis).\n", |
| 29 | + "3. **Dependency Management**: Modules for downstream schemas explicitly **import the modules of upstream schemas**, which makes table dependencies and data flow visible in the code.\n", |
| 30 | + "4. **Collaboration**: Multiple developers or teams can work on separate modules with fewer conflicts.\n", |
| 31 | + "5. **Scalability and Maintainability**: Isolating schemas into modules simplifies future updates and troubleshooting.\n", |
| 32 | + "\n", |
| 33 | + "\n", |
| 34 | + "## 2. How to Structure Modules for Multiple Schemas\n", |
| 35 | + "\n", |
| 36 | + "Below is an example that demonstrates how to organize multiple schemas in separate Python modules.\n", |
| 37 | + "\n", |
| 38 | + "### Project Structure\n", |
| 39 | + "\n", |
| 40 | + "```\n", |
| 41 | + "my_pipeline/\n", |
| 42 | + "│\n", |
| 43 | + "├── subject.py # Defines subject_management schema\n", |
| 44 | + "├── acquisition.py # Defines acquisition schema (depends on subject_management)\n", |
| 45 | + "├── processing.py # Defines processing schema (depends on acquisition)\n", |
| 46 | + "└── analysis.py # Defines analysis schema (depends on processing)\n", |
| 47 | + "```\n", |
| 48 | + "\n", |
| 49 | + "### Step-by-Step Implementation\n", |
| 50 | + "\n", |
| 51 | + "1. **Define** `subject.py`\n", |
| 52 | + "This module defines the `subject_management` schema, which contains the `Subject` table.\n" |
| 53 | + ] |
| 54 | + }, |
| 55 | + { |
| 56 | + "cell_type": "markdown", |
| 57 | + "metadata": {}, |
| 58 | + "source": [] |
| 59 | + }, |
| 60 | + { |
| 61 | + "cell_type": "code", |
| 62 | + "execution_count": 16, |
| 63 | + "metadata": {}, |
| 64 | + "outputs": [ |
| 65 | + { |
| 66 | + "name": "stdout", |
| 67 | + "output_type": "stream", |
| 68 | + "text": [ |
| 69 | + "\u001b[0;32mimport\u001b[0m \u001b[0mdatajoint\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mdj\u001b[0m\u001b[0;34m\u001b[0m\n", |
| 70 | + "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", |
| 71 | + "\u001b[0;34m\u001b[0m\u001b[0;31m# Define the subject management schema\u001b[0m\u001b[0;34m\u001b[0m\n", |
| 72 | + "\u001b[0;34m\u001b[0m\u001b[0mschema\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mSchema\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"subject_management\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", |
| 73 | + "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", |
| 74 | + "\u001b[0;34m\u001b[0m\u001b[0;34m@\u001b[0m\u001b[0mschema\u001b[0m\u001b[0;34m\u001b[0m\n", |
| 75 | + "\u001b[0;34m\u001b[0m\u001b[0;32mclass\u001b[0m \u001b[0mSubject\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mManual\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", |
| 76 | + "\u001b[0;34m\u001b[0m \u001b[0mdefinition\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"\"\"\u001b[0m\n", |
| 77 | + "\u001b[0;34m subject_id : int\u001b[0m\n", |
| 78 | + "\u001b[0;34m ---\u001b[0m\n", |
| 79 | + "\u001b[0;34m subject_name : varchar(50)\u001b[0m\n", |
| 80 | + "\u001b[0;34m species : varchar(50)\u001b[0m\n", |
| 81 | + "\u001b[0;34m \"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n" |
| 82 | + ] |
| 83 | + } |
| 84 | + ], |
| 85 | + "source": [ |
| 86 | + "%pycat code/subject.py" |
21 | 87 | ] |
| 88 | + }, |
| 89 | + { |
| 90 | + "cell_type": "code", |
| 91 | + "execution_count": null, |
| 92 | + "metadata": {}, |
| 93 | + "outputs": [], |
| 94 | + "source": [] |
22 | 95 | } |
23 | 96 | ], |
24 | 97 | "metadata": { |
25 | 98 | "kernelspec": { |
26 | | - "display_name": "Python 3", |
| 99 | + "display_name": "base", |
27 | 100 | "language": "python", |
28 | 101 | "name": "python3" |
29 | 102 | }, |
|
37 | 110 | "name": "python", |
38 | 111 | "nbconvert_exporter": "python", |
39 | 112 | "pygments_lexer": "ipython3", |
40 | | - "version": "3.9.17" |
| 113 | + "version": "3.11.10" |
41 | 114 | }, |
42 | 115 | "orig_nbformat": 4 |
43 | 116 | }, |
|