Commit 40b6d3e

expand the Diagramming chapter.
1 parent 9d7f30c commit 40b6d3e

2 files changed: +240 -5 lines changed

book/30-schema-design/035-diagrams.ipynb

Lines changed: 236 additions & 1 deletion
@@ -70,7 +70,7 @@
7070
"\n",
7171
"```\n",
7272
"┌─────────────┐ ┌─────────────┐\n",
73-
"│ Customer │1 *│ Order │\n",
73+
"│ Customer │1 *│ Order │\n",
7474
"│─────────────│◆────────│─────────────│\n",
7575
"│ customerId │ │ orderId │\n",
7676
"│ name │ │ orderDate │\n",
@@ -636,6 +636,241 @@
636636
"- You want diagrams to be executable (code generates diagrams)"
637637
]
638638
},
639+
{
640+
"cell_type": "markdown",
641+
"metadata": {},
642+
"source": [
643+
"## Conceptual Design vs. Implementation: A Key Philosophical Difference\n",
644+
"\n",
645+
"Database design is traditionally taught as a **two-phase process**:\n",
646+
"\n",
647+
"1. **Conceptual Design Phase**: Create ER diagrams to model entities and relationships\n",
648+
"2. **Implementation Phase**: Translate the conceptual model into SQL CREATE TABLE statements\n",
649+
"\n",
650+
"This separation reflects a workflow where design and implementation are distinct activities, often performed by different people or at different times.\n",
651+
"\n",
652+
"### Traditional Two-Phase Approach\n",
653+
"\n",
654+
"In most database textbooks and courses, the process looks like this:\n",
655+
"\n",
656+
"```\n",
657+
"Step 1: Conceptual Design\n",
658+
"├─ Use Chen's ER diagrams or Crow's Foot notation\n",
659+
"├─ Focus on entities, relationships, cardinalities\n",
660+
"├─ Design without worrying about implementation details\n",
661+
"└─ Create diagrams for discussion and approval\n",
662+
"\n",
663+
" ↓ (Manual Translation)\n",
664+
"\n",
665+
"Step 2: Implementation\n",
666+
"├─ Write SQL CREATE TABLE statements\n",
667+
"├─ Define primary keys and foreign keys\n",
668+
"├─ Implement constraints and indexes\n",
669+
"└─ Hope the implementation matches the design!\n",
670+
"\n",
671+
" ↓ (Potential Divergence)\n",
672+
"\n",
673+
"Problem: Diagrams and Implementation Can Drift Apart\n",
674+
"├─ Diagrams updated → SQL not updated (documentation out of sync)\n",
675+
"├─ SQL updated → Diagrams not updated (design drift)\n",
676+
"└─ Requires discipline to keep them synchronized\n",
677+
"```\n",
678+
"\n",
679+
"**Characteristics**:\n",
680+
"- **Two separate artifacts**: Diagram (conceptual) and SQL code (implementation)\n",
681+
"- **Manual synchronization**: Changes must be made in both places\n",
682+
"- **Documentation debt**: Over time, diagrams often become outdated\n",
683+
"- **Waterfall-oriented**: Design must be \"complete\" before implementation\n",
684+
"- **Communication gap**: Designers and implementers may be different people\n",
685+
"\n",
686+
"### DataJoint's Unified Approach\n",
687+
"\n",
688+
"DataJoint fundamentally changes this by **merging conceptual design and implementation**:\n",
689+
"\n",
690+
"```\n",
691+
"Single Step: Unified Design-Implementation\n",
692+
"├─ Write Python class definitions (or SQL if preferred)\n",
693+
"├─ DataJoint automatically creates tables in database\n",
694+
"├─ DataJoint automatically generates diagrams from live schema\n",
695+
"└─ Diagram and implementation are ALWAYS in sync\n",
696+
"\n",
697+
" ↓ (No Translation Needed)\n",
698+
"\n",
699+
"Result: Diagrams ARE the Implementation\n",
700+
"├─ Change the code → Diagram updates automatically\n",
701+
"├─ Diagram always reflects actual database structure\n",
702+
"└─ Zero documentation debt\n",
703+
"```\n",
704+
"\n",
705+
"**Characteristics**:\n",
706+
"- **Single source of truth**: The code IS the design\n",
707+
"- **Automatic synchronization**: Diagrams generated from actual database schema\n",
708+
"- **Always current**: Diagrams cannot become outdated\n",
709+
"- **Agile-friendly**: Can iterate on design rapidly\n",
710+
"- **Executable documentation**: Diagrams are generated from running code\n",
711+
"\n",
712+
"### Practical Implications\n",
713+
"\n",
714+
"#### Traditional Approach Example:\n",
715+
"\n",
716+
"**Phase 1 - Conceptual Design** (ER Diagram):\n",
717+
"```\n",
718+
"[Student] ──enrolls in─── [Course]\n",
719+
"    M                        N\n",
720+
"```\n",
721+
"\n",
722+
"**Phase 2 - Implementation** (Manual SQL):\n",
723+
"```sql\n",
724+
"CREATE TABLE student (\n",
725+
" student_id INT PRIMARY KEY,\n",
726+
" name VARCHAR(100)\n",
727+
");\n",
728+
"\n",
729+
"CREATE TABLE course (\n",
730+
" course_id INT PRIMARY KEY,\n",
731+
" title VARCHAR(100)\n",
732+
");\n",
733+
"\n",
734+
"CREATE TABLE enrollment (\n",
735+
" student_id INT,\n",
736+
" course_id INT,\n",
737+
" PRIMARY KEY (student_id, course_id),\n",
738+
" FOREIGN KEY (student_id) REFERENCES student(student_id),\n",
739+
" FOREIGN KEY (course_id) REFERENCES course(course_id)\n",
740+
");\n",
741+
"```\n",
742+
"\n",
743+
"**Problem**: If you later add a `grade` field to enrollment, you must:\n",
744+
"1. Update the SQL code\n",
745+
"2. Update the ER diagram manually\n",
746+
"3. Update all documentation\n",
747+
"4. Risk: Steps 2-3 often get skipped\n",
748+
"\n",
749+
"#### DataJoint Unified Approach:\n",
750+
"\n",
751+
"**Single Definition** (Code + Diagram in one):\n",
752+
"```python\n",
753+
"@schema\n",
754+
"class Student(dj.Manual):\n",
755+
"    definition = \"\"\"\n",
756+
"    student_id : int\n",
757+
"    ---\n",
758+
"    name : varchar(100)\n",
759+
"    \"\"\"\n",
760+
"\n",
761+
"@schema\n",
762+
"class Course(dj.Manual):\n",
763+
"    definition = \"\"\"\n",
764+
"    course_id : int\n",
765+
"    ---\n",
766+
"    title : varchar(100)\n",
767+
"    \"\"\"\n",
768+
"\n",
769+
"@schema\n",
770+
"class Enrollment(dj.Manual):\n",
771+
"    definition = \"\"\"\n",
772+
"    -> Student\n",
773+
"    -> Course\n",
774+
"    ---\n",
775+
"    grade : char(1)  # Added later\n",
776+
"    \"\"\"\n",
777+
"\n",
778+
"# Diagram is automatically generated\n",
779+
"dj.Diagram(schema)\n",
780+
"```\n",
781+
"\n",
782+
"**Advantages**:\n",
783+
"- Add the `grade` attribute → apply the change to the table → regenerate the diagram from the live schema\n",
784+
"- The diagram is drawn from the implemented schema, so it cannot drift from the tables it describes\n",
785+
"- Code review catches design changes (they're in the same artifact)\n",
786+
"\n",
787+
"### Enabling Agile Database Design\n",
788+
"\n",
789+
"This unified approach enables an **agile, iterative workflow**:\n",
790+
"\n",
791+
"**Traditional Approach** (Waterfall):\n",
792+
"```\n",
793+
"Design → Review → Approve → Implement → Test → Deploy\n",
794+
" ↑ |\n",
795+
" └────────────── Difficult to go back ───────────┘\n",
796+
"```\n",
797+
"\n",
798+
"**DataJoint Approach** (Agile):\n",
799+
"```\n",
800+
"Design+Implement → Test → Iterate → Deploy\n",
801+
" ↓ ↑\n",
802+
" └──── Easy iteration ──┘\n",
803+
"```\n",
804+
"\n",
805+
"Benefits:\n",
806+
"1. **Rapid prototyping**: Define a table, see the diagram immediately\n",
807+
"2. **Safe experimentation**: Change foreign keys, instantly see impact on diagram\n",
808+
"3. **Continuous refinement**: Iterate on design as you learn more about your domain\n",
809+
"4. **Team collaboration**: Everyone works with the same code that generates diagrams\n",
810+
"5. **Version control**: Git tracks both design and implementation (they're the same file)\n",
811+
"\n",
812+
"### The Bi-Directional Property\n",
813+
"\n",
814+
"DataJoint's approach is **bi-directional**:\n",
815+
"\n",
816+
"**Code → Diagram** (Normal workflow):\n",
817+
"```python\n",
818+
"# Write Python class definition\n",
819+
"@schema\n",
820+
"class MyTable(dj.Manual):\n",
821+
" definition = \"...\"\n",
822+
"\n",
823+
"# Generate diagram\n",
824+
"dj.Diagram(schema) # Automatically reflects code\n",
825+
"```\n",
826+
"\n",
827+
"**Database → Code → Diagram** (Reverse engineering):\n",
828+
"```python\n",
829+
"# Connect to existing database\n",
830+
"schema = dj.Schema('existing_db')\n",
831+
"\n",
832+
"# Spawn Python classes from tables\n",
833+
"schema.spawn_missing_classes()\n",
834+
"\n",
835+
"# Generate diagram\n",
836+
"dj.Diagram(schema) # Reflects actual database structure\n",
837+
"```\n",
838+
"\n",
839+
"This means you can:\n",
840+
"- Import existing databases and immediately visualize them\n",
841+
"- Start from either code or database and get the diagram\n",
842+
"- Ensure documentation always matches reality\n",
843+
"\n",
844+
"### Comparison Summary\n",
845+
"\n",
846+
"| Aspect | Traditional Two-Phase | DataJoint Unified |\n",
847+
"|--------|----------------------|-------------------|\n",
848+
"| **Design artifact** | ER/Crow's Foot diagram | Python/SQL code |\n",
849+
"| **Implementation artifact** | SQL statements | Same as design |\n",
850+
"| **Diagram generation** | Manual (tools like Visio) | Automatic from code |\n",
851+
"| **Synchronization** | Manual discipline | Automatic |\n",
852+
"| **Change process** | Update both separately | Update code once |\n",
853+
"| **Version control** | Separate files | Single source |\n",
854+
"| **Agility** | Waterfall-oriented | Iteration-friendly |\n",
855+
"| **Documentation debt** | Accumulates over time | Avoided for diagrams (regenerated from the schema) |\n",
856+
"| **Learning curve** | Learn notation, then SQL | Learn one syntax |\n",
857+
"\n",
858+
"### Implications for This Chapter\n",
859+
"\n",
860+
"Because DataJoint diagrams are automatically generated from implementation, this chapter teaches you:\n",
861+
"\n",
862+
"1. **How to read** what the diagram tells you about the actual database\n",
863+
"2. **How to design** by choosing appropriate line styles (which determines implementation)\n",
864+
"3. **How to think** about the semantic meaning of relationships (not just cardinality)\n",
865+
"\n",
866+
"When you learn to read DataJoint diagrams, you're simultaneously learning:\n",
867+
"- How the database is structured (implementation)\n",
868+
"- How entities relate to each other (conceptual model)\n",
869+
"- How to query the data (query patterns follow diagram structure)\n",
870+
"\n",
871+
"**The bottom line**: In DataJoint, the diagram is not a separate design document; it is a **live view** of your implemented schema. This makes diagrams more trustworthy, more useful, and more integral to the development process."
872+
]
873+
},
639874
{
640875
"cell_type": "markdown",
641876
"metadata": {},

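As a rough illustration of the code-to-diagram idea in the added cell, the sketch below derives foreign-key edges from DataJoint-style definition strings and renders them as Graphviz DOT text. It is not how DataJoint builds `dj.Diagram`, which reads the live schema; the `DEFINITIONS` dictionary and the `foreign_key_edges` and `to_dot` helpers are hypothetical names introduced only for this example, and the parser handles only bare `-> Parent` lines.

```python
import re

# Hypothetical DataJoint-style definitions, keyed by class name.
# They mirror the Student/Course/Enrollment example from the added cell.
DEFINITIONS = {
    "Student": """
    student_id : int
    ---
    name : varchar(100)
    """,
    "Course": """
    course_id : int
    ---
    title : varchar(100)
    """,
    "Enrollment": """
    -> Student
    -> Course
    ---
    grade : char(1)
    """,
}


def foreign_key_edges(definitions):
    """Return sorted (parent, child) pairs, one per `-> Parent` line."""
    edges = set()
    for child, text in definitions.items():
        for line in text.splitlines():
            match = re.match(r"\s*->\s*(\w+)", line)
            if match:
                edges.add((match.group(1), child))
    return sorted(edges)


def to_dot(definitions):
    """Render tables as nodes and foreign keys as edges in DOT syntax."""
    lines = ["digraph schema {"]
    for name in definitions:
        lines.append(f"  {name};")
    for parent, child in foreign_key_edges(definitions):
        lines.append(f"  {parent} -> {child};")
    lines.append("}")
    return "\n".join(lines)


print(to_dot(DEFINITIONS))
```

Because the edges are computed from the same strings that declare the tables, adding or removing a `->` line changes the rendered graph on the next run, which is the single-source-of-truth property the cell describes.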
book/30-schema-design/050-relationships.ipynb

Lines changed: 4 additions & 4 deletions
@@ -1504,7 +1504,7 @@
15041504
},
15051505
{
15061506
"cell_type": "code",
1507-
"execution_count": 15,
1507+
"execution_count": null,
15081508
"metadata": {},
15091509
"outputs": [
15101510
{
@@ -1660,7 +1660,7 @@
16601660
},
16611661
{
16621662
"cell_type": "code",
1663-
"execution_count": 16,
1663+
"execution_count": null,
16641664
"metadata": {},
16651665
"outputs": [
16661666
{
@@ -1799,7 +1799,7 @@
17991799
},
18001800
{
18011801
"cell_type": "code",
1802-
"execution_count": 17,
1802+
"execution_count": null,
18031803
"metadata": {},
18041804
"outputs": [
18051805
{
@@ -2035,7 +2035,7 @@
20352035
},
20362036
{
20372037
"cell_type": "code",
2038-
"execution_count": 20,
2038+
"execution_count": null,
20392039
"metadata": {},
20402040
"outputs": [
20412041
{
