Commit fa685eb

add Server/Client architectures in the Concepts.
1 parent b15fe5f commit fa685eb

2 files changed: +77 -12 lines changed

book/20-concepts/00-databases.md

Lines changed: 76 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -13,37 +13,106 @@ The database not only tracks the current state of the enterprise's processes but
1313
**Key traits of databases**:
1414
- Structured data reflects the logic of the enterprise's operations
1515
- Supports the organization's operations by reflecting and enforcing its rules and constraints (data integrity)
16+
- **Precise access control ensures only authorized users can view or modify specific data**
1617
- Ability to evolve over time
1718
- Facilitates distributed, concurrent access by multiple users
1819
- Centralized data consistency, appearing as a single source of data even if physically distributed, reflecting all changes
1920
- Allows specific and precise queries through various interfaces for different users
2021
```
2122

22-
Databases are crucial for the smooth and organized operation of various entities, from hotels and airlines to universities, banks, and research projects. They ensure that processes are accurately tracked, essential rules are enforced, and only valid transactions are allowed, thereby preventing errors or inconsistencies. Databases are designed to support the critical operations of data-driven organizations, enabling effective collaboration among multiple users.
23+
Databases are crucial for the smooth and organized operation of various entities, from hotels and airlines to universities, banks, and research projects. They ensure that processes are accurately tracked, essential rules are enforced, only valid transactions are allowed, and **sensitive data is protected** from unauthorized access. This combination of data integrity and data security makes databases indispensable for any operation where data reliability and confidentiality matter.
2324

2425
## Database Management Systems (DBMS)
2526

2627
```{card} Database Management System
2728
A Database Management System is a software system that serves as the computational engine powering a database.
2829
It defines and enforces the structure of the data, ensuring that the organization's rules are consistently applied.
2930
A DBMS manages data storage and efficiently executes data updates and queries while safeguarding the data's structure and integrity, particularly in environments with multiple concurrent users.
31+
32+
**Critically, a DBMS also manages user authentication and authorization**, controlling who can access which data and what operations they can perform.
3033
```
3134

3235
Consider an airline's database for flight schedules and ticket bookings. The airline must adhere to several key rules:
3336

34-
* A seat cannot be booked by two passengers for the same flight.
35-
* A seat is considered reserved only after all details are verified and payment is processed.
37+
* A seat cannot be booked by two passengers for the same flight
38+
* A seat is considered reserved only after all details are verified and payment is processed
39+
* **Only authorized ticketing agents can modify reservations**
40+
* **Passengers can view only their own booking information**
41+
* **Financial data is accessible only to accounting staff**
42+
43+
A robust DBMS enforces such rules reliably, ensuring smooth operations while interacting with multiple users and systems at once. The same system that prevents double-booking also prevents unauthorized access to passenger records.
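As a minimal sketch of the first rule, the example below uses Python's built-in `sqlite3` module with an in-memory database. The `booking` table, its columns, and the `book_seat` helper are hypothetical names chosen for illustration; the point is only that a key constraint declared once in the schema rejects a second booking of the same seat, whichever program attempts it.

```python
import sqlite3

# In-memory database used purely for illustration
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE booking (
        flight_id TEXT NOT NULL,
        seat      TEXT NOT NULL,
        passenger TEXT NOT NULL,
        PRIMARY KEY (flight_id, seat)  -- one booking per seat per flight
    )
    """
)


def book_seat(conn, flight_id, seat, passenger):
    """Try to book a seat; return True on success, False if it is taken."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO booking VALUES (?, ?, ?)",
                (flight_id, seat, passenger),
            )
        return True
    except sqlite3.IntegrityError:
        # The database itself refused the duplicate (flight_id, seat) pair
        return False


first = book_seat(conn, "UA100", "12A", "Alice")
second = book_seat(conn, "UA100", "12A", "Bob")
print(first, second)
```

The application code contains no check for an existing booking. The rule lives in the schema, so every client that connects is subject to it.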
44+
45+
Databases are dynamic, with data continuously updated by both users and systems. Even in the face of disruptions like power outages, errors, or cyberattacks, the DBMS ensures that the system recovers quickly and returns to a stable state. For users, the database should function seamlessly, allowing actions to be performed without interference from others working on the system simultaneously—**while ensuring they can only perform actions they're authorized to do**.
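The rule that a seat is reserved only after payment is processed can be sketched with a transaction, which groups several changes so that they either all take effect or none do. The example below again uses Python's `sqlite3` module; the `reservation` and `payment` tables and the `reserve` helper are assumptions made for illustration, and a rejected payment amount stands in for any mid-operation failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE reservation (seat TEXT PRIMARY KEY, passenger TEXT NOT NULL);
    CREATE TABLE payment (
        seat   TEXT NOT NULL,
        amount REAL NOT NULL CHECK (amount > 0)  -- a payment must be positive
    );
    """
)


def reserve(conn, seat, passenger, amount):
    """Record a reservation and its payment as one all-or-nothing unit."""
    try:
        with conn:  # commits if both inserts succeed, rolls back otherwise
            conn.execute("INSERT INTO reservation VALUES (?, ?)", (seat, passenger))
            conn.execute("INSERT INTO payment VALUES (?, ?)", (seat, amount))
        return True
    except sqlite3.IntegrityError:
        return False


# The payment is invalid, so the reservation inserted just before it is undone
failed = reserve(conn, "12A", "Alice", 0.0)
rows_after_failure = conn.execute("SELECT COUNT(*) FROM reservation").fetchone()[0]

# A valid payment lets both rows be committed together
succeeded = reserve(conn, "12A", "Alice", 250.0)
rows_after_success = conn.execute("SELECT COUNT(*) FROM reservation").fetchone()[0]
```

After the failed attempt the seat is still free, which is the stable state the paragraph above describes.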
46+
47+
## Data Security and Access Management
48+
49+
One of the most critical features distinguishing databases from simple file storage is **precise access control**. In scientific research, healthcare, finance, and many other domains, not all data should be accessible to all users.
50+
51+
### Authentication and Authorization
52+
53+
Before you can work with a database, you must **authenticate**: prove your identity, typically with a username and password. Once authenticated, the database enforces **authorization** rules that determine what you can do:
54+
55+
- **Read**: View specific tables or columns
56+
- **Write**: Add new data to certain tables
57+
- **Modify**: Change existing data (where permitted)
58+
- **Delete**: Remove data (if authorized)
59+
60+
For example, in a research lab database:
61+
- A principal investigator might have full access to all experimental data
62+
- A graduate student might read and write only to their assigned experiments
63+
- An external collaborator might have read-only access to published results
64+
- An undergraduate assistant might only insert data for specific protocols
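The research lab example can be modeled as a small lookup of grants. This is only a conceptual sketch in plain Python, not how any particular DBMS implements authorization; server databases such as MySQL and PostgreSQL express the same idea with SQL `GRANT` statements. The role names, resource names, and the `is_authorized` helper are all hypothetical.

```python
# Hypothetical grants: (role, resource) -> allowed actions.
# "*" means every resource.
GRANTS = {
    ("principal_investigator", "*"): {"read", "write", "modify", "delete"},
    ("graduate_student", "experiment_42"): {"read", "write"},
    ("external_collaborator", "published_results"): {"read"},
    ("undergraduate_assistant", "protocol_7"): {"write"},
}


def is_authorized(role: str, resource: str, action: str) -> bool:
    """Return True if the role may perform the action on the resource."""
    specific = GRANTS.get((role, resource), set())
    wildcard = GRANTS.get((role, "*"), set())
    return action in (specific | wildcard)


checks = {
    "PI deletes any data": is_authorized("principal_investigator", "experiment_42", "delete"),
    "student writes own experiment": is_authorized("graduate_student", "experiment_42", "write"),
    "student reads other experiment": is_authorized("graduate_student", "experiment_99", "read"),
    "collaborator writes results": is_authorized("external_collaborator", "published_results", "write"),
    "undergrad reads protocol": is_authorized("undergraduate_assistant", "protocol_7", "read"),
}
for description, allowed in checks.items():
    print(f"{description}: {allowed}")
```

Anything not explicitly granted is denied, which mirrors the default-deny behavior you would normally want for sensitive data.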
65+
66+
### Why Database-Level Security Matters
67+
68+
Without centralized access control, you'd need to implement security restrictions in every script, notebook, and application that touches your data. If someone writes a new analysis program, they'd need to correctly re-implement all security logic—a recipe for errors and breaches.
69+
70+
Database-level security means the database itself enforces these rules uniformly, regardless of how users connect. This is especially important for:
71+
72+
- **Regulatory compliance**: HIPAA for patient data, GDPR for personal information
73+
- **Collaborative research**: Different partners may have access to different datasets
74+
- **Sensitive data**: Unpublished results, proprietary information, personally identifiable data
75+
- **Accountability**: Knowing who accessed or modified what data, and when
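Accountability can be sketched with a trigger that records every change inside the database itself, so no client can skip the logging step. The example uses Python's `sqlite3` module; the `result` and `audit_log` tables and the trigger name are hypothetical. Note one limitation of this sketch: SQLite has no user accounts, so the log below captures what changed and when, but not who made the change. A server DBMS would also be able to record the authenticated user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE result (id INTEGER PRIMARY KEY, value REAL NOT NULL);

    CREATE TABLE audit_log (
        result_id  INTEGER NOT NULL,
        old_value  REAL,
        new_value  REAL,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- Runs inside the database on every update, whatever client issued it
    CREATE TRIGGER log_result_update AFTER UPDATE ON result
    BEGIN
        INSERT INTO audit_log (result_id, old_value, new_value)
        VALUES (OLD.id, OLD.value, NEW.value);
    END;
    """
)

with conn:
    conn.execute("INSERT INTO result VALUES (1, 0.5)")
    conn.execute("UPDATE result SET value = 0.7 WHERE id = 1")

log = conn.execute(
    "SELECT result_id, old_value, new_value FROM audit_log"
).fetchall()
print(log)
```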
3676

37-
A robust DBMS enforces such rules reliably, ensuring smooth operations while interacting with multiple users and systems at once.
77+
## Database Architecture
3878

39-
Databases are dynamic, with data continuously updated by both users and systems. Even in the face of disruptions like power outages, errors, or cyberattacks, the DBMS ensures that the system recovers quickly and returns to a stable state. For users, the database should function seamlessly, allowing actions to be performed without interference from others working on the system simultaneously.
79+
Modern databases typically separate data management from data use through distinct architectural roles. Understanding these roles helps clarify how databases maintain consistency and security across multiple users and applications.
80+
81+
### Common Architectures
82+
83+
**Server-Client Architecture** (most common): A database server program manages all data operations, while client programs (your scripts, applications, notebooks) connect to request data or submit changes. The server enforces all rules and access permissions consistently for every client. This is like a library where the librarian (server) manages the books and enforces checkout policies, while patrons (clients) request materials.
84+
85+
**Embedded Databases**: The database engine runs within your application itself, with no separate server. This works well for single-user applications like mobile apps or desktop software, but it is a poor fit for many users writing to shared data at the same time. SQLite is a common embedded database.
86+
87+
**Distributed Databases**: Data and processing are spread across multiple servers working together. This provides high availability and can handle massive scale, but adds significant complexity. Systems like Google Spanner and Amazon DynamoDB use this approach.
88+
89+
For collaborative scientific research, the server-client architecture dominates because it naturally supports multiple researchers working with shared data while maintaining consistent integrity and security rules.
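To make the embedded case concrete, here is a minimal sketch using SQLite through Python's `sqlite3` module. The file name `lab.db` and the `session` table are hypothetical. Notice that opening the database needs only a file path: there is no host, port, or login, because the engine runs inside the Python process. With a server-client system, the equivalent connect call would instead take a server address and credentials.

```python
import os
import sqlite3
import tempfile

# The entire database is one ordinary file in a temporary directory
db_path = os.path.join(tempfile.mkdtemp(), "lab.db")

conn = sqlite3.connect(db_path)  # no server address or credentials involved
with conn:
    conn.execute(
        "CREATE TABLE session (id INTEGER PRIMARY KEY, subject TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO session (subject) VALUES (?)", ("mouse_01",))
conn.close()

file_exists = os.path.exists(db_path)

# Reopening the same file gives access to the stored data
conn = sqlite3.connect(db_path)
subjects = [row[0] for row in conn.execute("SELECT subject FROM session")]
conn.close()
print(file_exists, subjects)
```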
90+
91+
### Why Architectural Separation Matters
92+
93+
Separating data management from data use provides critical advantages:
94+
95+
**Centralized Control**: All data lives in one managed location. Updates are immediately visible to everyone. There's no confusion about which copy of the data is current.
96+
97+
**Consistent Rules**: The database enforces integrity constraints and access permissions uniformly. Whether you connect through Python, R, a web interface, or a command-line tool, the same rules apply.
98+
99+
**Specialized Optimization**: The database system focuses exclusively on efficient, reliable data management: fast queries, safe concurrent access, and support for backup and recovery. Your applications focus on research logic and user interfaces.
100+
101+
**Language Independence**: The same database serves Python scripts, R analyses, web dashboards, and automated pipelines. Each tool does what it does best, all working with the same reliable, secure data.
40102

41103
## Preview: DataJoint and This Book
42104

43105
This book focuses on **DataJoint**, a framework that extends relational databases specifically for scientific workflows. DataJoint builds on the solid foundation of relational theory while adding capabilities essential for research: automated computation, data provenance, and reproducibility.
44106

45-
The relational data model—introduced by Edgar F. Codd in 1970—revolutionized data management by organizing data into tables with well-defined relationships. This model has dominated database systems for over five decades due to its mathematical rigor and versatility. Modern relational databases like MySQL and PostgreSQL continue to evolve, incorporating new capabilities for scalability while maintaining the core principles that make them reliable and powerful.
107+
The relational data model—introduced by Edgar F. Codd in 1970—revolutionized data management by organizing data into tables with well-defined relationships. This model has dominated database systems for over five decades due to its mathematical rigor and versatility. Modern relational databases like MySQL and PostgreSQL continue to evolve, incorporating new capabilities for scalability and security while maintaining the core principles that make them reliable and powerful.
108+
109+
DataJoint extends this proven foundation with workflow-aware capabilities that scientific computing requires. Throughout this book, you'll learn how to:
110+
- Design database schemas that represent your research workflows
111+
- Leverage the server-client architecture for collaborative research
112+
- Use access control to manage sensitive research data appropriately
113+
- Ensure data integrity and computational validity
114+
- Build reproducible data pipelines
46115

47-
DataJoint extends this proven foundation with workflow-aware capabilities that scientific computing requires. We'll first introduce relational database concepts and operations, then show how DataJoint transforms these concepts into a powerful tool for scientific computing. By the end, you'll understand both the mathematical foundations and their practical application to your research.
116+
We'll first introduce relational database concepts and operations, then show how DataJoint transforms these concepts into a powerful tool for scientific computing. By the end, you'll understand both the mathematical foundations and their practical application to your research.
48117

49118
The next chapters explore what data models are, why the relational model is particularly well-suited for scientific work, and how DataJoint builds on relational theory to support the computational workflows central to modern research.

book/20-concepts/concept-quiz.md renamed to book/20-concepts/concepts-quiz.md

Lines changed: 1 addition & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -1,8 +1,4 @@
1-
---
2-
title: Knowledge Check: Database Concepts
3-
---
4-
5-
# Knowledge Test: Database Concepts
1+
# Knowledge Check: Concepts
62

73
This assessment covers Chapters 0-4 of the Database Concepts section. Questions include both single-answer and multiple-answer formats.
84
