Commit ad47e23

revise Database Intro
1 parent c729cbd commit ad47e23

1 file changed

+21
-65
lines changed

book/20-concepts/00-databases.md

Lines changed: 21 additions & 65 deletions
Original file line number | Diff line number | Diff line change
@@ -1,93 +1,49 @@
1-
# Databases
2-
## Definition
3-
In this book, we will use the term *database* in a more specialized sense than what you might find through a casual search or in other texts.
4-
Our definition is stricter, emphasizing the critical role of databases in complex operations.
1+
---
2+
title: Databases
3+
---
4+
5+
## What is a Database?
56

67
```{card} Database
78
A **database** is a dynamic (i.e. *time-varying*), systematically organized collection of data that plays an integral role in the operation of an enterprise.
8-
It supports the enterprises operations and is accessed by a variety of users in different ways. Examples of enterprises that rely on databases include hotels, airlines, stores, hospitals, universities, banks, and scientific studies.
9+
It supports the enterprise's operations and is accessed by a variety of users in different ways. Examples of enterprises that rely on databases include hotels, airlines, stores, hospitals, universities, banks, and scientific studies.
910
10-
The database not only tracks the current state of the enterprises processes but also enforces essential *business rules*, ensuring that only valid transactions occur and preventing errors or inconsistencies. It serves as the **system of record**, the **single source of truth**, accurately reflecting the current state and ongoing activities.
11+
The database not only tracks the current state of the enterprise's processes but also enforces essential *business rules*, ensuring that only valid transactions occur and preventing errors or inconsistencies. It serves as the **system of record**, the **single source of truth**, accurately reflecting the current state and ongoing activities.
1112
1213
**Key traits of databases**:
13-
- Structured data reflects the logic of the enterprises operations
14-
- Supports the organizations operations by reflecting and enforcing its rules and constraints (data integrity)
14+
- Structured data reflects the logic of the enterprise's operations
15+
- Supports the organization's operations by reflecting and enforcing its rules and constraints (data integrity)
1516
- Ability to evolve over time
1617
- Facilitates distributed, concurrent access by multiple users
1718
- Centralized data consistency, appearing as a single source of data even if physically distributed, reflecting all changes
1819
- Allows specific and precise queries through various interfaces for different users
1920
```
2021

21-
You might ask: aren’t databases also used for simpler tasks, like keeping a personal recipe book or an address list? While that’s true, our definition focuses on the capabilities of a full-featured database system, which are essential for more complex operations. For simpler needs, an electronic spreadsheet or a collection of shared files might be sufficient. However, when managing something as intricate as a bank or an airline, a comprehensive database system becomes indispensable.
22-
23-
Databases are crucial for the smooth and organized operation of various entities, from hotels and airlines to universities, banks, and research projects. They ensure that processes are accurately tracked, essential rules are enforced, and only valid transactions are allowed, thereby preventing errors or illegal actions.
22+
Databases are crucial for the smooth and organized operation of various entities, from hotels and airlines to universities, banks, and research projects. They ensure that processes are accurately tracked, essential rules are enforced, and only valid transactions are allowed, thereby preventing errors or inconsistencies. Databases are designed to support the critical operations of data-driven organizations, enabling effective collaboration among multiple users.
2423

25-
In summary, databases are designed to support the critical operations of data-driven organizations, enabling effective collaboration among multiple users.
24+
## Database Management Systems (DBMS)
2625

27-
```{card} Database Management Systems (DBMS)
26+
```{card} Database Management System
2827
A Database Management System is a software system that serves as the computational engine powering a database.
29-
It defines and enforces the structure of the data, ensuring that the organizations rules are consistently applied.
30-
A DBMS manages data storage and efficiently executes data updates and queries while safeguarding the datas structure and integrity, particularly in environments with multiple concurrent users.
28+
It defines and enforces the structure of the data, ensuring that the organization's rules are consistently applied.
29+
A DBMS manages data storage and efficiently executes data updates and queries while safeguarding the data's structure and integrity, particularly in environments with multiple concurrent users.
3130
```
31+
3232
Consider an airline's database for flight schedules and ticket bookings. The airline must adhere to several key rules:
3333

3434
* A seat cannot be booked by two passengers for the same flight.
3535
* A seat is considered reserved only after all details are verified and payment is processed.
3636

37-
A robust DBMS enforces such rules reliably, ensuring smooth operations, while interacting with multiple users and systems at once.
37+
A robust DBMS enforces such rules reliably, ensuring smooth operations while interacting with multiple users and systems at once.
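The first airline rule above can be sketched with a uniqueness constraint. This is a minimal illustration using Python's built-in `sqlite3` module, not code from the book; the table name `booking`, its columns, and the helpers `make_booking_db` and `book_seat` are hypothetical names chosen for the example.

```python
import sqlite3


def make_booking_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical booking table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE booking (
            flight_id TEXT NOT NULL,
            seat      TEXT NOT NULL,
            passenger TEXT NOT NULL,
            -- One row per (flight, seat): the DBMS rejects a second booking.
            PRIMARY KEY (flight_id, seat)
        )
        """
    )
    return conn


def book_seat(conn: sqlite3.Connection, flight_id: str, seat: str, passenger: str) -> bool:
    """Try to book a seat; return False if the seat is already taken."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO booking (flight_id, seat, passenger) VALUES (?, ?, ?)",
                (flight_id, seat, passenger),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary-key constraint was violated: the seat is already booked.
        return False
```

In this sketch the rule lives in the table definition rather than in application code, so every client that writes to the table is subject to it.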
3838

3939
Databases are dynamic, with data continuously updated by both users and systems. Even in the face of disruptions like power outages, errors, or cyberattacks, the DBMS ensures that the system recovers quickly and returns to a stable state. For users, the database should function seamlessly, allowing actions to be performed without interference from others working on the system simultaneously.
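The second airline rule (a seat counts as reserved only after payment is processed) and the idea of returning to a stable state can be illustrated with a transaction. The following is a sketch under stated assumptions, again using Python's built-in `sqlite3` module; the `reservation` table, the `charge` stand-in for a payment step, and the `PaymentDeclined` exception are all hypothetical.

```python
import sqlite3


class PaymentDeclined(Exception):
    """Hypothetical error raised when a payment cannot be processed."""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical reservation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reservation ("
        " flight_id TEXT NOT NULL,"
        " seat TEXT NOT NULL,"
        " passenger TEXT NOT NULL,"
        " PRIMARY KEY (flight_id, seat))"
    )
    return conn


def charge(passenger: str, approve: bool) -> None:
    """Stand-in for a real payment step; fails when approve is False."""
    if not approve:
        raise PaymentDeclined(passenger)


def reserve(conn: sqlite3.Connection, flight_id: str, seat: str,
            passenger: str, approve: bool) -> bool:
    """Insert the reservation and take payment as one all-or-nothing unit."""
    try:
        with conn:  # one transaction: commit if both steps succeed, else roll back
            conn.execute(
                "INSERT INTO reservation VALUES (?, ?, ?)",
                (flight_id, seat, passenger),
            )
            charge(passenger, approve)
        return True
    except (PaymentDeclined, sqlite3.IntegrityError):
        # The rollback has already undone the insert, so no partial state remains.
        return False


def reserved_seats(conn: sqlite3.Connection) -> list:
    """Return all committed reservations, ordered for stable comparison."""
    return conn.execute(
        "SELECT flight_id, seat, passenger FROM reservation ORDER BY flight_id, seat"
    ).fetchall()
```

If the payment step fails, the insert is rolled back and the seat remains available, so other users never observe a half-completed reservation.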
4040

41-
# Data Models for Databases
42-
43-
Databases have evolved through various data models over the decades. As Guy Harrison outlines in his 2015 book, *Next Generation Databases* [@10.1007/978-1-4842-1329-2], the database industry has experienced three major revolutions:
44-
45-
1. **Pre-relational (1950-1972)**
46-
2. **Relational (1972-2005)**
47-
3. **The Next Generation (2005-future)**
48-
49-
The relational data model has had a profound impact, shaping the last two revolutions in database technology.
50-
Initially, the industry embraced the relational model, which offered a structured, standardized way to organize and query data.
51-
However, as data needs evolved, the limitations of the relational model prompted the rise of alternative approaches, leading to the NoSQL revolution in the early 2000s.
52-
53-
## The NoSQL Revolution
54-
55-
The NOSQL movement emerged in response to several key challenges:
56-
57-
- **Scalability:** The need to scale databases beyond the capabilities of existing relational database management systems (RDBMS) at the time.
58-
- **Diverse Data Structures:** The necessity to represent data structures that are difficult to express in relational terms, such as graphs, JSON documents, and data streams.
59-
- **Simplicity:** The demand for simpler data models where the complexity of relational databases was unnecessary, such as key-value stores.
60-
61-
This revolution led to an explosion of new database architectures, each tailored to specific use cases that traditional relational databases struggled to address.
62-
63-
## Evolution of Relational Databases
64-
65-
Despite the rise of NoSQL, relational databases have not remained static.
66-
They have evolved to incorporate new capabilities for scalability and versatility.
67-
Modern relational database management systems (RDBMS) are now highly adaptable, accommodating diverse data models and handling a wide range of data management tasks.
68-
In many cases, they can replace a variety of specialized software systems, simplifying system design.
69-
70-
An example of this adaptability is the growing trend of using relational databases to streamline system architectures, as highlighted in articles like [“Just Use Postgres for Everything”](https://www.amazingcto.com/postgres-for-everything/).
71-
72-
:::{iframe} https://www.youtube.com/embed/lYsQ_riVC4Y
73-
:width: 100%
74-
Just use Postgres for everything
75-
:::
76-
77-
## Scalable Architectures in Relational Databases
78-
79-
To meet the growing demand for scalable architectures, relational databases have evolved to incorporate distributed systems. These systems use consensus algorithms, such as Paxos and [Raft](https://www.usenix.org/conference/atc14/technical-sessions/presentation/ongaro), to ensure data consistency across globally distributed, high-performance databases. Notable examples of these advanced systems include Google Spanner [@10.1145/3035918.3056103] and CockroachDB [@10.1145/3318464.3386134].
80-
81-
Since its meteoric rise between 2008 and 2015, the term "NoSQL" has gradually fallen out of favor, as it no longer effectively describes the diverse landscape of modern databases. Today, we operate in a world with multiple data models, where the relational model remains dominant due to its mathematical rigor and versatility. However, it now coexists with more specialized and simpler models that cater to specific use cases.
82-
83-
## Current Landscape of Database Models
84-
85-
The website [DB-Engines Ranking](https://db-engines.com/en/ranking) tracks the popularity of various database management systems. While the relational data model continues to dominate, many popular databases now support multiple data models, allowing for deviations from strict relational structures.
86-
87-
Notably, the two most popular open-source relational databases, MySQL (along with its sister MariaDB) and PostgreSQL, remain at the forefront of this evolving landscape.
88-
8941
## Preview: DataJoint and This Book
9042

9143
This book focuses on **DataJoint**, a framework that extends relational databases specifically for scientific workflows. DataJoint builds on the solid foundation of relational theory while adding capabilities essential for research: automated computation, data provenance, and reproducibility.
9244

93-
We'll first introduce relational database concepts and operations, then show how DataJoint transforms these concepts into a powerful tool for scientific computing. By the end, you'll understand both the mathematical foundations and their practical application to your research.
45+
The relational data model—introduced by Edgar F. Codd in 1970—revolutionized data management by organizing data into tables with well-defined relationships. This model has dominated database systems for over five decades due to its mathematical rigor and versatility. Modern relational databases like MySQL and PostgreSQL continue to evolve, incorporating new capabilities for scalability while maintaining the core principles that make them reliable and powerful.
46+
47+
DataJoint extends this proven foundation with workflow-aware capabilities that scientific computing requires. We'll first introduce relational database concepts and operations, then show how DataJoint transforms these concepts into a powerful tool for scientific computing. By the end, you'll understand both the mathematical foundations and their practical application to your research.
48+
49+
The next chapters explore what data models are, why the relational model is particularly well-suited for scientific work, and how DataJoint builds on relational theory to support the computational workflows central to modern research.

0 commit comments
