Commit 1f5d92f

EstrellaXD and claude committed
docs(dev): add database developer guide
Comprehensive documentation covering:
- Database architecture and components
- Model schemas (Bangumi, RSSItem, Torrent, User)
- Common CRUD operations for each sub-database
- Caching strategy and invalidation
- Migration system and how to add new migrations
- Performance patterns (batch queries, regex matching, indexes)
- Testing setup with factories
- Common issues and solutions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 0c8ebb7 commit 1f5d92f

1 file changed

Lines changed: 394 additions & 0 deletions

docs/dev/database.md

@@ -0,0 +1,394 @@
# Database Developer Guide

This guide covers the database architecture, models, and operations in AutoBangumi.

## Overview

AutoBangumi uses **SQLite** as its database with **SQLModel** (Pydantic + SQLAlchemy hybrid) for ORM. The database file is located at `data/data.db`.

### Architecture

```
module/database/
├── engine.py   # SQLAlchemy engine configuration
├── combine.py  # Database class, migrations, session management
├── bangumi.py  # Bangumi (anime subscription) operations
├── rss.py      # RSS feed operations
├── torrent.py  # Torrent tracking operations
└── user.py     # User authentication operations
```

## Core Components

### Database Class

The `Database` class in `combine.py` is the main entry point. It inherits from SQLModel's `Session` and provides access to all sub-databases:

```python
from module.database import Database

with Database() as db:
    # Access sub-databases
    bangumis = db.bangumi.search_all()
    rss_items = db.rss.search_active()
    torrents = db.torrent.search_all()
```
36+
37+
### Sub-Database Classes
38+
39+
| Class | Model | Purpose |
40+
|-------|-------|---------|
41+
| `BangumiDatabase` | `Bangumi` | Anime subscription rules |
42+
| `RSSDatabase` | `RSSItem` | RSS feed sources |
43+
| `TorrentDatabase` | `Torrent` | Downloaded torrent tracking |
44+
| `UserDatabase` | `User` | Authentication |

## Models

### Bangumi Model

Core model for anime subscriptions:

```python
class Bangumi(SQLModel, table=True):
    id: int  # Primary key
    official_title: str  # Display name (e.g., "Mushoku Tensei")
    title_raw: str  # Raw title for torrent matching (indexed)
    season: int = 1  # Season number
    episode_offset: int = 0  # Episode numbering adjustment
    season_offset: int = 0  # Season numbering adjustment
    rss_link: str  # Comma-separated RSS feed URLs
    filter: str  # Exclusion filter (e.g., "720,\\d+-\\d+")
    poster_link: str  # TMDB poster URL
    save_path: str  # Download destination path
    rule_name: str  # qBittorrent RSS rule name
    added: bool = False  # Whether rule is added to downloader
    deleted: bool = False  # Soft delete flag (indexed)
    archived: bool = False  # For completed series (indexed)
    needs_review: bool = False  # Offset mismatch detected
    needs_review_reason: str  # Reason for review
    suggested_season_offset: int  # Suggested season offset
    suggested_episode_offset: int  # Suggested episode offset
    air_weekday: int  # Airing day (0=Sunday, 6=Saturday)
```

### RSSItem Model

RSS feed subscriptions:

```python
class RSSItem(SQLModel, table=True):
    id: int  # Primary key
    name: str  # Display name
    url: str  # Feed URL (unique, indexed)
    aggregate: bool = True  # Whether to parse torrents
    parser: str = "mikan"  # Parser type: mikan, dmhy, nyaa
    enabled: bool = True  # Active flag
    connection_status: str  # "healthy" or "error"
    last_checked_at: str  # ISO timestamp
    last_error: str  # Last error message
```

### Torrent Model

Tracks downloaded torrents:

```python
class Torrent(SQLModel, table=True):
    id: int  # Primary key
    name: str  # Torrent name (indexed)
    url: str  # Torrent/magnet URL (unique, indexed)
    rss_id: int  # Source RSS feed ID
    bangumi_id: int  # Linked Bangumi ID (nullable)
    qb_hash: str  # qBittorrent info hash (indexed)
    downloaded: bool = False  # Download completed
```

## Common Operations

### BangumiDatabase

```python
with Database() as db:
    # Create
    db.bangumi.add(bangumi)  # Single insert
    db.bangumi.add_all(bangumi_list)  # Batch insert (deduplicates)

    # Read
    db.bangumi.search_all()  # All records (cached, 5min TTL)
    db.bangumi.search_id(123)  # By ID
    db.bangumi.match_torrent("torrent name")  # Find by title_raw match
    db.bangumi.not_complete()  # Incomplete series
    db.bangumi.get_needs_review()  # Flagged for review

    # Update
    db.bangumi.update(bangumi)  # Update single record
    db.bangumi.update_all(bangumi_list)  # Batch update

    # Delete
    db.bangumi.delete_one(123)  # Hard delete
    db.bangumi.disable_rule(123)  # Soft delete (deleted=True)
```

### RSSDatabase

```python
with Database() as db:
    # Create
    db.rss.add(rss_item)  # Single insert
    db.rss.add_all(rss_items)  # Batch insert (deduplicates)

    # Read
    db.rss.search_all()  # All feeds
    db.rss.search_active()  # Enabled feeds only
    db.rss.search_aggregate()  # Enabled + aggregate=True

    # Update
    db.rss.update(id, rss_update)  # Partial update
    db.rss.enable(id)  # Enable feed
    db.rss.disable(id)  # Disable feed
    db.rss.enable_batch([1, 2, 3])  # Batch enable
    db.rss.disable_batch([1, 2, 3])  # Batch disable
```

### TorrentDatabase

```python
with Database() as db:
    # Create
    db.torrent.add(torrent)  # Single insert
    db.torrent.add_all(torrents)  # Batch insert

    # Read
    db.torrent.search_all()  # All torrents
    db.torrent.search_by_qb_hash(hash)  # By qBittorrent hash
    db.torrent.search_by_url(url)  # By URL
    db.torrent.check_new(torrents)  # Filter out existing

    # Update
    db.torrent.update_qb_hash(id, hash)  # Set qb_hash
```

## Caching

### Bangumi Cache

`search_all()` results are cached at the module level with a 5-minute TTL:

```python
# Module-level cache in bangumi.py
_bangumi_cache: list[Bangumi] | None = None
_bangumi_cache_time: float = 0
_BANGUMI_CACHE_TTL: float = 300.0  # 5 minutes

# Cache invalidation
def _invalidate_bangumi_cache():
    global _bangumi_cache, _bangumi_cache_time
    _bangumi_cache = None
    _bangumi_cache_time = 0
```

**Important:** The cache is automatically invalidated on:
- `add()`, `add_all()`
- `update()`, `update_all()`
- `delete_one()`, `delete_all()`
- `archive_one()`, `unarchive_one()`
- Any RSS link update operations
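
To illustrate how a TTL check and explicit invalidation can work together, here is a minimal, self-contained sketch. It is not the code in `bangumi.py`: the names `search_all_cached`, `invalidate_cache`, and the `load_all` callable are hypothetical, and a plain list of strings stands in for `Bangumi` rows.

```python
import time
from typing import Callable, Optional

# Hypothetical module-level cache, mirroring the shape described above.
_cache: Optional[list] = None
_cache_time: float = 0.0
_CACHE_TTL: float = 300.0  # 5 minutes


def invalidate_cache() -> None:
    """Drop the cached list so the next read reloads it."""
    global _cache, _cache_time
    _cache = None
    _cache_time = 0.0


def search_all_cached(load_all: Callable[[], list]) -> list:
    """Return the cached list, reloading when it is empty or older than the TTL."""
    global _cache, _cache_time
    now = time.monotonic()
    if _cache is None or now - _cache_time > _CACHE_TTL:
        _cache = load_all()  # assumed to hit the database
        _cache_time = now
    return _cache
```

Every write path would call `invalidate_cache()` after committing, so a stale list is never served for longer than one read.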

### Session Expunge

Cached objects are **expunged** from the session to prevent `DetachedInstanceError`:

```python
for b in bangumis:
    self.session.expunge(b)  # Detach from session
```

## Migration System

### Schema Versioning

Migrations are tracked via a `schema_version` table:

```python
CURRENT_SCHEMA_VERSION = 7

# Each migration: (version, description, [SQL statements])
MIGRATIONS = [
    (1, "add air_weekday column", [...]),
    (2, "add connection status columns", [...]),
    (3, "create passkey table", [...]),
    (4, "add archived column", [...]),
    (5, "rename offset to episode_offset", [...]),
    (6, "add qb_hash column", [...]),
    (7, "add suggested offset columns", [...]),
]
```

### Adding a New Migration

1. Increment `CURRENT_SCHEMA_VERSION` in `combine.py`
2. Add a migration tuple to the `MIGRATIONS` list:

```python
MIGRATIONS = [
    # ... existing migrations ...
    (
        8,
        "add my_new_column to bangumi",
        [
            "ALTER TABLE bangumi ADD COLUMN my_new_column TEXT DEFAULT NULL",
        ],
    ),
]
```

3. Add an idempotency check in `run_migrations()`:

```python
if "bangumi" in tables and version == 8:
    columns = [col["name"] for col in inspector.get_columns("bangumi")]
    if "my_new_column" in columns:
        needs_run = False
```

4. Update the corresponding SQLModel model in `module/models/`
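
The steps above can be sketched end to end with the standard-library `sqlite3` module. This is an illustration under assumptions, not the `run_migrations()` in `combine.py`: the two migrations, the `ADDED_COLUMNS` lookup, and the `another_column` name are hypothetical, and the `schema_version` layout (a single `version` column) is a guess.

```python
import sqlite3

# Hypothetical migrations in the (version, description, statements) shape.
MIGRATIONS = [
    (1, "add my_new_column to bangumi",
     ["ALTER TABLE bangumi ADD COLUMN my_new_column TEXT DEFAULT NULL"]),
    (2, "add another_column to bangumi",
     ["ALTER TABLE bangumi ADD COLUMN another_column TEXT DEFAULT NULL"]),
]
# Hypothetical helper map: which column each migration is expected to add.
ADDED_COLUMNS = {1: ("bangumi", "my_new_column"), 2: ("bangumi", "another_column")}


def column_exists(conn, table, column):
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)  # row[1] is the column name


def run_migrations(conn):
    """Apply pending migrations and return the resulting schema version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    current = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version"
    ).fetchone()[0]
    for version, _description, statements in MIGRATIONS:
        if version <= current:
            continue  # already applied
        table, column = ADDED_COLUMNS[version]
        # Idempotency check: skip the SQL if the column is already present.
        if not column_exists(conn, table, column):
            for sql in statements:
                conn.execute(sql)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        current = version
    conn.commit()
    return current
```

The idempotency check matters because SQLite's `ALTER TABLE ... ADD COLUMN` fails if the column already exists, for example when a database was created from an up-to-date model before version tracking started.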

### Default Value Backfill

After migrations, `_fill_null_with_defaults()` automatically fills NULL values based on model defaults:

```python
# If model defines:
class Bangumi(SQLModel, table=True):
    my_field: bool = False

# Then existing rows with NULL will be updated to False
```
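
A backfill of this kind boils down to one `UPDATE ... WHERE column IS NULL` per defaulted column. The sketch below uses `sqlite3` directly; the function name, the `DEFAULTS` map, and the table layout are assumptions for illustration, and the real `_fill_null_with_defaults()` presumably derives the defaults from the model fields instead of a hand-written dict.

```python
import sqlite3

# Hypothetical column -> default map (SQLite stores booleans as 0/1).
DEFAULTS = {"archived": 0, "needs_review": 0}


def fill_null_with_defaults(conn, table, defaults):
    """Replace NULLs with the given defaults; return the number of cells updated."""
    updated = 0
    for column, default in defaults.items():
        # Table and column names come from trusted code, never from user input.
        cursor = conn.execute(
            f"UPDATE {table} SET {column} = ? WHERE {column} IS NULL",
            (default,),
        )
        updated += cursor.rowcount
    conn.commit()
    return updated
```

Rows that already hold a value are left untouched, so the backfill is safe to run after every startup.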

## Performance Patterns

### Batch Queries

`add_all()` uses a single query to check for duplicates instead of N queries:

```python
# Efficient: single SELECT
keys_to_check = [(d.title_raw, d.group_name) for d in datas]
conditions = [
    and_(Bangumi.title_raw == tr, Bangumi.group_name == gn)
    for tr, gn in keys_to_check
]
statement = select(Bangumi.title_raw, Bangumi.group_name).where(or_(*conditions))
```
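
The same idea can be tried without the ORM. The following sketch builds one `SELECT` with an `OR` of per-key conditions using `sqlite3`; the function name and the two-column table are hypothetical, and very large batches would likely need chunking because SQLite limits the number of bound parameters per statement.

```python
import sqlite3


def find_existing_keys(conn, keys):
    """Return the (title_raw, group_name) pairs from `keys` that are already stored."""
    if not keys:
        return set()
    # One "(title_raw = ? AND group_name = ?)" clause per key, joined with OR.
    clause = " OR ".join("(title_raw = ? AND group_name = ?)" for _ in keys)
    params = [value for key in keys for value in key]
    rows = conn.execute(
        f"SELECT title_raw, group_name FROM bangumi WHERE {clause}", params
    ).fetchall()
    return set(rows)
```

A caller would then insert only the keys that are not in the returned set, which replaces N existence checks with a single round trip.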

### Regex Matching

`match_list()` compiles a single regex pattern for all title matches:

```python
import re

# Compile once, match many
sorted_titles = sorted(title_index.keys(), key=len, reverse=True)
pattern = "|".join(re.escape(title) for title in sorted_titles)
title_regex = re.compile(pattern)

# One regex search per torrent instead of one substring check per title
for torrent in torrent_list:
    match = title_regex.search(torrent.name)
```
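
Sorting titles longest-first matters because regex alternation is ordered: at a given position the first alternative that matches wins. A small runnable sketch, with a hypothetical `title_index` mapping `title_raw` to a bangumi ID and a hypothetical `match_name` helper:

```python
import re

# Hypothetical index: title_raw -> bangumi id.
title_index = {"Mushoku Tensei": 1, "Mushoku Tensei II": 2}

# Longest titles first, so "Mushoku Tensei II" is tried before its prefix.
sorted_titles = sorted(title_index, key=len, reverse=True)
title_regex = re.compile("|".join(re.escape(title) for title in sorted_titles))


def match_name(name):
    """Return the bangumi id whose title_raw occurs in `name`, or None."""
    match = title_regex.search(name)
    return title_index[match.group(0)] if match else None
```

Without the length sort, a torrent named after the second season could be attributed to the first, since the shorter title is a prefix of the longer one.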

### Indexed Columns

The following columns have indexes for fast lookups:

| Table | Column | Index Type |
|-------|--------|------------|
| `bangumi` | `title_raw` | Regular |
| `bangumi` | `deleted` | Regular |
| `bangumi` | `archived` | Regular |
| `rssitem` | `url` | Unique |
| `torrent` | `name` | Regular |
| `torrent` | `url` | Unique |
| `torrent` | `qb_hash` | Regular |
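
One way to check that a lookup actually uses an index is SQLite's `EXPLAIN QUERY PLAN`. The sketch below runs against a throwaway in-memory table; the index name `ix_torrent_qb_hash` and the reduced column set are assumptions, not the names SQLModel generates for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE torrent (id INTEGER PRIMARY KEY, qb_hash TEXT, downloaded INTEGER)"
)
conn.execute("CREATE INDEX ix_torrent_qb_hash ON torrent (qb_hash)")


def query_plan(sql, params=()):
    """Return SQLite's human-readable plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


# Lookup on the indexed column versus a filter on an unindexed one.
indexed_plan = query_plan("SELECT * FROM torrent WHERE qb_hash = ?", ("abc",))
unindexed_plan = query_plan("SELECT * FROM torrent WHERE downloaded = ?", (1,))
print(indexed_plan)
print(unindexed_plan)
```

The first plan should mention the index by name, while the second should report a full table scan; the exact wording varies between SQLite versions.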

## Testing

### Test Database Setup

Tests use an in-memory SQLite database:

```python
# conftest.py
import pytest
from sqlmodel import Session, SQLModel, create_engine


@pytest.fixture
def db_engine():
    engine = create_engine("sqlite:///:memory:")
    SQLModel.metadata.create_all(engine)
    yield engine
    engine.dispose()


@pytest.fixture
def db_session(db_engine):
    with Session(db_engine) as session:
        yield session
```

### Factory Functions

Use factory functions for creating test data:

```python
from test.factories import make_bangumi, make_torrent, make_rss_item

def test_bangumi_search():
    bangumi = make_bangumi(title_raw="Test Title", season=2)
    # ... test logic
```
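
For readers unfamiliar with the pattern, a factory is just a function that supplies sensible defaults and lets each test override the fields it cares about. The sketch below is a guess at the general shape, not the contents of `test/factories.py`: `FakeBangumi` is a hypothetical stand-in for the real model, and the default values are invented.

```python
from dataclasses import dataclass


@dataclass
class FakeBangumi:
    """Hypothetical stand-in for the Bangumi model, reduced to a few fields."""
    official_title: str = "Test Anime"
    title_raw: str = "Test Anime Raw"
    season: int = 1
    deleted: bool = False


def make_bangumi(**overrides):
    """Build a test object from defaults, overriding only the given fields."""
    return FakeBangumi(**overrides)
```

Keeping the defaults in one place means a new required field only has to be added to the factory, not to every test.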

## Design Notes

### No Foreign Keys

SQLite foreign key enforcement is disabled by default. Relationships (like `Torrent.bangumi_id`) are managed in application logic rather than database constraints.
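
Because nothing in the database rejects a dangling `bangumi_id`, integrity checks have to be written as queries. A minimal sketch of such a check with `sqlite3`, where the function name and the reduced two-table layout are assumptions for illustration:

```python
import sqlite3


def find_orphan_torrents(conn):
    """Return ids of torrents whose bangumi_id matches no bangumi row."""
    rows = conn.execute(
        """
        SELECT t.id
        FROM torrent AS t
        LEFT JOIN bangumi AS b ON b.id = t.bangumi_id
        WHERE t.bangumi_id IS NOT NULL AND b.id IS NULL
        ORDER BY t.id
        """
    ).fetchall()
    return [row[0] for row in rows]
```

Torrents with a NULL `bangumi_id` are treated as unlinked rather than orphaned, matching the nullable column in the model.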

### Soft Deletes

The `Bangumi.deleted` flag enables soft deletes. Queries should filter by `deleted=False` for user-facing data:

```python
statement = select(Bangumi).where(Bangumi.deleted == false())
```
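
The effect of a soft delete is easiest to see at the SQL level: the row stays in the table and only the flag changes. The sketch below uses `sqlite3` with hypothetical helper names (`soft_delete`, `visible_ids`) and a reduced table; it is not the ORM code behind `disable_rule()`.

```python
import sqlite3


def soft_delete(conn: sqlite3.Connection, bangumi_id: int) -> None:
    """Flag a row as deleted instead of removing it."""
    conn.execute("UPDATE bangumi SET deleted = 1 WHERE id = ?", (bangumi_id,))
    conn.commit()


def visible_ids(conn: sqlite3.Connection) -> list:
    """Ids that user-facing queries should see (rows with deleted = 0)."""
    rows = conn.execute("SELECT id FROM bangumi WHERE deleted = 0 ORDER BY id")
    return [row[0] for row in rows]
```

Any query that forgets the `deleted = 0` filter will still return soft-deleted rows, which is why the filter belongs in shared query helpers rather than in each caller.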

### Torrent Tagging

Torrents are tagged in qBittorrent with `ab:{bangumi_id}` for offset lookup during rename operations. This enables fast bangumi identification without database queries.
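
Reading the ID back out of a tag list is a small parsing step. The helper below is a hypothetical sketch, assuming the caller has already split the torrent's tags into a list of strings; it is not taken from the AutoBangumi renamer.

```python
from typing import Iterable, Optional


def bangumi_id_from_tags(tags: Iterable[str]) -> Optional[int]:
    """Return the bangumi id from the first well-formed "ab:<id>" tag, if any."""
    for tag in tags:
        prefix, separator, value = tag.strip().partition(":")
        # Accept only the exact "ab" prefix followed by a decimal id.
        if prefix == "ab" and separator and value.isdecimal():
            return int(value)
    return None
```

Malformed tags such as `ab:` or `ab:x` are ignored rather than raising, so a renamer could fall back to a title match when no usable tag is present.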

## Common Issues

### DetachedInstanceError

This error can occur if you access cached objects from a different session:

```python
# Wrong: accessing cached object in new session
bangumis = db.bangumi.search_all()  # Cached
with Database() as new_db:
    new_db.session.add(bangumis[0])  # Error!

# Right: objects are expunged, work independently
bangumis = db.bangumi.search_all()
bangumis[0].title_raw = "New Title"  # OK, but won't persist
```

### Cache Staleness

If manual SQL updates bypass the ORM, invalidate the cache:

```python
from sqlalchemy import text

from module.database.bangumi import _invalidate_bangumi_cache

with engine.connect() as conn:
    conn.execute(text("UPDATE bangumi SET ..."))
    conn.commit()

_invalidate_bangumi_cache()  # Important!
```
