docs(dev): add database developer guide

EstrellaXD · claude · EstrellaXD · commit 1f5d92f50b29 · 2026-01-26T14:51:50.000+01:00
Comprehensive documentation covering:
- Database architecture and components
- Model schemas (Bangumi, RSSItem, Torrent, User)
- Common CRUD operations for each sub-database
- Caching strategy and invalidation
- Migration system and how to add new migrations
- Performance patterns (batch queries, regex matching, indexes)
- Testing setup with factories
- Common issues and solutions

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;
diff --git a/docs/dev/database.md b/docs/dev/database.md
@@ -0,0 +1,394 @@
+# Database Developer Guide
+
+This guide covers the database architecture, models, and operations in AutoBangumi.
+
+## Overview
+
+AutoBangumi uses **SQLite** as its database with **SQLModel** (Pydantic + SQLAlchemy hybrid) for ORM. The database file is located at `data/data.db`.
+
+### Architecture
+
+```
+module/database/
+├── engine.py       # SQLAlchemy engine configuration
+├── combine.py      # Database class, migrations, session management
+├── bangumi.py      # Bangumi (anime subscription) operations
+├── rss.py          # RSS feed operations
+├── torrent.py      # Torrent tracking operations
+└── user.py         # User authentication operations
+```
+
+## Core Components
+
+### Database Class
+
+The `Database` class in `combine.py` is the main entry point. It inherits from SQLModel's `Session` and provides access to all sub-databases:
+
+```python
+from module.database import Database
+
+with Database() as db:
+    # Access sub-databases
+    bangumis = db.bangumi.search_all()
+    rss_items = db.rss.search_active()
+    torrents = db.torrent.search_all()
+```
+
+### Sub-Database Classes
+
+| Class | Model | Purpose |
+|-------|-------|---------|
+| `BangumiDatabase` | `Bangumi` | Anime subscription rules |
+| `RSSDatabase` | `RSSItem` | RSS feed sources |
+| `TorrentDatabase` | `Torrent` | Downloaded torrent tracking |
+| `UserDatabase` | `User` | Authentication |
+
+## Models
+
+### Bangumi Model
+
+Core model for anime subscriptions:
+
+```python
+class Bangumi(SQLModel, table=True):
+    id: int                          # Primary key
+    official_title: str              # Display name (e.g., "Mushoku Tensei")
+    title_raw: str                   # Raw title for torrent matching (indexed)
+    season: int = 1                  # Season number
+    episode_offset: int = 0          # Episode numbering adjustment
+    season_offset: int = 0           # Season numbering adjustment
+    rss_link: str                    # Comma-separated RSS feed URLs
+    filter: str                      # Exclusion filter (e.g., "720,\\d+-\\d+")
+    poster_link: str                 # TMDB poster URL
+    save_path: str                   # Download destination path
+    rule_name: str                   # qBittorrent RSS rule name
+    added: bool = False              # Whether rule is added to downloader
+    deleted: bool = False            # Soft delete flag (indexed)
+    archived: bool = False           # For completed series (indexed)
+    needs_review: bool = False       # Offset mismatch detected
+    needs_review_reason: str         # Reason for review
+    suggested_season_offset: int     # Suggested season offset
+    suggested_episode_offset: int    # Suggested episode offset
+    air_weekday: int                 # Airing day (0=Sunday, 6=Saturday)
+```
+
+### RSSItem Model
+
+RSS feed subscriptions:
+
+```python
+class RSSItem(SQLModel, table=True):
+    id: int                          # Primary key
+    name: str                        # Display name
+    url: str                         # Feed URL (unique, indexed)
+    aggregate: bool = True           # Whether to parse torrents
+    parser: str = "mikan"            # Parser type: mikan, dmhy, nyaa
+    enabled: bool = True             # Active flag
+    connection_status: str           # "healthy" or "error"
+    last_checked_at: str             # ISO timestamp
+    last_error: str                  # Last error message
+```
+
+### Torrent Model
+
+Tracks downloaded torrents:
+
+```python
+class Torrent(SQLModel, table=True):
+    id: int                          # Primary key
+    name: str                        # Torrent name (indexed)
+    url: str                         # Torrent/magnet URL (unique, indexed)
+    rss_id: int                      # Source RSS feed ID
+    bangumi_id: int                  # Linked Bangumi ID (nullable)
+    qb_hash: str                     # qBittorrent info hash (indexed)
+    downloaded: bool = False         # Download completed
+```
+
+## Common Operations
+
+### BangumiDatabase
+
+```python
+with Database() as db:
+    # Create
+    db.bangumi.add(bangumi)              # Single insert
+    db.bangumi.add_all(bangumi_list)     # Batch insert (deduplicates)
+
+    # Read
+    db.bangumi.search_all()              # All records (cached, 5min TTL)
+    db.bangumi.search_id(123)            # By ID
+    db.bangumi.match_torrent("torrent name")  # Find by title_raw match
+    db.bangumi.not_complete()            # Incomplete series
+    db.bangumi.get_needs_review()        # Flagged for review
+
+    # Update
+    db.bangumi.update(bangumi)           # Update single record
+    db.bangumi.update_all(bangumi_list)  # Batch update
+
+    # Delete
+    db.bangumi.delete_one(123)           # Hard delete
+    db.bangumi.disable_rule(123)         # Soft delete (deleted=True)
+```
+
+### RSSDatabase
+
+```python
+with Database() as db:
+    # Create
+    db.rss.add(rss_item)                 # Single insert
+    db.rss.add_all(rss_items)            # Batch insert (deduplicates)
+
+    # Read
+    db.rss.search_all()                  # All feeds
+    db.rss.search_active()               # Enabled feeds only
+    db.rss.search_aggregate()            # Enabled + aggregate=True
+
+    # Update
+    db.rss.update(id, rss_update)        # Partial update
+    db.rss.enable(id)                    # Enable feed
+    db.rss.disable(id)                   # Disable feed
+    db.rss.enable_batch([1, 2, 3])       # Batch enable
+    db.rss.disable_batch([1, 2, 3])      # Batch disable
+```
+
+### TorrentDatabase
+
+```python
+with Database() as db:
+    # Create
+    db.torrent.add(torrent)              # Single insert
+    db.torrent.add_all(torrents)         # Batch insert
+
+    # Read
+    db.torrent.search_all()              # All torrents
+    db.torrent.search_by_qb_hash(hash)   # By qBittorrent hash
+    db.torrent.search_by_url(url)        # By URL
+    db.torrent.check_new(torrents)       # Filter out existing
+
+    # Update
+    db.torrent.update_qb_hash(id, hash)  # Set qb_hash
+```
+
+## Caching
+
+### Bangumi Cache
+
+`search_all()` results are cached at the module level with a 5-minute TTL:
+
+```python
+# Module-level cache in bangumi.py
+_bangumi_cache: list[Bangumi] | None = None
+_bangumi_cache_time: float = 0
+_BANGUMI_CACHE_TTL: float = 300.0  # 5 minutes
+
+# Cache invalidation
+def _invalidate_bangumi_cache():
+    global _bangumi_cache, _bangumi_cache_time
+    _bangumi_cache = None
+    _bangumi_cache_time = 0
+```
+
+**Important:** The cache is automatically invalidated on:
+- `add()`, `add_all()`
+- `update()`, `update_all()`
+- `delete_one()`, `delete_all()`
+- `archive_one()`, `unarchive_one()`
+- Any RSS link update operations
+
+### Session Expunge
+
+Cached objects are **expunged** from the session to prevent `DetachedInstanceError`:
+
+```python
+for b in bangumis:
+    self.session.expunge(b)  # Detach from session
+```
+
+## Migration System
+
+### Schema Versioning
+
+Migrations are tracked via a `schema_version` table:
+
+```python
+CURRENT_SCHEMA_VERSION = 7
+
+# Each migration: (version, description, [SQL statements])
+MIGRATIONS = [
+    (1, "add air_weekday column", [...]),
+    (2, "add connection status columns", [...]),
+    (3, "create passkey table", [...]),
+    (4, "add archived column", [...]),
+    (5, "rename offset to episode_offset", [...]),
+    (6, "add qb_hash column", [...]),
+    (7, "add suggested offset columns", [...]),
+]
+```
+
+### Adding a New Migration
+
+1. Increment `CURRENT_SCHEMA_VERSION` in `combine.py`
+2. Add migration tuple to `MIGRATIONS` list:
+
+```python
+MIGRATIONS = [
+    # ... existing migrations ...
+    (
+        8,
+        "add my_new_column to bangumi",
+        [
+            "ALTER TABLE bangumi ADD COLUMN my_new_column TEXT DEFAULT NULL",
+        ],
+    ),
+]
+```
+
+3. Add idempotency check in `run_migrations()`:
+
+```python
+if "bangumi" in tables and version == 8:
+    columns = [col["name"] for col in inspector.get_columns("bangumi")]
+    if "my_new_column" in columns:
+        needs_run = False
+```
+
+4. Update the corresponding Pydantic model in `module/models/`
+
+### Default Value Backfill
+
+After migrations, `_fill_null_with_defaults()` automatically fills NULL values based on model defaults:
+
+```python
+# If model defines:
+class Bangumi(SQLModel, table=True):
+    my_field: bool = False
+
+# Then existing rows with NULL will be updated to False
+```
+
+## Performance Patterns
+
+### Batch Queries
+
+`add_all()` uses a single query to check for duplicates instead of N queries:
+
+```python
+# Efficient: single SELECT
+keys_to_check = [(d.title_raw, d.group_name) for d in datas]
+conditions = [
+    and_(Bangumi.title_raw == tr, Bangumi.group_name == gn)
+    for tr, gn in keys_to_check
+]
+statement = select(Bangumi.title_raw, Bangumi.group_name).where(or_(*conditions))
+```
+
+### Regex Matching
+
+`match_list()` compiles a single regex pattern for all title matches:
+
+```python
+# Compile once, match many
+sorted_titles = sorted(title_index.keys(), key=len, reverse=True)
+pattern = "|".join(re.escape(title) for title in sorted_titles)
+title_regex = re.compile(pattern)
+
+# O(1) lookup per torrent instead of O(n)
+for torrent in torrent_list:
+    match = title_regex.search(torrent.name)
+```
+
+### Indexed Columns
+
+The following columns have indexes for fast lookups:
+
+| Table | Column | Index Type |
+|-------|--------|------------|
+| `bangumi` | `title_raw` | Regular |
+| `bangumi` | `deleted` | Regular |
+| `bangumi` | `archived` | Regular |
+| `rssitem` | `url` | Unique |
+| `torrent` | `name` | Regular |
+| `torrent` | `url` | Unique |
+| `torrent` | `qb_hash` | Regular |
+
+## Testing
+
+### Test Database Setup
+
+Tests use an in-memory SQLite database:
+
+```python
+# conftest.py
+@pytest.fixture
+def db_engine():
+    engine = create_engine("sqlite:///:memory:")
+    SQLModel.metadata.create_all(engine)
+    yield engine
+    engine.dispose()
+
+@pytest.fixture
+def db_session(db_engine):
+    with Session(db_engine) as session:
+        yield session
+```
+
+### Factory Functions
+
+Use factory functions for creating test data:
+
+```python
+from test.factories import make_bangumi, make_torrent, make_rss_item
+
+def test_bangumi_search():
+    bangumi = make_bangumi(title_raw="Test Title", season=2)
+    # ... test logic
+```
+
+## Design Notes
+
+### No Foreign Keys
+
+SQLite foreign key enforcement is disabled by default. Relationships (like `Torrent.bangumi_id`) are managed in application logic rather than database constraints.
+
+### Soft Deletes
+
+The `Bangumi.deleted` flag enables soft deletes. Queries should filter by `deleted=False` for user-facing data:
+
+```python
+statement = select(Bangumi).where(Bangumi.deleted == false())
+```
+
+### Torrent Tagging
+
+Torrents are tagged in qBittorrent with `ab:{bangumi_id}` for offset lookup during rename operations. This enables fast bangumi identification without database queries.
+
+## Common Issues
+
+### DetachedInstanceError
+
+If you access cached objects from a different session:
+
+```python
+# Wrong: accessing cached object in new session
+bangumis = db.bangumi.search_all()  # Cached
+with Database() as new_db:
+    new_db.session.add(bangumis[0])  # Error!
+
+# Right: objects are expunged, work independently
+bangumis = db.bangumi.search_all()
+bangumis[0].title_raw = "New Title"  # OK, but won't persist
+```
+
+### Cache Staleness
+
+If manual SQL updates bypass the ORM, invalidate the cache:
+
+```python
+from module.database.bangumi import _invalidate_bangumi_cache
+
+with engine.connect() as conn:
+    conn.execute(text("UPDATE bangumi SET ..."))
+    conn.commit()
+
+_invalidate_bangumi_cache()  # Important!
+```