
feat: add nhentai #168

Merged
jsonmaki merged 3 commits into 2.10-dev from feat-nhentai on May 8, 2026

Conversation

jasoneri (Owner) commented on May 8, 2026

Description

refactor: assets flow

Related Issues

Checklist:

  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you linted your code locally prior to submission?
  • Have you successfully run the app with your changes locally?

Summary by Sourcery

Add nhentai support for website integrations, and refactor preprocess and asset management.

New Features:

  • Introduce nhentai provider, parser, HTTP client, and GUI wiring for search, preview, and downloads.
  • Add SQLite-backed asset pipelines for nhentai tags and kemono authors, with generators to build/update these databases.
  • Enable queued search execution after preprocess completes in the GUI flow.

Enhancements:

  • Refactor the site preprocess flow to be async, share HTTP client construction, and centralize release asset caching for Hitomi, nhentai, and Kemono.
  • Update the kemono integration to read from a structured SQLite database instead of ad-hoc pickle caches.
  • Improve async task progress reporting with throttled, byte/percentage-based download progress updates.
  • Adjust hitomi DB paths and the CI workflow to use a temp asset directory and manifest generation.

Build:

  • Change the hitomi DB CI workflow to generate and upload the DB and manifest from the __temp directory instead of assets.

CI:

  • Update hitomi-db GitHub Actions workflow paths to match the new temp-based asset locations.

sourcery-ai bot commented on May 8, 2026

Reviewer's Guide

This change introduces an nhentai provider and spider backed by a prebuilt SQLite tag database, refactors the preprocess flows (Hitomi, nhentai, Kemono) to be async and asset-driven, switches Kemono to a SQLite-backed author cache, and adjusts the GUI/async infrastructure to support progress reporting, queued searches, and new site indices/assets.

Sequence diagram for async preprocess with asset download and queued search

sequenceDiagram
    actor User
    participant GUI_Main as MainWindow
    participant PreMgr as PreprocessManager
    participant AsyncMgr as AsyncTaskManager
    participant Task as AsyncTaskThread
    participant GuiRuntime as GuiSiteRuntime
    participant SitePre as run_site_preprocess
    participant HitomiPre as HitomiDatabasePreprocess
    participant NhentaiPre as NhentaiDatabasePreprocess
    participant AssetCache as ReleaseAssetCache
    participant Probe as PreprocessRuntimeProbe
    participant Reporter as AsyncTaskProgressReporter

    User->>GUI_Main: select site index
    GUI_Main->>PreMgr: handle_choosebox_changed(index, gui_site_runtime)
    PreMgr->>PreMgr: _next_generation()
    PreMgr->>PreMgr: _active_preprocess = (index, generation)
    PreMgr->>AsyncMgr: execute_simple_task(task_func)
    AsyncMgr->>Task: start AsyncTaskThread

    Note over Task: In thread
    Task->>Task: detect progress_callback parameter
    Task->>Reporter: create AsyncTaskProgressReporter(emit_progress)
    Task->>GuiRuntime: preprocess(conf_state, progress_callback=Reporter)
    GuiRuntime->>SitePre: run_site_preprocess(gui_site_runtime, ...)
    SitePre->>Probe: PreprocessRuntimeProbe(gui_site_runtime)
    Probe->>GuiRuntime: access_ready() or manga_copy_cache_hit()

    alt site is HITOMI
        SitePre->>HitomiPre: run()
        HitomiPre->>AssetCache: ensure()
    else site is NHENTAI
        SitePre->>NhentaiPre: run()
        NhentaiPre->>AssetCache: ensure()
    end

    AssetCache->>Reporter: download_start(label, total_bytes)
    loop chunks
        AssetCache->>Reporter: download_advance(chunk_size, label, total_bytes)
        Reporter-->>Task: emit_progress(message)
        Task-->>AsyncMgr: progress_signal(message)
        AsyncMgr-->>GUI_Main: show tooltip update
    end
    AssetCache->>Reporter: download_finish(label)

    SitePre-->>GuiRuntime: PreprocessResult
    GuiRuntime-->>Task: PreprocessResult
    Task-->>AsyncMgr: success_signal(result)
    AsyncMgr-->>PreMgr: _on_preprocess_success(index, generation, result)

    PreMgr->>PreMgr: _dispatch_queued_search(index, generation, ready)
    alt queued search present and ready
        PreMgr->>GUI_Main: safe_single_shot(0, start_and_search(keyword))
    end
    PreMgr->>PreMgr: _clear_active_preprocess(index, generation)

    User->>GUI_Main: click search
    alt preprocess still running
        GUI_Main->>PreMgr: queue_search_after_preprocess(index, keyword)
        PreMgr->>PreMgr: store (generation, index, keyword)
    else no active preprocess
        GUI_Main->>GUI_Main: start_and_search(keyword)
    end
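
The queued-search handshake in the diagram boils down to a generation-token guard: each preprocess run gets a monotonically increasing generation, and a queued search fires only if it still matches the active (index, generation) pair when preprocess succeeds. A minimal sketch of that pattern (class and method names follow the diagram; the exact signatures are assumptions, not the project's actual API):

```python
class PreprocessManager:
    """Illustrative generation-token guard for queued searches."""

    def __init__(self):
        self._generation = 0
        self._active = None          # (site_index, generation) or None
        self._queued_search = None   # (generation, site_index, keyword) or None
        self.dispatched = []         # searches actually started, for illustration

    def start_preprocess(self, site_index):
        self._generation += 1
        self._active = (site_index, self._generation)
        return self._generation

    def queue_search_after_preprocess(self, site_index, keyword):
        # Only meaningful while a preprocess is running for that site.
        if self._active and self._active[0] == site_index:
            self._queued_search = (self._active[1], site_index, keyword)
            return True
        return False

    def on_preprocess_success(self, site_index, generation, ready):
        # Stale completions (older generation or different site) are ignored.
        if self._active != (site_index, generation):
            return
        q = self._queued_search
        if ready and q and q[0] == generation and q[1] == site_index:
            self.dispatched.append(q[2])  # stands in for start_and_search()
        self._queued_search = None
        self._active = None
```

The point of the token is the stale-completion branch: if the user switches sites again before the first preprocess finishes, the old run's success callback arrives with an outdated generation and is dropped instead of dispatching an obsolete search.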

Class diagram for async site preprocess and asset caching

classDiagram
    class GuiSiteRuntime {
        +int site_index
        +ProviderDescriptor provider_descriptor
        +RuntimeContext runtime_context
        +preprocess(conf_state, data_client, progress_callback) PreprocessResult
        +create_thread_site_runtime(preview_client) ThreadSiteRuntime
    }

    class PreprocessRuntimeProbe {
        -GuiSiteRuntime gui_site_runtime
        +PreprocessRuntimeProbe(gui_site_runtime)
        +manga_copy_cache_hit() bool
        +access_ready() bool
        +verified_runtime() ThreadSiteRuntime
    }

    class ReleaseAssetCache {
        +str name
        +Path db_path
        +tuple~str~ download_urls
        +httpx_AsyncClient data_client
        +progress_callback
        +str label
        +int timeout
        +int cache_ttl_hours
        +Cache cache
        +ReleaseAssetCache(name, db_path, download_urls, data_client, progress_callback, label, timeout, cache_ttl_hours)
        +ensure() ReleaseAssetResult
        -_emit_legacy_download_start() void
        -_download() tuple~bool, list~str~~
    }

    class ReleaseAssetResult {
        +bool ready
        +bool cache_hit
        +bool cache_expired
        +Path db_path
        +tuple~str~ errors
    }

    class SiteDatabasePreprocess {
        <<abstract>>
        +str name
        +tuple~str~ download_urls
        +bool data_required
        +str data_ready_action
        +GuiSiteRuntime gui_site_runtime
        +PreprocessRuntimeProbe runtime_probe
        +ReleaseAssetCache asset_cache
        +Path db_path
        +list~dict~ messages
        +list~dict~ actions
        +dict state_flags
        +SiteDatabasePreprocess(gui_site_runtime, data_client, progress_callback)
        +run() PreprocessResult
        +after_data_ready() bool
    }

    class HitomiDatabasePreprocess {
        +str name = "hitomi"
        +tuple~str~ download_urls
        +bool data_required = false
        +str data_ready_action = "add_hitomi_tool"
    }

    class NhentaiDatabasePreprocess {
        +str name = "nhentai"
        +tuple~str~ download_urls
        +after_data_ready() bool
    }

    class KemonoReleaseAsset {
        +KemonoReleaseAsset(data_client, progress_callback)
    }

    class Cache {
        +str cache_f
        +str flag
        +state
        +val
        +with_expiry(expiry_time, write_in) decorator
        +run(func, expiry_time, write_in)
        +_is_expired(cache_path, expiry_time) bool
    }

    class PreprocessResult {
        +bool ready
        +bool block_search
        +bool runtime_ready
        +str domain
        +tuple~dict~ messages
        +tuple~dict~ actions
        +dict state_flags
    }

    GuiSiteRuntime --> PreprocessRuntimeProbe
    PreprocessRuntimeProbe --> ThreadSiteRuntime

    SiteDatabasePreprocess "1" *-- "1" ReleaseAssetCache
    SiteDatabasePreprocess --> PreprocessRuntimeProbe
    SiteDatabasePreprocess --> GuiSiteRuntime
    SiteDatabasePreprocess --> PreprocessResult

    HitomiDatabasePreprocess --|> SiteDatabasePreprocess
    NhentaiDatabasePreprocess --|> SiteDatabasePreprocess

    ReleaseAssetCache --> ReleaseAssetResult
    ReleaseAssetCache --> Cache

    KemonoReleaseAsset --|> ReleaseAssetCache

    Cache <.. ReleaseAssetCache

    class run_site_preprocess {
        +run_site_preprocess(gui_site_runtime, conf_state, data_client, progress_callback) PreprocessResult
    }

    run_site_preprocess --> GuiSiteRuntime
    run_site_preprocess --> HitomiDatabasePreprocess
    run_site_preprocess --> NhentaiDatabasePreprocess
    run_site_preprocess --> PreprocessRuntimeProbe
    run_site_preprocess --> ReleaseAssetCache

    class _preprocess_script {
        +_preprocess_script(data_client, progress_callback) PreprocessResult
    }

    _preprocess_script --> KemonoReleaseAsset
    _preprocess_script --> PreprocessResult
Loading
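
The Cache class above exposes a with_expiry decorator and an _is_expired check. A minimal sketch of how such a file-mtime-based expiry cache could work (the hour-based TTL and byte-payload interface are assumptions for illustration, not the project's actual Cache API):

```python
import time
from pathlib import Path

class Cache:
    """File-backed cache that re-runs a producer only when the file is stale."""

    def __init__(self, cache_f):
        self.cache_f = Path(cache_f)
        self.state = "new"   # one of: new / expired / validated

    def _is_expired(self, expiry_hours):
        if not self.cache_f.exists():
            return True
        age = time.time() - self.cache_f.stat().st_mtime
        return age > expiry_hours * 3600

    def with_expiry(self, expiry_hours):
        # Run the wrapped producer only when the cached file is missing or stale.
        def deco(func):
            def wrapper(*args, **kwargs):
                if not self._is_expired(expiry_hours):
                    self.state = "validated"
                    return self.cache_f.read_bytes()
                self.state = "expired" if self.cache_f.exists() else "new"
                data = func(*args, **kwargs)
                self.cache_f.write_bytes(data)
                return data
            return wrapper
        return deco
```

Exposing the state field (validated/expired/new) lets callers such as ReleaseAssetCache distinguish a cache hit from a fresh download when building their result objects.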

Class diagram for the nhentai provider stack and tag catalog

classDiagram
    class NhentaiBookInfo {
        +str source = "nhentai"
        +str media_id
        +str lang
        +str english_title
        +str japanese_title
        +str pretty_title
        +list pics
        +say str
    }

    class NhentaiTagCatalog {
        +tuple~str~ _tag_types
        +bool loaded
        +Path db_path
        +dict~str, dict~int, str~~ by_type
        +dict~str, set~int~~ valid_ids_by_type
        +__init__()
        +reset() void
        +load(db_path, default_db_path, excluded_language_names) dict~str,int~
        +preload(db_path, default_db_path, excluded_language_names) dict~str,int~
    }

    class NhentaiParser {
        +NhentaiTagCatalog catalog
        +_json_payload(resp_text) dict
        +_required(target, key)
        +_asset_url(asset_path, host) str
        +build_image_url(asset_path) str
        +build_thumbnail_url(asset_path) str
        +_select_title(english_title, japanese_title, pretty_title) str
        +_tag_name_from_ids(tag_ids, tag_type, excluded_names) str
        +parse_search_item(target) NhentaiBookInfo
        +parse_search(resp_text) list~NhentaiBookInfo~
        +_parse_page_assets(pages, media_id) list~dict~
        +parse_book(resp_text) NhentaiBookInfo
        +apply_detail(book, detail) NhentaiBookInfo
        +build_page_image_map(book) dict~int,str~
        +build_page_image_urls(book) list~str~
        +parse_preview_books(resp_text) list~NhentaiBookInfo~
    }

    class NhentaiReqer {
        +cli
        +NhentaiReqer(conf)
        +get_cli(conf, is_async, kwargs) httpx_Client
        +test_index() bool
        +_headers(referer) dict
        +preview_search(keyword, page) list~NhentaiBookInfo~
        +preview_fetch_pages(item) list~str~
    }

    class NhentaiUtils {
        +NhentaiParser parser
        +NhentaiReqer reqer
        +NhentaiTagCatalog catalog
        +str browser_referer_mode
        +NhentaiUtils(conf)
        +reset_tag_catalog() void
        +load_tag_catalog(db_path) dict~str,int~
        +preload_tag_catalog(db_path) dict~str,int~
        +preview_client_config(context) dict
    }

    class NhentaiParseError {
    }

    class _NhentaiContract {
        +set _language_excluded_names
        +str name
        +str proxy_policy
        +str domain
        +str index
        +str api_index
        +str image_host
        +str thumbnail_host
        +str search_url_head
        +tuple turn_page_info
        +dict mappings
        +dict headers
        +dict image_headers
        +dict book_hea
        +set cookies_field
        +uuid_regex
        +str book_url_regex
        +Path tag_db_path
        +str gallery_url_template
        +str gallery_api_url_template
        +build_search_url(keyword, page, sort) str
        +with_referer(referer) dict
    }

    class EroUtils {
    }
    class Cookies {
    }
    class Previewer {
    }
    class Req {
    }

    NhentaiParser --|> _NhentaiContract
    NhentaiReqer --|> _NhentaiContract
    NhentaiUtils --|> _NhentaiContract

    NhentaiUtils --|> EroUtils
    NhentaiUtils --|> Cookies
    NhentaiUtils --|> Previewer

    NhentaiReqer --|> Cookies
    NhentaiReqer --|> Req

    NhentaiParser --> NhentaiTagCatalog
    NhentaiParser --> NhentaiBookInfo
    NhentaiParser --> NhentaiParseError

    NhentaiUtils o--> NhentaiReqer
    NhentaiUtils o--> NhentaiParser
    NhentaiUtils o--> NhentaiTagCatalog

    class NhentaiSpider {
        +str name = "nhentai"
        +dict custom_settings
        +int num_of_row
        +str domain
        +str search_url_head
        +tuple turn_page_info
        +str book_id_url
        +dict mappings
        +ua dict
        +frame_section(response)
    }

    class BaseComicSpider2 {
    }

    NhentaiSpider --|> BaseComicSpider2
    NhentaiSpider --> NhentaiUtils
    NhentaiSpider --> NhentaiParser

Class diagram for the Kemono SQLite author cache

classDiagram
    class KemonoAuthor {
        +str id
        +str name
        +str service
        +int updated
        +int favorited
        +avatar str
        +to_payload() dict~str,str|int~
    }

    class KemonoAuthorsDb {
        +Path db_path
        +KemonoAuthorsDb(db_path)
        +ensure_schema() void
        +replace_from_creators(creators) int
        +load_all() dict~str,KemonoAuthor~
    }

    class build_kemono_db_from_creators_bytes {
        +build_kemono_db_from_creators_bytes(db_path, payload) int
    }

    class load_kemono_authors {
        +load_kemono_authors(db_path) dict~str,KemonoAuthor~
    }

    KemonoAuthorsDb --> KemonoAuthor

    build_kemono_db_from_creators_bytes --> KemonoAuthorsDb
    load_kemono_authors --> KemonoAuthorsDb

    class KemonoCreator {
        +Path db_path
        +by_creatorid(order_creatorids)
    }

    KemonoCreator --> load_kemono_authors

    class KemonoTableViewController {
        +_set_kemono_table()
    }

    KemonoTableViewController --> load_kemono_authors

File-Level Changes

Change Details Files
Refactor the site/script preprocess pipeline to be async, asset-cache based, and shared between Hitomi, nhentai, and Kemono.
  • Replace run_site_preprocess with an async version that operates on GuiSiteRuntime, uses httpx.AsyncClient, and centralizes per-site branches, including the new Spider.NHENTAI.
  • Introduce ReleaseAssetCache, SiteDatabasePreprocess, HitomiDatabasePreprocess, NhentaiDatabasePreprocess, KemonoReleaseAsset, and PreprocessRuntimeProbe to manage HTTP downloads, cache TTLs, DB paths under __temp, and runtime access checks.
  • Change script preprocess to use KemonoReleaseAsset and remove the old _check_kemono_data creators.txt flow; script service/dependency checks remain but now rely on the async client.
  • Make GuiSiteRuntime.preprocess async; PreprocessManager calls run_script_preprocess directly for the script site (index 7) while delegating other sites to GuiSiteRuntime.preprocess.
  • Extend the core Cache to support DB file caching and richer expiry handling, and expose its state (validated/expired/new).
utils/website/preprocess.py
GUI/manager/preprocess.py
utils/website/site_runtime.py
utils/website/core/__init__.py
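
The ensure() step these classes share can be condensed to: reuse a fresh local DB, otherwise try each mirror URL in order and collect errors. A hypothetical sketch of that flow (the fetch callable stands in for the real httpx.AsyncClient download, and the field names mirror ReleaseAssetResult from the class diagram; all signatures are assumptions):

```python
import time
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ReleaseAssetResult:
    ready: bool
    cache_hit: bool
    db_path: Path
    errors: tuple = ()

def ensure_asset(db_path, download_urls, fetch, cache_ttl_hours=24):
    """Return a fresh cached DB, or download it from the first working mirror."""
    db_path = Path(db_path)
    if db_path.exists():
        age_h = (time.time() - db_path.stat().st_mtime) / 3600
        if age_h < cache_ttl_hours:
            return ReleaseAssetResult(True, True, db_path)
    errors = []
    for url in download_urls:  # mirrors are tried in declared order
        try:
            db_path.write_bytes(fetch(url))
            return ReleaseAssetResult(True, False, db_path, tuple(errors))
        except Exception as exc:
            errors.append(f"{url}: {exc}")
    # a stale-but-present DB still counts as usable if every mirror failed
    return ReleaseAssetResult(db_path.exists(), False, db_path, tuple(errors))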
Add nhentai support across backend utilities, the spider, and metadata models, including a tag catalog backed by a SQLite asset.
  • Introduce NhentaiBookInfo with nhentai-specific fields and say formatting.
  • Implement NhentaiTagCatalog, NhentaiParser, NhentaiReqer, and NhentaiUtils to drive search, detail parsing, and preview flows using nhentai API v2 and a tags SQLite DB.
  • Add the NhentaiSpider Scrapy spider, wired to NhentaiUtils mappings and parser, with frame_section building the image map from the parsed book detail.
  • Wire NhentaiUtils into provider_map and the Spider enum, including specials/cn_proxy membership, status tips, and the mainwindow chooseBox item text.
  • Add an nhentai tag asset generator script (gen_assets.py) that scrapes nhentai tag metadata into __temp/nhentai.db, and make preprocess load/prewarm the tag catalog via NhentaiUtils.preload_tag_catalog.
utils/website/info.py
utils/website/nhentai/__init__.py
utils/website/nhentai/gen_assets.py
utils/website/ins.py
variables/__init__.py
GUI/mainwindow.py
ComicSpider/spiders/nhentai.py
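
Loading a tag catalog from a prebuilt SQLite asset, as NhentaiTagCatalog.load does, amounts to grouping rows into per-type id-to-name maps while filtering excluded languages. A sketch under an assumed schema of one tags(id, type, name) table (the real asset's schema may differ):

```python
import sqlite3
from collections import defaultdict

def load_tag_catalog(db_path, excluded_language_names=()):
    """Group tag rows by type; returns (by_type maps, per-type counts)."""
    by_type = defaultdict(dict)
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute("SELECT id, type, name FROM tags")
        for tag_id, tag_type, name in rows:
            if tag_type == "language" and name in excluded_language_names:
                continue
            by_type[tag_type][tag_id] = name
    # counts per type are handy for a quick sanity check after download
    return dict(by_type), {t: len(m) for t, m in by_type.items()}
```

Keeping a parallel set of valid IDs per type (valid_ids_by_type in the diagram) then lets the parser cheaply validate tag IDs coming back from the API against the downloaded catalog.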
Move the Hitomi DB and tooling to downloadable/cached assets under __temp, backed by CI-generated releases.
  • Replace the synchronous _preprocess_hitomi and _download_hitomi_db with HitomiDatabasePreprocess, built on ReleaseAssetCache and the async client; mark hitomi data as optional but enable add_hitomi_tool when it is available.
  • Adjust the hitomi dataset scraper to write __temp/hitomi.db, and update the GitHub workflow to generate/upload hitomi.db and hitomi-manifest.json from __temp rather than assets/.
  • Update the GUI hitomi_tool to reference __temp/hitomi.db instead of assets/hitomi.db via a shared hitomi_db_path variable.
utils/website/preprocess.py
utils/website/hitomi/scape_dataset.py
.github/workflows/hitomi-db.yml
GUI/tools/hitomi_tool.py
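
A manifest that accompanies an uploaded DB typically records a checksum and size so clients can skip or verify downloads. A hypothetical writer for that kind of CI step (the field names sha256, size, and generated_at are assumptions, not the workflow's actual manifest format):

```python
import hashlib
import json
import time
from pathlib import Path

def write_manifest(db_path, manifest_path):
    """Write a small JSON manifest describing the DB artifact next to it."""
    db_path = Path(db_path)
    digest = hashlib.sha256(db_path.read_bytes()).hexdigest()
    manifest = {
        "file": db_path.name,
        "sha256": digest,                     # lets clients verify the download
        "size": db_path.stat().st_size,       # lets clients pre-size progress bars
        "generated_at": int(time.time()),
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest
```

With such a manifest in the release, ReleaseAssetCache-style code can compare the remote checksum against the local file before deciding to re-download.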
Replace Kemono's creators.pkl caching with a SQLite-backed kemono.db asset and hook it into the GUI/script flows and a generator script.
  • Introduce the KemonoAuthor dataclass and the KemonoAuthorsDb helper in utils.website.kemono.db, with functions to populate/load authors from a JSON creators payload.
  • Replace the old pickled KemonoAuthor and the creators.txt/Motrix-based download path with kemono.db in both the runtime (utils.script.image.kemono.Creator.by_creatorid) and the GUI table view.
  • Add KemonoReleaseAsset and associated preprocess logic so script preprocess downloads kemono.db via AsyncClient and stores cache state/errors in PreprocessResult; remove _check_kemono_data.
  • Provide a standalone async generator script, utils.website.kemono.gen_assets, to build kemono.db from the kemono.cr API for CI or manual runs.
utils/website/kemono/db.py
utils/website/kemono/gen_assets.py
utils/script/image/kemono.py
GUI/script/kemono/__init__.py
utils/website/preprocess.py
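
A hypothetical sketch of the SQLite author store described above (ensure_schema / replace_from_creators / load_all); the column names are assumptions inferred from the KemonoAuthor fields in the class diagram, not the actual schema:

```python
import sqlite3

class KemonoAuthorsDb:
    """Minimal SQLite-backed author store in the spirit of kemono.db."""

    def __init__(self, db_path):
        self.db_path = str(db_path)

    def ensure_schema(self):
        with sqlite3.connect(self.db_path) as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS authors ("
                "id TEXT PRIMARY KEY, name TEXT, service TEXT, "
                "updated INTEGER, favorited INTEGER)"
            )

    def replace_from_creators(self, creators):
        # Full refresh: the asset is rebuilt wholesale, not merged.
        self.ensure_schema()
        with sqlite3.connect(self.db_path) as conn:
            conn.execute("DELETE FROM authors")
            conn.executemany(
                "INSERT INTO authors VALUES (?,?,?,?,?)",
                [(c["id"], c["name"], c["service"],
                  c.get("updated", 0), c.get("favorited", 0)) for c in creators],
            )
        return len(creators)

    def load_all(self):
        with sqlite3.connect(self.db_path) as conn:
            rows = conn.execute(
                "SELECT id, name, service, updated, favorited FROM authors")
            return {r[0]: r for r in rows}
```

Compared with a pickled dict, an indexed id column makes per-creator lookups cheap, and the file can be shipped as a single downloadable release asset.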
Enhance the async task infrastructure and preview pipeline to support richer progress reporting and queued searches, plus several small cleanups.
  • Add AsyncTaskProgressReporter to throttle and format download progress (bytes/percentage), injected as progress_callback via AsyncTaskThread when supported.
  • Update PreprocessManager to track _active_preprocess and _queued_search, optionally dispatching a queued search (via safe_single_shot) once preprocess completes successfully; also simplify data client management (no more global httpx.Client).
  • Expose the preview thread helpers _do_search and _do_fetch_episodes, delegating to thread_site_runtime methods, and add replace_gui_site_runtime to PreviewManager to rebuild the worker when the runtime is swapped.
  • Apply minor formatting/cleanup changes (single-line constructors, removal of the old hitomi provider file, etc.).
GUI/manager/async_task.py
GUI/manager/preprocess.py
GUI/manager/preview/__init__.py
GUI/thread/preview.py
utils/website/providers/hitomi.py
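
Throttled progress reporting of the kind described above forwards at most one update per interval and formats it as a percentage when the total is known, raw bytes otherwise. A sketch (the 0.5 s interval, the message format, and the injectable clock are assumptions for illustration):

```python
import time

class ProgressReporter:
    """Forward download progress to a callable, at most once per interval."""

    def __init__(self, emit, min_interval=0.5, clock=time.monotonic):
        self.emit = emit            # e.g. a Qt signal's emit(); any callable works
        self.min_interval = min_interval
        self.clock = clock          # injectable for deterministic testing
        self._done = 0
        self._last_emit = float("-inf")

    def download_advance(self, chunk_size, label, total_bytes=None):
        self._done += chunk_size
        now = self.clock()
        if now - self._last_emit < self.min_interval:
            return                  # throttled: silently drop this update
        self._last_emit = now
        if total_bytes:
            self.emit(f"{label}: {100 * self._done / total_bytes:.1f}%")
        else:
            self.emit(f"{label}: {self._done} bytes")
```

Throttling matters because chunked downloads can produce hundreds of callbacks per second, and each forwarded message crosses the thread boundary into the GUI event loop.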

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Reply under a review comment and ask Sourcery to create an issue from it. You can also reply with @sourcery-ai issue to create an issue from that comment.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull request title to generate a title at any time. You can also comment @sourcery-ai title on the pull request to (re)generate the title.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in the pull request body to generate a PR summary exactly where you want it. You can also comment @sourcery-ai summary on the pull request to (re)generate the summary.
  • Generate the reviewer's guide: Comment @sourcery-ai guide on the pull request to (re)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the pull request to mark all Sourcery comments as resolved. Useful if you have already addressed them all and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull request to dismiss all existing Sourcery reviews. Especially useful if you want to start fresh with a new review; don't forget to comment @sourcery-ai review afterwards to trigger one!

Customizing your experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request summary and the reviewer's guide.
  • Change the review language.
  • Add, remove, or edit custom review instructions.
  • Adjust other review settings.

Getting help

Original review guide in English

Reviewer's Guide

Introduces an nhentai provider and spider backed by a prebuilt SQLite tag database, refactors preprocess flows to be async and asset-driven (Hitomi, nhentai, Kemono), switches Kemono to a SQLite-backed author cache, and adjusts GUI/async infrastructure to support progress reporting, queued searches, and new site indices/assets.

Sequence diagram for async preprocess with asset download and queued search

sequenceDiagram
    actor User
    participant GUI_Main as MainWindow
    participant PreMgr as PreprocessManager
    participant AsyncMgr as AsyncTaskManager
    participant Task as AsyncTaskThread
    participant GuiRuntime as GuiSiteRuntime
    participant SitePre as run_site_preprocess
    participant HitomiPre as HitomiDatabasePreprocess
    participant NhentaiPre as NhentaiDatabasePreprocess
    participant AssetCache as ReleaseAssetCache
    participant Probe as PreprocessRuntimeProbe
    participant Reporter as AsyncTaskProgressReporter

    User->>GUI_Main: select site index
    GUI_Main->>PreMgr: handle_choosebox_changed(index, gui_site_runtime)
    PreMgr->>PreMgr: _next_generation()
    PreMgr->>PreMgr: _active_preprocess = (index, generation)
    PreMgr->>AsyncMgr: execute_simple_task(task_func)
    AsyncMgr->>Task: start AsyncTaskThread

    Note over Task: In thread
    Task->>Task: detect progress_callback parameter
    Task->>Reporter: create AsyncTaskProgressReporter(emit_progress)
    Task->>GuiRuntime: preprocess(conf_state, progress_callback=Reporter)
    GuiRuntime->>SitePre: run_site_preprocess(gui_site_runtime, ...)
    SitePre->>Probe: PreprocessRuntimeProbe(gui_site_runtime)
    Probe->>GuiRuntime: access_ready() or manga_copy_cache_hit()

    alt site is HITOMI
        SitePre->>HitomiPre: run()
        HitomiPre->>AssetCache: ensure()
    else site is NHENTAI
        SitePre->>NhentaiPre: run()
        NhentaiPre->>AssetCache: ensure()
    end

    AssetCache->>Reporter: download_start(label, total_bytes)
    loop chunks
        AssetCache->>Reporter: download_advance(chunk_size, label, total_bytes)
        Reporter-->>Task: emit_progress(message)
        Task-->>AsyncMgr: progress_signal(message)
        AsyncMgr-->>GUI_Main: show tooltip update
    end
    AssetCache->>Reporter: download_finish(label)

    SitePre-->>GuiRuntime: PreprocessResult
    GuiRuntime-->>Task: PreprocessResult
    Task-->>AsyncMgr: success_signal(result)
    AsyncMgr-->>PreMgr: _on_preprocess_success(index, generation, result)

    PreMgr->>PreMgr: _dispatch_queued_search(index, generation, ready)
    alt queued search present and ready
        PreMgr->>GUI_Main: safe_single_shot(0, start_and_search(keyword))
    end
    PreMgr->>PreMgr: _clear_active_preprocess(index, generation)

    User->>GUI_Main: click search
    alt preprocess still running
        GUI_Main->>PreMgr: queue_search_after_preprocess(index, keyword)
        PreMgr->>PreMgr: store (generation, index, keyword)
    else no active preprocess
        GUI_Main->>GUI_Main: start_and_search(keyword)
    end
Loading

Class diagram for async site preprocess and asset caching

classDiagram
    class GuiSiteRuntime {
        +int site_index
        +ProviderDescriptor provider_descriptor
        +RuntimeContext runtime_context
        +preprocess(conf_state, data_client, progress_callback) PreprocessResult
        +create_thread_site_runtime(preview_client) ThreadSiteRuntime
    }

    class PreprocessRuntimeProbe {
        -GuiSiteRuntime gui_site_runtime
        +PreprocessRuntimeProbe(gui_site_runtime)
        +manga_copy_cache_hit() bool
        +access_ready() bool
        +verified_runtime() ThreadSiteRuntime
    }

    class ReleaseAssetCache {
        +str name
        +Path db_path
        +tuple~str~ download_urls
        +httpx_AsyncClient data_client
        +progress_callback
        +str label
        +int timeout
        +int cache_ttl_hours
        +Cache cache
        +ReleaseAssetCache(name, db_path, download_urls, data_client, progress_callback, label, timeout, cache_ttl_hours)
        +ensure() ReleaseAssetResult
        -_emit_legacy_download_start() void
        -_download() tuple~bool, list~str~~
    }

    class ReleaseAssetResult {
        +bool ready
        +bool cache_hit
        +bool cache_expired
        +Path db_path
        +tuple~str~ errors
    }

    class SiteDatabasePreprocess {
        <<abstract>>
        +str name
        +tuple~str~ download_urls
        +bool data_required
        +str data_ready_action
        +GuiSiteRuntime gui_site_runtime
        +PreprocessRuntimeProbe runtime_probe
        +ReleaseAssetCache asset_cache
        +Path db_path
        +list~dict~ messages
        +list~dict~ actions
        +dict state_flags
        +SiteDatabasePreprocess(gui_site_runtime, data_client, progress_callback)
        +run() PreprocessResult
        +after_data_ready() bool
    }

    class HitomiDatabasePreprocess {
        +str name = "hitomi"
        +tuple~str~ download_urls
        +bool data_required = false
        +str data_ready_action = "add_hitomi_tool"
    }

    class NhentaiDatabasePreprocess {
        +str name = "nhentai"
        +tuple~str~ download_urls
        +after_data_ready() bool
    }

    class KemonoReleaseAsset {
        +KemonoReleaseAsset(data_client, progress_callback)
    }

    class Cache {
        +str cache_f
        +str flag
        +state
        +val
        +with_expiry(expiry_time, write_in) decorator
        +run(func, expiry_time, write_in)
        +_is_expired(cache_path, expiry_time) bool
    }

    class PreprocessResult {
        +bool ready
        +bool block_search
        +bool runtime_ready
        +str domain
        +tuple~dict~ messages
        +tuple~dict~ actions
        +dict state_flags
    }

    GuiSiteRuntime --> PreprocessRuntimeProbe
    PreprocessRuntimeProbe --> ThreadSiteRuntime

    SiteDatabasePreprocess "1" *-- "1" ReleaseAssetCache
    SiteDatabasePreprocess --> PreprocessRuntimeProbe
    SiteDatabasePreprocess --> GuiSiteRuntime
    SiteDatabasePreprocess --> PreprocessResult

    HitomiDatabasePreprocess --|> SiteDatabasePreprocess
    NhentaiDatabasePreprocess --|> SiteDatabasePreprocess

    ReleaseAssetCache --> ReleaseAssetResult
    ReleaseAssetCache --> Cache

    KemonoReleaseAsset --|> ReleaseAssetCache

    Cache <.. ReleaseAssetCache

    class run_site_preprocess {
        +run_site_preprocess(gui_site_runtime, conf_state, data_client, progress_callback) PreprocessResult
    }

    run_site_preprocess --> GuiSiteRuntime
    run_site_preprocess --> HitomiDatabasePreprocess
    run_site_preprocess --> NhentaiDatabasePreprocess
    run_site_preprocess --> PreprocessRuntimeProbe
    run_site_preprocess --> ReleaseAssetCache

    class _preprocess_script {
        +_preprocess_script(data_client, progress_callback) PreprocessResult
    }

    _preprocess_script --> KemonoReleaseAsset
    _preprocess_script --> PreprocessResult
Loading

Class diagram for nhentai provider stack and tag catalog

classDiagram
    class NhentaiBookInfo {
        +str source = "nhentai"
        +str media_id
        +str lang
        +str english_title
        +str japanese_title
        +str pretty_title
        +list pics
        +say str
    }

    class NhentaiTagCatalog {
        +tuple~str~ _tag_types
        +bool loaded
        +Path db_path
        +dict~str, dict~int, str~~ by_type
        +dict~str, set~int~~ valid_ids_by_type
        +__init__()
        +reset() void
        +load(db_path, default_db_path, excluded_language_names) dict~str,int~
        +preload(db_path, default_db_path, excluded_language_names) dict~str,int~
    }

    class NhentaiParser {
        +NhentaiTagCatalog catalog
        +_json_payload(resp_text) dict
        +_required(target, key)
        +_asset_url(asset_path, host) str
        +build_image_url(asset_path) str
        +build_thumbnail_url(asset_path) str
        +_select_title(english_title, japanese_title, pretty_title) str
        +_tag_name_from_ids(tag_ids, tag_type, excluded_names) str
        +parse_search_item(target) NhentaiBookInfo
        +parse_search(resp_text) list~NhentaiBookInfo~
        +_parse_page_assets(pages, media_id) list~dict~
        +parse_book(resp_text) NhentaiBookInfo
        +apply_detail(book, detail) NhentaiBookInfo
        +build_page_image_map(book) dict~int,str~
        +build_page_image_urls(book) list~str~
        +parse_preview_books(resp_text) list~NhentaiBookInfo~
    }

    class NhentaiReqer {
        +cli
        +NhentaiReqer(conf)
        +get_cli(conf, is_async, kwargs) httpx_Client
        +test_index() bool
        +_headers(referer) dict
        +preview_search(keyword, page) list~NhentaiBookInfo~
        +preview_fetch_pages(item) list~str~
    }

    class NhentaiUtils {
        +NhentaiParser parser
        +NhentaiReqer reqer
        +NhentaiTagCatalog catalog
        +str browser_referer_mode
        +NhentaiUtils(conf)
        +reset_tag_catalog() void
        +load_tag_catalog(db_path) dict~str,int~
        +preload_tag_catalog(db_path) dict~str,int~
        +preview_client_config(context) dict
    }

    class NhentaiParseError {
    }

    class _NhentaiContract {
        +set _language_excluded_names
        +str name
        +str proxy_policy
        +str domain
        +str index
        +str api_index
        +str image_host
        +str thumbnail_host
        +str search_url_head
        +tuple turn_page_info
        +dict mappings
        +dict headers
        +dict image_headers
        +dict book_hea
        +set cookies_field
        +uuid_regex
        +str book_url_regex
        +Path tag_db_path
        +str gallery_url_template
        +str gallery_api_url_template
        +build_search_url(keyword, page, sort) str
        +with_referer(referer) dict
    }

    class EroUtils {
    }
    class Cookies {
    }
    class Previewer {
    }
    class Req {
    }

    NhentaiParser --|> _NhentaiContract
    NhentaiReqer --|> _NhentaiContract
    NhentaiUtils --|> _NhentaiContract

    NhentaiUtils --|> EroUtils
    NhentaiUtils --|> Cookies
    NhentaiUtils --|> Previewer

    NhentaiReqer --|> Cookies
    NhentaiReqer --|> Req

    NhentaiParser --> NhentaiTagCatalog
    NhentaiParser --> NhentaiBookInfo
    NhentaiParser --> NhentaiParseError

    NhentaiUtils o--> NhentaiReqer
    NhentaiUtils o--> NhentaiParser
    NhentaiUtils o--> NhentaiTagCatalog

    class NhentaiSpider {
        +str name = "nhentai"
        +dict custom_settings
        +int num_of_row
        +str domain
        +str search_url_head
        +tuple turn_page_info
        +str book_id_url
        +dict mappings
        +ua dict
        +frame_section(response)
    }

    class BaseComicSpider2 {
    }

    NhentaiSpider --|> BaseComicSpider2
    NhentaiSpider --> NhentaiUtils
    NhentaiSpider --> NhentaiParser
Loading

Class diagram for Kemono SQLite author cache

classDiagram
    class KemonoAuthor {
        +str id
        +str name
        +str service
        +int updated
        +int favorited
        +avatar str
        +to_payload() dict~str,str|int~
    }

    class KemonoAuthorsDb {
        +Path db_path
        +KemonoAuthorsDb(db_path)
        +ensure_schema() void
        +replace_from_creators(creators) int
        +load_all() dict~str,KemonoAuthor~
    }

    class build_kemono_db_from_creators_bytes {
        +build_kemono_db_from_creators_bytes(db_path, payload) int
    }

    class load_kemono_authors {
        +load_kemono_authors(db_path) dict~str,KemonoAuthor~
    }

    KemonoAuthorsDb --> KemonoAuthor

    build_kemono_db_from_creators_bytes --> KemonoAuthorsDb
    load_kemono_authors --> KemonoAuthorsDb

    class KemonoCreator {
        +Path db_path
        +by_creatorid(order_creatorids)
    }

    KemonoCreator --> load_kemono_authors

    class KemonoTableViewController {
        +_set_kemono_table()
    }

    KemonoTableViewController --> load_kemono_authors
Loading

File-Level Changes

Change Details Files
Refactor site/script preprocess pipeline to async, asset-cache based, and shared between Hitomi, nhentai, and Kemono.
  • Replace run_site_preprocess with an async version that operates on GuiSiteRuntime, uses httpx.AsyncClient, and centralizes per-site branches including new Spider.NHENTAI.
  • Introduce ReleaseAssetCache, SiteDatabasePreprocess, HitomiDatabasePreprocess, NhentaiDatabasePreprocess, KemonoReleaseAsset, and PreprocessRuntimeProbe to manage HTTP downloads, cache TTLs, DB paths under __temp, and runtime access checks.
  • Change script preprocess to use KemonoReleaseAsset and remove the old _check_kemono_data creators.txt flow; script services/dependency checks remain but now depend on an async client.
  • Update GuiSiteRuntime.preprocess to be async and PreprocessManager to call run_script_preprocess directly for script (index 7) while delegating other sites to GuiSiteRuntime.preprocess.
  • Extend Cache core to support DB file caching, richer expiry handling, and expose state (validate/expired/new).
utils/website/preprocess.py
GUI/manager/preprocess.py
utils/website/site_runtime.py
utils/website/core/__init__.py
Add nhentai support across backend utilities, spider, and metadata models, including a tag catalog backed by an SQLite asset.
  • Introduce NhentaiBookInfo with nhentai-specific fields and say formatting.
  • Implement NhentaiTagCatalog, NhentaiParser, NhentaiReqer, and NhentaiUtils to drive search, detail parsing, and preview flows using nhentai API v2 and a tags SQLite DB.
  • Add NhentaiSpider Scrapy spider wired to NhentaiUtils mappings and parser, with frame_section building the image map from parsed book detail.
  • Wire NhentaiUtils into provider_map and Spider enum, including specials/cn_proxy membership, status tips, and mainwindow chooseBox item text.
  • Add a nhentai tag asset generator script (gen_assets.py) to scrape nhentai tag metadata into __temp/nhentai.db, and ensure preprocess loads/prewarms the tag catalog via NhentaiUtils.preload_tag_catalog.
utils/website/info.py
utils/website/nhentai/__init__.py
utils/website/nhentai/gen_assets.py
utils/website/ins.py
variables/__init__.py
GUI/mainwindow.py
ComicSpider/spiders/nhentai.py
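As a rough illustration of the tag-catalog preload, assuming a hypothetical `tags(id, name)` schema (the actual layout of the PR's `nhentai.db` may differ):

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def preload_tag_catalog(db_path: Path) -> dict:
    """Load a tag-name -> tag-id mapping from the SQLite tags asset."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute("SELECT id, name FROM tags").fetchall()
    # Search/preview code can then translate human-readable tag names
    # into API ids without hitting the network.
    return {name: tag_id for tag_id, name in rows}
```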
Move Hitomi DB and tooling to use downloadable/cached assets in __temp backed by CI-generated releases.
  • Replace synchronous _preprocess_hitomi and _download_hitomi_db with HitomiDatabasePreprocess using ReleaseAssetCache and async client; mark hitomi data as optional but enabling add_hitomi_tool when available.
  • Adjust hitomi dataset scraper to write __temp/hitomi.db and update GitHub workflow to generate/upload hitomi.db and hitomi-manifest.json from __temp rather than assets/.
  • Update GUI hitomi_tool to reference __temp/hitomi.db instead of assets/hitomi.db, using a shared hitomi_db_path variable.
utils/website/preprocess.py
utils/website/hitomi/scape_dataset.py
.github/workflows/hitomi-db.yml
GUI/tools/hitomi_tool.py
Replace Kemono creators.pkl caching with a SQLite-backed kemono.db asset and hook it into GUI/script flows and a generator script.
  • Introduce KemonoAuthor dataclass and KemonoAuthorsDb helper in utils.website.kemono.db with functions to populate/load authors from JSON creators payload.
  • Replace old pickled KemonoAuthor and creators.txt/Motrix-based download path with usage of kemono.db in both runtime (utils.script.image.kemono.Creator.by_creatorid) and GUI table view.
  • Add KemonoReleaseAsset and associated preprocess logic so script preprocess downloads kemono.db via AsyncClient and stores cache state/errors in PreprocessResult; remove _check_kemono_data.
  • Provide a standalone async generator script utils.website.kemono.gen_assets to build kemono.db from kemono.cr API for CI or manual runs.
utils/website/kemono/db.py
utils/website/kemono/gen_assets.py
utils/script/image/kemono.py
GUI/script/kemono/__init__.py
utils/website/preprocess.py
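A minimal sketch of the `by_creatorid` lookup, assuming a hypothetical `authors(creator_id, name, service)` table (the PR's actual schema is not shown in this page):

```python
import sqlite3
from contextlib import closing
from dataclasses import dataclass
from pathlib import Path


@dataclass
class KemonoAuthor:
    creator_id: str
    name: str
    service: str


def by_creatorid(db_path: Path, order_creatorids: list) -> list:
    """Fetch authors by id, preserving the caller's requested order."""
    if not order_creatorids:
        return []
    placeholders = ",".join("?" * len(order_creatorids))
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT creator_id, name, service FROM authors "
            f"WHERE creator_id IN ({placeholders})",
            list(order_creatorids),
        ).fetchall()
    by_id = {row[0]: KemonoAuthor(*row) for row in rows}
    return [by_id[cid] for cid in order_creatorids if cid in by_id]
```

Reading from a structured table like this replaces the old pickle scan and lets the GUI table view and script runtime share one asset file.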
Enhance async task infrastructure and preview pipeline to support richer progress reporting, queued searches, and minor cleanups.
  • Add AsyncTaskProgressReporter to throttle and format download progress (bytes/percent) and inject it as progress_callback in AsyncTaskThread when supported.
  • Modify PreprocessManager to track _active_preprocess and _queued_search, and to optionally queue a search to run after preprocess completes successfully, using safe_single_shot; also simplify data client handling (no global httpx.Client).
  • Expose preview-thread helpers _do_search and _do_fetch_episodes and delegate to thread_site_runtime methods, and add PreviewManager.replace_gui_site_runtime to rebuild workers when runtime is replaced.
  • Apply small formatting/clean-up changes (constructor one-liners, removal of old hitomi provider file, etc.).
GUI/manager/async_task.py
GUI/manager/preprocess.py
GUI/manager/preview/__init__.py
GUI/thread/preview.py
utils/website/providers/hitomi.py
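The throttled progress reporting could look roughly like this; the class name matches the summary above, but the plain-callable interface is an assumption (the PR wires the reporter into the async task thread's progress_callback):

```python
import time


class AsyncTaskProgressReporter:
    """Throttle byte/percent download-progress updates to one per interval."""

    def __init__(self, emit, min_interval: float = 0.2):
        self.emit = emit                    # callable receiving a formatted string
        self.min_interval = min_interval    # seconds between intermediate updates
        self._last = float("-inf")

    def __call__(self, done_bytes: int, total_bytes: int):
        now = time.monotonic()
        # Always report completion; throttle everything in between.
        if done_bytes < total_bytes and now - self._last < self.min_interval:
            return
        self._last = now
        pct = done_bytes * 100 // total_bytes if total_bytes else 0
        self.emit(f"{done_bytes}/{total_bytes} bytes ({pct}%)")
```

Throttling in the reporter keeps GUI updates cheap even when the download loop reports every chunk.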



@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 2 issues, and left some high-level feedback:

  • In utils/website/kemono/gen_assets.py, generate_kemono_db() uses with _build_data_client() as data_client on an httpx.AsyncClient inside an async function; this should be async with (or the client should be synchronous) to avoid improper resource cleanup.
  • The Kemono and nhentai asset generation scripts currently hardcode transport details (e.g. the proxy setting in _build_data_client); consider wiring these through config or CLI arguments so they work correctly in environments without that local proxy.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In utils/website/kemono/gen_assets.py, generate_kemono_db() uses `with _build_data_client() as data_client` on an httpx.AsyncClient inside an async function; this should be `async with` (or the client should be synchronous) to avoid improper resource cleanup.
- Kemono and nhentai asset generation scripts currently hardcode transport details (e.g. proxy in _build_data_client); consider wiring these through config or CLI arguments so they behave correctly in environments without that local proxy.

## Individual Comments

### Comment 1
<location path="utils/website/hitomi/scape_dataset.py" line_range="42-46" />
<code_context>
         p.parent.mkdir(parents=True, exist_ok=True)
-        tmp = p.parent / '__temp'
-        tmp.mkdir(exist_ok=True)
+        tmp = p.parent / 'hitomi'
+        tmp.mkdir(parents=True, exist_ok=True)
         return p, tmp
-    return ori_path.joinpath('assets/hitomi.db'), temp_p
+    db_path = temp_p.joinpath('hitomi.db')
+    return db_path


</code_context>
<issue_to_address>
**issue (bug_risk):** _get_paths now returns different shapes depending on db_path_override, which breaks main() and scrape_and_save()

With `db_path_override` set, `_get_paths` now returns `(p, tmp)`, but `main()` treats `_get_paths(args.db_path)` as a single `db_p` and passes it directly to `Db.create_tables(db_p)` and `scrape_and_save(db_p, ...)`, both of which expect a path-like, not a tuple. In the non-override case it returns only `db_path`, so the return type is inconsistent. Please either (a) always return `(db_path, temp_dir)` and unpack in `main()`, or (b) always return just `db_path` and manage the temp dir separately.
</issue_to_address>

### Comment 2
<location path="utils/website/kemono/gen_assets.py" line_range="33-36" />
<code_context>
+    return resp.content
+
+
+async def generate_kemono_db():
+    resolved_db_path = temp_p.joinpath("kemono.db")
+    resolved_db_path.parent.mkdir(parents=True, exist_ok=True)
+    with _build_data_client() as data_client:
+        payload = await fetch_kemono_creators_payload(data_client=data_client)
+    return await asyncio.to_thread(build_kemono_db_from_creators_bytes, resolved_db_path, payload)
</code_context>
<issue_to_address>
**issue (bug_risk):** AsyncClient is used with a synchronous context manager, so it is never properly awaited/closed

In `generate_kemono_db`, `_build_data_client()` returns an `httpx.AsyncClient`, but it’s used with a regular `with` instead of `async with`, so the client is never properly closed. Use `async with _build_data_client() as data_client:` (or explicitly `await data_client.aclose()`) to ensure the connection pool is cleanly shut down. For example:

```python
async def generate_kemono_db():
    resolved_db_path = temp_p.joinpath("kemono.db")
    resolved_db_path.parent.mkdir(parents=True, exist_ok=True)
    async with _build_data_client() as data_client:
        payload = await fetch_kemono_creators_payload(data_client=data_client)
    return await asyncio.to_thread(
        build_kemono_db_from_creators_bytes,
        resolved_db_path,
        payload,
    )
```
</issue_to_address>

Sourcery is free for open source - if you find these reviews helpful, please consider sharing ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +42 to +46
+        tmp = p.parent / 'hitomi'
+        tmp.mkdir(parents=True, exist_ok=True)
         return p, tmp
-    return ori_path.joinpath('assets/hitomi.db'), temp_p
+    db_path = temp_p.joinpath('hitomi.db')
+    return db_path


issue (bug_risk): _get_paths now returns different shapes depending on db_path_override, which breaks main() and scrape_and_save()

With `db_path_override` set, `_get_paths` now returns `(p, tmp)`, but `main()` treats `_get_paths(args.db_path)` as a single `db_p` and passes it directly to `Db.create_tables(db_p)` and `scrape_and_save(db_p, ...)`, both of which expect a path-like object, not a tuple. In the non-override case it returns only `db_path`, so the return type is inconsistent. Please either (a) always return `(db_path, temp_dir)` and unpack in `main()`, or (b) always return just `db_path` and manage the temp dir separately.


Comment on lines +33 to +36
async def generate_kemono_db():
    resolved_db_path = temp_p.joinpath("kemono.db")
    resolved_db_path.parent.mkdir(parents=True, exist_ok=True)
    with _build_data_client() as data_client:


issue (bug_risk): The AsyncClient is used with a synchronous context manager, so it is never properly awaited/closed

In generate_kemono_db, _build_data_client() returns an httpx.AsyncClient, but it is used with a plain with instead of async with, so the client is never properly closed. Use async with _build_data_client() as data_client: (or explicitly await data_client.aclose()) to ensure the connection pool is cleanly shut down. For example:

async def generate_kemono_db():
    resolved_db_path = temp_p.joinpath("kemono.db")
    resolved_db_path.parent.mkdir(parents=True, exist_ok=True)
    async with _build_data_client() as data_client:
        payload = await fetch_kemono_creators_payload(data_client=data_client)
    return await asyncio.to_thread(
        build_kemono_db_from_creators_bytes,
        resolved_db_path,
        payload,
    )

@jasoneri jasoneri added the dev spider label May 8, 2026
@jasoneri jasoneri requested a review from jsonmaki May 8, 2026 18:20
@jsonmaki jsonmaki merged commit 93fff88 into 2.10-dev May 8, 2026
4 checks passed
