feat: add nhentai #168
Conversation
Reviewer's Guide

Introduces an nhentai provider and spider backed by a prebuilt SQLite tag database, refactors preprocess flows to be async and asset-driven (Hitomi, nhentai, Kemono), switches Kemono to a SQLite-backed author cache, and adjusts GUI/async infrastructure to support progress reporting, queued searches, and new site indices/assets.

Sequence diagram for async preprocess with asset download and queued search

sequenceDiagram
actor User
participant GUI_Main as MainWindow
participant PreMgr as PreprocessManager
participant AsyncMgr as AsyncTaskManager
participant Task as AsyncTaskThread
participant GuiRuntime as GuiSiteRuntime
participant SitePre as run_site_preprocess
participant HitomiPre as HitomiDatabasePreprocess
participant NhentaiPre as NhentaiDatabasePreprocess
participant AssetCache as ReleaseAssetCache
participant Probe as PreprocessRuntimeProbe
participant Reporter as AsyncTaskProgressReporter
User->>GUI_Main: select site index
GUI_Main->>PreMgr: handle_choosebox_changed(index, gui_site_runtime)
PreMgr->>PreMgr: _next_generation()
PreMgr->>PreMgr: _active_preprocess = (index, generation)
PreMgr->>AsyncMgr: execute_simple_task(task_func)
AsyncMgr->>Task: start AsyncTaskThread
Note over Task: In thread
Task->>Task: detect progress_callback parameter
Task->>Reporter: create AsyncTaskProgressReporter(emit_progress)
Task->>GuiRuntime: preprocess(conf_state, progress_callback=Reporter)
GuiRuntime->>SitePre: run_site_preprocess(gui_site_runtime, ...)
SitePre->>Probe: PreprocessRuntimeProbe(gui_site_runtime)
Probe->>GuiRuntime: access_ready() or manga_copy_cache_hit()
alt site is HITOMI
SitePre->>HitomiPre: run()
HitomiPre->>AssetCache: ensure()
else site is NHENTAI
SitePre->>NhentaiPre: run()
NhentaiPre->>AssetCache: ensure()
end
AssetCache->>Reporter: download_start(label, total_bytes)
loop chunks
AssetCache->>Reporter: download_advance(chunk_size, label, total_bytes)
Reporter-->>Task: emit_progress(message)
Task-->>AsyncMgr: progress_signal(message)
AsyncMgr-->>GUI_Main: show tooltip update
end
AssetCache->>Reporter: download_finish(label)
SitePre-->>GuiRuntime: PreprocessResult
GuiRuntime-->>Task: PreprocessResult
Task-->>AsyncMgr: success_signal(result)
AsyncMgr-->>PreMgr: _on_preprocess_success(index, generation, result)
PreMgr->>PreMgr: _dispatch_queued_search(index, generation, ready)
alt queued search present and ready
PreMgr->>GUI_Main: safe_single_shot(0, start_and_search(keyword))
end
PreMgr->>PreMgr: _clear_active_preprocess(index, generation)
User->>GUI_Main: click search
alt preprocess still running
GUI_Main->>PreMgr: queue_search_after_preprocess(index, keyword)
PreMgr->>PreMgr: store (generation, index, keyword)
else no active preprocess
GUI_Main->>GUI_Main: start_and_search(keyword)
end
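In Python terms, the generation-guarded queueing shown in the diagram can be sketched roughly as follows. Class and method names are taken from the diagram; the real implementation is Qt-based and will differ in detail:

```python
class PreprocessManager:
    """Sketch of the generation tracking from the sequence diagram."""

    def __init__(self):
        self._generation = 0
        self._active = None          # (index, generation) while a preprocess runs
        self._queued_search = None   # (generation, index, keyword)

    def _next_generation(self) -> int:
        self._generation += 1
        return self._generation

    def start_preprocess(self, index: int) -> int:
        gen = self._next_generation()
        self._active = (index, gen)
        return gen

    def queue_search_after_preprocess(self, index: int, keyword: str) -> bool:
        # Only queue a search for the site whose preprocess is still running.
        if self._active and self._active[0] == index:
            self._queued_search = (self._active[1], index, keyword)
            return True
        return False

    def on_preprocess_success(self, index: int, generation: int, ready: bool):
        # Dispatch the queued search only if it belongs to this exact run;
        # a stale generation (user switched sites meanwhile) is dropped.
        dispatched = None
        if (self._queued_search
                and self._queued_search[:2] == (generation, index)
                and ready):
            dispatched = self._queued_search[2]
            self._queued_search = None
        if self._active == (index, generation):
            self._active = None
        return dispatched
```

The generation check is what prevents a search queued against an old site selection from firing after the user has already switched to another site.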
Class diagram for async site preprocess and asset caching

classDiagram
class GuiSiteRuntime {
+int site_index
+ProviderDescriptor provider_descriptor
+RuntimeContext runtime_context
+preprocess(conf_state, data_client, progress_callback) PreprocessResult
+create_thread_site_runtime(preview_client) ThreadSiteRuntime
}
class PreprocessRuntimeProbe {
-GuiSiteRuntime gui_site_runtime
+PreprocessRuntimeProbe(gui_site_runtime)
+manga_copy_cache_hit() bool
+access_ready() bool
+verified_runtime() ThreadSiteRuntime
}
class ReleaseAssetCache {
+str name
+Path db_path
+tuple~str~ download_urls
+httpx_AsyncClient data_client
+progress_callback
+str label
+int timeout
+int cache_ttl_hours
+Cache cache
+ReleaseAssetCache(name, db_path, download_urls, data_client, progress_callback, label, timeout, cache_ttl_hours)
+ensure() ReleaseAssetResult
-_emit_legacy_download_start() void
-_download() tuple~bool, list~str~~
}
class ReleaseAssetResult {
+bool ready
+bool cache_hit
+bool cache_expired
+Path db_path
+tuple~str~ errors
}
class SiteDatabasePreprocess {
<<abstract>>
+str name
+tuple~str~ download_urls
+bool data_required
+str data_ready_action
+GuiSiteRuntime gui_site_runtime
+PreprocessRuntimeProbe runtime_probe
+ReleaseAssetCache asset_cache
+Path db_path
+list~dict~ messages
+list~dict~ actions
+dict state_flags
+SiteDatabasePreprocess(gui_site_runtime, data_client, progress_callback)
+run() PreprocessResult
+after_data_ready() bool
}
class HitomiDatabasePreprocess {
+str name = "hitomi"
+tuple~str~ download_urls
+bool data_required = false
+str data_ready_action = "add_hitomi_tool"
}
class NhentaiDatabasePreprocess {
+str name = "nhentai"
+tuple~str~ download_urls
+after_data_ready() bool
}
class KemonoReleaseAsset {
+KemonoReleaseAsset(data_client, progress_callback)
}
class Cache {
+str cache_f
+str flag
+state
+val
+with_expiry(expiry_time, write_in) decorator
+run(func, expiry_time, write_in)
+_is_expired(cache_path, expiry_time) bool
}
class PreprocessResult {
+bool ready
+bool block_search
+bool runtime_ready
+str domain
+tuple~dict~ messages
+tuple~dict~ actions
+dict state_flags
}
GuiSiteRuntime --> PreprocessRuntimeProbe
PreprocessRuntimeProbe --> ThreadSiteRuntime
SiteDatabasePreprocess "1" *-- "1" ReleaseAssetCache
SiteDatabasePreprocess --> PreprocessRuntimeProbe
SiteDatabasePreprocess --> GuiSiteRuntime
SiteDatabasePreprocess --> PreprocessResult
HitomiDatabasePreprocess --|> SiteDatabasePreprocess
NhentaiDatabasePreprocess --|> SiteDatabasePreprocess
ReleaseAssetCache --> ReleaseAssetResult
ReleaseAssetCache --> Cache
KemonoReleaseAsset --|> ReleaseAssetCache
Cache <.. ReleaseAssetCache
class run_site_preprocess {
+run_site_preprocess(gui_site_runtime, conf_state, data_client, progress_callback) PreprocessResult
}
run_site_preprocess --> GuiSiteRuntime
run_site_preprocess --> HitomiDatabasePreprocess
run_site_preprocess --> NhentaiDatabasePreprocess
run_site_preprocess --> PreprocessRuntimeProbe
run_site_preprocess --> ReleaseAssetCache
class _preprocess_script {
+_preprocess_script(data_client, progress_callback) PreprocessResult
}
_preprocess_script --> KemonoReleaseAsset
_preprocess_script --> PreprocessResult
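The download_start / download_advance / download_finish protocol between ReleaseAssetCache and the reporter can be illustrated with a minimal stand-in (the real AsyncTaskProgressReporter forwards these calls to Qt signals, which is not shown here):

```python
class RecordingReporter:
    """Stand-in for AsyncTaskProgressReporter that just records calls."""

    def __init__(self):
        self.events = []

    def download_start(self, label, total_bytes):
        self.events.append(("start", label, total_bytes))

    def download_advance(self, size, label, total_bytes):
        self.events.append(("advance", size, label, total_bytes))

    def download_finish(self, label):
        self.events.append(("finish", label))


def stream_with_progress(chunks, reporter, label, total_bytes):
    # Same call order as in the sequence diagram:
    # one start, one advance per chunk, one finish.
    reporter.download_start(label, total_bytes)
    received = 0
    for chunk in chunks:
        received += len(chunk)
        reporter.download_advance(len(chunk), label, total_bytes)
    reporter.download_finish(label)
    return received
```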
Class diagram for nhentai provider stack and tag catalog

classDiagram
class NhentaiBookInfo {
+str source = "nhentai"
+str media_id
+str lang
+str english_title
+str japanese_title
+str pretty_title
+list pics
+str say
}
class NhentaiTagCatalog {
+tuple~str~ _tag_types
+bool loaded
+Path db_path
+dict~str, dict~int, str~~ by_type
+dict~str, set~int~~ valid_ids_by_type
+__init__()
+reset() void
+load(db_path, default_db_path, excluded_language_names) dict~str,int~
+preload(db_path, default_db_path, excluded_language_names) dict~str,int~
}
class NhentaiParser {
+NhentaiTagCatalog catalog
+_json_payload(resp_text) dict
+_required(target, key)
+_asset_url(asset_path, host) str
+build_image_url(asset_path) str
+build_thumbnail_url(asset_path) str
+_select_title(english_title, japanese_title, pretty_title) str
+_tag_name_from_ids(tag_ids, tag_type, excluded_names) str
+parse_search_item(target) NhentaiBookInfo
+parse_search(resp_text) list~NhentaiBookInfo~
+_parse_page_assets(pages, media_id) list~dict~
+parse_book(resp_text) NhentaiBookInfo
+apply_detail(book, detail) NhentaiBookInfo
+build_page_image_map(book) dict~int,str~
+build_page_image_urls(book) list~str~
+parse_preview_books(resp_text) list~NhentaiBookInfo~
}
class NhentaiReqer {
+cli
+NhentaiReqer(conf)
+get_cli(conf, is_async, kwargs) httpx_Client
+test_index() bool
+_headers(referer) dict
+preview_search(keyword, page) list~NhentaiBookInfo~
+preview_fetch_pages(item) list~str~
}
class NhentaiUtils {
+NhentaiParser parser
+NhentaiReqer reqer
+NhentaiTagCatalog catalog
+str browser_referer_mode
+NhentaiUtils(conf)
+reset_tag_catalog() void
+load_tag_catalog(db_path) dict~str,int~
+preload_tag_catalog(db_path) dict~str,int~
+preview_client_config(context) dict
}
class NhentaiParseError {
}
class _NhentaiContract {
+set _language_excluded_names
+str name
+str proxy_policy
+str domain
+str index
+str api_index
+str image_host
+str thumbnail_host
+str search_url_head
+tuple turn_page_info
+dict mappings
+dict headers
+dict image_headers
+dict book_hea
+set cookies_field
+uuid_regex
+str book_url_regex
+Path tag_db_path
+str gallery_url_template
+str gallery_api_url_template
+build_search_url(keyword, page, sort) str
+with_referer(referer) dict
}
class EroUtils {
}
class Cookies {
}
class Previewer {
}
class Req {
}
NhentaiParser --|> _NhentaiContract
NhentaiReqer --|> _NhentaiContract
NhentaiUtils --|> _NhentaiContract
NhentaiUtils --|> EroUtils
NhentaiUtils --|> Cookies
NhentaiUtils --|> Previewer
NhentaiReqer --|> Cookies
NhentaiReqer --|> Req
NhentaiParser --> NhentaiTagCatalog
NhentaiParser --> NhentaiBookInfo
NhentaiParser --> NhentaiParseError
NhentaiUtils o--> NhentaiReqer
NhentaiUtils o--> NhentaiParser
NhentaiUtils o--> NhentaiTagCatalog
class NhentaiSpider {
+str name = "nhentai"
+dict custom_settings
+int num_of_row
+str domain
+str search_url_head
+tuple turn_page_info
+str book_id_url
+dict mappings
+dict ua
+frame_section(response)
}
class BaseComicSpider2 {
}
NhentaiSpider --|> BaseComicSpider2
NhentaiSpider --> NhentaiUtils
NhentaiSpider --> NhentaiParser
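The catalog load that NhentaiTagCatalog performs could, under assumptions about the prebuilt database (a `tags(id, type, name)` table is assumed here; the actual schema is not shown in this PR), be sketched as:

```python
import sqlite3
from collections import defaultdict
from pathlib import Path


def load_tag_catalog(db_path: Path, excluded_language_names=frozenset()):
    # Build id -> name maps per tag type (by_type) and the set of
    # valid ids per type (valid_ids), skipping excluded language names.
    by_type: dict[str, dict[int, str]] = defaultdict(dict)
    valid_ids: dict[str, set[int]] = defaultdict(set)
    with sqlite3.connect(db_path) as conn:
        for tag_id, tag_type, name in conn.execute(
            "SELECT id, type, name FROM tags"
        ):
            if tag_type == "language" and name in excluded_language_names:
                continue
            by_type[tag_type][tag_id] = name
            valid_ids[tag_type].add(tag_id)
    return by_type, valid_ids
```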
Class diagram for Kemono SQLite author cache

classDiagram
class KemonoAuthor {
+str id
+str name
+str service
+int updated
+int favorited
+str avatar
+to_payload() dict~str,str|int~
}
class KemonoAuthorsDb {
+Path db_path
+KemonoAuthorsDb(db_path)
+ensure_schema() void
+replace_from_creators(creators) int
+load_all() dict~str,KemonoAuthor~
}
class build_kemono_db_from_creators_bytes {
+build_kemono_db_from_creators_bytes(db_path, payload) int
}
class load_kemono_authors {
+load_kemono_authors(db_path) dict~str,KemonoAuthor~
}
KemonoAuthorsDb --> KemonoAuthor
build_kemono_db_from_creators_bytes --> KemonoAuthorsDb
load_kemono_authors --> KemonoAuthorsDb
class KemonoCreator {
+Path db_path
+by_creatorid(order_creatorids)
}
KemonoCreator --> load_kemono_authors
class KemonoTableViewController {
+_set_kemono_table()
}
KemonoTableViewController --> load_kemono_authors
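A minimal sketch of the SQLite-backed author cache described above (the table and column names here are assumptions, not confirmed by the diff):

```python
import sqlite3
from pathlib import Path


def ensure_schema(db_path: Path) -> None:
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS authors ("
            " id TEXT PRIMARY KEY, name TEXT, service TEXT,"
            " updated INTEGER, favorited INTEGER)"
        )


def replace_from_creators(db_path: Path, creators: list) -> int:
    # Replace the whole table with the freshly fetched creator list.
    with sqlite3.connect(db_path) as conn:
        conn.execute("DELETE FROM authors")
        conn.executemany(
            "INSERT INTO authors VALUES"
            " (:id, :name, :service, :updated, :favorited)",
            creators,
        )
    return len(creators)


def load_all(db_path: Path) -> dict:
    # Read everything back keyed by creator id, mirroring load_kemono_authors.
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT id, name, service, updated, favorited FROM authors"
        )
        return {row[0]: row for row in rows}
```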
Hey - I've found 2 issues, and left some high level feedback:
- In utils/website/kemono/gen_assets.py, generate_kemono_db() uses `with _build_data_client() as data_client` on an httpx.AsyncClient inside an async function; this should be `async with` (or the client should be synchronous) to avoid improper resource cleanup.
- Kemono and nhentai asset generation scripts currently hardcode transport details (e.g. proxy in _build_data_client); consider wiring these through config or CLI arguments so they behave correctly in environments without that local proxy.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In utils/website/kemono/gen_assets.py, generate_kemono_db() uses `with _build_data_client() as data_client` on an httpx.AsyncClient inside an async function; this should be `async with` (or the client should be synchronous) to avoid improper resource cleanup.
- Kemono and nhentai asset generation scripts currently hardcode transport details (e.g. proxy in _build_data_client); consider wiring these through config or CLI arguments so they behave correctly in environments without that local proxy.
## Individual Comments
### Comment 1
<location path="utils/website/hitomi/scape_dataset.py" line_range="42-46" />
<code_context>
p.parent.mkdir(parents=True, exist_ok=True)
- tmp = p.parent / '__temp'
- tmp.mkdir(exist_ok=True)
+ tmp = p.parent / 'hitomi'
+ tmp.mkdir(parents=True, exist_ok=True)
return p, tmp
- return ori_path.joinpath('assets/hitomi.db'), temp_p
+ db_path = temp_p.joinpath('hitomi.db')
+ return db_path
</code_context>
<issue_to_address>
**issue (bug_risk):** _get_paths now returns different shapes depending on db_path_override, which breaks main() and scrape_and_save()
With `db_path_override` set, `_get_paths` now returns `(p, tmp)`, but `main()` treats `_get_paths(args.db_path)` as a single `db_p` and passes it directly to `Db.create_tables(db_p)` and `scrape_and_save(db_p, ...)`, both of which expect a path-like, not a tuple. In the non-override case it returns only `db_path`, so the return type is inconsistent. Please either (a) always return `(db_path, temp_dir)` and unpack in `main()`, or (b) always return just `db_path` and manage the temp dir separately.
</issue_to_address>
### Comment 2
<location path="utils/website/kemono/gen_assets.py" line_range="33-36" />
<code_context>
+ return resp.content
+
+
+async def generate_kemono_db():
+ resolved_db_path = temp_p.joinpath("kemono.db")
+ resolved_db_path.parent.mkdir(parents=True, exist_ok=True)
+ with _build_data_client() as data_client:
+ payload = await fetch_kemono_creators_payload(data_client=data_client)
+ return await asyncio.to_thread(build_kemono_db_from_creators_bytes, resolved_db_path, payload)
</code_context>
<issue_to_address>
**issue (bug_risk):** AsyncClient is used with a synchronous context manager, so it is never properly awaited/closed
In `generate_kemono_db`, `_build_data_client()` returns an `httpx.AsyncClient`, but it’s used with a regular `with` instead of `async with`, so the client is never properly closed. Use `async with _build_data_client() as data_client:` (or explicitly `await data_client.aclose()`) to ensure the connection pool is cleanly shut down. For example:
```python
async def generate_kemono_db():
resolved_db_path = temp_p.joinpath("kemono.db")
resolved_db_path.parent.mkdir(parents=True, exist_ok=True)
async with _build_data_client() as data_client:
payload = await fetch_kemono_creators_payload(data_client=data_client)
return await asyncio.to_thread(
build_kemono_db_from_creators_bytes,
resolved_db_path,
payload,
)
```
</issue_to_address>
Description
refactor: assets flow
Related Issues
Checklist:
Summary by Sourcery
Add nhentai support and refactor preprocess and asset management for website integrations.
New Features:
Enhancements:
Build:
- Generate and upload the databases and manifest from the `__temp` directory instead of the `assets` directory.
CI: