Open
Labels
bug (Something isn't working), question (Further information is requested)
Description
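⚠️ Pre-submission checklist
- I have carefully read the project's summary of frequently asked questions
- I have searched and reviewed the closed issues
- I confirm this is not caused by common problems such as slider CAPTCHAs, expired cookies, incorrectly extracted cookies, or platform risk control
❓ Problem description
Cannot crawl Xiaohongshu comments. After logging in, the crawl starts, and a dozen or so seconds later a RetryError is shown (last line of the log).
uv is not installed; following an online tutorial, I used a Python venv instead.
After the program starts, a cdp_xhs_user_data_dir directory exists,
but there is no data folder.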
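For anyone triaging this: `Expecting value: line 1 column 1 (char 0)`, the `JSONDecodeError` in the error log, is what Python's `json` module reports when the body is empty or does not start with a JSON value. That suggests the search request got back something other than JSON, although the log itself does not show the body, so this is only an inference. Below is a minimal stdlib-only sketch of how one might inspect such a body before parsing it. `describe_non_json_body` is a hypothetical helper, not part of MediaCrawler, and the status codes in the usage lines are made-up inputs.

```python
import json


def describe_non_json_body(status_code: int, body: str, preview_chars: int = 200) -> str:
    """Return "ok" if `body` parses as JSON, otherwise a short diagnostic string.

    Hypothetical helper for illustration only; not part of MediaCrawler.
    """
    try:
        json.loads(body)
        return "ok"
    except json.JSONDecodeError as exc:
        # Classify the body roughly so the log says more than "Expecting value".
        if not body.strip():
            kind = "empty body"
        elif body.lstrip().startswith("<"):
            kind = "looks like HTML/XML"
        else:
            kind = "non-JSON text"
        preview = body[:preview_chars]
        return (
            f"HTTP {status_code}: {kind} "
            f"({exc.msg} at char {exc.pos}); preview={preview!r}"
        )


if __name__ == "__main__":
    # Made-up inputs: a valid JSON body, an empty body, and an HTML page.
    print(describe_non_json_body(200, '{"success": true}'))
    print(describe_non_json_body(200, ""))
    print(describe_non_json_body(500, "<html><body>blocked</body></html>"))
```

Logging the status code and a short preview of the body at the point where parsing fails would show whether the server returned an empty response, an HTML page, or something else.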
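🔍 Usage scenario
- Target platform: (e.g. Xiaohongshu/Douyin/Weibo)
- Feature used: (e.g. keyword search / user profile crawling)
💻 Environment
- Operating system:
- Python version: 3.10.11
- Using an IP proxy: No
- Using VPN software: No
- Target platform (Douyin/Xiaohongshu/Weibo, etc.): Xiaohongshu
📋 Error log
Paste the complete error log here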
```
(venv) E:\Github_Tool\MediaCrawler-main>python main.py --platform xhs --lt qrcode --type search
2026-02-10 22:45:06 MediaCrawler INFO (core.py:74) - [XiaoHongShuCrawler] Launching browser using CDP mode
2026-02-10 22:45:11 MediaCrawler INFO (cdp_browser.py:159) - [CDPBrowserManager] Detected browser: Microsoft Edge (Unknown Version)
2026-02-10 22:45:11 MediaCrawler INFO (cdp_browser.py:162) - [CDPBrowserManager] Browser path: C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe
2026-02-10 22:45:11 MediaCrawler INFO (cdp_browser.py:202) - [CDPBrowserManager] User data directory: E:\Github_Tool\MediaCrawler-main\browser_data\cdp_xhs_user_data_dir
2026-02-10 22:45:11 MediaCrawler INFO (browser_launcher.py:163) - [BrowserLauncher] Launching browser: C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe
2026-02-10 22:45:11 MediaCrawler INFO (browser_launcher.py:164) - [BrowserLauncher] Debug port: 9222
2026-02-10 22:45:11 MediaCrawler INFO (browser_launcher.py:165) - [BrowserLauncher] Headless mode: False
2026-02-10 22:45:11 MediaCrawler INFO (browser_launcher.py:195) - [BrowserLauncher] Waiting for browser to be ready on port 9222...
2026-02-10 22:45:12 MediaCrawler INFO (browser_launcher.py:204) - [BrowserLauncher] Browser is ready on port 9222
2026-02-10 22:45:13 MediaCrawler INFO (cdp_browser.py:176) - [CDPBrowserManager] CDP port 9222 is accessible
2026-02-10 22:45:13 MediaCrawler INFO (cdp_browser.py:87) - [CDPBrowserManager] SIGINT handler already exists, skipping registration to avoid override
2026-02-10 22:45:13 MediaCrawler INFO (cdp_browser.py:92) - [CDPBrowserManager] SIGTERM handler already exists, skipping registration to avoid override
2026-02-10 22:45:13 MediaCrawler INFO (cdp_browser.py:95) - [CDPBrowserManager] Cleanup handlers registered
2026-02-10 22:45:13 MediaCrawler INFO (cdp_browser.py:240) - [CDPBrowserManager] Got browser WebSocket URL: ws://localhost:9222/devtools/browser/727c1618-2986-406a-81d4-7b010a675be7
2026-02-10 22:45:13 MediaCrawler INFO (cdp_browser.py:259) - [CDPBrowserManager] Connecting to browser via CDP: ws://localhost:9222/devtools/browser/727c1618-2986-406a-81d4-7b010a675be7
2026-02-10 22:45:14 MediaCrawler INFO (cdp_browser.py:265) - [CDPBrowserManager] Successfully connected to browser
2026-02-10 22:45:14 MediaCrawler INFO (cdp_browser.py:266) - [CDPBrowserManager] Browser contexts count: 1
2026-02-10 22:45:14 MediaCrawler INFO (cdp_browser.py:291) - [CDPBrowserManager] Using existing browser context
2026-02-10 22:45:14 MediaCrawler INFO (core.py:435) - [XiaoHongShuCrawler] CDP browser info: {'version': '142.0.3595.53', 'contexts_count': 1, 'debug_port': 9222, 'is_connected': True}
2026-02-10 22:45:15 MediaCrawler INFO (core.py:358) - [XiaoHongShuCrawler.create_xhs_client] Begin create Xiaohongshu API client ...
2026-02-10 22:45:15 MediaCrawler INFO (client.py:234) - [XiaoHongShuClient.pong] Begin to check login state...
2026-02-10 22:45:15 MediaCrawler INFO (client.py:245) - [XiaoHongShuClient.pong] Login state result: False
2026-02-10 22:45:15 MediaCrawler INFO (login.py:89) - [XiaoHongShuLogin.begin] Begin login xiaohongshu ...
2026-02-10 22:45:15 MediaCrawler INFO (login.py:169) - [XiaoHongShuLogin.login_by_qrcode] Begin login xiaohongshu by qrcode ...
2026-02-10 22:45:15 MediaCrawler INFO (login.py:202) - [XiaoHongShuLogin.login_by_qrcode] waiting for scan code login, remaining time is 120s
2026-02-10 22:45:27 MediaCrawler INFO (login.py:66) - [XiaoHongShuLogin.check_login_state] Login status confirmed by UI element ('Me' button).
2026-02-10 22:45:27 MediaCrawler INFO (login.py:210) - [XiaoHongShuLogin.login_by_qrcode] Login successful then wait for 5 seconds redirect ...
2026-02-10 22:45:32 MediaCrawler INFO (core.py:127) - [XiaoHongShuCrawler.search] Begin search Xiaohongshu keywords
2026-02-10 22:45:32 MediaCrawler INFO (core.py:134) - [XiaoHongShuCrawler.search] Current search keyword: 十五运
2026-02-10 22:45:32 MediaCrawler INFO (core.py:144) - [XiaoHongShuCrawler.search] search Xiaohongshu keyword: 十五运, page: 1
2026-02-10 22:45:34 MediaCrawler INFO (cdp_browser.py:391) - [CDPBrowserManager] Browser connection disconnected
2026-02-10 22:45:34 MediaCrawler INFO (browser_launcher.py:255) - [BrowserLauncher] Closing browser process...
2026-02-10 22:45:34 MediaCrawler INFO (browser_launcher.py:287) - [BrowserLauncher] Browser process closed
Traceback (most recent call last):
File "E:\Github_Tool\MediaCrawler-main\venv\lib\site-packages\tenacity\_asyncio.py", line 50, in __call__
result = await fn(*args, **kwargs)
File "E:\Github_Tool\MediaCrawler-main\media_platform\xhs\client.py", line 143, in request
data: Dict = response.json()
File "E:\Github_Tool\MediaCrawler-main\venv\lib\site-packages\httpx\_models.py", line 832, in json
return jsonlib.loads(self.content, **kwargs)
File "E:\Github_Tool\MediaCrawler-main\python310\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "E:\Github_Tool\MediaCrawler-main\python310\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "E:\Github_Tool\MediaCrawler-main\python310\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "E:\Github_Tool\MediaCrawler-main\main.py", line 157, in <module>
run(main, async_cleanup, cleanup_timeout_seconds=15.0, on_first_interrupt=_force_stop)
File "E:\Github_Tool\MediaCrawler-main\tools\app_runner.py", line 109, in run
asyncio.run(_runner())
File "E:\Github_Tool\MediaCrawler-main\python310\lib\asyncio\runners.py", line 44, in run
return loop.run_until_complete(main)
File "E:\Github_Tool\MediaCrawler-main\python310\lib\asyncio\base_events.py", line 649, in run_until_complete
return future.result()
File "E:\Github_Tool\MediaCrawler-main\tools\app_runner.py", line 96, in _runner
await app_main()
File "E:\Github_Tool\MediaCrawler-main\main.py", line 110, in main
await crawler.start()
File "E:\Github_Tool\MediaCrawler-main\media_platform\xhs\core.py", line 113, in start
await self.search()
File "E:\Github_Tool\MediaCrawler-main\media_platform\xhs\core.py", line 147, in search
notes_res = await self.xhs_client.get_note_by_keyword(
File "E:\Github_Tool\MediaCrawler-main\media_platform\xhs\client.py", line 291, in get_note_by_keyword
return await self.post(uri, data)
File "E:\Github_Tool\MediaCrawler-main\media_platform\xhs\client.py", line 183, in post
return await self.request(
File "E:\Github_Tool\MediaCrawler-main\venv\lib\site-packages\tenacity\_asyncio.py", line 88, in async_wrapped
return await fn(*args, **kwargs)
File "E:\Github_Tool\MediaCrawler-main\venv\lib\site-packages\tenacity\_asyncio.py", line 47, in __call__
do = self.iter(retry_state=retry_state)
File "E:\Github_Tool\MediaCrawler-main\venv\lib\site-packages\tenacity\__init__.py", line 326, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x207c0e8fdc0 state=finished raised JSONDecodeError>]
```
📷 Error screenshot
<img width="1618" height="807" alt="Image" src="https://github.com/user-attachments/assets/7ee432b7-def3-44ce-af70-d52d7c1a0539" />