
[Question] Cannot crawl Xiaohongshu comments: crawling starts after login, then RetryError appears after about ten seconds (last line of the log) #830

@mzlhwcc

Description

⚠️ Pre-submission checklist

  • I have carefully read the project's FAQ on common issues
  • I have searched and reviewed the closed issues
  • I confirm this is not caused by common issues such as slider CAPTCHAs, expired cookies, incorrectly extracted cookies, or platform risk control

❓ Problem description

Cannot crawl Xiaohongshu comments: crawling starts after login, then a RetryError is raised after about ten seconds (last line of the log).
UV is not installed; following an online tutorial, I used python venv instead.
After the program starts, the cdp_xhs_user_data_dir directory exists,
but no data folder is created.
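For context, the `JSONDecodeError: Expecting value: line 1 column 1 (char 0)` at the bottom of the log means the search endpoint returned an empty or non-JSON body (often what a risk-control block looks like). The following is a minimal, hypothetical guard — not MediaCrawler's actual code; `safe_json` and its parameters are illustrative only — showing how `response.json()`-style parsing fails on such a body and how it could be checked first:

```python
import json
from json import JSONDecodeError


def safe_json(status_code: int, content_type: str, body: str):
    """Hypothetical guard before calling response.json(): return None
    instead of raising when the body is clearly not JSON (empty, or an
    HTML block page served by risk control)."""
    if status_code != 200 or not body or "application/json" not in content_type:
        return None
    try:
        return json.loads(body)
    except JSONDecodeError:
        return None


# An empty body (what a blocked request often returns) no longer raises:
assert safe_json(200, "text/html", "") is None
assert safe_json(200, "application/json", '{"success": true}') == {"success": True}
```

This only illustrates the failure mode; it does not change the underlying cause of the empty response.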

🔍 Usage scenario

  • Target platform: (e.g. Xiaohongshu / Douyin / Weibo)
  • Feature used: (e.g. keyword search / user profile crawling)

💻 Environment

  • Operating system:
  • Python version: 3.10.11
  • Using an IP proxy: No
  • Using VPN software: No
  • Target platform (Douyin / Xiaohongshu / Weibo, etc.): Xiaohongshu

📋 Error log

```
(venv) E:\Github_Tool\MediaCrawler-main>python main.py --platform xhs --lt qrcode --type search
2026-02-10 22:45:06 MediaCrawler INFO (core.py:74) - [XiaoHongShuCrawler] Launching browser using CDP mode
2026-02-10 22:45:11 MediaCrawler INFO (cdp_browser.py:159) - [CDPBrowserManager] Detected browser: Microsoft Edge (Unknown Version)
2026-02-10 22:45:11 MediaCrawler INFO (cdp_browser.py:162) - [CDPBrowserManager] Browser path: C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe
2026-02-10 22:45:11 MediaCrawler INFO (cdp_browser.py:202) - [CDPBrowserManager] User data directory: E:\Github_Tool\MediaCrawler-main\browser_data\cdp_xhs_user_data_dir
2026-02-10 22:45:11 MediaCrawler INFO (browser_launcher.py:163) - [BrowserLauncher] Launching browser: C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe
2026-02-10 22:45:11 MediaCrawler INFO (browser_launcher.py:164) - [BrowserLauncher] Debug port: 9222
2026-02-10 22:45:11 MediaCrawler INFO (browser_launcher.py:165) - [BrowserLauncher] Headless mode: False
2026-02-10 22:45:11 MediaCrawler INFO (browser_launcher.py:195) - [BrowserLauncher] Waiting for browser to be ready on port 9222...
2026-02-10 22:45:12 MediaCrawler INFO (browser_launcher.py:204) - [BrowserLauncher] Browser is ready on port 9222
2026-02-10 22:45:13 MediaCrawler INFO (cdp_browser.py:176) - [CDPBrowserManager] CDP port 9222 is accessible
2026-02-10 22:45:13 MediaCrawler INFO (cdp_browser.py:87) - [CDPBrowserManager] SIGINT handler already exists, skipping registration to avoid override
2026-02-10 22:45:13 MediaCrawler INFO (cdp_browser.py:92) - [CDPBrowserManager] SIGTERM handler already exists, skipping registration to avoid override
2026-02-10 22:45:13 MediaCrawler INFO (cdp_browser.py:95) - [CDPBrowserManager] Cleanup handlers registered
2026-02-10 22:45:13 MediaCrawler INFO (cdp_browser.py:240) - [CDPBrowserManager] Got browser WebSocket URL: ws://localhost:9222/devtools/browser/727c1618-2986-406a-81d4-7b010a675be7
2026-02-10 22:45:13 MediaCrawler INFO (cdp_browser.py:259) - [CDPBrowserManager] Connecting to browser via CDP: ws://localhost:9222/devtools/browser/727c1618-2986-406a-81d4-7b010a675be7
2026-02-10 22:45:14 MediaCrawler INFO (cdp_browser.py:265) - [CDPBrowserManager] Successfully connected to browser
2026-02-10 22:45:14 MediaCrawler INFO (cdp_browser.py:266) - [CDPBrowserManager] Browser contexts count: 1
2026-02-10 22:45:14 MediaCrawler INFO (cdp_browser.py:291) - [CDPBrowserManager] Using existing browser context
2026-02-10 22:45:14 MediaCrawler INFO (core.py:435) - [XiaoHongShuCrawler] CDP browser info: {'version': '142.0.3595.53', 'contexts_count': 1, 'debug_port': 9222, 'is_connected': True}
2026-02-10 22:45:15 MediaCrawler INFO (core.py:358) - [XiaoHongShuCrawler.create_xhs_client] Begin create Xiaohongshu API client ...
2026-02-10 22:45:15 MediaCrawler INFO (client.py:234) - [XiaoHongShuClient.pong] Begin to check login state...
2026-02-10 22:45:15 MediaCrawler INFO (client.py:245) - [XiaoHongShuClient.pong] Login state result: False
2026-02-10 22:45:15 MediaCrawler INFO (login.py:89) - [XiaoHongShuLogin.begin] Begin login xiaohongshu ...
2026-02-10 22:45:15 MediaCrawler INFO (login.py:169) - [XiaoHongShuLogin.login_by_qrcode] Begin login xiaohongshu by qrcode ...
2026-02-10 22:45:15 MediaCrawler INFO (login.py:202) - [XiaoHongShuLogin.login_by_qrcode] waiting for scan code login, remaining time is 120s
2026-02-10 22:45:27 MediaCrawler INFO (login.py:66) - [XiaoHongShuLogin.check_login_state] Login status confirmed by UI element ('Me' button).
2026-02-10 22:45:27 MediaCrawler INFO (login.py:210) - [XiaoHongShuLogin.login_by_qrcode] Login successful then wait for 5 seconds redirect ...
2026-02-10 22:45:32 MediaCrawler INFO (core.py:127) - [XiaoHongShuCrawler.search] Begin search Xiaohongshu keywords
2026-02-10 22:45:32 MediaCrawler INFO (core.py:134) - [XiaoHongShuCrawler.search] Current search keyword: 十五运
2026-02-10 22:45:32 MediaCrawler INFO (core.py:144) - [XiaoHongShuCrawler.search] search Xiaohongshu keyword: 十五运, page: 1
2026-02-10 22:45:34 MediaCrawler INFO (cdp_browser.py:391) - [CDPBrowserManager] Browser connection disconnected
2026-02-10 22:45:34 MediaCrawler INFO (browser_launcher.py:255) - [BrowserLauncher] Closing browser process...
2026-02-10 22:45:34 MediaCrawler INFO (browser_launcher.py:287) - [BrowserLauncher] Browser process closed
Traceback (most recent call last):
  File "E:\Github_Tool\MediaCrawler-main\venv\lib\site-packages\tenacity\_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
  File "E:\Github_Tool\MediaCrawler-main\media_platform\xhs\client.py", line 143, in request
    data: Dict = response.json()
  File "E:\Github_Tool\MediaCrawler-main\venv\lib\site-packages\httpx\_models.py", line 832, in json
    return jsonlib.loads(self.content, **kwargs)
  File "E:\Github_Tool\MediaCrawler-main\python310\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "E:\Github_Tool\MediaCrawler-main\python310\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "E:\Github_Tool\MediaCrawler-main\python310\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\Github_Tool\MediaCrawler-main\main.py", line 157, in <module>
    run(main, async_cleanup, cleanup_timeout_seconds=15.0, on_first_interrupt=_force_stop)
  File "E:\Github_Tool\MediaCrawler-main\tools\app_runner.py", line 109, in run
    asyncio.run(_runner())
  File "E:\Github_Tool\MediaCrawler-main\python310\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "E:\Github_Tool\MediaCrawler-main\python310\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "E:\Github_Tool\MediaCrawler-main\tools\app_runner.py", line 96, in _runner
    await app_main()
  File "E:\Github_Tool\MediaCrawler-main\main.py", line 110, in main
    await crawler.start()
  File "E:\Github_Tool\MediaCrawler-main\media_platform\xhs\core.py", line 113, in start
    await self.search()
  File "E:\Github_Tool\MediaCrawler-main\media_platform\xhs\core.py", line 147, in search
    notes_res = await self.xhs_client.get_note_by_keyword(
  File "E:\Github_Tool\MediaCrawler-main\media_platform\xhs\client.py", line 291, in get_note_by_keyword
    return await self.post(uri, data)
  File "E:\Github_Tool\MediaCrawler-main\media_platform\xhs\client.py", line 183, in post
    return await self.request(
  File "E:\Github_Tool\MediaCrawler-main\venv\lib\site-packages\tenacity\_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
  File "E:\Github_Tool\MediaCrawler-main\venv\lib\site-packages\tenacity\_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
  File "E:\Github_Tool\MediaCrawler-main\venv\lib\site-packages\tenacity\__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x207c0e8fdc0 state=finished raised JSONDecodeError>]
```

📷 Error screenshot

<img width="1618" height="807" alt="Image" src="https://github.com/user-attachments/assets/7ee432b7-def3-44ce-af70-d52d7c1a0539" />

Metadata

Assignees: none

Labels: bug (Something isn't working), question (Further information is requested)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions