Summary of Changes (Gemini Code Assist)
This pull request adds an extensible search tool module to LazyLLM. It provides a standardized way to run web, academic, and platform-specific searches, so that multi-agent systems and other applications can draw on external information sources.
Code Review
This pull request introduces a unified search tools module, defining an extensible SearchBase class and providing multiple search engine implementations. However, there are security concerns regarding insecure communication (e.g., Arxiv using HTTP instead of HTTPS), insecure XML parsing, and potential leakage of sensitive API keys in logs due to exception handling. Additionally, the review suggests improving exception handling to prevent silent failures, adhering to Python's standard import practices, and simplifying code logic for better readability and correctness.
    except Exception:
        return []
This except block catches all exceptions silently, which can hide important errors and make debugging difficult. Please log the exception before returning. This will ensure that failures are visible in the logs.
Suggested change:
-    except Exception:
-        return []
+    except Exception as e:
+        import lazyllm
+        lazyllm.LOG.error(f"BochaSearch request failed: {e}")
+        return []
    except Exception:
        return []
This except block catches all exceptions silently, which can hide important errors and make debugging difficult. Please log the exception before returning. This will ensure that failures are visible in the logs, which is consistent with the forward method in the base class and the TencentSearch implementation.
Suggested change:
-    except Exception:
-        return []
+    except Exception as e:
+        import lazyllm
+        lazyllm.LOG.error(f"ArxivSearch request failed: {e}")
+        return []
    except Exception:
        return []
This except block catches all exceptions silently, which can hide important errors and make debugging difficult. Please log the exception before returning. This will ensure that failures are visible in the logs.
Suggested change:
-    except Exception:
-        return []
+    except Exception as e:
+        import lazyllm
+        lazyllm.LOG.error(f"BingSearch request failed: {e}")
+        return []
    except Exception:
        return []
This except block catches all exceptions silently, which can hide important errors and make debugging difficult. Please log the exception before returning. This will ensure that failures are visible in the logs.
Suggested change:
-    except Exception:
-        return []
+    except Exception as e:
+        import lazyllm
+        lazyllm.LOG.error(f"GoogleBooksSearch request failed: {e}")
+        return []
    except Exception:
        return []
This except block catches all exceptions silently, which can hide important errors and make debugging difficult. Please log the exception before returning. This will ensure that failures are visible in the logs.
Suggested change:
-    except Exception:
-        return []
+    except Exception as e:
+        import lazyllm
+        lazyllm.LOG.error(f"SemanticScholarSearch request failed: {e}")
+        return []
lazyllm/tools/tools/search/base.py
Outdated
        return self.search(query, **kwargs)
    except Exception as err:
        import lazyllm
        lazyllm.LOG.error('Search request failed: %s', err)
The SearchBase.forward method logs the entire exception object when a search request fails. For search engines that pass API keys in the URL query parameters (such as GoogleSearch, GoogleBooksSearch, and StackOverflowSearch), the exception message (e.g., from httpx) often includes the full URL, thereby leaking the API key into the application logs.
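One way to address this is to redact sensitive query parameters from the message before it reaches the logger. The following is a minimal sketch, not the fix adopted in the PR: the names `SENSITIVE_PARAMS`, `redact_url`, and `redact_message` are hypothetical, and the list of parameter names is an assumption that would need to match the engines actually in use.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: query-parameter names that may carry credentials.
SENSITIVE_PARAMS = {'key', 'api_key', 'apikey', 'access_token', 'token'}

# Matches http(s) URLs up to the next whitespace, quote, or angle bracket.
_URL_RE = re.compile(r'https?://[^\s\'"<>]+')


def redact_url(url: str) -> str:
    """Replace the values of sensitive query parameters with '***'."""
    parts = urlsplit(url)
    query = [(name, '***' if name.lower() in SENSITIVE_PARAMS else value)
             for name, value in parse_qsl(parts.query, keep_blank_values=True)]
    # safe='*' keeps the mask readable instead of percent-encoding it.
    return urlunsplit(parts._replace(query=urlencode(query, safe='*')))


def redact_message(message: str) -> str:
    """Redact every URL that appears inside a free-form error message."""
    return _URL_RE.sub(lambda match: redact_url(match.group(0)), message)


if __name__ == '__main__':
    err = "Client error for url 'https://example.com/search?key=SECRET&q=test'"
    print(redact_message(err))
```

With a helper like this, the logging call in `forward` could pass `redact_message(str(err))` instead of `err`. Logging only `type(err).__name__` is a stricter alternative, at the cost of less diagnostic detail.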
        text = resp.text
    except Exception:
        return []
    import xml.etree.ElementTree as ET

def __init__(self, timeout: int = 15, source_name: str = 'arxiv'):
    super().__init__(source_name=source_name)
    self._timeout = timeout
    self._url = 'http://export.arxiv.org/api/query'
The ArxivSearch tool uses an insecure HTTP URL for API requests. This exposes sensitive search queries and results to interception and tampering by attackers on the network (Man-in-the-Middle). Arxiv supports HTTPS, which should be used instead.
Suggested change:
-    self._url = 'http://export.arxiv.org/api/query'
+    self._url = 'https://export.arxiv.org/api/query'
    if snippet and len(snippet) > 500:
        snippet = snippet[:500] + '...'
The current logic for truncating the snippet, snippet[:500] + '...', can result in a string longer than 500 characters (up to 503). To ensure the snippet with the ellipsis does not exceed a certain length, consider adjusting the slice. For example, to keep it around 500 characters, you could use snippet[:497] + '...'.
Suggested change:
-    if snippet and len(snippet) > 500:
-        snippet = snippet[:500] + '...'
+    if snippet and len(snippet) > 500:
+        snippet = snippet[:497] + '...'
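Since several engines truncate snippets, the length arithmetic could live in one shared helper. This is a sketch only; `truncate` is a hypothetical name and is not part of the PR.

```python
from typing import Optional


def truncate(text: Optional[str], limit: int = 500, ellipsis: str = '...') -> str:
    """Return text cut so that the result, ellipsis included, never exceeds limit."""
    text = text or ''
    if len(text) <= limit:
        return text
    if limit <= len(ellipsis):
        # Not enough room for an ellipsis; fall back to a hard cut.
        return text[:limit]
    return text[:limit - len(ellipsis)] + ellipsis


if __name__ == '__main__':
    print(len(truncate('a' * 600)))
```

Deriving the slice length from `limit` and the ellipsis keeps the 500 and 497 from drifting apart if either value changes later.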
    snippet = it.get('body', '')[:500] if it.get('body') else ''
    if snippet and len(it.get('body', '')) > 500:
        snippet = snippet + '...'
The logic for creating the snippet is a bit complex and can be simplified for better readability and to avoid redundant checks. The if it.get('body') else '' is redundant since it.get('body', '') already provides a default empty string.
    body = it.get('body', '')
    snippet = body[:500]
    if len(body) > 500:
        snippet += '...'
I'll apply for an API key later and run tests to verify this.
Summary
This PR adds a unified search tools module to LazyLLM: an extensible search base class and multiple search engine implementations, making it easier to plug web search, academic search, and similar capabilities into multi-agent and other scenarios.
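Changes
1. Search base class and unified interface
- SearchBase (lazyllm/tools/tools/search/base.py) inherits from ModuleBase and defines the unified interface:
  - search(query, **kwargs) returns a normalized list of results.
  - forward(query, **kwargs) passes the call through to search; on an exception it logs the error and returns an empty list.
- Each result is a dictionary with title, url, snippet, and source (plus an optional extra), built with the _make_result() helper.
2. Supported search engine implementations
Class: main parameters
- GoogleSearch: custom_search_api_key, search_engine_id
- TencentSearch: secret_id, secret_key
- BingSearch: subscription_key
- BochaSearch: api_key, base_url
- StackOverflowSearch: api_key (optional)
- SemanticScholarSearch
- GoogleBooksSearch: api_key
- ArxivSearch
- WikipediaSearch
3. Modules and exports
- lazyllm/tools/tools/search/: one file per engine, all exported from search/__init__.py.
- lazyllm/tools/tools/__init__.py also exports the search classes above, so that from lazyllm.tools.tools import GoogleSearch, ArxivSearch, ... works.
4. Documentation and conventions
- lazyllm/docs/tools/search.py provides Chinese and English documentation and examples for all search classes and SearchBase (add_chinese_doc / add_english_doc / add_example), following the project's documentation conventions.
- The search-related content in lazyllm/docs/tools/tool_tools.py is adjusted to avoid duplication.
Usage Example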
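As a rough sketch of the interface described above, the following uses stand-in classes rather than the real lazyllm ones. `SketchSearchBase`, `InMemorySearch`, and `BrokenSearch` are hypothetical names, `logging` stands in for `lazyllm.LOG`, and the real `_make_result` signature may differ.

```python
import logging
from typing import Any, Dict, List, Optional

LOG = logging.getLogger('search_sketch')  # stand-in for lazyllm.LOG


class SketchSearchBase:
    """Stand-in mirroring the SearchBase contract described in this PR."""

    def __init__(self, source_name: str = '') -> None:
        self._source_name = source_name

    def _make_result(self, title: str, url: str, snippet: str,
                     extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        # Normalized result: title, url, snippet, source, plus optional extra.
        result: Dict[str, Any] = {'title': title, 'url': url,
                                  'snippet': snippet, 'source': self._source_name}
        if extra:
            result['extra'] = extra
        return result

    def search(self, query: str, **kwargs: Any) -> List[Dict[str, Any]]:
        raise NotImplementedError

    def forward(self, query: str, **kwargs: Any) -> List[Dict[str, Any]]:
        # Pass through to search; on failure, log and return an empty list.
        try:
            return self.search(query, **kwargs)
        except Exception as err:
            LOG.error('Search request failed: %s', type(err).__name__)
            return []


class InMemorySearch(SketchSearchBase):
    """Toy engine that matches the query against a dict of url -> text."""

    def __init__(self, pages: Dict[str, str], source_name: str = 'memory') -> None:
        super().__init__(source_name=source_name)
        self._pages = pages

    def search(self, query: str, **kwargs: Any) -> List[Dict[str, Any]]:
        needle = query.lower()
        return [self._make_result(title=url.rsplit('/', 1)[-1], url=url, snippet=text)
                for url, text in self._pages.items() if needle in text.lower()]


class BrokenSearch(SketchSearchBase):
    """Engine whose search always fails, to show the forward fallback."""

    def search(self, query: str, **kwargs: Any) -> List[Dict[str, Any]]:
        raise RuntimeError('network down')


engine = InMemorySearch({
    'https://example.com/lazyllm': 'LazyLLM is a toolkit for multi-agent applications.',
    'https://example.com/other': 'Unrelated page.',
})
results = engine.forward('multi-agent')
for item in results:
    print(item['title'], item['url'], item['source'])
```

The real engines would be constructed with the parameters listed above (for example, API keys) and queried through the same search and forward calls.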
Checklist
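- SearchBase and multiple search implementations
- Chinese and English documentation and examples added in lazyllm/docs/tools/search.py
- make lint-only-diff (requires flake8-quotes and flake8-bugbear to be installed)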