add search engine #1054

Merged
wzh1994 merged 5 commits into LazyAGI:main from wzh1994:wzh/serach
Mar 10, 2026

@wzh1994 wzh1994 commented Mar 10, 2026

Summary

This PR adds a unified search tools module to LazyLLM: an extensible search base class plus multiple search engine implementations, so that web search, academic search, and similar capabilities can be plugged into multi-agent and other scenarios.


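Changes

1. Search base class and unified interface

  • SearchBase (lazyllm/tools/tools/search/base.py): inherits from ModuleBase and defines the unified interface:
    • Subclasses implement search(query, **kwargs) and return a normalized list of results.
    • forward(query, **kwargs) passes the call through to search; on an exception it logs the error and returns an empty list.
    • Unified result format: each item is a dict containing title, url, snippet, and source (plus an optional extra), built with the _make_result() helper.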
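
The contract described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: SearchSketchBase, make_result, and StaticSearch are hypothetical names, the standard logging module stands in for lazyllm.LOG, and the real SearchBase inherits from ModuleBase rather than ABC.

```python
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Sequence, Tuple

LOG = logging.getLogger('search_sketch')  # stand-in for lazyllm.LOG (assumption)


def make_result(title: str, url: str, snippet: str, source: str,
                extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # Build one normalized result item; 'extra' is added only when provided.
    result: Dict[str, Any] = {'title': title, 'url': url,
                              'snippet': snippet, 'source': source}
    if extra:
        result['extra'] = extra
    return result


class SearchSketchBase(ABC):
    # Hypothetical stand-in for SearchBase.
    def __init__(self, source_name: str = ''):
        self._source_name = source_name

    @property
    def source_name(self) -> str:
        return self._source_name

    @abstractmethod
    def search(self, query: str, **kwargs) -> List[Dict[str, Any]]:
        # Subclasses return a list of normalized result dicts.
        ...

    def forward(self, query: str, **kwargs) -> List[Dict[str, Any]]:
        # Pass through to search(); log and return [] on any failure.
        try:
            return self.search(query, **kwargs)
        except Exception as err:
            LOG.error('Search request failed: %s', err)
            return []

    __call__ = forward  # lets engine('query') behave like engine.forward('query')


class StaticSearch(SearchSketchBase):
    # Toy engine that searches an in-memory list instead of calling an API.
    def __init__(self, docs: Sequence[Tuple[str, str, str]]):
        super().__init__(source_name='static')
        self._docs = list(docs)

    def search(self, query: str, **kwargs) -> List[Dict[str, Any]]:
        if not query:
            raise ValueError('empty query')
        return [make_result(title, url, snippet, self.source_name)
                for title, url, snippet in self._docs
                if query.lower() in title.lower()]
```

Calling StaticSearch([...])('attention') goes through forward, so a failing search (here, an empty query) yields an empty list and a log line instead of an exception.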

2. Supported search engine implementations

| Class | Description | Main parameters |
| --- | --- | --- |
| GoogleSearch | Google Custom Search | custom_search_api_key, search_engine_id |
| TencentSearch | Tencent Cloud SearchPro | secret_id, secret_key |
| BingSearch | Bing Web Search API | subscription_key |
| BochaSearch | Bocha AI web search | api_key, base_url |
| StackOverflowSearch | Stack Overflow API | api_key (optional) |
| SemanticScholarSearch | Semantic Scholar | no key required |
| GoogleBooksSearch | Google Books API | api_key |
| ArxivSearch | arXiv API | no key required |
| WikipediaSearch | Wikipedia API | no key required |

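3. Module layout and exports

  • Added the directory lazyllm/tools/tools/search/, with one file per engine, all exported from search/__init__.py.
  • lazyllm/tools/tools/__init__.py now exports the search classes above, so that from lazyllm.tools.tools import GoogleSearch, ArxivSearch, ... works.

4. Documentation and conventions

  • lazyllm/docs/tools/search.py provides Chinese and English documentation and examples for all search classes and SearchBase (add_chinese_doc / add_english_doc / add_example), following the project's documentation conventions.
  • Removed or migrated the search-related content from lazyllm/docs/tools/tool_tools.py to avoid duplication.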

Usage Example

from lazyllm.tools.tools import ArxivSearch

engine = ArxivSearch()
results = engine('transformer')
# or: engine.search('transformer')
# each item: {'title', 'url', 'snippet', 'source'}

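Checklist

  • Added SearchBase and multiple search implementations
  • Unified result format and exception handling
  • Added Chinese and English documentation and examples in lazyllm/docs/tools/search.py
  • Passes make lint-only-diff (requires flake8-quotes and flake8-bugbear)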

@wzh1994 wzh1994 requested a review from a team as a code owner March 10, 2026 07:29
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances LazyLLM's capabilities by integrating a robust and extensible search tool module. It provides a standardized way to perform various types of searches, from web and academic queries to specialized platforms, making it easier for multi-agent systems and other applications to leverage external information sources efficiently.

Highlights

  • Unified Search Tool Module: Introduced a new, extensible search tools module with a SearchBase class that defines a unified interface and result format for all search operations.
  • Multiple Search Engine Implementations: Added support for various search engines including Google Custom Search, Tencent SearchPro, Bing Web Search, Bocha AI Web Search, Stack Overflow, Semantic Scholar, Google Books, arXiv, and Wikipedia.
  • Modular Organization and Export: Organized search engines into individual files within a new lazyllm/tools/tools/search/ directory and updated the __init__.py files for unified export and easy access.
  • Comprehensive Documentation: Provided detailed Chinese and English documentation with usage examples for SearchBase and all new search engine classes in lazyllm/docs/tools/search.py.
  • Refactored Existing Search Tools: Migrated and refactored existing GoogleSearch and TencentSearch implementations to conform to the new SearchBase interface and removed their old documentation.

Changelog

  • lazyllm/docs/tools/__init__.py
    • Added import for the new 'search' documentation module.
    • Included 'search' in the list of modules to be deleted from the namespace after import.
  • lazyllm/docs/tools/search.py
    • Created a new documentation file for the search tools.
    • Added comprehensive Chinese and English documentation for SearchBase and its methods.
    • Included detailed documentation and usage examples for all new search engine implementations (Google, Tencent, Bing, Bocha, StackOverflow, SemanticScholar, GoogleBooks, Arxiv, Wikipedia).
  • lazyllm/docs/tools/tool_tools.py
    • Removed previous documentation and examples for GoogleSearch and TencentSearch to avoid duplication and align with the new search module.
  • lazyllm/tools/tools/__init__.py
    • Updated imports to include SearchBase and all newly added search engine classes from the search submodule.
    • Modified the __all__ list to export SearchBase and all new search engine classes for broader accessibility.
  • lazyllm/tools/tools/search/__init__.py
    • Added imports for SearchBase and all new search engine classes.
    • Updated the __all__ list to export all search-related classes.
  • lazyllm/tools/tools/search/arxiv_search.py
    • Added a new file implementing the ArxivSearch class, inheriting from SearchBase.
    • Implemented the search method to query the arXiv API and parse results into a unified format.
  • lazyllm/tools/tools/search/base.py
    • Added a new file defining the SearchBase abstract class, which inherits from ModuleBase.
    • Introduced _make_result helper function to standardize search result formatting.
    • Defined the source_name property, an abstract search method, and a forward method with error handling.
  • lazyllm/tools/tools/search/bing_search.py
    • Added a new file implementing the BingSearch class, inheriting from SearchBase.
    • Implemented the search method to interact with the Bing Web Search API and format results.
  • lazyllm/tools/tools/search/bocha_search.py
    • Added a new file implementing the BochaSearch class, inheriting from SearchBase.
    • Implemented the search method to query the Bocha AI Web Search API and standardize its output.
  • lazyllm/tools/tools/search/google_books_search.py
    • Added a new file implementing the GoogleBooksSearch class, inheriting from SearchBase.
    • Implemented the search method to query the Google Books API and format results.
  • lazyllm/tools/tools/search/google_search.py
    • Refactored GoogleSearch to inherit from SearchBase and internally use HttpTool.
    • Adapted the search method to call the internal HttpTool and format its raw output using _make_result.
  • lazyllm/tools/tools/search/semantic_scholar_search.py
    • Added a new file implementing the SemanticScholarSearch class, inheriting from SearchBase.
    • Implemented the search method to query the Semantic Scholar API and format academic paper results.
  • lazyllm/tools/tools/search/stackoverflow_search.py
    • Added a new file implementing the StackOverflowSearch class, inheriting from SearchBase.
    • Implemented the search method to interact with the Stack Exchange API and format question results.
  • lazyllm/tools/tools/search/tencent_search.py
    • Refactored TencentSearch to inherit from SearchBase.
    • Adapted the search method to call the Tencent Cloud SearchPro API and format its results using _make_result.
  • lazyllm/tools/tools/search/wikipedia_search.py
    • Added a new file implementing the WikipediaSearch class, inheriting from SearchBase.
    • Implemented the search method to query the MediaWiki API for Wikipedia content and format results.

Activity

  • New SearchBase and various search implementations were added.
  • Unified result format and exception handling were implemented.
  • Chinese and English documentation with examples were added to lazyllm/docs/tools/search.py.
  • The changes passed make lint-only-diff.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a unified search tools module, defining an extensible SearchBase class and providing multiple search engine implementations. However, there are security concerns regarding insecure communication (e.g., Arxiv using HTTP instead of HTTPS), insecure XML parsing, and potential leakage of sensitive API keys in logs due to exception handling. Additionally, the review suggests improving exception handling to prevent silent failures, adhering to Python's standard import practices, and simplifying code logic for better readability and correctness.

Comment on lines +29 to +30

    except Exception:
        return []

high

This except block catches all exceptions silently, which can hide important errors and make debugging difficult. Please log the exception before returning so that failures are visible in the logs.

Suggested change

-    except Exception:
-        return []
+    except Exception as e:
+        import lazyllm
+        lazyllm.LOG.error(f"BochaSearch request failed: {e}")
+        return []

Comment on lines +28 to +29

    except Exception:
        return []

high

This except block catches all exceptions silently, which can hide important errors and make debugging difficult. Please log the exception before returning so that failures are visible in the logs, consistent with the forward method in the base class and with the TencentSearch implementation.

Suggested change

-    except Exception:
-        return []
+    except Exception as e:
+        import lazyllm
+        lazyllm.LOG.error(f"ArxivSearch request failed: {e}")
+        return []

Comment on lines +29 to +30

    except Exception:
        return []

high

This except block catches all exceptions silently, which can hide important errors and make debugging difficult. Please log the exception before returning so that failures are visible in the logs.

Suggested change

-    except Exception:
-        return []
+    except Exception as e:
+        import lazyllm
+        lazyllm.LOG.error(f"BingSearch request failed: {e}")
+        return []

Comment on lines +25 to +26

    except Exception:
        return []

high

This except block catches all exceptions silently, which can hide important errors and make debugging difficult. Please log the exception before returning so that failures are visible in the logs.

Suggested change

-    except Exception:
-        return []
+    except Exception as e:
+        import lazyllm
+        lazyllm.LOG.error(f"GoogleBooksSearch request failed: {e}")
+        return []

Comment on lines +32 to +33

    except Exception:
        return []

high

This except block catches all exceptions silently, which can hide important errors and make debugging difficult. Please log the exception before returning so that failures are visible in the logs.

Suggested change

-    except Exception:
-        return []
+    except Exception as e:
+        import lazyllm
+        lazyllm.LOG.error(f"SemanticScholarSearch request failed: {e}")
+        return []

        return self.search(query, **kwargs)
    except Exception as err:
        import lazyllm
        lazyllm.LOG.error('Search request failed: %s', err)

security-medium medium

The SearchBase.forward method logs the entire exception object when a search request fails. For search engines that pass API keys in the URL query parameters (such as GoogleSearch, GoogleBooksSearch, and StackOverflowSearch), the exception message (e.g., from httpx) often includes the full URL, thereby leaking the API key into the application logs.
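
One way to address the concern above is to mask secret-looking query parameters before the message reaches the logger. The sketch below is only an illustration: redact_secrets, log_search_failure, and the list of parameter names are assumptions, not part of this PR or of LazyLLM.

```python
import logging
import re

# Query-parameter names treated as secrets. This list is an assumption and
# would need to match the parameters the engines actually send.
_SECRET_PARAMS = ('key', 'api_key', 'apikey', 'access_token', 'subscription_key')

# Matches "?name=value" or "&name=value" for the names above and captures
# the "?name=" prefix so that only the value is replaced.
_SECRET_RE = re.compile(
    r"([?&](?:" + "|".join(re.escape(p) for p in _SECRET_PARAMS) + r")=)[^&\s'\"]+",
    re.IGNORECASE,
)


def redact_secrets(message: str) -> str:
    # Replace the value of every secret-looking query parameter with '***'.
    return _SECRET_RE.sub(r'\1***', message)


def log_search_failure(logger: logging.Logger, err: Exception) -> None:
    # Log the failure with secrets masked instead of the raw exception text.
    logger.error('Search request failed: %s', redact_secrets(str(err)))
```

Masking at the logging call keeps the rest of the message (status code, host, path) available for debugging. Sending keys in request headers where an API supports it would avoid the problem at the source.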

        text = resp.text
    except Exception:
        return []
    import xml.etree.ElementTree as ET

medium

Imports should be at the top of the file, per PEP 8, for readability and so that the import statement is not re-executed on every call. Please move import xml.etree.ElementTree as ET to the top of the file.
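
As a rough illustration of the layout the reviewer asks for, the sketch below keeps the import at module level and converts an Atom feed into the unified result format. parse_atom_entries and the sample feed are hypothetical. Using the entry's id element as the URL is an assumption about the feed layout, not something confirmed by this PR.

```python
import xml.etree.ElementTree as ET  # module-level import, as PEP 8 recommends
from typing import Dict, List

_ATOM = {'atom': 'http://www.w3.org/2005/Atom'}  # Atom XML namespace


def parse_atom_entries(text: str, source: str = 'arxiv') -> List[Dict[str, str]]:
    # Convert each <entry> of an Atom feed into a unified result dict.
    root = ET.fromstring(text)
    results = []
    for entry in root.findall('atom:entry', _ATOM):
        title = entry.findtext('atom:title', default='', namespaces=_ATOM)
        url = entry.findtext('atom:id', default='', namespaces=_ATOM)
        summary = entry.findtext('atom:summary', default='', namespaces=_ATOM)
        results.append({
            'title': ' '.join(title.split()),    # collapse line breaks and runs of spaces
            'url': url.strip(),
            'snippet': ' '.join(summary.split()),
            'source': source,
        })
    return results


SAMPLE_FEED = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>https://example.org/abs/0000.00000</id>
    <title>  An Example
      Title </title>
    <summary>Example abstract.</summary>
  </entry>
</feed>'''
```

The standard library's xml.etree.ElementTree documentation warns that it is not hardened against maliciously constructed input. Where the feed cannot be trusted, the third-party defusedxml package is a commonly used alternative.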

    def __init__(self, timeout: int = 15, source_name: str = 'arxiv'):
        super().__init__(source_name=source_name)
        self._timeout = timeout
        self._url = 'http://export.arxiv.org/api/query'

security-medium medium

The ArxivSearch tool uses an insecure HTTP URL for API requests. This exposes search queries and results to interception and tampering by attackers on the network (man-in-the-middle). arXiv supports HTTPS, which should be used instead.

Suggested change

-        self._url = 'http://export.arxiv.org/api/query'
+        self._url = 'https://export.arxiv.org/api/query'

Comment on lines +34 to +35

    if snippet and len(snippet) > 500:
        snippet = snippet[:500] + '...'

medium

The current truncation logic, snippet[:500] + '...', can produce a string longer than 500 characters (up to 503). To ensure the snippet including the ellipsis does not exceed the limit, adjust the slice. For example, to cap it at 500 characters, use snippet[:497] + '...'.

Suggested change

-    if snippet and len(snippet) > 500:
-        snippet = snippet[:500] + '...'
+    if snippet and len(snippet) > 500:
+        snippet = snippet[:497] + '...'

Comment on lines +40 to +42

    snippet = it.get('body', '')[:500] if it.get('body') else ''
    if snippet and len(it.get('body', '')) > 500:
        snippet = snippet + '...'

medium

The logic for creating the snippet is a bit complex and can be simplified for better readability and to avoid redundant checks. The if it.get('body') else '' is redundant since it.get('body', '') already provides a default empty string.

            body = it.get('body', '')
            snippet = body[:500]
            if len(body) > 500:
                snippet += '...'
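
Both truncation comments could be handled by one shared helper. The following is a sketch under assumptions: truncate_snippet is a hypothetical name, and the 500-character limit is taken from the snippets quoted above. It also treats None as an empty string, a case that it.get('body', '') alone does not cover when the key is present with a null value.

```python
from typing import Optional


def truncate_snippet(text: Optional[str], limit: int = 500,
                     ellipsis: str = '...') -> str:
    # Return text capped at `limit` characters, ellipsis included.
    text = text or ''                      # treat None like an empty string
    if len(text) <= limit:
        return text
    if limit <= len(ellipsis):
        return text[:limit]                # no room for an ellipsis
    return text[:limit - len(ellipsis)] + ellipsis
```

With such a helper, each engine could build its snippet with a single call such as truncate_snippet(it.get('body')), and the length cap would be enforced in one place.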

@wzh1994 wzh1994 requested a review from a team as a code owner March 10, 2026 07:54
wzh1994 (Contributor, Author) commented Mar 10, 2026

I'll apply for the API keys later and run tests to verify.

@wzh1994 wzh1994 merged commit c2cc491 into LazyAGI:main Mar 10, 2026
4 of 5 checks passed