Add fetch tool for web content retrieval #65

Merged: Furisto merged 4 commits into main from webfetch on Dec 22, 2025

Furisto commented Dec 22, 2025

This PR introduces a new fetch tool that enables the agent to retrieve and process web content from HTTP(S) URLs.

Features

• HTML page fetching: Retrieves HTML pages, extracts the main content with a readability algorithm, and converts it to clean Markdown. Removes navigation, ads, scripts, and other non-essential elements.
• JSON API support: Automatically detects JSON responses and pretty-prints them for readability.
• Custom headers: Supports custom HTTP headers for authentication or other requirements.
• Configurable timeout: Allows setting request timeout (defaults to 30 seconds).
• Content truncation: Truncates large responses (>5 MB) and reports the truncation in the result.

API

Input:

• url (required): HTTP or HTTPS URL to fetch
• headers (optional): Custom HTTP headers
• timeout (optional): Request timeout in seconds

Output:

• url: The fetched URL
• title: Page title (extracted from HTML, or derived from the URL for JSON)
• content: Processed content (Markdown for HTML, pretty-printed for JSON)
• content_type: Response type ("html" or "json")
• byte_size: Original response size in bytes
• truncated: Whether content was truncated due to size limits
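
To illustrate how the JSON branch of the output might be produced, here is a rough sketch using only the Go standard library. The `FetchResult` struct is a hypothetical mirror of the fields above, and `isJSON` and `prettyJSON` are illustrative helpers. How the PR actually detects JSON and derives the title is not shown in the description, so the title is left empty here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"mime"
	"strings"
)

// FetchResult is a hypothetical mirror of the output fields listed above.
type FetchResult struct {
	URL         string
	Title       string
	Content     string
	ContentType string // "html" or "json"
	ByteSize    int64
	Truncated   bool
}

// isJSON reports whether a Content-Type header value names a JSON media type.
// Checking the header is an assumption; the PR may detect JSON differently.
func isJSON(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	return mediaType == "application/json" || strings.HasSuffix(mediaType, "+json")
}

// prettyJSON re-indents a JSON document with two spaces per level.
func prettyJSON(body []byte) (string, error) {
	var buf bytes.Buffer
	if err := json.Indent(&buf, body, "", "  "); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	body := []byte(`{"name":"example","tags":["a","b"]}`)
	header := "application/json; charset=utf-8"

	res := FetchResult{URL: "https://api.example.com/info", ByteSize: int64(len(body))}
	if isJSON(header) {
		pretty, err := prettyJSON(body)
		if err != nil {
			panic(err)
		}
		res.Content = pretty
		res.ContentType = "json"
	}
	fmt.Println(res.ContentType)
	fmt.Println(res.Content)
}
```

`json.Indent` rewrites whitespace without decoding into Go values, so key order and number formatting in the response are preserved.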

Changes

• Add web.Fetch function with HTML extraction and Markdown conversion
• Add proto definitions for FetchInput and FetchResult messages
• Integrate fetch tool into codeact runtime
• Add comprehensive error handling with helpful suggestions
• Add test coverage for HTML, JSON, error cases, and edge cases

Furisto and others added 4 commits December 22, 2025 21:31
Implement a new fetch tool that retrieves HTML content via HTTP(S), extracts
the main content, and converts it to Markdown format. Removes boilerplate,
navigation, ads, and other non-essential elements for clean, readable output.

- Add web.Fetch function with HTML extraction and Markdown conversion
- Implement FetchInput/FetchResult types for consistent API
- Add fetch tool integration to codeact runtime
- Include comprehensive error handling with helpful suggestions
- Support custom headers and configurable timeouts
- Add tests covering successful fetches, error cases, and header handling
- Update dependencies: html-to-markdown, go-readability for content extraction

Co-authored-by: construct-agent <noreply@construct.sh>
Add FetchInput and FetchResult message types to the ToolCall and ToolResult
oneof unions. These messages define the protocol buffer schema for the fetch
tool, supporting URL fetching with custom headers and timeout configuration,
plus structured result handling with content extraction and size information.

Co-authored-by: construct-agent <noreply@construct.sh>
Extend the fetch tool to handle JSON API responses in addition to HTML pages.
JSON responses are automatically pretty-printed for readability. A content_type
field is added to FetchResult to indicate whether the response was "html" or
"json".

Co-authored-by: construct-agent <noreply@construct.sh>
@Furisto Furisto marked this pull request as ready for review December 22, 2025 22:12
@Furisto Furisto merged commit 903a156 into main Dec 22, 2025
1 check passed