Add fetch tool for web content retrieval by Furisto · Pull Request #65 · Furisto/construct

Furisto · 2025-12-22T22:05:11Z

This PR introduces a new fetch tool that enables the agent to retrieve and process web content from HTTP(S) URLs.

Features

• HTML page fetching: Retrieves HTML pages, extracts the main content using a readability algorithm, and converts it to clean Markdown. Removes navigation, ads, scripts, and other non-essential elements.
• JSON API support: Automatically detects JSON responses and pretty-prints them for readability.
• Custom headers: Supports custom HTTP headers for authentication or other requirements.
• Configurable timeout: Allows setting request timeout (defaults to 30 seconds).
• Content truncation: Truncates responses larger than 5MB instead of failing, and reports this through the truncated output field.

API

Input:

• url (required): HTTP or HTTPS URL to fetch
• headers (optional): Custom HTTP headers
• timeout (optional): Request timeout in seconds

Output:

• url: The fetched URL
• title: Page title (extracted from HTML or derived from URL for JSON)
• content: Processed content (Markdown for HTML, pretty-printed for JSON)
• content_type: Response type ("html" or "json")
• byte_size: Original response size in bytes
• truncated: Whether content was truncated due to size limits
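The output fields map naturally onto a Go struct. This sketch is hypothetical: the real result type is generated from the proto `FetchResult` message, and the PR does not say exactly how a title is derived from the URL for JSON responses, so `titleFromURL` (last path segment, falling back to the host) is only one plausible rule.

```go
package main

import (
	"net/url"
	"path"
)

// fetchResult is a hypothetical mirror of the documented output fields.
type fetchResult struct {
	URL         string `json:"url"`
	Title       string `json:"title"`
	Content     string `json:"content"`
	ContentType string `json:"content_type"` // "html" or "json"
	ByteSize    int64  `json:"byte_size"`
	Truncated   bool   `json:"truncated"`
}

// titleFromURL is an assumed rule for titling JSON responses: use the last
// path segment, or the host when the path is empty or just "/".
func titleFromURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return raw
	}
	if base := path.Base(u.Path); base != "/" && base != "." {
		return base
	}
	return u.Host
}
```

For example, `https://api.example.com/v1/users` would be titled `users`, while `https://api.example.com/` falls back to `api.example.com`.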

Changes

• Add web.Fetch function with HTML extraction and Markdown conversion
• Add proto definitions for FetchInput and FetchResult messages
• Integrate fetch tool into codeact runtime
• Comprehensive error handling with helpful suggestions
• Test coverage for HTML, JSON, error cases, and edge cases

Implement a new fetch tool that retrieves HTML content via HTTP(S), extracts the main content, and converts it to Markdown format. Removes boilerplate, navigation, ads, and other non-essential elements for clean, readable output.

- Add web.Fetch function with HTML extraction and Markdown conversion
- Implement FetchInput/FetchResult types for consistent API
- Add fetch tool integration to codeact runtime
- Include comprehensive error handling with helpful suggestions
- Support custom headers and configurable timeouts
- Add tests covering successful fetches, error cases, and header handling
- Update dependencies: html-to-markdown, go-readability for content extraction

Co-authored-by: construct-agent <noreply@construct.sh>

Add FetchInput and FetchResult message types to the ToolCall and ToolResult oneof unions. These messages define the protocol buffer schema for the fetch tool, supporting URL fetching with custom headers and timeout configuration, plus structured result handling with content extraction and size information.

Co-authored-by: construct-agent <noreply@construct.sh>

Extend the fetch tool to handle JSON API responses in addition to HTML pages. JSON responses are automatically pretty-printed for readability. A content_type field is added to FetchResult to indicate whether the response was "html" or "json".

Co-authored-by: construct-agent <noreply@construct.sh>
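Detecting and pretty-printing JSON needs only the standard library. In this sketch, `classifyContent` and `prettyJSON` are hypothetical names, and trusting the `Content-Type` header first, then falling back to sniffing the body, is an assumed heuristic rather than necessarily the rule the PR implements.

```go
package main

import (
	"bytes"
	"encoding/json"
	"mime"
	"strings"
)

// classifyContent (hypothetical) returns "json" when the Content-Type header
// names a JSON media type (application/json or a +json suffix), or when the
// header is missing/unhelpful but the body is a valid JSON object or array.
// Everything else is treated as "html".
func classifyContent(contentTypeHeader string, body []byte) string {
	mediaType, _, err := mime.ParseMediaType(contentTypeHeader)
	if err == nil && (mediaType == "application/json" || strings.HasSuffix(mediaType, "+json")) {
		return "json"
	}
	trimmed := bytes.TrimSpace(body)
	if len(trimmed) > 0 && (trimmed[0] == '{' || trimmed[0] == '[') && json.Valid(trimmed) {
		return "json"
	}
	return "html"
}

// prettyJSON re-indents a JSON body with two spaces per nesting level.
func prettyJSON(body []byte) (string, error) {
	var buf bytes.Buffer
	if err := json.Indent(&buf, body, "", "  "); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```

Sniffing only bodies that start with `{` or `[` avoids classifying a bare scalar such as `123` as a JSON API response.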

Furisto and others added 4 commits December 22, 2025 21:31

Update tool description (d4360d9)

Furisto marked this pull request as ready for review December 22, 2025 22:12

Furisto merged commit 903a156 into main Dec 22, 2025
1 check passed

Furisto merged 4 commits into main from webfetch