Implement a new fetch tool that retrieves HTML content via HTTP(S), extracts the main content, and converts it to Markdown format. Removes boilerplate, navigation, ads, and other non-essential elements for clean, readable output.
• Add web.Fetch function with HTML extraction and Markdown conversion
• Implement FetchInput/FetchResult types for consistent API
• Add fetch tool integration to codeact runtime
• Include comprehensive error handling with helpful suggestions
• Support custom headers and configurable timeouts
• Add tests covering successful fetches, error cases, and header handling
• Update dependencies: html-to-markdown, go-readability for content extraction
Co-authored-by: construct-agent <noreply@construct.sh>
Add FetchInput and FetchResult message types to the ToolCall and ToolResult oneof unions. These messages define the protocol buffer schema for the fetch tool, supporting URL fetching with custom headers and timeout configuration, plus structured result handling with content extraction and size information.
Co-authored-by: construct-agent <noreply@construct.sh>
Extend the fetch tool to handle JSON API responses in addition to HTML pages. JSON responses are automatically pretty-printed for readability. A content_type field is added to FetchResult to indicate whether the response was "html" or "json".
Co-authored-by: construct-agent <noreply@construct.sh>
This PR introduces a new fetch tool that enables the agent to retrieve and process web content from HTTP(S) URLs.
Features
• HTML page fetching: Retrieves HTML pages, extracts main content using readability algorithms, and converts to clean Markdown format. Removes navigation, ads, scripts, and other non-essential elements.
• JSON API support: Automatically detects JSON responses and pretty-prints them for readability.
• Custom headers: Supports custom HTTP headers for authentication or other requirements.
• Configurable timeout: Allows setting request timeout (defaults to 30 seconds).
• Content truncation: Handles large responses (>5MB) gracefully with truncation.
API
Input:
• url (required): HTTP or HTTPS URL to fetch
• headers (optional): Custom HTTP headers
• timeout (optional): Request timeout in seconds
Output:
• url: The fetched URL
• title: Page title (extracted from HTML or derived from URL for JSON)
• content: Processed content (Markdown for HTML, pretty-printed for JSON)
• content_type: Response type ("html" or "json")
• byte_size: Original response size in bytes
• truncated: Whether content was truncated due to size limits
Changes
• Add web.Fetch function with HTML extraction and Markdown conversion
• Add proto definitions for FetchInput and FetchResult messages
• Integrate fetch tool into codeact runtime
• Add comprehensive error handling with helpful suggestions
• Add test coverage for HTML, JSON, error cases, and edge cases