@timothycarambat timothycarambat commented Oct 7, 2025

Pull Request Type

  • ✨ feat
  • 🐛 fix
  • ♻️ refactor
  • 💄 style
  • 🔨 chore
  • 📝 docs

Relevant Issues

--

What is in this change?

  • Allows the collector to reasonably handle URLs without protocols

This happens a lot when an LLM creates the web-scraping request and produces an implied URL like anythingllm.com.

Currently, the collector fails to reach this URL because it has no protocol: it fails the validURL function check, since new URL('anythingllm.com') throws as an invalid URL.

Instead, we now handle implied protocols by remapping this URL to use https://, while still preserving the protocol if one is set.
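For illustration, the remapping described above could look roughly like the sketch below. This is not the code from this PR: `withImpliedProtocol` and `isValidURL` are hypothetical names, and the scheme-detection regex is an assumption about what counts as "protocol already set".

```javascript
// Hypothetical sketch, not the collector's actual implementation.

// Returns true when the WHATWG URL parser accepts the input as-is.
function isValidURL(input) {
  try {
    new URL(input);
    return true;
  } catch {
    return false;
  }
}

// Prepends a default protocol only when the input has no "scheme://" prefix.
// Assumption: anything matching "<scheme>://" counts as an explicit protocol.
function withImpliedProtocol(input, defaultProtocol = "https://") {
  const trimmed = String(input).trim();
  const hasProtocol = /^[a-zA-Z][a-zA-Z\d+.-]*:\/\//.test(trimmed);
  return hasProtocol ? trimmed : `${defaultProtocol}${trimmed}`;
}

// A bare domain is rejected by the URL constructor, but parses once remapped.
const bare = "anythingllm.com";
const remapped = withImpliedProtocol(bare);
console.log(isValidURL(bare), remapped, isValidURL(remapped));

// An explicit protocol is left untouched.
console.log(withImpliedProtocol("http://example.com/docs"));
```

Defaulting to https:// rather than http:// is a judgment call; a site that only serves plain HTTP would still need its protocol spelled out.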

Additional Information

  • Updated the feedback loop for web-scraping so the user knows when the LLM is processing the site text and how large that payload might be.

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

@timothycarambat timothycarambat merged commit cf3fbcb into master Oct 7, 2025
2 checks passed
@timothycarambat timothycarambat deleted the improve-url-handler-collector branch October 7, 2025 18:03