Description
We currently have a site, set up in the scraper config, that is hosted on a non-standard HTTP/HTTPS port (3000). When setting start_urls to a hostname with a port, e.g. http://my-host:3000/, the scraper fails with an error message suggesting it does not accept domains with ports. It looks like the old Algolia scraper configs used to support ports, so I assume this is related to an update to the Scrapy package used in this forked solution.
Steps to reproduce
- Build and run a Docusaurus site locally, serving on http://localhost:3000
- Update the DocSearch config to set start_urls: "start_urls":["http://localhost:3000/"]
- Run the docsearch scraper
Expected Behavior
- Site is scraped and uploaded to Typesense server
Actual Behavior
Error returned from scraper:
PortWarning: allowed_domains accepts only domains without ports. Ignoring entry localhost:3000 in allowed_domains.
warnings.warn(message, PortWarning)
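For context, the warning suggests that the entry passed to allowed_domains still carries the port (localhost:3000), whereas Scrapy expects a bare hostname there. A minimal sketch of the difference, assuming the scraper derives allowed_domains from start_urls (I have not confirmed this in the scraper source, and the helper name allowed_domains_from_start_urls is hypothetical):

```python
from urllib.parse import urlparse


def allowed_domains_from_start_urls(start_urls):
    """Return bare hostnames (no port) for the given URLs.

    Hypothetical helper for illustration only; it is not part of
    the docsearch scraper or of Scrapy.
    """
    domains = []
    for url in start_urls:
        # .hostname drops the port; .netloc would keep it ("localhost:3000")
        host = urlparse(url).hostname
        if host and host not in domains:
            domains.append(host)
    return domains


if __name__ == "__main__":
    urls = ["http://localhost:3000/", "http://my-host:3000/docs/"]
    # What the warning complains about: host and port together
    print([urlparse(u).netloc for u in urls])
    # What allowed_domains expects: host only
    print(allowed_domains_from_start_urls(urls))
```

If the scraper used the hostname rather than the full netloc, the start URL could presumably keep its port while allowed_domains stays port-free.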
Metadata
Typesense Version:
Docker images:
- typesense/typesense:0.24.1
- typesense/docsearch-scraper:0.6.0
OS: Linux