Use Port in start_urls #42

Open
@JasonWhall

Description

We have a site configured in the scraper that is hosted on a non-standard HTTP/HTTPS port (3000). When start_urls contains a hostname with a port, e.g. http://my-host:3000/, the scraper fails with a message saying that it does not accept domains with ports. The old Algolia scraper configs appear to have supported ports, so I assume this is related to an update to the Scrapy package used in this fork.

Steps to reproduce

  • Build and run a Docusaurus site locally, serving on http://localhost:3000
  • Update the DocSearch config to set "start_urls": ["http://localhost:3000/"]
  • Run the DocSearch scraper

Expected Behavior

  • Site is scraped and uploaded to Typesense server

Actual Behavior

Error returned from scraper:

PortWarning: allowed_domains accepts only domains without ports. Ignoring entry localhost:3000 in allowed_domains.
  warnings.warn(message, PortWarning)
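
The warning text indicates that the entry reaching allowed_domains still carries the port (localhost:3000). One possible cause, not confirmed against the scraper source, is that the allowed domain is taken from the host-and-port part of each start URL. The sketch below uses only Python's standard library to show the difference between that value and the bare hostname. The helper name allowed_domain_for is hypothetical and is not part of the scraper or of Scrapy.

```python
from urllib.parse import urlparse


def allowed_domain_for(start_url: str) -> str:
    """Return the bare hostname of start_url, without any port.

    Hypothetical helper for illustration; not part of the scraper or Scrapy.
    """
    hostname = urlparse(start_url).hostname
    if hostname is None:
        raise ValueError(f"no hostname in URL: {start_url!r}")
    return hostname


# netloc keeps the port, which is the form the warning complains about.
print(urlparse("http://localhost:3000/").netloc)    # localhost:3000
# hostname drops the port.
print(urlparse("http://localhost:3000/").hostname)  # localhost

# The two URLs mentioned in this issue.
start_urls = ["http://localhost:3000/", "http://my-host:3000/"]
allowed_domains = sorted({allowed_domain_for(u) for u in start_urls})
print(allowed_domains)  # ['localhost', 'my-host']
```

If the scraper does derive the domain this way, passing the port-free hostname to allowed_domains while leaving start_urls unchanged might avoid the warning. That is an assumption about the cause, not a verified fix.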

Metadata

Typesense Version:

Docker images:

  • typesense/typesense:0.24.1
  • typesense/docsearch-scraper:0.6.0

OS: Linux
