SheezZarR

Change Summary

Added a short instruction on how to configure the typesense-docs-search scraper to scrape a website served on a port other than `:80`.

I have chosen to update here since it is a starting point for any docs search setup.

PR Checklist

@SheezZarR
Author

@jasonbosco hi!! Is the PR needed or not really?

Ideally, you should run your site on port `:80`, because the scraper can behave unexpectedly if the site is hosted on another port.
However, you can scrape a site on a port other than `:80` by specifying `"allowed_domains": ["localhost"]` in your scraper configuration.
You can then set `"start_urls": ["http://localhost:<your-port>"]`.
A more detailed example of the configuration can be found [here](https://github.com/meilisearch/docs-scraper/issues/103#issuecomment-810736674).
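
For illustration, the two settings above could be combined roughly as in the sketch below. The helper name `local_scraper_config`, the port `3000`, and the index name `docs` are hypothetical, and a real scraper configuration also needs entries such as selectors, which are left out here.

```python
import json


def local_scraper_config(port, index_name="docs"):
    """Build a partial scraper config for a site served on localhost:<port>.

    `port` and `index_name` are placeholder values for illustration only.
    """
    return {
        "index_name": index_name,
        # Entry point for the crawl; the port is whatever the site uses locally.
        "start_urls": [f"http://localhost:{port}"],
        # Setting described in the text above for non-:80 ports.
        "allowed_domains": ["localhost"],
    }


config = local_scraper_config(3000)
print(json.dumps(config, indent=2))
```

The printed JSON would be merged into the rest of the scraper configuration file.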
Member

The instructions in this guide assume that the scraper is being run from within a Docker container, and that the website is running somewhere outside that container. That is why the paragraph above this one mentions the URL to use.

Using `localhost` in the scraper configuration while the scraper is running inside a Docker container makes it look for the docs site inside that same container, which won't work.

So the instructions here won't work with the rest of the guide, which is structured around running the scraper as a Docker container.
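
As a rough illustration of the point above, a small pre-flight check could flag `start_urls` that point at a loopback host before the scraper is started in a container. This is only a sketch: `loopback_start_urls` and `LOOPBACK_HOSTS` are hypothetical names, not part of the scraper, and it assumes every `start_urls` entry is a plain URL string.

```python
from urllib.parse import urlsplit

# Hostnames that resolve to the container itself when used inside Docker.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def loopback_start_urls(config):
    """Return the start URLs whose host is a loopback address."""
    return [
        url
        for url in config.get("start_urls", [])
        if urlsplit(url).hostname in LOOPBACK_HOSTS
    ]


example = {"start_urls": ["http://localhost:3000", "https://example.com/docs"]}
for url in loopback_start_urls(example):
    print(f"{url} would be resolved inside the scraper's own container")
```

Any URL reported by the check would need to be replaced with an address that is reachable from inside the container.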


2 participants