crawler-configurations-examples

This repository contains sample configuration files for Algolia Crawler:

config.basic.js crawls every web page on https://www.example.com and saves one object per page, containing the page title and an objectID equal to the value of the pageID meta tag.
config.documents.js uses a different record extraction strategy depending on the type of resource being crawled: HTML page or PDF/DOC document.
config.advanced.js showcases several advanced features: authorization, setting up a session cookie, scheduling, ignoring query parameters, and backup.
config.csv.js shows how to integrate pageviews and categories from external CSV files.
config.google-analytics.js shows how to integrate pageviews from Google Analytics.
config.splitting.js implements full-text search while complying with the record size limits of your Algolia plan, by splitting the textual content of each page into chunks that each fill one record.
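
As a rough illustration of the splitting idea, here is a minimal standalone sketch in plain JavaScript. It is not the code from config.splitting.js: the helper name splitIntoRecords, its parameters, the record fields, and the character-based limit are all assumptions made for this example (actual record size limits are expressed in bytes and depend on your plan).

```javascript
// Hypothetical helper (not part of the Algolia Crawler API): packs the words
// of a page's text into chunks of at most `maxChars` characters and returns
// one record per chunk. A single word longer than `maxChars` is kept whole
// and ends up in its own oversized chunk.
function splitIntoRecords(url, title, text, maxChars = 1000) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  let current = '';

  for (const word of words) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length > maxChars && current) {
      // Adding this word would overflow the chunk: close it and start anew.
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);

  // Each record gets a distinct objectID derived from the URL and chunk index
  // (an assumed naming scheme, chosen only for this sketch).
  return chunks.map((chunk, index) => ({
    objectID: `${url}#${index}`,
    url,
    title,
    text: chunk,
  }));
}

const records = splitIntoRecords(
  'https://www.example.com/page',
  'Example',
  'one two three four five',
  9
);
console.log(records.map((record) => record.text));
```

In a crawler configuration, a function along these lines would typically be called from the record extractor with the text pulled out of the page, so that every returned chunk is indexed as its own record.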

License

Apache 2.0 - See LICENSE for more information.
