This repository contains sample configuration files for Algolia Crawler:
- config.basic.js crawls all web pages on https://www.example.com and saves one record per page containing the title, with an objectID equal to the value of the pageID meta tag.
- config.documents.js uses a different record extraction strategy depending on the type of resource being crawled: HTML page or PDF/DOC document.
- config.advanced.js showcases several advanced features: authorization, setting up a session cookie, scheduling, ignoring query parameters, and backup.
- config.csv.js shows how to integrate pageviews and categories from external CSV files.
- config.google-analytics.js shows how to integrate pageviews from Google Analytics.
- config.splitting.js implements full-text search while complying with the record size limits of your Algolia plan, by splitting the textual content into chunks that fill records up to that limit.
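The following is a rough, standalone Node.js sketch of the splitting idea behind config.splitting.js. It is not the code from that file. The helper names splitIntoRecords and byteLength, the content attribute, the "-N" objectID suffix, and the byte limit passed in are all hypothetical choices. The sketch assumes the limit applies to the JSON-serialized size of each record. It greedily packs words into a record until adding the next word would push the record over the limit.

```javascript
// Hypothetical helper: size of a record once serialized to JSON, in UTF-8 bytes.
function byteLength(record) {
  return Buffer.byteLength(JSON.stringify(record), "utf8");
}

// Hypothetical sketch (not taken from config.splitting.js): split `text` into
// several records that each copy `baseRecord`'s attributes, get a unique
// objectID suffix, and stay under `maxBytes` when serialized.
function splitIntoRecords(baseRecord, text, maxBytes) {
  const words = text.split(/\s+/).filter(Boolean);
  const records = [];
  let current = [];

  // Build one chunk record; the index keeps objectIDs unique per page.
  const makeRecord = (chunkWords, index) => ({
    ...baseRecord,
    objectID: `${baseRecord.objectID}-${index}`,
    content: chunkWords.join(" "),
  });

  for (const word of words) {
    const candidate = [...current, word];
    const tooBig = byteLength(makeRecord(candidate, records.length)) > maxBytes;
    if (current.length > 0 && tooBig) {
      // Adding this word would overflow: flush the current chunk, start a new one.
      records.push(makeRecord(current, records.length));
      current = [word];
    } else {
      current = candidate;
    }
  }
  if (current.length > 0) {
    records.push(makeRecord(current, records.length));
  }
  return records;
}
```

A word that on its own exceeds the limit still ends up in an oversized record, so a real configuration would need to handle that case separately. Because every chunk keeps the base record's other attributes (for example a url attribute), Algolia's distinct feature with attributeForDistinct set to that attribute could collapse the chunks of one page into a single search result.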
Apache 2.0 - See LICENSE for more information.