Self-hosted search engine you can use for your static blog or just about any other website that needs search functionality.
My live instance is at http://search.cweiske.de/ and indexes my website, blog and all linked URLs.
- Crawler and indexer with the ability to run many in parallel
- Shows and highlights text that contains search words
- Boolean search queries:
  - foo bar - searches for foo AND bar
  - foo OR bar
  - title:foo - searches for foo only in the page title
- Facets for tag, domain, language and type
- Date search:
  - before:2016-08-30 - modification date before that day
  - after:2016-08-30 - modified after that day
  - date:2016-08-30 - exact modification day match
- Site search:
  - Query: foo bar site:example.org/dir/
  - or use the site GET parameter: /?q=foo&site=example.org/dir
- OpenSearch support with HTML and Atom result lists
- Instant indexing with WebSub (formerly PubSubHubbub)
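The query operators and the site GET parameter listed above can also be used from a script. The following is a minimal sketch and not part of phinde: the function names phinde_query and phinde_search and the PHINDE_URL variable are made up for illustration, http://phinde.example.org/ is a placeholder host, and curl is assumed to be installed. Whether phinde accepts several filters in one query has not been verified here, so the example sticks to the site: form shown above.

```shell
# Hypothetical helper, not shipped with phinde.
# Joins search terms and optional NAME=VALUE filters into one query string,
# e.g. site=example.org/dir/ becomes site:example.org/dir/
phinde_query() (
    query=$1
    shift
    for filter in "$@"; do
        query="$query ${filter%%=*}:${filter#*=}"
    done
    printf '%s\n' "$query"
)

# Sends the query as the q GET parameter. curl -G appends the URL-encoded
# data to the URL instead of sending it as a POST body.
# PHINDE_URL is an assumed variable; the default is a placeholder host.
phinde_search() {
    curl -sG --data-urlencode "q=$1" "${PHINDE_URL:-http://phinde.example.org/}"
}

# Example (needs a running phinde instance):
#   phinde_search "$(phinde_query 'foo bar' site=example.org/dir/)"
```

The query string is built separately from the HTTP call so that it can be inspected or logged before anything is sent.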
 
- PHP 8.x
- Elasticsearch 2.0
- MySQL or MariaDB for WebSub subscriptions
- Gearman (Debian 9: gearman-job-server, not gearman-server)
  - gearadmin command line tool (gearman-tools package)
- PHP Gearman extension
- Some PHP libraries that get installed with composer
 
Install and run Elasticsearch and Gearman
Install php-gearman and gearman-tools
Get a local copy of the code:
$ git clone https://git.cweiske.de/phinde.git phinde
Install dependencies via composer:
$ composer install --no-dev
Point your webserver's document root to phinde's www directory
Copy data/config.php.dist to data/config.php and adjust it. Make sure you add your domain to the crawl whitelist.
Create a MySQL database and import the schema from data/schema.sql
Run bin/setup.php which sets up the Elasticsearch schema
Put your homepage into the queue:
$ ./bin/process.php http://example.org/
Start at least one worker to process the crawl+index queue:
$ ./bin/phinde-worker.php
Check phinde's status page in your browser. The number of open tasks should be > 0, the number of workers also.
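A similar check can be approximated on the command line with gearadmin from the gearman-tools package. This is a rough sketch under assumptions: it relies on gearadmin --status printing one tab-separated line per function (name, queued jobs, running jobs, available workers), and gearman_unserved is a made-up helper name. It lists functions that have queued jobs but no worker to process them.

```shell
# Hypothetical helper: reads `gearadmin --status` output on stdin.
# Assumed line format: NAME <tab> QUEUED <tab> RUNNING <tab> WORKERS
# Prints the name of every function with queued jobs and zero workers.
gearman_unserved() {
    awk -F '\t' 'NF == 4 && $2 > 0 && $4 == 0 { print $1 }'
}

# Example (needs a running Gearman job server):
#   gearadmin --status | gearman_unserved
```

Empty output means that every function with queued jobs has at least one worker attached.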
When your site changes, the search engine needs to re-crawl and re-index the affected pages.
Tell phinde that something changed by running:
$ ./bin/process.php http://example.org/foo.htm
phinde supports HTML pages and Atom feeds, so if your blog has a feed it's enough to let phinde reindex that one. It will find all linked pages automatically.
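If several pages changed at once, the process.php call above can be wrapped in a loop. A minimal sketch, assuming it is run from the phinde checkout; phinde_requeue and the PHINDE_PROCESS variable are invented names for this example and not part of phinde.

```shell
# Hypothetical helper: queue every URL read from stdin (one per line).
# PHINDE_PROCESS is an assumed override for dry runs; the default is the
# ./bin/process.php script from the phinde checkout.
phinde_requeue() {
    while IFS= read -r url; do
        case $url in
            '' | '#'*) continue ;;   # skip blank lines and comments
        esac
        # </dev/null keeps the called script from consuming the URL list
        "${PHINDE_PROCESS:-./bin/process.php}" "$url" </dev/null || return 1
    done
}

# Example:
#   printf '%s\n' http://example.org/foo.htm http://example.org/feed.atom | phinde_requeue
```

Setting PHINDE_PROCESS=echo prints the URLs that would be queued without touching the queue.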
Adding a simple search form to your website is easy. It needs two things:
- <form> tag with an action that points to the phinde instance
- Search text field with name of q.
Example:
<form method="get" action="http://phinde.example.org">
 <input type="text" name="q" placeholder="Search text"/>
 <button type="submit">Search</button>
</form>
When using systemd, you can let it run multiple worker instances when the system boots up:
Copy files data/systemd/phinde*.service into /etc/systemd/system/
Adjust user and group names, and the work directories
Enable three worker processes:
$ systemctl daemon-reload
$ systemctl enable phinde@1
$ systemctl enable phinde@2
$ systemctl enable phinde@3
$ systemctl enable phinde
$ systemctl start phinde
Now three workers are running. Restarting the phinde service also restarts the workers.
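For a different number of workers, the enable commands can be generated in a loop. This sketch assumes that phinde@N instances beyond 3 behave the same way as the three shown above and that it runs with sufficient privileges; phinde_enable_workers and the SYSTEMCTL variable are hypothetical names.

```shell
# Hypothetical helper: enable N worker instances plus the phinde service.
# SYSTEMCTL is an assumed override for dry runs; default is systemctl.
# $1 is the number of workers to enable.
phinde_enable_workers() (
    ctl=${SYSTEMCTL:-systemctl}
    "$ctl" daemon-reload || exit 1
    i=1
    while [ "$i" -le "$1" ]; do
        "$ctl" enable "phinde@$i" || exit 1
        i=$((i + 1))
    done
    "$ctl" enable phinde && "$ctl" start phinde
)

# Example:
#   phinde_enable_workers 5
```

Running it with SYSTEMCTL=echo prints the commands instead of executing them.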
Run bin/renew-subscriptions.php once a day with cron.
It will renew the WebSub subscriptions.
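One way to set this up is a crontab entry for the user that owns the phinde checkout. The sketch below only prints such an entry; phinde_cron_line is a hypothetical helper, the run time of 03:17 is arbitrary, and the checkout path passed in is an assumption about your installation.

```shell
# Hypothetical helper: print a crontab line that renews the WebSub
# subscriptions once a day at 03:17 (time chosen arbitrarily).
# $1 is the absolute path of the phinde checkout.
phinde_cron_line() {
    printf '17 3 * * * cd %s && php bin/renew-subscriptions.php\n' "$1"
}

# Example: append the line to the current user's crontab
#   ( crontab -l 2>/dev/null; phinde_cron_line /path/to/phinde ) | crontab -
```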
Delete index data from one domain:
$ curl -iv -XDELETE -H 'Content-Type: application/json' -d '{"query":{"term":{"domain":"example.org"}}}' http://127.0.0.1:9200/phinde/_query
This uses the delete-by-query plugin of Elasticsearch 2.0, see https://www.elastic.co/guide/en/elasticsearch/plugins/2.0/delete-by-query-usage.html
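To reuse the command for other domains, the request body can be generated from a parameter. A small sketch with made-up function names; it does no JSON escaping, so it assumes the domain contains no quotes or backslashes, and ES_URL is an assumed variable that defaults to the address used above.

```shell
# Hypothetical helper: print the delete-by-query body for one domain.
# No JSON escaping is done; $1 must be a plain host name.
phinde_delete_body() {
    printf '{"query":{"term":{"domain":"%s"}}}\n' "$1"
}

# Hypothetical helper: send the body to the phinde index.
# ES_URL is an assumed override for the Elasticsearch address.
phinde_delete_domain() {
    curl -XDELETE -H 'Content-Type: application/json' \
        -d "$(phinde_delete_body "$1")" \
        "${ES_URL:-http://127.0.0.1:9200}/phinde/_query"
}

# Example:
#   phinde_delete_domain example.org
```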
Phinde supports WebSub to subscribe to changes of a website. When phinde gets notified by the website's hub about changes, it will immediately crawl and index the changed pages.
Subscribe to a website's feed:
$ php bin/subscribe.php http://example.org/feed.atom
Phinde will determine the website's hub and send a registration request to it.
The status page shows the number of working subscriptions and the number of open subscriptions.
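WebSub publishers advertise their hub with a link of relation type hub, either in an HTTP Link header or as a link element inside the feed. To check by hand which hub a feed announces, something like the following can help. It is a rough sketch and not phinde's own discovery code: websub_hubs is a made-up name, only link elements inside the document are inspected (no Link headers), and the text matching assumes the rel value is exactly hub. It needs a grep that supports -o, such as GNU or BSD grep.

```shell
# Hypothetical helper: print the href of every <link rel="hub"> element
# found in the Atom or HTML document read from stdin.
websub_hubs() {
    tr '\n' ' ' |
        grep -o '<link[^>]*>' |
        grep -E "rel=[\"']hub[\"']" |
        sed -n "s/.*href=[\"']\([^\"']*\)[\"'].*/\1/p"
}

# Example:
#   curl -s http://example.org/feed.atom | websub_hubs
```

If nothing is printed, the feed either announces its hub only through HTTP headers or does not support WebSub.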
Unsubscribing also happens on the command line:
$ php bin/unsubscribe.php http://example.org/feed.atom
phinde's source code is available from http://git.cweiske.de/phinde.git or the mirror on GitHub.
phinde is licensed under the AGPL v3 or later.
phinde was written by Christian Weiske.