NOTE: After a system update, the Elasticsearch container can start crashing because the vm.max_map_count value is set too low. To fix this, run the following command:
sudo sysctl -w vm.max_map_count=262144
To make it persistent, you can add this line:
vm.max_map_count=262144
to your /etc/sysctl.conf, then run:
sudo sysctl -p
On Windows, run:
wsl -d docker-desktop
sysctl -w vm.max_map_count=262144
exit
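To confirm that the setting took effect, the current value can be compared against the required one. The following is a minimal Python sketch for Linux hosts; the function names are hypothetical and not part of this repository, and it assumes the kernel exposes the value at /proc/sys/vm/max_map_count.

```python
from pathlib import Path

# Value used in the sysctl commands above.
REQUIRED_MAX_MAP_COUNT = 262144


def parse_max_map_count(raw: str) -> int:
    """Parse the text content of /proc/sys/vm/max_map_count."""
    return int(raw.strip())


def is_sufficient(current: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """Return True if the current setting is at least the required value."""
    return current >= required


def check_host(proc_file: str = "/proc/sys/vm/max_map_count") -> bool:
    """Read the live kernel setting (Linux only) and compare it."""
    return is_sufficient(parse_max_map_count(Path(proc_file).read_text()))
```

Calling check_host() on the Docker host returns False while the value is still too low.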
To set up the Python environment on Linux:
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
On Windows (Command Prompt):
py -3.12 -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
Clone the repository, set a username and password of your choice in .env_template, and rename the file to .env. Then run the following commands:
docker compose -f docker-compose.setup.yml run --rm certs
docker compose -f docker-compose.setup.yml run --rm keystore
docker compose up -d
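If the containers fail to start, one thing worth checking is whether .env actually defines the credentials. Below is a small Python sketch of such a check. It assumes .env uses plain KEY=VALUE lines and that the variables are named ELASTIC_USERNAME and ELASTIC_PASSWORD; both are assumptions, so adjust them to whatever .env_template really contains.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str],
                 required=("ELASTIC_USERNAME", "ELASTIC_PASSWORD")) -> list[str]:
    """Return the required keys that are absent or empty (names are assumed)."""
    return [key for key in required if not env.get(key)]
```

For example, missing_keys(parse_env(open(".env").read())) returns an empty list when both values are set.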
In the following instructions, we assume you chose the username "elastic".
Create and populate the index for the person entities in Wikidata:
python utils/wikidata_load_to_elasticsearch.py
Or, if you download the newest Wikidata dump, you need to filter it by human instances:
cat latest-all.json.gz | pigz -d -k | ./utils/prep_wikidata/wikibase_dump_filter --claim ./utils/prep_wikidata/claims > humans.json
Then prepare the data as appropriate for your use case. For CHNOBLi we only keep certain fields and resolve the Q-codes into their value labels (see utils/prep_wikidata/wikidata_prep_for_elasticsearch.ipynb).
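To illustrate what filtering by human instances means, here is a rough Python sketch of the same idea. It assumes the claims file selects entities whose "instance of" property (P31) is "human" (Q5), which this README does not confirm, and it is not the filter the command above actually runs. The function names are hypothetical.

```python
import json


def is_human(entity: dict) -> bool:
    """Return True if any P31 (instance of) statement points to Q5 (human)."""
    for statement in entity.get("claims", {}).get("P31", []):
        value = statement.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and value.get("id") == "Q5":
            return True
    return False


def filter_humans(lines):
    """Yield human entities from the lines of a decompressed Wikidata JSON dump.

    The dump is one large JSON array with one entity per line, so the
    bracket lines and trailing commas are stripped before parsing.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        entity = json.loads(line)
        if is_human(entity):
            yield entity
```

For the full dump the command-line filter is the practical choice; the sketch only shows which entities end up in humans.json under the stated assumption.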
If you are creating this index for the CHNOBLi pipeline, remember to replace "plessur.ethz.ch:9200" with "localhost:9200" in the "utility/.env" file for the CHNOBLi code.
Create and populate the index for the person entities in the GND-ID:
python utils/gnd_load_to_elasticsearch.py
If you are creating this index for the CHNOBLi pipeline, remember to replace "plessur.ethz.ch:9200" with "localhost:9200" in the "utility/.env" file for the CHNOBLi code.
GET /INDEX_NAME/_search
{
  "query": {
    "match": {
      "labels": {
        "query": "wäshington geörge",
        "operator": "and",
        "fuzziness": "auto"
      }
    }
  }
}
GET /INDEX_NAME/_search
{
  "query": {
    "match": {
      "id": "Q82955"
    }
  },
  "_source": ["labels"]
}
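The two request bodies above can also be built programmatically, which helps when the search term or ID comes from other code. This is a minimal sketch; the helper names are hypothetical, and the field names labels and id are taken from the queries above.

```python
import json


def fuzzy_label_query(name: str) -> dict:
    """Body for a fuzzy match on "labels" where every term must match."""
    return {
        "query": {
            "match": {
                "labels": {
                    "query": name,
                    "operator": "and",
                    "fuzziness": "auto",
                }
            }
        }
    }


def labels_by_id_query(entity_id: str) -> dict:
    """Body for looking up an entity by ID, returning only its labels."""
    return {
        "query": {"match": {"id": entity_id}},
        "_source": ["labels"],
    }


def to_json(body: dict) -> str:
    """Serialize a body, keeping non-ASCII characters such as umlauts readable."""
    return json.dumps(body, ensure_ascii=False, indent=2)
```

For example, to_json(fuzzy_label_query("wäshington geörge")) produces the first request body shown above.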
#!/bin/bash
# Run a fuzzy search against the index over HTTPS.
# curl prompts for the password of ${USER}.
HOST=localhost
PORT=9200
USER=elastic
CACERT=./secrets/certs/ca/ca.crt
curl --cacert "${CACERT}" -H 'Content-Type: application/json' -su "${USER}" -vv -XGET "https://${HOST}:${PORT}/INDEX_NAME/_search?pretty" -d '
{
  "query": {
    "match": {
      "labels": {
        "query": "wäshington geörge",
        "operator": "and",
        "fuzziness": "auto"
      }
    }
  }
}
'
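The same request can be sent from Python using only the standard library. This is a sketch rather than the way the repository's scripts talk to Elasticsearch; the function names are hypothetical, and the password has to be passed in explicitly because there is no interactive prompt as with curl.

```python
import base64
import json
import ssl
import urllib.request


def build_search_request(index: str, body: dict, user: str, password: str,
                         host: str = "localhost", port: int = 9200) -> urllib.request.Request:
    """Build a GET _search request with a JSON body and HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"https://{host}:{port}/{index}/_search?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="GET",
    )


def send(request: urllib.request.Request,
         cacert: str = "./secrets/certs/ca/ca.crt") -> dict:
    """Send the request, verifying the server against the given CA file."""
    context = ssl.create_default_context(cafile=cacert)
    with urllib.request.urlopen(request, context=context) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building and sending are kept separate so the request can be inspected without a running cluster. Elasticsearch also accepts POST for _search, which is an alternative if a client or proxy drops the body of a GET request.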
Publishing the data in the Research Collection is taking longer than expected. In the meantime, replace the URL in utils/wikidata_load_to_elasticsearch.py with https://polybox.ethz.ch/index.php/s/tX5CYKa4ZcHRmLJ/download and the one in utils/gnd_load_to_elasticsearch.py with https://polybox.ethz.ch/index.php/s/3SsnHMCeRfaSaKx/download
Try replacing base_url = "https://localhost:9200" with base_url = f"https://{ELASTIC_USERNAME}:{ELASTIC_PASSWORD}@localhost:9200". The Elasticsearch library should handle authentication for you, but embedding the credentials in the URL might help.
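One caveat when embedding credentials in a URL: characters such as "@", ":" or "/" in the password change how the URL is parsed, so they need to be percent-encoded. The sketch below shows one way to build the URL safely; build_base_url is a hypothetical helper, not a function from this repository.

```python
from urllib.parse import quote


def build_base_url(username: str, password: str,
                   host: str = "localhost", port: int = 9200) -> str:
    """Return an https URL with percent-encoded credentials."""
    # safe="" forces encoding of every reserved character, including "/" and ":".
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"https://{user}:{secret}@{host}:{port}"
```

With a password that contains only letters and digits, the result is identical to the f-string shown above.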
Start up Docker Desktop manually.
This is caused by the end-of-line sequence (CRLF) that only Windows uses and that can be added automatically when a file is opened on Windows. Open the affected files (likely /setup/setup-certs.sh and /setup/setup-keystore.sh) in VS Code, press F1, search for Change End of Line Sequence, then select LF.
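If VS Code is not at hand, the same conversion can be scripted. The following Python sketch rewrites a file with LF line endings; the function names are hypothetical, and it works on bytes so that no other characters in the script are altered.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows line endings (CRLF) with Unix line endings (LF)."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> bool:
    """Convert a file in place; return True if it was changed."""
    file = Path(path)
    original = file.read_bytes()
    converted = crlf_to_lf(original)
    if converted != original:
        file.write_bytes(converted)
        return True
    return False
```

For example, convert_file("setup/setup-certs.sh") run from the repository root would fix the first of the two scripts named above, assuming that relative path matches your checkout.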
ERROR: will not overwrite keystore at [/usr/share/elasticsearch/config/elasticsearch.keystore], because this incurs changing the file owner
Something went wrong during setup. Delete the repository and clone it again.