RAG-based knowledge assistant for CKAN - semantic search over datasets using natural language queries.
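Semantic search of this kind usually works like this. Each dataset's text and the user's query are turned into embedding vectors. The datasets whose vectors are most similar to the query vector are then returned. The sketch below shows only that ranking step, in plain Python, using cosine similarity and keeping the best `top_k` matches (compare the `similarity_top_k` setting in the configuration further down).

It is an illustration under stated assumptions, not this extension's implementation:

- The function names are hypothetical.
- The toy three-dimensional vectors stand in for real embeddings, which are much longer.

In PostgreSQL, pgvector's `<=>` operator computes cosine distance. A query of the form `ORDER BY embedding <=> query_vector LIMIT k` therefore performs the same ranking inside the database.

```python
import heapq
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k_datasets(
    query_vec: Sequence[float],
    index: dict[str, Sequence[float]],
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Rank dataset names by cosine similarity to the query and keep the best top_k."""
    scored = ((name, cosine_similarity(query_vec, vec)) for name, vec in index.items())
    # heapq.nlargest avoids sorting the whole index when only top_k results are needed
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical toy embeddings; real ones come from the embedding model.
    index = {
        "soil-moisture-survey": [0.9, 0.1, 0.0],
        "rainfall-records": [0.5, 0.5, 0.0],
        "road-network": [0.0, 0.1, 0.9],
    }
    query = [1.0, 0.0, 0.0]  # pretend embedding of "Show me datasets about soil"
    for name, score in top_k_datasets(query, index, top_k=2):
        print(f"{name}: {score:.3f}")
```

In a RAG setup, the retrieved datasets are then passed to the LLM as context for its answer.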
Requirements:
- CKAN >= 2.10
- Python >= 3.10
- PostgreSQL with the pgvector extension
- Ollama (for local models) or an OpenAI API key
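If you use Ollama rather than OpenAI, you may want to check the local prerequisites before installing. The sketch below does two things:

- It checks the running Python version.
- It asks Ollama's `GET /api/tags` endpoint which models are available locally, then reports any of the models pulled in this README that are missing.

It assumes Ollama listens on its default address, `http://localhost:11434` (the same URL as the `ollama_base_url` setting shown later). It does not check CKAN or pgvector, and the helper names are hypothetical.

Ollama reports models with an explicit tag. A model pulled as `nomic-embed-text` is therefore listed as `nomic-embed-text:latest`, and the sketch normalises names to match.

```python
import json
import sys
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default address
REQUIRED_MODELS = ["qwen3:8b-q4_K_M", "nomic-embed-text"]  # models pulled in this README
MIN_PYTHON = (3, 10)


def normalise(model: str) -> str:
    """Ollama reports untagged models with an implicit ':latest' tag."""
    return model if ":" in model else f"{model}:latest"


def missing_models(tags_response: dict, required: list[str]) -> list[str]:
    """Return the required models absent from an /api/tags JSON response."""
    available = {normalise(m["name"]) for m in tags_response.get("models", [])}
    return [m for m in required if normalise(m) not in available]


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """GET /api/tags, which lists the models available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def main() -> int:
    ok = True
    if sys.version_info < MIN_PYTHON:
        print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {sys.version.split()[0]}")
        ok = False
    try:
        missing = missing_models(fetch_tags(), REQUIRED_MODELS)
    except (OSError, ValueError) as exc:  # OSError covers URLError/timeouts, ValueError bad JSON
        print(f"Could not query Ollama at {OLLAMA_URL}: {exc}")
        return 1
    for model in missing:
        print(f"Missing model, run: ollama pull {model}")
    return 0 if ok and not missing else 1


if __name__ == "__main__":
    sys.exit(main())
```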
Compatibility with core CKAN versions:
| CKAN version | Compatible? |
|---|---|
| 2.9 and earlier | not tested |
| 2.10 | not tested |
| 2.11 | yes |
Suggested values:
- "yes"
- "not tested" - I can't think of a reason why it wouldn't work
- "not yet" - there is an intention to get it working
- "no"
Install the pgvector extension:
# Ubuntu/Debian
sudo apt-get install postgresql-15-pgvector
# Or from source
cd /tmp
git clone --branch v0.7.0 https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
Enable the extension in your CKAN database:
sudo -u postgres psql -d your_ckan_db -c "CREATE EXTENSION vector;"
Install Ollama (only needed for local models):
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Pull models
ollama pull qwen3:8b-q4_K_M
ollama pull nomic-embed-text
Activate your CKAN virtual environment:
. /usr/lib/ckan/default/bin/activate
Clone the source and install it in the virtualenv:
git clone https://github.com/DataShades/ckanext-knowledge-assistant.git
cd ckanext-knowledge-assistant
pip install -e .
pip install -r requirements.txt
Add knowledge_assistant to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/ckan.ini).
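After editing the config file, a quick sanity check can confirm two things:

- `knowledge_assistant` is listed in `ckan.plugins`.
- Any `ckanext.knowledge_assistant.*` options (described next) are present.

This is a minimal sketch using Python's standard `configparser`. It assumes the settings live in the `[app:main]` section, as in a standard CKAN config file. The function name is hypothetical, and CKAN loads its config through its own machinery, so treat this only as a read-only check.

```python
import configparser
import sys

PLUGIN = "knowledge_assistant"
PREFIX = "ckanext.knowledge_assistant."


def check_ckan_config(ini_text: str, section: str = "app:main") -> tuple[bool, dict[str, str]]:
    """Return (plugin_enabled, knowledge_assistant_options) for ckan.ini content."""
    # interpolation=None: CKAN configs use %(here)s, which configparser can't resolve itself.
    # strict=False: tolerate duplicate keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return False, {}
    options = parser[section]
    plugins = options.get("ckan.plugins", "").split()  # plugins are whitespace-separated
    settings = {key: value for key, value in options.items() if key.startswith(PREFIX)}
    return PLUGIN in plugins, settings


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/ckan/default/ckan.ini"
    with open(path, encoding="utf-8") as fh:
        enabled, settings = check_ckan_config(fh.read())
    print(f"{PLUGIN} in ckan.plugins: {enabled}")
    for key, value in sorted(settings.items()):
        print(f"{key} = {value}")
```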
Add the following configuration settings:
# LLM Configuration
ckanext.knowledge_assistant.llm_provider = ollama
ckanext.knowledge_assistant.llm_model = qwen3:8b-q4_K_M
ckanext.knowledge_assistant.ollama_base_url = http://localhost:11434
# Embedding Configuration
ckanext.knowledge_assistant.embedding_provider = ollama
ckanext.knowledge_assistant.embedding_model = nomic-embed-text
# Vector Store (optional - defaults to CKAN's database)
# ckanext.knowledge_assistant.vector_store_url = postgresql://user:pass@localhost/ckan
ckanext.knowledge_assistant.vector_store_table = knowledge_assistant_embeddings
# Query Settings
ckanext.knowledge_assistant.similarity_top_k = 5
Restart CKAN. For example, if you've deployed CKAN with Apache on Ubuntu:
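sudo service apache2 reload
Then build the initial index:
ckan -c /etc/ckan/default/ckan.ini knowledge-assistant index
Index datasets (the commands below omit -c, so they assume CKAN can find your config file, for example through the CKAN_INI environment variable):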
# Initial indexing
ckan knowledge-assistant index
# Re-index (clears existing data)
ckan knowledge-assistant index --force
Test queries:
# Interactive mode
ckan knowledge-assistant test-query
# Direct query
ckan knowledge-assistant test-query -q "Show me datasets about soil"
To install ckanext-knowledge-assistant for development, activate your CKAN virtualenv and do:
git clone https://github.com/DataShades/ckanext-knowledge-assistant.git
cd ckanext-knowledge-assistant
pip install -e .
pip install -r dev-requirements.txt
To run the tests, do:
pytest --ckan-ini=test.ini
If ckanext-knowledge-assistant should be available on PyPI, follow these steps to publish a new version:
- Update the version number in the pyproject.toml file. See PEP 440 for how to choose version numbers.
- Make sure you have the latest versions of the necessary packages (python -m build requires the build package):
  pip install --upgrade build setuptools wheel twine
- Create source and binary distributions of the new version, then check them:
  python -m build && twine check dist/*
  Fix any errors you get.
- Upload the distributions to PyPI:
  twine upload dist/*
- Commit any outstanding changes:
  git commit -a
  git push
- Tag the new release of the project on GitHub with the version number from the pyproject.toml file. For example, if the version number in pyproject.toml is 0.0.1, then run:
  git tag 0.0.1
  git push --tags