SciChat Semantic AI Platform

A unified technical platform that lets a conversational agent search, explore, and retrieve scientific metadata and resources using a Knowledge Graph (Oxigraph), Vector Search (Elasticsearch), and an LLM (Ollama).

✨ Features

- GraphRAG: Retrieval-Augmented Generation combining vector search, a knowledge graph, and an LLM
- SKG-IF Compliant: Follows the Scientific Knowledge Graphs Interoperability Framework
- Multi-Source Ingestion: arXiv, EuropePMC, PubChem, and CopolDB crawlers with resume support
- MCP Server: Model Context Protocol endpoint for LLM tool integration
- OpenWebUI Integration: Ready-to-use functions and pipelines

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   OpenWebUI     │────▶│   MCP Server    │────▶│   Backend API   │
│   (Frontend)    │     │   (FastMCP)     │     │   (FastAPI)     │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
                        ┌────────────────────────────────┼────────────────────────────────┐
                        │                                │                                │
                        ▼                                ▼                                ▼
               ┌─────────────────┐              ┌─────────────────┐              ┌─────────────────┐
               │  Elasticsearch  │              │    Oxigraph     │              │     Ollama      │
               │ (Vector Search) │              │(Knowledge Graph)│              │     (LLM)       │
               └─────────────────┘              └─────────────────┘              └─────────────────┘

The system is built on four main pillars:

Data Layer:
- Oxigraph: RDF Triplestore with SKG-IF ontology (Research Products, Agents, Topics, Venues)
- Elasticsearch: Vector store for semantic similarity search on abstracts/titles
Ingestion Layer:
- Python ETL pipeline (ingestion/) with crawlers for arXiv, EuropePMC, PubChem
- RDF transformation following SKG-IF data model
- Vector embeddings with SentenceTransformers
Service Layer:
- FastAPI (api/) providing GraphRAG, SPARQL, and Vector search endpoints
- Ollama integration for LLM generation
Agentic Layer:
- FastMCP (mcp/) exposing API functions as tools to LLMs
- OpenWebUI: Chat interface with GraphRAG tools
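The request path through these layers can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `api/src/services/`: the three backends are passed in as plain callables (`vector_search`, `graph_lookup`, and `generate` are hypothetical names), so the snippet runs without Elasticsearch, Oxigraph, or Ollama.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hit:
    """One vector-search result: the RDF resource it points to and the matched text."""
    iri: str
    text: str
    score: float


def graph_rag_answer(
    question: str,
    vector_search: Callable[[str, int], list[Hit]],
    graph_lookup: Callable[[str], list[str]],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> str:
    # 1. Semantic retrieval (Elasticsearch role): nearest titles/abstracts.
    hits = vector_search(question, top_k)

    # 2. Graph expansion (Oxigraph role): related facts for each hit,
    #    e.g. authors, topics or venues attached to the ResearchProduct.
    context_lines: list[str] = []
    for hit in hits:
        context_lines.append(f"[{hit.iri}] {hit.text}")
        context_lines.extend(f"  - {fact}" for fact in graph_lookup(hit.iri))

    # 3. Grounded generation (Ollama role): the prompt carries the retrieved
    #    context plus the question, and the LLM is told to stay within it.
    prompt = (
        "Answer using only the context below.\n\n"
        "Context:\n" + "\n".join(context_lines) + f"\n\nQuestion: {question}"
    )
    return generate(prompt)
```

Passing `generate=lambda prompt: prompt` returns the assembled prompt instead of an answer, which is a cheap way to inspect what the LLM would receive. Keeping the backends as parameters also makes the flow testable without starting any containers.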

Prerequisites

- Docker & Docker Compose installed.
- Ollama running locally on the host machine (default port 11434) for LLM inference (e.g., Llama 3).
  - Note: OpenWebUI is configured to reach Ollama at host.docker.internal:11434.
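The host/container distinction in the note matters for any script that calls Ollama directly. A minimal sketch, assuming the default port and Ollama's `/api/generate` endpoint with streaming disabled; the `llama3` model tag is only an example and must already be pulled:

```python
import json
import urllib.request

# From the host, Ollama listens on localhost; from inside a Docker container
# (e.g. OpenWebUI) the host machine is reachable as host.docker.internal.
HOST_URL = "http://localhost:11434"
CONTAINER_URL = "http://host.docker.internal:11434"


def ollama_generate_request(
    prompt: str, model: str = "llama3", in_container: bool = False
) -> urllib.request.Request:
    """Build (but do not send) a non-streaming request to Ollama's /api/generate."""
    base = CONTAINER_URL if in_container else HOST_URL
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending requires Ollama to be running with the model available:
# with urllib.request.urlopen(ollama_generate_request("Hello")) as resp:
#     print(json.load(resp)["response"])
```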

Quick Start

Setup Directories:
```
./setup.sh
```
Start Services:
```
./start.sh
```
Access:
- OpenWebUI: http://localhost:3000
- API Docs: http://localhost:8000/docs
- Oxigraph UI: http://localhost:7878
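Containers may need some time to become ready after `./start.sh`. For scripts that should wait for the stack, a minimal polling sketch follows; it assumes only that `/rag/health` (listed under API Endpoints) answers HTTP 200 when the API is up, and the retry count and delay are arbitrary choices.

```python
import time
import urllib.request


def wait_for_api(
    url: str = "http://localhost:8000/rag/health",
    retries: int = 30,
    delay: float = 2.0,
) -> bool:
    """Poll a health endpoint until it answers HTTP 200 or the retries run out."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError (e.g. 503) and socket timeouts all derive from
            # OSError: treat them as "not ready yet".
            pass
        if attempt < retries - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_api()` with no arguments waits up to roughly a minute against the default API port before giving up.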

Configuration

Ingestion

The ingestion service runs automatically on startup (defined in docker-compose.yml). It loads mock data defined in ingestion/scripts/run_ingestion.py. To ingest real data, modify this script to connect to your OAI-PMH or FTP sources.
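When replacing the mock data, each source record ultimately has to become SKG-IF-shaped RDF. The sketch below maps one record to N-Triples with the standard library only (the Technologies list names rdflib for the real pipeline). `MockRecord`, both namespaces, and every property name are hypothetical placeholders, not the official SKG-IF vocabulary or the code in `ingestion/src/transformers/`.

```python
from dataclasses import dataclass, field

# Placeholder namespaces: substitute the IRIs used in ingestion/src/transformers/.
SKG = "https://example.org/skg-if/"
BASE = "https://example.org/scichat/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass
class MockRecord:
    local_id: str
    title: str
    authors: list[str] = field(default_factory=list)


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(rec: MockRecord) -> list[str]:
    product = f"<{BASE}product/{rec.local_id}>"
    triples = [
        f"{product} <{RDF_TYPE}> <{SKG}ResearchProduct> .",
        f"{product} <{SKG}title> {literal(rec.title)} .",
    ]
    for rank, name in enumerate(rec.authors, start=1):
        # One Contribution per author keeps the author order (rank) explicit.
        contrib = f"<{BASE}contribution/{rec.local_id}/{rank}>"
        agent = f"<{BASE}agent/{rec.local_id}/{rank}>"
        triples += [
            f"{product} <{SKG}contribution> {contrib} .",
            f"{contrib} <{RDF_TYPE}> <{SKG}Contribution> .",
            f'{contrib} <{SKG}rank> "{rank}"^^<{XSD_INT}> .',
            f"{contrib} <{SKG}agent> {agent} .",
            f"{agent} <{RDF_TYPE}> <{SKG}Person> .",
            f"{agent} <{SKG}name> {literal(name)} .",
        ]
    return triples
```

Agent IRIs here are minted per record for brevity; a real transformer would derive them from a stable identifier (for example an ORCID) so the same person is shared across research products.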

MCP & LLM

The MCP Server runs in a separate container but is not automatically "connected" to the LLM inside OpenWebUI unless configured.

In a production setup, you would configure the LLM runner to attach to the MCP server.
Currently, the MCP server acts as a standalone tool provider that can be queried.
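Querying the standalone server means speaking MCP, which frames tool discovery and invocation as JSON-RPC 2.0 messages (`tools/list`, `tools/call`). The sketch below only serializes those messages; the transport is left out, a real session starts with an `initialize` exchange that is omitted here, and the tool name `semantic_search` and its arguments are hypothetical (run `tools/list` to see what `mcp/` actually exposes).

```python
import itertools
import json
from typing import Any, Optional

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def mcp_request(method: str, params: Optional[dict[str, Any]] = None) -> str:
    """Serialize an MCP JSON-RPC 2.0 request (transport not included)."""
    message: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Discover the tools the server exposes, then call one of them.
list_tools = mcp_request("tools/list")
call_tool = mcp_request(
    "tools/call",
    {
        "name": "semantic_search",  # hypothetical tool name
        "arguments": {"query": "protein folding", "top_k": 5},
    },
)
```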

Project Structure

.
├── api/                 # FastAPI Service (GraphRAG + SPARQL + Vector)
│   └── src/
│       ├── routers/     # API endpoints (rag.py, search.py)
│       └── services/    # Business logic (rag_service, ollama_service, etc.)
├── ingestion/           # Python ETL Pipeline
│   ├── scripts/         # Crawlers and ingestion scripts
│   └── src/
│       ├── extractors/  # Source-specific crawlers (arXiv, EuropePMC, PubChem)
│       ├── transformers/# RDF transformation (SKG-IF)
│       └── loaders/     # Oxigraph + Elasticsearch loaders
├── mcp/                 # FastMCP Server (MCP tools for LLMs)
├── openwebui/           # OpenWebUI integration
│   ├── functions/       # GraphRAG tools for OpenWebUI
│   ├── pipelines/       # Auto-augmentation pipeline
│   └── models/          # Custom model configurations
├── data/                # Persistent storage (created by setup.sh)
├── docker-compose.yml   # Orchestration
├── RUNBOOK.md           # Operations guide
├── setup.sh             # Init script
└── start.sh             # Launch script

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/rag/query` | POST | GraphRAG query (vector search + LLM generation) |
| `/rag/health` | GET | Health check for all components |
| `/search/semantic` | POST | Vector similarity search |
| `/search/sparql` | POST | Direct SPARQL query |
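A minimal client sketch for the POST endpoints, using only the standard library. The JSON field names (`query`, `top_k`) are assumptions; the actual request schemas are listed at http://localhost:8000/docs.

```python
import json
import urllib.request

API = "http://localhost:8000"


def post_json(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request against the backend API."""
    return urllib.request.Request(
        API + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Field names are assumptions; check the schemas at /docs before relying on them.
rag_req = post_json("/rag/query", {"query": "Recent work on perovskite solar cells", "top_k": 5})
semantic_req = post_json("/search/semantic", {"query": "perovskite stability", "top_k": 10})

# Sending requires the stack to be running:
# with urllib.request.urlopen(rag_req) as resp:
#     print(json.load(resp))
```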

SKG-IF Ontology

The system uses the Scientific Knowledge Graphs Interoperability Framework (SKG-IF) for RDF metadata:

| Entity | Description |
|---|---|
| `ResearchProduct` | Publications, datasets, software |
| `Agent` (Person/Organisation) | Authors, institutions |
| `Topic` | Keywords, categories |
| `Venue` | Journals, conferences |
| `DataSource` | Provenance (arXiv, EuropePMC, PubChem) |
| `Contribution` | Author roles with ranking |
| `Manifestation` | Access URLs (HTML, PDF) |
| `Identifier` | DOI, PMID, arXiv ID |

See: https://skg-if.github.io/
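To explore these entities directly, a SPARQL SELECT can be sent to Oxigraph, whose server implements the SPARQL 1.1 Protocol at `/query`. In the sketch below the `skg:` namespace and the `skg:title` property are placeholders, not the official SKG-IF IRIs; substitute whatever vocabulary the ingestion step actually loads.

```python
import urllib.request

OXIGRAPH_QUERY = "http://localhost:7878/query"

# Placeholder prefix: substitute the SKG-IF namespace actually loaded in the store.
QUERY = """
PREFIX skg: <https://example.org/skg-if/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?product ?title WHERE {
  ?product rdf:type skg:ResearchProduct ;
           skg:title ?title .
}
LIMIT 10
"""


def sparql_request(query: str, endpoint: str = OXIGRAPH_QUERY) -> urllib.request.Request:
    """SPARQL 1.1 Protocol 'query via POST directly': the body is the raw query."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )
```

The same query text could instead go through the API's `/search/sparql` endpoint, which keeps clients decoupled from the triplestore's address.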

Technologies

- Python 3.11
- Oxigraph (RDF Triplestore)
- Elasticsearch v8 (Vector Search)
- FastAPI (REST API)
- FastMCP (Model Context Protocol)
- Ollama (LLM inference)
- OpenWebUI (Chat interface)
- SentenceTransformers (Embeddings)
- rdflib (RDF processing)
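With SentenceTransformers embeddings stored in Elasticsearch v8, semantic search is commonly expressed through the `knn` option of the `_search` API. A sketch of the request body, assuming a `dense_vector` field named `embedding` and stored `title`/`abstract` fields (all three names are hypothetical; check the index mapping the loaders create):

```python
def knn_search_body(
    query_vector: list[float],
    k: int = 10,
    field: str = "embedding",
) -> dict:
    """Body for POST /<index>/_search using Elasticsearch 8's approximate kNN option."""
    return {
        "knn": {
            "field": field,                # dense_vector field holding the embeddings
            "query_vector": query_vector,  # embedding of the user's query text
            "k": k,                        # nearest neighbours to return
            # Candidates examined per shard: more improves recall at some cost.
            "num_candidates": min(10_000, max(100, 10 * k)),
        },
        "_source": ["title", "abstract"],  # hypothetical stored fields
    }
```

The query vector would come from something like `SentenceTransformer("all-MiniLM-L6-v2").encode(text).tolist()`; the model name is only an example, and it must be the same model used at ingestion time so that query and document vectors share one embedding space.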

OpenWebUI Integration

See openwebui/README.md for detailed instructions on:

- Importing GraphRAG tools
- Configuring the RAG pipeline
- Creating a custom SciChat model

License

Apache 2.0
