Natural Language to SPARQL Converter

A multi-agent system for converting natural language queries into SPARQL queries for knowledge graph exploration, implemented using a Master-Slave architecture with AutoGen.

Project Structure

NL2SPARQL
│
├── config/                       # Configuration settings
│   ├── __init__.py
│   ├── agent_config.py           # AutoGen agent configurations
│   └── api_config.py             # External API configurations
│
├── agents/                       # Agent implementations
│   ├── __init__.py
│   ├── master_agent.py           # Coordinates the entire workflow
│   ├── query_refinement.py       # Refines ambiguous natural language queries
│   ├── entity_recognition.py     # Extracts ontology-related entities from queries
│   ├── ontology_mapping.py       # Maps entities to formal ontology terms
│   ├── tool_selection.py         # Selects appropriate SPARQL templates
│   ├── plan_formulation.py       # Creates query execution plans
│   ├── validation.py             # Validates plans to prevent hallucinations
│   ├── sparql_construction.py    # Constructs SPARQL queries from plans
│   ├── sparql_validation.py      # Validates SPARQL query syntax and semantics
│   ├── tool_execution.py         # Wrapper for query execution
│   ├── query_execution.py        # Executes SPARQL queries against endpoints
│   └── response_generation.py    # Generates natural language responses
│
├── database/                     # Database connectors
│   ├── __init__.py
│   ├── qdrant_client.py          # Vector database for semantic search
│   ├── elastic_client.py         # Entity resolution and search
│   └── ontology_store.py         # RDF graph management and access
│
├── models/                       # Machine learning models
│   ├── __init__.py
│   ├── embeddings.py             # Embedding models (Bi-encoder, Cross-encoder)
│   └── entity_recognition.py     # GLiNER entity recognition model
│
├── tools/                        # Utility tools
│   ├── __init__.py
│   └── sparql_tools.py           # SPARQL query utilities
│
├── utils/                        # General utilities
│   ├── __init__.py
│   └── logging_utils.py          # Logging configuration and tools
│
├── templates/                    # Query templates
│   └── sparql/                   # SPARQL query templates
│       ├── class_instances.json  # Template for listing class instances
│       ├── instance_properties.json # Template for instance properties
│       ├── property_values.json  # Template for property values
│       ├── instance_exists.json  # Template for checking instance existence
│       └── filtered_instances.json # Template for filtered instances
│
├── assets/                       # Data files
│   └── ontologies/
│       └── academic_ontology.ttl # Sample academic domain ontology
│
├── main.py                       # Application entry point
├── requirements.txt              # Project dependencies
└── README.md                     # Project documentation

Architecture Overview

This system uses a Master-Slave architecture where a central Master Agent coordinates multiple specialized Slave Agents, each responsible for a specific task in the natural language to SPARQL conversion workflow.

Master Agent

The Master Agent serves as the central coordinator with these responsibilities:

Receiving and analyzing natural language queries about knowledge graphs
Orchestrating the workflow between slave agents
Making high-level decisions about query processing strategy
Evaluating outputs from slave agents
Synthesizing the final response to the user

Slave Agents

Each slave agent is specialized for one stage of the query processing pipeline:

Query Refinement Agent
- Processes raw user queries and conversation history
- Uses vector search to find similar examples
- Transforms ambiguous or context-dependent queries into standalone, well-structured queries
- Considers conversation context for query improvement
Entity Recognition Agent
- Uses GLiNER model with ontology-specific entity types
- Identifies knowledge graph-specific entities (classes, properties, instances, literals)
- Extracts relevant terms from natural language
- Determines query types and patterns
Ontology Mapping Agent
- Uses embedding similarity and ontology structure
- Maps extracted entities to specific ontology terms
- Resolves ambiguities when multiple mappings exist
- Handles synonyms and understands class hierarchies
Tool Selection Agent
- Selects appropriate SPARQL templates and patterns
- Uses vector similarity for template matching
- Matches query intent to template patterns
- Considers query complexity requirements
Plan Formulation Agent
- Creates step-by-step execution plans for queries
- Handles complex queries requiring multiple SPARQL statements
- Plans query optimization strategies
Validation Agent
- Validates execution plans to prevent hallucinations
- Checks logical consistency of plans
- Ensures plan steps are appropriate for the query
- Prevents invalid query constructions
SPARQL Construction Agent
- Builds SPARQL queries based on templates and entities
- Fills templates with entity values
- Constructs syntactically correct SPARQL
- Handles complex query components like FILTER, OPTIONAL, and UNION
SPARQL Validation Agent
- Validates syntactic correctness of generated SPARQL
- Checks semantic validity against the ontology
- Ensures queries will execute correctly
- Detects potential performance issues
Query Execution Agent
- Executes SPARQL queries against configured endpoints
- Handles authentication and rate limiting
- Processes results and handles errors
- Manages query caching and optimization
Response Generation Agent
- Transforms SPARQL results into natural language responses
- Formats complex results into readable forms
- Provides explanations of the query and results
- Generates user-friendly responses
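
The coordination described above can be sketched as a sequential pipeline in which each stage reads a shared context and contributes new keys. This is a minimal illustration in plain Python, not the project's AutoGen implementation: `run_pipeline`, the `Stage` type, and the three toy stage functions are hypothetical names, and the real agents would call models and databases rather than manipulate strings.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
# A stage reads the shared context and returns the keys it adds or updates.
Stage = Callable[[Context], Context]


def run_pipeline(query: str, stages: List[Tuple[str, Stage]]) -> Context:
    """Run each stage in order, merging its output into a shared context."""
    context: Context = {"query": query}
    trace: List[str] = []
    for name, stage in stages:
        context.update(stage(context))
        trace.append(name)  # record which stages ran, useful for explainability
    context["trace"] = trace
    return context


# Toy stand-ins for three of the agents (hypothetical, for illustration only).
def refine(ctx: Context) -> Context:
    return {"refined_query": str(ctx["query"]).strip().rstrip("?")}


def recognize_entities(ctx: Context) -> Context:
    # Toy rule: capitalized words after the first word are candidate entities.
    words = str(ctx["refined_query"]).split()
    return {"entities": [w for w in words[1:] if w[0].isupper()]}


def construct_sparql(ctx: Context) -> Context:
    entity = ctx["entities"][0]
    return {"sparql": f"SELECT ?s WHERE {{ ?s rdfs:subClassOf ex:{entity} }}"}


STAGES: List[Tuple[str, Stage]] = [
    ("query_refinement", refine),
    ("entity_recognition", recognize_entities),
    ("sparql_construction", construct_sparql),
]

result = run_pipeline("What are all the subclasses of Person?", STAGES)
print(result["sparql"])
```

Keeping the stages as an ordered list makes it straightforward to add, remove, or reorder agents without touching the coordinator.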

Technology Stack

Agent Framework: Microsoft AutoGen
Vector Database: Qdrant for vector search of similar queries and patterns
Entity Resolution: Elasticsearch for fuzzy search and handling misspellings
Triple Store: RDF store for ontology access (for example GraphDB, Stardog, or Apache Jena)
Embedding Models:
- BiEncoder for general semantic matching
- CrossEncoder for precise reranking
Entity Recognition: GLiNER (Generalist and Lightweight Model for Named Entity Recognition)
Language Models: GPT-3.5/4 for agents requiring reasoning and natural language processing

Key Components

Ontology Management

The system relies on access to ontology information:

Class hierarchies
Property domains and ranges
Instance data
Vocabulary and concept definitions
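
As a rough illustration of the class-hierarchy lookups listed above, the sketch below keeps a few triples in memory and computes every direct or indirect subclass of a class. The triples, the `ex:` names, and the `subclasses_of` helper are made up for this example; the project itself reads ontologies from an RDF store.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sample data standing in for an ontology.
TRIPLES: List[Triple] = [
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:Professor", "rdfs:subClassOf", "ex:Person"),
    ("ex:PhDStudent", "rdfs:subClassOf", "ex:Student"),
    ("ex:authorOf", "rdfs:domain", "ex:Person"),
]


def subclasses_of(cls: str, triples: Iterable[Triple]) -> Set[str]:
    """Return every direct or indirect subclass of `cls` (breadth-first search)."""
    children: Dict[str, List[str]] = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == "rdfs:subClassOf":
            children[obj].append(subject)

    found: Set[str] = set()
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in found:  # also guards against cycles
                found.add(child)
                queue.append(child)
    return found


print(sorted(subclasses_of("ex:Person", TRIPLES)))
```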

SPARQL Templates

A collection of parameterized SPARQL query templates for common question types:

Entity lookup ("What is X?")
Relationship queries ("How are X and Y related?")
Attribute queries ("What is the value of property P for entity E?")
Filtering queries ("Which entities have property P greater than value V?")
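
To make the idea of a parameterized template concrete, here is a minimal sketch that fills one using Python's `string.Template`. The JSON layout (`name`, `parameters`, `query`) and the `fill_template` helper are assumptions made for illustration; the actual schema of the files in templates/sparql/ may differ.

```python
import json
from string import Template

# Hypothetical template file content; the real JSON schema may differ.
TEMPLATE_JSON = """
{
  "name": "class_instances",
  "parameters": ["class_iri", "limit"],
  "query": "SELECT ?instance WHERE { ?instance a <$class_iri> } LIMIT $limit"
}
"""


def fill_template(template_json: str, **params: str) -> str:
    """Substitute parameters into a template, failing loudly if any are missing."""
    spec = json.loads(template_json)
    missing = [name for name in spec["parameters"] if name not in params]
    if missing:
        raise ValueError(f"missing template parameters: {missing}")
    # string.Template uses $name placeholders, so SPARQL braces and
    # ?variables need no escaping.
    return Template(spec["query"]).substitute(**params)


print(fill_template(
    TEMPLATE_JSON,
    class_iri="http://example.org/ontology#Person",
    limit="10",
))
```

Values that originate from user input should be validated before substitution, since plain string substitution does not protect against malformed or injected SPARQL.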

Query Patterns

Support for various SPARQL query forms:

SELECT: Retrieving specific values
ASK: Yes/no questions
DESCRIBE: Getting all information about a resource
CONSTRUCT: Creating new RDF graphs
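
Downstream steps often need to know which of these four forms a generated query uses, for example to decide whether to expect a table of bindings or a boolean. The sketch below is one possible way to detect the form with regular expressions; `query_form` is a hypothetical helper, and it only skips leading comments and BASE/PREFIX declarations rather than parsing SPARQL fully.

```python
import re

# Leading comments and prologue declarations (BASE / PREFIX) to skip.
_PROLOGUE = re.compile(
    r"\s*(?:#[^\n]*\n|BASE\s*<[^>]*>|PREFIX\s+[^\s:]*:\s*<[^>]*>)",
    re.IGNORECASE,
)
_FORM = re.compile(r"\s*(SELECT|ASK|DESCRIBE|CONSTRUCT)\b", re.IGNORECASE)


def query_form(sparql: str) -> str:
    """Return 'SELECT', 'ASK', 'DESCRIBE', or 'CONSTRUCT' for a query string."""
    pos = 0
    while True:
        match = _PROLOGUE.match(sparql, pos)
        if not match:
            break
        pos = match.end()
    match = _FORM.match(sparql, pos)
    if not match:
        raise ValueError("not a recognised SPARQL query form")
    return match.group(1).upper()


example = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/ontology#>
ASK { ex:Student rdfs:subClassOf ex:Person }
"""
print(query_form(example))
```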

Setup and Installation

Prerequisites

Python 3.9 or higher
Docker and Docker Compose
At least 8GB of RAM recommended
Basic knowledge of Docker commands

1. Docker Setup

First, set up the required services using Docker:

Create a docker-compose.yml file:

version: '3.8'

services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant_data:/qdrant/storage
    restart: unless-stopped

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data
    restart: unless-stopped

  graphdb:
    image: ontotext/graphdb:10.6.0
    ports:
      - "7200:7200"
    environment:
      - GDB_HEAP_SIZE=4g
      - GDB_MIN_MEM=1g
      - GDB_MAX_MEM=4g
    volumes:
      - graphdb_data:/opt/graphdb/home
    restart: unless-stopped

volumes:
  qdrant_data:
  elasticsearch_data:
  graphdb_data:

Start the services:

docker-compose up -d

Verify the services are running:

Qdrant: Visit http://localhost:6333/dashboard
Elasticsearch: Visit http://localhost:9200
GraphDB: Visit http://localhost:7200
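
The same check can be scripted instead of opening each URL in a browser. The sketch below uses only the Python standard library; `is_up` and the `SERVICES` mapping are hypothetical names, and the URLs assume the default port mappings from the docker-compose.yml above.

```python
import urllib.request

# Assumes the default host ports from docker-compose.yml.
SERVICES = {
    "Qdrant": "http://localhost:6333",
    "Elasticsearch": "http://localhost:9200",
    "GraphDB": "http://localhost:7200",
}


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers an HTTP GET with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError both subclass OSError).
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name}: {'up' if is_up(url) else 'DOWN'}")
```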

2. Python Environment Setup

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

3. Environment Configuration

Create a .env file with the following variables:

# OpenAI API
OPENAI_API_KEY=your_openai_api_key

# Database URLs
QDRANT_URL=http://localhost:6333
ELASTICSEARCH_URL=http://localhost:9200
GRAPHDB_URL=http://localhost:7200
GRAPHDB_REPOSITORY=your-repo-name
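
One way these variables might be consumed is sketched below. The `Settings` class and `load_settings` function are hypothetical and are not taken from config/api_config.py. The sketch also assumes the variables are already present in the process environment (for instance exported by the shell or loaded by a dotenv helper); it does not parse the .env file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    qdrant_url: str
    elasticsearch_url: str
    graphdb_url: str
    graphdb_repository: str

    @property
    def sparql_endpoint(self) -> str:
        # GraphDB serves each repository's SPARQL endpoint under /repositories/<id>.
        base = self.graphdb_url.rstrip("/")
        return f"{base}/repositories/{self.graphdb_repository}"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, using local defaults for URLs."""
    required = ("OPENAI_API_KEY", "GRAPHDB_REPOSITORY")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        elasticsearch_url=env.get("ELASTICSEARCH_URL", "http://localhost:9200"),
        graphdb_url=env.get("GRAPHDB_URL", "http://localhost:7200"),
        graphdb_repository=env["GRAPHDB_REPOSITORY"],
    )
```

Failing at startup when a required variable is absent tends to be easier to debug than a failed API call deep inside an agent.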

4. Initialize the System

Start the application:

python main.py

Access the Gradio interface at http://localhost:7860

Troubleshooting

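Port Conflicts: If you get port conflict errors, change the host side (the number before the colon) of the affected port mapping in the docker-compose.yml file, and update the matching URL in your .env file.
Memory Issues: If you encounter out-of-memory errors, adjust the memory settings in the docker-compose.yml file (ES_JAVA_OPTS for Elasticsearch; GDB_HEAP_SIZE, GDB_MIN_MEM, and GDB_MAX_MEM for GraphDB).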
Service Health: Use docker-compose ps to check if all services are running properly.
Cleanup: To remove all containers and volumes:

docker-compose down -v

Usage Examples

Basic Queries

User Query: "What are all the subclasses of Person?"

Generated SPARQL:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/ontology#>

SELECT ?subclass ?label
WHERE {
  ?subclass rdfs:subClassOf ex:Person .
  OPTIONAL { ?subclass rdfs:label ?label }
}
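
A query like this can be sent to any endpoint that implements the SPARQL 1.1 Protocol. The sketch below builds such a request with the standard library; `build_sparql_request` is a hypothetical helper rather than the project's own execution code, and the endpoint URL assumes the GraphDB settings from the .env example.

```python
import urllib.parse
import urllib.request

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/ontology#>

SELECT ?subclass ?label
WHERE {
  ?subclass rdfs:subClassOf ex:Person .
  OPTIONAL { ?subclass rdfs:label ?label }
}
"""


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol query request (POST with URL-encoded form)."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Ask for SELECT/ASK results in the standard JSON format.
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


# Assumed endpoint: GraphDB repository from the .env example.
request = build_sparql_request(
    "http://localhost:7200/repositories/your-repo-name", QUERY
)
# To execute it against a running endpoint:
#   with urllib.request.urlopen(request, timeout=30) as response:
#       payload = response.read().decode("utf-8")
```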

Complex Queries

User Query: "Find all research papers published after 2020 with 'machine learning' in the title and authored by someone from Stanford University"

Generated SPARQL:

PREFIX ex: <http://example.org/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?paper ?title ?date ?author ?authorName
WHERE {
  ?paper a ex:ResearchPaper ;
         ex:title ?title ;
         ex:publicationDate ?date ;
         ex:hasAuthor ?author .
  ?author ex:affiliation ?affiliation .
  ?affiliation rdfs:label ?affLabel .
  OPTIONAL { ?author ex:name ?authorName }
  
  FILTER (CONTAINS(LCASE(?title), "machine learning"))
  FILTER (?date >= "2021-01-01"^^xsd:date)
  FILTER (CONTAINS(LCASE(?affLabel), "stanford university"))
}
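
Endpoints typically return SELECT results in the SPARQL 1.1 Query Results JSON format, where variables inside OPTIONAL (such as ?authorName above) may be absent from a row. The sketch below flattens such a response into plain dictionaries; `rows_from_results` is a hypothetical helper, and the sample response is invented for illustration rather than real query output.

```python
import json
from typing import Dict, List, Optional

# Invented sample response in the SPARQL 1.1 Query Results JSON format.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["paper", "title", "authorName"]},
  "results": {"bindings": [
    {"paper": {"type": "uri", "value": "http://example.org/paper/1"},
     "title": {"type": "literal", "value": "Machine Learning for Graphs"},
     "authorName": {"type": "literal", "value": "A. Author"}},
    {"paper": {"type": "uri", "value": "http://example.org/paper/2"},
     "title": {"type": "literal", "value": "Scalable Machine Learning"}}
  ]}
}
"""


def rows_from_results(results_json: str) -> List[Dict[str, Optional[str]]]:
    """Flatten bindings to {variable: value}, using None for unbound variables."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {var: (binding[var]["value"] if var in binding else None) for var in variables}
        for binding in doc["results"]["bindings"]
    ]


for row in rows_from_results(SAMPLE_RESPONSE):
    print(row["title"], "by", row["authorName"] or "unknown author")
```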

Extending the System

Adding New Ontologies

To support new knowledge domains:

Load new ontology files or configure access to ontology endpoints
Update entity recognition patterns for domain-specific terminology
Add domain-specific SPARQL templates

Improving Query Understanding

To enhance natural language understanding:

Add more example mappings between natural language and SPARQL patterns
Fine-tune entity recognition for specific domains
Expand the template library with variations of common query patterns

Architecture Advantages

Modularity: Each agent handles a specific task, enabling focused development and testing
Extensibility: Easy to add support for new ontologies or query patterns
Quality Control: Validation ensures syntactically and semantically correct SPARQL
Explainability: System can show the mapping from natural language to formal query elements


endjin/Multi-Agents-NL2SPARQL
