This repository was archived by the owner on Nov 10, 2025. It is now read-only.
122 changes: 122 additions & 0 deletions PR_DESCRIPTION.md
@@ -0,0 +1,122 @@
# Add Neo4jSearchTool for Semantic Search in Neo4j Graph Databases

## Summary

This PR introduces `Neo4jSearchTool`, a RAG-based tool for semantic search over Neo4j graph databases. It extends the existing `RagTool` infrastructure and follows the same pattern as `MySQLSearchTool` and `PGSearchTool`, so CrewAI agents can search graph data with natural-language queries.

## What This PR Adds

- **Neo4jSearchTool**: A semantic search tool for Neo4j databases that executes Cypher queries and enables RAG-based search over graph data
- **Neo4jLoader**: A dedicated loader for Neo4j databases that handles Cypher query execution and result formatting
- **DataType.NEO4J**: New data type enum value for Neo4j integration with proper chunker and loader mappings
- **Comprehensive test suite**: Full test coverage with 6 test cases validating initialization, data addition, query execution, and edge cases
- **Documentation**: Complete README.md with usage examples, configuration options, and connection URI formats

## Key Features

✅ **Semantic Search Over Graph Data**: Leverages RAG technology to enable natural language queries over Neo4j graph databases
✅ **Cypher Query Support**: Executes Cypher queries to extract nodes, relationships, and properties from Neo4j
✅ **Flexible Connection Options**: Supports the `bolt://` and `neo4j://` URI schemes and their TLS/SSL variants `bolt+s://` and `neo4j+s://`
✅ **Customizable LLM/Embeddings**: Full support for custom model providers and embedding configurations
✅ **RAG Integration**: Seamlessly integrates with existing CrewAI RAG infrastructure for vector search and retrieval

## Implementation Details

### Core Components

1. **Neo4jSearchTool** (`crewai_tools/tools/neo4j_search_tool/neo4j_search_tool.py`)
- Extends `RagTool` class
- Handles Neo4j connection credentials (URI, user, password)
- Manages Cypher query execution and semantic search

2. **Neo4jLoader** (`crewai_tools/rag/loaders/neo4j_loader.py`)
- Implements `BaseLoader` interface
- Executes Cypher queries using Neo4j Python driver
- Formats query results into structured text for RAG processing
- Supports secure connections with optional TLS/SSL

3. **DataType Integration**
- Added `NEO4J` enum value to `DataType`
- Configured chunker mapping (uses TextChunker)
- Configured loader mapping (uses Neo4jLoader)
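
The result formatting described for `Neo4jLoader` can be sketched in isolation. This is a minimal illustration that mirrors the loader in this PR rather than replacing it: `format_records` and `MAX_CHARS` are hypothetical names, rows are plain dictionaries instead of driver records, and the two-space indent before each key is an assumption.

```python
from typing import Any, Mapping, Sequence

MAX_CHARS = 100_000  # same cut-off the loader in this PR applies


def format_records(
    records: Sequence[Mapping[str, Any]], max_chars: int = MAX_CHARS
) -> str:
    """Render query rows as plain text, skipping None values (hypothetical helper)."""
    if not records:
        return "No data found from the query"

    parts = [f"Total records: {len(records)}", ""]
    for i, record in enumerate(records, 1):
        parts.append(f"Record {i}:")
        for key, value in record.items():
            if value is not None:
                parts.append(f"  {key}: {value}")
        parts.append("")  # blank line between records

    content = "\n".join(parts)
    if len(content) > max_chars:
        # Keep the document bounded so one large query cannot flood the index.
        content = content[:max_chars] + "\n\n[Content truncated...]"
    return content


rows = [{"person": "Ann", "friend": "Bob"}, {"person": "Cy", "friend": None}]
print(format_records(rows))
```

Flattening each row into labelled lines keeps the column names next to their values, which gives the text chunker and the embedder some context for each record.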

### Dependencies

- Added `neo4j>=5.0.0` as an optional dependency in `pyproject.toml`
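- The `neo4j` import is wrapped in `try`/`except ImportError`; if the package is not installed, `Neo4jLoader.load` raises an `ImportError` with install instructions instead of failing at import time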
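
The optional-dependency handling can be illustrated with a small, generic sketch. The helper names `optional_import` and `require` are hypothetical and do not appear in this PR; the loader itself uses a plain `try`/`except ImportError` around `from neo4j import GraphDatabase`.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module if it is installed, otherwise None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def require(module: Optional[ModuleType], package: str) -> ModuleType:
    """Raise a helpful ImportError at call time if the optional module is missing."""
    if module is None:
        raise ImportError(
            f"The {package} package is required for this feature. "
            f"Install it with: pip install {package}"
        )
    return module


# Importing this file never fails; only code paths that need the driver do.
neo4j = optional_import("neo4j")
```

Deferring the error to the point of use means users who never touch Neo4j are unaffected by the missing package.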

## Testing

The PR includes comprehensive test coverage in `tests/tools/test_neo4j_search_tool.py`:

- ✅ Tool initialization with connection parameters
- ✅ Adding data via Cypher queries
- ✅ Running semantic search queries
- ✅ Custom similarity threshold and limit parameters
- ✅ Handling empty/no results scenarios
- ✅ Description generation

All tests run against mocked Neo4j connections, so no database instance is required, and all of them pass.
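
As a rough idea of how a mocked connection can stand in for a database, here is a sketch built only on `unittest.mock`. It is not taken from the test suite in this PR: `fetch_rows` and `make_fake_driver` are hypothetical helpers, and the real tests may patch the driver differently.

```python
from unittest.mock import MagicMock


def fetch_rows(driver, query: str) -> list:
    """Hypothetical helper: run a query through a session and collect the rows."""
    try:
        with driver.session() as session:
            return list(session.run(query))
    finally:
        driver.close()


def make_fake_driver(rows: list) -> MagicMock:
    """Build a stand-in for a Neo4j driver that yields `rows` for any query."""
    session = MagicMock()
    session.run.return_value = rows
    driver = MagicMock()
    # driver.session() is used as a context manager, so configure __enter__.
    driver.session.return_value.__enter__.return_value = session
    return driver


def test_fetch_rows_uses_session_and_closes_driver():
    driver = make_fake_driver([{"person": "Ann", "friend": "Bob"}])
    rows = fetch_rows(driver, "MATCH (n) RETURN n")
    assert rows == [{"person": "Ann", "friend": "Bob"}]
    driver.close.assert_called_once()


test_fetch_rows_uses_session_and_closes_driver()
```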

## Usage Example

```python
from crewai_tools import Neo4jSearchTool

# Initialize the tool
tool = Neo4jSearchTool(
    neo4j_uri='bolt://localhost:7687',
    neo4j_user='neo4j',
    neo4j_password='your_password'
)

# Add data from a Cypher query
tool.add("MATCH (n:Person)-[:KNOWS]->(f:Person) RETURN n.name as person, f.name as friend")

# Perform semantic search
result = tool._run(
    search_query="Find people who know others",
    similarity_threshold=0.7,
    limit=10
)
```

## Files Changed

### New Files
- `crewai_tools/tools/neo4j_search_tool/neo4j_search_tool.py` - Main tool implementation
- `crewai_tools/tools/neo4j_search_tool/README.md` - Comprehensive documentation
- `crewai_tools/rag/loaders/neo4j_loader.py` - Neo4j database loader
- `tests/tools/test_neo4j_search_tool.py` - Test suite

### Modified Files
- `crewai_tools/rag/data_types.py` - Added NEO4J data type enum
- `crewai_tools/__init__.py` - Added Neo4jSearchTool export
- `crewai_tools/tools/__init__.py` - Added Neo4jSearchTool import
- `pyproject.toml` - Added neo4j optional dependency

## Benefits

1. **Extends Database Tool Support**: Adds graph database support alongside existing relational database tools (MySQL, PostgreSQL)
2. **Semantic Search for Graphs**: Enables intelligent querying of graph data using natural language, not just structured Cypher queries
3. **Consistent API**: Follows the same patterns as existing database search tools for easy adoption
4. **Production Ready**: Includes error handling, secure connection support, and comprehensive testing

## Compatibility

- ✅ Backward compatible - No breaking changes to existing functionality
- ✅ Follows existing patterns - Consistent with MySQLSearchTool and PGSearchTool
- ✅ Optional dependency - Neo4j support requires explicit installation (`pip install neo4j` or `pip install 'crewai-tools[neo4j]'`)

## Testing Instructions

```bash
# Install test dependencies
pip install neo4j pytest

# Run tests
pytest tests/tools/test_neo4j_search_tool.py -v
```

All 6 tests should pass.

1 change: 1 addition & 0 deletions crewai_tools/__init__.py
@@ -51,6 +51,7 @@
MongoDBVectorSearchTool,
MultiOnTool,
MySQLSearchTool,
Neo4jSearchTool,
NL2SQLTool,
OCRTool,
OxylabsAmazonProductScraperTool,
3 changes: 3 additions & 0 deletions crewai_tools/rag/data_types.py
@@ -17,6 +17,7 @@ class DataType(str, Enum):
# Database types
MYSQL = "mysql"
POSTGRES = "postgres"
NEO4J = "neo4j"

# Repository types
GITHUB = "github"
@@ -55,6 +56,7 @@ def get_chunker(self) -> BaseChunker:
DataType.DOCS_SITE: ("text_chunker", "TextChunker"),
DataType.MYSQL: ("text_chunker", "TextChunker"),
DataType.POSTGRES: ("text_chunker", "TextChunker"),
DataType.NEO4J: ("text_chunker", "TextChunker"),
}

if self not in chunkers:
Expand Down Expand Up @@ -88,6 +90,7 @@ def get_loader(self) -> BaseLoader:
DataType.DOCS_SITE: ("docs_site_loader", "DocsSiteLoader"),
DataType.MYSQL: ("mysql_loader", "MySQLLoader"),
DataType.POSTGRES: ("postgres_loader", "PostgresLoader"),
DataType.NEO4J: ("neo4j_loader", "Neo4jLoader"),
}

if self not in loaders:
97 changes: 97 additions & 0 deletions crewai_tools/rag/loaders/neo4j_loader.py
@@ -0,0 +1,97 @@
"""Neo4j database loader."""

from typing import Any
from urllib.parse import urlparse

try:
    from neo4j import GraphDatabase
except ImportError:
    GraphDatabase = None

from crewai_tools.rag.base_loader import BaseLoader, LoaderResult
from crewai_tools.rag.source_content import SourceContent


class Neo4jLoader(BaseLoader):
    """Loader for Neo4j database content."""

    def load(self, source: SourceContent, **kwargs) -> LoaderResult:
        """Load content from a Neo4j database using a Cypher query.

        Args:
            source: Cypher query string
            **kwargs: Additional arguments; the ``metadata`` dict must contain
                neo4j_uri, neo4j_user, and neo4j_password

        Returns:
            LoaderResult with database content
        """
        if GraphDatabase is None:
            raise ImportError(
                "The neo4j package is required to use Neo4jLoader. "
                "Install it with: pip install neo4j"
            )

        metadata = kwargs.get("metadata", {})
        neo4j_uri = metadata.get("neo4j_uri")
        neo4j_user = metadata.get("neo4j_user")
        neo4j_password = metadata.get("neo4j_password")

        if not neo4j_uri or not neo4j_user or not neo4j_password:
            raise ValueError("Neo4j URI, user, and password are required for Neo4j loader")

        query = source.source

        parsed = urlparse(neo4j_uri)
        if parsed.scheme not in ["bolt", "neo4j", "bolt+s", "neo4j+s"]:
            raise ValueError(f"Invalid Neo4j URI scheme: {parsed.scheme}")

        connection_params = {
            "uri": neo4j_uri,
            "auth": (neo4j_user, neo4j_password)
        }

        try:
            driver = GraphDatabase.driver(**connection_params)
            try:
                with driver.session() as session:
                    result = session.run(query)
                    records = list(result)

                    if not records:
                        content = "No data found from the query"
                        return LoaderResult(
                            content=content,
                            metadata={"source": query, "record_count": 0},
                            doc_id=self.generate_doc_id(source_ref=query, content=content)
                        )

                    text_parts = []
                    text_parts.append(f"Total records: {len(records)}")
                    text_parts.append("")

                    for i, record in enumerate(records, 1):
                        text_parts.append(f"Record {i}:")
                        for key in record.keys():
                            value = record[key]
                            if value is not None:
                                text_parts.append(f" {key}: {value}")
                        text_parts.append("")

                    content = "\n".join(text_parts)

                    if len(content) > 100000:
                        content = content[:100000] + "\n\n[Content truncated...]"

                    return LoaderResult(
                        content=content,
                        metadata={
                            "source": query,
                            "record_count": len(records),
                        },
                        doc_id=self.generate_doc_id(source_ref=query, content=content)
                    )
            finally:
                # Always release the driver, even if the query fails.
                driver.close()
        except Exception as e:
            # Chain the original exception so the driver traceback is preserved.
            raise ValueError(f"Neo4j database error: {e}") from e

1 change: 1 addition & 0 deletions crewai_tools/tools/__init__.py
@@ -56,6 +56,7 @@
)
from .multion_tool.multion_tool import MultiOnTool
from .mysql_search_tool.mysql_search_tool import MySQLSearchTool
from .neo4j_search_tool.neo4j_search_tool import Neo4jSearchTool
from .nl2sql.nl2sql_tool import NL2SQLTool
from .ocr_tool.ocr_tool import OCRTool
from .oxylabs_amazon_product_scraper_tool.oxylabs_amazon_product_scraper_tool import (
123 changes: 123 additions & 0 deletions crewai_tools/tools/neo4j_search_tool/README.md
@@ -0,0 +1,123 @@
# Neo4jSearchTool

## Description
This tool performs semantic search over Neo4j graph databases. Using Retrieval-Augmented Generation (RAG), the Neo4jSearchTool indexes the results of Cypher queries and lets users search that content with natural-language queries. It is useful when you need to search graph data made up of nodes, relationships, and properties without writing a structured Cypher query for every question.

## Installation
To install the `crewai_tools` package, execute the following command in your terminal:

```shell
pip install 'crewai[tools]'
```

Or install with the Neo4j extra, which also installs the optional Neo4j driver:

```shell
pip install 'crewai-tools[neo4j]'
```

Or install the required dependencies manually:

```shell
pip install 'neo4j>=5.0.0'
```

## Example
Below is an example showcasing how to use the Neo4jSearchTool to conduct a semantic search on a Neo4j database:

```python
from crewai_tools import Neo4jSearchTool

# Initialize the tool with Neo4j connection details
tool = Neo4jSearchTool(
    neo4j_uri='bolt://localhost:7687',
    neo4j_user='neo4j',
    neo4j_password='your_password'
)

# Execute a semantic search query
result = tool._run(
    search_query="Find all users who follow John",
    similarity_threshold=0.7,
    limit=10
)
print(result)
```

## Arguments
The Neo4jSearchTool accepts the following arguments:

- `neo4j_uri`: A string representing the URI of the Neo4j database (e.g., `bolt://localhost:7687` or `neo4j://localhost:7687`). This argument is mandatory.
- `neo4j_user`: A string specifying the username for Neo4j database authentication. This argument is mandatory.
- `neo4j_password`: A string specifying the password for Neo4j database authentication. This argument is mandatory.
- `search_query`: A string containing the semantic search query you want to perform. It is passed when calling the `_run()` method.
- `similarity_threshold` (optional): A float between 0 and 1 specifying the minimum similarity score for results. Defaults to 0.6.
- `limit` (optional): An integer specifying the maximum number of results to return. Defaults to 5.
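
To make the effect of `similarity_threshold` and `limit` concrete, here is a minimal sketch of how such parameters are commonly applied to scored results. It is an illustration only, not the tool's actual implementation: `select_results` is a hypothetical function, and it assumes scores lie between 0 and 1 with higher values meaning closer matches.

```python
from typing import List, Tuple


def select_results(
    scored: List[Tuple[str, float]],
    similarity_threshold: float = 0.6,
    limit: int = 5,
) -> List[str]:
    """Keep results at or above the threshold, best first, capped at `limit`."""
    kept = [(text, score) for text, score in scored if score >= similarity_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [text for text, _ in kept[:limit]]


candidates = [("Ann knows Bob", 0.91), ("Cy knows Dee", 0.42), ("Eve knows Ann", 0.73)]
print(select_results(candidates, similarity_threshold=0.7, limit=10))
```

Raising the threshold trades recall for precision, while `limit` bounds how much text is handed back to the agent.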

## Usage with Cypher Queries

The tool uses Cypher queries to extract data from your Neo4j database. When you add data using `tool.add()`, it executes your Cypher query and stores the results for semantic search:

```python
# Add data from a Cypher query
tool.add("MATCH (n:Person)-[:KNOWS]->(f:Person) RETURN n.name as person, f.name as friend")

# Now you can search semantically
result = tool._run(search_query="Find people who know others")
```

## Custom model and embeddings

By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:

```python
tool = Neo4jSearchTool(
    neo4j_uri='bolt://localhost:7687',
    neo4j_user='neo4j',
    neo4j_password='your_password',
    config=dict(
        llm=dict(
            provider="ollama",  # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google",
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```

## Connection URI Formats

The `neo4j_uri` parameter supports several connection schemes:

- `bolt://` - Bolt protocol (recommended for most use cases)
- `neo4j://` - Neo4j URI scheme with Bolt
- `bolt+s://` - Bolt over TLS/SSL
- `neo4j+s://` - Neo4j URI scheme with Bolt over TLS/SSL

Examples:
```python
# Local database
neo4j_uri='bolt://localhost:7687'

# Remote database
neo4j_uri='bolt://neo4j.example.com:7687'

# Secure connection
neo4j_uri='bolt+s://neo4j.example.com:7687'

# Neo4j Aura (managed cloud service)
neo4j_uri='neo4j+s://your-instance.databases.neo4j.io:7687'
```
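
If you want to check a URI before constructing the tool, the scheme test can be sketched with the standard library. This mirrors the four schemes listed above; `validate_neo4j_uri` and `ALLOWED_SCHEMES` are hypothetical names used only for illustration.

```python
from urllib.parse import urlparse

# The four schemes listed in this README.
ALLOWED_SCHEMES = {"bolt", "neo4j", "bolt+s", "neo4j+s"}


def validate_neo4j_uri(uri: str) -> str:
    """Return the URI scheme, or raise ValueError if it is not supported."""
    scheme = urlparse(uri).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"Invalid Neo4j URI scheme: {scheme}")
    return scheme


print(validate_neo4j_uri("bolt+s://neo4j.example.com:7687"))
```

Note that the Neo4j driver also defines `bolt+ssc://` and `neo4j+ssc://` for self-signed certificates; those are not in the list above, so this check rejects them.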
