
Commit 927dc44

Struct2graph microservice for HybridRAG (#1502)
Signed-off-by: siddhivelankar23 <[email protected]>
1 parent 5bd5309 commit 927dc44

File tree

15 files changed, +767 -0 lines changed

Lines changed: 9 additions & 0 deletions

```yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# this file should be run in the root of the repo
services:
  struct2graph:
    build:
      dockerfile: comps/struct2graph/src/Dockerfile
    image: ${REGISTRY:-opea}/struct2graph:${TAG:-latest}
```

comps/cores/mega/constants.py

Lines changed: 1 addition & 0 deletions

```diff
@@ -36,6 +36,7 @@ class ServiceType(Enum):
     TEXT2SQL = 19
     TEXT2GRAPH = 20
     TEXT2CYPHER = 21
+    STRUCT2GRAPH = 23


 class MegaServiceEndpoint(Enum):
```
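The new enum value slots into the usual GenAIComps registration flow. Below is a minimal sketch of how a service might reference `ServiceType.STRUCT2GRAPH` when registering itself with `register_microservice`; the service name and handler body are illustrative assumptions, not the code added by this commit, while the endpoint path and port match the README examples.

```python
# Illustrative sketch only: the decorator pattern mirrors other GenAIComps
# services, but the registration name and handler below are assumptions.
from comps import ServiceType, opea_microservices, register_microservice


@register_microservice(
    name="opea_service@struct2graph",       # hypothetical registration name
    service_type=ServiceType.STRUCT2GRAPH,  # the enum value added above
    endpoint="/v1/struct2graph",            # endpoint used in the README examples
    host="0.0.0.0",
    port=8090,                              # default STRUCT2GRAPH_PORT
)
async def handle(request: dict) -> dict:
    # Dispatch on the requested task ("Index" or "Query"); the real service
    # forwards this to the Neo4j-backed graph preparation/query logic.
    return {"task": request.get("task", "Query"), "status": "accepted"}


if __name__ == "__main__":
    opea_microservices["opea_service@struct2graph"].start()
```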

comps/struct2graph/deployment/docker_compose/README.md

Whitespace-only changes.
Lines changed: 38 additions & 0 deletions

```yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

include:
  - ../../../third_parties/neo4j/deployment/docker_compose/compose.yaml

services:
  struct2graph:
    image: opea/struct2graph:latest
    container_name: struct2graph
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - NEO4J_URL=${NEO4J_URL}
      - NEO4J_server_directories_import=import
      - NEO4J_PLUGINS=["apoc"]
      - NEO4J_dbms_security_allow__csv__import__from__file__urls=true
      - NEO4J_server_directories_import='/var/lib/neo4j/import'
      - NEO4J_dbms_security_procedures_unrestricted=apoc.\\\* neo4j:5.23.0
    ports:
      - ${STRUCT2GRAPH_PORT:-8090}:8090
    depends_on:
      neo4j-apoc:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7474"]
      interval: 10s
      timeout: 5s
      retries: 10
      start_period: 30s
    ipc: host
    network_mode: host
    restart: always

networks:
  default:
    driver: bridge
```
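The compose file gates struct2graph on the `neo4j-apoc` healthcheck, which simply curls the Neo4j HTTP port. For quick manual verification outside compose, a rough Python equivalent of that healthcheck might look like the sketch below, assuming host networking and the default port 7474.

```python
# Rough Python equivalent of the compose healthcheck above: poll the Neo4j
# HTTP endpoint until it answers 200, using the same interval/retry budget.
# Assumes host networking and the default port 7474.
import time

import requests


def wait_for_neo4j(url: str = "http://localhost:7474", retries: int = 10, interval: float = 10.0) -> bool:
    for attempt in range(1, retries + 1):
        try:
            if requests.get(url, timeout=5).status_code == 200:
                print(f"Neo4j is up (attempt {attempt})")
                return True
        except requests.RequestException:
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    if not wait_for_neo4j():
        raise SystemExit("Neo4j did not become healthy; struct2graph will not start")
```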

comps/struct2graph/src/Dockerfile

Lines changed: 37 additions & 0 deletions

```dockerfile
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

FROM ubuntu:22.04

WORKDIR /home/graph_extract

FROM python:3.11-slim
ENV LANG=C.UTF-8
ARG ARCH=cpu

RUN apt-get update -y && apt-get install vim -y && apt-get install -y --no-install-recommends --fix-missing \
    build-essential

RUN useradd -m -s /bin/bash user && \
    mkdir -p /home/user && \
    chown -R user /home/user/

COPY comps /home/user/comps

RUN pip install --no-cache-dir --upgrade pip setuptools && \
    if [ ${ARCH} = "cpu" ]; then \
        pip install --no-cache-dir --extra-index-url https://download.pytorch.org/whl/cpu -r /home/user/comps/struct2graph/src/requirements.txt; \
    else \
        pip install --no-cache-dir -r /home/user/comps/struct2graph/src/requirements.txt; \
    fi

ENV https_proxy=${https_proxy}
ENV http_proxy=${http_proxy}
ENV no_proxy=${no_proxy}
ENV PYTHONPATH="/home/user/":$PYTHONPATH

USER user

WORKDIR /home/user/comps/struct2graph/src/

ENTRYPOINT ["python", "opea_struct2graph_microservice.py"]
```

comps/struct2graph/src/README.md

Lines changed: 128 additions & 0 deletions

# Struct2Graph Microservice

The Struct2Graph microservice transforms structured data formats such as CSV and JSON into Neo4j graph structures, serving as a bridge between traditional data sources and modern graph-based systems. This makes it possible to enrich existing graphs, perform advanced data analysis, and construct comprehensive knowledge graphs.

By importing structured data, users can integrate it into RAG flows and enhance querying capabilities to uncover patterns and relationships across large datasets. It is particularly useful for populating databases, creating hierarchical structures, and enabling cross-document querying. This approach also supports data integration, providing a solid foundation for sophisticated graph-based applications that exploit the rich relationships and properties inherent in graph data structures.

## Features

To convert structured data from CSV and JSON, we provide the following interface.

Input:

```
{
  "input_text": "string",
  "task": "string",
  "cypher_cmd": "string"
}
```

Output: Directory with results to query.

The task can be set to one of the following:

1. Index - generates an index based on the Cypher command (Output: generated index)
2. Query - queries the index based on the input text (Output: directory with results to query)

## Implementation

The struct2graph microservice loads and queries structured data through Neo4j. The service runs in a Docker container, started either with docker build + docker run or with Docker Compose.

## 🚀1. Start Microservice with docker run

### Install Requirements

```bash
pip install -r requirements.txt
```

### Export environment variables

```bash
cd comps/struct2graph/src/
source environment_setup.sh
```

OR

```bash
export https_proxy=${https_proxy}
export http_proxy=${http_proxy}
export no_proxy=${no_proxy}
export INDEX_NAME=${INDEX_NAME:-"graph_store"}
export PYTHONPATH="/home/user/"
export NEO4J_USERNAME=${NEO4J_USERNAME:-"neo4j"}
export NEO4J_PASSWORD=${NEO4J_PASSWORD:-"neo4j_password"}
export NEO4J_URL=${NEO4J_URL:-"neo4j://neo4j-apoc:7687"}
export DATA_DIRECTORY=${DATA_DIRECTORY:-data}
export STRUCT2GRAPH_PORT=8090
export LOAD_FORMAT="CSV" # or JSON
```

### Launch Neo4j Service

Refer to [this link](https://github.com/opea-project/GenAIComps/blob/main/comps/third_parties/neo4j/src/README.md) to start and verify the Neo4j microservice.

### Verify the Neo4j Service

```bash
curl -v http://localhost:7474
```

If the Neo4j server is running correctly, the response should include an HTTP status code of 200 OK. Any other status code or an error message indicates that the server is not running or is not accessible. If port 7474 is mapped to another port, change the port in the command accordingly.

### Start struct2graph Microservice with Docker

Command to build the struct2graph microservice:

```bash
docker build -f Dockerfile -t opea/struct2graph:latest ../../../
```

Command to run the struct2graph microservice:

```bash
docker run -i -t --net=host --ipc=host -p ${STRUCT2GRAPH_PORT}:8090 opea/struct2graph:latest
```

This launches the struct2graph microservice interactively.

## 🚀2. Start Microservice with docker compose

Export environment variables as described in option 1.

Command to run docker compose:

```bash
cd GenAIComps/tests/struct2graph/deployment/docker_compose

docker compose -f struct2graph-compose.yaml up
```

## 3. Validate the service using the API endpoint

Example for the "Index" task:

```bash
curl -X POST http://localhost:$STRUCT2GRAPH_PORT/v1/struct2graph \
    -H "accept: application/json" \
    -H "Content-Type: application/json" \
    -d '{
         "input_text": "",
         "task": "Index",
         "cypher_cmd": "LOAD CSV WITH HEADERS FROM \'file:///$DATA_DIRECTORY/test1.csv\' AS row CREATE (:Person {ID: toInteger(row.ID), Name: row.Name, Age: toInteger(row.Age), City: row.City})"
       }'
```

Example for the "Query" task:

```bash
curl -X POST http://localhost:$STRUCT2GRAPH_PORT/v1/struct2graph \
    -H "accept: application/json" \
    -H "Content-Type: application/json" \
    -d '{
         "input_text": "MATCH (p:Person {Name:\'Alice\'}) RETURN p",
         "task": "Query",
         "cypher_cmd": ""
       }'
```
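The same endpoint can be exercised programmatically. The sketch below mirrors the two curl examples, assuming the service is reachable on localhost at `STRUCT2GRAPH_PORT` (8090 by default) and responds with JSON.

```python
# A minimal Python client equivalent to the curl examples above.
import os

import requests

PORT = os.getenv("STRUCT2GRAPH_PORT", "8090")
URL = f"http://localhost:{PORT}/v1/struct2graph"

# Index task: load a CSV file into Neo4j via a Cypher LOAD CSV command.
index_payload = {
    "input_text": "",
    "task": "Index",
    "cypher_cmd": (
        "LOAD CSV WITH HEADERS FROM 'file:///test1.csv' AS row "
        "CREATE (:Person {ID: toInteger(row.ID), Name: row.Name, "
        "Age: toInteger(row.Age), City: row.City})"
    ),
}
print(requests.post(URL, json=index_payload, timeout=60).json())

# Query task: retrieve nodes matching a Cypher pattern.
query_payload = {
    "input_text": "MATCH (p:Person {Name:'Alice'}) RETURN p",
    "task": "Query",
    "cypher_cmd": "",
}
print(requests.post(URL, json=query_payload, timeout=60).json())
```
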
Lines changed: 34 additions & 0 deletions

```bash
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

#######################################################################
# Proxy
#######################################################################
export https_proxy=${https_proxy}
export http_proxy=${http_proxy}
export no_proxy=${no_proxy}
################################################################
# Configure LLM parameters based on the model selected.
################################################################
export INDEX_NAME=${INDEX_NAME:-"graph_store"}
export PYTHONPATH="/home/user/"
export NEO4J_USERNAME=${NEO4J_USERNAME:-"neo4j"}
export NEO4J_PASSWORD=${NEO4J_PASSWORD:-"neo4j_password"}
export NEO4J_URL=${NEO4J_URL:-"neo4j://neo4j-apoc:7687"}
export DATA_DIRECTORY=${DATA_DIRECTORY:-data}
export FILENAME=${FILENAME:-test1.csv}
export LOAD_FORMAT=${LOAD_FORMAT:-"CSV"}


export CYPHER_CSV_CMD="LOAD CSV WITH HEADERS FROM 'file:////test1.csv' AS row \
CREATE (:Person {ID: toInteger(row.ID), Name: row.Name, Age: toInteger(row.Age), City: row.City});"
export CYPHER_JSON_CMD=" \
CALL apoc.load.json(\"file:///test1.json\") YIELD value \
UNWIND value.table AS row \
CREATE (:Person { \
ID: row.ID, \
Name: row.Name, \
Age: row.Age, \
City: row.City \
}); \
"
```
Lines changed: 99 additions & 0 deletions

```python
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import logging
import os

from langchain_neo4j import Neo4jGraph

from comps import CustomLogger

logger = CustomLogger("opea_struct2graph")


class PrepareGraphDB:
    """A class for preparing and saving a GraphDB."""

    def __init__(self):
        self.graph_store = self.neo4j_link()

    def neo4j_link(self):
        NEO4J_URL = os.getenv("NEO4J_URL")
        NEO4J_USERNAME = os.getenv("NEO4J_USERNAME")
        NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")
        NEO4J_DATABASE = os.getenv("NEO4J_DATABASE")

        if not all([NEO4J_URL, NEO4J_USERNAME, NEO4J_PASSWORD]):
            raise EnvironmentError("Missing required Neo4j environment variables")

        graph_store = Neo4jGraph(username=NEO4J_USERNAME, password=NEO4J_PASSWORD, url=NEO4J_URL)
        return graph_store

    def cleanup_neo4j(self):
        try:
            cypher = """MATCH (n) DETACH DELETE n"""
            self.graph_store.query(cypher)

            logger.info("## Existing graph_store schema...")
            logger.info(self.graph_store.schema)

            logger.info("Deleting all nodes...")
            cypher = """MATCH (n) RETURN count(n)"""
            result = self.graph_store.query(cypher)

            logger.info("Dropping all constraints...")
            for constraint in self.graph_store.query("SHOW CONSTRAINTS"):
                self.graph_store.query(f"DROP CONSTRAINT {constraint['name']}")

            logger.info("Dropping all indexes...")
            for index in self.graph_store.query("SHOW INDEXES"):
                logger.info(f"Removing index {index['name']}:")
                self.graph_store.query(f"""DROP INDEX `{index['name']}`""")

            logger.info("## Blank schema...")
            self.graph_store.refresh_schema()
            logger.info(self.graph_store.schema)
            return

        except Exception as e:
            logger.error(f"Failed to cleanup Neo4j database: {str(e)}")
            raise

    def load_graphdb(self, cypher_cmd):
        LOAD_FORMAT = os.getenv("LOAD_FORMAT", "CSV")

        try:
            if LOAD_FORMAT == "CSV":
                cypher_csv_insert = cypher_cmd
                logger.info(f"INSERTING CSV Cypher command : {cypher_csv_insert}")
                logger.info("Preparing graphdb...")
                self.graph_store.query(cypher_csv_insert)
                logger.info("GraphDB is created and saved.")

            elif LOAD_FORMAT == "JSON":
                cypher_json_insert = cypher_cmd
                logger.info(f"INSERTING JSON Cypher command : {cypher_json_insert}")
                self.graph_store.query(cypher_json_insert)
                logger.info(f"The following is the graph schema \n\n {self.graph_store.schema}")
                logger.info("GraphDB is created and saved.")

            else:
                logger.error("Only CSV and JSON formats are supported")
                raise ValueError("Only CSV and JSON formats are supported")

            logger.info("Preparing graphdb...")
            return self.graph_store

        except NameError:
            raise ValueError("Error: The variable CYPHER_CSV_CMD is not set.")

    def prepare_insert_graphdb(self, cypher_cmd):
        logger.info("Cleaning up graph db")
        self.cleanup_neo4j()
        logger.info("Done cleaning up graph db")
        self.load_graphdb(cypher_cmd)
        logger.info("Completed inserting into graphdb")
        logger.info(f"The following is the graph schema \n\n {self.graph_store.schema}")
        logger.info("Preparing graphdb...")
        logger.info("GraphDB is created and saved.")
        return self.graph_store
```
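For reference, a hedged usage sketch of the class above: it assumes `NEO4J_URL`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` are exported (for example via `environment_setup.sh`), that an APOC-enabled Neo4j instance is reachable, and that `PrepareGraphDB` is imported from wherever this module lives in the tree (the import path shown is hypothetical).

```python
# Usage sketch only. The module path below is hypothetical; import
# PrepareGraphDB from the actual location of the file in this commit.
import os

from prepare_graph_db import PrepareGraphDB  # hypothetical module name

# A LOAD CSV command like the one used in environment_setup.sh / the README.
cypher_csv_cmd = os.getenv(
    "CYPHER_CSV_CMD",
    "LOAD CSV WITH HEADERS FROM 'file:///test1.csv' AS row "
    "CREATE (:Person {ID: toInteger(row.ID), Name: row.Name, "
    "Age: toInteger(row.Age), City: row.City});",
)

db = PrepareGraphDB()                                    # connects to Neo4j using the NEO4J_* env vars
graph_store = db.prepare_insert_graphdb(cypher_csv_cmd)  # wipes the DB, then loads the CSV
print(graph_store.schema)                                # inspect the resulting graph schema
```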
