23 changes: 23 additions & 0 deletions comps/dataprep/deployment/docker_compose/compose.yaml
@@ -16,6 +16,7 @@ include:
  - ../../../third_parties/vllm/deployment/docker_compose/compose.yaml
  - ../../../third_parties/arangodb/deployment/docker_compose/compose.yaml
  - ../../../third_parties/mariadb/deployment/docker_compose/compose.yaml
  - ../../../third_parties/opengauss/deployment/docker_compose/compose.yaml

services:

@@ -191,6 +192,28 @@ services:
    security_opt:
      - no-new-privileges:true

  dataprep-opengauss:
    image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
    container_name: dataprep-opengauss-server
    ports:
      - "${DATAPREP_PORT:-5000}:5000"
    depends_on:
      opengauss-db:
        condition: service_healthy
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      DATAPREP_COMPONENT_NAME: "OPEA_DATAPREP_OPENGAUSS"
      GS_CONNECTION_STRING: ${GS_CONNECTION_STRING}
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:5000/v1/health_check || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 10
    restart: unless-stopped

  dataprep-pgvector:
    image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
    container_name: dataprep-pgvector-server
109 changes: 109 additions & 0 deletions comps/dataprep/src/README_opengauss.md
@@ -0,0 +1,109 @@
# Dataprep Microservice with openGauss

## Table of contents

1. [🚀1. Start Microservice with Docker](#1-start-microservice-with-docker)
2. [🚀2. Consume Microservice](#2-consume-microservice)

## 🚀1. Start Microservice with Docker

### 1.1 Start openGauss

To start the openGauss database, follow the [openGauss README](../../third_parties/opengauss/src/README.md).

### 1.2 Setup Environment Variables

```bash
export GS_CONNECTION_STRING=opengauss+psycopg2://gaussdb:openGauss@123@${your_ip}:5432/postgres
export INDEX_NAME=${your_index_name}
export TEI_EMBEDDING_ENDPOINT=${your_tei_embedding_endpoint}
export HF_TOKEN=${your_hf_api_token}
```
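
The example password `openGauss@123` contains an `@`, which is also the character that separates credentials from the host in a URL. If the component parses `GS_CONNECTION_STRING` as a standard URL (an assumption; this README does not say how the string is parsed), the password may need to be percent-encoded. A minimal sketch, where `urlencode` and `GS_PASSWORD_ENCODED` are hypothetical names and not part of the project:

```bash
# Hypothetical helper: percent-encode every character outside the
# RFC 3986 "unreserved" set (letters, digits, '-', '.', '_', '~').
# Handles ASCII input only.
urlencode() {
  local s="$1" out="" c
  while [ -n "$s" ]; do
    c="${s%"${s#?}"}"   # first character of the remaining string
    s="${s#?}"          # drop that character
    case "$c" in
      [a-zA-Z0-9._~-]) out="${out}${c}" ;;
      *) out="${out}$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Assumption: the consumer expects a URL-encoded password.
GS_PASSWORD_ENCODED="$(urlencode 'openGauss@123')"
export GS_CONNECTION_STRING="opengauss+psycopg2://gaussdb:${GS_PASSWORD_ENCODED}@${your_ip}:5432/postgres"
```

If the service connects correctly with the unencoded string, keep the original export and skip this step.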

### 1.3 Build Docker Image

```bash
cd GenAIComps
docker build -t opea/dataprep:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/src/Dockerfile .
```

### 1.4 Run Docker with CLI (Option A)

```bash
# The service listens on port 5000 inside the container (see the compose
# healthcheck), so publish it on host port 6007 to match the examples below.
docker run --name="dataprep-opengauss" -p 6007:5000 --ipc=host \
  -e http_proxy=$http_proxy -e https_proxy=$https_proxy \
  -e GS_CONNECTION_STRING=$GS_CONNECTION_STRING \
  -e INDEX_NAME=$INDEX_NAME \
  -e EMBED_MODEL=${EMBED_MODEL} \
  -e TEI_EMBEDDING_ENDPOINT=$TEI_EMBEDDING_ENDPOINT \
  -e HF_TOKEN=${HF_TOKEN} \
  -e DATAPREP_COMPONENT_NAME="OPEA_DATAPREP_OPENGAUSS" \
  opea/dataprep:latest
```

### 1.5 Run with Docker Compose (Option B)

```bash
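cd comps/dataprep/deployment/docker_compose
# compose.yaml publishes the service on host port ${DATAPREP_PORT:-5000};
# set DATAPREP_PORT=6007 to match the curl examples in this README.
export DATAPREP_PORT=6007
docker compose -f compose.yaml up dataprep-opengauss -d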
```
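
The compose service defines a health check against `/v1/health_check`. The same probe can be run from the host before sending requests. The following is a rough sketch: `wait_for_health` is a hypothetical helper, and its defaults (10 attempts, 10 seconds apart, 5 second timeout) simply mirror the values in the compose healthcheck.

```bash
# Hypothetical helper: poll a health endpoint until it answers with a
# successful HTTP status, or give up after a number of attempts.
#   $1 = URL
#   $2 = maximum attempts (default 10)
#   $3 = seconds to wait between attempts (default 10)
wait_for_health() {
  local url="$1" retries="${2:-10}" interval="${3:-10}" attempt=1
  while [ "$attempt" -le "$retries" ]; do
    # -f makes curl exit non-zero on HTTP error statuses (4xx/5xx).
    if curl -fsS --max-time 5 "$url" > /dev/null 2>&1; then
      echo "healthy after ${attempt} attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "service did not become healthy: ${url}" >&2
  return 1
}
```

For a compose deployment this could be called as `wait_for_health "http://localhost:${DATAPREP_PORT:-5000}/v1/health_check"`; adjust the host port to whatever the container is published on.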

## 🚀2. Consume Microservice

### 2.1 Consume Upload API

Once the dataprep microservice for openGauss is running, use the following command to ingest a document. The service converts the document to embeddings and saves them to the database.

```bash
curl -X POST \
-H "Content-Type: application/json" \
-d '{"path":"/path/to/document"}' \
http://localhost:6007/v1/dataprep/ingest
```

### 2.2 Consume get API

To get uploaded file structures, use the following command:

```bash
curl -X POST \
-H "Content-Type: application/json" \
http://localhost:6007/v1/dataprep/get
```

The response is a JSON array like this:

```json
[
  {
    "name": "uploaded_file_1.txt",
    "id": "uploaded_file_1.txt",
    "type": "File",
    "parent": ""
  },
  {
    "name": "uploaded_file_2.txt",
    "id": "uploaded_file_2.txt",
    "type": "File",
    "parent": ""
  }
]
```

### 2.3 Consume delete API

To delete an uploaded file or link, use one of the following commands.

The `file_path` value should be the `id` returned by the `/v1/dataprep/get` API.

```bash
# delete link
curl -X POST \
-H "Content-Type: application/json" \
-d '{"file_path": "https://www.ces.tech/.txt"}' \
http://localhost:6007/v1/dataprep/delete

# delete file
curl -X POST \
-H "Content-Type: application/json" \
-d '{"file_path": "uploaded_file_1.txt"}' \
http://localhost:6007/v1/dataprep/delete

# delete all files and links
curl -X POST \
-H "Content-Type: application/json" \
-d '{"file_path": "all"}' \
http://localhost:6007/v1/dataprep/delete
```
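
Because `file_path` is the `id` reported by `/v1/dataprep/get`, the delete call can be wrapped in a small helper. This is only a sketch: `delete_by_id` and the `DATAPREP_URL` variable are hypothetical names, and the helper does not escape quotes or backslashes inside the id.

```bash
# Assumption: the service is reachable on host port 6007, as in the examples above.
DATAPREP_URL="${DATAPREP_URL:-http://localhost:6007}"

# Hypothetical helper: delete one uploaded item by its id.
#   $1 = id as returned by /v1/dataprep/get, or "all"
delete_by_id() {
  local id="$1"
  curl -sS -X POST \
    -H "Content-Type: application/json" \
    -d "{\"file_path\": \"${id}\"}" \
    "${DATAPREP_URL}/v1/dataprep/delete"
}
```

For example, `delete_by_id uploaded_file_1.txt` sends the same request as the "delete file" command.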