
@dilverse dilverse commented Nov 3, 2024

Description

MinIO is a high-performance object storage solution that is fully S3-compatible, which makes it easier to build highly scalable applications. This PR adds support for the following:

  • In Dataprep, store all uploaded documents directly in MinIO.
  • Modularize the dataprep process.
  • Once a document is uploaded to the MinIO bucket, an event notification is sent to the dataprep service, which chunks the document and stores the chunked metadata as msgpack in the MinIO bucket.
  • Once the msgpack chunked metadata is stored in the MinIO bucket, another notification is sent to run the embedding process and store the chunks, metadata, and embeddings in a vector database such as Milvus or LanceDB.
  • Also add support for a MinIO/LanceDB-based retriever.
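The two-stage, notification-driven flow in the list above can be sketched as a minimal, dependency-free simulation. This is not the PR's implementation: `InMemoryBucket` is a hypothetical stand-in for a MinIO bucket, the event dict mimics only a small subset of the S3 event notification format, JSON replaces msgpack to avoid a third-party dependency, and the `.chunks.json` key suffix, `chunk_text`, `toy_embed`, and the `VECTOR_DB` list (standing in for Milvus or LanceDB) are all illustrative assumptions.

```python
import hashlib
import json
from typing import Callable, Dict, List


class InMemoryBucket:
    """Hypothetical stand-in for a MinIO bucket that notifies subscribers on every put."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: Dict[str, bytes] = {}
        self.subscribers: List[Callable[[dict], None]] = []

    def put_object(self, key: str, data: bytes) -> None:
        self.objects[key] = data
        # Minimal subset of an S3-style "object created" event notification.
        event = {
            "Records": [
                {
                    "eventName": "s3:ObjectCreated:Put",
                    "s3": {"bucket": {"name": self.name}, "object": {"key": key}},
                }
            ]
        }
        for notify in self.subscribers:
            notify(event)


def chunk_text(text: str, size: int = 20) -> List[str]:
    """Naive fixed-size character chunking (illustrative only)."""
    return [text[i : i + size] for i in range(0, len(text), size)]


def toy_embed(chunk: str, dim: int = 4) -> List[float]:
    """Hypothetical placeholder embedding derived from a hash, not a real model."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [b / 255 for b in digest[:dim]]


# Stand-in for a vector database such as Milvus or LanceDB.
VECTOR_DB: List[dict] = []

CHUNK_SUFFIX = ".chunks.json"  # hypothetical naming convention for chunked metadata


def make_handler(bucket: InMemoryBucket) -> Callable[[dict], None]:
    def on_event(event: dict) -> None:
        for record in event["Records"]:
            key = record["s3"]["object"]["key"]
            if key.endswith(CHUNK_SUFFIX):
                # Stage 2: chunked metadata arrived; embed each chunk and index it.
                payload = json.loads(bucket.objects[key])
                for i, chunk in enumerate(payload["chunks"]):
                    VECTOR_DB.append(
                        {"source": payload["source"], "chunk_id": i,
                         "text": chunk, "embedding": toy_embed(chunk)}
                    )
            else:
                # Stage 1: a raw document arrived; chunk it and write the metadata
                # back to the bucket, which triggers a second notification.
                text = bucket.objects[key].decode("utf-8")
                payload = {"source": key, "chunks": chunk_text(text)}
                bucket.put_object(key + CHUNK_SUFFIX, json.dumps(payload).encode("utf-8"))

    return on_event
```

For example, after `bucket = InMemoryBucket("documents")`, `bucket.subscribers.append(make_handler(bucket))`, and `bucket.put_object("uploaded_file_1.txt", b"...")`, the bucket holds both the raw file and `uploaded_file_1.txt.chunks.json`, and `VECTOR_DB` holds one row per chunk. In a real deployment the two stages would more likely be separate consumers of MinIO bucket notifications (for example, a webhook target), so that chunking and embedding can scale independently.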

Issues

n/a

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

List the newly introduced 3rd party dependency if exists.

@mkbhanda
Collaborator

mkbhanda commented Nov 5, 2024

@lvliang-intel would you kindly review.

Collaborator

@mkbhanda mkbhanda left a comment


Looks good to me. @chensuyue do we need any tests?

"parent": ""
},
{
"name": "uploaded_file_2.txt",
Collaborator


It would be good to change one of these to a file path/URL. What is the name of a file when it is a path? Is id == name?

@chensuyue
Collaborator

Yes, we need at least one test for each microservice:
test_retrievers_minio_lancedb_langchain.sh
test_dataprep_minio_lancedb_langchain.sh
test_dataprep_minio_milvus_langchain.sh

@chensuyue
Collaborator

Contributor

@eero-t eero-t left a comment


Below are a few minor comments on the docs and Python code.

As to Dockerfiles, see: opea-project/GenAIExamples#225

@lvliang-intel
Collaborator

@dilverse,
Please add comps/dataprep/minio/milvus/langchain/Dockerfile, comps/dataprep/minio/lancedb/langchain/Dockerfile, and comps/retrievers/minio/lancedb/langchain/Dockerfile to .github/workflows/docker/compose/dataprep-compose.yaml. This YAML file is used to build the release images.

@dilverse
Author


@lvliang-intel Updated the github workflow

@xiguiw
Collaborator

xiguiw commented Feb 25, 2025

@dilverse
GenAIComps was refactored in v1.2.
It no longer creates a separate microservice for each DB.
Instead, only one microservice is needed,
which simplifies DB integration.

Please refer to the latest code changes.
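The single-service design mentioned above can be illustrated with a generic plugin-registry pattern. This is a hypothetical sketch that assumes nothing about the actual GenAIComps v1.2 classes: `VectorStore`, `register_backend`, `DataprepService`, and the in-memory backends are illustrative names, and the backends only mimic Milvus and LanceDB by name.

```python
from typing import Callable, Dict, List, Tuple


class VectorStore:
    """Minimal interface every pluggable backend implements (illustrative only)."""

    def ingest(self, doc_id: str, chunks: List[str]) -> int:
        raise NotImplementedError


# Hypothetical registry mapping a backend name to a vector-store factory.
_BACKENDS: Dict[str, Callable[[], VectorStore]] = {}


def register_backend(name: str):
    """Class decorator that adds a backend class to the registry under `name`."""

    def decorator(cls):
        _BACKENDS[name] = cls
        return cls

    return decorator


@register_backend("milvus")
class InMemoryMilvus(VectorStore):
    def __init__(self) -> None:
        self.rows: List[Tuple[str, str]] = []

    def ingest(self, doc_id: str, chunks: List[str]) -> int:
        self.rows.extend((doc_id, c) for c in chunks)
        return len(chunks)


@register_backend("lancedb")
class InMemoryLanceDB(VectorStore):
    def __init__(self) -> None:
        self.tables: Dict[str, List[str]] = {}

    def ingest(self, doc_id: str, chunks: List[str]) -> int:
        self.tables.setdefault(doc_id, []).extend(chunks)
        return len(chunks)


class DataprepService:
    """One service; the database is selected by configuration, not by a separate microservice."""

    def __init__(self, backend: str) -> None:
        if backend not in _BACKENDS:
            raise ValueError(f"unknown backend: {backend!r}")
        self.store = _BACKENDS[backend]()

    def ingest(self, doc_id: str, text: str) -> int:
        # Split on blank lines; a real service would make chunking configurable.
        chunks = [p.strip() for p in text.split("\n\n") if p.strip()]
        return self.store.ingest(doc_id, chunks)
```

With this shape, supporting a new database means registering one more backend class rather than adding and releasing another microservice image.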

@xiguiw
Collaborator

xiguiw commented Mar 26, 2025

@dilverse

Reminder: are you still working on this?

@xiguiw xiguiw added the A3 Maintain label Mar 26, 2025
@dilverse
Author


Sounds good, I will take a look and update the PR. @xiguiw

@xiguiw
Collaborator

xiguiw commented May 30, 2025

@dilverse

There are merge conflicts; please resolve them.


Labels

A3 Maintain


7 participants