This project is a Retrieval-Augmented Generation (RAG) system that accepts files in several formats, embeds their content into a vector database, and answers user queries from that content. To answer a query, it generates subqueries and retrieves matching context from the uploaded documents. The application is built with Flask, runs in Docker, and integrates with Redis and Qdrant.

Features:
- Supports file uploads in the following formats: PNG, XLSX, PDF, DOCX, TXT
- Embeds the content of the files into a database for efficient retrieval
- Generates responses to user queries based on the embedded content
- Built with Flask and packaged with Docker
- Scales horizontally by running multiple app instances

Tech stack:

- Backend: Flask
- Database: Qdrant
- Cache: Redis
- Containerization: Docker
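
To make the query flow concrete, here is a toy, self-contained Python sketch of subquery generation followed by context retrieval. It is not this project's implementation: the bag-of-words `embed`, the rule-based `generate_subqueries`, and the in-memory list standing in for a Qdrant collection are hypothetical simplifications, and the final answer-generation step (an LLM call) is left out.

```python
import math
import re
from collections import Counter


# Toy "embedding": a bag-of-words term-count vector. A real system would call an
# embedding model here; this stand-in only keeps the sketch self-contained.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Ingestion: split each document into sentence chunks and store (chunk, vector)
# pairs. The in-memory list is a hypothetical stand-in for a Qdrant collection.
def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    index = []
    for doc in documents:
        for chunk in re.split(r"(?<=[.!?])\s+", doc.strip()):
            if chunk:
                index.append((chunk, embed(chunk)))
    return index


# Hypothetical subquery generator: split a compound question on "and" / "?".
# In practice this step would typically be delegated to an LLM.
def generate_subqueries(query: str) -> list[str]:
    return [p.strip() for p in re.split(r"\band\b|\?", query) if p.strip()]


# Retrieval: for each subquery, take the k most similar chunks, then merge the
# results (deduplicated, in order) into the context handed to the generator.
def retrieve_context(query: str, index: list[tuple[str, Counter]], k: int = 1) -> list[str]:
    context: list[str] = []
    for sub in generate_subqueries(query):
        qvec = embed(sub)
        ranked = sorted(index, key=lambda item: cosine(qvec, item[1]), reverse=True)
        for chunk, _ in ranked[:k]:
            if chunk not in context:
                context.append(chunk)
    return context


docs = [
    "Qdrant is a vector database. Redis is an in-memory store.",
    "Flask is a Python web framework.",
]
index = build_index(docs)
print(retrieve_context("What is Qdrant and what is Flask?", index))
# ['Qdrant is a vector database.', 'Flask is a Python web framework.']
```

Splitting the compound question first lets each part retrieve its own best-matching chunk; with a small `k`, a single combined query vector could favor one topic and miss the other.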
To run this application, you need the following installed on your machine:
- Docker
- Docker Compose
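
The API exposes the following endpoints:

- Upload files
  - URL: `http://localhost:443/upload`
  - Method: `POST`
  - Body: `multipart/form-data` with the file in a `file` field
  - Description: Upload files to be embedded into the database.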
- Check task status
  - URL: `http://localhost:443/status/<task_id>`
  - Method: `GET`
  - Description: Check the status of a background task by its task ID.
- Query
  - URL: `http://localhost:443/query`
  - Method: `POST`
  - Body (JSON): `{"query": "[YOUR_QUERY]"}`
  - Description: Send a query to get a response based on the embedded content of the uploaded files.
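
For calling the API from Python instead of curl, the request shapes above can be built with the standard library alone. This is a minimal sketch under assumptions: the `file` form field comes from the curl upload example, the helper names (`upload_request`, `status_request`, `query_request`, `send`) are hypothetical, and `send` assumes the server answers with JSON, which this README does not specify.

```python
import json
import urllib.parse
import urllib.request
import uuid

BASE_URL = "http://localhost:443"  # base URL used throughout this README


def upload_request(filename: str, content: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /upload as multipart/form-data, with the file in a "file" field."""
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to appear in the file content
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{base_url}/upload",
        data=head + content + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def status_request(task_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """GET /status/<task_id>; the ID is percent-encoded so it stays one path segment."""
    return urllib.request.Request(f"{base_url}/status/{urllib.parse.quote(task_id, safe='')}", method="GET")


def query_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /query with a JSON body of the form {"query": "..."}."""
    return urllib.request.Request(
        f"{base_url}/query",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode the response body as JSON (assumed response format)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(query_request("What is in the document?"))` mirrors the curl query call; like curl, it needs the containers to be running.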
To run the application using Docker, follow these steps:

- Clone the repository:

  ```bash
  git clone https://github.com/Himanshu2561/multipart_rag.git
  cd multipart_rag/
  ```
- Build and run the Docker containers:

  ```bash
  docker-compose up -d --build
  ```
- Scale the application (optional). Replace `3` with the number of `app` instances you want:

  ```bash
  docker-compose up -d --build --scale app=3
  ```
- Upload files: Use the `/upload` endpoint to upload files in the supported formats.
- Query: Use the `/query` endpoint to send a query and receive a response based on the uploaded files.
- Check status: Use the `/status/<task_id>` endpoint to check the status of a background task.
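
Example requests with `curl`:

```bash
# Upload a file for embedding
curl -X POST -F "file=@example.pdf" http://localhost:443/upload
# Ask a question about the uploaded content
curl -X POST -H "Content-Type: application/json" -d '{"query": "What is in the document?"}' http://localhost:443/query
# Check a background task (replace <task_id>; the quotes stop the shell from reading < > as redirection)
curl -X GET "http://localhost:443/status/<task_id>"
```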
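
The curl calls above run one at a time. A script that uploads files and then waits for processing to finish can poll `/status/<task_id>` in a loop. The sketch below assumes, without confirmation from this README, that the status response is JSON with a `status` field whose terminal values are Celery-style `SUCCESS` or `FAILURE`; adjust `done_states` to whatever the server actually returns. `wait_for_task` and `fetch_status_http` are hypothetical helper names.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:443"


def fetch_status_http(task_id: str, base_url: str = BASE_URL) -> dict:
    """GET /status/<task_id> and parse the body as JSON (assumed response format)."""
    url = f"{base_url}/status/{urllib.parse.quote(task_id, safe='')}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_task(
    task_id: str,
    fetch_status: Callable[[str], dict] = fetch_status_http,
    done_states: tuple = ("SUCCESS", "FAILURE"),  # assumed Celery-style terminal states
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the status endpoint until the task reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(task_id)
        if payload.get("status") in done_states:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout} s")
        sleep(interval)
```

Against a running deployment, `wait_for_task("<task_id>")` blocks until the task settles. Injecting `fetch_status` and `sleep` keeps the polling loop testable without a server.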