RAG Project with LlamaIndex

Overview

This project implements an advanced Retrieval-Augmented Generation (RAG) system based on the LlamaIndex library. The system enables querying and generating insights from a repository of Markdown documents collected during the development of the "Text Analyzer" project.

The uniqueness of the system lies in its integration of two data channels:

Semantic Search (Vector Store): Utilizing Pinecone to retrieve information from free text and articles.
Structured Data: Extracting data into a MongoDB database and querying specific data points.

Architecture and Workflows

The system is built upon two primary data processing paths activated during system initialization:

1. Unstructured Information Path (Vector Search)

In this path, the system ingests Markdown files, breaks them into information units (Nodes), and indexes them in a vector database.

Ingestion: Reading and preparing the data.
Pinecone Engine: Creating the semantic query engine.
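To make the "information units (Nodes)" step concrete, here is a minimal plain-Python sketch that splits a Markdown file into one chunk per heading section. It is an illustration only and not the project's ingestion code: the `Node` dataclass and `split_markdown` function are hypothetical names, and the heading-based split is an assumed strategy rather than what LlamaIndex's parsers do internally.

```python
import re
from dataclasses import dataclass

# Matches ATX headings such as "# Title" or "### Details".
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


@dataclass
class Node:
    """Hypothetical stand-in for a node: a text chunk plus metadata."""
    heading: str
    text: str
    source: str


def split_markdown(text: str, source: str) -> list[Node]:
    """Split Markdown into one node per heading section, skipping empty ones."""
    # Each section is (heading, lines); text before the first heading
    # is collected under an empty heading.
    sections: list[tuple[str, list[str]]] = [("", [])]
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    nodes: list[Node] = []
    for heading, lines in sections:
        body = "\n".join(lines).strip()
        if body:
            nodes.append(Node(heading=heading, text=body, source=source))
    return nodes
```

Each resulting chunk would then be embedded and written to the vector index. Note that this simple splitter also treats `#` lines inside fenced code blocks as headings, which a real parser would need to handle.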

2. Structured Data Path

In this path, the system uses an LLM to extract entities, decisions, and technical data from the text, saving them as structured data in MongoDB.

Extractor Workflow: The extraction process and schema definition.
Mongo Engine: Creating an engine that allows complex queries on the extracted data.
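The schema-definition step can be sketched as a small validation layer between the LLM's output and the database. The field names (`kind`, `title`, `details`, `source_file`) and the allowed kinds below are assumptions chosen for illustration; the project's actual schema is not shown in this README.

```python
import json
from dataclasses import asdict, dataclass

# Assumed categories, mirroring "entities, decisions, and technical data".
ALLOWED_KINDS = {"entity", "decision", "technical"}


@dataclass
class ExtractedItem:
    """Hypothetical schema for one extracted record."""
    kind: str
    title: str
    details: str
    source_file: str

    def __post_init__(self) -> None:
        # Reject records the LLM labeled with an unexpected category.
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")


def parse_llm_output(raw_json: str, source_file: str) -> list[dict]:
    """Validate a JSON array produced by the LLM and return plain dicts."""
    items = json.loads(raw_json)
    return [
        asdict(
            ExtractedItem(
                kind=item["kind"],
                title=item["title"],
                details=item.get("details", ""),
                source_file=source_file,
            )
        )
        for item in items
    ]
```

The returned dictionaries are in the shape a MongoDB driver expects for an insert, so validation failures surface before anything is written.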

---

Technologies

Framework: LlamaIndex (Core, LLMs, Embeddings, Vector Stores)
LLM & Embeddings: Google Gemini (GoogleGenAI)
Vector Database: Pinecone
NoSQL Database: MongoDB (AsyncIOMotorClient)
UI: Gradio
Environment Management: python-dotenv

Project Structure

main.py: Entry point for the application, system initialization, and Router management.
src/config.py: Central configurations for the LLM, Embeddings model, and Pinecone connectivity.
src/mongo_workflow_engine.py: Utility engine for running queries against the MongoDB Workflow asynchronously.
src/agent_events.py & src/extractor_events.py: Definition of the various workflows for data processing.
data/: Folder containing the Markdown files (such as the Text Analyzer README) used as the knowledge base.
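The Router mentioned for main.py has to decide which of the two engines answers a question. This README does not say how that choice is made, so the sketch below assumes a simple keyword heuristic purely for illustration; `make_router`, `STRUCTURED_HINTS`, and the engine callables are all hypothetical, and an LLM-based selector could stand in for the keyword check.

```python
from typing import Callable

# An engine is modeled as any function from a question to an answer.
Engine = Callable[[str], str]

# Assumed phrases suggesting the question targets structured records.
STRUCTURED_HINTS = ("how many", "list all", "which decisions", "count")


def make_router(vector_engine: Engine, structured_engine: Engine) -> Engine:
    """Return a function that forwards each question to one engine."""

    def route(question: str) -> str:
        lowered = question.lower()
        if any(hint in lowered for hint in STRUCTURED_HINTS):
            return structured_engine(question)
        # Default to semantic search for free-form questions.
        return vector_engine(question)

    return route
```

For example, `make_router(ask_pinecone, ask_mongo)("How many decisions were made?")` would call the structured engine, where `ask_pinecone` and `ask_mongo` are placeholder functions.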

Execution Instructions

Prerequisites

Python 3.12 or higher.
API keys for Google Gemini and Pinecone.
An active MongoDB database.

Installation and Execution

Clone the project to your local computer.

git clone https://github.com/naama-git/RAG-project-with-llamaIndex

Create a .env file and enter the following variables:

GEMINI_API_KEY=your_key
LLM=gemini-1.5-flash
EMBEDDING_LLM=models/text-embedding-004
PINECONE_API_KEY=your_key
PINECONE_INDEX_NAME=your_index
MONGO_URI=your_mongodb_uri
DB_NAME=your_db
COLLECTION_NAME=your_collection

Install the dependencies using your preferred manager (pip/uv).
Run the application:
```
python main.py
```

The system launches a Gradio UI at a local address; Gradio prints the URL to the terminal on startup.
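A missing entry in the .env file is an easy mistake to make, so it can help to check the variables once at startup. The sketch below is an assumption about how such a check might look, not code from src/config.py; `load_settings` and `REQUIRED_VARS` are hypothetical names. It reads from `os.environ` by default, which python-dotenv's `load_dotenv()` would populate beforehand.

```python
import os
from collections.abc import Mapping

# Variable names taken from the .env listing in this README.
REQUIRED_VARS = (
    "GEMINI_API_KEY",
    "LLM",
    "EMBEDDING_LLM",
    "PINECONE_API_KEY",
    "PINECONE_INDEX_NAME",
    "MONGO_URI",
    "DB_NAME",
    "COLLECTION_NAME",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, or fail with every missing name listed."""
    # Treat empty strings as missing so "KEY=" lines are caught too.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Reporting all missing names in one error avoids fixing the file one variable at a time.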

Technical Notes

The system includes a specific SSL certificate configuration in config.py for compatibility with the NetFree network when working with Pinecone.
Communication with MongoDB is fully asynchronous for optimal performance.
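The benefit of asynchronous database access is that several queries can wait on I/O at the same time. The sketch below shows the pattern with `asyncio.gather`; `fake_find` is a hypothetical stand-in that sleeps instead of contacting MongoDB, so the example runs without a database.

```python
import asyncio


async def fake_find(collection: str, delay: float = 0.05) -> str:
    """Stand-in for an awaitable database call; sleeps instead of doing I/O."""
    await asyncio.sleep(delay)
    return f"results from {collection}"


async def run_queries(collections: list[str]) -> list[str]:
    """Run one query per collection concurrently."""
    # gather() starts all coroutines together and returns results
    # in the same order as the inputs.
    return await asyncio.gather(*(fake_find(name) for name in collections))


if __name__ == "__main__":
    print(asyncio.run(run_queries(["decisions", "entities"])))
```

With a real async driver, the awaited call inside `fake_find` would be the driver's query coroutine; the overall wait is then close to the slowest single query rather than the sum of all of them.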

Note: This project was conducted for educational purposes only.
