Financial Document Search & Retrieval System

Overview

The Financial Document Search & Retrieval System is an AI-powered search tool that allows users to query financial documents (such as company reports, mutual fund documents, and market research reports) using natural language. It leverages OpenAI embeddings and Pinecone vector search to provide accurate and contextually relevant results.

Features

Semantic Search: Uses OpenAI's text-embedding-ada-002 model to generate document embeddings for similarity-based retrieval.
Retrieval-Augmented Generation (RAG): Combines search results with GPT-4 to generate concise, well-informed answers.
API Backend: Built with FastAPI to handle document search and retrieval requests.
Frontend Interface: Uses Streamlit for an interactive UI to submit queries and display results.
PDF Processing: Extracts text and tables from PDF files using pdfplumber and PyMuPDF.
Containerized Deployment: Managed using Docker and docker-compose for easy scalability.

Directory Structure

├── aswinpradeepc-llmsearch/
│   ├── docker-compose.yml  # Docker Compose file for multi-container setup
│   ├── Dockerfile          # Defines the API and frontend container
│   ├── dockerignore        # Files to ignore during Docker build
│   ├── requirements.txt    # Python dependencies
│   ├── main.py             # Main entry point (if required)
│   ├── api/                # Backend API
│   │   └── main.py         # FastAPI server implementation
│   ├── frontend/           # Streamlit frontend
│   │   ├── app.py          # Main Streamlit application
│   │   └── components.py   # Helper functions for UI rendering
│   ├── scripts/            # Preprocessing scripts
│   │   ├── generate_embeddings.py # Converts documents into vector embeddings
│   │   └── pdf_to_json.py         # Extracts text and tables from PDFs

Setup & Installation

Prerequisites

Ensure you have the following installed:

Python 3.12+
Docker & Docker Compose
An OpenAI API Key
A Pinecone API Key

Environment Variables

Create a .env file and define the following variables:

OPENAI_API_KEY=your_openai_api_key
PINECONE_KEY=your_pinecone_api_key

Running the Application

1. Clone the repository

git clone https://github.com/aswinpradeepc/llmsearch.git
cd llmsearch

2. Build and run the containers

docker-compose up --build

3. Access the application

Backend API: Runs on http://localhost:8000
Frontend UI: Accessible at http://localhost:8501

API Endpoints

Health Check

GET /health

Response:

{"status": "healthy"}

Semantic Search

POST /search

Request Body:

{
  "query": "Top mutual funds in 2023",
  "top_k": 5
}

Response:

{
  "matches": [
    {
      "id": "doc123",
      "score": 0.95,
      "metadata": {
        "filename": "funds_report.pdf",
        "content": "Best performing funds in 2023...",
        "tables": "[Table data here]"
      }
    }
  ]
}

Retrieval-Augmented Generation (RAG)

POST /rag

Request Body:

{
  "query": "What are the key insights from XYZ report?",
  "top_k": 3
}

Response:

{
  "answer": "XYZ report highlights revenue growth of 10%...",
  "sources": [
    {
      "id": "doc456",
      "metadata": {
        "filename": "XYZ_report.pdf",
        "content": "Revenue increased by 10%..."
      }
    }
  ]
}

Development & Contribution

Running Locally

1. Install Dependencies

pip install -r requirements.txt

2. Start API

uvicorn api.main:app --reload --host 0.0.0.0 --port 8000

3. Start Frontend

streamlit run frontend/app.py --server.port 8501

Modifying the Code

Backend (FastAPI): Modify api/main.py
Frontend (Streamlit): Modify frontend/app.py
PDF Processing: Modify scripts/pdf_to_json.py

Deployment

Running in Production

To deploy the application, ensure environment variables are set and run:

docker compose up -d --build

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Financial Document Search & Retrieval System

Overview

Features

Directory Structure

Setup & Installation

Prerequisites

Environment Variables

Running the Application

1. Clone the repository

2. Build and run the containers

3. Access the application

API Endpoints

Health Check

Semantic Search

Request Body:

Response:

Retrieval-Augmented Generation (RAG)

Request Body:

Response:

Development & Contribution

Running Locally

1. Install Dependencies

2. Start API

3. Start Frontend

Modifying the Code

Deployment

Running in Production

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
api		api
frontend		frontend
scripts		scripts
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
dockerignore		dockerignore
main.py		main.py
requirements.txt		requirements.txt

aswinpradeepc/llmsearch

Folders and files

Latest commit

History

Repository files navigation

Financial Document Search & Retrieval System

Overview

Features

Directory Structure

Setup & Installation

Prerequisites

Environment Variables

Running the Application

1. Clone the repository

2. Build and run the containers

3. Access the application

API Endpoints

Health Check

Semantic Search

Request Body:

Response:

Retrieval-Augmented Generation (RAG)

Request Body:

Response:

Development & Contribution

Running Locally

1. Install Dependencies

2. Start API

3. Start Frontend

Modifying the Code

Deployment

Running in Production

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages