This is a simple script that loads a text document and generates vector embeddings for chunks of its text. The embeddings are generated with OpenAI's `text-embedding-3-small` model (1536 dimensions). Neon is used as the vector database to store the embeddings, and LangChain ties the two together: it generates the embeddings and writes them to Neon.

More info: `langchain/community/neon/vectorstores/neon/neonpostgres`
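As a rough illustration of the chunking step mentioned above (in the actual script a LangChain text splitter handles this; the function name and sizes below are hypothetical):

```typescript
// Minimal sketch of fixed-size chunking with overlap — the kind of splitting
// applied to a document before each chunk is embedded. Sizes are illustrative.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= overlap) throw new Error("chunkSize must exceed overlap");
  const chunks: string[] = [];
  // Step forward by (chunkSize - overlap) so consecutive chunks share context.
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Overlapping chunks help preserve context that would otherwise be cut at a chunk boundary.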
Setup:

- Run `pnpm install`.
- Create a `.env` file in the root directory of the project.
- Add the following environment variables to the `.env` file:

  ```
  OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
  NEON_CONNECTION_STRING=<YOUR_NEON_CONNECTION_STRING>
  ```
- Run `pnpm start`.

Possible improvements:

- Better source documents are needed to generate embeddings from; the current document is a simple text file with no specific structure.
- The document loader currently only loads the text document at `./src/db/neon_info.txt`. It could be improved to accept any text document.
- More cleaning and preprocessing could be done on the text before generating embeddings.
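As a sketch of what such a preprocessing pass might look like (the function name and cleaning rules below are illustrative, not part of the current script):

```typescript
// One possible cleaning pass to run before chunking and embedding:
// strip non-printable control characters and collapse runs of whitespace.
// What cleaning is appropriate depends on the source documents.
function cleanText(raw: string): string {
  return raw
    // Remove control characters (tab and newline are covered by the next step).
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, "")
    // Collapse any run of whitespace (including newlines) to a single space.
    .replace(/\s+/g, " ")
    .trim();
}
```

Normalizing whitespace this way keeps chunk sizes consistent and avoids embedding stretches of blank text.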
Feel free to contribute to this project. Please create an issue or a pull request with your suggestions or improvements.
This project is licensed under the MIT License.