RAG Document Loader

A simple script that loads a text document, splits it into chunks, and generates a vector embedding for each chunk.
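To make "chunks of text" concrete, here is a minimal sketch of fixed-size chunking with overlap. The helper name `chunkText` and the default sizes are assumptions for illustration; the script itself may split the text differently (for example with a LangChain text splitter).

```typescript
// Hypothetical helper: split text into fixed-size character chunks that
// overlap slightly, so a sentence cut at a boundary still appears whole
// in one of the neighbouring chunks.
// chunkSize and overlap are illustrative defaults, not the project's settings.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in the range [0, chunkSize)");
  }

  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Each returned chunk would then be embedded separately. Larger overlaps preserve more context across boundaries at the cost of more embeddings to generate and store.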

The embeddings are generated with OpenAI's text-embedding-3-small model, which produces 1536-dimensional vectors.

Neon is used as a vector database to store the embeddings.

LangChain is used to generate the embeddings and store them in Neon.

More info: langchain/community/neon/vectorstores/neon/neonpostgres
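The overall flow (embed each chunk, then store the text with its vector) can be sketched as follows. The `Embedder` and `VectorSink` interfaces and the `ingestChunks` function are hypothetical stand-ins, not the project's code: in the actual script LangChain's OpenAI embeddings class and its Neon vector store would fill these roles. The `expectedDim` default of 1536 matches the embedding size stated above.

```typescript
// Hypothetical stand-in for the embedding model (one vector per input text).
interface Embedder {
  embedDocuments(texts: string[]): Promise<number[][]>;
}

// A chunk of text paired with its embedding vector.
interface EmbeddedChunk {
  content: string;
  embedding: number[];
}

// Hypothetical stand-in for the vector database table.
interface VectorSink {
  add(rows: EmbeddedChunk[]): Promise<void>;
}

// Embed every chunk, check the vector sizes, and store the results.
// Returns the number of rows written.
async function ingestChunks(
  chunks: string[],
  embedder: Embedder,
  sink: VectorSink,
  expectedDim = 1536,
): Promise<number> {
  if (chunks.length === 0) return 0;

  const vectors = await embedder.embedDocuments(chunks);
  if (vectors.length !== chunks.length) {
    throw new Error("embedder returned a different number of vectors than chunks");
  }

  const rows = chunks.map((content, i) => {
    const embedding = vectors[i];
    // A dimension mismatch would be rejected by a fixed-width vector column,
    // so fail early with a clearer message.
    if (embedding === undefined || embedding.length !== expectedDim) {
      throw new Error(`chunk ${i}: expected a ${expectedDim}-dimensional vector`);
    }
    return { content, embedding };
  });

  await sink.add(rows);
  return rows.length;
}
```

Keeping the embedder and the store behind small interfaces makes it possible to exercise the flow with fakes, without calling OpenAI or connecting to Neon.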

Prerequisites

Installation

pnpm install

Configuration

  • Create a .env file in the root directory of the project.
  • Add the following environment variables to the .env file:
OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
NEON_CONNECTION_STRING=<YOUR_NEON_CONNECTION_STRING>
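A script like this usually benefits from checking both variables at startup, so a missing value fails immediately instead of surfacing later as an authentication or connection error. The sketch below is an assumption about how that check could look; `LoaderConfig` and `readConfig` are hypothetical names, not part of the project.

```typescript
// Hypothetical shape of the validated configuration.
interface LoaderConfig {
  openAiApiKey: string;
  neonConnectionString: string;
}

// Read the two required variables from an environment-like object and
// throw a single error that lists every missing or blank one.
function readConfig(env: Record<string, string | undefined>): LoaderConfig {
  const required = ["OPENAI_API_KEY", "NEON_CONNECTION_STRING"];
  const missing = required.filter((name) => !env[name]?.trim());

  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  return {
    openAiApiKey: env["OPENAI_API_KEY"] as string,
    neonConnectionString: env["NEON_CONNECTION_STRING"] as string,
  };
}
```

In a Node script this would be called as `readConfig(process.env)` once the .env file has been loaded. How the project loads the file (for example via the dotenv package) is not stated here, so treat that step as an assumption.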

Usage

pnpm start

Challenges and Further Improvements

  • Better source documents are needed. The current document is plain text with no specific structure.
  • The document loader only loads the text document at ./src/db/neon_info.txt. It could be extended to accept any text document.
  • More cleaning and preprocessing can be done to the text before generating embeddings.
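Two of the improvements above are small enough to sketch: taking the document path as an argument and normalizing whitespace before embedding. Both functions are hypothetical illustrations rather than existing project code, and the cleaning rules are only one reasonable choice.

```typescript
// Hypothetical: pick the document path from command-line style arguments,
// falling back to the file the loader currently reads.
function resolveDocumentPath(
  args: string[],
  fallback = "./src/db/neon_info.txt",
): string {
  const candidate = args[0]?.trim();
  return candidate ? candidate : fallback;
}

// Hypothetical: light whitespace normalization before chunking and embedding.
function cleanText(raw: string): string {
  return raw
    .replace(/\r\n?/g, "\n")    // normalize Windows and old Mac line endings
    .replace(/[ \t]+/g, " ")    // collapse runs of spaces and tabs
    .replace(/ ?\n ?/g, "\n")   // drop spaces around line breaks
    .replace(/\n{3,}/g, "\n\n") // keep at most one blank line in a row
    .trim();
}
```

In a Node script the path could be resolved with `resolveDocumentPath(process.argv.slice(2))`, so `pnpm start` without arguments keeps the current behavior.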

Contributing

Feel free to contribute to this project. Please create an issue or a pull request with your suggestions or improvements.

License

This project is licensed under the MIT license.