This is a simple script that loads a text document and generates vector embeddings for chunks of its text. The embeddings are generated with OpenAI's `text-embedding-3-small` model (1536 dimensions). Neon is used as the vector database to store the embeddings, and LangChain ties the two together: it generates the embeddings and writes them to Neon.

More info: `langchain/community/neon/vectorstores/neon/neonpostgres`
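As a rough illustration of the chunking step mentioned above (in the actual script a LangChain text splitter handles this; the function name and sizes below are hypothetical):

```typescript
// Minimal sketch of fixed-size chunking with overlap — the kind of splitting
// applied to a document before each chunk is embedded. Sizes are illustrative.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= overlap) throw new Error("chunkSize must exceed overlap");
  const chunks: string[] = [];
  // Step forward by (chunkSize - overlap) so consecutive chunks share context.
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Overlapping chunks help preserve context that would otherwise be cut at a chunk boundary.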
Setup:

- Run `pnpm install`.
- Create a `.env` file in the root directory of the project.
- Add the following environment variables to the `.env` file:

  ```
  OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
  NEON_CONNECTION_STRING=<YOUR_NEON_CONNECTION_STRING>
  ```
- Run `pnpm start`.

Possible improvements:

- Better source documents are needed to generate embeddings from; the current document is a simple text file with no specific structure.
- The document loader currently only loads the text document at `./src/db/neon_info.txt`. It could be improved to accept any text document.
- More cleaning and preprocessing could be done on the text before generating embeddings.
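As a sketch of what such a preprocessing pass might look like (the function name and cleaning rules below are illustrative, not part of the current script):

```typescript
// One possible cleaning pass to run before chunking and embedding:
// strip non-printable control characters and collapse runs of whitespace.
// What cleaning is appropriate depends on the source documents.
function cleanText(raw: string): string {
  return raw
    // Remove control characters (tab and newline are covered by the next step).
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, "")
    // Collapse any run of whitespace (including newlines) to a single space.
    .replace(/\s+/g, " ")
    .trim();
}
```

Normalizing whitespace this way keeps chunk sizes consistent and avoids embedding stretches of blank text.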
Feel free to contribute to this project. Please create an issue or a pull request with your suggestions or improvements.
This project is licensed under the MIT License.