This evaluation script shows how simple it is to evaluate different embedding models using pgai's Vectorizer integrated with LiteLLM. The dataset used in this evaluation is Paul Graham's essays, but the flexibility of the Vectorizer lets you use any dataset, whether from HuggingFace or from your own files. The LiteLLM integration also makes it easy to swap and test different embedding models through one SQL interface.
The evaluation generates diverse question types (short, long, direct, implied, and unclear) for random text chunks and measures each model's ability to retrieve relevant context through vector similarity search.
- Docker and Docker Compose installed (compose.yaml is included)
- PostgreSQL with pgai extension running
- HuggingFace dataset: https://huggingface.co/datasets/sgoel9/paul_graham_essays
- API keys for:
  - Cohere (COHERE_API_KEY)
  - Mistral (MISTRAL_API_KEY)
  - OpenAI (OPENAI_API_KEY)
  - HuggingFace (HUGGINGFACE_API_KEY)
The evaluation is driven by a few constants at the top of the script:

NUM_CHUNKS = 20                # Number of random text chunks to evaluate
NUM_QUESTIONS_PER_CHUNK = 20   # Total questions per chunk (4 of each type)
TOP_K = 10                     # Number of closest chunks to retrieve
QUESTION_DISTRIBUTION = {      # Distribution of question types
    'short': 4,    # Direct, simple questions under 10 words
    'long': 4,     # Detailed questions requiring comprehensive answers
    'direct': 4,   # Questions about explicit information
    'implied': 4,  # Questions requiring context understanding
    'unclear': 4,  # Vague or ambiguous questions
}
EMBEDDING_TABLES = [           # Database tables containing embeddings
    'essays_cohere_embeddings',
    'essays_mistral_embeddings',
    'essays_openai_small_embeddings',
]
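For example, sampling the NUM_CHUNKS random chunks comes down to a query along these lines (a sketch, not the script's exact code; it assumes the vectorizer view exposes a chunk column and uses one of the tables from EMBEDDING_TABLES):

SELECT chunk
FROM essays_openai_small_embeddings
ORDER BY random()
LIMIT 20;  -- NUM_CHUNKS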
Create a directory with the compose.yaml. Make sure all of the API keys are set in compose.yaml:
- services.db.environment.COHERE_API_KEY="..."
- services.db.environment.MISTRAL_API_KEY="..."
- services.db.environment.OPENAI_API_KEY="..."
Start services:
docker compose up -d
Connect to your database:
docker exec -it pgai-db-1 psql -U postgres -d postgres
Enable pgai extension:
CREATE EXTENSION IF NOT EXISTS ai CASCADE;
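To confirm the extension is installed, you can query the standard PostgreSQL catalog:

SELECT extname, extversion FROM pg_extension WHERE extname = 'ai';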
Load Paul Graham's essays into the database:
SELECT ai.load_dataset('sgoel9/paul_graham_essays');
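As a sanity check that the essays were loaded, count the rows in the source table (the same table the vectorizers below read from):

SELECT count(*) FROM paul_graham_essays;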
Create vectorizers in psql for each embedding model:
- Cohere embed-english-v3.0 (1024 dimensions)
- Mistral mistral-embed (1024 dimensions)
- OpenAI text-embedding-3-small (1024 dimensions)
SELECT ai.create_vectorizer(
    'paul_graham_essays'::regclass,
    destination => ai.destination_table('essays_cohere_embeddings'),
    embedding => ai.embedding_litellm(
        'cohere/embed-english-v3.0',
        1024,
        api_key_name => 'COHERE_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);

SELECT ai.create_vectorizer(
    'paul_graham_essays'::regclass,
    destination => ai.destination_table('essays_mistral_embeddings'),
    embedding => ai.embedding_litellm(
        'mistral/mistral-embed',
        1024,
        api_key_name => 'MISTRAL_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);

SELECT ai.create_vectorizer(
    'paul_graham_essays'::regclass,
    destination => ai.destination_table('essays_openai_small_embeddings'),
    embedding => ai.embedding_openai(
        'text-embedding-3-small',
        1024,
        api_key_name => 'OPENAI_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);
Use
SELECT * FROM ai.vectorizer_status;
to check whether the vectorizers are ready, and use
SELECT * FROM ai.vectorizer_errors;
to check whether any errors occurred.
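Once processing is complete, all three embedding views should contain the same number of chunks, since each vectorizer uses the same chunking configuration. A quick way to compare (assuming the destination names created above):

SELECT 'essays_cohere_embeddings' AS view_name, count(*) FROM essays_cohere_embeddings
UNION ALL
SELECT 'essays_mistral_embeddings', count(*) FROM essays_mistral_embeddings
UNION ALL
SELECT 'essays_openai_small_embeddings', count(*) FROM essays_openai_small_embeddings;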
- Run the script after the vectorizers are created and processing is complete
- Generate chunks: random text chunks are selected from the vectorized table
- Generate questions for each chunk using GPT-4o-mini (see the first sketch below)
- Evaluate models: for each question, retrieve the TOP_K closest chunks through vector similarity search (see the second sketch below)
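Question generation happens inside the script via GPT-4o-mini, but an equivalent call can be made straight from SQL with pgai's ai.openai_chat_complete (a sketch, not the script's actual code; the prompt wording is made up):

SELECT ai.openai_chat_complete(
    'gpt-4o-mini',
    jsonb_build_array(
        jsonb_build_object(
            'role', 'user',
            'content', 'Write one short question (under 10 words) answerable from this text: ' || chunk
        )
    )
) -> 'choices' -> 0 -> 'message' ->> 'content' AS question
FROM essays_openai_small_embeddings
ORDER BY random()
LIMIT 1;

Each retrieval is then a nearest-neighbor query like the following (again a sketch; it assumes the view exposes chunk and embedding columns, uses pgvector's <=> cosine-distance operator, and embeds the question with ai.openai_embed; the question text is a made-up example):

SELECT chunk
FROM essays_openai_small_embeddings
ORDER BY embedding <=> ai.openai_embed(
    'text-embedding-3-small',
    'What did the author learn from writing essays?',  -- hypothetical generated question
    dimensions => 1024
)
LIMIT 10;  -- TOP_K

The script writes its outputs to four CSV files: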
- chunks.csv: Random text chunks from the database
- questions.csv: Generated questions for each chunk
- results.csv: Overall model performance metrics
- detailed_results.csv: Per-question evaluation results