This evaluation script shows how simple it is to evaluate different embedding models using pgai's Vectorizer integrated with LiteLLM. The dataset used in this evaluation is Paul Graham's essays, but the flexibility of the Vectorizer lets you use any dataset, whether from HuggingFace or from your own files. The LiteLLM integration also makes it easy to swap and test different embedding models through one SQL interface.
The evaluation generates diverse question types (short, long, direct, implied, and unclear) for random text chunks and measures each model's ability to retrieve relevant context through vector similarity search.
- Docker and Docker Compose installed (compose.yaml is included)
- PostgreSQL with pgai extension running
- HuggingFace dataset: https://huggingface.co/datasets/sgoel9/paul_graham_essays
- API keys for:
  - Cohere (COHERE_API_KEY)
  - Mistral (MISTRAL_API_KEY)
  - OpenAI (OPENAI_API_KEY)
  - HuggingFace (HUGGINGFACE_API_KEY)
The evaluation is driven by a few constants at the top of the script:

NUM_CHUNKS = 20                # Number of random text chunks to evaluate
NUM_QUESTIONS_PER_CHUNK = 20   # Total questions per chunk (4 of each type)
TOP_K = 10                     # Number of closest chunks to retrieve
QUESTION_DISTRIBUTION = {      # Distribution of question types
    'short': 4,    # Direct, simple questions under 10 words
    'long': 4,     # Detailed questions requiring comprehensive answers
    'direct': 4,   # Questions about explicit information
    'implied': 4,  # Questions requiring context understanding
    'unclear': 4,  # Vague or ambiguous questions
}
EMBEDDING_TABLES = [           # Database tables containing embeddings
    'essays_cohere_embeddings',
    'essays_mistral_embeddings',
    'essays_openai_small_embeddings',
]
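For example, sampling the NUM_CHUNKS random chunks comes down to a query along these lines (a sketch, not the script's exact code; it assumes the vectorizer view exposes a chunk column and uses one of the tables from EMBEDDING_TABLES):

SELECT chunk
FROM essays_openai_small_embeddings
ORDER BY random()
LIMIT 20;  -- NUM_CHUNKS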
Create a directory with the compose.yaml. Make sure all of the API keys are set in compose.yaml:
- services.db.environment.COHERE_API_KEY="..."
- services.db.environment.MISTRAL_API_KEY="..."
- services.db.environment.OPENAI_API_KEY="..."
Start services:
docker compose up -d
Connect to your database:
docker exec -it pgai-db-1 psql -U postgres -d postgres
Enable pgai extension:
CREATE EXTENSION IF NOT EXISTS ai CASCADE;
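To confirm the extension is installed, you can query the standard PostgreSQL catalog:

SELECT extname, extversion FROM pg_extension WHERE extname = 'ai';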
Load Paul Graham's essays into the database:
SELECT ai.load_dataset('sgoel9/paul_graham_essays');
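As a sanity check that the essays were loaded, count the rows in the source table (the same table the vectorizers below read from):

SELECT count(*) FROM paul_graham_essays;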
Create vectorizers in psql for each embedding model:
- Cohere embed-english-v3.0 (1024 dimensions)
- Mistral mistral-embed (1024 dimensions)
- OpenAI text-embedding-3-small (1024 dimensions)
SELECT ai.create_vectorizer(
    'paul_graham_essays'::regclass,
    destination => ai.destination_table('essays_cohere_embeddings'),
    embedding => ai.embedding_litellm(
        'cohere/embed-english-v3.0',
        1024,
        api_key_name => 'COHERE_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);

SELECT ai.create_vectorizer(
    'paul_graham_essays'::regclass,
    destination => ai.destination_table('essays_mistral_embeddings'),
    embedding => ai.embedding_litellm(
        'mistral/mistral-embed',
        1024,
        api_key_name => 'MISTRAL_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);

SELECT ai.create_vectorizer(
    'paul_graham_essays'::regclass,
    destination => ai.destination_table('essays_openai_small_embeddings'),
    embedding => ai.embedding_openai(
        'text-embedding-3-small',
        1024,
        api_key_name => 'OPENAI_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);
Use
SELECT * FROM ai.vectorizer_status;
to check whether the vectorizers are ready, and use
SELECT * FROM ai.vectorizer_errors;
to check whether any errors occurred.
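Once processing is complete, all three embedding views should contain the same number of chunks, since each vectorizer uses the same chunking configuration. A quick way to compare (assuming the destination names created above):

SELECT 'essays_cohere_embeddings' AS view_name, count(*) FROM essays_cohere_embeddings
UNION ALL
SELECT 'essays_mistral_embeddings', count(*) FROM essays_mistral_embeddings
UNION ALL
SELECT 'essays_openai_small_embeddings', count(*) FROM essays_openai_small_embeddings;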
- Run the script after the vectorizers are created and processing is complete
- Generate chunks: random text chunks are selected from the vectorized table
- Generate questions for each chunk using GPT-4o-mini (see the first sketch below)
- Evaluate models: for each question, retrieve the TOP_K closest chunks through vector similarity search (see the second sketch below)
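Question generation happens inside the script via GPT-4o-mini, but an equivalent call can be made straight from SQL with pgai's ai.openai_chat_complete (a sketch, not the script's actual code; the prompt wording is made up):

SELECT ai.openai_chat_complete(
    'gpt-4o-mini',
    jsonb_build_array(
        jsonb_build_object(
            'role', 'user',
            'content', 'Write one short question (under 10 words) answerable from this text: ' || chunk
        )
    )
) -> 'choices' -> 0 -> 'message' ->> 'content' AS question
FROM essays_openai_small_embeddings
ORDER BY random()
LIMIT 1;

Each retrieval is then a nearest-neighbor query like the following (again a sketch; it assumes the view exposes chunk and embedding columns, uses pgvector's <=> cosine-distance operator, and embeds the question with ai.openai_embed; the question text is a made-up example):

SELECT chunk
FROM essays_openai_small_embeddings
ORDER BY embedding <=> ai.openai_embed(
    'text-embedding-3-small',
    'What did the author learn from writing essays?',  -- hypothetical generated question
    dimensions => 1024
)
LIMIT 10;  -- TOP_K

The script writes its outputs to four CSV files: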
- chunks.csv: Random text chunks from the database
- questions.csv: Generated questions for each chunk
- results.csv: Overall model performance metrics
- detailed_results.csv: Per-question evaluation results