docs: update for name and destination changes #641

Merged: 5 commits, Apr 27, 2025
2 changes: 1 addition & 1 deletion docs/vectorizer-quick-start.md
@@ -90,8 +90,8 @@ Now we can create and run a vectorizer. A vectorizer is a pgai concept, it proce
SELECT ai.create_vectorizer(
'blog'::regclass,
loading => ai.loading_column('contents'),
destination => 'blog_contents_embeddings',
embedding => ai.embedding_ollama('nomic-embed-text', 768),
destination => ai.destination_table('blog_contents_embeddings')
);
```

283 changes: 241 additions & 42 deletions docs/vectorizer/api-reference.md

Large diffs are not rendered by default.

68 changes: 60 additions & 8 deletions docs/vectorizer/overview.md
@@ -117,9 +117,10 @@ query like this:
```sql
SELECT ai.create_vectorizer(
'blog'::regclass,
name => 'blog_embeddings', -- Optional custom name for easier reference
loading => ai.loading_column('contents'),
destination => 'blog_contents_embeddings',
embedding => ai.embedding_ollama('nomic-embed-text', 768)
embedding => ai.embedding_ollama('nomic-embed-text', 768),
destination => ai.destination_table('blog_contents_embeddings')
);
```

@@ -150,9 +151,9 @@ into each chunk:
SELECT ai.create_vectorizer(
'blog'::regclass,
loading => ai.loading_column('contents'),
destination => 'blog_contents_embeddings',
embedding => ai.embedding_ollama('nomic-embed-text', 768),
formatting => ai.formatting_python_template('$title: $chunk')
formatting => ai.formatting_python_template('$title: $chunk'),
destination => ai.destination_table('blog_contents_embeddings')
);
```

@@ -284,9 +285,9 @@ accordingly:
SELECT ai.create_vectorizer(
'blog'::regclass,
loading => ai.loading_column('contents'),
destination => 'blog_contents_embeddings',
embedding => ai.embedding_ollama('nomic-embed-text', 768),
formatting => ai.formatting_python_template('$title - by $author - $chunk')
formatting => ai.formatting_python_template('$title - by $author - $chunk'),
destination => ai.destination_table('blog_contents_embeddings')
);
```

@@ -304,10 +305,10 @@ example uses a HNSW index:
SELECT ai.create_vectorizer(
'blog'::regclass,
loading => ai.loading_column('contents'),
destination => 'blog_contents_embeddings',
embedding => ai.embedding_ollama('nomic-embed-text', 768),
formatting => ai.formatting_python_template('$title - by $author - $chunk'),
indexing => ai.indexing_hnsw(min_rows => 100000, opclass => 'vector_l2_ops')
indexing => ai.indexing_hnsw(min_rows => 100000, opclass => 'vector_l2_ops'),
destination => ai.destination_table('blog_contents_embeddings')
);
```

@@ -344,6 +345,57 @@ CREATE TABLE blog_contents_embeddings_store(
);
```

## Destination options for embeddings

The vectorizer supports two ways to store your embeddings:

### 1. Table destination (default)

The default approach stores embeddings in a separate table and creates a view that joins them with the source table:

```sql
SELECT ai.create_vectorizer(
'blog'::regclass,
name => 'blog_vectorizer', -- Optional custom name for easier reference
loading => ai.loading_column('contents'),
embedding => ai.embedding_ollama('nomic-embed-text', 768),
destination => ai.destination_table(
target_schema => 'public',
target_table => 'blog_embeddings_store',
view_name => 'blog_embeddings'
)
);
```

**When to use table destination:**
- When you need multiple embeddings per row (chunking)
- When you have large text fields that need to be split
- When you are vectorizing documents (which typically require chunking)
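
A minimal sketch of how the generated view might be queried once embeddings exist. It is illustrative only: it assumes the `blog_embeddings` view name from the example above, the `title`, `chunk`, and `embedding` columns that pgai exposes on embedding views, pgvector's `<=>` cosine-distance operator, and a query embedding supplied by your application as `$1`.

```sql
-- Illustrative similarity search over the view created by the table destination.
-- Each row of the view pairs a chunk and its embedding with the source row's columns.
SELECT title, chunk
FROM blog_embeddings
ORDER BY embedding <=> $1::vector   -- $1: query embedding computed by your application
LIMIT 5;
```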

### 2. Column destination

For simpler cases, you can add an embedding column directly to the source table. This option works only when the vectorizer does not chunk, because it requires a one-to-one relationship between each source row and its embedding. It is useful when you know the source text is short, for example because chunking has already been done upstream in your data pipeline.

The workflow: your application inserts a row with NULL in the embedding column; the vectorizer then reads the row, generates the embedding, and updates the row with the computed value (see the sketch after the example below).

```sql
SELECT ai.create_vectorizer(
'product_descriptions'::regclass,
name => 'product_descriptions_vectorizer',
loading => ai.loading_column('description'),
embedding => ai.embedding_openai('text-embedding-3-small', 768),
chunking => ai.chunking_none(), -- Required for column destination
destination => ai.destination_column('description_embedding')
);
```
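
To make the workflow concrete, here is a sketch of the insert-then-query cycle. It is illustrative only: the `id` primary key is assumed, `$1` stands for a query embedding your application supplies, and the table and column names come from the example above.

```sql
-- 1. The application inserts a row and simply leaves the embedding column NULL.
INSERT INTO product_descriptions (id, description, description_embedding)
VALUES (42, 'Ergonomic aluminium laptop stand with adjustable height', NULL);

-- 2. The vectorizer later reads the row, generates the embedding, and updates
--    description_embedding. Once populated, the column can be used directly
--    for similarity search with pgvector's <=> cosine-distance operator.
SELECT id, description
FROM product_descriptions
WHERE description_embedding IS NOT NULL
ORDER BY description_embedding <=> $1::vector
LIMIT 5;
```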

**When to use column destination:**
- When you need exactly one embedding per row
- When the text is short and doesn't require chunking
- When your application already takes care of the chunking before inserting into the database
- When you want to avoid creating additional database objects

**Note:** Column destination requires chunking to be set to `ai.chunking_none()` since it can only store one embedding per row.

## Monitor a vectorizer

Since embeddings are created asynchronously, a delay may occur before they
17 changes: 12 additions & 5 deletions docs/vectorizer/python-integration.md
@@ -13,11 +13,14 @@ Then you can create a vectorizer from python:

```python
from pgai.vectorizer import CreateVectorizer
from pgai.vectorizer.configuration import EmbeddingOpenaiConfig, ChunkingCharacterTextSplitterConfig, FormattingPythonTemplateConfig, LoadingColumnConfig
from pgai.vectorizer.configuration import EmbeddingOpenaiConfig, ChunkingCharacterTextSplitterConfig, FormattingPythonTemplateConfig, LoadingColumnConfig, DestinationTableConfig

vectorizer_statement = CreateVectorizer(
source="blog",
target_table='blog_embeddings',
name="blog_content_embedder", # Optional custom name for easier reference
destination=DestinationTableConfig(
destination='blog_embeddings'
),
loading=LoadingColumnConfig(column_name='content'),
embedding=EmbeddingOpenaiConfig(
model='text-embedding-3-small',
@@ -237,14 +240,18 @@ from pgai.vectorizer.configuration import (
EmbeddingOpenaiConfig,
ChunkingCharacterTextSplitterConfig,
FormattingPythonTemplateConfig,
LoadingColumnConfig
LoadingColumnConfig,
DestinationTableConfig
)


def upgrade() -> None:
op.create_vectorizer(
source="blog",
target_table='blog_embeddings',
name="blog_content_embedder", # Optional custom name for easier reference
destination=DestinationTableConfig(
destination='blog_embeddings'
),
loading=LoadingColumnConfig(column_name='content'),
embedding=EmbeddingOpenaiConfig(
model='text-embedding-3-small',
@@ -261,7 +268,7 @@


def downgrade() -> None:
op.drop_vectorizer(target_table="blog_embeddings", drop_all=True)
op.drop_vectorizer(name="blog_content_embedder", drop_all=True)
```

The `create_vectorizer` operation supports all configuration options available in the [SQL API](/docs/vectorizer/api-reference.md).
4 changes: 2 additions & 2 deletions docs/vectorizer/quick-start-openai.md
@@ -92,8 +92,8 @@ To create and run a vectorizer, then query the auto-generated embeddings created
SELECT ai.create_vectorizer(
'blog'::regclass,
loading => ai.loading_column('contents'),
destination => 'blog_contents_embeddings',
embedding => ai.embedding_openai('text-embedding-3-small', 768)
embedding => ai.embedding_openai('text-embedding-3-small', 768),
destination => ai.destination_table('blog_contents_embeddings')
);
```

4 changes: 2 additions & 2 deletions docs/vectorizer/quick-start-voyage.md
@@ -88,11 +88,11 @@ Now you can create and run a vectorizer. A vectorizer is a pgai concept, it proc
SELECT ai.create_vectorizer(
'blog'::regclass,
loading => ai.loading_column('contents'),
destination => 'blog_contents_embeddings',
embedding => ai.embedding_voyageai(
'voyage-3-lite',
512
)
),
destination => ai.destination_table('blog_contents_embeddings')
);
```

4 changes: 2 additions & 2 deletions docs/vectorizer/quick-start.md
@@ -90,8 +90,8 @@ Now we can create and run a vectorizer. A vectorizer is a pgai concept, it proce
SELECT ai.create_vectorizer(
'blog'::regclass,
loading => ai.loading_column('contents'),
destination => 'blog_contents_embeddings',
embedding => ai.embedding_ollama('nomic-embed-text', 768)
embedding => ai.embedding_ollama('nomic-embed-text', 768),
destination => ai.destination_table('blog_contents_embeddings')
);
```

10 changes: 5 additions & 5 deletions examples/embeddings_from_documents/documents/pgai.md
@@ -120,9 +120,9 @@ Please note that using Ollama requires a large (>4GB) download of the docker ima
```sql
SELECT ai.create_vectorizer(
'wiki'::regclass,
destination => 'wiki_embeddings',
embedding => ai.embedding_ollama('all-minilm', 384),
chunking => ai.chunking_recursive_character_text_splitter('text')
chunking => ai.chunking_recursive_character_text_splitter('text'),
destination => ai.destination_table('wiki_embeddings')
);
```

@@ -477,9 +477,9 @@ With one line of code, you can define a vectorizer that creates embeddings for d
```sql
SELECT ai.create_vectorizer(
<table_name>::regclass,
destination => <embedding_table_name>,
embedding => ai.embedding_ollama(<model_name>, <dimensions>),
chunking => ai.chunking_recursive_character_text_splitter(<column_name>)
destination => ai.destination_table(<embedding_table_name>),
embedding => ai.embedding_ollama('all-minilm', 384),
chunking => ai.chunking_recursive_character_text_splitter('text')
);
```
This newly created vectorizer will automatically track any changes to the
6 changes: 3 additions & 3 deletions examples/evaluations/litellm_vectorizer/README.md
@@ -54,7 +54,7 @@ The evaluation generates diverse question types (short, long, direct, implied, a

SELECT ai.create_vectorizer(
'paul_graham_essays'::regclass,
destination => 'essays_cohere_embeddings',
destination => ai.destination_table('essays_cohere_embeddings'),
embedding => ai.embedding_litellm(
'cohere/embed-english-v3.0',
1024,
@@ -65,7 +65,7 @@ The evaluation generates diverse question types (short, long, direct, implied, a

SELECT ai.create_vectorizer(
'paul_graham_essays'::regclass,
destination => 'essays_mistral_embeddings',
destination => ai.destination_table('essays_mistral_embeddings'),
embedding => ai.embedding_litellm(
'mistral/mistral-embed',
1024,
@@ -76,7 +76,7 @@ The evaluation generates diverse question types (short, long, direct, implied, a

SELECT ai.create_vectorizer(
'paul_graham_essays'::regclass,
destination => 'essays_openai_small_embeddings',
destination => ai.destination_table('essays_openai_small_embeddings'),
embedding => ai.embedding_openai(
'text-embedding-3-small',
1024,
8 changes: 4 additions & 4 deletions examples/evaluations/ollama_vectorizer/README.md
@@ -61,7 +61,7 @@ Dataset Setup:
SELECT ai.create_vectorizer(
'pg_essays'::regclass,
loading => ai.loading_column('text'),
destination => 'essays_nomic_embeddings',
destination => ai.destination_table('essays_nomic_embeddings'),
embedding => ai.embedding_ollama('nomic-embed-text', 768),
chunking => ai.chunking_recursive_character_text_splitter(512, 50)
);
@@ -70,7 +70,7 @@ Dataset Setup:
SELECT ai.create_vectorizer(
'pg_essays'::regclass,
loading => ai.loading_column('text'),
destination => 'essays_openai_small_embeddings',
destination => ai.destination_table('essays_openai_small_embeddings'),
embedding => ai.embedding_openai('text-embedding-3-small', 768),
chunking => ai.chunking_recursive_character_text_splitter(512, 50)
);
@@ -79,7 +79,7 @@ Dataset Setup:
SELECT ai.create_vectorizer(
'pg_essays'::regclass,
loading => ai.loading_column('text'),
destination => 'essays_bge_large_embeddings',
destination => ai.destination_table('essays_bge_large_embeddings'),
embedding => ai.embedding_ollama('bge-large', 1024),
chunking => ai.chunking_recursive_character_text_splitter(512, 50)
);
@@ -88,7 +88,7 @@ Dataset Setup:
SELECT ai.create_vectorizer(
'pg_essays'::regclass,
loading => ai.loading_column('text'),
destination => 'essays_openai_large_embeddings',
destination => ai.destination_table('essays_openai_large_embeddings'),
embedding => ai.embedding_openai('text-embedding-3-large', 1536),
chunking => ai.chunking_recursive_character_text_splitter(512, 50)
);
4 changes: 2 additions & 2 deletions examples/evaluations/voyage_vectorizer/README.md
@@ -69,7 +69,7 @@ Dataset Setup:
SELECT ai.create_vectorizer(
'sec_filings'::regclass,
loading => ai.loading_column('text'),
destination => 'sec_filings_openai_embeddings',
destination => ai.destination_table('sec_filings_openai_embeddings'),
embedding => ai.embedding_openai('text-embedding-3-small', 768),
chunking => ai.chunking_recursive_character_text_splitter(512, 50)
);
@@ -78,7 +78,7 @@ Dataset Setup:
SELECT ai.create_vectorizer(
'sec_filings'::regclass,
loading => ai.loading_column('text'),
destination => 'sec_filings_voyage_embeddings',
destination => ai.destination_table('sec_filings_voyage_embeddings'),
embedding => ai.embedding_voyageai('voyage-finance-2', 1024),
chunking => ai.chunking_recursive_character_text_splitter(512, 50)
);
4 changes: 2 additions & 2 deletions projects/pgai/README.md
@@ -48,8 +48,8 @@ follows:
SELECT ai.create_vectorizer(
'wiki'::regclass,
loading => ai.loading_column(column_name=>'text'),
destination => ai.destination_table(target_table=>'wiki_embedding_storage'),
embedding => ai.embedding_openai(model=>'text-embedding-ada-002', dimensions=>'1536')
embedding => ai.embedding_openai(model=>'text-embedding-ada-002', dimensions=>'1536'),
destination => ai.destination_table(target_table=>'wiki_embedding_storage')
)
```
