
Embedding Generator and Store

A Cloudflare Worker that generates vector embeddings from text using Cloudflare Workers AI and stores them in Cloudflare Vectorize for efficient vector search and retrieval.

Overview

This service provides an API endpoint for generating and optionally storing text embeddings. It uses the @cf/baai/bge-base-en-v1.5 model provided by Cloudflare Workers AI to generate 768-dimensional embeddings, which can then be stored in a Vectorize index for fast vector-based similarity search.

The worker is particularly useful for:

  • Converting document chunks or text fragments into vector embeddings
  • Organizing embeddings by collection, user, and file
  • Efficiently managing batch processing of large documents
  • Building search applications, RAG (Retrieval Augmented Generation) systems, or AI-powered document analysis tools

Features

  • Generate embeddings from text strings or arrays of text
  • Optional storage of embeddings in Cloudflare Vectorize
  • Batch processing support with configurable batch sizes
  • Detailed metadata storage including collection ID, file information, and chunk indexes
  • Error handling and logging

API Usage

Endpoint

POST /

Request Format

{
  "text": "String or array of strings to embed",
  "collection_id": "your-collection-identifier",
  "file_key": "user-id/collection-id/file-id/filename",
  "start_index": 0,
  "batch_size": 10,
  "total_chunks": 100,
  "store": true
}

Parameters

  • text: A string or array of strings to generate embeddings for
  • collection_id: Identifier for the collection this embedding belongs to
  • file_key: String in the format "user-id/collection-id/file-id/filename"
  • start_index: Starting chunk index for this batch
  • batch_size: Number of chunks in this batch (maximum 100)
  • total_chunks: Total number of chunks in the entire document
  • store: Boolean flag indicating whether to store the generated embeddings in Vectorize; when false, the embeddings are returned in the response instead
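
For illustration, a client might split a document's chunks into batches of at most 100 and post each batch in sequence. The worker URL and helper below are placeholders, not part of this repository:

// Illustrative client-side batching; the worker URL is a placeholder.
const WORKER_URL = "https://embedding-generator-and-store.example.workers.dev/";

async function embedDocument(chunks: string[], collectionId: string, fileKey: string): Promise<void> {
  const BATCH_SIZE = 100; // the worker accepts at most 100 chunks per request

  for (let start = 0; start < chunks.length; start += BATCH_SIZE) {
    const batch = chunks.slice(start, start + BATCH_SIZE);

    const response = await fetch(WORKER_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        text: batch,
        collection_id: collectionId,
        file_key: fileKey, // "user-id/collection-id/file-id/filename"
        start_index: start,
        batch_size: batch.length,
        total_chunks: chunks.length,
        store: true,
      }),
    });

    const result = await response.json() as { status: string; message?: string };
    if (result.status !== "completed") {
      throw new Error(`Batch starting at chunk ${start} failed: ${result.message}`);
    }
  }
}

Posting batches in order keeps start_index aligned with each chunk's position in the original document.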

Response Format

Success (with store=true)

{
  "status": "completed",
  "stored": true
}

Success (with store=false)

{
  "status": "completed",
  "stored": false,
  "embeddings": {
    "data": [
      [...embedding vector values...]
    ]
  }
}

Error

{
  "status": "error",
  "message": "Error message details"
}
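
When store is false the worker returns the vectors instead of writing them to Vectorize. Below is a sketch of a caller reading them back; the URL is a placeholder and the response type simply mirrors the shapes documented above:

// Request an embedding without storing it, then read the vector from the response.
const res = await fetch("https://embedding-generator-and-store.example.workers.dev/", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    text: "What does this paragraph say?",
    collection_id: "my-collection",
    file_key: "user-1/my-collection/file-1/notes.txt",
    start_index: 0,
    batch_size: 1,
    total_chunks: 1,
    store: false,
  }),
});

const body = await res.json() as {
  status: string;
  stored?: boolean;
  embeddings?: { data: number[][] };
  message?: string;
};

if (body.status === "error") {
  console.error(body.message);
} else {
  const vector = body.embeddings!.data[0]; // one 768-dimensional vector per input string
  console.log(vector.length); // 768
}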

Setup and Deployment

Prerequisites

  • Node.js and npm
  • A Cloudflare account with Workers AI and Vectorize enabled
  • The Wrangler CLI for local development and deployment

Installation

  1. Clone the repository:

    git clone [repository-url]
    cd embedding-generator
  2. Install dependencies:

    npm install
  3. Configure your Vectorize index in the Cloudflare Dashboard or using Wrangler.
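
For example, an index matching the 768-dimensional output of @cf/baai/bge-base-en-v1.5 can be created with Wrangler; the cosine metric here is a common choice for text embeddings, not something this repository mandates:

    npx wrangler vectorize create files-1 --dimensions=768 --metric=cosine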

Development

Run the worker locally:

npm run dev

Deployment

Deploy to Cloudflare Workers:

npm run deploy

Configuration

Configuration is managed through the wrangler.jsonc file:

{
  "name": "embedding-generator-and-store",
  "main": "src/index.ts",
  "compatibility_date": "2025-02-11",
  "compatibility_flags": ["nodejs_compat"],
  "ai": {
    "binding": "AI"
  },
  "vectorize": [
    {
      "binding": "VECTORIZE",
      "index_name": "files-1"
    }
  ],
  "observability": {
    "enabled": true,
    "head_sampling_rate": 1
  }
}
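
The AI and VECTORIZE bindings declared here are exposed to the worker as env.AI and env.VECTORIZE. The following is a minimal sketch of how they are typically wired together; it is not the repository's actual src/index.ts, and the ID scheme and metadata fields are illustrative:

// Minimal sketch of using the AI and VECTORIZE bindings together.
// Types come from @cloudflare/workers-types.
interface Env {
  AI: Ai;
  VECTORIZE: VectorizeIndex;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { text, collection_id, file_key, start_index } = await request.json() as {
      text: string[];
      collection_id: string;
      file_key: string;
      start_index: number;
    };

    // Workers AI returns { shape, data }, with one vector per input string.
    const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text }) as {
      shape: number[];
      data: number[][];
    };

    // Attach metadata so vectors can later be filtered by collection, file, and chunk.
    const vectors = data.map((values, i) => ({
      id: `${file_key}-${start_index + i}`, // illustrative ID scheme
      values,
      metadata: { collection_id, file_key, chunk_index: start_index + i },
    }));

    await env.VECTORIZE.upsert(vectors);
    return Response.json({ status: "completed", stored: true });
  },
};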

Limitations

  • Maximum batch size is 100 text chunks at a time
  • How text is encoded (tokenization and the 768-dimensional output) is determined by the @cf/baai/bge-base-en-v1.5 model
  • File path structure must follow the format: user-id/collection-id/file-id/filename

License

MIT

About

Generates embeddings and stores them in Vectorize. Called over RPC by the chunker.
