18 changes: 17 additions & 1 deletion .env.example
@@ -112,7 +112,13 @@ EMBEDDING_TIMEOUT=30
# EMBEDDING_BASE_URL=http://localhost:8000/v1

# ========================================
# Milvus Configuration
# Vector Backend Configuration
# ========================================
# Supported values: milvus, pgvector
VECTOR_BACKEND=milvus

# ========================================
# Milvus Configuration (VECTOR_BACKEND=milvus)
# ========================================
# Milvus host
MILVUS_HOST=localhost
@@ -128,6 +134,16 @@ MILVUS_PORT=19530
MILVUS_ENTRIES_COLLECTION=entries
MILVUS_PREFS_COLLECTION=user_preferences

# ========================================
# pgvector Configuration (VECTOR_BACKEND=pgvector)
# ========================================
# Optional override. Falls back to DATABASE_URL when empty.
# Note: the pgvector backend opens its own async connection pool, even when it points at DATABASE_URL.
PGVECTOR_DATABASE_URL=
PGVECTOR_ENTRIES_TABLE=entry_embeddings
PGVECTOR_PREFS_TABLE=user_preference_vectors
PGVECTOR_METADATA_TABLE=vector_store_metadata

# ========================================
# Preference Configuration
# ========================================
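The fallback described in the `.env.example` comments (an empty `PGVECTOR_DATABASE_URL` means `DATABASE_URL` is used) can be mirrored in shell when you want to point a tool such as `psql` at the same database. A minimal sketch: `pgvector_psql_url` is a hypothetical helper, not part of Glean, and it assumes the URL may carry an SQLAlchemy driver suffix like `+asyncpg`, which `psql` does not understand.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Glean): resolve the database URL the
# pgvector backend would use, in a form psql accepts.
pgvector_psql_url() {
  # ${VAR:-fallback} substitutes when VAR is unset OR empty, matching
  # "Falls back to DATABASE_URL when empty".
  local url="${PGVECTOR_DATABASE_URL:-$DATABASE_URL}"
  # Assumption: drop an SQLAlchemy driver suffix such as "+asyncpg" if present,
  # since psql only understands libpq-style postgresql:// URIs.
  printf '%s\n' "${url/+asyncpg/}"
}

# Usage (requires psql): confirm the vector extension is installed.
# psql "$(pgvector_psql_url)" -c "SELECT extversion FROM pg_extension WHERE extname = 'vector';"
```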
2 changes: 1 addition & 1 deletion .github/workflows/ci-backend.yml
@@ -48,7 +48,7 @@ jobs:
runs-on: ubuntu-latest
services:
postgres:
image: postgres:16-alpine
image: pgvector/pgvector:pg16
env:
POSTGRES_DB: glean_test
POSTGRES_USER: glean
7 changes: 7 additions & 0 deletions CLAUDE.md
@@ -38,6 +38,9 @@ The project includes multiple Docker Compose configurations for different use ca
# Basic deployment (without admin dashboard)
docker compose up -d

# pgvector backend deployment
docker compose -f docker-compose.pgvector.yml up -d

# Full deployment with admin dashboard
docker compose --profile admin up -d

@@ -55,11 +58,15 @@ IMAGE_TAG=v0.3.0-alpha.1 docker compose up -d
# Start development infrastructure (PostgreSQL, Redis, Milvus)
docker compose -f docker-compose.dev.yml up -d

# Start development infrastructure (PostgreSQL with pgvector, Redis)
docker compose -f docker-compose.dev.pgvector.yml up -d

# View logs
docker compose -f docker-compose.dev.yml logs -f

# Stop services
docker compose -f docker-compose.dev.yml down
docker compose -f docker-compose.dev.pgvector.yml down
```

### Local Development with Override
112 changes: 80 additions & 32 deletions DEPLOY.md
@@ -19,13 +19,13 @@ This guide provides comprehensive instructions for deploying Glean in production

## Quick Deployment

### Full Deployment (Recommended)
### Using pgvector (Recommended)

Includes Milvus for Phase 3 features (smart recommendations, preference learning):
Stores vectors in PostgreSQL via the pgvector extension, so no additional infrastructure is required:

```bash
# Download docker-compose.yml
curl -fsSL https://raw.githubusercontent.com/LeslieLeung/glean/main/docker-compose.yml -o docker-compose.yml
# Download pgvector compose
curl -fsSL https://raw.githubusercontent.com/LeslieLeung/glean/main/docker-compose.pgvector.yml -o docker-compose.yml

# (Optional) Create .env file to customize admin credentials
cat > .env << EOF
@@ -50,13 +50,18 @@ docker compose up -d
- Password: `Admin123!`
- ⚠️ **Change this password in production!**

### Lite Deployment (Without Milvus)
**Next steps**:
1. Log in to admin dashboard at http://localhost:3001
2. Change the default password
3. Configure additional environment variables for production (see [Environment Configuration](#environment-configuration))

### Using Milvus

For lighter deployments if you don't need Phase 3 features:
Uses a dedicated Milvus vector database for vector storage:

```bash
# Download lite version
curl -fsSL https://raw.githubusercontent.com/LeslieLeung/glean/main/docker-compose.lite.yml -o docker-compose.yml
# Download docker-compose.yml
curl -fsSL https://raw.githubusercontent.com/LeslieLeung/glean/main/docker-compose.yml -o docker-compose.yml

# (Optional) Create .env file to customize admin credentials
cat > .env << EOF
@@ -65,24 +70,22 @@ ADMIN_PASSWORD=$(openssl rand -base64 24)
SECRET_KEY=$(openssl rand -base64 32)
EOF

# ⚠️ IMPORTANT: Save the generated passwords
# ⚠️ IMPORTANT: Save the generated passwords before proceeding!
cat .env

# Start services
# Start all services
docker compose up -d

# Access:
# - Web App: http://localhost
# - Admin Dashboard: http://localhost:3001 (default: admin / Admin123!)
```

**Default admin account**: If you don't create a `.env` file, the default credentials are:
- Username: `admin`
- Password: `Admin123!`
- Dashboard: http://localhost:3001
- ⚠️ **Change this password in production!**

**Next steps**:
1. Log in to admin dashboard at http://localhost:3001
2. Change the default password
3. Configure additional environment variables for production (see [Environment Configuration](#environment-configuration))

### Testing Pre-release Versions

Pre-release versions (alpha/beta/rc) are available for testing upcoming features:
@@ -233,9 +236,43 @@ docker compose logs backend | grep "Admin Account Created"

## Service Architecture

### Full Deployment
### Using pgvector (Recommended)

Uses PostgreSQL with the pgvector extension for vector storage (6 services total). Use `docker-compose.pgvector.yml` for this configuration.

**Services:**

| Service | Container Name | Description | Dependencies |
| ---------- | -------------- | ----------------------------------- | ------------------ |
| postgres | glean-postgres | PostgreSQL 16 with pgvector | - |
| redis | glean-redis | Redis 8 for task queue | - |
| backend | glean-backend | FastAPI REST API server | postgres, redis |
| worker | glean-worker | arq background worker (feed sync) | postgres, redis |
| web | glean-web | React web frontend (nginx) | backend |
| admin | glean-admin | Admin dashboard (nginx) | backend |

**Data persistence:**
- `postgres_data` - PostgreSQL database files (including vector data)
- `redis_data` - Redis persistence (AOF)
- `glean_logs` - Application logs (backend + worker)

**Networking:**
- All services communicate via `glean-network` bridge network
- Only `web` (port 80) and `admin` (port 3001) are exposed to host

**Recommended pgvector index:**

Glean consists of 9 services orchestrated by Docker Compose:
Run this after migrations if you expect non-trivial similarity-search volume. Adjust the table name if you customized `PGVECTOR_ENTRIES_TABLE`.

```sql
CREATE INDEX IF NOT EXISTS idx_entry_embeddings_embedding_hnsw
ON entry_embeddings USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```
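If you renamed the entries table via `PGVECTOR_ENTRIES_TABLE`, the statement above must be adjusted by hand. A minimal sketch that builds it from the configured name, assuming bash and `psql` are available; `hnsw_index_sql` is a hypothetical helper, not part of Glean. It keeps the same `m` and `ef_construction` values and derives the index name from the table name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Glean): print the recommended HNSW
# index statement for the configured entries table.
hnsw_index_sql() {
  # Same default as PGVECTOR_ENTRIES_TABLE in .env.example.
  local table="${PGVECTOR_ENTRIES_TABLE:-entry_embeddings}"
  # Assumes a plain identifier (letters, digits, underscores); no quoting is done.
  printf 'CREATE INDEX IF NOT EXISTS idx_%s_embedding_hnsw ON %s USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);\n' \
    "$table" "$table"
}

# Usage (requires psql and a libpq-style postgresql:// URL):
# psql "${PGVECTOR_DATABASE_URL:-$DATABASE_URL}" -c "$(hnsw_index_sql)"
```

Because the index is built with `vector_cosine_ops`, pgvector only uses it for queries that order by cosine distance (`ORDER BY embedding <=> ...`).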

### Using Milvus

Uses a dedicated Milvus vector database (9 services total). Use `docker-compose.yml` for this configuration.

**Core services:**

@@ -248,7 +285,7 @@ Glean consists of 9 services orchestrated by Docker Compose:
| web | glean-web | React web frontend (nginx) | backend |
| admin | glean-admin | Admin dashboard (nginx) | backend |

**Milvus services (Phase 3 features):**
**Milvus services:**

| Service | Container Name | Description | Dependencies |
| ------------- | ------------------- | ------------------------------ | ------------------ |
@@ -263,10 +300,6 @@ Glean consists of 9 services orchestrated by Docker Compose:
4. `web` and `admin` start after backend is ready
5. `milvus-etcd` and `milvus-minio` start in parallel, then `milvus`

### Lite Deployment

Excludes Milvus services (6 services total). Use `docker-compose.lite.yml` for this configuration.

**Data persistence:**
- `postgres_data` - PostgreSQL database files
- `redis_data` - Redis persistence (AOF)
@@ -317,16 +350,20 @@ Excludes Milvus services (6 services total). Use `docker-compose.lite.yml` for t
| `LOG_RETENTION` | `30 days` | Log retention period |
| `LOG_COMPRESSION` | `gz` | Log compression format |

### Milvus Configuration (Phase 3 Features)
### Vector Backend Configuration

Milvus is optional and provides vector database capabilities for smart recommendations and preference learning.
Two vector backends are supported:

**Enable Milvus:**
```bash
docker compose --profile milvus up -d
```
- `pgvector` (used by `docker-compose.pgvector.yml`; **recommended**, no extra infrastructure needed)
- `milvus` (used by `docker-compose.yml`; for users who prefer a dedicated vector database)

**Backend selector:**

| Variable | Default | Description |
| ---------------- | -------- | --------------------------------------------- |
| `VECTOR_BACKEND` | `milvus` | Vector backend (`pgvector` or `milvus`) |

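The selector value and the compose file go together. A minimal sketch of that pairing as a hypothetical bash helper (`compose_file_for_backend` is illustrative, not part of Glean): it maps each supported value to the repository compose file named above, falls back to the documented default `milvus`, and rejects anything else.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Glean): map a VECTOR_BACKEND value to
# the repository compose file that runs that backend.
compose_file_for_backend() {
  case "${1:-milvus}" in   # milvus is the documented default
    pgvector) echo "docker-compose.pgvector.yml" ;;
    milvus)   echo "docker-compose.yml" ;;
    *)
      echo "unsupported VECTOR_BACKEND: $1 (expected pgvector or milvus)" >&2
      return 1
      ;;
  esac
}

# Usage:
# docker compose -f "$(compose_file_for_backend "$VECTOR_BACKEND")" up -d
```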
**Milvus connection settings:**
**Milvus connection settings (VECTOR_BACKEND=milvus):**

| Variable | Default | Description |
| ------------------------- | ----------- | --------------------------------- |
@@ -337,9 +374,20 @@ docker compose --profile milvus up -d
| `MILVUS_ENTRIES_COLLECTION` | `entries` | Collection name for entry vectors |
| `MILVUS_PREFS_COLLECTION` | `user_preferences` | Collection name for user preferences |

### Embedding Configuration (Phase 3 Features)
**pgvector settings (VECTOR_BACKEND=pgvector):**

| Variable | Default | Description |
| --------------------------- | -------------------------- | -------------------------------------------------------- |
| `PGVECTOR_DATABASE_URL` | - | Optional override, falls back to `DATABASE_URL` |
| `PGVECTOR_ENTRIES_TABLE` | `entry_embeddings` | Table name for entry vectors |
| `PGVECTOR_PREFS_TABLE` | `user_preference_vectors` | Table name for user preference vectors |
| `PGVECTOR_METADATA_TABLE` | `vector_store_metadata` | Table name for model signature metadata |

When `PGVECTOR_DATABASE_URL` is left empty, Glean's pgvector backend connects to `DATABASE_URL` through its own async pool. That is fine for most installs, but count these connections separately when tuning PostgreSQL connection limits such as `max_connections`.
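A rough upper bound on Glean's PostgreSQL connections is the number of processes that open pools multiplied by the combined pool sizes. A sketch with illustrative numbers: `pg_connection_budget` is hypothetical, it assumes every process (for example backend and worker) opens both pools, and the pool sizes are placeholders; use your real values and add any overflow your pools allow.

```bash
#!/usr/bin/env bash
# Hypothetical helper: worst-case connections when the pgvector backend shares
# the database but keeps its own pool next to the main application pool.
pg_connection_budget() {
  local processes=$1 main_pool=$2 pgvector_pool=$3
  echo $(( processes * (main_pool + pgvector_pool) ))
}

# Illustrative values: 2 processes, main pool of 10, pgvector pool of 5.
pg_connection_budget 2 10 5   # prints 30
```

Compare the result with `SHOW max_connections;` on the server, leaving headroom for migrations and admin sessions.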

### Embedding Configuration

Required when using Milvus for smart recommendations:
Required for preference learning and smart recommendations:

| Variable | Default | Description |
| ---------------------- | ------------------------ | ------------------------------------------------ |