Commit bc14fac

Merge pull request #25 from llbbl/feat/unit-testing
Remove cloud embedding config and add just tasks
2 parents 5e4be87 + 28071ac commit bc14fac

25 files changed

Lines changed: 274 additions & 409 deletions

.env.example

Lines changed: 0 additions & 7 deletions
@@ -4,10 +4,3 @@ TURSO_DB_URL=libsql://your-database.turso.io
 TURSO_AUTH_TOKEN=your-auth-token-here
 
 # Note: If credentials are not set, scripts with :local suffix will use file:local.db instead
-
-# Embedding Provider (local, gemini, or openai)
-EMBEDDING_PROVIDER=local
-
-# Optional: API Keys for cloud embedding providers
-# GEMINI_API_KEY=your-gemini-api-key
-# OPENAI_API_KEY=your-openai-api-key
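
The note kept in `.env.example` describes a fallback to `file:local.db` when Turso credentials are not set. A minimal sketch of that kind of fallback, assuming both variables must be present to use the remote database; `resolveDbConfig`, `DbConfig`, and `Env` are hypothetical names used for illustration and are not taken from this repository or from libsql-search:

```typescript
// Hypothetical helper illustrating the credential fallback described in .env.example.
// The names below are assumptions, not part of the project or of libsql-search.
interface DbConfig {
  url: string;
  authToken?: string;
}

type Env = Record<string, string | undefined>;

function resolveDbConfig(env: Env): DbConfig {
  const url = env.TURSO_DB_URL?.trim();
  const authToken = env.TURSO_AUTH_TOKEN?.trim();

  // Use the remote database only when both credentials are present and non-empty.
  if (url && authToken) {
    return { url, authToken };
  }

  // Otherwise fall back to the local libSQL file named in the note.
  return { url: 'file:local.db' };
}
```

Passing the environment in as an argument (for example `resolveDbConfig(process.env)` in a Node.js script) keeps the function easy to test without touching real credentials.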

.github/workflows/ci.yml

Lines changed: 7 additions & 2 deletions
@@ -26,8 +26,13 @@ jobs:
       - name: Install dependencies
         run: pnpm install
 
-      - name: Run tests
-        run: pnpm test --run
+      - name: Run tests with coverage
+        run: pnpm test:coverage --run
+
+      - name: Upload coverage reports to Codecov
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
 
       - name: Initialize local database
         run: pnpm db:init:local

CLAUDE.md

Lines changed: 10 additions & 20 deletions
@@ -52,7 +52,7 @@ pnpm format # Format all files
 
 ### Content Flow
 1. **Markdown → Database**: Content in `./content` is indexed into Turso via `scripts/index-content.ts`
-2. **libsql-search**: Handles embedding generation (local/Gemini/OpenAI), vector storage, and semantic search
+2. **libsql-search**: Handles embedding generation (local), vector storage, and semantic search
 3. **Static Generation**: Article pages are pre-rendered at build time using `getStaticPaths()`
 4. **Server Search**: Search API runs server-side at `/api/search.json` (requires `output: 'server'` with Node.js adapter)

@@ -73,29 +73,20 @@ pnpm format # Format all files
 Optional in `.env`:
 - `TURSO_DB_URL`: Turso database URL (libsql://...) - if not set, uses local libSQL file
 - `TURSO_AUTH_TOKEN`: Turso authentication token - if not set, uses local libSQL file
-- `EMBEDDING_PROVIDER`: "local" (default), "gemini", or "openai"
-- Optional: `GEMINI_API_KEY` or `OPENAI_API_KEY` (if using cloud providers)
+
+Embeddings run locally by default; no API keys are required.
 
 **Local Development**: If Turso credentials aren't provided, the project automatically falls back to a local SQLite file (`local.db`) for database operations. This is useful for CI builds and local development without cloud dependencies.
 
 ## Critical Configuration
 
-### Server-Side Rendering + Dual Adapter Support
+### Server-Side Rendering + Node Adapter
 The search API endpoint requires SSR with an adapter. The configuration uses:
 - `output: 'server'` - Enables server-side rendering
-- **Dual adapter support** - Conditionally uses Node.js or Cloudflare adapter based on `ADAPTER` env var
-  - Default: `node({ mode: 'standalone' })` - For traditional Node.js deployments
-  - Cloudflare: `cloudflare()` - For Cloudflare Workers (set `ADAPTER=cloudflare`)
+- `node({ mode: 'standalone' })` - For traditional Node.js deployments
 - Article pages marked with `prerender: true` are pre-rendered as static HTML
 - Search API marked with `prerender: false` runs server-side
 
-**Adapter Selection** (astro.config.mjs:14-16):
-```js
-const adapter = process.env.ADAPTER === 'cloudflare'
-  ? cloudflare()
-  : node({ mode: 'standalone' });
-```
-
 **Never** remove the adapter or change output to 'static', or the search API will break.
 
 ### Content Structure
@@ -129,11 +120,11 @@ The project relies heavily on libsql-search. When modifying search behavior:
 3. Update `src/pages/api/search.json.ts` for search query changes
 4. Maintain embedding dimension consistency (768) across indexing and search
 
-### Customizing Embedding Providers
-To switch providers, update `.env` and ensure API keys are set. The dimension (768) must match across:
+### Customizing Embeddings
+The embedding dimension (768) must match across:
 - `scripts/index-content.ts` (createTable and indexContent)
-- Search API (automatically uses same provider)
-- Re-index content after switching providers
+- Search API
+- Re-index content after changing the embedding model or dimension
 
 ### Styling
 - Uses Tailwind CSS 4 via Vite plugin
@@ -152,7 +143,6 @@ To switch providers, update `.env` and ensure API keys are set. The dimension (7
 
 ## Deployment Options
 
-> **Note**: Cloudflare Workers/Pages deployment is being developed in the `cloudflare-workers` branch.
 
 ### Node.js Platforms (Vercel, Netlify, etc.)

@@ -232,4 +222,4 @@ chore: bump dependencies
 ### Release Process
 1. Make commits following the convention above
 2. Create and push a version tag: `git tag v1.2.3 && git push --tags`
-3. GitHub Actions will automatically generate changelog and create release
+3. GitHub Actions will automatically generate changelog and create release
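
The updated `CLAUDE.md` text requires the embedding dimension (768) to match between indexing and search. A minimal sketch of a guard that could enforce this at either call site; `EMBEDDING_DIMENSION` and `assertEmbeddingDimension` are hypothetical names for illustration and are not part of libsql-search or of this commit:

```typescript
// Assumption: 768 mirrors the dimension stated in CLAUDE.md.
// The constant and helper names are hypothetical, not from the repository.
const EMBEDDING_DIMENSION = 768;

function assertEmbeddingDimension(
  vector: ArrayLike<number>,
  expected: number = EMBEDDING_DIMENSION,
): void {
  // A mismatch usually means the model changed without re-indexing the content.
  if (vector.length !== expected) {
    throw new Error(
      `Embedding has ${vector.length} dimensions, expected ${expected}; ` +
        're-index content after changing the embedding model or dimension.',
    );
  }
}
```

Calling the guard on each vector before it is stored, and again on the query vector before searching, would surface a mismatch as an explicit error rather than as silently poor search results.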

Dockerfile

Lines changed: 0 additions & 2 deletions
@@ -7,7 +7,6 @@ FROM node:20-slim AS builder
 # Build arguments for Turso credentials (required for indexing and pre-rendering)
 ARG TURSO_DB_URL
 ARG TURSO_AUTH_TOKEN
-ARG EMBEDDING_PROVIDER=local
 
 # Install pnpm
 RUN corepack enable && corepack prepare pnpm@latest --activate
@@ -27,7 +26,6 @@ COPY . .
 # Set environment variables for build
 ENV TURSO_DB_URL=$TURSO_DB_URL
 ENV TURSO_AUTH_TOKEN=$TURSO_AUTH_TOKEN
-ENV EMBEDDING_PROVIDER=$EMBEDDING_PROVIDER
 
 # Index content to Turso database (env vars already set via ENV directives)
 RUN pnpm exec tsx scripts/init-db.ts && pnpm exec tsx scripts/index-content.ts

Makefile

Lines changed: 0 additions & 63 deletions
This file was deleted.

README.md

Lines changed: 15 additions & 25 deletions
@@ -1,5 +1,7 @@
 # semantic-docs
 
+[![Coverage](https://img.shields.io/codecov/c/github/llbbl/semantic-docs?label=coverage)](https://codecov.io/gh/llbbl/semantic-docs) [![CI](https://github.com/llbbl/semantic-docs/actions/workflows/ci.yml/badge.svg)](https://github.com/llbbl/semantic-docs/actions/workflows/ci.yml) [![Release](https://img.shields.io/github/v/release/llbbl/semantic-docs)](https://github.com/llbbl/semantic-docs/releases)
+
 Documentation theme with semantic vector search.
 
 A beautiful, dark-mode documentation theme powered by [libsql-search](https://github.com/llbbl/libsql-search) for semantic search capabilities. Perfect for technical documentation, knowledge bases, and content-heavy sites.
@@ -31,6 +33,16 @@ Or use as a template on GitHub.
 pnpm install
 ```
 
+Optional: use the `justfile` task runner for common commands:
+
+```bash
+just
+just dev
+just test
+```
+
+See `docs/just.md` for the full list of recipes.
+
 ### 3. Set Up Environment
 
 Copy `.env.example` to `.env` and add your credentials:
@@ -44,7 +56,6 @@ Edit `.env`:
 ```env
 TURSO_DB_URL=libsql://your-database.turso.io
 TURSO_AUTH_TOKEN=your-auth-token
-EMBEDDING_PROVIDER=local
 ```
 
 **Get Turso credentials:**
@@ -130,21 +141,9 @@ const { title = "Your Site Name", description = "Your description" } = Astro.pro
 
 Edit `src/styles/global.css` to change the color scheme. The theme uses OKLCH colors for smooth gradients and perceptual uniformity.
 
-### Change Embedding Provider
-
-**Use Gemini** (free tier: 1,500 requests/day):
-
-```env
-EMBEDDING_PROVIDER=gemini
-GEMINI_API_KEY=your-key
-```
-
-**Use OpenAI** (paid):
+### Embeddings
 
-```env
-EMBEDDING_PROVIDER=openai
-OPENAI_API_KEY=your-key
-```
+Semantic search uses local embeddings by default, so no API keys are required.
 
 ## Project Structure

@@ -178,8 +177,6 @@ semantic-docs/
 
 ## Deployment
 
-> **Note**: Cloudflare Workers/Pages deployment support is currently in development on the `cloudflare-workers` branch.
-
 ### Container-Based Platforms (Recommended)
 
 This project is designed to run on platforms that support Docker containers, such as:
@@ -278,21 +275,14 @@ pnpm preview
 
 First run downloads ~50MB model. Subsequent runs use cache.
 
-Use Gemini for faster embeddings:
-
-```env
-EMBEDDING_PROVIDER=gemini
-GEMINI_API_KEY=your-key
-```
-
 ## Tech Stack
 
 - **Framework**: [Astro](https://astro.build) 5
 - **Search**: [libsql-search](https://github.com/llbbl/libsql-search)
 - **Database**: [Turso](https://turso.tech) (libSQL)
 - **Styling**: [Tailwind CSS](https://tailwindcss.com) 4
 - **UI**: React islands for interactivity
-- **Embeddings**: Xenova, Gemini, or OpenAI
+- **Embeddings**: Xenova (local)
 
 ## License

content/features/semantic-search.md

Lines changed: 2 additions & 31 deletions
@@ -117,15 +117,12 @@ const embedding = await embedder(text, {
 console.log(embedding.data); // [0.123, -0.456, 0.789, ...]
 ```
 
-### Popular Models
+### Popular Local Models
 
 | Model | Dimensions | Speed | Quality |
 |-------|------------|-------|---------|
 | all-MiniLM-L6-v2 | 384 | Fast | Good |
 | all-mpnet-base-v2 | 768 | Medium | Better |
-| text-embedding-3-small (OpenAI) | 1536 | API | Excellent |
-| text-embedding-ada-002 (OpenAI) | 1536 | API | Excellent |
-| textembedding-gecko (Gemini) | 768 | API | Excellent |
 
 ## Implementation in Astro Vault

@@ -351,7 +348,7 @@ async function hybridSearch(query: string) {
 - **E-commerce**: Product codes (full-text) + descriptions (semantic)
 - **Code search**: Function names (full-text) + purpose (semantic)
 
-## Embedding Providers
+## Embeddings
 
 ### Local (Xenova Transformers)
 ```typescript
@@ -364,31 +361,6 @@ const embedder = await pipeline('feature-extraction',
 const embedding = await embedder(text);
 ```
 
-### OpenAI
-```typescript
-// Pros: High quality, fast
-// Cons: Costs money, rate limits
-
-import { OpenAI } from 'openai';
-const openai = new OpenAI();
-const response = await openai.embeddings.create({
-  model: 'text-embedding-3-small',
-  input: text,
-});
-const embedding = response.data[0].embedding;
-```
-
-### Gemini
-```typescript
-// Pros: High quality, generous free tier
-// Cons: Rate limits
-
-import { GoogleGenerativeAI } from '@google/generative-ai';
-const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
-const model = genAI.getGenerativeModel({ model: 'embedding-001' });
-const result = await model.embedContent(text);
-const embedding = result.embedding.values;
-```
 
 ## Use Cases

@@ -409,5 +381,4 @@ const embedding = result.embedding.values;
 
 - **Xenova Transformers**: [huggingface.co/docs/transformers.js](https://huggingface.co/docs/transformers.js)
 - **Sentence Transformers**: [sbert.net](https://www.sbert.net/)
-- **OpenAI Embeddings**: [platform.openai.com/docs/guides/embeddings](https://platform.openai.com/docs/guides/embeddings)
 - **Vector Search Explained**: [pinecone.io/learn/vector-database](https://www.pinecone.io/learn/vector-database/)

content/getting-started/welcome.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ Check out the README for customization options:
 
 - Change colors
 - Update site title
-- Configure embedding providers
+- Tune search behavior
 - And more!
 
 ---

content/theme/overview.md

Lines changed: 1 addition & 7 deletions
@@ -13,7 +13,7 @@ Semantic Docs is a modern documentation theme built with Astro, featuring semant
 
 ### Semantic Vector Search
 - **Vector embeddings**: Content is indexed with 768-dimension embeddings
-- **Three embedding providers**: Local (onnxruntime), Gemini, or OpenAI
+- **Local embeddings**: Runs on-device with no API keys required
 - **Fast semantic search**: Natural language queries return relevant results
 - **Edge-optimized**: Runs on Turso's edge database for low latency

@@ -116,12 +116,6 @@ semantic-docs/
 TURSO_DB_URL=libsql://your-db.turso.io
 TURSO_AUTH_TOKEN=your-token
 
-# Embedding provider (optional, defaults to "local")
-EMBEDDING_PROVIDER=local # or "gemini" or "openai"
-
-# API keys (if using cloud embeddings)
-GEMINI_API_KEY=your-key
-OPENAI_API_KEY=your-key
 ```
 
 ### Astro Configuration

docs/SECURITY.md

Lines changed: 0 additions & 11 deletions
@@ -50,12 +50,10 @@ For multi-server deployments, consider:
 ```
 
 2. **Edge rate limiting** (Platform-specific)
-   - Cloudflare Workers: Use Durable Objects
    - Vercel: Use Edge Config or KV
    - Netlify: Use Blobs
 
 3. **WAF/CDN rate limiting**
-   - Cloudflare: Configure rate limiting rules
    - AWS CloudFront: Lambda@Edge
    - Fastly: VCL rate limiting

@@ -72,15 +70,6 @@ The API limits:
 - Risk: CPU abuse
 - Mitigation: Rate limiting sufficient
 
-**Gemini Provider** (Free tier: 1,500 req/day)
-- Risk: API quota exhaustion
-- Mitigation: Consider stricter rate limits (5-10 req/min)
-
-**OpenAI Provider** (Paid)
-- Risk: Cost abuse
-- Mitigation: Monitor usage, alert on anomalies
-- Recommendation: Use OpenAI's own rate limiting
-
 ### Turso Database Limits
 
 Free tier limits:
