Commit 1c13a14

docs: update README for pipeline, batch, logging, completions, benchmarks
- Add distill pipeline and distill completion to CLI commands section
- Document POST /v1/pipeline and POST /v1/batch/* API endpoints
- Add Pipeline API and Batch API usage examples
- Add Logging section with pkg/logging usage
- Update roadmap table: mark #4, #11, #24, #26, #27, #28 as Shipped
- Mark #30, #32, #33 as Shipped in Code Intelligence / Infrastructure tables

Co-authored-by: Ona <no-reply@ona.com>
1 parent 68f7248 commit 1c13a14

1 file changed

Lines changed: 114 additions & 19 deletions

README.md

@@ -366,15 +366,49 @@ The `preserve_recent` setting (default: 10) keeps the most recent entries at ful
366366
## CLI Commands
367367

368368
```bash
369-
distill api # Start standalone API server
370-
distill serve # Start server with vector DB connection
371-
distill mcp # Start MCP server for AI assistants
372-
distill memory # Store, recall, and manage persistent context memories
373-
distill session # Manage token-budgeted context windows for agent sessions
374-
distill analyze # Analyze a file for duplicates
375-
distill sync # Upload vectors to Pinecone with dedup
376-
distill query # Test a query from command line
377-
distill config # Manage configuration files
369+
distill api # Start standalone API server
370+
distill serve # Start server with vector DB connection
371+
distill pipeline # Run full optimisation pipeline (dedup → compress → summarize)
372+
distill mcp # Start MCP server for AI assistants
373+
distill memory # Store, recall, and manage persistent context memories
374+
distill session # Manage token-budgeted context windows for agent sessions
375+
distill analyze # Analyze a file for duplicates
376+
distill sync # Upload vectors to Pinecone with dedup
377+
distill query # Test a query from command line
378+
distill config # Manage configuration files
379+
distill completion # Generate shell completion scripts (bash/zsh/fish/powershell)
380+
```
381+
382+
### Pipeline command
383+
384+
```bash
385+
# Run full pipeline on a JSON chunk array
386+
echo '[{"id":"1","text":"..."}]' | distill pipeline
387+
388+
# From file, with stats
389+
distill pipeline --input chunks.json --output optimised.json --stats
390+
391+
# Tune individual stages
392+
distill pipeline --dedup-threshold 0.2 --compress-ratio 0.4 --summarize --summarize-max-tokens 2000
393+
394+
# Disable a stage
395+
distill pipeline --no-compress
396+
```
397+
398+
### Shell completions
399+
400+
```bash
401+
# Bash (one-time)
402+
distill completion bash > /etc/bash_completion.d/distill
403+
404+
# Zsh
405+
distill completion zsh > "${fpath[1]}/_distill"
406+
407+
# Fish
408+
distill completion fish > ~/.config/fish/completions/distill.fish
409+
410+
# PowerShell
411+
distill completion powershell | Out-String | Invoke-Expression
378412
```
379413

380414
## API Endpoints
@@ -383,6 +417,10 @@ distill config # Manage configuration files
383417
|--------|------|-------------|
384418
| POST | `/v1/dedupe` | Deduplicate chunks |
385419
| POST | `/v1/dedupe/stream` | SSE streaming dedup with per-stage progress |
420+
| POST | `/v1/pipeline` | Full optimisation pipeline (dedup → compress → summarize) |
421+
| POST | `/v1/batch` | Submit async batch job |
422+
| GET | `/v1/batch/{id}` | Poll batch job status and progress |
423+
| GET | `/v1/batch/{id}/results` | Retrieve completed batch results |
386424
| POST | `/v1/retrieve` | Query vector DB with dedup (requires backend) |
387425
| POST | `/v1/memory/store` | Store memories with write-time dedup (requires `--memory`) |
388426
| POST | `/v1/memory/recall` | Recall memories by relevance + recency (requires `--memory`) |
@@ -396,6 +434,58 @@ distill config # Manage configuration files
396434
| GET | `/health` | Health check |
397435
| GET | `/metrics` | Prometheus metrics |
398436

437+
### Pipeline API
438+
439+
```json
440+
POST /v1/pipeline
441+
{
442+
"chunks": [{"id": "1", "text": "..."}],
443+
"options": {
444+
"dedup": {"enabled": true, "threshold": 0.15},
445+
"compress": {"enabled": true, "target_reduction": 0.5},
446+
"summarize": {"enabled": false, "max_tokens": 4000}
447+
}
448+
}
449+
```
450+
451+
Response includes per-stage token counts, reduction ratios, and latency.
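
For callers working from Go, the request above could be built and sent roughly as follows. This is a minimal sketch, not Distill's own client: the struct and function names (`PipelineRequest`, `encodeRequest`, `postPipeline`, `exampleRequest`, `runDemo`) are hypothetical and only mirror the JSON fields shown in the example. The response is returned as raw bytes because its schema is not spelled out here, and a local echo server stands in for a real Distill instance.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// These types mirror the JSON request shown above. The Go names are
// hypothetical and are not taken from the Distill codebase.
type Chunk struct {
	ID   string `json:"id"`
	Text string `json:"text"`
}

type DedupOptions struct {
	Enabled   bool    `json:"enabled"`
	Threshold float64 `json:"threshold"`
}

type CompressOptions struct {
	Enabled         bool    `json:"enabled"`
	TargetReduction float64 `json:"target_reduction"`
}

type SummarizeOptions struct {
	Enabled   bool `json:"enabled"`
	MaxTokens int  `json:"max_tokens"`
}

type PipelineOptions struct {
	Dedup     DedupOptions     `json:"dedup"`
	Compress  CompressOptions  `json:"compress"`
	Summarize SummarizeOptions `json:"summarize"`
}

type PipelineRequest struct {
	Chunks  []Chunk         `json:"chunks"`
	Options PipelineOptions `json:"options"`
}

// exampleRequest reproduces the values from the JSON example above.
func exampleRequest() PipelineRequest {
	return PipelineRequest{
		Chunks: []Chunk{{ID: "1", Text: "..."}},
		Options: PipelineOptions{
			Dedup:     DedupOptions{Enabled: true, Threshold: 0.15},
			Compress:  CompressOptions{Enabled: true, TargetReduction: 0.5},
			Summarize: SummarizeOptions{Enabled: false, MaxTokens: 4000},
		},
	}
}

// encodeRequest serialises the request body as JSON.
func encodeRequest(req PipelineRequest) ([]byte, error) {
	return json.Marshal(req)
}

// postPipeline sends the request to baseURL + "/v1/pipeline" and returns the
// raw response body. The body is left undecoded on purpose.
func postPipeline(client *http.Client, baseURL string, req PipelineRequest) ([]byte, error) {
	body, err := encodeRequest(req)
	if err != nil {
		return nil, fmt.Errorf("encode request: %w", err)
	}
	resp, err := client.Post(baseURL+"/v1/pipeline", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("post pipeline: %w", err)
	}
	defer resp.Body.Close()
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("read response: %w", err)
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d: %s", resp.StatusCode, raw)
	}
	return raw, nil
}

// runDemo posts the example request to a stub server that echoes the body
// back. The stub is an assumption for local testing, not a Distill server.
func runDemo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, _ = io.Copy(w, r.Body)
	}))
	defer srv.Close()
	raw, err := postPipeline(srv.Client(), srv.URL, exampleRequest())
	return string(raw), err
}

func main() {
	out, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Against a real server, `postPipeline` would be called with that server's base URL and the returned bytes decoded into whatever shape the response actually has.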
452+
453+
### Batch API
454+
455+
```bash
456+
# Submit
457+
curl -X POST /v1/batch -d '{"chunks":[...],"options":{...}}'
458+
# → {"job_id":"batch_1234","status":"queued"}
459+
460+
# Poll
461+
curl /v1/batch/batch_1234
462+
# → {"status":"processing","progress":0.45}
463+
464+
# Results (when completed)
465+
curl /v1/batch/batch_1234/results
466+
# → {"chunks":[...],"stats":{...}}
467+
```
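
The submit, poll, fetch flow above can also be driven from Go. The sketch below covers only the polling step and rests on a few assumptions: the `status` and `progress` fields are taken from the curl output above, the terminal status string `"completed"` is inferred from the "when completed" comment, and the names `BatchStatus`, `fetchStatus`, `waitForBatch`, `newStubServer`, and `runDemo` are hypothetical. A stub server that reports `processing` twice and then `completed` stands in for the real API.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// BatchStatus mirrors the fields shown in the poll response above.
type BatchStatus struct {
	Status   string  `json:"status"`
	Progress float64 `json:"progress"`
}

// fetchStatus performs a single GET /v1/batch/{id} request.
func fetchStatus(ctx context.Context, client *http.Client, baseURL, jobID string) (BatchStatus, error) {
	var st BatchStatus
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/v1/batch/"+jobID, nil)
	if err != nil {
		return st, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return st, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return st, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	err = json.NewDecoder(resp.Body).Decode(&st)
	return st, err
}

// waitForBatch polls at the given interval until the job reports
// "completed" (an assumed terminal status) or the context is done.
func waitForBatch(ctx context.Context, client *http.Client, baseURL, jobID string, interval time.Duration) (BatchStatus, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		st, err := fetchStatus(ctx, client, baseURL, jobID)
		if err != nil {
			return st, err
		}
		if st.Status == "completed" {
			return st, nil
		}
		select {
		case <-ctx.Done():
			return st, ctx.Err()
		case <-ticker.C:
		}
	}
}

// newStubServer fakes the batch endpoint for local testing: the first two
// polls report "processing", later polls report "completed".
func newStubServer() *httptest.Server {
	var calls atomic.Int32
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		st := BatchStatus{Status: "processing", Progress: 0.45}
		if calls.Add(1) >= 3 {
			st = BatchStatus{Status: "completed", Progress: 1}
		}
		_ = json.NewEncoder(w).Encode(st)
	}))
}

// runDemo polls the stub server with a five-second overall deadline.
func runDemo() (BatchStatus, error) {
	srv := newStubServer()
	defer srv.Close()
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return waitForBatch(ctx, srv.Client(), srv.URL, "batch_1234", 10*time.Millisecond)
}

func main() {
	st, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s %.2f\n", st.Status, st.Progress)
}
```

Bounding the loop with a context deadline means the caller gives up cleanly if a job never reaches the terminal status. A real client would probably also stop on a failure status, which the examples above do not show.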
468+
469+
## Logging
470+
471+
Distill uses structured `log/slog` logging. Default output is JSON to stderr.
472+
473+
```go
474+
import "github.com/Siddhant-K-code/distill/pkg/logging"
475+
476+
// JSON logger (production default)
477+
logger := logging.New(logging.Config{Level: "info", Format: logging.FormatJSON})
478+
479+
// Text logger for local development
480+
logger := logging.NewDebug()
481+
482+
// Attach request context
483+
logger = logging.WithRequestID(logger, requestID)
484+
logger = logging.WithTraceID(logger, traceID)
485+
```
486+
487+
Log levels: `debug`, `info` (default), `warn`, `error`.
488+
399489
## Configuration
400490

401491
### Config File
@@ -973,19 +1063,24 @@ Distill is evolving from a dedup utility into a context intelligence layer. Here
9731063

9741064
### Code Intelligence
9751065

976-
| Feature | Issue | Description |
977-
|---------|-------|-------------|
978-
| **Change Impact Graph** | [#30](https://github.com/Siddhant-K-code/distill/issues/30) | Dependency graph + co-change patterns from git history. "This PR changes auth/jwt.go - here's the blast radius." |
979-
| **Semantic Commit Analysis** | [#32](https://github.com/Siddhant-K-code/distill/issues/32) | Find similar past changes, predict incidents. "This diff is 82% similar to the one that caused outage #47." |
1066+
| Feature | Issue | Status | Description |
1067+
|---------|-------|--------|-------------|
1068+
| **Change Impact Graph** | [#30](https://github.com/Siddhant-K-code/distill/issues/30) | Shipped | `pkg/graph`: BFS blast-radius queries over a dependency graph built from Go imports. |
1069+
| **Semantic Commit Analysis** | [#32](https://github.com/Siddhant-K-code/distill/issues/32) | Shipped | `pkg/commits`: Conventional Commits parser, heuristic risk scoring, cosine similarity search over commit embeddings. |
9801070

9811071
### Infrastructure
9821072

983-
| Feature | Issue | Description |
984-
|---------|-------|-------------|
985-
| **Multi-Provider Embeddings** | [#33](https://github.com/Siddhant-K-code/distill/issues/33) | Ollama, Azure OpenAI, Cohere, HuggingFace. Swap providers via config. |
986-
| **Batch API** | [#11](https://github.com/Siddhant-K-code/distill/issues/11) | Async batch processing for large workloads. |
987-
| **Python SDK** | [#5](https://github.com/Siddhant-K-code/distill/issues/5) | `pip install distill-ai` with LangChain/LlamaIndex integrations. |
988-
| **OpenAPI Spec** | [#23](https://github.com/Siddhant-K-code/distill/issues/23) | Swagger UI at `/docs`, auto-generated client SDKs. |
1073+
| Feature | Issue | Status | Description |
1074+
|---------|-------|--------|-------------|
1075+
| **Multi-Provider Embeddings** | [#33](https://github.com/Siddhant-K-code/distill/issues/33) | Shipped | `embedding.NewProvider` factory: OpenAI, Ollama, Cohere via unified `ProviderConfig`. |
1076+
| **Unified Pipeline** | [#4](https://github.com/Siddhant-K-code/distill/issues/4) | Shipped | `POST /v1/pipeline` + `distill pipeline` CLI: dedup → compress → summarize in one call with per-stage stats. |
1077+
| **Batch API** | [#11](https://github.com/Siddhant-K-code/distill/issues/11) | Shipped | `POST /v1/batch`: async job queue with worker pool, progress polling, 24h result retention. |
1078+
| **Structured Logging** | [#27](https://github.com/Siddhant-K-code/distill/issues/27) | Shipped | `pkg/logging`: JSON/text slog logger with debug/info/warn/error levels, request_id and trace_id helpers. |
1079+
| **Shell Completions** | [#26](https://github.com/Siddhant-K-code/distill/issues/26) | Shipped | `distill completion [bash\|zsh\|fish\|powershell]` generates shell completion scripts. |
1080+
| **Benchmark Suite** | [#24](https://github.com/Siddhant-K-code/distill/issues/24) | Shipped | `go test -bench=. ./...` covers cluster, MMR, selector, and compress with deterministic synthetic data. |
1081+
| **Makefile** | [#28](https://github.com/Siddhant-K-code/distill/issues/28) | Shipped | 20+ targets: build, test, bench, lint, fmt, vet, docker, release. |
1082+
| **Python SDK** | [#5](https://github.com/Siddhant-K-code/distill/issues/5) | Planned | `pip install distill-ai` with LangChain/LlamaIndex integrations. |
1083+
| **OpenAPI Spec** | [#23](https://github.com/Siddhant-K-code/distill/issues/23) | Planned | Swagger UI at `/docs`, auto-generated client SDKs. |
9891084

9901085
See all open issues: [github.com/Siddhant-K-code/distill/issues](https://github.com/Siddhant-K-code/distill/issues)
9911086
