
feat(pipeline): unified dedup→compress→summarize pipeline (#4)#70

Merged: Siddhant-K-code merged 1 commit into main from feat/4-pipeline on May 2, 2026

@Siddhant-K-code
Owner

Closes #4

Summary

Adds a unified pipeline that chains dedup → compress → summarize in a single call, with per-stage stats. Exposed as both a CLI command and an HTTP endpoint.

pkg/pipeline

  • Runner.Run(ctx, chunks, Options) — executes enabled stages in order
  • Each stage can be enabled or disabled independently via Options
  • Stats reports, per stage: input/output tokens, reduction ratio, and latency
  • DefaultOptions: dedup + compress enabled; summarize opt-in

distill pipeline CLI

# From stdin
echo '[{"id":"1","text":"..."}]' | distill pipeline

# From file with stats
distill pipeline --input chunks.json --output optimised.json --stats

# Tune stages
distill pipeline --dedup-threshold 0.2 --compress-ratio 0.4
distill pipeline --no-compress --summarize --summarize-max-tokens 2000

POST /v1/pipeline

{
  "chunks": [...],
  "options": {
    "dedup":     {"enabled": true, "threshold": 0.15},
    "compress":  {"enabled": true, "target_reduction": 0.5},
    "summarize": {"enabled": false}
  }
}

Response includes chunks and stats with per-stage breakdown.
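
For callers working in Go, the request above could be built along these lines. This is a sketch under assumptions: the struct names, `buildBody`, `newPipelineRequest`, and the `http://localhost:8080` base URL are hypothetical, and only the JSON field names are taken from the example request. The response schema is not shown in this PR, so the sketch stops at constructing the request.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Chunk mirrors the {"id", "text"} shape used in the CLI example.
type Chunk struct {
	ID   string `json:"id"`
	Text string `json:"text"`
}

// The option structs below follow the JSON keys in the example request;
// the Go type names themselves are illustrative.
type DedupOptions struct {
	Enabled   bool    `json:"enabled"`
	Threshold float64 `json:"threshold,omitempty"`
}

type CompressOptions struct {
	Enabled         bool    `json:"enabled"`
	TargetReduction float64 `json:"target_reduction,omitempty"`
}

type SummarizeOptions struct {
	Enabled bool `json:"enabled"`
}

type Options struct {
	Dedup     DedupOptions     `json:"dedup"`
	Compress  CompressOptions  `json:"compress"`
	Summarize SummarizeOptions `json:"summarize"`
}

type PipelineRequest struct {
	Chunks  []Chunk `json:"chunks"`
	Options Options `json:"options"`
}

// buildBody encodes chunks with the same option values as the example request.
func buildBody(chunks []Chunk) ([]byte, error) {
	return json.Marshal(PipelineRequest{
		Chunks: chunks,
		Options: Options{
			Dedup:     DedupOptions{Enabled: true, Threshold: 0.15},
			Compress:  CompressOptions{Enabled: true, TargetReduction: 0.5},
			Summarize: SummarizeOptions{Enabled: false},
		},
	})
}

// newPipelineRequest prepares (but does not send) a POST to /v1/pipeline.
func newPipelineRequest(baseURL string, chunks []Chunk) (*http.Request, error) {
	body, err := buildBody(chunks)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/pipeline", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// The base URL is a placeholder; substitute wherever the API is served.
	req, err := newPipelineRequest("http://localhost:8080", []Chunk{{ID: "1", Text: "..."}})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it: resp, err := http.DefaultClient.Do(req)
}
```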

Files

  • pkg/pipeline/pipeline.go + pipeline_test.go (9 tests)
  • cmd/pipeline.go — CLI command
  • cmd/api_pipeline.go — HTTP handlers (also wires batch routes)
  • cmd/api.go — registers /v1/pipeline and /v1/batch/*

@Siddhant-K-code added the enhancement label (New feature or request) on May 2, 2026
Implements issue #4. Adds:

pkg/pipeline:
- Runner.Run: chains ClusterByThreshold → ExtractiveCompressor →
  HierarchicalSummarizer; each stage can be enabled or disabled independently
- StageStats: per-stage input/output tokens, reduction ratio, latency
- DefaultOptions: dedup+compress enabled, summarize opt-in

cmd/pipeline.go:
- 'distill pipeline' CLI reads JSON chunks from stdin/file, writes
  optimised chunks to stdout/file; --stats prints per-stage breakdown

cmd/api_pipeline.go:
- POST /v1/pipeline: synchronous pipeline with full stats response
- POST /v1/batch: submit async job, returns job_id
- GET /v1/batch/{id}: poll status and progress
- GET /v1/batch/{id}/results: retrieve completed results

Co-authored-by: Ona <no-reply@ona.com>
@Siddhant-K-code merged commit 1b600f7 into main on May 2, 2026
0 of 2 checks passed
@Siddhant-K-code deleted the feat/4-pipeline branch on May 2, 2026 at 14:53


Development

Successfully merging this pull request may close these issues.

[Feature] Unified Pipeline Command
