Add semantic compression module #13

Merged
Siddhant-K-code merged 2 commits into main from feature/semantic-compression on Jan 31, 2026
Conversation

@Siddhant-K-code
Owner

Implements the semantic compression module as specified in #1.

Summary

Adds pkg/compress with three compression strategies to achieve 3-5× additional reduction on top of deduplication.

Components

  • Extractive (extractive.go) - Selects salient spans using sentence-level scoring based on position, length, and content signals
  • Placeholder (placeholder.go) - Compresses JSON/XML/tables to compact summaries while preserving key fields
  • Pruner (pruner.go) - Removes filler phrases ("as mentioned earlier", "basically", etc.) and redundant patterns
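
To make the extractive strategy concrete, here is a minimal sketch of sentence-level scoring on position, length, and content signals. The weights, the keyword list, the naive period-based splitter, and the names scoreSentence and extract are assumptions made for illustration; they are not taken from extractive.go.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// splitSentences is a naive splitter on "." (assumption: no decimals or abbreviations).
func splitSentences(text string) []string {
	var out []string
	for _, part := range strings.Split(text, ".") {
		if s := strings.TrimSpace(part); s != "" {
			out = append(out, s)
		}
	}
	return out
}

// scoreSentence combines three hypothetical signals with made-up weights.
func scoreSentence(s string, idx, total int) float64 {
	score := 0.0
	// Position: the first and last sentences often carry the main point.
	if idx == 0 || idx == total-1 {
		score += 1.0
	}
	// Length: prefer mid-length sentences over fragments and run-ons.
	if n := len(strings.Fields(s)); n >= 5 && n <= 30 {
		score += 0.5
	}
	// Content: digits and a few signal keywords hint at concrete facts.
	if strings.ContainsAny(s, "0123456789") {
		score += 0.5
	}
	lower := strings.ToLower(s)
	for _, kw := range []string{"must", "error", "returns"} {
		if strings.Contains(lower, kw) {
			score += 0.5
			break
		}
	}
	return score
}

// extract keeps the k highest-scoring sentences, returned in original order.
func extract(text string, k int) []string {
	sentences := splitSentences(text)
	scores := make([]float64, len(sentences))
	order := make([]int, len(sentences))
	for i, s := range sentences {
		scores[i] = scoreSentence(s, i, len(sentences))
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool {
		return scores[order[a]] > scores[order[b]]
	})
	if k < 0 {
		k = 0
	}
	if k > len(order) {
		k = len(order)
	}
	keep := order[:k]
	sort.Ints(keep) // restore document order for the kept sentences
	out := make([]string, 0, k)
	for _, i := range keep {
		out = append(out, sentences[i])
	}
	return out
}

func main() {
	text := "The cache must be flushed before shutdown. It is fine. " +
		"Workers retry failed jobs up to 3 times on error. Basically that is all."
	for _, s := range extract(text, 2) {
		fmt.Println(s)
	}
}
```

Returning the kept sentences in document order, rather than score order, keeps the compressed chunk readable.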

API

type Compressor interface {
    Compress(ctx context.Context, chunks []types.Chunk, opts Options) ([]types.Chunk, Stats, error)
}

// Pipeline chains multiple strategies
pipeline := compress.NewPipeline(
    compress.NewPruner(),
    compress.NewExtractiveCompressor(),
)

Files

pkg/compress/
├── compress.go        # Interface, Options, Stats, Pipeline
├── extractive.go      # Salient span extraction
├── placeholder.go     # Structured content compression
├── pruner.go          # Filler phrase removal
└── compress_test.go   # Unit tests

Testing

Includes comprehensive tests for all compressors and the pipeline.

Closes #1

Implements pkg/compress with three compression strategies:
- Extractive: selects salient spans using sentence scoring
- Placeholder: compresses JSON/XML/tables to summaries
- Pruner: removes filler phrases and redundant patterns

Includes Pipeline for chaining strategies and comprehensive tests.

Closes #1

Co-authored-by: Ona <no-reply@ona.com>
@Siddhant-K-code added the enhancement label on Jan 31, 2026
- Remove regex backreference (unsupported in Go regexp)
- Use switch statement for index comparison

Co-authored-by: Ona <no-reply@ona.com>
@Siddhant-K-code merged commit 1581b43 into main on Jan 31, 2026
2 checks passed
@Siddhant-K-code deleted the feature/semantic-compression branch on January 31, 2026 at 23:12
Development

Successfully merging this pull request may close these issues.

[Feature] Semantic Compression Module