feat: Add BM25 lexical search support for Weaviate destination #647

Closed
micmarty-deepsense wants to merge 2 commits into main from feature/weaviate-lexical-search

@micmarty-deepsense commented Feb 17, 2026

Summary

Add enable_lexical_search flag to Weaviate destination connector to support BM25 keyword search.

Changes

  • Add enable_lexical_search: bool field to WeaviateUploadStagerConfig (default: False)
  • Add inline schema documentation showing required BM25 configuration
  • Update tests to verify flag behavior
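The shape of the flag described above can be sketched as follows. This is only an illustration: a plain dataclass stands in for the real config class, whose base class and other fields are not shown in this description, so everything except the `enable_lexical_search` name and its default is an assumption.

```python
from dataclasses import dataclass


@dataclass
class WeaviateUploadStagerConfig:
    """Hypothetical stand-in for the connector's stager config."""

    # Declarative only: signals that the target collection was already
    # configured for BM25 by the user. It does not touch the schema.
    enable_lexical_search: bool = False


# The flag is off unless the user opts in explicitly.
default_config = WeaviateUploadStagerConfig()
enabled_config = WeaviateUploadStagerConfig(enable_lexical_search=True)
```

Keeping the default at `False` means existing pipelines behave the same unless the flag is set.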

Implementation Notes

This is a declarative flag - it indicates that the user has manually configured their Weaviate collection with BM25 support. It does NOT automatically create or modify schemas.

Users must configure their collection schema with:

{
  "invertedIndexConfig": {
    "bm25": {
      "b": 0.75,
      "k1": 1.2
    },
    "stopwords": {
      "preset": "en"
    }
  }
}

See inline code documentation for full example.
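Since the flag does not create or modify schemas, a user might want to confirm that a collection schema actually carries the BM25 settings shown above. The following is a minimal sketch that works on a plain dict shaped like that JSON; `has_expected_bm25` is a hypothetical helper, not part of the connector, and fetching the schema from Weaviate is deliberately left out.

```python
from typing import Any, Mapping

# Values taken from the schema snippet in this description.
EXPECTED_BM25 = {"b": 0.75, "k1": 1.2}


def has_expected_bm25(schema: Mapping[str, Any]) -> bool:
    """Return True if the schema dict carries the expected BM25 parameters."""
    inverted = schema.get("invertedIndexConfig") or {}
    bm25 = inverted.get("bm25") or {}
    return all(bm25.get(key) == value for key, value in EXPECTED_BM25.items())


schema = {
    "invertedIndexConfig": {
        "bm25": {"b": 0.75, "k1": 1.2},
        "stopwords": {"preset": "en"},
    }
}
```

A check like this could run before upload when the flag is enabled, so a misconfigured collection is reported early rather than discovered at query time.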

Testing

  • Unit tests verify flag default and enabled states
  • Follows the same pattern as the AstraDB enable_lexical_search flag

Add support for BM25 keyword search in Weaviate destination connector.

Implementation:
- Add `enable_lexical_search` flag to WeaviateUploadStagerConfig (default: False)
- Configure BM25 scoring in collection schema (b=0.75, k1=1.2)
- Enable inverted index with English stopwords for text fields
- BM25 always enabled in schema (minimal overhead), flag for API consistency

Testing:
- Test upload stager config with default and enabled lexical search
- Verify BM25 configuration in collection schema
- Verify text field configured for lexical search

Following AstraDB pattern for consistency. Users opt into hybrid search at query time.

Add enable_lexical_search flag to WeaviateUploadStagerConfig to indicate
that the collection is configured for BM25 keyword search.

Users must manually configure their Weaviate collection schema with
invertedIndexConfig.bm25 settings. See inline documentation for schema
example.

This flag is declarative - it documents user intent rather than
controlling automatic schema creation.

@micmarty-deepsense commented

Closing: Weaviate's inverted index (indexSearchable=true) already provides BM25 keyword search functionality. No additional flag or configuration needed - it works automatically on text fields.
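The closing rationale can be illustrated with a small sketch that lists which properties of a class definition would be covered by keyword search. It assumes a schema dict where each property has `name`, `dataType`, and an optional `indexSearchable` key, and it assumes `indexSearchable` is treated as true for `text` and `text[]` properties when omitted. The helper name and the `Elements` class are hypothetical.

```python
from typing import Any, List, Mapping

# Assumption: only text-like properties take part in BM25 keyword search.
SEARCHABLE_TYPES = {"text", "text[]"}


def bm25_searchable_properties(class_schema: Mapping[str, Any]) -> List[str]:
    """List property names that keyword search would cover."""
    names = []
    for prop in class_schema.get("properties", []):
        is_text = any(t in SEARCHABLE_TYPES for t in prop.get("dataType", []))
        # Missing indexSearchable is treated as enabled (assumed default).
        if is_text and prop.get("indexSearchable", True):
            names.append(prop["name"])
    return names


example_class = {
    "class": "Elements",  # hypothetical collection name
    "properties": [
        {"name": "text", "dataType": ["text"]},
        {"name": "record_id", "dataType": ["text"], "indexSearchable": False},
        {"name": "page_number", "dataType": ["int"]},
    ],
}
```

Under these assumptions, only properties that explicitly opt out or are not text-typed fall outside keyword search, which is why a separate connector flag adds nothing.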
