MULLER Vectorized Hybrid Search Engine #16

@heathersherry

Description

MULLER provides a comprehensive suite of query functionalities tailored for AI data lakes:

  • Comparison Operators: Supports exact and range matching using >, <, >=, and <= for numerical types (int/float) where the tensor htype is generic.
  • Equality and Inequality: Supports == and != for int, float, str, and bool types (generic or text htypes). Users can optionally build inverted indexes to speed up retrieval.
  • Range Queries: Supports the BETWEEN keyword for numerical types (int/float). This feature requires an inverted index.
  • Full-Text Search: Supports the CONTAINS operator for str types (text htype), backed by an inverted index. For Chinese text, tokenization is handled by the open-source Jieba tokenizer.
  • Pattern Matching: Supports LIKE for regular expression matching on str types (text htype).
  • Boolean Logic: Supports complex query compositions using AND, OR, and NOT logical connectors.
  • Pagination: Supports OFFSET and LIMIT clauses on query results for efficient data sampling.
  • Data Aggregation: Supports standard SQL-like aggregation workflows, including SELECT, GROUP BY, and ORDER BY, alongside aggregate functions such as COUNT, AVG, MIN, MAX, and SUM.
  • Vector Similarity Search: Supports high-dimensional vector similarity retrieval based on IVFPQ, HNSW, and DISKANN indexes for AI-centric embedding analysis.
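
To make the role of the inverted index in the list above concrete, here is a minimal, self-contained Python sketch of how a CONTAINS lookup, the AND/OR/NOT connectors, and OFFSET/LIMIT pagination can be composed over row ids. It is not MULLER's API: the function names (`build_inverted_index`, `contains`, `paginate`), the whitespace tokenizer, and the sample rows are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase whitespace token to the set of row ids containing it.

    Whitespace tokenization is an assumption made for this sketch only.
    """
    index = defaultdict(set)
    for row_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(row_id)
    return index

def contains(index, term):
    """Return the row ids whose text contains `term` as a whole token."""
    return index.get(term.lower(), set())

def paginate(row_ids, offset=0, limit=None):
    """Apply OFFSET/LIMIT semantics to row ids sorted in ascending order."""
    ordered = sorted(row_ids)
    end = None if limit is None else offset + limit
    return ordered[offset:end]

# Hypothetical sample rows standing in for a text tensor.
docs = [
    "a cat sits on the mat",
    "a dog chases the cat",
    "the bird sings",
    "cat and dog and bird",
]
index = build_inverted_index(docs)
all_rows = set(range(len(docs)))

cat_and_dog = contains(index, "cat") & contains(index, "dog")   # AND
cat_or_bird = contains(index, "cat") | contains(index, "bird")  # OR
not_cat = all_rows - contains(index, "cat")                     # NOT
page = paginate(cat_or_bird, offset=1, limit=2)                 # OFFSET 1 LIMIT 2
```

Because each term lookup returns a set of row ids, the logical connectors reduce to set intersection, union, and difference, and pagination is a slice over the sorted result. That is why an index of this shape can answer such queries without scanning every row.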
