-
Notifications
You must be signed in to change notification settings - Fork 48
Add python binding for vectorstore #573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
36ff226
752c8dd
b436853
44bbe0e
c63f290
7e66438
6a3899f
3745f03
c0ae05f
32acf84
45df85e
8b8b946
b2797bd
216c328
bd71c33
a860c2a
05c939c
a563e12
0d2b050
344817e
18c33a3
22965f4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Overall, I don't like this query filter builder API - it's neither easier nor clearer than writing raw EdgeQL segment in a string. And this deviates from the JS-style query builder, at the same time, it's not Pythonic. Why did we want to add a filter API like this in the first place? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree, it looks a bit complex, this is how LamaIndex did it, I think. @deepbuzin There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @elprans can you pls take a look at this too There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Same global reason as for building the entire vectorstore - to enable users to use Gel in LLM-related applications and not have to learn EdgeQL (or anything really). It's meant to be dumb and familiar enough to justify using it in place of other vectorstores. We spent some time discussing this way of implementing filters on a call about a month ago (FWIW I'm not thrilled with it either), and couldn't think of a better way. That's not to say there isn't one, but we had to pick something and move on. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Although now that I think about it, maybe raw EdgeQL in our own binding isn’t the worst idea… If someone wants a LlamaIndex-style filter constructor they might as well use the LlamaIndex binding. But downsides of the raw string are 1. no help from LSP and 2. EdgeQL for filters is not very pretty or concise with all the json fiddling. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I see. I've also left some comments in my proposal, but feel free to continue with what the team had agreed on. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
from __future__ import annotations | ||
from dataclasses import dataclass | ||
from typing import Union, List | ||
from enum import Enum | ||
|
||
|
||
class FilterOperator(str, Enum): | ||
EQ = "=" | ||
NE = "!=" | ||
GT = ">" | ||
LT = "<" | ||
GTE = ">=" | ||
LTE = "<=" | ||
IN = "in" | ||
NOT_IN = "not in" | ||
LIKE = "like" | ||
ILIKE = "ilike" | ||
ANY = "any" | ||
ALL = "all" | ||
CONTAINS = "contains" | ||
EXISTS = "exists" | ||
|
||
|
||
class FilterCondition(str, Enum): | ||
AND = "and" | ||
OR = "or" | ||
|
||
|
||
@dataclass | ||
class MetadataFilter: | ||
diksipav marked this conversation as resolved.
Show resolved
Hide resolved
|
||
"""Represents a single metadata filter condition.""" | ||
|
||
key: str | ||
value: Union[int, float, str] | ||
operator: FilterOperator = FilterOperator.EQ | ||
|
||
def __repr__(self): | ||
value = f'"{self.value}"' if isinstance(self.value, str) else self.value | ||
diksipav marked this conversation as resolved.
Show resolved
Hide resolved
|
||
return ( | ||
f'MetadataFilter(key="{self.key}", ' | ||
f"value={value}, " | ||
f'operator="{self.operator.value}")' | ||
) | ||
|
||
|
||
@dataclass | ||
class CompositeFilter: | ||
""" | ||
Allows grouping multiple MetadataFilter instances using AND/OR. | ||
""" | ||
|
||
filters: List[Union[CompositeFilter, MetadataFilter]] | ||
condition: FilterCondition = FilterCondition.AND | ||
|
||
def __repr__(self): | ||
return ( | ||
f'CompositeFilter(condition="{self.condition.value}", ' | ||
f"filters={self.filters})" | ||
) | ||
|
||
|
||
def get_filter_clause(filters: CompositeFilter) -> str: | ||
""" | ||
Get the filter clause for a given CompositeFilter. | ||
""" | ||
|
||
subclauses = [] | ||
for filter in filters.filters: | ||
subclause = "" | ||
|
||
if isinstance(filter, CompositeFilter): | ||
subclause = get_filter_clause(filter) | ||
elif isinstance(filter, MetadataFilter): | ||
formatted_value = ( | ||
f'"{filter.value}"' | ||
if isinstance(filter.value, str) | ||
else filter.value | ||
) | ||
|
||
# Simple comparison operators | ||
if filter.operator in { | ||
FilterOperator.EQ, | ||
FilterOperator.NE, | ||
FilterOperator.GT, | ||
FilterOperator.GTE, | ||
FilterOperator.LT, | ||
FilterOperator.LTE, | ||
FilterOperator.LIKE, | ||
FilterOperator.ILIKE, | ||
}: | ||
subclause = ( | ||
f'<typeof {formatted_value}>json_get(.metadata, "{filter.key}") ' | ||
f"{filter.operator.value} {formatted_value}" | ||
) | ||
# casting should be fixed | ||
elif filter.operator in {FilterOperator.IN, FilterOperator.NOT_IN}: | ||
subclause = ( | ||
f"{formatted_value} " | ||
f"{filter.operator.value} " | ||
f'<str>json_get(.metadata, "{filter.key}")' | ||
) | ||
# casting should be fixed | ||
# works only for equality, should be updated to support the rest, | ||
# example: select all({1, 2, 3, 4} < 4); | ||
elif filter.operator in {FilterOperator.ANY, FilterOperator.ALL}: | ||
subclause = ( | ||
f"{filter.operator.value} (" | ||
f'<str>json_get(.metadata, "{filter.key}")' | ||
f" = {formatted_value})" | ||
) | ||
|
||
elif filter.operator == FilterOperator.EXISTS: | ||
subclause = f'exists <str>json_get(.metadata, "{filter.key}")' | ||
|
||
# casting should be fixed | ||
# edgeql contains supports different types like range etc which | ||
# we don't support here | ||
elif filter.operator == FilterOperator.CONTAINS: | ||
subclause = ( | ||
f'contains (<str>json_get(.metadata, "{filter.key}"), ' | ||
f"{formatted_value})" | ||
) | ||
else: | ||
raise ValueError(f"Unknown operator: {filter.operator}") | ||
|
||
subclauses.append(subclause) | ||
|
||
if filters.condition in {FilterCondition.AND, FilterCondition.OR}: | ||
filter_clause = f" {filters.condition.value} ".join(subclauses) | ||
return ( | ||
"(" + filter_clause + ")" if len(subclauses) > 1 else filter_clause | ||
) | ||
else: | ||
raise ValueError(f"Unknown condition: {filters.condition}") |
Uh oh!
There was an error while loading. Please reload this page.