Commit 0c5394f

Merge pull request #907 from greengori11a/new-master
Implement FulltextFilter
2 parents 3f40d34 + 72c2757 commit 0c5394f

8 files changed

+841
-21
lines changed

doc/source/semantic_indexes.rst

Lines changed: 49 additions & 17 deletions
@@ -1,20 +1,20 @@
1-
.. _Semantic Indexes:
1+
.. _Semantic Indexes:
22

33
==================================
44
Semantic Indexes
55
==================================
66

77
Full Text Index
88
----------------
9-
From version x.x (version number tbc) neomodel provides a way to interact with neo4j `Full Text indexing <https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/full-text-indexes/>`_.
10-
The Full Text Index can be be created for both node and relationship properties. Only available for Neo4j version 5.16 or higher.
9+
From version 6.0.0 neomodel provides a way to interact with neo4j `Full Text indexing <https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/full-text-indexes/>`_.
10+
The Full Text Index can be created for both node and relationship properties. Only available for Neo4j version 5.16 or higher.
1111

1212
Defining a Full Text Index on a Property
1313
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1414
Within neomodel, indexing is a decision that is made at class definition time as the index needs to be built. A Full Text index is defined using :class:`~neomodel.properties.FulltextIndex`
15-
To define a property with a full text index we use the following symantics::
15+
To define a property with a full text index we use the following syntax::
1616
17-
StringProperty(fulltext_index=FulltextIndex(analyzer="standard-no-stop-words", eventually_consistent=False)
17+
StringProperty(fulltext_index=FulltextIndex(analyzer="standard-no-stop-words", eventually_consistent=False))
1818

1919
Where,
2020
- ``analyzer``: The analyzer to use. The default is ``standard-no-stop-words``.
@@ -27,34 +27,67 @@ Please refer to the `Neo4j documentation <https://neo4j.com/docs/cypher-manual/c
2727
Querying a Full Text Index on a Property
2828
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2929

30-
This is not currently implemented as a native neomodel query type. If you would like this please submit a github issue highlighting your useage pattern
30+
Node Property
31+
^^^^^^^^^^^^^
32+
The following Fulltext Index property::
33+
34+
class Product(StructuredNode):
35+
name = StringProperty()
36+
description = StringProperty(
37+
fulltext_index=FulltextIndex(
38+
analyzer="standard-no-stop-words", eventually_consistent=False
39+
)
40+
)
41+
42+
Can be queried using :class:`~neomodel.semantic_filters.FulltextFilter`. Such as::
43+
44+
from neomodel.semantic_filters import FulltextFilter
45+
result = Product.nodes.filter(
46+
fulltext_filter=FulltextFilter(
47+
topk=10,
48+
fulltext_attribute_name="description",
49+
query_string="product")).all()
50+
51+
The result is a list of up to ``topk`` tuples of the form ``(ProductNode, score)``.
52+
53+
The :class:`~neomodel.semantic_filters.FulltextFilter` can be used in conjunction with the normal filter types.
54+
55+
.. attention::
56+
If you use FulltextFilter in conjunction with normal filter types, only nodes that match those filters are returned, so you may get fewer than the topk specified.
57+
Furthermore, all node filters **should** work with FulltextFilter. Relationship filters will also work but WILL NOT return the fulltext similarity score alongside the relationship filter; instead, the topk nodes and their appropriate relationships will be returned.
58+
59+
RelationshipProperty
60+
^^^^^^^^^^^^^^^^^^^^
3161

32-
Alternatively, whilst this has not bbeen implemetned yet you can still leverage `db.cypher_query` with the correct syntax to perform your required query.
62+
Currently neomodel has not implemented an OGM method for querying full text indexes on relationships.
63+
If this is something that you would like, please submit a GitHub issue with requirements highlighting your usage pattern.
64+
65+
Alternatively, whilst this has not been implemented yet you can still leverage `db.cypher_query` with the correct syntax to perform your required query.
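
As a rough illustration of the ``db.cypher_query`` fallback mentioned above, the sketch below builds a parameterised call to Neo4j's ``db.index.fulltext.queryRelationships`` procedure. The helper ``build_relationship_fulltext_query`` and the index name ``my_relationship_fulltext_index`` are hypothetical and not part of neomodel; substitute the name of the index that actually exists in your database.

```python
def build_relationship_fulltext_query(index_name: str, query_string: str, topk: int):
    """Return (cypher, params) for a full text lookup on a relationship index.

    Hypothetical helper for illustration only; it is not part of neomodel.
    """
    cypher = (
        "CALL db.index.fulltext.queryRelationships($index_name, $query_string) "
        "YIELD relationship, score "
        "RETURN relationship, score "
        "LIMIT $topk"
    )
    params = {
        "index_name": index_name,
        "query_string": query_string,
        "topk": topk,
    }
    return cypher, params


# The index name is a placeholder (assumption), not a neomodel naming convention.
cypher, params = build_relationship_fulltext_query(
    "my_relationship_fulltext_index", "product", 10
)

# With a configured connection, the pair could then be run as:
#     from neomodel import db
#     rows, columns = db.cypher_query(cypher, params)
```

Passing the search text as a query parameter rather than formatting it into the Cypher string means quotes inside the text need no escaping.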
3366

3467
Vector Index
3568
------------
36-
From version x.x (version number tbc) neomodel provides a way to interact with neo4j `vector indexing <https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/vector-indexes/>`_.
69+
From version 5.5.0 neomodel provides a way to interact with neo4j `vector indexing <https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/vector-indexes/>`_.
3770

3871
The Vector Index can be created on both node and relationship properties. Only available for Neo4j version 5.15 (node) and 5.18 (relationship) or higher.
3972

4073
Defining a Vector Index on a Property
4174
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4275

4376
Within neomodel, indexing is a decision that is made at class definition time as the index needs to be built. A vector index is defined using :class:`~neomodel.properties.VectorIndex`.
44-
To define a property with a vector index we use the following symantics::
77+
To define a property with a vector index we use the following syntax::
4578

46-
ArrayProperty(base_property=FloatProperty(), vector_index=VectorIndex(dimensions=512, similarity_function="cosine")
79+
ArrayProperty(base_property=FloatProperty(), vector_index=VectorIndex(dimensions=512, similarity_function="cosine"))
4780
4881
Where,
4982
- ``dimensions``: The dimension of the vector. The default is 1536.
5083
- ``similarity_function``: The similarity algorithm to use. The default is ``cosine``.
5184

52-
The index must then be built, this occurs when the function :func:`~neomodel.sync_.core.install_all_labels` is run
85+
The index must then be built, this occurs when the function :func:`~neomodel.sync_.core.install_all_labels` is run.
5386

5487
The vector indexes will then have the name "vector_index_{node.__label__}_{propertyname_with_vector_index}".
5588

5689
.. attention::
57-
Neomodel creates a new vectorindex for each specified property, thus you cannot have two distinct properties being placed into the same index.
90+
Neomodel creates a new vector index for each specified property, thus you cannot have two distinct properties being placed into the same index.
5891

5992
Querying a Vector Index on a Property
6093
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -64,10 +97,10 @@ Node Property
6497
The following node vector index property::
6598

6699
class someNode(StructuredNode):
67-
vector = ArrayProperty(base_property=FloatProperty(), vector_index=VectorIndex(dimensions=512, similarity_function="cosine")
100+
vector = ArrayProperty(base_property=FloatProperty(), vector_index=VectorIndex(dimensions=512, similarity_function="cosine"))
68101
name = StringProperty()
69102

70-
Can be queried using :class:`~neomodel.sematic_filters.VectorFilter`. Such as::
103+
Can be queried using :class:`~neomodel.semantic_filters.VectorFilter`. Such as::
71104

72105
from neomodel.semantic_filters import VectorFilter
73106
result = someNode.nodes.filter(vector_filter=VectorFilter(topk=3, vector_attribute_name="vector")).all()
@@ -78,12 +111,11 @@ The :class:`~neomodel.semantic_filters.VectorFilter` can be used in conjunction
78111

79112
.. attention::
80113
If you use VectorFilter in conjunction with normal filter types, only nodes that fit the filters will return thus, you may get less than the topk specified.
81-
Furthermore, all node filters **should** work with VectorFilter, relationship filters will also work but WILL NOT return the vector similiarty score alongside the relationship filter, instead the topk nodes and their appropriate relationships will be returned.
114+
Furthermore, all node filters **should** work with VectorFilter, relationship filters will also work but WILL NOT return the vector similarity score alongside the relationship filter, instead the topk nodes and their appropriate relationships will be returned.
82115

83116
RelationshipProperty
84117
^^^^^^^^^^^^^^^^^^^^
85118
Currently neomodel has not implemented an OGM method for querying vector indexes on relationships.
86-
If this is something that you like please submit a github issue requirements highlighting your usage pattern.
119+
If this is something that you would like, please submit a GitHub issue with requirements highlighting your usage pattern.
87120

88121
Alternatively, whilst this has not been implemented yet you can still leverage `db.cypher_query` with the correct syntax to perform your required query.
89-

neomodel/async_/match.py

Lines changed: 60 additions & 1 deletion
@@ -12,7 +12,7 @@
1212
from neomodel.exceptions import MultipleNodesReturned
1313
from neomodel.match_q import Q, QBase
1414
from neomodel.properties import AliasProperty, ArrayProperty, Property
15-
from neomodel.semantic_filters import VectorFilter
15+
from neomodel.semantic_filters import FulltextFilter, VectorFilter
1616
from neomodel.typing import Subquery, Transformation
1717
from neomodel.util import RelationshipDirection
1818

@@ -403,6 +403,7 @@ class QueryAST:
403403
additional_return: list[str] | None
404404
is_count: bool | None
405405
vector_index_query: type | None
406+
fulltext_index_query: type | None
406407

407408
def __init__(
408409
self,
@@ -420,6 +421,7 @@ def __init__(
420421
additional_return: list[str] | None = None,
421422
is_count: bool | None = False,
422423
vector_index_query: type | None = None,
424+
fulltext_index_query: type | None = None,
423425
) -> None:
424426
self.match = match if match else []
425427
self.optional_match = optional_match if optional_match else []
@@ -437,6 +439,7 @@ def __init__(
437439
)
438440
self.is_count = is_count
439441
self.vector_index_query = vector_index_query
442+
self.fulltext_index_query = fulltext_index_query
440443
self.subgraph: dict = {}
441444
self.mixed_filters: bool = False
442445

@@ -467,6 +470,15 @@ async def build_ast(self) -> "AsyncQueryBuilder":
467470
):
468471
self.build_vector_query(self.node_set.vector_query, self.node_set.source)
469472

473+
if (
474+
isinstance(self.node_set, AsyncNodeSet)
475+
and hasattr(self.node_set, "fulltext_query")
476+
and self.node_set.fulltext_query
477+
):
478+
self.build_fulltext_query(
479+
self.node_set.fulltext_query, self.node_set.source
480+
)
481+
470482
await self.build_source(self.node_set)
471483

472484
if hasattr(self.node_set, "skip"):
@@ -573,6 +585,31 @@ def build_vector_query(self, vectorfilter: "VectorFilter", source: "AsyncNodeSet
573585
self._ast.return_clause = f"{vectorfilter.node_set_label}, score"
574586
self._ast.result_class = source.__class__
575587

588+
def build_fulltext_query(self, fulltextquery: "FulltextFilter", source: "AsyncNodeSet") -> None:
589+
"""
590+
Query a full text indexed property on the node.
591+
"""
592+
try:
593+
attribute = getattr(source, fulltextquery.fulltext_attribute_name)
594+
except AttributeError as e:
595+
raise AttributeError(
596+
f"Attribute '{fulltextquery.fulltext_attribute_name}' not found on '{type(source).__name__}'."
597+
) from e
598+
599+
if not attribute.fulltext_index:
600+
raise AttributeError(
601+
f"Attribute {fulltextquery.fulltext_attribute_name} is not declared with a full text index."
602+
)
603+
604+
fulltextquery.index_name = (
605+
f"fulltext_index_{source.__label__}_{fulltextquery.fulltext_attribute_name}"
606+
)
607+
fulltextquery.node_set_label = source.__label__.lower()
608+
609+
self._ast.fulltext_index_query = fulltextquery
610+
self._ast.return_clause = f"{fulltextquery.node_set_label}, score"
611+
self._ast.result_class = source.__class__
612+
576613
async def build_traversal(self, traversal: "AsyncTraversal") -> str:
577614
"""
578615
traverse a relationship from a node to a set of nodes
@@ -974,6 +1011,16 @@ def build_query(self) -> str:
9741011
# This ensures that we bring the context of the new nodeSet and score along with us for metadata filtering
9751012
query += f""" WITH {self._ast.vector_index_query.node_set_label}, score"""
9761013

1014+
if self._ast.fulltext_index_query:
1015+
query += f"""CALL () {{
1016+
CALL db.index.fulltext.queryNodes("{self._ast.fulltext_index_query.index_name}", "{self._ast.fulltext_index_query.query_string}")
1017+
YIELD node AS {self._ast.fulltext_index_query.node_set_label}, score
1018+
RETURN {self._ast.fulltext_index_query.node_set_label}, score LIMIT {self._ast.fulltext_index_query.topk}
1019+
}}
1020+
"""
1021+
# This ensures that we bring the context of the new nodeSet and score along with us for metadata filtering
1022+
query += f""" WITH {self._ast.fulltext_index_query.node_set_label}, score"""
1023+
9771024
# Instead of using only one MATCH statement for every relation
9781025
# to follow, we use one MATCH per relation (to avoid cartesian
9791026
# product issues...).
@@ -1446,6 +1493,7 @@ def __init__(self, source: Any) -> None:
14461493
self._intermediate_transforms: list = []
14471494
self._unique_variables: list[str] = []
14481495
self.vector_query: str | None = None
1496+
self.fulltext_query: FulltextFilter | None = None
14491497

14501498
def __await__(self) -> Any:
14511499
return self.all().__await__() # type: ignore[attr-defined]
@@ -1555,6 +1603,10 @@ def filter(self, *args: Any, **kwargs: Any) -> "AsyncBaseSet":
15551603
for arg in args:
15561604
if isinstance(arg, VectorFilter) and (not self.vector_query):
15571605
self.vector_query = arg
1606+
1607+
if isinstance(arg, FulltextFilter) and (not self.fulltext_query):
1608+
self.fulltext_query = arg
1609+
15581610
new_args.append(arg)
15591611

15601612
new_args = tuple(new_args)
@@ -1566,6 +1618,13 @@ def filter(self, *args: Any, **kwargs: Any) -> "AsyncBaseSet":
15661618
):
15671619
self.vector_query = kwargs.pop("vector_filter")
15681620

1621+
if (
1622+
kwargs.get("fulltext_filter")
1623+
and isinstance(kwargs["fulltext_filter"], FulltextFilter)
1624+
and not self.fulltext_query
1625+
):
1626+
self.fulltext_query = kwargs.pop("fulltext_filter")
1627+
15691628
self.q_filters = Q(self.q_filters & Q(*new_args, **kwargs))
15701629

15711630
return self

neomodel/semantic_filters.py

Lines changed: 19 additions & 0 deletions
@@ -19,3 +19,22 @@ def __init__(
1919
self.index_name = None
2020
self.node_set_label = None
2121
self.vector = candidate_vector
22+
23+
class FulltextFilter(object):
24+
"""
25+
Represents a CALL db.index.fulltext.query* Neo4j procedure call within the OGM.
26+
:param query_string: The query string to search the full text index for.
27+
:type query_string: str
28+
:param fulltext_attribute_name: The property name for the full text indexed property.
29+
:type fulltext_attribute_name: str
30+
:param topk: Number of nodes to return.
31+
:type topk: int
32+
33+
"""
34+
35+
def __init__(self, query_string: str, fulltext_attribute_name: str, topk: int):
36+
self.query_string = query_string
37+
self.fulltext_attribute_name = fulltext_attribute_name
38+
self.index_name = None
39+
self.node_set_label = None
40+
self.topk = topk
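
To make the flow in this commit easier to follow, here is a minimal sketch of how a filter's fields end up in the Cypher fragment that ``build_query`` prepends. It uses a stand-in dataclass, ``FulltextFilterSketch``, and two hypothetical helpers, ``resolve_names`` and ``fulltext_fragment``, rather than neomodel's own classes. Whitespace in the emitted text is approximate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FulltextFilterSketch:
    """Stand-in mirroring the attributes of FulltextFilter in the diff."""

    query_string: str
    fulltext_attribute_name: str
    topk: int
    index_name: Optional[str] = None
    node_set_label: Optional[str] = None


def resolve_names(flt: FulltextFilterSketch, label: str) -> FulltextFilterSketch:
    """Fill in the index name and node variable, as build_fulltext_query does."""
    flt.index_name = f"fulltext_index_{label}_{flt.fulltext_attribute_name}"
    flt.node_set_label = label.lower()
    return flt


def fulltext_fragment(flt: FulltextFilterSketch) -> str:
    """Assemble the CALL subquery and trailing WITH clause for the filter."""
    return (
        "CALL () {\n"
        f'    CALL db.index.fulltext.queryNodes("{flt.index_name}", "{flt.query_string}")\n'
        f"    YIELD node AS {flt.node_set_label}, score\n"
        f"    RETURN {flt.node_set_label}, score LIMIT {flt.topk}\n"
        "}\n"
        f"WITH {flt.node_set_label}, score"
    )


# "Product" stands for the node label of the class being queried.
flt = resolve_names(FulltextFilterSketch("product", "description", 10), "Product")
print(fulltext_fragment(flt))
```

Because ``query_string`` is formatted directly between double quotes, a search term that itself contains a double quote would likely need escaping, or could be passed as a query parameter instead.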
