Commit 233da4a

Merge branch 'rc/5.5.1' of https://github.com/neo4j-contrib/neomodel into rc/5.5.1
2 parents fda3c5b + f21705e commit 233da4a

9 files changed

+778
-34
lines changed

.gitignore

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,4 +22,4 @@ pyvenv.cfg
2222
coverage_report/
2323
.coverage*
2424
.DS_STORE
25-
cov.xml
25+
cov.xml

doc/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -77,6 +77,7 @@ Contents
7777
filtering_ordering
7878
traversal
7979
advanced_query_operations
80+
semantic_indexes
8081
cypher
8182
transactions
8283
hooks

doc/source/schema_management.rst

Lines changed: 2 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -46,38 +46,11 @@ Indexes
4646
The following indexes are supported:
4747

4848
- ``index=True``: This will create the default Neo4j index on the property (currently RANGE).
49-
- ``fulltext_index=FulltextIndex()``: This will create a FULLTEXT index on the property. Only available for Neo4j version 5.16 or higher. With this one, you can define the following options:
50-
- ``analyzer``: The analyzer to use. The default is ``standard-no-stop-words``.
51-
- ``eventually_consistent``: Whether the index should be eventually consistent. The default is ``False``.
52-
53-
Please refer to the `Neo4j documentation <https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/full-text-indexes/#configuration-settings>`_. for more information on fulltext indexes.
54-
55-
- ``vector_index=VectorIndex()``: This will create a VECTOR index on the property. Only available for Neo4j version 5.15 (node) and 5.18 (relationship) or higher. With this one, you can define the following options:
56-
- ``dimensions``: The dimension of the vector. The default is 1536.
57-
- ``similarity_function``: The similarity algorithm to use. The default is ``cosine``.
58-
59-
Those indexes are available for both node- and relationship properties.
49+
- :ref:`Semantic Indexes`
6050

6151
.. note::
6252
Yes, you can create multiple indexes of a different type on the same property. For example, a default index and a fulltext index.
6353

64-
.. note::
65-
For the semantic indexes (fulltext and vector), this allows you to create indexes, but searching those indexes require using Cypher queries.
66-
This is because Cypher only supports querying those indexes through a specific procedure for now.
67-
68-
Full example: ::
69-
70-
from neomodel import StructuredNode, StringProperty, FulltextIndex, VectorIndex
71-
class VeryIndexedNode(StructuredNode):
72-
name = StringProperty(
73-
index=True,
74-
fulltext_index=FulltextIndex(analyzer='english', eventually_consistent=True)
75-
)
76-
name_embedding = ArrayProperty(
77-
FloatProperty(),
78-
vector_index=VectorIndex(dimensions=512, similarity_function='euclidean')
79-
)
80-
8154
Constraints
8255
===========
8356

@@ -93,4 +66,4 @@ Extracting the schema from a database
9366
=====================================
9467

9568
You can extract the schema from an existing database using the ``neomodel_inspect_database`` script (:ref:`inspect_database_doc`).
96-
This script will output the schema in the neomodel format, including indexes and constraints.
69+
This script will output the schema in the neomodel format, including indexes and constraints.

doc/source/semantic_indexes.rst

Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,89 @@
1+
.. _Semantic Indexes:
2+
3+
==================================
4+
Semantic Indexes
5+
==================================
6+
7+
Full Text Index
8+
----------------
9+
From version x.x (version number tbc) neomodel provides a way to interact with neo4j `Full Text indexing <https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/full-text-indexes/>`_.
10+
The Full Text Index can be created for both node and relationship properties. It is only available for Neo4j version 5.16 or higher.
11+
12+
Defining a Full Text Index on a Property
13+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
14+
Within neomodel, indexing is a decision that is made at class definition time as the index needs to be built. A Full Text index is defined using :class:`~neomodel.properties.FulltextIndex`.
15+
To define a property with a full text index we use the following syntax::
16+
17+
StringProperty(fulltext_index=FulltextIndex(analyzer="standard-no-stop-words", eventually_consistent=False))
18+
19+
Where,
20+
- ``analyzer``: The analyzer to use. The default is ``standard-no-stop-words``.
21+
- ``eventually_consistent``: Whether the index should be eventually consistent. The default is ``False``.
22+
23+
The index must then be built; this happens when the function :func:`~neomodel.sync_.core.install_all_labels` is run.
24+
25+
Please refer to the `Neo4j documentation <https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/full-text-indexes/#configuration-settings>`_ for more information on fulltext indexes.
26+
27+
Querying a Full Text Index on a Property
28+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
29+
30+
This is not currently implemented as a native neomodel query type. If you would like this, please submit a GitHub issue describing your usage pattern.
31+
32+
Alternatively, although this has not been implemented yet, you can still use ``db.cypher_query`` with the correct syntax to perform your required query.
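
A minimal sketch of such a raw Cypher query, assuming Neo4j's ``db.index.fulltext.queryNodes`` procedure. The helper ``build_fulltext_query`` and the index name ``my_fulltext_index`` are hypothetical placeholders, not part of neomodel; run ``SHOW INDEXES`` to find the actual index name in your database.

```python
def build_fulltext_query(index_name: str, search_text: str, limit: int = 10):
    """Return a Cypher query string and its parameters for a full-text node search.

    Hypothetical helper for illustration; it only builds the query and
    does not talk to the database.
    """
    query = (
        "CALL db.index.fulltext.queryNodes($index_name, $search_text) "
        "YIELD node, score "
        "RETURN node, score "
        "LIMIT $limit"
    )
    params = {
        "index_name": index_name,
        "search_text": search_text,
        "limit": limit,
    }
    return query, params


# "my_fulltext_index" is a placeholder index name (an assumption).
query, params = build_fulltext_query("my_fulltext_index", "coffee")

# With a live connection, the pair could then be passed on as:
#   from neomodel import db
#   results, meta = db.cypher_query(query, params)
```

Passing the index name and search text as parameters, rather than formatting them into the string, keeps the query safe from injection.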
33+
34+
Vector Index
35+
------------
36+
From version x.x (version number tbc) neomodel provides a way to interact with neo4j `vector indexing <https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/vector-indexes/>`_.
37+
38+
The Vector Index can be created on both node and relationship properties. It is only available for Neo4j version 5.15 (node) and 5.18 (relationship) or higher.
39+
40+
Defining a Vector Index on a Property
41+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
42+
43+
Within neomodel, indexing is a decision that is made at class definition time as the index needs to be built. A vector index is defined using :class:`~neomodel.properties.VectorIndex`.
44+
To define a property with a vector index we use the following syntax::
45+
46+
ArrayProperty(base_property=FloatProperty(), vector_index=VectorIndex(dimensions=512, similarity_function="cosine"))
47+
48+
Where,
49+
- ``dimensions``: The dimension of the vector. The default is 1536.
50+
- ``similarity_function``: The similarity algorithm to use. The default is ``cosine``.
51+
52+
The index must then be built; this happens when the function :func:`~neomodel.sync_.core.install_all_labels` is run.
53+
54+
The vector indexes will then have the name "vector_index_{node.__label__}_{propertyname_with_vector_index}".
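
As a small sketch of the naming rule above, the index name can be derived from the node label and the property name. The helper ``vector_index_name`` is hypothetical and not part of neomodel; it assumes the label is the default one, i.e. the class name.

```python
def vector_index_name(label: str, property_name: str) -> str:
    """Build the vector index name from a node label and a property name.

    Hypothetical helper mirroring the documented naming pattern.
    """
    return f"vector_index_{label}_{property_name}"


# For a class ``someNode`` with a vector-indexed property ``vector``:
name = vector_index_name("someNode", "vector")
print(name)
```

Knowing the name is useful when querying the index directly through Cypher.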
55+
56+
.. attention::
57+
Neomodel creates a new vector index for each specified property, so two distinct properties cannot be placed into the same index.
58+
59+
Querying a Vector Index on a Property
60+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
61+
62+
Node Property
63+
^^^^^^^^^^^^^
64+
The following node vector index property::
65+
66+
class someNode(StructuredNode):
67+
vector = ArrayProperty(base_property=FloatProperty(), vector_index=VectorIndex(dimensions=512, similarity_function="cosine"))
68+
name = StringProperty()
69+
70+
Can be queried using :class:`~neomodel.semantic_filters.VectorFilter`. Such as::
71+
72+
from neomodel.semantic_filters import VectorFilter
73+
result = someNode.nodes.filter(vector_filter=VectorFilter(topk=3, vector_attribute_name="vector")).all()
74+
75+
Where the result will be a list of length topk of tuples having the form (someNode, score).
76+
77+
The :class:`~neomodel.semantic_filters.VectorFilter` can be used in conjunction with the normal filter types.
78+
79+
.. attention::
80+
If you use VectorFilter in conjunction with normal filter types, only nodes that match the filters are returned, so you may get fewer than the topk results specified.
81+
Furthermore, all node filters **should** work with VectorFilter. Relationship filters will also work but WILL NOT return the vector similarity score alongside the relationship filter; instead, the topk nodes and their appropriate relationships will be returned.
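
The following in-memory sketch illustrates why combining a vector search with ordinary filters can return fewer than topk results: the topk nearest candidates are selected first, and the other filters are applied to that set afterwards. Everything here is a stand-in for illustration (the node list, the ``allowed`` filter, and the plain cosine function, which is not necessarily the score Neo4j reports).

```python
import math


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical stand-ins for stored nodes: (name, vector).
nodes = [
    ("a", [1.0, 0.0]),
    ("b", [0.9, 0.1]),
    ("c", [0.0, 1.0]),
    ("d", [0.8, 0.2]),
]
query_vector = [1.0, 0.0]
topk = 3

# Step 1: the vector search keeps only the topk most similar nodes.
scored = sorted(
    ((name, cosine(vec, query_vector)) for name, vec in nodes),
    key=lambda pair: pair[1],
    reverse=True,
)[:topk]

# Step 2: an ordinary filter is applied to those topk candidates only.
allowed = {"a", "c"}  # stand-in for a property filter
result = [(name, score) for name, score in scored if name in allowed]

print(len(scored), len(result))
```

Node ``c`` passes the ordinary filter but is not among the three nearest neighbours, so it never reaches the filter step and the final result is shorter than topk.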
82+
83+
RelationshipProperty
84+
^^^^^^^^^^^^^^^^^^^^
85+
Currently neomodel has not implemented an OGM method for querying vector indexes on relationships.
86+
If this is something that you would like, please submit a GitHub issue highlighting your usage pattern.
87+
88+
Alternatively, although this has not been implemented yet, you can still use ``db.cypher_query`` with the correct syntax to perform your required query.
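
A minimal sketch of such a query, assuming Neo4j's ``db.index.vector.queryRelationships`` procedure (Neo4j 5.18 or higher). The helper ``build_relationship_vector_query`` and the index name ``my_relationship_vector_index`` are hypothetical placeholders, not part of neomodel; run ``SHOW INDEXES`` to find the actual index name.

```python
def build_relationship_vector_query(index_name, top_k, query_vector):
    """Return a Cypher query and parameters for a relationship vector search.

    Hypothetical helper for illustration; it only builds the query and
    does not talk to the database.
    """
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    if not query_vector:
        raise ValueError("query_vector must not be empty")

    query = (
        "CALL db.index.vector.queryRelationships($index_name, $top_k, $query_vector) "
        "YIELD relationship, score "
        "RETURN relationship, score"
    )
    params = {
        "index_name": index_name,
        "top_k": top_k,
        "query_vector": [float(x) for x in query_vector],
    }
    return query, params


# Placeholder index name and a toy two-dimensional vector (assumptions).
rel_query, rel_params = build_relationship_vector_query(
    "my_relationship_vector_index", 3, [0.1, 0.2]
)

# With a live connection, the pair could then be passed on as:
#   from neomodel import db
#   results, meta = db.cypher_query(rel_query, rel_params)
```

The query vector must have the same number of dimensions as the index was created with.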
89+

neomodel/async_/match.py

Lines changed: 164 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@
1010
from neomodel.async_ import relationship_manager
1111
from neomodel.async_.core import AsyncStructuredNode, adb
1212
from neomodel.async_.relationship import AsyncStructuredRel
13+
from neomodel.semantic_filters import VectorFilter
1314
from neomodel.exceptions import MultipleNodesReturned
1415
from neomodel.match_q import Q, QBase
1516
from neomodel.properties import AliasProperty, ArrayProperty, Property
@@ -404,7 +405,7 @@ class QueryAST:
404405
lookup: TOptional[str]
405406
additional_return: TOptional[list[str]]
406407
is_count: TOptional[bool]
407-
408+
vector_index_query: TOptional[type]
408409
def __init__(
409410
self,
410411
match: TOptional[list[str]] = None,
@@ -420,6 +421,7 @@ def __init__(
420421
lookup: TOptional[str] = None,
421422
additional_return: TOptional[list[str]] = None,
422423
is_count: TOptional[bool] = False,
424+
vector_index_query: TOptional[type] = None,
423425
) -> None:
424426
self.match = match if match else []
425427
self.optional_match = optional_match if optional_match else []
@@ -436,6 +438,7 @@ def __init__(
436438
additional_return if additional_return else []
437439
)
438440
self.is_count = is_count
441+
self.vector_index_query = vector_index_query
439442
self.subgraph: dict = {}
440443
self.mixed_filters: bool = False
441444

@@ -458,6 +461,10 @@ async def build_ast(self) -> "AsyncQueryBuilder":
458461
):
459462
for relation in self.node_set.relations_to_fetch:
460463
self.build_traversal_from_path(relation, self.node_set.source)
464+
465+
if isinstance(self.node_set, AsyncNodeSet) and hasattr(self.node_set, "_vector_query"):
466+
if self.node_set._vector_query:
467+
self.build_vector_query(self.node_set._vector_query, self.node_set.source)
461468

462469
await self.build_source(self.node_set)
463470

@@ -540,6 +547,27 @@ def build_order_by(self, ident: str, source: "AsyncNodeSet") -> None:
540547
order_by.append(f"{result[0]}.{prop}")
541548
self._ast.order_by = order_by
542549

550+
551+
def build_vector_query(self, vectorfilter: "VectorFilter", source: "NodeSet"):
552+
"""
553+
Query a vector indexed property on the node.
554+
"""
555+
try:
556+
attribute = getattr(source, vectorfilter.vector_attribute_name)
557+
except AttributeError:
558+
raise # This raises the base AttributeError and provides potential correction
559+
560+
if not attribute.vector_index:
561+
raise AttributeError(f"Attribute {vectorfilter.vector_attribute_name} is not declared with a vector index.")
562+
563+
vectorfilter.index_name = f"vector_index_{source.__label__}_{vectorfilter.vector_attribute_name}"
564+
vectorfilter.nodeSetLabel = source.__label__.lower()
565+
566+
self._ast.vector_index_query = vectorfilter
567+
self._ast.return_clause = f"{vectorfilter.nodeSetLabel}, score"
568+
self._ast.result_class = source.__class__
569+
570+
543571
async def build_traversal(self, traversal: "AsyncTraversal") -> str:
544572
"""
545573
traverse a relationship from a node to a set of nodes
@@ -933,6 +961,17 @@ def build_query(self) -> str:
933961
if self._ast.lookup:
934962
query += self._ast.lookup
935963

964+
if self._ast.vector_index_query:
965+
966+
query += f"""CALL () {{
967+
CALL db.index.vector.queryNodes("{self._ast.vector_index_query.index_name}", {self._ast.vector_index_query.topk}, {self._ast.vector_index_query.vector})
968+
YIELD node AS {self._ast.vector_index_query.nodeSetLabel}, score
969+
RETURN {self._ast.vector_index_query.nodeSetLabel}, score
970+
}}"""
971+
972+
# This ensures that we bring the context of the new nodeSet and score along with us for metadata filtering
973+
query += f""" WITH {self._ast.vector_index_query.nodeSetLabel}, score"""
974+
936975
# Instead of using only one MATCH statement for every relation
937976
# to follow, we use one MATCH per relation (to avoid cartesian
938977
# product issues...).
@@ -1404,6 +1443,7 @@ def __init__(self, source: Any) -> None:
14041443
self._subqueries: list[Subquery] = []
14051444
self._intermediate_transforms: list = []
14061445
self._unique_variables: list[str] = []
1446+
self._vector_query: str = None
14071447

14081448
def __await__(self) -> Any:
14091449
return self.all().__await__() # type: ignore[attr-defined]
@@ -1506,6 +1546,129 @@ def filter(self, *args: Any, **kwargs: Any) -> "AsyncBaseSet":
15061546
:return: self
15071547
"""
15081548
if args or kwargs:
1549+
# Need to grab and remove the VectorFilter from both args and kwargs
1550+
new_args = [] # As args are a tuple, theyre immutable. But we need to remove the vectorfilter from the arguments so they dont go into Q.
1551+
for arg in args:
1552+
if isinstance(arg, VectorFilter) and (not self._vector_query):
1553+
self._vector_query = arg
1554+
new_args.append(arg)
1555+
1556+
new_args = tuple(new_args)
1557+
1558+
if kwargs.get("vector_filter"):
1559+
if isinstance(kwargs["vector_filter"], VectorFilter) and (not self._vector_query):
1560+
self._vector_query = kwargs.pop("vector_filter")
1561+
1562+
1563+
self.q_filters = Q(self.q_filters & Q(*new_args, **kwargs))
1564+
1565+
return self
1566+
1567+
def exclude(self, *args: Any, **kwargs: Any) -> "BaseSet":
1568+
"""
1569+
Exclude nodes from the NodeSet via filters.
1570+
1571+
:param kwargs: filter parameters see syntax for the filter method
1572+
:return: self
1573+
"""
1574+
if args or kwargs:
1575+
self.q_filters = Q(self.q_filters & ~Q(*args, **kwargs))
1576+
return self
1577+
1578+
def has(self, **kwargs: Any) -> "BaseSet":
1579+
must_match, dont_match = process_has_args(self.source_class, kwargs)
1580+
self.must_match.update(must_match)
1581+
self.dont_match.update(dont_match)
1582+
return self
1583+
1584+
def order_by(self, *props: Any) -> "BaseSet":
1585+
"""
1586+
Order by properties. Prepend with minus to do descending. Pass None to
1587+
remove ordering.
1588+
"""
1589+
should_remove = len(props) == 1 and props[0] is None
1590+
if not hasattr(self, "order_by_elements") or should_remove:
1591+
self.order_by_elements = []
1592+
if should_remove:
1593+
return self
1594+
if "?" in props:
1595+
self.order_by_elements.append("?")
1596+
else:
1597+
for prop in props:
1598+
if isinstance(prop, RawCypher):
1599+
self.order_by_elements.append(prop)
1600+
continue
1601+
prop = prop.strip()
1602+
if prop.startswith("-"):
1603+
prop = prop[1:]
1604+
desc = True
1605+
else:
1606+
desc = False
1607+
1608+
if prop in self.source_class.defined_properties(rels=False):
1609+
property_obj = getattr(self.source_class, prop)
1610+
if isinstance(property_obj, AliasProperty):
1611+
prop = property_obj.aliased_to()
1612+
1613+
self.order_by_elements.append(prop + (" DESC" if desc else ""))
1614+
1615+
return self
1616+
1617+
def _register_relation_to_fetch(
1618+
self, relation_def: Any, alias: TOptional[str] = None
1619+
) -> "Path":
1620+
if isinstance(relation_def, Path):
1621+
item = relation_def
1622+
else:
1623+
item = Path(
1624+
value=relation_def,
1625+
)
1626+
if alias:
1627+
item.alias = alias
1628+
return item
1629+
1630+
def unique_variables(self, *paths: tuple[str, ...]) -> "NodeSet":
1631+
"""Generate unique variable names for the given paths."""
1632+
self._unique_variables = paths
1633+
return self
1634+
1635+
def traverse(self, *paths: tuple[str, ...], **aliased_paths: dict) -> "NodeSet":
1636+
"""Specify a set of paths to traverse."""
1637+
relations = []
1638+
for path in paths:
1639+
relations.append(self._register_relation_to_fetch(path))
1640+
for alias, aliased_path in aliased_paths.items():
1641+
relations.append(
1642+
self._register_relation_to_fetch(aliased_path, alias=alias)
1643+
)
1644+
self.relations_to_fetch = relations
1645+
return self
1646+
1647+
def fetch_relations(self, *relation_names: tuple[str, ...]) -> "NodeSet":
1648+
"""Specify a set of relations to traverse and return."""
1649+
warnings.warn(
1650+
"fetch_relations() will be deprecated in version 6, use traverse() instead.",
1651+
DeprecationWarning,
1652+
)
1653+
relations = []
1654+
for relation_name in relation_names:
1655+
if isinstance(relation_name, Optional):
1656+
relation_name = Path(value=relation_name.relation, optional=True)
1657+
relations.append(self._register_relation_to_fetch(relation_name))
1658+
self.relations_to_fetch = relations
1659+
return self
1660+
1661+
def traverse_relations(
1662+
self, *relation_names: tuple[str, ...], **aliased_relation_names: dict
1663+
) -> "NodeSet":
1664+
"""Specify a set of relations to traverse only."""
1665+
1666+
warnings.warn(
1667+
"traverse_relations() will be deprecated in version 6, use traverse() instead.",
1668+
DeprecationWarning,
1669+
)
1670+
1671+
def convert_to_path(input: Union[str, Optional]) -> Path:
15091672
self.q_filters = Q(self.q_filters & Q(*args, **kwargs))
15101673
return self
15111674
