
Add support for "true" cursor based pagination in connections #730

Open: wants to merge 35 commits into base: main

Commits:
- 5f57eb6 prototyping with cursor based connections (diesieben07, Apr 2, 2025)
- cea6194 Working on cursor style pagination for Django (diesieben07, Apr 2, 2025)
- 86e2262 Working on cursor style pagination for Django (diesieben07, Apr 2, 2025)
- ed2abab Working on cursor style pagination for Django (diesieben07, Apr 2, 2025)
- f5a54de Implement forward and backward cursor pagination (diesieben07, Apr 3, 2025)
- 5e5fee3 Implement better order_by extraction and tuple comparison (diesieben07, Apr 3, 2025)
- 3d4cdb4 Implement better order_by extraction and tuple comparison (diesieben07, Apr 3, 2025)
- d90a9ee Implement hasPeviousPage and hasNextPage (diesieben07, Apr 3, 2025)
- c018206 Formatting (diesieben07, Apr 4, 2025)
- 5b0e8fc More robust field value extraction (diesieben07, Apr 4, 2025)
- 4ea320b Handle dynamic output fields in field extraction (diesieben07, Apr 4, 2025)
- 7321244 Optimize ordering by a field in the base table (diesieben07, Apr 4, 2025)
- d8ef9b2 Add cursor connection tests (diesieben07, Apr 4, 2025)
- 2885f05 Add more cursor connection tests (diesieben07, Apr 4, 2025)
- b307c9d Fix typing issues (diesieben07, Apr 4, 2025)
- 61e9a97 Fix relay cursor pagination in async context (diesieben07, Apr 9, 2025)
- b78b6fc Fix get_queryset_config and add tests for it (diesieben07, Apr 9, 2025)
- 1611301 Remove debug code (diesieben07, Apr 9, 2025)
- 074b2b4 Add test for cursor connection with custom resolver (diesieben07, Apr 9, 2025)
- cfa740f Add documentation for DjangoCursorConnection (diesieben07, Apr 9, 2025)
- 7bed380 Fix typo in docs (diesieben07, Apr 9, 2025)
- ab3a7e4 Documentation rewording (diesieben07, Apr 9, 2025)
- 7237d01 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 9, 2025)
- e4fae12 Fix schema test (diesieben07, Apr 9, 2025)
- 2692f3e Merge remote-tracking branch 'origin/cursor-based-connection' into cu… (diesieben07, Apr 9, 2025)
- b009e65 Revert unnecessary schema change (diesieben07, Apr 9, 2025)
- bb317b7 Fix cursor connection for null values and add tests for NULLS LAST / … (diesieben07, Apr 9, 2025)
- 50d7f35 Add tests for invalid cursors (diesieben07, Apr 9, 2025)
- 2d8541c Remove unused extract_cursor_values method (diesieben07, Apr 9, 2025)
- 4f3213f Add total_count to DjangoCursorConnection (diesieben07, Apr 9, 2025)
- c9bc3e3 Add tests for more error conditions (diesieben07, Apr 9, 2025)
- 3ed07d9 Add tests for deferred fields in queryset when using cursor connection (diesieben07, Apr 9, 2025)
- 39dcdd6 Implement proper pagination when using first and last together (diesieben07, Apr 9, 2025)
- b57dfae Use a dataclass for OrderingDescriptor and OrderedCollectionCursor (diesieben07, Apr 14, 2025)
- 866589f Add test for empty results (diesieben07, Apr 14, 2025)
33 changes: 33 additions & 0 deletions docs/guide/relay.md
@@ -69,3 +69,36 @@ For more customization options, like changing the pagination algorithm, adding e
to the `Connection`/`Edge` type, take a look at the
[official strawberry relay integration](https://strawberry.rocks/docs/guides/relay)
as those are properly explained there.

## Cursor-based connections

As an alternative to the default `ListConnection`, `DjangoCursorConnection` is available.
It paginates a Django `QuerySet` using "true" cursors, which identify a position by the ordering values of a row rather than by an offset.
Review comment (Contributor):
suggestion: Clarify the meaning of "true cursors". Suggest using "offset-based cursors" vs "range-based cursors" to distinguish the approaches.

The term "true cursors" might be confusing to users. Using more descriptive terms like "offset-based cursors" and "range-based cursors" would improve clarity.

Suggested implementation:

It supports pagination through a Django `QuerySet` via range-based cursors.

`ListConnection` uses offset-based cursors (slicing) to achieve pagination, which can negatively affect performance for huge datasets,

`ListConnection` uses slicing to achieve pagination, which can negatively affect performance for huge datasets,
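because large page numbers require a large `OFFSET` in SQL, and the database still has to step over every skipped row.
Instead, `DjangoCursorConnection` uses range queries such as `Q(due_date__gte=...)` for pagination. Combined
with an index on the ordering columns, this lets the database start each page at the right position instead of scanning past earlier rows.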
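
The difference can be sketched outside Django with Python's built-in `sqlite3` module. This is only an illustration, and it rests on several assumptions. The `fruit` table, its `(due_date, id)` index, and the helper names are all made up for the example. The SQL is also not what `DjangoCursorConnection` generates. The sketch uses an SQLite row-value comparison (`(due_date, id) > (?, ?)`) to start each page immediately after the last row seen, and it checks that this returns the same pages as `LIMIT`/`OFFSET`.

```python
import sqlite3

# Hypothetical table standing in for a Django model ordered by (due_date, pk).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit (id INTEGER PRIMARY KEY, due_date TEXT NOT NULL)")
# An index matching the ordering lets range filters seek instead of scanning.
conn.execute("CREATE INDEX fruit_due_date_id ON fruit (due_date, id)")
conn.executemany(
    "INSERT INTO fruit (id, due_date) VALUES (?, ?)",
    [(i, f"2025-04-{1 + i % 28:02d}") for i in range(1, 101)],
)

PAGE_SIZE = 10


def offset_page(page_number):
    """Offset pagination: the database steps over page_number * PAGE_SIZE rows."""
    return conn.execute(
        "SELECT id, due_date FROM fruit ORDER BY due_date, id LIMIT ? OFFSET ?",
        (PAGE_SIZE, page_number * PAGE_SIZE),
    ).fetchall()


def keyset_page(after=None):
    """Range (keyset) pagination: return rows sorting strictly after `after`."""
    if after is None:
        return conn.execute(
            "SELECT id, due_date FROM fruit ORDER BY due_date, id LIMIT ?",
            (PAGE_SIZE,),
        ).fetchall()
    last_id, last_due_date = after
    return conn.execute(
        "SELECT id, due_date FROM fruit WHERE (due_date, id) > (?, ?) "
        "ORDER BY due_date, id LIMIT ?",
        (last_due_date, last_id, PAGE_SIZE),
    ).fetchall()


# Walking forward page by page, both strategies return identical pages.
last_row = None
for page_number in range(10):
    page = keyset_page(last_row)
    assert page == offset_page(page_number)
    last_row = page[-1]
```

Both strategies return the same rows. What differs is the access pattern. The keyset query only needs the last row of the previous page. The `OFFSET` query's cost grows with the page number.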

`DjangoCursorConnection` requires a _strictly_ ordered `QuerySet`, that is, no two entries in the `QuerySet`
may compare equal under its ordering. For example, `order_by('due_date')` is not strictly ordered, because two
items could have the same due date. `DjangoCursorConnection` resolves such ties automatically by
also ordering by the primary key.
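
Plain Python is enough to show why the tiebreaker matters. The sketch below uses a hypothetical `paginate` helper and made-up sample rows, neither of which is part of strawberry-django. Each page starts with the rows that sort strictly after the last row of the previous page. When the ordering key has ties, rows that share a key with a page's last row are silently skipped.

```python
from operator import itemgetter

# Hypothetical rows as (pk, due_date); several rows share a due date.
ROWS = [
    (1, "2025-04-02"),
    (2, "2025-04-01"),
    (3, "2025-04-02"),
    (4, "2025-04-01"),
    (5, "2025-04-03"),
    (6, "2025-04-02"),
]


def paginate(rows, key, page_size):
    """Fetch every page, starting each page strictly after the previous
    page's last key, and return all rows seen in order."""
    ordered = sorted(rows, key=key)
    seen = []
    last_key = None
    while True:
        candidates = [r for r in ordered if last_key is None or key(r) > last_key]
        page = candidates[:page_size]
        if not page:
            return seen
        seen.extend(page)
        last_key = key(page[-1])


# Not strict: row 6 ties with row 3 (the last row of page 2) and is skipped.
by_date = paginate(ROWS, key=itemgetter(1), page_size=2)

# Strict: the primary key breaks ties, so every row is returned exactly once.
by_date_then_pk = paginate(ROWS, key=itemgetter(1, 0), page_size=2)

print(len(by_date), len(by_date_then_pk))  # 5 6
```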

When users can configure the ordering of the connection (for example via
[`@strawberry_django.order`](./ordering.md)), a cursor created by `DjangoCursorConnection` under one ordering
is not valid under a different ordering.
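
A range-based cursor has to carry the ordering values of the row it points to, and those values only mean something under the ordering that produced them. The sketch below assumes a made-up cursor format: base64-encoded JSON that holds the ordering fields and the row's values. This is not the format `DjangoCursorConnection` uses. The sketch shows why a cursor cannot be reused after the ordering changes, and one possible way to detect the mismatch.

```python
import base64
import json


def encode_cursor(ordering, values):
    """Hypothetical cursor: the ordering fields plus the row's values for them."""
    payload = json.dumps({"ordering": list(ordering), "values": list(values)})
    return base64.urlsafe_b64encode(payload.encode()).decode()


def decode_cursor(cursor, ordering):
    """Return the stored values, rejecting cursors made under another ordering."""
    payload = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    if payload["ordering"] != list(ordering):
        raise ValueError("cursor was created for a different ordering")
    return payload["values"]


cursor = encode_cursor(["due_date", "pk"], ["2025-04-02", 3])
print(decode_cursor(cursor, ["due_date", "pk"]))  # ['2025-04-02', 3]

try:
    # The same cursor is meaningless once the client switches to ordering by name.
    decode_cursor(cursor, ["-name", "pk"])
except ValueError as exc:
    print(exc)  # cursor was created for a different ordering
```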

The drawback of cursor-based pagination is that clients cannot jump directly to an arbitrary page. Therefore,
cursor-based pagination is better suited to special use cases such as infinitely scrolling lists.

Otherwise, `DjangoCursorConnection` behaves like the other connection classes:

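```python
import strawberry
import strawberry_django
from strawberry_django.relay_cursor import DjangoCursorConnection

# `Fruit` (a Django model) and `FruitType` (its relay Node type) are assumed
# to be defined elsewhere in your project.


@strawberry.type
class Query:
    fruit: DjangoCursorConnection[FruitType] = strawberry_django.connection()

    @strawberry_django.connection(DjangoCursorConnection[FruitType])
    def fruit_with_custom_resolver(self) -> list[Fruit]:
        # Return the QuerySet itself; the connection applies cursor pagination to it.
        return Fruit.objects.all()
```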
6 changes: 4 additions & 2 deletions strawberry_django/fields/field.py
@@ -323,8 +323,10 @@ def get_queryset(self, queryset, info, **kwargs):
         )
 
         # If optimizer extension is enabled, optimize this queryset
-        ext = optimizer.optimizer.get()
-        if ext is not None:
+        if (
+            not self.disable_optimization
+            and (ext := optimizer.optimizer.get()) is not None
+        ):
             queryset = ext.optimize(queryset, info=info)
 
         return queryset
46 changes: 31 additions & 15 deletions strawberry_django/optimizer.py
@@ -41,7 +41,7 @@
 from strawberry.relay.utils import SliceMetadata
 from strawberry.schema.schema import Schema
 from strawberry.schema.schema_converter import get_arguments
-from strawberry.types import get_object_definition
+from strawberry.types import get_object_definition, has_object_definition
 from strawberry.types.base import StrawberryContainer
 from strawberry.types.info import Info
 from strawberry.types.lazy_type import LazyType
@@ -94,7 +94,6 @@
     "optimize",
 ]
 
-NESTED_PREFETCH_MARK = "_strawberry_nested_prefetch_optimized"
 _M = TypeVar("_M", bound=models.Model)
 
 _sentinel = object()
@@ -496,6 +495,8 @@ def _optimize_prefetch_queryset(
         StrawberryDjangoField,
     )
 
+    from .relay_cursor import DjangoCursorConnection, apply_cursor_pagination
+
     if (
         not config
         or not config.enable_nested_relations_prefetch
@@ -571,6 +572,17 @@ def _optimize_prefetch_queryset(
                     limit=slice_metadata.end - slice_metadata.start,
                     max_results=connection_extension.max_results,
                 )
+            elif connection_type is DjangoCursorConnection:
+                qs, _ = apply_cursor_pagination(
+                    qs,
+                    related_field_id=related_field_id,
+                    info=Info(_raw_info=info, _field=field),
+                    first=field_kwargs.get("first"),
+                    last=field_kwargs.get("last"),
+                    before=field_kwargs.get("before"),
+                    after=field_kwargs.get("after"),
+                    max_results=connection_extension.max_results,
+                )
             else:
                 mark_optimized = False
 
@@ -1246,13 +1258,20 @@ def _get_model_hints_from_connection(
         if edge.name.value != "edges":
             continue
 
-        e_definition = get_object_definition(relay.Edge, strict=True)
-        e_type = e_definition.resolve_generic(
-            relay.Edge[cast("type[relay.Node]", n_type)],
-        )
+        e_field = object_definition.get_field("edges")
+        if e_field is None:
+            break
+
+        e_definition = e_field.type
+        while isinstance(e_definition, StrawberryContainer):
+            e_definition = e_definition.of_type
+        if has_object_definition(e_definition):
+            e_definition = get_object_definition(e_definition, strict=True)
+        assert isinstance(e_definition, StrawberryObjectDefinition)
+
         e_gql_definition = _get_gql_definition(
             schema,
-            get_object_definition(e_type, strict=True),
+            e_definition,
         )
         assert isinstance(e_gql_definition, (GraphQLObjectType, GraphQLInterfaceType))
         e_info = _generate_selection_resolve_info(
@@ -1451,20 +1470,17 @@ def optimize(
 
 
 def is_optimized(qs: QuerySet) -> bool:
-    return get_queryset_config(qs).optimized or is_optimized_by_prefetching(qs)
+    config = get_queryset_config(qs)
+    return config.optimized or config.optimized_by_prefetching
 
 
 def mark_optimized_by_prefetching(qs: QuerySet[_M]) -> QuerySet[_M]:
-    # This is a bit of a hack, but there is no easy way to mark a related manager
-    # as optimized at this phase, so we just add a mark to the queryset that
-    # we can check leater on using is_optimized_by_prefetching
-    return qs.annotate(**{
-        NESTED_PREFETCH_MARK: models.Value(True),
-    })
+    get_queryset_config(qs).optimized_by_prefetching = True
+    return qs
Comment on lines +1478 to +1479
Member:

question: is this change safe? If I remember correctly, we did this to mark a prefetched queryset as optimized, because the config stored on it could be lost.

Is that no longer an issue with the changes here? I honestly don't remember how it would get lost, or even whether we have a test that would break without this (hopefully we do).

Contributor Author:

As far as I understand, yes, it is safe. If I change `is_optimized_by_prefetching` to always return `False`, various tests fail, most importantly this one:

`def test_nested_pagination(gql_client: utils.GraphQLTestClient):`
Several others that exercise nested connections fail as well.

My understanding of why this code was written as it was before my change is as follows:

  1. `get_queryset_config` was broken: it never stored the configuration on the `QuerySet`. This wasn't caught because
    1. There were no tests for it.
    2. When it fails, the consequences are only severe for `is_optimized_by_prefetching`. Losing the other flags in the `QuerySet` config can make the optimizer optimize a `QuerySet` twice, or make a type's `get_queryset` run twice. Both operations are usually idempotent, so apart from wasted CPU cycles they cause no harm.
  2. Because of this, the `get_queryset_config` bug never surfaced until `is_optimized_by_prefetching` was introduced. At that point the disappearing config was wrongly blamed on `prefetch_related`, when in reality `get_queryset_config` had never worked.
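
Point 1 can be reproduced without Django. This is a minimal sketch in which `FakeQuerySet`, `Config`, `get_config_old` and `get_config_new` are hypothetical stand-ins. The bodies of the two helpers mirror the old and new `get_queryset_config` from the `strawberry_django/queryset.py` diff. The old version returns a fresh config object without attaching it to the queryset, so any flag set on that object is lost.

```python
import dataclasses

CONFIG_KEY = "_strawberry_django_config"


@dataclasses.dataclass
class Config:
    optimized: bool = False


class FakeQuerySet:
    """Stand-in for a Django QuerySet; only attribute storage matters here."""


def get_config_old(qs):
    # Returns a fresh Config when none is stored, but never attaches it.
    return getattr(qs, CONFIG_KEY, None) or Config()


def get_config_new(qs):
    # Attaches the fresh Config so later lookups see the same object.
    config = getattr(qs, CONFIG_KEY, None)
    if config is None:
        setattr(qs, CONFIG_KEY, (config := Config()))
    return config


qs = FakeQuerySet()
get_config_old(qs).optimized = True
print(get_config_old(qs).optimized)  # False: the flag went to a throwaway object

qs = FakeQuerySet()
get_config_new(qs).optimized = True
print(get_config_new(qs).optimized)  # True
```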



 def is_optimized_by_prefetching(qs: QuerySet) -> bool:
-    return NESTED_PREFETCH_MARK in qs.query.annotations
+    return get_queryset_config(qs).optimized_by_prefetching
 
 
 optimizer: contextvars.ContextVar[DjangoOptimizerExtension | None] = (
9 changes: 8 additions & 1 deletion strawberry_django/queryset.py
@@ -9,6 +9,8 @@
 if TYPE_CHECKING:
     from strawberry import Info
 
+    from strawberry_django.relay_cursor import OrderingDescriptor
+
 _M = TypeVar("_M", bound=Model)
 
 CONFIG_KEY = "_strawberry_django_config"
@@ -17,11 +19,16 @@
 @dataclasses.dataclass
 class StrawberryDjangoQuerySetConfig:
     optimized: bool = False
+    optimized_by_prefetching: bool = False
     type_get_queryset_did_run: bool = False
+    ordering_descriptors: list[OrderingDescriptor] | None = None
 
 
 def get_queryset_config(queryset: QuerySet) -> StrawberryDjangoQuerySetConfig:
-    return getattr(queryset, CONFIG_KEY, None) or StrawberryDjangoQuerySetConfig()
+    config = getattr(queryset, CONFIG_KEY, None)
+    if config is None:
+        setattr(queryset, CONFIG_KEY, (config := StrawberryDjangoQuerySetConfig()))
+    return config
 
 
 def run_type_get_queryset(