
Add support for "true" cursor based pagination in connections #730


Merged


@diesieben07 (Contributor) commented Apr 9, 2025

This pull request adds a new connection class DjangoCursorConnection which supports efficient cursor-based pagination through any Django QuerySet without relying on offset-slicing.

Description

ListConnection uses slicing to achieve pagination. This works for the general case, but it can be inefficient for large datasets. A high page number translates into a large OFFSET in SQL, and the database generally still has to read and discard every skipped row. An alternative to limit/offset pagination is cursor-based pagination, which replaces OFFSET with range queries such as Q(due_date__gt=...). DjangoCursorConnection implements this approach.
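The following is a small, self-contained sketch of the difference. It uses Python's built-in sqlite3 module rather than the Django ORM. The `task` table and the `page_by_offset`/`page_after` helpers are hypothetical.

The offset version skips rows with OFFSET. The range version starts right after the last `(due_date, id)` it has already seen, using the same shape of filter that DjangoCursorConnection builds for `after`.

```python
from __future__ import annotations

import sqlite3

# Hypothetical table standing in for a model ordered by ("due_date", "pk")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, due_date TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO task (id, due_date) VALUES (?, ?)",
    [(i, f"2025-03-{i % 5 + 1:02d}") for i in range(1, 21)],
)

PAGE_SIZE = 3
ORDER = " ORDER BY due_date, id LIMIT ?"


def page_by_offset(page: int) -> list[tuple[int, str]]:
    # LIMIT/OFFSET: the database still walks past every skipped row
    return conn.execute(
        "SELECT id, due_date FROM task" + ORDER + " OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()


def page_after(cursor: tuple[str, int] | None) -> list[tuple[int, str]]:
    # Range query: continue right after the last (due_date, id) already seen
    if cursor is None:
        return conn.execute("SELECT id, due_date FROM task" + ORDER, (PAGE_SIZE,)).fetchall()
    due_date, pk = cursor
    return conn.execute(
        "SELECT id, due_date FROM task"
        " WHERE due_date > ? OR (due_date = ? AND id > ?)" + ORDER,
        (due_date, due_date, pk, PAGE_SIZE),
    ).fetchall()


# Walk all pages via cursors and check each one matches the offset-based page
cursor = None
page = 0
while rows := page_after(cursor):
    assert rows == page_by_offset(page)
    last_id, last_due_date = rows[-1]
    cursor = (last_due_date, last_id)
    page += 1
```

Note that `id` acts as a tie-breaker, which makes the ordering total. That is what lets a cursor identify a unique position. Without a tie-breaker, a range filter could skip or repeat rows that share the same `due_date`.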

How it works

DjangoCursorConnection inspects the QuerySet and extracts its ordering parameters, which it then uses to construct the cursors. For example, for order_by("due_date", "pk") the ordering parameters are due_date and pk. If an ordering parameter is an expression or not a direct field on the model (e.g. order_by(Upper("name")) or order_by("project__name")), a new annotation mirroring the ordering expression is added to the QuerySet, so that the value can be extracted later.
The extracted values are then encoded into a cursor.
When paginating, the cursor is decoded back into its parts, and those parts are used to build a pagination filter.
For example, when ordering by "due_date", "pk", the cursor might contain the parts "2025-03-01" and "3". If that cursor is passed for after, the following filter is constructed:
Q(due_date__gt="2025-03-01") | (Q(due_date="2025-03-01") & Q(pk__gt="3"))
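The filter above generalizes to any number of ordering fields by nesting. At each level, either the field is strictly past the cursor value, or it is tied and the remaining fields decide.

The following pure-Python sketch builds that predicate over plain dicts instead of Q objects. `build_after_predicate` and `parse_ordering` are hypothetical names. Treating a leading "-" as descending mirrors Django's order_by() string syntax. For `before` pagination, the comparisons would flip.

```python
from __future__ import annotations

import operator
from typing import Any, Callable, Sequence


def parse_ordering(order_by: Sequence[str]) -> list[tuple[str, bool]]:
    # Like Django's order_by() strings: a leading "-" marks a descending field
    return [(f[1:], True) if f.startswith("-") else (f, False) for f in order_by]


def build_after_predicate(order_by: Sequence[str], cursor: Sequence[Any]) -> Callable[[dict], bool]:
    """Return a test for "row comes strictly after the cursor" in this ordering.

    For ["due_date", "pk"] it is equivalent to
    Q(due_date__gt=d) | (Q(due_date=d) & Q(pk__gt=p)).
    """
    fields = parse_ordering(order_by)

    def predicate(row: dict) -> bool:
        # Start at the last field with "nothing left to compare" (False), then nest
        # outwards: past this field, or tied on it and past the remaining fields
        result = False
        for (name, descending), value in zip(reversed(fields), reversed(cursor)):
            past = operator.lt if descending else operator.gt
            result = past(row[name], value) or (row[name] == value and result)
        return result

    return predicate


rows = [
    {"pk": 1, "due_date": "2025-03-01"},
    {"pk": 3, "due_date": "2025-03-01"},
    {"pk": 4, "due_date": "2025-03-01"},
    {"pk": 2, "due_date": "2025-03-02"},
]
after = build_after_predicate(["due_date", "pk"], ["2025-03-01", 3])
print([row["pk"] for row in rows if after(row)])  # [4, 2]
```

Building the predicate from the last field outwards keeps the structure identical to the nested Q expression. Each step wraps the previous result in `past | (equal & previous)`.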

Serializing the field values to strings, and deserializing them again, is delegated to the model field implementation to ensure maximal compatibility with different field types.
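The PR does not spell out the cursor's wire format here, so the following is only an assumed example. The stringified ordering values are packed into a JSON array and base64-encoded, so clients treat the cursor as opaque.

`encode_cursor` and `decode_cursor` are hypothetical. `date.isoformat()` and `date.fromisoformat()` stand in for the model field's own conversions. In Django, that role could be played by methods such as `Field.value_to_string()` and `Field.to_python()`, but whether the PR uses exactly these is not stated.

```python
import base64
import json
from datetime import date


def encode_cursor(parts: list[str]) -> str:
    # Assumed format: JSON array of the stringified ordering values,
    # wrapped in URL-safe base64 so clients treat the cursor as opaque
    raw = json.dumps(parts, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> list[str]:
    # Reverse of encode_cursor; reject anything that is not a list of strings
    try:
        parts = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    except ValueError as exc:
        raise ValueError("malformed cursor") from exc
    if not isinstance(parts, list) or not all(isinstance(p, str) for p in parts):
        raise ValueError("malformed cursor")
    return parts


# date.isoformat()/fromisoformat() stand in for the model field's own
# value-to-string and string-to-value conversions
cursor = encode_cursor([date(2025, 3, 1).isoformat(), str(3)])
due_date_str, pk_str = decode_cursor(cursor)
print(date.fromisoformat(due_date_str), int(pk_str))  # 2025-03-01 3
```

Using a JSON array rather than joining the parts with a separator keeps them separable even when a value itself contains characters such as ":" or ",".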

Other

  • During the implementation, I discovered a bug in get_queryset_config. It is used in the code as if it performed a "get or create" operation, setting the config on the QuerySet if not already present. However, it never actually stored the config on the QuerySet.
    This is likely why the hack in is_optimized_by_prefetching was implemented.

  • While writing tests, I noticed that strawberry_django.field(disable_optimization=True) had no effect when used on a top-level field. I have fixed this.

  • Currently, the code lives in a separate relay_cursor.py file. I have chosen to do this for now to make the diff of this PR easier to parse. However I think the relay code should be refactored so that relay.py is removed and we have relay/__init__.py instead. Then the code can be split up into multiple files and still offer the same imports. What do you think?

I have fixed get_queryset_config, added specific tests for it and removed the (now unnecessary) hack in is_optimized_by_prefetching.
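The get-or-create behaviour described above can be sketched in plain Python. This is not the actual strawberry_django code. The config fields follow the class diagram in this PR, but `_CONFIG_ATTR`, `FakeQuerySet` and `get_queryset_config_buggy` are hypothetical stand-ins.

The buggy variant returns a fresh default that is never stored, so any flag set on it is lost. The fixed variant stores the default on first access.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical attribute name; the real one in strawberry_django may differ
_CONFIG_ATTR = "_example_strawberry_config"


@dataclass
class QuerySetConfig:
    # Fields as listed in the class diagram for StrawberryDjangoQuerySetConfig
    optimized: bool = False
    optimized_by_prefetching: bool = False
    type_get_queryset_did_run: bool = False
    ordering_descriptors: Optional[list] = None


def get_queryset_config_buggy(queryset: object) -> QuerySetConfig:
    # The described bug: a missing config yields a fresh default that is never stored
    return getattr(queryset, _CONFIG_ATTR, None) or QuerySetConfig()


def get_queryset_config(queryset: object) -> QuerySetConfig:
    # Get-or-create: attach the default on first access so later changes persist
    config = getattr(queryset, _CONFIG_ATTR, None)
    if config is None:
        config = QuerySetConfig()
        setattr(queryset, _CONFIG_ATTR, config)
    return config


class FakeQuerySet:
    """Stand-in for a Django QuerySet; any object accepting attributes works."""


qs = FakeQuerySet()
get_queryset_config_buggy(qs).optimized_by_prefetching = True
print(get_queryset_config_buggy(qs).optimized_by_prefetching)  # False: the flag was lost

get_queryset_config(qs).optimized_by_prefetching = True
print(get_queryset_config(qs).optimized_by_prefetching)  # True
```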

Types of Changes

  • Core
  • Bugfix
  • New feature
  • Enhancement/optimization
  • Documentation

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • I have tested the changes and verified that they work and don't break anything (as well as I can manage).

@sourcery-ai bot (Contributor) commented Apr 9, 2025

Reviewer's Guide by Sourcery

This pull request introduces DjangoCursorConnection for efficient cursor-based pagination and fixes a bug in get_queryset_config, removing a related hack. It enhances performance for large datasets and improves the reliability of queryset optimization.

Updated class diagram for StrawberryDjangoQuerySetConfig

```mermaid
classDiagram
  class StrawberryDjangoQuerySetConfig {
    -optimized: bool
    -optimized_by_prefetching: bool
    -type_get_queryset_did_run: bool
    -ordering_descriptors: list[OrderingDescriptor] | None
  }
```

File-Level Changes

Change: Introduces DjangoCursorConnection to enable cursor-based pagination, enhancing performance for large datasets by using range queries instead of offset-based slicing.
  • Adds a new connection class DjangoCursorConnection.
  • Implements cursor-based pagination using range queries.
  • Inspects QuerySet ordering parameters to construct cursors.
  • Serializes and deserializes field values using model field implementations.
  • Adds DjangoCursorConnection to strawberry_django.relay_cursor.
  • Adds tests for cursor pagination in tests/relay/test_cursor_pagination.py.
  • Adds a milestone_cursor_conn field to the Query type in tests/projects/schema.py.
  Files: strawberry_django/optimizer.py, docs/guide/relay.md, tests/projects/schema.py, tests/relay/test_cursor_pagination.py, strawberry_django/relay_cursor.py

Change: Fixes a bug in get_queryset_config to ensure it correctly sets the config on the QuerySet, and removes a related hack in is_optimized_by_prefetching.
  • Corrects get_queryset_config to properly set the config on the QuerySet.
  • Removes an unnecessary hack in is_optimized_by_prefetching.
  • Adds specific tests for get_queryset_config.
  • Adds optimized_by_prefetching to StrawberryDjangoQuerySetConfig.
  • Modifies is_optimized_by_prefetching to use get_queryset_config.
  • Updates mark_optimized_by_prefetching to set optimized_by_prefetching in get_queryset_config.
  Files: strawberry_django/optimizer.py, strawberry_django/queryset.py, tests/test_queryset_config.py


@sourcery-ai bot (Contributor) left a comment


Hey @diesieben07 - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a section to the documentation that compares and contrasts ListConnection and DjangoCursorConnection, highlighting when each should be used.
  • It might be worth adding a note about the implications of strictly ordered QuerySets in the documentation.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟡 Testing: 1 issue found
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


## Cursor based connections

As an alternative to the default `ListConnection`, `DjangoCursorConnection` is also available.
It supports pagination through a Django `QuerySet` via "true" cursors.

suggestion: Clarify the meaning of "true cursors". Suggest using "offset-based cursors" vs "range-based cursors" to distinguish the approaches.

The term "true cursors" might be confusing to users. Using more descriptive terms like "offset-based cursors" and "range-based cursors" would improve clarity.

Suggested implementation:

It supports pagination through a Django `QuerySet` via range-based cursors.

`ListConnection` uses offset-based cursors (slicing) to achieve pagination, which can negatively affect performance for huge datasets,

```python
    return [getattr(obj, descriptor.attname) for descriptor in descriptors]


def build_tuple_compare(
```

issue (complexity): Consider refactoring the complex, nested logic in build_tuple_compare and apply_cursor_pagination into smaller, well-named helper functions to reduce nesting and improve readability and maintainability.

Consider splitting the complex, nested logic into smaller helper functions. For example, the loop in build_tuple_compare mixes comparator and equality handling with nested conditionals. You can extract the "compare" logic into a helper:

```python
from typing import Any

from django.db.models import Q, Value


def _compare_field(
    descriptor: OrderingDescriptor,
    field_value: Any,
    before: bool,
    rest: Q,
) -> Q:
    value_expr = Value(field_value, output_field=descriptor.order_by.expression.output_field)
    comparator = descriptor.get_comparator(value_expr, before)
    eq = descriptor.get_eq(value_expr)
    if comparator is None:
        return eq & rest
    # Strictly past the cursor on this field, or tied on it and past it on the rest
    return comparator | (eq & rest)


def build_tuple_compare(
    descriptors: list[OrderingDescriptor],
    cursor_values: list,
    before: bool,
) -> Q:
    # Fold from the last ordering field to the first so that each comparison nests
    # inside the previous one (a lexicographic comparison, not a flat AND chain).
    # Start from an always-false Q so rows tied on every field are excluded.
    current = Q(pk__in=[])
    for descriptor, value in zip(reversed(descriptors), reversed(cursor_values)):
        current = _compare_field(descriptor, value, before, current)
    return current
```

Similarly, consider isolating parts of the slicing logic in apply_cursor_pagination into a small helper. For example:

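```python
from typing import Optional

from django.db.models import QuerySet

from strawberry_django.pagination import apply_window_pagination


def _apply_slice(qs: QuerySet, slice_: slice, related_field_id: Optional[str]) -> QuerySet:
    if related_field_id:
        # Nested connection: paginate within each related parent
        offset = slice_.start or 0
        return apply_window_pagination(
            qs,
            related_field_id=related_field_id,
            offset=offset,
            limit=slice_.stop - offset,
        )
    return qs[slice_]


# Then in apply_cursor_pagination replace the inline slicing with:
if slice_ is not None:
    qs = _apply_slice(qs, slice_, related_field_id)
```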

This approach keeps all functionality intact while reducing nested code blocks and improving readability.

@diesieben07 (Contributor, Author)

I don't know why the Typing test fails. pyright reports no problems for me locally and the reported file here is a file that I haven't touched.

@codecov-commenter commented Apr 9, 2025

Codecov Report

Attention: Patch coverage is 99.04459% with 3 lines in your changes missing coverage. Please review.

Project coverage is 89.16%. Comparing base (27a8066) to head (f011c1b).

| File with missing lines | Patch % | Lines |
|---|---|---|
| strawberry_django/relay_impl/list_connection.py | 96.61% | 2 missing ⚠️ |
| strawberry_django/optimizer.py | 94.73% | 1 missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #730      +/-   ##
==========================================
+ Coverage   88.45%   89.16%   +0.70%
==========================================
  Files          42       45       +3
  Lines        4002     4253     +251
==========================================
+ Hits         3540     3792     +252
+ Misses        462      461       -1
```


@diesieben07 (Contributor, Author) commented Apr 9, 2025

It's currently broken for nulls in ordering. I'm working on fixing that. (Update: fixed.)
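The comment does not describe the fix itself. As background, the following SQLite sketch shows why NULLs break a naive range filter. In SQL, comparing NULL with `>` or `=` yields NULL rather than true, so NULL rows never satisfy the filter.

The sketch assumes NULLs sort last and handles them with explicit `IS NULL` branches. The table and helper names are hypothetical, and this is not necessarily how DjangoCursorConnection handles it.

```python
from __future__ import annotations

import sqlite3
from typing import Callable

# Hypothetical table: due_date is nullable, id is the tie-breaker
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, due_date TEXT)")
conn.executemany(
    "INSERT INTO task (id, due_date) VALUES (?, ?)",
    [(1, "2025-03-02"), (2, None), (3, "2025-03-01"), (4, None), (5, "2025-03-02")],
)

# Non-NULL dates first, NULLs last, ties broken by id
ORDER = " ORDER BY due_date IS NULL, due_date, id LIMIT ?"


def first_page(limit: int) -> list[tuple[int, str | None]]:
    return conn.execute("SELECT id, due_date FROM task" + ORDER, (limit,)).fetchall()


def naive_after(cursor: tuple[str | None, int], limit: int) -> list[tuple[int, str | None]]:
    # due_date > ? evaluates to NULL (not true) for NULL rows, so they are dropped
    due_date, pk = cursor
    return conn.execute(
        "SELECT id, due_date FROM task WHERE due_date > ? OR (due_date = ? AND id > ?)" + ORDER,
        (due_date, due_date, pk, limit),
    ).fetchall()


def null_aware_after(cursor: tuple[str | None, int], limit: int) -> list[tuple[int, str | None]]:
    due_date, pk = cursor
    if due_date is None:
        # Cursor is inside the trailing NULL block: only later NULL rows remain
        where, params = "due_date IS NULL AND id > ?", (pk,)
    else:
        # NULLs sort last, so every NULL row comes after a non-NULL cursor
        where = "due_date > ? OR (due_date = ? AND id > ?) OR due_date IS NULL"
        params = (due_date, due_date, pk)
    return conn.execute(
        "SELECT id, due_date FROM task WHERE " + where + ORDER, (*params, limit)
    ).fetchall()


def walk(after: Callable, limit: int = 2) -> list[int]:
    # Follow cursors page by page and collect the ids in the order seen
    ids, rows = [], first_page(limit)
    while rows:
        ids.extend(row[0] for row in rows)
        last_id, last_due_date = rows[-1]
        rows = after((last_due_date, last_id), limit)
    return ids


print(walk(naive_after))       # [3, 1, 5]: the NULL rows are never reached
print(walk(null_aware_after))  # [3, 1, 5, 2, 4]
```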

@bellini666 (Member) left a comment


Good job here, really appreciate your contributions ❤️

Left a couple of comments/suggestions

@diesieben07 (Contributor, Author)

See #736 (comment) for the typing issue.

@bellini666 As I said on Discord, I have updated this for the changes in Strawberry Core. Feel free to review it again, or merge it if you think it's good to go!

@bellini666 (Member) left a comment


I have one last ask, and then it is an automerge.

Can we have all of this under strawberry_django.relay?

We can do that by creating a relay directory, rename the current relay.py to utils.py, move the ListConnectionWithTotalCount in it to a connections.py module, and put everything you have on relay_cursor inside it. Then we export the important stuff on relay/__init__.py. Wdyt?

We would end up with something like:

```
strawberry_django/
  relay/
    __init__.py
    connections.py
    utils.py
```

@diesieben07 (Contributor, Author)

> Can we have all of this under strawberry_django.relay?

Yes, 100%! That was my plan all along, I did it this way for now to make the diff easier to parse and limit the changes to what I am actually changing in this PR. I'll refactor it as you suggested.

@diesieben07 (Contributor, Author)

@bellini666 I've applied your suggestion, DjangoCursorConnection can now be imported from strawberry_django.relay.
I've kept cursor_connection.py and list_connection.py as separate files.
I had to move the actual relay implementation to relay_impl and introduce relay as an "interface package" to resolve complex import cycles that arose from just having the implementation in relay. See this video (with timestamp) for info: https://www.youtube.com/watch?v=UnKa_t-M_kM&t=329s

@bellini666 (Member) left a comment


Ty very much! ❤️

@bellini666 bellini666 merged commit b243d3f into strawberry-graphql:main Apr 30, 2025
36 checks passed