Add schema extension for estimating query complexity #3721

Open · wants to merge 2 commits into main from serramatutu:serramatutu/query-complexity

Conversation

Contributor

@serramatutu serramatutu commented Dec 11, 2024

Summary

This commit introduces a new QueryComplexityEstimator class. It's a SchemaExtension which traverses the query tree at validation time and estimates the complexity of executing each node of the tree. Its intended use is primarily token-bucket rate-limiting based on query complexity, but the API is flexible enough to allow users to do whatever they want with the results.

Types of Changes

  • Core
  • Bugfix
  • New feature
  • Enhancement/optimization
  • Documentation

Issues Fixed or Closed by This PR

Public API

I introduced the following types that users can interact with:

QueryComplexityEstimator

This is the SchemaExtension that should get added to the global Strawberry Schema. It has 3 possible arguments:

  • default_estimator: FieldComplexityEstimator | int: The default FieldComplexityEstimator used for fields that don't specify an override. This provides a nice default so users don't need to add complexity estimation logic for every single field; just throwing in a good default should solve 90% of your problems. If this is an int, we'll automatically wrap it in a ConstantFieldComplexityEstimator.
  • callback: Callable[[Dict[str, int], ExecutionContext], None] | None: In this callback, users can implement their own logic that uses the complexity values, such as raising exceptions for rate-limiting or logging the complexity to Datadog or wherever.
  • response_key: str | None: If provided, the extension returns the complexity map to the client under result.extensions[response_key] (see the sketch after this list). If not provided, complexities are not returned to clients.
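
As a minimal sketch of how these arguments fit together (assuming a Query type is defined elsewhere; the response key name here is just an example):

import strawberry
from strawberry.extensions import QueryComplexityEstimator

schema = strawberry.Schema(
    Query,
    extensions=[
        QueryComplexityEstimator(
            # int shorthand: automatically wrapped in a ConstantFieldComplexityEstimator
            default_estimator=1,
            # the complexity map is returned under result.extensions["complexity"]
            response_key="complexity",
        ),
    ],
)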

FieldComplexityEstimator

This is the base class that estimates the complexity of running a single field. Users can inherit from this class to implement their own complexity calculation logic. It has an estimate_complexity() method which accepts the following args:

  • child_complexities: Iterator[int]: an iterator over the complexities of child nodes (i.e. nodes in the selection set). If this is empty, the node currently being evaluated is a leaf. It's worth noting that this iterator is lazy: if an estimator doesn't consume the child complexities, they won't get calculated (see the sketch after this list). This should help a bit with performance and allows pruning branches of the tree when the complexity can be resolved at the parent.
  • arguments: FieldArgumentsType: This is a dictionary of all the arguments (parameters) for a field. This can be used to calculate the complexity based on a user-defined parameter like page_size, document_length or whatever.
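
To make the laziness concrete, here is a minimal sketch (the class name is made up) of an estimator that never consumes child_complexities, so the estimators for its entire subtree are never invoked:

from typing import Any, Iterator

from strawberry.extensions import FieldComplexityEstimator


class FlatRateEstimator(FieldComplexityEstimator):
    """Charges a flat rate for a field without ever looking at its children."""

    def estimate_complexity(
        self, child_complexities: Iterator[int], arguments: dict[str, Any]
    ) -> int:
        # child_complexities is never iterated, so no child estimator runs
        return 10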

The following two types are concrete implementations of FieldComplexityEstimator that users can use out of the box without having to implement their own. I added these since I expect these patterns to be quite common.

ConstantFieldComplexityEstimator

An implementation of FieldComplexityEstimator that takes a single complexity: int parameter and always returns it, no matter the cost of its children or the parameter values.

SimpleFieldComplexityEstimator

An implementation of FieldComplexityEstimator that takes a single scalar_complexity: int and returns it for scalar nodes. For object nodes, it returns the sum of their children's complexities.

Admittedly, this name kinda sucks. I couldn't think of anything better though, so I'm happy to change if anyone has a better idea.
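
For illustration, here is a sketch of attaching these as per-field overrides. It assumes they are wired up like regular Strawberry field extensions (FieldComplexityEstimator subclasses FieldExtension) and are importable from strawberry.extensions; the types and field names are made up:

import strawberry
from strawberry.extensions import (
    ConstantFieldComplexityEstimator,
    SimpleFieldComplexityEstimator,
)


@strawberry.type
class Book:
    title: str
    author: str


@strawberry.type
class Query:
    # always costs 1, regardless of arguments or children
    @strawberry.field(extensions=[ConstantFieldComplexityEstimator(complexity=1)])
    def version(self) -> str:
        return "1.0.0"

    # per the description above: scalar nodes estimated by this estimator cost 5,
    # object nodes cost the sum of their children
    @strawberry.field(
        extensions=[SimpleFieldComplexityEstimator(scalar_complexity=5)]
    )
    def books(self) -> list[Book]:
        return []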

Example

Here's an example that combines all of that to make it easier to digest:

from typing import Any, Iterator

from graphql.error import GraphQLError

import strawberry
from strawberry.types import ExecutionContext
from strawberry.extensions import FieldComplexityEstimator, QueryComplexityEstimator


class MyEstimator(FieldComplexityEstimator):
    def estimate_complexity(
        self, child_complexities: Iterator[int], arguments: dict[str, Any]
    ) -> int:
        children_sum = sum(child_complexities)
        # scalar fields cost 1
        if children_sum == 0:
            return 1

        # non-list object fields cost the sum of their children
        if "page_size" not in arguments:
            return children_sum

        # paginated fields cost gets multiplied by page size
        return children_sum * arguments["page_size"]


# initialize your rate-limiter somehow
rate_limiter = ...


def my_callback(
    complexities: dict[str, int], execution_context: ExecutionContext
) -> None:
    # add complexities to execution context
    execution_context.context["complexities"] = complexities

    # apply a token-bucket rate-limiter by user_id
    total_cost = sum(complexities.values())
    bucket = rate_limiter.get_bucket_for_key(execution_context.context["user_id"])
    tokens_left = bucket.take_tokens(total_cost)
    if tokens_left <= 0:
        raise GraphQLError(
            "Rate-limit exhausted. Please wait for some time before trying again."
        )


schema = strawberry.Schema(
    Query,
    extensions=[
        QueryComplexityEstimator(
            default_estimator=MyEstimator(),
            callback=my_callback,
        ),
    ],
)

Caveats, possible issues and other solutions I considered

There are some caveats to this implementation, and I wanted to bring them to your attention. I'm happy to discuss and/or change the implementation if any of these are dealbreakers :)

Traversing the tree twice

QueryComplexityEstimator runs in the on_validate hook and traverses the GraphQL document AST using its own logic. This means that if it is used alongside another extension that does the same, such as QueryDepthLimiter, a significant chunk of the tree may get traversed twice, since each extension walks the tree on its own. This could introduce performance issues, especially for large queries.

The idea I had to optimize this was to implement a generic Visitor class that gets called by the Strawberry core only once, with all the "visiting" extensions registering themselves to this visitor (rough sketch below). While it's probably faster since it traverses the tree only once, this approach isn't as flexible, since extensions wouldn't get to pick the visiting order, and the state management in each extension probably gets a little more annoying. It would also require some significant changes to the Strawberry core, which I thought was out of scope for this PR.
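
Purely for illustration, a rough sketch of that rejected shared-visitor idea could look like the following. None of these names exist in Strawberry, and this is not part of this PR:

from typing import Callable, List

from graphql.language import Node


class SharedDocumentVisitor:
    """Walks the document AST once and fans each node out to subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Node], None]] = []

    def subscribe(self, callback: Callable[[Node], None]) -> None:
        # extensions register a per-node callback instead of walking the tree themselves
        self._subscribers.append(callback)

    def visit(self, node: Node) -> None:
        for callback in self._subscribers:
            callback(node)
        selection_set = getattr(node, "selection_set", None)
        if selection_set is not None:
            for child in selection_set.selections:
                self.visit(child)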

The estimator is not eager

Unlike MaxTokensLimiter and QueryDepthLimiter, the QueryComplexityEstimator is not eager. In other words, there is no predefined "limit" that will stop tree traversal and simply return "you're over the limit". Instead, it always calculates the complexity of the entire query, and it is up to the developer to use the callback and do whatever they want with the results.

The main reasoning behind this is that, by the nature of rate-limiting, the limit fluctuates and needs to be assessed dynamically, so we can't define a single global threshold. I thought of solving this with a pre-hook that would return the current limit given the ExecutionContext, but IMO that would make the API more convoluted and annoying than it needs to be, so in the end I discarded that idea and just went with callback.

This decision has one important consideration: users should be strongly encouraged to use MaxTokensLimiter and QueryDepthLimiter to bound the size of GraphQL documents before they reach QueryComplexityEstimator; otherwise, huge queries could cause unbounded recursion and become an attack vector (see the sketch below).
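
As a minimal sketch of that recommendation (reusing Query and my_callback from the example above; the limit values are arbitrary):

import strawberry
from strawberry.extensions import (
    MaxTokensLimiter,
    QueryComplexityEstimator,
    QueryDepthLimiter,
)

schema = strawberry.Schema(
    Query,
    extensions=[
        # bound the raw document size and nesting depth first...
        MaxTokensLimiter(max_token_count=1000),
        QueryDepthLimiter(max_depth=10),
        # ...then estimate complexity on the already-bounded document
        QueryComplexityEstimator(
            default_estimator=1,
            callback=my_callback,
        ),
    ],
)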

Would love to hear your thoughts on this, though!

Does not calculate "actual" cost

In the original issue, some people suggested this API should be able to calculate the "actual" cost of a query after resolving it. I decided not to do that since it would add another layer of complexity and overhead, and calculating an "actual" cost can be tricky and produce unpredictable results for clients.

For example, if your database had a failover that made a specific request take longer because the connection was cold, a naive implementation of "actual" query cost might blame the user's query for that and assume it took longer because the query was complex, resulting in them being rate-limited earlier due to a system fault. This would make rate-limiting non-deterministic and could add unpredictability to your APIs.

I'm not saying it isn't possible to calculate "actual" query cost deterministically, but it's a much harder effort than estimating query costs based solely on requested fields and on their input parameters. So I thought the juice was not worth the squeeze.

This is why I named the classes "estimator" and not "calculator".

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • I have tested the changes and verified that they work and don't break anything (as well as I can manage).

Thank you for creating and maintaining Strawberry :D

Summary by Sourcery

Add a new QueryComplexityEstimator class to estimate query complexity for rate-limiting and other purposes, along with supporting classes and documentation.

New Features:

  • Introduce a new QueryComplexityEstimator class as a SchemaExtension to estimate the complexity of executing each node in a query tree, primarily for rate-limiting purposes.
  • Add FieldComplexityEstimator as a base class for estimating the complexity of running a single field, allowing users to implement custom complexity calculation logic.
  • Provide ConstantFieldComplexityEstimator and SimpleFieldComplexityEstimator as concrete implementations of FieldComplexityEstimator for common use cases.

Documentation:

  • Add documentation for the new QueryComplexityEstimator, including usage examples and API reference.

Tests:

  • Introduce tests for the QueryComplexityEstimator to ensure it calculates complexities correctly and integrates well with other components.

Contributor

sourcery-ai bot commented Dec 11, 2024

Reviewer's Guide by Sourcery

This PR introduces a new schema extension for estimating GraphQL query complexity. The extension traverses the query tree during validation and calculates complexity scores for each operation, which can be used for rate limiting or other purposes. The implementation includes a base estimator class and two concrete implementations for common use cases.

Class diagram for QueryComplexityEstimator and related classes

classDiagram
    class SchemaExtension
    class FieldExtension
    class FieldComplexityEstimator {
        +estimate_complexity(child_complexities: Iterator[int], arguments: FieldArgumentsType) int
        +resolve(next_: Callable[..., Any], source: Any, info: Info, **kwargs: Any) Any
    }
    FieldComplexityEstimator --|> FieldExtension
    class SimpleFieldComplexityEstimator {
        +scalar_complexity: int
        +estimate_complexity(child_complexities: Iterator[int], arguments: FieldArgumentsType) int
    }
    SimpleFieldComplexityEstimator --|> FieldComplexityEstimator
    class ConstantFieldComplexityEstimator {
        +complexity: int
        +estimate_complexity(child_complexities: Iterator[int], arguments: FieldArgumentsType) int
    }
    ConstantFieldComplexityEstimator --|> FieldComplexityEstimator
    class QueryComplexityEstimator {
        +default_estimator: FieldComplexityEstimator
        +callback: Optional[Callable[[Dict[str, int], ExecutionContext], None]]
        +response_key: Optional[str]
        +get_results() Dict[str, Any]
        +on_validate() Iterator[None]
    }
    QueryComplexityEstimator --|> SchemaExtension
    class ExecutionContext
    class Info
    class FieldArgumentsType

File-Level Changes

Added new QueryComplexityEstimator schema extension (strawberry/extensions/query_complexity_estimator.py)
  • Implements query tree traversal to calculate complexity scores
  • Supports custom complexity estimation logic via FieldComplexityEstimator base class
  • Provides callback mechanism for handling complexity results
  • Allows optional inclusion of complexity scores in GraphQL response

Added two concrete FieldComplexityEstimator implementations (strawberry/extensions/query_complexity_estimator.py)
  • ConstantFieldComplexityEstimator returns a fixed complexity value
  • SimpleFieldComplexityEstimator handles scalar and object fields differently

Added comprehensive test suite for query complexity estimation (tests/schema/extensions/test_query_complexity_estimator.py)
  • Tests for basic complexity estimation scenarios
  • Tests for fragment handling
  • Tests for variable handling
  • Tests for different estimator implementations

Added documentation for the new extension (docs/extensions/query-complexity-estimator.md)
  • Detailed usage examples with rate limiting
  • API reference for QueryComplexityEstimator and FieldComplexityEstimator
  • Explanation of complexity estimation concepts


Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey @serramatutu - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 4 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


Member

botberry commented Dec 11, 2024

Hi, thanks for contributing to Strawberry 🍓!

We noticed that this PR is missing a RELEASE.md file. We use that to automatically do releases here on GitHub and, most importantly, to PyPI!

So as soon as this PR is merged, a release will be made 🚀.

Here's an example of RELEASE.md:

Release type: patch

Description of the changes, ideally with some examples, if adding a new feature.

Release type can be one of patch, minor or major. We use semver, so make sure to pick the appropriate type. If in doubt feel free to ask :)

Here's the tweet text:

🆕 Release (next) is out! Thanks to Lucas Valente for the PR 👏

Get it here 👉 https://strawberry.rocks/release/(next)

@serramatutu force-pushed the serramatutu/query-complexity branch from 81ca2fb to 2e7a806 on December 11, 2024, 13:21
@serramatutu
Contributor Author

I'll wait for review on the main functionality before adding a proper RELEASE.md with all the examples etc.


codspeed-hq bot commented Dec 11, 2024

CodSpeed Performance Report

Merging #3721 will not alter performance

Comparing serramatutu:serramatutu/query-complexity (2e7a806) with main (6553c9e)

Summary

✅ 15 untouched benchmarks


codecov bot commented Dec 11, 2024

Codecov Report

Attention: Patch coverage is 98.21429% with 4 lines in your changes missing coverage. Please review.

Project coverage is 97.01%. Comparing base (6553c9e) to head (2e7a806).

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #3721    +/-   ##
========================================
  Coverage   97.00%   97.01%            
========================================
  Files         501      503     +2     
  Lines       33490    33714   +224     
  Branches     5592     5621    +29     
========================================
+ Hits        32487    32706   +219     
- Misses        791      795     +4     
- Partials      212      213     +1     

variables = self.execution_context.variables or {}
node_body = node
args = {
    to_snake_case(arg.name.value): variables.get(arg.value.name.value, None)
Contributor Author

@serramatutu serramatutu Dec 11, 2024


Is there an inverse of self.execution_context.schema.config.name_converter.apply_naming_config that will turn camelCase into snake_case without hardcoding it here? What we need is to get the python_name of a field from its graphql_name.


from strawberry.extensions.base_extension import SchemaExtension
from strawberry.extensions.field_extension import FieldExtension
from strawberry.extensions.query_depth_limiter import (
Contributor Author

@serramatutu serramatutu Dec 11, 2024


NOTE: I reused some of the pre-existing machinery from QueryDepthLimiter. I didn't move it out of there to some sort of shared utils file since that code seems to have a special license.

@jacobmoshipco
Contributor

I'm noticing estimate_complexity doesn't seem to take information about what field is having its complexity estimated. Would the extension be able to take a StrawberryField instance like strawberry.Info contains, or is it too early in the GraphQL parsing flow to be able to instantiate that? Apologies if I'm just missing something.

@serramatutu
Contributor Author

@jacobmoshipco Yea I think that's a good point. I can't really see any practical use cases right now that would require the field itself, but just having it there would make the API a lot more flexible in case users wanna do introspection etc

I will add that :)
