Add schema extension for estimating query complexity #3721

Open · wants to merge 2 commits into main from serramatutu:serramatutu/query-complexity

Conversation

Contributor

@serramatutu serramatutu commented Dec 11, 2024

Summary

This commit introduces a new QueryComplexityEstimator class. It's a SchemaExtension which traverses the query tree at validation time and estimates the complexity of executing each node of the tree. Its intended use is primarily token-bucket rate-limiting based on query complexity, but the API is flexible enough to allow users to do whatever they want with the results.

Types of Changes

  • Core
  • Bugfix
  • New feature
  • Enhancement/optimization
  • Documentation

Issues Fixed or Closed by This PR

Public API

I introduced the following types that users can interact with:

QueryComplexityEstimator

This is the SchemaExtension that should get added to the global Strawberry Schema. It has 3 possible arguments:

  • default_estimator: FieldComplexityEstimator | int: The default FieldComplexityEstimator used for fields that don't specify an override. This provides a nice default so users don't need to add complexity estimation logic for every single field; just throwing in a good default should solve 90% of your problems. If this is an int, we'll automatically wrap it in a ConstantFieldComplexityEstimator.
  • callback: Callable[[Dict[str, int], ExecutionContext], None] | None: In this callback, users can implement their own logic that uses the complexity values, such as raising exceptions for rate-limiting or logging the complexity to Datadog or wherever.
  • response_key: str | None: If provided, the extension returns the complexity map to the client under result.extensions[response_key] (see the sketch after this list). If not provided, complexities are not returned to clients.
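
As a minimal sketch of how these arguments fit together (assuming a Query type is defined elsewhere; the response key name here is just an example):

import strawberry
from strawberry.extensions import QueryComplexityEstimator

schema = strawberry.Schema(
    Query,
    extensions=[
        QueryComplexityEstimator(
            # int shorthand: automatically wrapped in a ConstantFieldComplexityEstimator
            default_estimator=1,
            # the complexity map is returned under result.extensions["complexity"]
            response_key="complexity",
        ),
    ],
)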

FieldComplexityEstimator

This is the base class that estimates the complexity of running a single field. Users can inherit from this class to implement their own complexity calculation logic. It has an estimate_complexity() method which accepts the following args:

  • child_complexities: Iterator[int]: an iterator over the complexities of child nodes (i.e. nodes in the selection set). If this is empty, the node currently being evaluated is a leaf. It's worth noting that this iterator is lazy: if an estimator doesn't consume the child complexities, they won't get calculated (see the sketch after this list). This should help a bit with performance and allows pruning branches of the tree when the complexity can be resolved at the parent.
  • arguments: FieldArgumentsType: This is a dictionary of all the arguments (parameters) for a field. This can be used to calculate the complexity based on a user-defined parameter like page_size, document_length or whatever.
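
To make the laziness concrete, here is a minimal sketch (the class name is made up) of an estimator that never consumes child_complexities, so the estimators for its entire subtree are never invoked:

from typing import Any, Iterator

from strawberry.extensions import FieldComplexityEstimator


class FlatRateEstimator(FieldComplexityEstimator):
    """Charges a flat rate for a field without ever looking at its children."""

    def estimate_complexity(
        self, child_complexities: Iterator[int], arguments: dict[str, Any]
    ) -> int:
        # child_complexities is never iterated, so no child estimator runs
        return 10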

The following two types are concrete implementations of FieldComplexityEstimator that users can use out of the box without having to implement their own. I added these since I expect these patterns to be quite common.

ConstantFieldComplexityEstimator

An implementation of FieldComplexityEstimator that takes a single complexity: int parameter and always returns it, no matter the cost of its children or the parameter values.

SimpleFieldComplexityEstimator

An implementation of FieldComplexityEstimator that takes a single scalar_complexity: int and returns it for scalar nodes. For object nodes, it returns the sum of their children's complexities.

Admittedly, this name kinda sucks. I couldn't think of anything better though, so I'm happy to change if anyone has a better idea.
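
For illustration, here is a sketch of attaching these as per-field overrides. It assumes they are wired up like regular Strawberry field extensions (FieldComplexityEstimator subclasses FieldExtension) and are importable from strawberry.extensions; the types and field names are made up:

import strawberry
from strawberry.extensions import (
    ConstantFieldComplexityEstimator,
    SimpleFieldComplexityEstimator,
)


@strawberry.type
class Book:
    title: str
    author: str


@strawberry.type
class Query:
    # always costs 1, regardless of arguments or children
    @strawberry.field(extensions=[ConstantFieldComplexityEstimator(complexity=1)])
    def version(self) -> str:
        return "1.0.0"

    # per the description above: scalar nodes estimated by this estimator cost 5,
    # object nodes cost the sum of their children
    @strawberry.field(
        extensions=[SimpleFieldComplexityEstimator(scalar_complexity=5)]
    )
    def books(self) -> list[Book]:
        return []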

Example

Here's an example that combines all of that to make it easier to digest:

from typing import Any, Iterator

from graphql.error import GraphQLError

import strawberry
from strawberry.types import ExecutionContext
from strawberry.extensions import FieldComplexityEstimator, QueryComplexityEstimator


class MyEstimator(FieldComplexityEstimator):
    def estimate_complexity(
        self, child_complexities: Iterator[int], arguments: dict[str, Any]
    ) -> int:
        children_sum = sum(child_complexities)
        # scalar fields cost 1
        if children_sum == 0:
            return 1

        # non-list object fields cost the sum of their children
        if "page_size" not in arguments:
            return children_sum

        # paginated fields cost gets multiplied by page size
        return children_sum * arguments["page_size"]


# initialize your rate-limiter somehow
rate_limiter = ...


def my_callback(
    complexities: dict[str, int], execution_context: ExecutionContext
) -> None:
    # add complexities to execution context
    execution_context.context["complexities"] = complexities

    # apply a token-bucket rate-limiter by user_id
    total_cost = sum(complexities.values())
    bucket = rate_limiter.get_bucket_for_key(execution_context.context["user_id"])
    tokens_left = bucket.take_tokens(total_cost)
    if tokens_left <= 0:
        raise GraphQLError(
            "Rate-limit exhausted. Please wait for some time before trying again."
        )


schema = strawberry.Schema(
    Query,
    extensions=[
        QueryComplexityEstimator(
            default_estimator=MyEstimator(),
            callback=my_callback,
        ),
    ],
)

Caveats, possible issues and other solutions I considered

There are some caveats to this implementation, and I wanted to bring them to your attention. I'm happy to discuss and/or change the implementation if any of these are dealbreakers :)

Traversing the tree twice

QueryComplexityEstimator runs in the on_validate hook and traverses the GraphQL document AST using its own logic. This means that if it is used alongside another extension that does the same, such as QueryDepthLimiter, a significant chunk of the tree may get traversed twice, since each extension walks the tree on its own. This could introduce performance issues, especially for large queries.

The idea I had to optimize this was to implement a generic Visitor class that gets called by the Strawberry core only once, with all the "visiting" extensions registering themselves to this visitor (rough sketch below). While it's probably faster since it traverses the tree only once, this approach isn't as flexible, since extensions wouldn't get to pick the visiting order, and the state management in each extension probably gets a little more annoying. It would also require some significant changes to the Strawberry core, which I thought was out of scope for this PR.
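
Purely for illustration, a rough sketch of that rejected shared-visitor idea could look like the following. None of these names exist in Strawberry, and this is not part of this PR:

from typing import Callable, List

from graphql.language import Node


class SharedDocumentVisitor:
    """Walks the document AST once and fans each node out to subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Node], None]] = []

    def subscribe(self, callback: Callable[[Node], None]) -> None:
        # extensions register a per-node callback instead of walking the tree themselves
        self._subscribers.append(callback)

    def visit(self, node: Node) -> None:
        for callback in self._subscribers:
            callback(node)
        selection_set = getattr(node, "selection_set", None)
        if selection_set is not None:
            for child in selection_set.selections:
                self.visit(child)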

The estimator is not eager

Unlike MaxTokensLimiter and QueryDepthLimiter, the QueryComplexityEstimator is not eager. In other words, there is no predefined "limit" that will stop tree traversal and simply return "you're over the limit". Instead, it always calculates the complexity of the entire query, and it is up to the developer to use the callback and do whatever they want with the results.

The main reasoning behind this is that, by the nature of rate-limiting, the limit fluctuates and needs to be assessed dynamically, so we can't define a single global threshold. I thought of solving this with a pre-hook that would return the current limit given the ExecutionContext, but IMO that would make the API more convoluted and annoying than it needs to be, so in the end I discarded that idea and just went with callback.

This decision has one important consideration: users should be strongly encouraged to use MaxTokensLimiter and QueryDepthLimiter to bound the size of GraphQL documents before they reach QueryComplexityEstimator; otherwise, huge queries could cause unbounded recursion and become an attack vector (see the sketch below).
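
As a minimal sketch of that recommendation (reusing Query and my_callback from the example above; the limit values are arbitrary):

import strawberry
from strawberry.extensions import (
    MaxTokensLimiter,
    QueryComplexityEstimator,
    QueryDepthLimiter,
)

schema = strawberry.Schema(
    Query,
    extensions=[
        # bound the raw document size and nesting depth first...
        MaxTokensLimiter(max_token_count=1000),
        QueryDepthLimiter(max_depth=10),
        # ...then estimate complexity on the already-bounded document
        QueryComplexityEstimator(
            default_estimator=1,
            callback=my_callback,
        ),
    ],
)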

Would love to hear your thoughts on this, though!

Does not calculate "actual" cost

In the original issue, some people suggested this API should be able to calculate the "actual" cost of a query after resolving it. I decided not to do that since it would add another layer of complexity and overhead, and calculating an "actual" cost can be tricky and produce unpredictable results for clients.

For example, if your database had a failover that made a specific request take longer because the connection was cold, a naive implementation of "actual" query cost might blame the user's query for that and assume it took longer because the query was complex, resulting in them being rate-limited earlier due to a system fault. This would make rate-limiting non-deterministic and could add unpredictability to your APIs.

I'm not saying it isn't possible to calculate "actual" query cost deterministically, but it's a much harder effort than estimating query costs based solely on requested fields and on their input parameters. So I thought the juice was not worth the squeeze.

This is why I named the classes "estimator" and not "calculator".

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • I have tested the changes and verified that they work and don't break anything (as well as I can manage).

Thank you for creating and maintaining Strawberry :D

Summary by Sourcery

Add a new QueryComplexityEstimator class to estimate query complexity for rate-limiting and other purposes, along with supporting classes and documentation.

New Features:

  • Introduce a new QueryComplexityEstimator class as a SchemaExtension to estimate the complexity of executing each node in a query tree, primarily for rate-limiting purposes.
  • Add FieldComplexityEstimator as a base class for estimating the complexity of running a single field, allowing users to implement custom complexity calculation logic.
  • Provide ConstantFieldComplexityEstimator and SimpleFieldComplexityEstimator as concrete implementations of FieldComplexityEstimator for common use cases.

Documentation:

  • Add documentation for the new QueryComplexityEstimator, including usage examples and API reference.

Tests:

  • Introduce tests for the QueryComplexityEstimator to ensure it calculates complexities correctly and integrates well with other components.

Contributor

sourcery-ai bot commented Dec 11, 2024

Reviewer's Guide by Sourcery

This PR introduces a new schema extension for estimating GraphQL query complexity. The extension traverses the query tree during validation and calculates complexity scores for each operation, which can be used for rate limiting or other purposes. The implementation includes a base estimator class and two concrete implementations for common use cases.

Class diagram for QueryComplexityEstimator and related classes

classDiagram
    class SchemaExtension
    class FieldExtension
    class FieldComplexityEstimator {
        +estimate_complexity(child_complexities: Iterator[int], arguments: FieldArgumentsType) int
        +resolve(next_: Callable[..., Any], source: Any, info: Info, **kwargs: Any) Any
    }
    FieldComplexityEstimator --|> FieldExtension
    class SimpleFieldComplexityEstimator {
        +scalar_complexity: int
        +estimate_complexity(child_complexities: Iterator[int], arguments: FieldArgumentsType) int
    }
    SimpleFieldComplexityEstimator --|> FieldComplexityEstimator
    class ConstantFieldComplexityEstimator {
        +complexity: int
        +estimate_complexity(child_complexities: Iterator[int], arguments: FieldArgumentsType) int
    }
    ConstantFieldComplexityEstimator --|> FieldComplexityEstimator
    class QueryComplexityEstimator {
        +default_estimator: FieldComplexityEstimator
        +callback: Optional[Callable[[Dict[str, int], ExecutionContext], None]]
        +response_key: Optional[str]
        +get_results() Dict[str, Any]
        +on_validate() Iterator[None]
    }
    QueryComplexityEstimator --|> SchemaExtension
    class ExecutionContext
    class Info
    class FieldArgumentsType

File-Level Changes

Added new QueryComplexityEstimator schema extension (strawberry/extensions/query_complexity_estimator.py)
  • Implements query tree traversal to calculate complexity scores
  • Supports custom complexity estimation logic via FieldComplexityEstimator base class
  • Provides callback mechanism for handling complexity results
  • Allows optional inclusion of complexity scores in GraphQL response

Added two concrete FieldComplexityEstimator implementations (strawberry/extensions/query_complexity_estimator.py)
  • ConstantFieldComplexityEstimator returns a fixed complexity value
  • SimpleFieldComplexityEstimator handles scalar and object fields differently

Added comprehensive test suite for query complexity estimation (tests/schema/extensions/test_query_complexity_estimator.py)
  • Tests for basic complexity estimation scenarios
  • Tests for fragment handling
  • Tests for variable handling
  • Tests for different estimator implementations

Added documentation for the new extension (docs/extensions/query-complexity-estimator.md)
  • Detailed usage examples with rate limiting
  • API reference for QueryComplexityEstimator and FieldComplexityEstimator
  • Explanation of complexity estimation concepts


Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey @serramatutu - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 4 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


Member

botberry commented Dec 11, 2024

Hi, thanks for contributing to Strawberry 🍓!

We noticed that this PR is missing a RELEASE.md file. We use that to automatically do releases here on GitHub and, most importantly, to PyPI!

So as soon as this PR is merged, a release will be made 🚀.

Here's an example of RELEASE.md:

Release type: patch

Description of the changes, ideally with some examples, if adding a new feature.

Release type can be one of patch, minor or major. We use semver, so make sure to pick the appropriate type. If in doubt feel free to ask :)

Here's the tweet text:

🆕 Release (next) is out! Thanks to Lucas Valente for the PR 👏

Get it here 👉 https://strawberry.rocks/release/(next)

@serramatutu force-pushed the serramatutu/query-complexity branch from 81ca2fb to 2e7a806 on December 11, 2024, 13:21
@serramatutu
Contributor Author

I'll wait for review on the main functionality before adding a proper RELEASE.md with all the examples etc.


codspeed-hq bot commented Dec 11, 2024

CodSpeed Performance Report

Merging #3721 will not alter performance

Comparing serramatutu:serramatutu/query-complexity (2e7a806) with main (6553c9e)

Summary

✅ 15 untouched benchmarks


codecov bot commented Dec 11, 2024

Codecov Report

Attention: Patch coverage is 98.21429% with 4 lines in your changes missing coverage. Please review.

Project coverage is 97.01%. Comparing base (6553c9e) to head (2e7a806).

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #3721    +/-   ##
========================================
  Coverage   97.00%   97.01%            
========================================
  Files         501      503     +2     
  Lines       33490    33714   +224     
  Branches     5592     5621    +29     
========================================
+ Hits        32487    32706   +219     
- Misses        791      795     +4     
- Partials      212      213     +1     

variables = self.execution_context.variables or {}
node_body = node
args = {
    to_snake_case(arg.name.value): variables.get(arg.value.name.value, None)
Contributor Author

@serramatutu serramatutu Dec 11, 2024


Is there an inverse of self.execution_context.schema.config.name_converter.apply_naming_config that will turn camelCase into snake_case without hardcoding it here? What we need is to get the python_name of a field from its graphql_name.


from strawberry.extensions.base_extension import SchemaExtension
from strawberry.extensions.field_extension import FieldExtension
from strawberry.extensions.query_depth_limiter import (
Contributor Author

@serramatutu serramatutu Dec 11, 2024


NOTE: I reused some of the pre-existing machinery from QueryDepthLimiter. I didn't move it out of there to some sort of shared utils file since that code seems to have a special license.

@jacobmoshipco
Contributor

I'm noticing estimate_complexity doesn't seem to take information about what field is having its complexity estimated. Would the extension be able to take a StrawberryField instance like strawberry.Info contains, or is it too early in the GraphQL parsing flow to be able to instantiate that? Apologies if I'm just missing something.

@serramatutu
Contributor Author

@jacobmoshipco Yea I think that's a good point. I can't really see any practical use cases right now that would require the field itself, but just having it there would make the API a lot more flexible in case users wanna do introspection etc

I will add that :)
