Fetching current_transaction for every GraphQL field results in performance regression

I believe I'm hitting the same issue as https://github.com/newrelic/newrelic-python-agent/issues/605, however, I don't think `graphene` is part of the problem. I think the real problem is `newrelic`+`graphql-core`.

I think that the reason that NewRelic+GraphQL leads to a performance regression is because the NR agent tries to retrieve the `current_transaction` for every single field that needs to be resolved. It's not a problem if you have a query that returns a single, shallow object, but any kind of list query that returns complex and/or nested objects is likely to be negatively impacted by the many, many requests to `current_transaction`. I think it is incorrect for `newrelic` to assume it can call `current_transaction` for every field that needs to be resolved.

DISCLAIMER: I'm pretty sure `current_transaction` is part of the problem, but it may not be the whole problem. If I override `current_transaction` to always return `None`, I still see a regression, but a less substantial regression (1.5x vs 2-3x+). That said, a bunch of logic depends on `current_transaction` not being `None`, so there's a chance that other code that is being circumvented is the true culprit, but based on the output from `yappi`, I'm not sure what else to blame.

Here's some example info from `yappi` that suggests `current_transaction`:

`GraphQL Execute` timing when running without New Relic:
```
full_name: /usr/local/lib/python3.10/site-packages/graphql/execution/execute.py:965 execute
ncall: 999/999
ttot: 28.11397
```

Time spent in `current_transaction` when running with New Relic:
```
full_name: /usr/local/lib/python3.10/site-packages/newrelic/api/transaction.py:1736 current_transaction
ncall: 3997998/3997998
ttot: 20.82567
```

Those stats come from different test runs, but those numbers support the notion that calls to `current_transaction` could be responsible for almost doubling the time it takes to execute the query.

According to #605, the regression occurred [between these two releases](https://github.com/newrelic/newrelic-python-agent/compare/v6.4.4.161...v6.6.0.162), which the referenced code is part of.


**Description**
A GraphQL request that takes 1 unit of time without the NewRelic agent running takes 2-3 units of time with the NewRelic agent running. The severity of the performance regression is likely API specific and seems likely to depend on the total number of fields that need to be resolved.

**Expected Behavior**
Using NewRelic agent in combination with GraphQL should not cause a significant performance regression.

**[Troubleshooting](https://discuss.newrelic.com/t/troubleshooting-frameworks/108787) or [NR Diag](https://docs.newrelic.com/docs/using-new-relic/cross-product-functions/troubleshooting/new-relic-diagnostics) results**
I started with a pretty thick stack and was able to reduce the problem to resulting from the combination of `graphql-core` and `newrelic` python packages.

I made a fork of the `newrelic-python-agent` repo and was able to specifically trace the problem to being related to the newrelic hook that wraps graphql with the `instrument_graphql_execute` wrapper. I made a draft PR that just shows what I disabled while still reproducing the performance regression: https://github.com/newrelic/newrelic-python-agent/pull/830/files. YMMV, but disabling some of the other hooks/wrappers can help reduce noise to make it easier to figure out where the problem is coming from.

**Steps to Reproduce**
I made [this repo to reproduce the problem with a strawman GraphQL query](https://github.com/tdg5/newrelic-python-agent-graphql-issue). The [README.md](https://github.com/tdg5/newrelic-python-agent-graphql-issue/blob/main/README.md) offers some steps for reproducing the problem but there's also a Github workflow that reproduces the problem here: https://github.com/tdg5/newrelic-python-agent-graphql-issue/blob/main/.github/workflows/reproduction.yml

I'm not sure if it's visible to all, but a run of that workflow that demonstrates the significant difference in performance can be found here: https://github.com/tdg5/newrelic-python-agent-graphql-issue/actions/runs/5256745026/jobs/9498499333#step:3:1012. The relevant output is:

```
***********************************************
*       BGN: RUN WITHOUT NEW RELIC AGENT      *
***********************************************
71.92
^^^^^^ WITHOUT NEW RELIC AGENT WALL TIME ^^^^^^
***********************************************
*       END: RUN WITHOUT NEW RELIC AGENT      *
***********************************************
*****************************************
*      BGN: RUN WITH NEW RELIC AGENT    *
*****************************************
153.98
^^^^^ WITH NEW RELIC AGENT WALL TIME ^^^^
*****************************************
*      END: RUN WITH NEW RELIC AGENT    *
*****************************************
```

**Your Environment**
I've been using the latest release of newrelic and I've only tried ubuntu environments, but it's not obvious to me that this would be an OS specific issue.

**Additional context**
I've attached the output from a `yappi` functional analysis both with and without new relic to try to identify how the app is spending its time differently. The main culprit that sticks out is the calls to `current_transaction`, but I'm inclined to believe there is another source of performance regression because I still encounter a non-trivial performance regression (1.5x vs 2-3x) if I override `current_transaction` to always return `None`, but perhaps that's just the cost of using New Relic.

`yappi.get_func_stats()` WITHOUT new relic: [no-nr-func-stats.txt](https://github.com/newrelic/newrelic-python-agent/files/11736289/no-nr-func-stats.txt)

`yappi.get_func_stats()` WITH new relic: [nr-func-stats.txt](https://github.com/newrelic/newrelic-python-agent/files/11736294/nr-func-stats.txt)


Please let me know if there's anything I can do to help you reproduce or correct this problem. I have GraphQL APIs in prod that I am unable to instrument because of the enormous performance regression that instrumentation introduces. 😿 


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fetching current_transaction for every GraphQL field results in performance regression #831

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Fetching current_transaction for every GraphQL field results in performance regression #831

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions