Description
I'm not sure how many other people have run into this problem - I searched for days without finding a clear answer - but I eventually figured out a solution, so I wanted to document it here in case anyone else hits it.
Background
Our setup uses a Postgres database that backs a GraphQL API, which we initially served with Flask, following the example. However, our schema uses a lot of 1:N relationships, so performance was degraded by the N+1 round-trip problem. The fix for this is supposedly very simple: just set batching=True in the Meta class. However, doing so triggers the use of dataloaders, which operate asynchronously, so we switched from Flask to the aiohttp web framework and the asyncpg database driver, which are better suited to asynchronous work. This is when we first stumbled on the MissingGreenlet error: sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called; can't call await_only() here. Was IO attempted in an unexpected place?
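For reference, the batching option mentioned above looks roughly like this; the Parent model and its 1:N relationship are hypothetical stand-ins for our actual schema, not code from this post:

    from graphene_sqlalchemy import SQLAlchemyObjectType

    from myapp.models import Parent  # hypothetical SQLAlchemy model with a 1:N relationship


    class ParentType(SQLAlchemyObjectType):
        class Meta:
            model = Parent
            batching = True  # switches relationship resolution to batched dataloaders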
Issue
The problem here comes from RelationshipLoader.batch_load_fn() (link). As the function comment explains, that function has to "piggyback on some internal APIs" of sqlalchemy. As a result (I believe), the necessary greenlet_spawn function never gets called, and the above error is thrown. (Greenlets, AFAICT, are small thread-like objects that let synchronous code run inside an asynchronous environment. In this situation, the dataloader is what actually materializes the expected attributes, so these calls have to be made synchronously.)
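For intuition, SQLAlchemy's own async API bridges the same gap with greenlets. The sketch below is only an illustration of that mechanism (the engine URL, the Parent model, and its children relationship are all assumptions): AsyncSession.run_sync() runs a synchronous callable inside greenlet_spawn, which is exactly what lets a lazy load await the asyncpg query without raising MissingGreenlet.

    from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine

    from myapp.models import Parent  # hypothetical ORM model with a lazy "children" relationship

    engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/mydb")


    async def load_children(parent_id: int) -> list:
        async with AsyncSession(engine) as session:
            parent = await session.get(Parent, parent_id)
            # Touching parent.children directly here would raise MissingGreenlet,
            # because the lazy load calls await_only() outside a greenlet context.
            # run_sync() executes the callable inside greenlet_spawn, so the
            # synchronous lazy load can safely drive the async driver.
            return await session.run_sync(lambda _sync_session: list(parent.children))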
Solution
So, on to the fix: we just need to call greenlet_spawn ourselves. This can be done by simply redefining the batch_load_fn attribute on the RelationshipLoader class, so that all invocations run the corrected function rather than the broken code. My stripped-down code is below (I removed some extraneous comments/checks/conditions from batch_load_fn for simplicity; make sure you compare it against the original code and follow the correct path for your situation):
from typing import Any

import sqlalchemy
from graphene_sqlalchemy.batching import RelationshipLoader
from sqlalchemy.orm import Session
from sqlalchemy.util import immutabledict
from sqlalchemy.util._concurrency_py3k import greenlet_spawn


def patch_relationship_loader() -> None:
    """
    CALL THIS FUNCTION ONCE AS PART OF SETUP CODE
    """
    # Monkey-patch the class so every dataloader instance uses the fixed loader.
    RelationshipLoader.batch_load_fn = batch_load_fn


async def batch_load_fn(self: RelationshipLoader, parents: Any) -> list[Any]:
    """
    FOLLOWS THE `SQL_VERSION_HIGHER_EQUAL_THAN_2` PATH
    """
    child_mapper = self.relationship_prop.mapper
    parent_mapper = self.relationship_prop.parent
    session = Session.object_session(parents[0])

    # All parents in one batch are expected to share the same (clean) session.
    for parent in parents:
        assert session is Session.object_session(parent)
        assert session and parent not in session.dirty

    states = [(sqlalchemy.inspect(parent), True) for parent in parents]

    query_context = None
    if session:
        parent_mapper_query = session.query(parent_mapper.entity)
        query_context = parent_mapper_query._compile_context()

    # CALL greenlet_spawn HERE RATHER THAN CALLING _load_for_path DIRECTLY
    await greenlet_spawn(
        self.selectin_loader._load_for_path,
        query_context,
        parent_mapper._path_registry,
        states,
        None,
        child_mapper,
        None,
        None,  # recursion depth can be None
        immutabledict(),  # default value for selectinload->lazyload
    )

    # The selectin load above has populated the relationship attribute on each parent.
    result = [
        getattr(parent, self.relationship_prop.key) for parent in parents
    ]
    return result
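For completeness, here is roughly where we call the patch in our setup. The aiohttp app factory below is only a hedged illustration; the module path for patch_relationship_loader and the GraphQL route wiring are assumptions, not the exact code we run:

    from aiohttp import web

    from myapp.patches import patch_relationship_loader  # hypothetical module holding the snippet above


    def create_app() -> web.Application:
        # Apply the patch before any resolver creates a RelationshipLoader instance.
        patch_relationship_loader()
        app = web.Application()
        # ... register the GraphQL view / routes on `app` here ...
        return app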
Conclusion
I don't have enough knowledge of graphene-sqlalchemy or other database drivers to know what a universal solution would look like (or even whether this is a universal problem - the dearth of information about it suggests not), so I don't want to submit a PR for this change, but at least in our situation this was the best fix. Hope it helps someone else out there!