-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[gql] latestEventSortKey resolver for GrapheneAsset so that we can order by id #29264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
@gibsondan @salazarm lmk what you think. i'll add tests if this seems like the right approach |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the high level strategy here of fetching from the asset record makes a lot of sense to me. I'm surprised that this is the first time this class has needed to do that, i guess most of the other places we do it is in GrapheneAssetNode?
We might want to abstract away the fact that it's a storage ID under the hood as I know we've been trying to move away from having that in our public API (to give us some more flexibility to change how things are stored in the future). That could be like a latestEventSortKey or something? @shalabhc might have thoughts there as I know he's been thinking about this class of issues
I think the main thing would be to double check that this doesn't result in additional surprise data fetching of the asset record in the queries where we are going to use it - I think AssetRecord loader caching should prevent that from happening?
Yeah, happy to rename, pinning us to storage id is not ideal
I'm not 100% on how the AssetRecord loader works wrt caching, do you know who the right person to ask about it is? maybe @alangenfeld? |
d8ea8c9
to
a3aff0c
Compare
Deploy preview for dagit-core-storybook ready! ✅ Preview Built with commit b43b82f. |
e79234a
to
713d737
Compare
@gibsondan - i added tests and did the rename. this is good for another round of review while i figure out the caching thing |
So the caching will make it so any call to |
To put another way, the current setup will ensure that the calls get batched in to one if they happen across a list of assets. But if we are already fetching the records directly from the instance in the request it wont dedupe against those without updating those callsites. I think it shouldn't be hard to update graphql callsites to |
ok - we aren't fetching asset records via the instance in |
were we fetching asset records at all though, via any method? I don't think they're super expensive but i could imagine some perf difference from going from zero asset records to one record per asset in the query |
Like in any gql resolver at all? |
713d737
to
946c9f9
Compare
@gibsondan pinging for another review pass. In terms of fetching asset records in the GQL layer - i didn't find any calls of |
fbc4689
to
a9df0eb
Compare
not in any gql resolver at all, but rather the existing queries that you're planning to add this field to. We may want to profile them before and after this change to see if there's any perf impact |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems fine with one small change, we should make this an ID (basically a string) insted of a BigInt
@@ -247,6 +248,7 @@ class GrapheneAsset(graphene.ObjectType): | |||
cursor=graphene.String(), | |||
) | |||
definition = graphene.Field("dagster_graphql.schema.asset_graph.GrapheneAssetNode") | |||
latestEventSortKey = graphene.Field(graphene.BigInt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BigInt is not safe in javascript unfortunately, it gets truncated. We should make this an ID instead (see #25673 as an example)
Yeah, this looks good to me. I do think it's somewhat inevitable that we will want to sort/paginate by this so that we can don't have to load everything in the frontend. Which means that we will need a combined timestamp column on the table that we can do sql order bys on. But I think this feels sufficient to avoid tackling the data migration problem for now. |
a9df0eb
to
c7cca75
Compare
added some preliminary stats to the pr description
c7cca75
to
64adce8
Compare
64adce8
to
b43b82f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AssetCatalogTableQuery would already be fetching the asset record since it includes assets that aren't in hte graph, so it makes sense that that wouldn't change. i bet the asset graph queries also end up fetching the record for things like loading teh last materialization?
Summary & Motivation
resolver on
GrapheneAsset
that returns the latest storage id of any event associated with that asset. This lets us sort assets by the time they were last "modified" (planned, materialized, failed, observed).Figured i'd do storage id since its 1) monotonically increasing and 2) already available for planned events on AssetEntry (we don't store the full event or the event timestamp for planned events)
Some perf stats pulled from shadow-gql against
dogfood-test-1
as a place to start:AssetCatalogTableQuery
with limit of 10000 as used in the UI (which is effectively fetching all assets in dogfood-test-1)master
: 43s, 37s, 37sthis branch: 39s, 42s, 38s
AssetCatalogTableQuery
with limit of 10000 times as reported in the network tab:dogfood-test-1
: 9.6s, 8.2s, 9.3selementl
prod: 6.3s, 4.9s, 8.8sdocumenting these if we want to compare after this branch merges and the
AssetCatalogTableQuery
is updated, but it'll be easier to do more direct comparison of theAssetCatalogTableQuery
with and without this new resolver once it landsHow I Tested These Changes
Changelog