Skip to content

[gql] latestEventSortKey resolver for GrapheneAsset so that we can order by id #29264

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 21, 2025

Conversation

jamiedemaria
Copy link
Contributor

@jamiedemaria jamiedemaria commented Apr 14, 2025

Summary & Motivation

resolver on GrapheneAsset that returns the latest storage id of any event associated with that asset. This lets us sort assets by the time they were last "modified" (planned, materialized, failed, observed).

Figured i'd do storage id since its 1) monotonically increasing and 2) already available for planned events on AssetEntry (we don't store the full event or the event timestamp for planned events)

Some perf stats pulled from shadow-gql against dogfood-test-1 as a place to start:
AssetCatalogTableQuery with limit of 10000 as used in the UI (which is effectively fetching all assets in dogfood-test-1)
master: 43s, 37s, 37s
this branch: 39s, 42s, 38s

AssetCatalogTableQuery with limit of 10000 times as reported in the network tab:
dogfood-test-1: 9.6s, 8.2s, 9.3s
elementl prod: 6.3s, 4.9s, 8.8s
documenting these if we want to compare after this branch merges and the AssetCatalogTableQuery is updated, but it'll be easier to do more direct comparison of the AssetCatalogTableQuery with and without this new resolver once it lands

How I Tested These Changes

Changelog

Insert changelog entry or delete this section.

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@jamiedemaria jamiedemaria marked this pull request as ready for review April 14, 2025 19:50
@jamiedemaria
Copy link
Contributor Author

@gibsondan @salazarm lmk what you think. i'll add tests if this seems like the right approach

Copy link
Member

@gibsondan gibsondan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the high level strategy here of fetching from the asset record makes a lot of sense to me. I'm surprised that this is the first time this class has needed to do that, i guess most of the other places we do it is in GrapheneAssetNode?

We might want to abstract away the fact that it's a storage ID under the hood as I know we've been trying to move away from having that in our public API (to give us some more flexibility to change how things are stored in the future). That could be like a latestEventSortKey or something? @shalabhc might have thoughts there as I know he's been thinking about this class of issues

I think the main thing would be to double check that this doesn't result in additional surprise data fetching of the asset record in the queries where we are going to use it - I think AssetRecord loader caching should prevent that from happening?

@jamiedemaria jamiedemaria requested a review from salazarm April 14, 2025 21:54
@jamiedemaria
Copy link
Contributor Author

That could be like a latestEventSortKey or something?

Yeah, happy to rename, pinning us to storage id is not ideal

I think the main thing would be to double check that this doesn't result in additional surprise data fetching of the asset record in the queries where we are going to use it - I think AssetRecord loader caching should prevent that from happening?

I'm not 100% on how the AssetRecord loader works wrt caching, do you know who the right person to ask about it is? maybe @alangenfeld?

@jamiedemaria jamiedemaria force-pushed the jamie/order-by-resolver branch from d8ea8c9 to a3aff0c Compare April 15, 2025 14:54
Copy link

github-actions bot commented Apr 15, 2025

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-bwigg8lsk-elementl.vercel.app
https://jamie-order-by-resolver.core-storybook.dagster-docs.io

Built with commit b43b82f.
This pull request is being automatically deployed with vercel-action

@jamiedemaria jamiedemaria changed the title [gql] latest event id resolver for Asset so that we can order by id [gql] latestEventSortKey resolver for GrapheneAsset so that we can order by id Apr 15, 2025
@jamiedemaria jamiedemaria force-pushed the jamie/order-by-resolver branch from e79234a to 713d737 Compare April 15, 2025 16:19
@jamiedemaria jamiedemaria requested a review from gibsondan April 15, 2025 16:46
@jamiedemaria
Copy link
Contributor Author

@gibsondan - i added tests and did the rename. this is good for another round of review while i figure out the caching thing

@alangenfeld
Copy link
Member

So the caching will make it so any call to AssetRecord.gen() with the same request context and key wont result in multiple keys. If we are creating AssetRecords in the request in some other way they wont be in the cache unless we put them there. For example direct instance.get_asset_record currently don't populate the cache and would need to update to be fetched via AssetRecord.gen_many

@alangenfeld
Copy link
Member

To put another way, the current setup will ensure that the calls get batched in to one if they happen across a list of assets.

But if we are already fetching the records directly from the instance in the request it wont dedupe against those without updating those callsites. I think it shouldn't be hard to update graphql callsites to gen_many since its effectively the same signature and resolvers can be made async.

@jamiedemaria
Copy link
Contributor Author

ok - we aren't fetching asset records via the instance in GrapheneAsset so i think that means we're good right? no risk of fetching the same thing twice if we weren't fetching it before this PR

Copy link
Member

were we fetching asset records at all though, via any method? I don't think they're super expensive but i could imagine some perf difference from going from zero asset records to one record per asset in the query

@jamiedemaria
Copy link
Contributor Author

Like in any gql resolver at all?

@jamiedemaria jamiedemaria force-pushed the jamie/order-by-resolver branch from 713d737 to 946c9f9 Compare April 16, 2025 19:29
@jamiedemaria
Copy link
Contributor Author

@gibsondan pinging for another review pass.

In terms of fetching asset records in the GQL layer - i didn't find any calls of get_asset_records when i grep-ed dagster_graphql (oss or internal). on GrapheneAssetNode there are some AssetRecord.gen callsites

@jamiedemaria jamiedemaria force-pushed the jamie/order-by-resolver branch 3 times, most recently from fbc4689 to a9df0eb Compare April 17, 2025 15:41
@gibsondan gibsondan requested a review from prha April 18, 2025 14:39
Copy link
Member

not in any gql resolver at all, but rather the existing queries that you're planning to add this field to. We may want to profile them before and after this change to see if there's any perf impact

Copy link
Member

@gibsondan gibsondan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine with one small change, we should make this an ID (basically a string) insted of a BigInt

@@ -247,6 +248,7 @@ class GrapheneAsset(graphene.ObjectType):
cursor=graphene.String(),
)
definition = graphene.Field("dagster_graphql.schema.asset_graph.GrapheneAssetNode")
latestEventSortKey = graphene.Field(graphene.BigInt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BigInt is not safe in javascript unfortunately, it gets truncated. We should make this an ID instead (see #25673 as an example)

@prha
Copy link
Member

prha commented Apr 18, 2025

Yeah, this looks good to me.

I do think it's somewhat inevitable that we will want to sort/paginate by this so that we can don't have to load everything in the frontend. Which means that we will need a combined timestamp column on the table that we can do sql order bys on.

But I think this feels sufficient to avoid tackling the data migration problem for now.

@jamiedemaria jamiedemaria force-pushed the jamie/order-by-resolver branch from a9df0eb to c7cca75 Compare April 18, 2025 18:53
@jamiedemaria jamiedemaria requested a review from gibsondan April 21, 2025 13:41
@jamiedemaria jamiedemaria dismissed gibsondan’s stale review April 21, 2025 13:42

added some preliminary stats to the pr description

@jamiedemaria jamiedemaria force-pushed the jamie/order-by-resolver branch from c7cca75 to 64adce8 Compare April 21, 2025 16:28
@jamiedemaria jamiedemaria force-pushed the jamie/order-by-resolver branch from 64adce8 to b43b82f Compare April 21, 2025 16:29
Copy link
Member

@gibsondan gibsondan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AssetCatalogTableQuery would already be fetching the asset record since it includes assets that aren't in hte graph, so it makes sense that that wouldn't change. i bet the asset graph queries also end up fetching the record for things like loading teh last materialization?

@jamiedemaria jamiedemaria merged commit 54a323d into master Apr 21, 2025
6 checks passed
@jamiedemaria jamiedemaria deleted the jamie/order-by-resolver branch April 21, 2025 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants