[gql] latestEventSortKey resolver for GrapheneAsset so that we can order by id #29264

jamiedemaria · 2025-04-14T19:45:15Z

Summary & Motivation

Adds a resolver on GrapheneAsset that returns the latest storage id of any event associated with that asset. This lets us sort assets by the time they were last "modified" (planned, materialized, failed, or observed).

I figured I'd use the storage id since it's 1) monotonically increasing and 2) already available for planned events on AssetEntry (we don't store the full event or the event timestamp for planned events).
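Below is a minimal sketch of that selection logic. It assumes a simplified, hypothetical `AssetEntryLike` record with one storage-id field per event type; these field names are illustrative, not `AssetEntry`'s actual attributes. Because storage ids only increase, the largest id across event types belongs to the most recent event.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AssetEntryLike:
    # Hypothetical per-event-type storage ids; None means no event of that type.
    last_materialization_storage_id: Optional[int] = None
    last_planned_materialization_storage_id: Optional[int] = None
    last_failed_materialization_storage_id: Optional[int] = None
    last_observation_storage_id: Optional[int] = None


def latest_event_storage_id(entry: Optional[AssetEntryLike]) -> Optional[int]:
    """Return the largest storage id across all event types, or None if there are none."""
    if entry is None:
        return None
    candidates = [
        entry.last_materialization_storage_id,
        entry.last_planned_materialization_storage_id,
        entry.last_failed_materialization_storage_id,
        entry.last_observation_storage_id,
    ]
    present = [c for c in candidates if c is not None]
    # Storage ids are monotonically increasing, so the max is the latest event.
    return max(present) if present else None


entry = AssetEntryLike(last_materialization_storage_id=120, last_planned_materialization_storage_id=135)
print(latest_event_storage_id(entry))  # 135: the planned event is newer than the materialization
```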

Some perf stats pulled from shadow-gql against dogfood-test-1 as a place to start:
AssetCatalogTableQuery with a limit of 10000, as used in the UI (which effectively fetches all assets in dogfood-test-1):
master: 43s, 37s, 37s
this branch: 39s, 42s, 38s

AssetCatalogTableQuery with a limit of 10000, with times as reported in the browser network tab:
dogfood-test-1: 9.6s, 8.2s, 9.3s
elementl prod: 6.3s, 4.9s, 8.8s
I'm documenting these in case we want to compare after this branch merges and AssetCatalogTableQuery is updated, but a more direct comparison of AssetCatalogTableQuery with and without the new resolver will be easier once it lands.
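A small helper for summarizing timings like these, using only the numbers reported above and Python's standard `statistics` module. With three runs per configuration, this is a rough sanity check rather than a benchmark.

```python
import statistics

# Wall-clock seconds for AssetCatalogTableQuery (limit 10000), copied from the runs above.
timings = {
    "shadow-gql, master": [43, 37, 37],
    "shadow-gql, this branch": [39, 42, 38],
    "network tab, dogfood-test-1": [9.6, 8.2, 9.3],
    "network tab, elementl prod": [6.3, 4.9, 8.8],
}


def summarize(samples):
    """Return (mean, median, max - min) for a list of timings in seconds."""
    return statistics.mean(samples), statistics.median(samples), max(samples) - min(samples)


for label, samples in timings.items():
    mean, median, spread = summarize(samples)
    print(f"{label}: mean {mean:.1f}s, median {median:.1f}s, spread {spread:.1f}s")
```

For the shadow-gql runs this gives means of 39.0s (master) and 39.7s (this branch), while the run-to-run spread is 6s and 4s respectively. Three runs alone can't separate the two.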

How I Tested These Changes

Changelog

jamiedemaria · 2025-04-14T19:50:36Z

@gibsondan @salazarm lmk what you think. i'll add tests if this seems like the right approach

gibsondan

The high-level strategy here of fetching from the asset record makes a lot of sense to me. I'm surprised this is the first time this class has needed to do that; I guess most of the other places we do it are in GrapheneAssetNode?

We might want to abstract away the fact that it's a storage ID under the hood as I know we've been trying to move away from having that in our public API (to give us some more flexibility to change how things are stored in the future). That could be like a latestEventSortKey or something? @shalabhc might have thoughts there as I know he's been thinking about this class of issues

I think the main thing would be to double check that this doesn't result in additional surprise data fetching of the asset record in the queries where we are going to use it - I think AssetRecord loader caching should prevent that from happening?

jamiedemaria · 2025-04-14T22:00:33Z

That could be like a latestEventSortKey or something?

Yeah, happy to rename; pinning us to storage id is not ideal.

I think the main thing would be to double check that this doesn't result in additional surprise data fetching of the asset record in the queries where we are going to use it - I think AssetRecord loader caching should prevent that from happening?

I'm not 100% sure how the AssetRecord loader works with respect to caching. Do you know who the right person to ask about it is? Maybe @alangenfeld?

jamiedemaria · 2025-04-15T16:47:25Z

@gibsondan - i added tests and did the rename. this is good for another round of review while i figure out the caching thing

alangenfeld · 2025-04-15T17:01:22Z

So the caching means that any call to AssetRecord.gen() with the same request context and key won't result in multiple fetches. If we are creating AssetRecords in the request some other way, they won't be in the cache unless we put them there. For example, direct instance.get_asset_records calls currently don't populate the cache and would need to be updated to fetch via AssetRecord.gen_many.

alangenfeld · 2025-04-15T17:06:14Z

To put it another way, the current setup ensures that the calls get batched into one if they happen across a list of assets.

But if we are already fetching the records directly from the instance in the request, it won't dedupe against those without updating those callsites. I think it shouldn't be hard to update the GraphQL callsites to gen_many, since it's effectively the same signature and the resolvers can be made async.
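The behavior described above follows the DataLoader pattern. Within one request, every `gen(key)` call made before the event loop next runs its queued callbacks is collected into a single batch fetch, and results are cached per key so repeated loads are free. Below is a minimal asyncio sketch of that pattern. `RequestScopedLoader` and `fake_fetch_many` are hypothetical stand-ins, not Dagster's `AssetRecord.gen`/`gen_many` implementation. The sketch also shows the dedupe gap: only values that go through the loader land in `_cache`, so a record fetched directly from the instance would never be reused.

```python
import asyncio
from typing import Callable, Dict, Hashable, Iterable, List, Optional

FetchMany = Callable[[List[Hashable]], Dict[Hashable, object]]


class RequestScopedLoader:
    """Hypothetical per-request loader: batches gen() calls and caches by key."""

    def __init__(self, fetch_many: FetchMany):
        self._fetch_many = fetch_many
        self._cache: Dict[Hashable, asyncio.Future] = {}
        self._pending: List[Hashable] = []
        self._scheduled = False

    def gen(self, key: Hashable) -> asyncio.Future:
        # A key that is cached or already queued reuses its future: no second fetch.
        if key in self._cache:
            return self._cache[key]
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._cache[key] = future
        self._pending.append(key)
        if not self._scheduled:
            # Defer the fetch so every gen() call made before the loop runs its
            # ready callbacks joins the same batch.
            self._scheduled = True
            loop.call_soon(self._dispatch)
        return future

    async def gen_many(self, keys: Iterable[Hashable]) -> List[Optional[object]]:
        return list(await asyncio.gather(*(self.gen(key) for key in keys)))

    def _dispatch(self) -> None:
        keys, self._pending, self._scheduled = self._pending, [], False
        results = self._fetch_many(keys)  # one round trip for the whole batch (no error handling)
        for key in keys:
            self._cache[key].set_result(results.get(key))


calls: List[List[Hashable]] = []


def fake_fetch_many(keys: List[Hashable]) -> Dict[Hashable, object]:
    calls.append(list(keys))  # record each batch the "database" receives
    return {key: f"record:{key}" for key in keys}


async def main():
    loader = RequestScopedLoader(fake_fetch_many)
    # Two resolvers asking for different assets in the same tick: one batch.
    a, b = await asyncio.gather(loader.gen("asset_a"), loader.gen("asset_b"))
    # Asking again for the same key is served from the cache: no new fetch.
    again = await loader.gen("asset_a")
    return a, b, again


print(asyncio.run(main()))  # ('record:asset_a', 'record:asset_b', 'record:asset_a')
print(calls)  # [['asset_a', 'asset_b']]
```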

jamiedemaria · 2025-04-15T17:38:06Z

OK, we aren't fetching asset records via the instance in GrapheneAsset, so I think that means we're good, right? There's no risk of fetching the same thing twice if we weren't fetching it before this PR.

gibsondan · 2025-04-15T17:43:38Z

Were we fetching asset records at all, though, via any method? I don't think they're super expensive, but I could imagine some perf difference from going from zero asset records to one record per asset in the query.

jamiedemaria · 2025-04-15T17:46:19Z

Like in any gql resolver at all?

jamiedemaria · 2025-04-16T19:32:23Z

@gibsondan pinging for another review pass.

In terms of fetching asset records in the GQL layer: I didn't find any calls to get_asset_records when I grepped dagster_graphql (OSS or internal). On GrapheneAssetNode there are some AssetRecord.gen callsites.

gibsondan · 2025-04-18T14:40:33Z

Not in any GQL resolver at all, but rather in the existing queries that you're planning to add this field to. We may want to profile them before and after this change to see if there's any perf impact.
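Below is a minimal harness for that before/after comparison. It assumes `run_query` is a hypothetical zero-argument callable that executes the query under test (for example AssetCatalogTableQuery) against a fixed deployment. It discards warm-up calls and reports the median, which is less sensitive to one-off outliers than a single run.

```python
import statistics
import time
from typing import Callable, Dict, List


def profile(run_query: Callable[[], object], runs: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Call run_query() repeatedly and summarize wall-clock durations in seconds."""
    for _ in range(warmup):
        run_query()  # warm-up calls are not timed (cold caches, connection setup)
    durations: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return {
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


# Stand-in workload; replace with a callable that executes the real query.
print(profile(lambda: sum(range(100_000)), runs=3))
```

Running it once on master and once on this branch, against the same deployment and with the same `runs`, gives two medians that can be compared directly.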

gibsondan

Seems fine with one small change: we should make this an ID (basically a string) instead of a BigInt.

gibsondan · 2025-04-18T14:41:52Z

python_modules/dagster-graphql/dagster_graphql/schema/pipelines/pipeline.py

@@ -247,6 +248,7 @@ class GrapheneAsset(graphene.ObjectType):
        cursor=graphene.String(),
    )
    definition = graphene.Field("dagster_graphql.schema.asset_graph.GrapheneAssetNode")
+    latestEventSortKey = graphene.Field(graphene.BigInt)


BigInt is unfortunately not safe in JavaScript: integers above 2^53 - 1 (Number.MAX_SAFE_INTEGER) lose precision when parsed as numbers. We should make this an ID instead (see #25673 as an example).
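The precision issue can be made concrete. This assumes the BigInt value reaches the client as a JSON number: JavaScript parses those into IEEE-754 doubles, which represent every integer exactly only up to 2^53 - 1. Python's `float` is the same 64-bit double, so the sketch below reproduces the loss. `resolve_latest_event_sort_key` is a hypothetical helper, not the code merged in this PR. One consequence of a string-typed key is that clients must not compare keys lexicographically ("10" sorts before "9"). The sketch therefore also includes an ordering that stays numeric without converting to a float.

```python
from typing import Optional

MAX_SAFE_INTEGER = 2**53 - 1  # equals JavaScript's Number.MAX_SAFE_INTEGER


def as_js_number(value: int) -> int:
    """Round-trip an int through a 64-bit double, as JSON.parse does in JavaScript."""
    return int(float(value))


def resolve_latest_event_sort_key(storage_id: Optional[int]) -> Optional[str]:
    # Hypothetical resolver body: expose the key as a string (GraphQL ID)
    # so clients never parse it into a lossy double.
    return None if storage_id is None else str(storage_id)


def numeric_string_order(key: str) -> tuple:
    # For non-negative decimal strings without leading zeros, comparing
    # (length, text) gives numeric order with no float conversion.
    return (len(key), key)


big = 2**53 + 1
print(as_js_number(MAX_SAFE_INTEGER) == MAX_SAFE_INTEGER)  # True
print(as_js_number(big) == big)  # False: the nearest double is 2**53
print(resolve_latest_event_sort_key(big))  # 9007199254740993
print(sorted(["10", "9", "100"]))  # ['10', '100', '9'] (lexicographic)
print(sorted(["10", "9", "100"], key=numeric_string_order))  # ['9', '10', '100']
```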

prha · 2025-04-18T16:17:10Z

Yeah, this looks good to me.

I do think it's somewhat inevitable that we will want to sort/paginate by this so that we don't have to load everything in the frontend, which means we will need a combined timestamp column on the table that we can do SQL ORDER BYs on.

But I think this feels sufficient to avoid tackling the data migration problem for now.

added some preliminary stats to the pr description

gibsondan

AssetCatalogTableQuery would already be fetching the asset record since it includes assets that aren't in the graph, so it makes sense that that wouldn't change. I bet the asset graph queries also end up fetching the record for things like loading the last materialization?

jamiedemaria marked this pull request as ready for review April 14, 2025 19:50

gibsondan reviewed Apr 14, 2025

jamiedemaria requested a review from salazarm April 14, 2025 21:54

jamiedemaria force-pushed the jamie/order-by-resolver branch from d8ea8c9 to a3aff0c April 15, 2025 14:54

jamiedemaria changed the title from "[gql] latest event id resolver for Asset so that we can order by id" to "[gql] latestEventSortKey resolver for GrapheneAsset so that we can order by id" Apr 15, 2025

jamiedemaria force-pushed the jamie/order-by-resolver branch from e79234a to 713d737 April 15, 2025 16:19

jamiedemaria requested a review from gibsondan April 15, 2025 16:46

jamiedemaria force-pushed the jamie/order-by-resolver branch from 713d737 to 946c9f9 April 16, 2025 19:29

jamiedemaria force-pushed the jamie/order-by-resolver branch 3 times, most recently from fbc4689 to a9df0eb April 17, 2025 15:41

gibsondan requested a review from prha April 18, 2025 14:39

gibsondan previously requested changes Apr 18, 2025

jamiedemaria force-pushed the jamie/order-by-resolver branch from a9df0eb to c7cca75 April 18, 2025 18:53

jamiedemaria requested a review from gibsondan April 21, 2025 13:41

jamiedemaria force-pushed the jamie/order-by-resolver branch from c7cca75 to 64adce8 April 21, 2025 16:28

jamiedemaria added 11 commits April 21, 2025 12:29

1b2eefe [gql] latest event id resolver for Asset so that we can order by id
eb9a975 update name
6696f3a tests
7ec7e97 snapshots
12d171f snap
c201abe wip to make the test better
4b1410a update test
ba55e71 fix it
0c7ecb7 make bigint
5b01815 use id
b43b82f fix test

jamiedemaria force-pushed the jamie/order-by-resolver branch from 64adce8 to b43b82f April 21, 2025 16:29

gibsondan reviewed Apr 21, 2025

gibsondan approved these changes Apr 21, 2025

jamiedemaria merged commit 54a323d into master Apr 21, 2025
6 checks passed

jamiedemaria deleted the jamie/order-by-resolver branch April 21, 2025 19:13
