GraphQL API returns deleted assets #20214

Open
@mkleinbort-ic

Description

What's the issue or suggestion?

When I use the GraphQL API to list assets, the response includes many assets that were defined at one point but have long since been removed from the asset definitions (the code that defined them was deleted).

I don't see a way to filter them out, or a process to clean them up.

The only way I can infer that they are gone is that their "dependencies" field is null instead of an empty list.
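
As a client-side workaround, the heuristic described above could be applied to the returned nodes. The following is a minimal sketch, not an official Dagster API: the helper name `split_live_and_stale` and the sample nodes are made up for illustration, and it assumes that a removed asset shows up with either a null `definition` or a null `dependencies` field, which is based only on the observation in this issue.

```python
from typing import Any

Node = dict[str, Any]


def split_live_and_stale(nodes: list[Node]) -> tuple[list[Node], list[Node]]:
    """Split asset nodes into (live, stale) lists.

    Assumption: a node is stale when its "definition" is missing/null or the
    "dependencies" field inside it is null (rather than an empty list).
    """
    live: list[Node] = []
    stale: list[Node] = []
    for node in nodes:
        definition = node.get("definition")
        if definition is None or definition.get("dependencies") is None:
            stale.append(node)
        else:
            live.append(node)
    return live, stale


# Hypothetical sample shaped like the "nodes" list returned by the query.
sample_nodes = [
    {"key": {"path": ["orders"]}, "definition": {"dependencies": []}},
    {"key": {"path": ["old_report"]}, "definition": None},
    {"key": {"path": ["legacy"]}, "definition": {"dependencies": None}},
]

live, stale = split_live_and_stale(sample_nodes)
print([n["key"]["path"] for n in live])   # [['orders']]
print([n["key"]["path"] for n in stale])  # [['old_report'], ['legacy']]
```

This only hides the stale entries on the client; it does not remove them from the instance, and it relies on an inference rather than a documented flag.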

Additional information

This is how I'm fetching the data:

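import asyncio

import polars as pl
from gql import Client, gql
from gql.transport.aiohttp import AIOHTTPTransport

# Assumption: DAGSTER_URL is not defined in the original snippet. Replace this
# placeholder with the GraphQL endpoint of your Dagster webserver.
DAGSTER_URL = "http://localhost:3000/graphql"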


async def query_dagster(query: str, timeout: int = 10) -> dict:
    '''Run a GraphQL query against the Dagster endpoint and return the result.'''
    transport = AIOHTTPTransport(url=DAGSTER_URL)

    async with Client(
        transport=transport,
        fetch_schema_from_transport=True,
        execute_timeout=timeout,
    ) as session:
        gql_query = gql(query)
        result = await session.execute(gql_query)
        return result

def get_all_dagster_assets(timeout: int = 60) -> pl.DataFrame:
    '''Return a dataframe with all Dagster assets.'''
    def fetch_raw_data(timeout: int = timeout) -> pl.DataFrame:
        q = """
            query {
              assetsOrError {
                __typename
                ... on AssetConnection {
                  nodes {
                    key { path }
                    definition {
                      description
                      dataVersion
                      groupName
                      type {
                        ... on RegularDagsterType { displayName }
                      }
                      dependedBy {
                        asset {
                          assetKey { path }
                        }
                      }
                      dependencies {
                        asset {
                          assetKey { path }
                        }
                      }
                    }

                    assetMaterializations(limit: 1) {
                      runId
                      timestamp
                      metadataEntries {
                        label
                      }
                    }
                  }
                }
                ... on PythonError {
                  message
                }
              }
            }
            """

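        assetInfo = asyncio.run(query_dagster(q, timeout=timeout))

        assets_or_error = assetInfo["assetsOrError"]
        # The query can also resolve to PythonError, which has no "nodes" field.
        if assets_or_error["__typename"] != "AssetConnection":
            raise RuntimeError(assets_or_error.get("message", "Unexpected GraphQL response"))

        node_list: list[dict] = assets_or_error["nodes"]

        df = pl.DataFrame(node_list)
        return df

    # Call the nested helper; without this the outer function returns None.
    return fetch_raw_data()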

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.
