[Feature request] OpenLineage integration produces columnLineage facet without transformations #25568

Open
@dolfinus

According to the OpenLineage spec, the columnLineage facet contains a transformations field:
https://openlineage.io/docs/spec/facets/dataset-facets/column_lineage_facet/
https://github.com/OpenLineage/OpenLineage/blob/89ac77f816216465e4a8ee2b80e452e4089cc9b4/spec/tests/ColumnLineageDatasetFacet/2.json

For a query like this:

insert into def(id,value)
select cde.id, abc.value
from cde
join abc on cde.id = abc.id
where abc.id < 10

It should produce an OpenLineage (OL) event with a facet like this:

{
    "columnLineage": {
        "_producer": "https://github.com/trinodb/trino/plugin/trino-openlineage",
        "_schemaURL": "https://openlineage.io/spec/facets/1-2-0/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet",
        "fields": {
            "id": {
                "inputFields": [
                    {
                        "namespace": "trino://localhost:8080",
                        "name": "pg.public.cde",
                        "field": "id",
                        "transformations": [
                            {
                                "type": "DIRECT",
                                "subtype": "IDENTITY",
                                "description": "",
                                "masking": false
                            }
                        ]
                    }
                ]
            },
            "value": {
                "inputFields": [
                    {
                        "namespace": "trino://localhost:8080",
                        "name": "pg.public.abc",
                        "field": "value",
                        "transformations": [
                            {
                                "type": "DIRECT",
                                "subtype": "IDENTITY",
                                "description": "",
                                "masking": false
                            }
                        ]
                    }
                ]
            }
        },
        "dataset": [
            {
                "namespace": "trino://localhost:8080",
                "name": "pg.public.abc",
                "field": "id",
                "transformations": [
                    {
                        "type": "INDIRECT",
                        "subtype": "JOIN",
                        "description": ""
                    },
                    {
                        "type": "INDIRECT",
                        "subtype": "FILTER",
                        "description": ""
                    }
                ]
            },
            {
                "namespace": "trino://localhost:8080",
                "name": "pg.public.cde",
                "field": "id",
                "transformations": [
                    {
                        "type": "INDIRECT",
                        "subtype": "JOIN",
                        "description": ""
                    }
                ]
            }
        ]
    }
}
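
To make the expected shape concrete, here is a minimal Python sketch that assembles the same "fields" and "dataset" sections from a hand-written description of the query above. It is only an illustration: `build_column_lineage`, `input_field`, and their parameters are hypothetical names and not part of the Trino plugin or the OpenLineage client. The DIRECT/INDIRECT types and the IDENTITY/JOIN/FILTER subtypes are taken from the expected facet above.

```python
from collections import defaultdict

NAMESPACE = "trino://localhost:8080"  # taken from the example events


def input_field(table, column, transformations):
    """Build one inputFields/dataset entry in the shape used by the facet."""
    return {
        "namespace": NAMESPACE,
        "name": table,
        "field": column,
        "transformations": transformations,
    }


def build_column_lineage(projections, join_columns, filter_columns):
    """Hypothetical helper: assemble "fields" and "dataset" for a simple query.

    projections:    {output_column: (source_table, source_column)}
    join_columns:   [(table, column), ...] used in JOIN conditions
    filter_columns: [(table, column), ...] used in WHERE
    """
    fields = {}
    for out_col, (table, column) in projections.items():
        # Columns copied unchanged into the output are DIRECT/IDENTITY.
        direct = {"type": "DIRECT", "subtype": "IDENTITY",
                  "description": "", "masking": False}
        fields[out_col] = {"inputFields": [input_field(table, column, [direct])]}

    # Group INDIRECT usages per source column, keeping first-seen order.
    indirect = defaultdict(list)
    for subtype, columns in (("JOIN", join_columns), ("FILTER", filter_columns)):
        for key in columns:
            indirect[key].append(
                {"type": "INDIRECT", "subtype": subtype, "description": ""})

    dataset = [input_field(t, c, tr) for (t, c), tr in indirect.items()]
    return {"fields": fields, "dataset": dataset}


# Description of the example query, written by hand for this sketch.
facet = build_column_lineage(
    projections={"id": ("pg.public.cde", "id"),
                 "value": ("pg.public.abc", "value")},
    join_columns=[("pg.public.abc", "id"), ("pg.public.cde", "id")],
    filter_columns=[("pg.public.abc", "id")],
)
```

In this sketch only columns used in JOIN or WHERE end up in "dataset", so abc.value appears under "fields" but not under "dataset", as in the expected facet.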

Instead, the integration currently produces an event like this:

{
    "columnLineage": {
        "_producer": "https://github.com/trinodb/trino/plugin/trino-openlineage",
        "_schemaURL": "https://openlineage.io/spec/facets/1-2-0/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet",
        "fields": {
            "id": {
                "inputFields": [
                    {
                        "namespace": "trino://localhost:8080",
                        "name": "pg.public.cde",
                        "field": "id"
                    }
                ]
            },
            "value": {
                "inputFields": [
                    {
                        "namespace": "trino://localhost:8080",
                        "name": "pg.public.abc",
                        "field": "value"
                    }
                ]
            }
        },
        "dataset": [
            {
                "namespace": "trino://localhost:8080",
                "name": "pg.public.cde",
                "field": "id"
            },
            {
                "namespace": "trino://localhost:8080",
                "name": "pg.public.abc",
                "field": "id"
            },
            {
                "namespace": "trino://localhost:8080",
                "name": "pg.public.abc",
                "field": "value"
            }
        ]
    }
}

ol_compex_query_start.json
ol_complex_query_complete.json

The transformations attribute is missing, and some fields (for example abc.value) are included in the columnLineage.dataset facet, although only some of them are part of INDIRECT column lineage (used in JOIN or WHERE, and not included in the final dataset).
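
A quick way to check an emitted event for this gap is to walk the facet and list every entry without transformations. The following Python sketch assumes the facet has already been parsed into a dict; `find_missing_transformations` is a hypothetical helper name, and the `actual` dict is a trimmed copy of the event shown above (namespace omitted for brevity).

```python
def find_missing_transformations(column_lineage):
    """Return labels of entries that lack a non-empty "transformations" list."""
    missing = []
    # Per-output-column lineage: fields.<column>.inputFields[*]
    for out_col, spec in column_lineage.get("fields", {}).items():
        for entry in spec.get("inputFields", []):
            if not entry.get("transformations"):
                missing.append(
                    f"fields.{out_col} <- {entry['name']}.{entry['field']}")
    # Dataset-level lineage: dataset[*]
    for entry in column_lineage.get("dataset", []):
        if not entry.get("transformations"):
            missing.append(f"dataset <- {entry['name']}.{entry['field']}")
    return missing


# Trimmed copy of the facet currently produced.
actual = {
    "fields": {
        "id": {"inputFields": [{"name": "pg.public.cde", "field": "id"}]},
        "value": {"inputFields": [{"name": "pg.public.abc", "field": "value"}]},
    },
    "dataset": [
        {"name": "pg.public.cde", "field": "id"},
        {"name": "pg.public.abc", "field": "id"},
        {"name": "pg.public.abc", "field": "value"},
    ],
}

for label in find_missing_transformations(actual):
    print(label)
```

For the event above every entry is reported, both under "fields" and under "dataset".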
