[BUG] Symlink facet creates an empty namespace with 0 datasets in it #2645

@dkt-sophie-ly

Hi all!

Following our conversation on Slack, I'm opening this issue.

When a symlink is specified, an empty namespace is created with no datasets in it.

Steps to reproduce:

I'm sending this event to Marquez:

{
  "job": {
    "name": "symlink_job_0",
    "facets": {},
    "namespace": "test_namespace"
  },
  "run": {
    "runId": "3c50b3d8-6776-11ee-9331-00163e015999",
    "facets": {}
  },
  "inputs": [
    {
      "name": "dataset_0",
      "facets": {
        "symlinks": {
          "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/client/python",
          "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json",
          "identifiers": [
            {
              "name": "symlink_prefix",
              "type": "DB_TABLE",
              "namespace": "s3://symlynk_test"
            }
          ]
        },
        "SchemaDatasetFacet": {...}
      },
      "namespace": "test_namespace"
    }
  ],
  "outputs": [...],
  "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.17.0/client/python",
  "eventTime": "2023-10-10T14:06:41.079514Z",
  "eventType": "START",
  "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
}
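
For anyone who wants to reproduce this without the Python client, here is a minimal sketch using only the standard library. It assumes a Marquez instance reachable at http://localhost:5000 (an assumption, adjust to your deployment) and uses the `/api/v1/lineage` and `/api/v1/namespaces/{namespace}/datasets` endpoints. The helper names (`build_event`, `datasets_url`, `post_event`, `list_datasets`) are illustrative, and the parts elided above (`SchemaDatasetFacet`, `outputs`) are simply left out or empty here.

```python
import json
import urllib.parse
import urllib.request

MARQUEZ_URL = "http://localhost:5000"  # assumption: local Marquez API address


def build_event():
    """Return the START event from this issue; elided parts are left out or empty."""
    return {
        "eventType": "START",
        "eventTime": "2023-10-10T14:06:41.079514Z",
        "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.17.0/client/python",
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
        "run": {"runId": "3c50b3d8-6776-11ee-9331-00163e015999", "facets": {}},
        "job": {"namespace": "test_namespace", "name": "symlink_job_0", "facets": {}},
        "inputs": [
            {
                "namespace": "test_namespace",
                "name": "dataset_0",
                "facets": {
                    "symlinks": {
                        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/client/python",
                        "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json",
                        "identifiers": [
                            {
                                "namespace": "s3://symlynk_test",
                                "name": "symlink_prefix",
                                "type": "DB_TABLE",
                            }
                        ],
                    }
                    # SchemaDatasetFacet is elided in the issue, so it is omitted here
                },
            }
        ],
        "outputs": [],  # elided in the issue; empty for this sketch
    }


def datasets_url(namespace, base=MARQUEZ_URL):
    """Build the dataset-listing URL; the namespace must be percent-encoded."""
    encoded = urllib.parse.quote(namespace, safe="")
    return f"{base}/api/v1/namespaces/{encoded}/datasets"


def post_event(event, base=MARQUEZ_URL):
    """POST the event to the lineage endpoint and return the HTTP status code."""
    request = urllib.request.Request(
        f"{base}/api/v1/lineage",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def list_datasets(namespace, base=MARQUEZ_URL):
    """Return the datasets of a namespace (assumes a top-level 'datasets' key)."""
    with urllib.request.urlopen(datasets_url(namespace, base)) as response:
        return json.load(response).get("datasets", [])


# Usage (needs a running Marquez, so it is not executed here):
#   post_event(build_event())
#   print(list_datasets("s3://symlynk_test"))
```

With the behavior described in this issue, the second call should presumably come back empty even though the namespace itself exists.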

Current behavior:

It creates an empty namespace named 's3://symlynk_test' that contains no datasets.

Expected behavior:

I'm not sure how this should be implemented, but in our example s3://symlynk_test.symlink_prefix and test_namespace.dataset_0 refer to the same dataset, so they should share the same lineage.
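
To make the expected semantics concrete, here is a rough sketch of treating symlink identifiers as aliases of the dataset that declares them. This is not how Marquez stores or resolves symlinks; `dataset_aliases` and `build_alias_index` are hypothetical helpers that only illustrate the idea that a lookup by either name should land on the same dataset.

```python
def dataset_aliases(dataset):
    """Return every (namespace, name) pair identifying the dataset, primary first."""
    aliases = [(dataset["namespace"], dataset["name"])]
    symlinks = dataset.get("facets", {}).get("symlinks", {})
    for identifier in symlinks.get("identifiers", []):
        pair = (identifier["namespace"], identifier["name"])
        if pair not in aliases:
            aliases.append(pair)
    return aliases


def build_alias_index(datasets):
    """Map each alias to the primary (namespace, name) of the dataset declaring it."""
    index = {}
    for dataset in datasets:
        primary = (dataset["namespace"], dataset["name"])
        for alias in dataset_aliases(dataset):
            # The first dataset to claim an alias keeps it.
            index.setdefault(alias, primary)
    return index


# The input dataset from the event above, reduced to the relevant fields.
EXAMPLE_INPUT = {
    "namespace": "test_namespace",
    "name": "dataset_0",
    "facets": {
        "symlinks": {
            "identifiers": [
                {
                    "namespace": "s3://symlynk_test",
                    "name": "symlink_prefix",
                    "type": "DB_TABLE",
                }
            ]
        }
    },
}

ALIAS_INDEX = build_alias_index([EXAMPLE_INPUT])
```

In this sketch, looking up `("s3://symlynk_test", "symlink_prefix")` in `ALIAS_INDEX` resolves to `("test_namespace", "dataset_0")`, so both names would point at one dataset and one lineage graph.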

Metadata

Assignees: No one assigned
Labels: bug
Status: Done