JobEvent with outputs populated fails to write with nullPointerException #2925

@seanmullane

Description

Emitting a JobEvent with input and/or output datasets causes an HTTP 500 error from the API, which results from a NullPointerException in Marquez.

Fixing this is important because it allows static lineage graphs to be generated without being associated with active runs. This is useful when an integration is not yet available to consume pipeline runs for a given system, or when a pipeline is not yet fleshed out but we want to enter the job in Marquez to see how it would relate to other jobs.

The attachment includes a pure JSON version of the event, generated by the OpenLineage client, that triggers the bug in Marquez. I also included the Python code that the JSON derives from and the Marquez error log.
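For readers without the attachment, here is a minimal sketch of roughly what such a run-less JobEvent payload might look like. This is not the content of reproduce_bug.zip. The `build_job_event` helper, the producer URL, the dataset names, and the exact `schemaURL` version are all assumptions for illustration. The empty `inputFacets`/`outputFacets` objects mirror the shape described in the Slack comment further down.

```python
import json
from datetime import datetime, timezone


def build_job_event(namespace, job_name, inputs, outputs):
    """Build a JobEvent-shaped payload as a plain dict (hypothetical helper).

    Note that there is no "run" key: a JobEvent describes a job and its
    datasets without being tied to a run.
    """
    return {
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Assumed placeholder values; substitute your own producer and schema URL.
        "producer": "https://example.com/hypothetical-producer",
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/JobEvent",
        "job": {"namespace": namespace, "name": job_name, "facets": {}},
        "inputs": [
            {"namespace": namespace, "name": name, "facets": {}, "inputFacets": {}}
            for name in inputs
        ],
        "outputs": [
            {"namespace": namespace, "name": name, "facets": {}, "outputFacets": {}}
            for name in outputs
        ],
    }


# Hypothetical names, for illustration only.
event = build_job_event("example_ns", "static_job", ["raw_table"], ["clean_table"])
payload = json.dumps(event, indent=2)
```

Posting `payload` to the Marquez lineage endpoint (`POST /api/v1/lineage` in a default setup) would be the step that surfaces the HTTP 500, assuming this shape matches the attached event.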

Environment:

Marquez 0.49.0 running via docker-compose per the Marquez example with --seed
openlineage-python 1.22.0
Python 3.11.9

nullPointerException.txt
reproduce_bug.zip

More detail on this from phix on Slack:

It looks like we’re not processing the “outputFacets” on the IO fields without a runId provided. The event should save if you drop that field that’s the empty object for now… We should take a look at the OL spec for this
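The workaround suggested above could be sketched as a small pre-processing step on the event before it is sent. This is only an illustration of the idea. `drop_empty_io_facets` is a hypothetical helper, the sample event is invented, and also stripping empty `inputFacets` is an assumption that goes beyond the `outputFacets` field named in the Slack comment. Whether this avoids the NullPointerException has not been verified here.

```python
import copy


def drop_empty_io_facets(event):
    """Return a copy of the event with empty inputFacets/outputFacets removed.

    Hypothetical helper: non-empty facet objects are left untouched, and the
    original event dict is not modified.
    """
    cleaned = copy.deepcopy(event)
    for list_key, facet_key in (("inputs", "inputFacets"), ("outputs", "outputFacets")):
        for dataset in cleaned.get(list_key) or []:
            # Remove the field only when it is present and empty ({} or None).
            if facet_key in dataset and not dataset[facet_key]:
                del dataset[facet_key]
    return cleaned


# Invented sample event without a run, for illustration only.
sample_event = {
    "job": {"namespace": "example_ns", "name": "static_job"},
    "inputs": [{"namespace": "example_ns", "name": "raw_table", "inputFacets": {}}],
    "outputs": [{"namespace": "example_ns", "name": "clean_table", "outputFacets": {}}],
}

cleaned_event = drop_empty_io_facets(sample_event)
```

Working on a deep copy keeps the original event available for comparison, which is handy when checking whether the stripped field is really what triggers the error.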

Labels: bug

Status: Done
