GitHub - MobileTeleSystems/data-rentgen: NextGen DataMotion Lineage

What is Data.Rentgen?

Data.Rentgen is a Data Motion Lineage service compatible with the OpenLineage specification.

Note: the service is under active development and is not ready for use yet.

Goals

Collect lineage events produced by OpenLineage clients and integrations.
Store operation-grained events for finer detail (instead of the job-grained events stored by Marquez).
Provide an API for fetching job/run ↔ dataset lineage, rather than the dataset ↔ dataset lineage offered by Datahub and OpenMetadata.
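
To make the first goal concrete, here is a minimal sketch of the kind of event an OpenLineage client emits, built as a plain Python dict. The top-level field names follow the OpenLineage run event structure; the helper `build_run_event`, the producer URL, the pinned spec version in `schemaURL` and the sample dataset names are assumptions for illustration, not part of Data.Rentgen.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, job_namespace, job_name, inputs, outputs, run_id=None):
    """Build a minimal OpenLineage-style run event as a plain dict.

    `inputs` and `outputs` are lists of (namespace, name) tuples.
    """
    return {
        "eventType": event_type,  # e.g. START, RUNNING, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Placeholder producer URL (assumption, illustration only).
        "producer": "https://example.com/hypothetical-producer",
        # Spec version is an assumption; use the one your client targets.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
    }


if __name__ == "__main__":
    event = build_run_event(
        "COMPLETE",
        "hypothetical_namespace",
        "hypothetical_job",
        inputs=[("hdfs://namenode", "/data/source")],
        outputs=[("hdfs://namenode", "/data/target")],
    )
    print(json.dumps(event, indent=2))
```

Each event ties one run of a job to the datasets it read and wrote, which is the job/run ↔ dataset relation the API is meant to expose.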

Features

Support consuming large volumes of lineage events, using Apache Kafka as an event buffer.
Store data in tables partitioned by event timestamp to speed up lineage graph resolution.
Build the lineage graph within user-specified time boundaries (unlike Marquez, where lineage is built only for the last job run).
Build the lineage graph at different granularities, e.g. merge all individual Spark operations into a Spark applicationId or Spark applicationName.
Column-level lineage support.
Authentication support.
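
The time-boundary and granularity features above can be illustrated with a small in-memory sketch. It is not how Data.Rentgen is implemented: the `OperationEdge` record, the `lineage_edges` function and the granularity names are hypothetical, chosen only to show how operation-level records can be filtered by a time window and then merged into application-level nodes.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class OperationEdge:
    """One operation-level lineage record (hypothetical shape)."""

    application: str  # e.g. a Spark applicationName
    operation: str  # an individual Spark operation
    dataset: str
    direction: str  # "input" or "output"
    event_time: datetime


def lineage_edges(records, since, until, granularity="operation"):
    """Return unique (node, direction, dataset) edges with since <= event_time < until."""
    edges = set()
    for rec in records:
        # Time boundaries: skip records outside the requested window.
        if not (since <= rec.event_time < until):
            continue
        if granularity == "operation":
            node = f"{rec.application}/{rec.operation}"
        elif granularity == "application":
            # Merge all operations of one application into a single node.
            node = rec.application
        else:
            raise ValueError(f"unknown granularity: {granularity!r}")
        edges.add((node, rec.direction, rec.dataset))
    return sorted(edges)


if __name__ == "__main__":
    noon = datetime(2024, 1, 1, 12)
    records = [
        OperationEdge("etl_app", "op1", "raw.events", "input", noon),
        OperationEdge("etl_app", "op2", "raw.events", "input", noon),
        OperationEdge("etl_app", "op2", "mart.daily", "output", noon),
    ]
    window = (datetime(2024, 1, 1), datetime(2024, 1, 2))
    print(lineage_edges(records, *window, granularity="operation"))
    print(lineage_edges(records, *window, granularity="application"))
```

At application granularity the two reads of `raw.events` collapse into a single edge, so the graph has fewer nodes while the dataset relations are preserved.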

Non-goals

This is not a Data Catalog: Data.Rentgen doesn't track dataset schema changes, owners and so on. Use Datahub or OpenMetadata instead.
Static data lineage, such as view → table, is not supported.

Limitations

For now, only Apache Spark, Apache Airflow, Apache Flink and DBT are supported as lineage event sources. OpenLineage also supports Hive, Trino and other lineage sources; Data.Rentgen support for them may be added later.
Unlike Marquez, Data.Rentgen parses only a limited set of the facets sent by OpenLineage and doesn't store custom facets. This may change in the future.
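
The facet limitation can be pictured as an allow-list applied to each incoming facet mapping. The sketch below is only an illustration: `KNOWN_DATASET_FACETS` and `split_facets` are hypothetical names, and the two facets listed are examples from the OpenLineage specification, not a statement of which facets Data.Rentgen actually parses.

```python
# Hypothetical allow-list: the real set of facets parsed by the service
# is not stated in this README.
KNOWN_DATASET_FACETS = frozenset({"schema", "dataSource"})


def split_facets(facets, known=KNOWN_DATASET_FACETS):
    """Split a facet mapping into (kept facets, sorted names of dropped facets)."""
    kept = {name: body for name, body in facets.items() if name in known}
    dropped = sorted(name for name in facets if name not in known)
    return kept, dropped


if __name__ == "__main__":
    incoming = {
        "schema": {"fields": [{"name": "id", "type": "bigint"}]},
        "myCompany_customFacet": {"team": "analytics"},  # custom facet, not stored
    }
    kept, dropped = split_facets(incoming)
    print("kept:", list(kept))
    print("dropped:", dropped)
```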

Documentation

Screenshots

Lineage graph

Datasets

Runs

Spark application

Spark run

Spark operation

Airflow DagRun

Airflow TaskInstance
