An easy-to-use Python utility class for accessing incremental data from Hudi Data Lakes

One of Hudi's key features is its support for incremental data processing: Hudi can process only the changes made since the last run instead of reprocessing the entire dataset. On large tables with a small share of changed records, this can cut processing time significantly.
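As a minimal sketch of what such an incremental read could look like, the helper below builds the Hudi datasource options for an incremental query. The function name `incremental_read_options` and the example commit times are hypothetical and are not taken from this repository. Whether the begin instant is treated as inclusive or exclusive depends on the Hudi version.

```python
from typing import Dict, Optional


def incremental_read_options(
    begin_instant: str, end_instant: Optional[str] = None
) -> Dict[str, str]:
    """Build Hudi datasource options for an incremental query.

    begin_instant is a commit time on the Hudi timeline, for example
    "20230101000000". Only changes from that point on are read.
    """
    options = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    if end_instant is not None:
        # Optional upper bound on the commit range.
        options["hoodie.datasource.read.end.instanttime"] = end_instant
    return options


# Assumed usage with a SparkSession that has the Hudi bundle available
# (spark and table_path are placeholders, not defined here):
# df = (
#     spark.read.format("hudi")
#     .options(**incremental_read_options("20230101000000"))
#     .load(table_path)
# )
```

Keeping the option building separate from the Spark call makes the commit window easy to inspect and test without a cluster.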

Let's look at how to use Hudi incremental data processing to power downstream systems, such as search applications like Elasticsearch, relational databases, and non-relational databases.

This repository provides an easy-to-use Python utility class for accessing incremental data from Hudi data lakes. Its logic is summarized in the flow chart under Code Logic.

Please fork the repository and submit a pull request if you notice any flaws or have ideas for improving the template.

Videos

https://www.youtube.com/watch?v=c6DCJR91rBQ&t=105s
https://www.youtube.com/watch?v=Ls--9CnweoY

How To Use

Code Logic

PlantUML

NOTE: Make sure your environment variables are set for the AWS access key and secret key.
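A quick check before running the template can catch missing credentials early. This sketch assumes the standard AWS variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; the template itself may read different names, and the helper `missing_aws_env_vars` is hypothetical.

```python
import os
from typing import List, Mapping

# Standard AWS SDK variable names (assumed, the template may use others).
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


# Example:
# missing = missing_aws_env_vars()
# if missing:
#     print("Missing environment variables:", ", ".join(missing))
```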

Demo

Performing some inserts

Running template

Metadata File on S3
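One plausible way a metadata file can track progress between runs is to store the last processed commit time and use it as the starting point of the next incremental pull. The sketch below assumes a small JSON file with a single `last_commit_time` field and uses a local path in place of S3 so that it runs without AWS access. The file layout and all function names are hypothetical and are not taken from the repository.

```python
import json
from pathlib import Path
from typing import Optional


def load_checkpoint(path: Path) -> Optional[str]:
    """Return the last processed commit time, or None on the first run."""
    if not path.exists():
        return None
    return json.loads(path.read_text())["last_commit_time"]


def save_checkpoint(path: Path, commit_time: str) -> None:
    """Persist the newest commit time that was processed in this run."""
    path.write_text(json.dumps({"last_commit_time": commit_time}))


def next_begin_instant(path: Path, default: str = "000") -> str:
    """Pick the begin instant for the next incremental query.

    Falls back to a very early instant when no checkpoint exists yet,
    so the first run reads every commit.
    """
    checkpoint = load_checkpoint(path)
    return checkpoint if checkpoint is not None else default
```

With this pattern, the first run reads all commits and records the latest commit time, and each later run reads only what was committed after that point.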

Performing one more insert

Running template
