An easy-to-use Python utility class for accessing incremental data from Hudi Data Lakes

One of Hudi's key features is its support for incremental data processing: Hudi can process only the changes made since the last run, rather than reprocessing the entire dataset every time. This can significantly reduce processing time.
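As a rough illustration of what an incremental read looks like, the sketch below builds the Spark DataSource read options for a Hudi incremental query. The option keys are Hudi read options; the helper name `incremental_read_options`, the example instant time, and the placeholder table path are hypothetical and are not taken from this repository.

```python
from typing import Dict, Optional


def incremental_read_options(begin_instant: str,
                             end_instant: Optional[str] = None) -> Dict[str, str]:
    """Build read options for a Hudi incremental query (hypothetical helper)."""
    options = {
        # Ask Hudi for changes only, instead of a full snapshot.
        "hoodie.datasource.query.type": "incremental",
        # Instant time from which to start pulling changes.
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    if end_instant is not None:
        # Optional upper bound on the commits to read.
        options["hoodie.datasource.read.end.instanttime"] = end_instant
    return options


# Example instant time in Hudi's yyyyMMddHHmmss style (made up for illustration).
opts = incremental_read_options("20230101000000")

# With a SparkSession that has the Hudi bundle available (assumed, not shown):
# df = spark.read.format("hudi").options(**opts).load("s3a://<bucket>/<table-path>")
```

Keeping the options in a plain dictionary makes it easy to log or test the query configuration without starting Spark.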

This project shows how to use Hudi incremental data processing to power downstream systems, such as search applications like Elasticsearch, relational databases, and non-relational databases.

The code logic of the utility class is shown in the flow chart under Code Logic below.

Please fork the repository and submit a merge request if you notice any flaws or have ideas to improve the template.

Videos


How To Use

Snap (1)


Code Logic

incremental drawio

PlantUML

image

  • NOTE: Make sure your environment variables are set for the AWS access key and secret key.
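A quick way to check this before running the template is sketched below. It assumes the standard AWS SDK variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; the template itself may read different names, and the helper `missing_aws_vars` is hypothetical.

```python
import os
from typing import List, Mapping

# Assumed variable names (the standard ones used by AWS SDKs).
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required AWS variable names that are unset or empty."""
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```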

Demo

Performing some inserts

image

Running template

image

Metadata File on S3

image

image
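The metadata file suggests a checkpoint pattern: remember the last commit that was processed, then ask only for newer commits on the next run. The sketch below illustrates that pattern under stated assumptions and is not the repository's actual implementation. `InMemoryStore` stands in for the S3 metadata file, `pull_new_records` and the `last_commit` key are hypothetical names, and `"000"` is an assumed starting value meaning "from the beginning". `_hoodie_commit_time` is the Hudi meta column that carries each record's commit time.

```python
import json
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


class InMemoryStore:
    """Stand-in for the S3 metadata file so the sketch runs locally."""

    def __init__(self) -> None:
        self._blob: Optional[str] = None

    def read(self) -> Optional[str]:
        return self._blob

    def write(self, data: str) -> None:
        self._blob = data


def pull_new_records(store: InMemoryStore,
                     fetch_since: Callable[[str], List[Record]]) -> List[Record]:
    """Fetch records newer than the stored checkpoint, then advance it."""
    raw = store.read()
    # No checkpoint yet: start from the assumed earliest instant.
    last_commit = json.loads(raw)["last_commit"] if raw else "000"
    records = fetch_since(last_commit)
    if records:
        newest = max(r["_hoodie_commit_time"] for r in records)
        store.write(json.dumps({"last_commit": newest}))
    return records


# A fake table; a real fetch_since would run a Hudi incremental query instead.
table: List[Record] = [
    {"id": "1", "_hoodie_commit_time": "20230101000000"},
    {"id": "2", "_hoodie_commit_time": "20230101000000"},
]


def fetch_since(commit: str) -> List[Record]:
    # Fixed-width timestamp strings compare correctly as plain strings.
    return [r for r in table if r["_hoodie_commit_time"] > commit]


store = InMemoryStore()
first_run = pull_new_records(store, fetch_since)   # both initial inserts
second_run = pull_new_records(store, fetch_since)  # nothing new yet
table.append({"id": "3", "_hoodie_commit_time": "20230102000000"})
third_run = pull_new_records(store, fetch_since)   # only the new insert
```

This mirrors the demo flow: after the initial inserts the first run returns everything, and after one more insert the next run returns only the new record.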

Performing one more insert

image

Running template

image