jeffreykky/powertools

PowerTools

PowerTools is a utility library designed to simplify and enhance your experience with Python, Apache Spark, and AWS Glue Spark. It provides a collection of tools and functions to streamline your data processing workflows.

Installation

You can install PowerTools using pip:

pip install powertools

Usage

Quick Start

from pyspark.sql import functions as f

from lps_glue import LPSGlue

# `path` must be defined beforehand and point to the data location.
with LPSGlue(spark_shell=True) as lpsglue:
    df = lpsglue.read.csv(path)  # Read data from CSV
    df = lpsglue.tran.add_column(df, 'example_col1', f.lit('example'))  # Add a column
    lpsglue.write.hudi(
        df=df,
        path=path,
        primary_key='pk1',
        partition_by=["part1", "part2"],
        order_by='ts',
        dedup=False,
    )  # Write df in Hudi format
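
For readers who want to see what a Hudi write like the one in the Quick Start involves in plain PySpark, here is a minimal sketch. It is not PowerTools code: `hudi_write_options` and its `table_name` argument are hypothetical, and the mapping from `primary_key`, `partition_by`, `order_by`, and `dedup` to Hudi options is an assumption about what such a wrapper might do. The option keys themselves are standard Apache Hudi datasource settings.

```python
def hudi_write_options(table_name, primary_key, partition_by, order_by, dedup=False):
    """Build a dict of Hudi datasource options (hypothetical helper, not part of PowerTools)."""
    return {
        "hoodie.table.name": table_name,
        # Column that uniquely identifies a record.
        "hoodie.datasource.write.recordkey.field": primary_key,
        # Partition columns, comma-separated.
        "hoodie.datasource.write.partitionpath.field": ",".join(partition_by),
        # Column used to choose between records that share a key.
        "hoodie.datasource.write.precombine.field": order_by,
        # Assumed mapping for `dedup`: drop duplicate records on insert.
        "hoodie.datasource.write.insert.drop.duplicates": str(dedup).lower(),
    }


opts = hudi_write_options("example_table", "pk1", ["part1", "part2"], "ts")

# With a Spark session that has the Hudi connector available, the options
# could then be applied like this:
# df.write.format("hudi").options(**opts).mode("append").save(path)
```

Keeping the options in a plain dictionary makes them easy to inspect or override before the write is issued.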

Python Utilities

*Work in progress:*
  1. Data manipulation using pandas
  2. Parallelization using concurrent.futures
  3. More to come. Stay tuned for updates!
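
Until the Python utilities are published, the kind of parallelization mentioned in the list above can be done with the standard library alone. The sketch below is only an illustration: `parallel_map` and `word_count` are hypothetical names and are not part of PowerTools.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(func, items, max_workers=4):
    """Apply func to every item using a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(func, items))


def word_count(text):
    """Count whitespace-separated words in a string."""
    return len(text.split())


if __name__ == "__main__":
    print(parallel_map(word_count, ["a b c", "d e", "f"]))
```

Threads suit I/O-bound work such as file or network access; for CPU-bound functions, `ProcessPoolExecutor` from the same module is usually the better fit.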

Spark Utilities

*Coming Soon*

Glue Spark Utilities

There are 5 main modules available in Glue Spark Utilities.

1. Read

Read data in any of the supported formats with Spark, without installing additional dependencies.

CSV

  lpsglue.read.csv(path=filename)

PARQUET

  lpsglue.read.parquet(path=filename)

HUDI

  lpsglue.read.hudi(path=filename)

DELTA LAKE

  lpsglue.read.delta(path=filename)
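
For comparison, the four readers above correspond roughly to PySpark's generic `DataFrameReader` interface. The sketch below is an assumption about what such a wrapper could look like, not the PowerTools implementation: `read_any` and `SUPPORTED_FORMATS` are hypothetical names, and the Hudi and Delta Lake formats only work when their connectors are available in the Spark session.

```python
SUPPORTED_FORMATS = ("csv", "parquet", "hudi", "delta")


def read_any(spark, fmt, path, **options):
    """Read `path` with the given Spark data source format (hypothetical helper).

    `spark` is expected to be a pyspark.sql.SparkSession; it is passed in
    so this function itself does not need to import pyspark.
    """
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {fmt!r}")
    # DataFrameReader.format(...).options(...).load(...) is the generic read path.
    return spark.read.format(fmt).options(**options).load(path)


# Example usage with a real session:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# df = read_any(spark, "csv", filename, header="true")
```

Validating the format name up front gives a clear error before Spark tries to resolve an unknown data source.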

2. Tran

3. Write

4. Log

5. AWS

Contributing

We welcome contributions to PowerTools! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request on our GitHub repository.

License

PowerTools is licensed under the MIT License.
