PowerTools

PowerTools is a utility library for Python, Apache Spark, and AWS Glue Spark. It provides a collection of tools and functions to streamline data processing workflows.

Installation

You can install PowerTools using pip:

pip install powertools

Usage

Quick Start

from lps_glue import LPSGlue
from pyspark.sql import functions as f  # provides f.lit used below

# `path` is the location of your data; it is not defined in this snippet.
with LPSGlue(spark_shell=True) as lpsglue:
    df = lpsglue.read.csv(path)   # Read data from CSV
    df = lpsglue.tran.add_column(df, 'example_col1', f.lit('example'))  # Add column
    lpsglue.write.hudi(
        df=df,
        path=path,
        primary_key='pk1',
        partition_by=["part1", "part2"],
        order_by='ts',
        dedup=False
    )  # Write df in HUDI format
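
For readers unfamiliar with Hudi, the arguments in the Quick Start resemble the standard Hudi write options. The sketch below shows one plausible mapping. It is an assumption, not PowerTools' actual implementation: the helper `hudi_write_options` and the table name `example_table` are hypothetical, and `dedup` is left out because this README does not say what it controls.

```python
def hudi_write_options(table_name, primary_key, partition_by, order_by):
    """Build a Hudi option dict from Quick Start-style arguments (hypothetical helper)."""
    # Accept either a single column name or a list of column names
    partitions = [partition_by] if isinstance(partition_by, str) else list(partition_by)
    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": primary_key,
        "hoodie.datasource.write.partitionpath.field": ",".join(partitions),
        # The precombine field decides which record wins when keys collide
        "hoodie.datasource.write.precombine.field": order_by,
        "hoodie.datasource.write.operation": "upsert",
    }
    if len(partitions) > 1:
        # Several partition columns may need the complex key generator,
        # depending on the Hudi version in use
        options["hoodie.datasource.write.keygenerator.class"] = (
            "org.apache.hudi.keygen.ComplexKeyGenerator"
        )
    return options


options = hudi_write_options("example_table", "pk1", ["part1", "part2"], "ts")
for key, value in sorted(options.items()):
    print(f"{key} = {value}")

# With a live SparkSession and the Hudi bundle available, the plain-Spark write would be:
# df.write.format("hudi").options(**options).mode("append").save(path)
```

Building the options as a plain dict keeps the mapping easy to inspect and test without a Spark cluster.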

Python Utilities

*Work in progress.* Planned utilities include:

- data manipulation using pandas
- parallelization using concurrent.futures

More to come. Stay tuned for updates!
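
Until these utilities land, the parallelization idea can be sketched with the standard library alone. The helper names `parallel_map` and `row_count` are hypothetical and do not come from PowerTools; the example only illustrates the `concurrent.futures` pattern named above.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(func, items, max_workers=4):
    """Apply func to each item in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


def row_count(csv_text):
    """Count non-empty data lines in a CSV string, excluding the header."""
    lines = [line for line in csv_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


counts = parallel_map(row_count, ["a,b\n1,2\n3,4\n", "a,b\n5,6\n"])
print(counts)  # [2, 1]
```

Threads suit I/O-bound work such as reading many files; for CPU-bound work, `ProcessPoolExecutor` from the same module is the usual alternative.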

Spark Utilities

*Coming Soon*

Glue Spark Utilities

Glue Spark Utilities has five main modules: Read, Tran, Write, Log, and AWS.

1. Read

Read data in any of the supported formats (CSV, Parquet, Hudi, Delta Lake) using Spark, without installing extra dependencies yourself.

CSV

  lpsglue.read.csv(path=filename)

PARQUET

  lpsglue.read.parquet(path=filename)

HUDI

  lpsglue.read.hudi(path=filename)

DELTA LAKE

  lpsglue.read.delta(path=filename)
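
For comparison, the four readers above correspond roughly to the following plain PySpark calls. This is a sketch under assumptions: `read_table` is a hypothetical helper, not part of PowerTools, and the CSV options (`header`, `inferSchema`) are illustrative defaults that the library may or may not use. The Hudi and Delta Lake branches also assume the matching connectors are available to the Spark session.

```python
def read_table(spark, fmt, path):
    """Dispatch to the matching Spark reader (hypothetical helper, not PowerTools code)."""
    fmt = fmt.lower()
    if fmt == "csv":
        # Treat the first line as a header and let Spark infer column types
        return spark.read.csv(path, header=True, inferSchema=True)
    if fmt == "parquet":
        return spark.read.parquet(path)
    if fmt in ("hudi", "delta"):
        # Both formats are read through the generic format/load API
        return spark.read.format(fmt).load(path)
    raise ValueError(f"unsupported format: {fmt}")


# Usage, given an existing SparkSession named `spark` and a data location `filename`:
# df = read_table(spark, "hudi", filename)
```

The function only relies on the `spark.read` interface, so it can be exercised with a stub object in unit tests before running on a cluster.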

2. Tran

3. Write

4. Log

5. AWS

Contributing

We welcome contributions to PowerTools! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request on our GitHub repository.

License

PowerTools is licensed under the MIT License.
