This repository is a complete reference for everyday PySpark functions and workflows, built and tested on Databricks. It contains code snippets and use cases you'll often encounter when working with PySpark in real-world data engineering scenarios.
- Data Reading: Load data from multiple formats (CSV, JSON, etc.)
- Schema Definitions: Explicit schemas for better control
- Transformations: Joins, aggregations, window functions, and user-defined functions (UDFs)
- Data Writing: Save data back in multiple formats
- Spark SQL: Querying data with SQL syntax
- Practical Use Cases: Examples of transformations and everyday PySpark operations (see the sketch right after this list)
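To give a flavor of the snippets, here is a minimal, self-contained sketch touching most of the items above. The file paths, column names, and schema are illustrative placeholders, not files shipped with the repo:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("pyspark-reference").getOrCreate()

# Schema definition: explicit types instead of relying on inference
schema = StructType([
    StructField("name", StringType(), True),
    StructField("dept", StringType(), True),
    StructField("salary", IntegerType(), True),
])

# Data reading: load a CSV using the schema above (path is a placeholder)
df = spark.read.csv("data/employees.csv", header=True, schema=schema)

# Transformation: rank employees by salary within each department
w = Window.partitionBy("dept").orderBy(F.col("salary").desc())
ranked = df.withColumn("rank", F.row_number().over(w))

# UDF: plain Python applied as a column expression
initial = F.udf(lambda s: s[0].upper() if s else None, StringType())
df_with_initial = df.withColumn("initial", initial(F.col("name")))

# Spark SQL: the same data queried with SQL syntax
df.createOrReplaceTempView("employees")
top_paid = spark.sql(
    "SELECT dept, MAX(salary) AS max_salary FROM employees GROUP BY dept"
)

# Data writing: save results back out in a different format
ranked.write.mode("overwrite").parquet("output/ranked_employees")
```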
This repo is designed to run on Databricks Notebooks, but you can also run it locally with PySpark.
If running locally, install PySpark with:

```bash
pip install pyspark
```

To run on Databricks:
- Clone or import this repo into your Databricks workspace.
- Attach your notebook to a running cluster.
- Run the cells to explore PySpark functions (a sample cell is sketched below).
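Databricks notebooks come with a ready-made `spark` session, so a typical cell needs no setup boilerplate. A sketch of what a cell might look like (the path is a placeholder):

```python
# Databricks pre-creates `spark` for every notebook, so no builder is needed
df = spark.read.option("header", True).csv("/FileStore/sample.csv")  # placeholder path

# Register a temp view and query it with SQL syntax
df.createOrReplaceTempView("sample")
counts = spark.sql("SELECT COUNT(*) AS n FROM sample")

display(counts)  # Databricks' rich table renderer; use counts.show() elsewhere
```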
To run locally:
- Clone the repo:

  ```bash
  git clone https://github.com/NdukaClara/my_pyspark_reference_repo.git
  ```

- Run the scripts in Jupyter Notebook or your IDE (with PySpark installed); see the session sketch below.
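Outside Databricks there is no pre-built session, so each script has to create its own SparkSession first. A minimal sketch:

```python
from pyspark.sql import SparkSession

# Create a local session explicitly; `local[*]` uses all available cores
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("my_pyspark_reference_repo")
    .getOrCreate()
)

# Small in-memory DataFrame just to verify the setup works
df = spark.createDataFrame([("a", 1), ("b", 2)], ["key", "value"])
df.show()

spark.stop()
```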
```
my_pyspark_reference_repo/
├── notebooks/   # Databricks notebooks with examples
├── scripts/     # Python scripts for each concept
├── data/        # Sample datasets (if included)
└── README.md    # Project documentation
```
Instead of digging through documentation every time, this repo serves as a one-stop reference for PySpark on Databricks, perfect for learners and practitioners who want quick, practical examples.