131 changes: 131 additions & 0 deletions docs/40_tabular_data_wrangling/advanced_use_cases_polars.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,131 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Advanced Use Cases with Polars\n",
"\n",
"In this notebook, we'll explore some advanced features of Polars, an efficient DataFrame library for Python.\n",
"We assume you are familiar with the basics covered in the notebook on basic table handling with Polars."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"\n",
"# Sample dataset:\n",
"data = {\n",
"    \"product\": [\"A\", \"B\", \"C\"],\n",
"    \"sales\": [100, 170, 90],\n",
"    \"budget\": [80, 160, 95]\n",
"}\n",
"\n",
"# Create DataFrame\n",
"df = pl.DataFrame(data)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced GroupBy Operations\n",
"\n",
"Polars performs grouping and aggregation efficiently. Aggregations are written as expressions, so custom ones can be composed from column arithmetic and built-in reductions such as `mean`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Define an aggregation expression: the mean amount by which sales exceed budget\n",
"custom_agg = (pl.col(\"sales\") - pl.col(\"budget\")).mean().alias(\"over_budget\")\n",
"\n",
"# Group by product and apply the expression (`group_by` replaced `groupby` in Polars 0.19)\n",
"result = df.group_by(\"product\").agg(custom_agg)\n",
"result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Join Operations\n",
"\n",
"Polars supports fast joins between DataFrames. The example below performs an inner join, which keeps only the products present in both tables."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"another_data = {\n",
"    \"product\": [\"A\", \"B\", \"D\"],\n",
"    \"profit\": [50, 60, 30]\n",
"}\n",
"\n",
"df2 = pl.DataFrame(another_data)\n",
"\n",
"# Perform an inner join on 'product'\n",
"joined_df = df.join(df2, on=\"product\", how=\"inner\")\n",
"joined_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pivoting and Melting\n",
"\n",
"Polars can reshape DataFrames by pivoting (long to wide) and melting (wide to long). The example below shows a pivot."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
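"# Pivot: one row per product, one column per distinct budget value, filled with sales\n",
"# (the `on` keyword requires Polars 1.0 or later; earlier releases call it `columns`)\n",
"pivot_df = df.pivot(on=\"budget\", index=\"product\", values=\"sales\")\n",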
"pivot_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"This notebook gave a brief look at grouping and aggregation, joins, and pivoting with Polars. To learn more, consult the official Polars documentation and try these operations on your own data."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
130 changes: 130 additions & 0 deletions docs/40_tabular_data_wrangling/basic_table_handling_polars.ipynb
@@ -0,0 +1,130 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Basic Table Handling with Polars\n",
"\n",
"This notebook introduces basic table handling with Polars, a fast DataFrame library implemented in Rust and designed to process large datasets efficiently.\n",
"\n",
"We will cover the following operations:\n",
"- Creating a DataFrame\n",
"- Basic DataFrame Operations\n",
"- Querying and Filtering\n",
"- Adding new columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import Polars\n",
"import polars as pl"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating a DataFrame\n",
"\n",
"We can create a DataFrame from a dictionary of columns with the `pl.DataFrame()` constructor. Here's an example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = {\n",
"    'column_1': [1, 2, 3],\n",
"    'column_2': ['a', 'b', 'c']\n",
"}\n",
"\n",
"df = pl.DataFrame(data)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic DataFrame Operations\n",
"\n",
"DataFrames support several basic operations, such as selecting columns with `select`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Selecting a single column\n",
"df.select('column_1')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Querying and Filtering\n",
"\n",
"Rows can be filtered with `filter`, which takes a boolean expression built from `pl.col`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Filtering rows where column_1 > 1\n",
"df.filter(pl.col('column_1') > 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adding New Columns\n",
"\n",
"We can add new columns with `with_columns`, which returns a new DataFrame and leaves the original unchanged."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Add a new column from a Series; the result is a new DataFrame, `df` itself is not modified\n",
"df.with_columns(pl.Series('column_3', [4, 5, 6]))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
3 changes: 2 additions & 1 deletion docs/_toc.yml
@@ -368,6 +368,8 @@ parts:
- file: 40_tabular_data_wrangling/summarizing_subsets
- file: 40_tabular_data_wrangling/pivot_tables
- file: 40_tabular_data_wrangling/tidy_data
- file: 40_tabular_data_wrangling/basic_table_handling_polars
- file: 40_tabular_data_wrangling/advanced_use_cases_polars

- file: 40a_sql/readme
sections:
@@ -429,4 +431,3 @@ parts:
- file: 01_introduction/glossary

- file: imprint