diff --git a/docs/40_tabular_data_wrangling/advanced_use_cases_polars.ipynb b/docs/40_tabular_data_wrangling/advanced_use_cases_polars.ipynb new file mode 100644 index 000000000..eb5e38c25 --- /dev/null +++ b/docs/40_tabular_data_wrangling/advanced_use_cases_polars.ipynb @@ -0,0 +1,131 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Advanced Use Cases with Polars\n", + "\n", + "In this notebook, we'll explore some advanced features of the Polars library, an efficient DataFrame library in Python.\n", + "We'll assume that you're familiar with the basics of Polars from a previous introduction." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import polars as pl\n\n", + "# Sample dataset:\n", + "data = {\n", + " \"product\": [\"A\", \"B\", \"C\"],\n", + " \"sales\": [100, 170, 90],\n", + " \"budget\": [80, 160, 95]\n", + "}\n", + "\n", + "# Create DataFrame\n", + "df = pl.DataFrame(data)\n", + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Advanced GroupBy Operations\n", + "\n", + "Polars allows you to perform complex grouping and aggregations efficiently. Let's see how to define a custom aggregation expression and apply it per group." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define a custom aggregation expression: the mean of (sales - budget)\n", + "custom_agg = (pl.col(\"sales\") - pl.col(\"budget\")).mean().alias(\"over_budget\")\n\n", + "# Group by product and apply the custom expression\n", + "result = df.group_by(\"product\").agg(custom_agg)\n", + "result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Join Operations\n", + "\n", + "Polars supports fast join operations between DataFrames. Here's an example of joining two DataFrames on a shared column."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "another_data = {\n", + " \"product\": [\"A\", \"B\", \"D\"],\n", + " \"profit\": [50, 60, 30]\n", + "}\n", + "\n", + "df2 = pl.DataFrame(another_data)\n\n", + "# Inner join on 'product': only products present in both DataFrames (A and B) are kept\n", + "joined_df = df.join(df2, on=\"product\", how=\"inner\")\n", + "joined_df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pivoting and Melting\n", + "\n", + "Polars can reshape DataFrames between long and wide layouts by pivoting and melting. The cell below shows a pivot." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Pivot: one row per product, one column per distinct budget value (Polars >= 1.0 renames `columns` to `on`)\n", + "pivot_df = df.pivot(values=\"sales\", index=\"product\", columns=\"budget\")\n", + "pivot_df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "This notebook provided a glimpse into more advanced operations in Polars: grouped aggregations, joins, and pivoting. For further learning, consider exploring the official Polars documentation and experimenting with these operations on larger datasets."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/40_tabular_data_wrangling/basic_table_handling_polars.ipynb b/docs/40_tabular_data_wrangling/basic_table_handling_polars.ipynb new file mode 100644 index 000000000..d7c7e052c --- /dev/null +++ b/docs/40_tabular_data_wrangling/basic_table_handling_polars.ipynb @@ -0,0 +1,130 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Basic Table Handling with Polars\n", + "\n", + "In this notebook, we introduce the Polars library for basic table handling. Polars is a fast DataFrame library implemented in Rust and designed to process large datasets efficiently.\n", + "\n", + "We will cover the following operations:\n", + "- Creating a DataFrame\n", + "- Basic DataFrame Operations\n", + "- Querying and Filtering\n", + "- Addition of New Columns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Import Polars\n", + "import polars as pl" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Creating a DataFrame\n", + "\n", + "We can create a DataFrame in Polars by passing a dictionary of columns to the `pl.DataFrame()` constructor.
Here's an example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = {\n", + " 'column_1': [1, 2, 3],\n", + " 'column_2': ['a', 'b', 'c']\n", + "}\n", + "\n", + "df = pl.DataFrame(data)\n", + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic DataFrame Operations\n", + "We can perform several basic operations on DataFrames such as selecting columns." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Selecting a single column\n", + "df.select('column_1')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Querying and Filtering\n", + "\n", + "Polars provides a powerful API to perform queries and filter data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Filtering rows where column_1 > 1\n", + "df.filter(pl.col('column_1') > 1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Addition of New Columns\n", + "\n", + "We can add new columns to the DataFrame as shown below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Add a new column (returns a new DataFrame; df itself is unchanged)\n", + "df.with_columns(pl.Series('column_3', [4, 5, 6]))" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/_toc.yml b/docs/_toc.yml index 73c440203..dd0120b5a 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -368,6 +368,8 @@ parts: - file: 40_tabular_data_wrangling/summarizing_subsets - file: 40_tabular_data_wrangling/pivot_tables - file: 40_tabular_data_wrangling/tidy_data + - file: 40_tabular_data_wrangling/basic_table_handling_polars + - file: 40_tabular_data_wrangling/advanced_use_cases_polars - file: 40a_sql/readme sections: @@ -429,4 +431,3 @@ parts: - file: 01_introduction/glossary - file: imprint -