From 0304e173ede975468aa335b824032f86f61e686d Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 19 Aug 2024 17:15:37 +0000 Subject: [PATCH 1/3] Created a new Jupyter notebook to demonstrate basic table handling with the Polars library. --- .../basic_table_handling_polars.ipynb | 130 ++++++++++++++++++ 1 file changed, 130 insertions(+) create mode 100644 docs/40_tabular_data_wrangling/basic_table_handling_polars.ipynb diff --git a/docs/40_tabular_data_wrangling/basic_table_handling_polars.ipynb b/docs/40_tabular_data_wrangling/basic_table_handling_polars.ipynb new file mode 100644 index 000000000..d7c7e052c --- /dev/null +++ b/docs/40_tabular_data_wrangling/basic_table_handling_polars.ipynb @@ -0,0 +1,130 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Basic Table Handling with Polars\n", + "\n", + "In this notebook, we will introduce how to use the Polars library for basic table handling. Polars is a fast DataFrame library implemented in Rust and is designed to process large data efficiently.\n", + "\n", + "We will cover the following operations:\n", + "- Creating a DataFrame\n", + "- Basic DataFrame Operations\n", + "- Querying and Filtering\n", + "- Addition of new columns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Import Polars\n", + "import polars as pl" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Creating a DataFrame\n", + "\n", + "We can create a DataFrame in Polars using the `pl.DataFrame()` method. 
Here's an example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = {\n", + " 'column_1': [1, 2, 3],\n", + " 'column_2': ['a', 'b', 'c']\n", + "}\n", + "\n", + "df = pl.DataFrame(data)\n", + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic DataFrame Operations\n", + "We can perform several basic operations on DataFrames such as selecting columns." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Selecting a single column\n", + "df.select('column_1')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Querying and Filtering\n", + "\n", + "Polars provides a powerful API to perform queries and filter data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Filtering rows where column_1 > 1\n", + "df.filter(pl.col('column_1') > 1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Addition of New Columns\n", + "\n", + "We can add new columns to the DataFrame as shown below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Add a new column\n", + "df.with_columns(pl.Series('column_3', [4, 5, 6]))" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From 5847d6803a60ac37d8cf3a53fb745e20ebf746a7 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 19 Aug 2024 17:15:50 +0000 Subject: [PATCH 2/3] Created an advanced use case notebook for the Polars library with examples of complex operations. --- .../advanced_use_cases_polars.ipynb | 131 ++++++++++++++++++ 1 file changed, 131 insertions(+) create mode 100644 docs/40_tabular_data_wrangling/advanced_use_cases_polars.ipynb diff --git a/docs/40_tabular_data_wrangling/advanced_use_cases_polars.ipynb b/docs/40_tabular_data_wrangling/advanced_use_cases_polars.ipynb new file mode 100644 index 000000000..eb5e38c25 --- /dev/null +++ b/docs/40_tabular_data_wrangling/advanced_use_cases_polars.ipynb @@ -0,0 +1,131 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Advanced Use Cases with Polars\n", + "\n", + "In this notebook, we'll explore some advanced features of the Polars library, an efficient DataFrame library in Python.\n", + "We'll assume that you're familiar with the basics of Polars from a previous introduction." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import polars as pl\n\n", + "# Sample dataset:\n", + "data = {\n", + " \"product\": [\"A\", \"B\", \"C\"],\n", + " \"sales\": [100, 170, 90],\n", + " \"budget\": [80, 160, 95]\n", + "}\n", + "\n", + "# Create DataFrame\n", + "df = pl.DataFrame(data)\n", + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Advanced GroupBy Operations\n", + "\n", + "Polars lets you group rows and aggregate each group with arbitrary expressions. Let's see how to aggregate with a custom expression." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define a custom aggregation expression: mean of (sales - budget) per group\n", + "custom_agg = (pl.col(\"sales\") - pl.col(\"budget\")).mean().alias(\"over_budget\")\n\n", + "# Group by product and apply the custom expression\n", + "result = df.group_by(\"product\").agg(custom_agg)\n", + "result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Join Operations\n", + "\n", + "Polars supports fast join operations between DataFrames. Here's an example of performing a join between two DataFrames." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "another_data = {\n", + " \"product\": [\"A\", \"B\", \"D\"],\n", + " \"profit\": [50, 60, 30]\n", + "}\n", + "\n", + "df2 = pl.DataFrame(another_data)\n\n", + "# Perform an inner join on 'product'\n", + "joined_df = df.join(df2, on=\"product\", how=\"inner\")\n", + "joined_df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pivoting and Melting\n", + "\n", + "Transformations like pivoting and melting DataFrames can be easily accomplished using Polars." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Example: Pivot operation\n", + "pivot_df = df.pivot(values=\"sales\", index=\"product\", columns=\"budget\")\n", + "pivot_df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "This notebook gave a glimpse of the more complex operations available in Polars: grouped aggregations, joins, and pivoting. For further learning, consult the official Polars documentation and try these operations on your own data." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 023fe0a824afb6207b0132a7a399e62351b406e1 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 19 Aug 2024 17:17:03 +0000 Subject: [PATCH 3/3] Added two new notebook entries under the "Tabular data, plots and statistics" section to include polars tutorials. --- docs/_toc.yml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/_toc.yml b/docs/_toc.yml index 73c440203..dd0120b5a 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -368,6 +368,8 @@ parts: - file: 40_tabular_data_wrangling/summarizing_subsets - file: 40_tabular_data_wrangling/pivot_tables - file: 40_tabular_data_wrangling/tidy_data + - file: 40_tabular_data_wrangling/basic_table_handling_polars.ipynb + - file: 40_tabular_data_wrangling/advanced_use_cases_polars.ipynb - file: 40a_sql/readme sections: @@ -429,4 +431,3 @@ parts: - file: 01_introduction/glossary - file: imprint -