Narwhals implementation of from_dataframe and performance benchmark #2661
Merged
dennisbader merged 42 commits into unit8co:master from authierj:feature/add_timeseries_from_polars on Feb 28, 2025
Changes from 2 commits
Commits
28a9298 narwhals implementation for and test benchmark
6382082 Merge branch 'master' into feature/add_timeseries_from_polars (authierj)
0041203 changes from MarcoGorelli incorporated
576e88e improvement thanks to reviewers
e013a42 Merge branch 'master' into feature/add_timeseries_from_polars (authierj)
dbe2cd9 added comments about slow and fast parts of the code (authierj)
b2ffc67 using pandas index to avoid .to_list() (authierj)
c5fa503 Merge branch 'master' into feature/add_timeseries_from_polars (authierj)
79312c9 bug fix added (authierj)
fc8bda4 Merge branch 'feature/add_timeseries_from_polars' of https://github.c… (authierj)
b08a74f updated test script (authierj)
2425fbe narwhals timeseries added (authierj)
36300f2 from_series changed, names changed (authierj)
ba01df1 changelog updated (authierj)
ffd1202 Merge branch 'master' into feature/add_timeseries_from_polars (authierj)
2e39269 small improvement (authierj)
1a9a266 clean test scripts added (authierj)
a030ea5 Merge branch 'master' into feature/add_timeseries_from_polars (authierj)
2c24a39 BUGFIX added for non_pandas df (authierj)
89f23fb tests added for polars df (authierj)
de0a32d polars and narwhals added to dependencies. Ideally, polars should be … (authierj)
66b770d Merge branch 'master' into feature/add_timeseries_from_polars (authierj)
16bac00 refactoring pd_series and pd_dataframe (authierj)
0950910 removed test scripts from git repo (authierj)
042f9fb Merge branch 'master' into feature/add_timeseries_from_polars (authierj)
5afc721 Update CHANGELOG.md (authierj)
7877dd6 Update darts/timeseries.py (authierj)
102a26c easy corrections applied (authierj)
9d66c06 Merge branch 'feature/add_timeseries_from_polars' of https://github.c… (authierj)
f629089 Merge branch 'master' into feature/add_timeseries_from_polars (authierj)
56a20c1 narwhals_test_time removed (authierj)
f764e19 Update requirements/core.txt (authierj)
319a48f Update darts/timeseries.py (authierj)
e8925f1 most corrections added (authierj)
05a7215 merged (authierj)
11d17c1 polars tests removed (authierj)
a720bb4 Merge branch 'master' into feature/add_timeseries_from_polars (authierj)
f9f5aa8 tests corrected (authierj)
e0b4984 Merge branch 'master' into feature/add_timeseries_from_polars (dennisbader)
c13cc1d Update darts/timeseries.py (authierj)
370d761 Update darts/timeseries.py (authierj)
3fa924f no time_col, define one (authierj)
@@ -0,0 +1,123 @@
import time
import warnings
from itertools import product

import numpy as np
import pandas as pd

from darts.timeseries import TimeSeries

# Suppress all warnings
warnings.filterwarnings("ignore")


def create_random_dataframes(
    num_rows: int = 10,
    num_columns: int = 3,
    index: bool = True,
    start_date: str = "2023-01-01",
    freq: str = "D",
) -> list[list]:
    """
    Create three pandas DataFrames with the same random data and different time axes
    (pandas dates, NumPy datetime64 values, integers), stored as the index or as a column.

    Parameters:
    - num_rows (int): The number of rows in the DataFrames.
    - num_columns (int): The number of columns in the DataFrames.
    - index (bool): If True, the time axis is the index of the DataFrame. If False, it is a column named 'date'.
    - start_date (str): The start date for the date range.
    - freq (str): The frequency of the date range.

    Returns:
    - list: Three [DataFrame, col_names, time_col] entries, for df_date, df_numpy and df_integer.
    """
    # Set a random seed for reproducibility
    np.random.seed(42)

    # Generate the three time axes: pandas dates, integers, and NumPy datetime64 values
    date_values = pd.date_range(start=start_date, periods=num_rows, freq=freq)
    integer_values = list(range(1, num_rows + 1))
    numpy_values = np.array(
        pd.date_range(start=start_date, periods=num_rows, freq=freq),
        dtype="datetime64[D]",
    )

    # Create random data for the DataFrames
    data = {f"col_{i}": np.random.randn(num_rows) for i in range(num_columns)}

    # Create the DataFrames
    df_date = pd.DataFrame(data)
    df_numpy = pd.DataFrame(data)
    df_integer = pd.DataFrame(data)

    # Capture the value column names before a 'date' column is possibly added
    col_names = df_date.columns.values

    # Set the time axis as index or as a column based on the index parameter
    if index:
        df_date.index = date_values
        df_numpy.index = numpy_values
        df_integer.index = integer_values
        time_col = None
    else:
        df_date["date"] = date_values
        df_numpy["date"] = numpy_values
        df_integer["date"] = integer_values
        time_col = "date"

    return [
        [df_date, col_names, time_col],
        [df_numpy, col_names, time_col],
        [df_integer, col_names, time_col],
    ]


def test_dataframes() -> list:
    # Grid of (num_rows, num_columns, index) configurations
    test_config = product(
        [10, 100, 1000, 10000],
        [10, 100, 500, 1000],
        [True, False],
    )

    dataframes_list = [
        create_random_dataframes(
            num_rows=num_rows, num_columns=num_columns, index=index
        )
        for num_rows, num_columns, index in test_config
    ]

    return dataframes_list


df_list = test_dataframes()

############ PANDAS ############
pandas_timer = time.time()
for df_config in df_list:
    for df, col_names, time_col in df_config:
        _ = TimeSeries.from_dataframe(
            df, value_cols=col_names, time_col=time_col, freq=None
        )
        # Repeat the conversion on a row-shuffled copy
        df_shuffle = df.sample(frac=1)
        _ = TimeSeries.from_dataframe(
            df_shuffle, value_cols=col_names, time_col=time_col, freq=None
        )
pandas_timer = time.time() - pandas_timer

############ NARWHALS ############
narwhals_timer = time.time()
for df_config in df_list:
    for df, col_names, time_col in df_config:
        _ = TimeSeries.from_narwhals_dataframe(
            df, value_cols=col_names, time_col=time_col, freq=None
        )
        # Repeat the conversion on a row-shuffled copy
        df_shuffle = df.sample(frac=1)
        _ = TimeSeries.from_narwhals_dataframe(
            df_shuffle, value_cols=col_names, time_col=time_col, freq=None
        )
narwhals_timer = time.time() - narwhals_timer

print("pandas processing time: ", pandas_timer)
print("narwhals processing time: ", narwhals_timer)
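The script above times each backend once with `time.time()`, so a single noisy run can skew the comparison. The sketch below is not part of the PR. It shows one possible way to repeat a measurement with `time.perf_counter()` and summarize it, and it also counts how many conversions the grid implies. The helper name `time_callable` and the stand-in workload are hypothetical; in the benchmark, the callable would wrap the loop over `df_list`.

```python
import statistics
import time
from itertools import product
from typing import Callable


def time_callable(func: Callable[[], object], repeats: int = 5) -> dict[str, float]:
    """Run `func` `repeats` times and summarize wall-clock durations in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Same grid as the benchmark script: num_rows x num_columns x index flag.
grid = list(product([10, 100, 1000, 10000], [10, 100, 500, 1000], [True, False]))
# Each configuration yields three frames, each converted twice (ordered and shuffled).
calls_per_backend = len(grid) * 3 * 2

# Hypothetical stand-in workload, used only so the sketch runs without darts installed.
stats = time_callable(lambda: sum(i * i for i in range(10_000)), repeats=3)
print(f"{len(grid)} configurations, {calls_per_backend} conversions per backend")
print(f"min={stats['min']:.6f}s median={stats['median']:.6f}s")
```

Reporting the minimum or median over several runs tends to be less sensitive to one-off effects such as cache warm-up, which matters here because the pandas loop runs before the narwhals loop on the same DataFrames.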