feat: initial version of the audit & clean data feature #611

Merged
ngupta23 merged 33 commits into main from feat/audit_missing
Mar 27, 2025
@ngupta23
Member

@ngupta23 ngupta23 commented Feb 27, 2025

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



@github-actions
Contributor

github-actions bot commented Feb 27, 2025

Experiment Results

Experiment 1: air-passengers

Description:

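| variable | experiment |
| --- | --- |
| h | 12 |
| season_length | 12 |
| freq | MS |
| level | None |
| n_windows | 1 |

Results:

| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
| --- | --- | --- | --- | --- |
| mae | 12.6793 | 11.0623 | 47.8333 | 76 |
| mape | 0.027 | 0.0232 | 0.0999 | 0.1425 |
| mse | 213.936 | 199.132 | 2571.33 | 10604.2 |
| total_time | 0.7812 | 1.0374 | 0.0044 | 0.0035 |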

Plot:

Experiment 2: air-passengers

Description:

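| variable | experiment |
| --- | --- |
| h | 24 |
| season_length | 12 |
| freq | MS |
| level | None |
| n_windows | 1 |

Results:

| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
| --- | --- | --- | --- | --- |
| mae | 58.1031 | 58.4587 | 71.25 | 115.25 |
| mape | 0.1257 | 0.1267 | 0.1552 | 0.2358 |
| mse | 4040.21 | 4110.79 | 5928.17 | 18859.2 |
| total_time | 0.7719 | 0.7789 | 0.0037 | 0.0033 |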

Plot:

Experiment 3: electricity-multiple-series

Description:

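| variable | experiment |
| --- | --- |
| h | 24 |
| season_length | 24 |
| freq | H |
| level | None |
| n_windows | 1 |

Results:

| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
| --- | --- | --- | --- | --- |
| mae | 178.293 | 268.13 | 269.23 | 1331.02 |
| mape | 0.0234 | 0.0311 | 0.0304 | 0.1692 |
| mse | 121589 | 219485 | 213677 | 4.68961e+06 |
| total_time | 1.4861 | 1.545 | 0.0046 | 0.0045 |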

Plot:

Experiment 4: electricity-multiple-series

Description:

| variable | experiment |
| --- | --- |
| h | 168 |
| season_length | 24 |
| freq | H |
| level | None |
| n_windows | 1 |

Results:

| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
| --- | --- | --- | --- | --- |
| mae | 465.497 | 346.972 | 398.956 | 1119.26 |
| mape | 0.062 | 0.0436 | 0.0512 | 0.1583 |
| mse | 835021 | 403760 | 656723 | 3.17316e+06 |
| total_time | 1.0348 | 0.9344 | 0.0048 | 0.0043 |

Plot:

Experiment 5: electricity-multiple-series

Description:

| variable | experiment |
| --- | --- |
| h | 336 |
| season_length | 24 |
| freq | H |
| level | None |
| n_windows | 1 |

Results:

| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
| --- | --- | --- | --- | --- |
| mae | 558.673 | 459.757 | 602.926 | 1340.95 |
| mape | 0.0697 | 0.0565 | 0.0787 | 0.17 |
| mse | 1.22723e+06 | 739114 | 1.61572e+06 | 6.04619e+06 |
| total_time | 0.9601 | 1.3808 | 0.0049 | 0.0045 |

Plot:
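
For context on the experiment tables above, here is a minimal sketch of how the two baselines and the three error metrics are commonly defined. It assumes the usual textbook definitions (MAPE as a fraction rather than a percentage, which matches the magnitude of the reported values). The benchmark's actual implementation is not shown in this thread, so every function name below is hypothetical and the series is toy data, not the air-passengers or electricity datasets.

```python
from typing import List, Sequence


def naive_forecast(history: Sequence[float], h: int) -> List[float]:
    """Naive baseline: repeat the last observed value for all h steps."""
    return [history[-1]] * h


def seasonal_naive_forecast(
    history: Sequence[float], h: int, season_length: int
) -> List[float]:
    """Seasonal naive baseline: cycle through the last observed season."""
    last_season = list(history[-season_length:])
    return [last_season[i % season_length] for i in range(h)]


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, as a fraction (assumes no zero actuals)."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


# Toy series with a season of length 3 (illustrative values only).
history = [10.0, 20.0, 30.0, 12.0, 22.0, 32.0]
actual = [14.0, 24.0, 34.0]

naive = naive_forecast(history, h=3)
seasonal = seasonal_naive_forecast(history, h=3, season_length=3)

for name, forecast in [("Naive", naive), ("SeasonalNaive", seasonal)]:
    print(name, mae(actual, forecast), mape(actual, forecast), mse(actual, forecast))
```

On a strongly seasonal series, the seasonal baseline tracks the pattern while the naive one does not, which is consistent with SeasonalNaive beating Naive in every experiment above.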

@jmoralez
Contributor

Please just open PRs when things are ready to be merged to main.

@ngupta23
Member Author

@MMenchero and I both need to contribute to this feature, and our code depends on each other's, so we cannot merge the final product to main yet.

We can involve you when we are ready to merge to main if that is what you would prefer. At this stage we are just looking for initial feedback in case something is glaringly off so that we don't have to do a major rework at the end when we submit the PR to main.

@jmoralez
Contributor

I don't understand why you didn't just merge the branches locally and open the PR towards main; the diff in this PR is meaningless if the base branch already has changes. Also, I don't understand the difference between getting feedback in a PR towards main and getting it here, since you've already done the work.

@mergenthaler mergenthaler marked this pull request as draft February 27, 2025 22:42
@goodwanghan
Contributor

@ngupta23 Shall we have unit tests for this critical change?

@ngupta23
Member Author

ngupta23 commented Mar 4, 2025

Yes, these are included inline and at the end of the notebook.

@ngupta23 ngupta23 changed the base branch from feat/data_audit to main March 4, 2025 19:00
@ngupta23 ngupta23 requested review from elephaint and marcopeix March 18, 2025 17:43
@ngupta23
Member Author

The linting issues have been fixed in this PR. We will merge that before merging this PR.

Contributor

@MMenchero MMenchero left a comment

Incorporated latest changes, ready for review.

@ngupta23 ngupta23 changed the title from "feat: initial version of the audit data feature" to "feat: initial version of the audit & clean data feature" Mar 26, 2025
@MMenchero
Contributor

@elephaint @marcopeix we would appreciate it if you could review this.

Contributor

@elephaint elephaint left a comment

LGTM!

For future PRs I'd move to using Narwhals for this kind of functionality, so that we support any DFType by default; now we need to rewrite everything soon or add a lot of code to support Polars. Can you open an issue for that so that we don't forget?

@ngupta23
Member Author

> LGTM!
>
> For future PRs I'd move to using Narwhals for this kind of functionality, so that we support any DFType by default; now we need to rewrite everything soon or add a lot of code to support Polars. Can you open an issue for that so that we don't forget?

Thanks @elephaint - noted here. We will update to narwhals in the next PR related to this.

@ngupta23 ngupta23 merged commit 5b32973 into main Mar 27, 2025
20 checks passed
@ngupta23 ngupta23 deleted the feat/audit_missing branch March 27, 2025 16:54

Development

Successfully merging this pull request may close these issues.

nbdev (execnb) fails with iPython 9.0.0 on Python 3.11 and Python 3.12
[feat] preprocessing helper

6 participants