
Commit 2819f79

0.2.0
* Update TODO.md
* change default id_col to None
* the Canadian one
* what fools these mortals be
* gotta have sin
* magic
* mess
* its tedious to make up stuff
* dreariness to suit
* time to plant peas
* it's snowing again...
* pct_change endless is
* what's up dude?
* made a change and no bugs. I don't like it, it's too easy.
* neeeeed fooooooddd
* twit
* nothing much really
* April snowstorm
* zwiftly
* cleaning up
* sleeeeeeeep
* two commas and the pain they caused
* Skyfall
* always a mistake somewhere
* Update gluonts.py
* improved genetic recombination
* so many params
* lost in transformerlandia
* sprucing up Ensembling
* ensembling continues
* horizontal ensembling begins to work!
* recursion is strong with the dark side
* oddment, blubber
* april showers
* Finally an RNN
* ongoing RNN work
* Build me a Death Star!
* A Death Star Worthy of Mordor
* dandelions bloom
* tfp part 1
* TFP headaches + validation wasn't working
* another stab at probabilistic inference
* a few that missed the cut
* passing at same time
* Scaled Pinball Loss
* spl bug has been smashed
* constraint, it's like a quarantine for models
* whether tis nobler
* ensembling now with validation
* a rather late frost
* seasonal validation and improved cat
* removed annoying kbins out of index error
* added detrend regression types
* sleeeeeeeeeeeepppppppyyyyyyyyyy
* improving template in/out
* pacified
* and I thought of a bug to fix
* 0.2.0a1
* parameter tuning
* spring cleaning
* A Memory of Light
* non-docs stuff
* docs rst
* docs rst githubpages maybe
* github pages 2
* Delete standalone.py
* Delete functional_environments.md
* replacing names begins
* some more renames and bumped max_iter
* rename preord_ to future_ regressor
* always another bug hiding in plain sight
* training intervals, in the program, on the bike
* confused I am
* clean up
* +1
* 0.2.0a2
* cleaning shaping
* interrupt and result save improvements
* not quite
* 0.2.0a3
* darkness falls, if late
* always another alpha
* terra cotta
* unpack error
* 0.2.0a4
* getting ready for a major release
* versioning change
1 parent 7cdbfc7 commit 2819f79

88 files changed (+29905 / -4403 lines)

.gitignore

Lines changed: 5 additions & 0 deletions
```diff
@@ -1,3 +1,7 @@
+# supporting files
+standalone.py
+functional_environments.md
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
@@ -65,6 +69,7 @@ instance/
 
 # Sphinx documentation
 docs/_build/
+!docs/build/
 
 # PyBuilder
 target/
```

README.md

Lines changed: 20 additions & 16 deletions
````diff
@@ -2,14 +2,14 @@
 
 ![AutoTS Logo](/img/autots_logo.png)
 
-#### Model Selection for Multiple Time Series
+**Model Selection for Multiple Time Series**
 
 Simple package for comparing and predicting with open-source time series implementations.
 
 For other time series needs, check out the list [here](https://github.com/MaxBenChrist/awesome_time_series_in_python).
 
 ## Features
-* Fourteen available model classes, with thousands of possible hyperparameter configurations
+* Twenty available model classes, with tens of thousands of possible hyperparameter configurations
 * Finds optimal time series models by genetic programming
 * Handles univariate and multivariate/parallel time series
 * Point and probabilistic forecasts
@@ -18,7 +18,7 @@ For other time series needs, check out the list [here](https://github.com/MaxBen
 * Allows automatic ensembling of best models
 * Multiple cross validation options
 * Subsetting and weighting to improve search on many multivariate series
-* Option to use one or a combination of SMAPE, RMSE, MAE, and Runtime for model selection
+* Option to use one or a combination of metrics for model selection
 * Ability to upsample data to a custom frequency
 * Import and export of templates allowing greater user customization
 
@@ -36,32 +36,36 @@ Input data is expected to come in a 'long' format with three columns:
 The column name for each of these is passed to .fit().
 
 ```
-from autots.datasets import load_toy_monthly # also: _daily _yearly or _hourly
-df_long = load_toy_monthly()
+
+# also: _hourly, _daily, _weekly, or _yearly
+from autots.datasets import load_monthly
+df_long = load_monthly()
 
 from autots import AutoTS
-model = AutoTS(forecast_length = 3, frequency = 'infer',
-               prediction_interval = 0.9, ensemble = False, weighted = False,
-               drop_data_older_than_periods = 240,
-               max_generations = 5, num_validations = 2, validation_method = 'even')
-model = model.fit(df_long, date_col = 'datetime', value_col = 'value', id_col = 'series_id')
+model = AutoTS(forecast_length=3, frequency='infer',
+               prediction_interval=0.9, ensemble=None,
+               model_list='superfast',
+               max_generations=5, num_validations=2,
+               validation_method='even')
+model = model.fit(df_long, date_col='datetime',
+                  value_col='value', id_col='series_id')
 
 # Print the name of the best model
 print(model)
 
 prediction = model.predict()
 # point forecasts dataframe
 forecasts_df = prediction.forecast
-# accuracy of all tried model results (not including cross validation)
-model_results = model.initial_results.model_results
-# and including cross validation
-validation_results = model.validation_results.model_results
+# accuracy of all tried model results
+model_results = model.results()
+# and aggregated from cross validation
+validation_results = model.results("validation")
 
 ```
 
-Check out [extended_tutorial.md](https://github.com/winedarksea/AutoTS/blob/master/extended_tutorial.md) for a more detailed guide to features!
+Check out [extended_tutorial.md](https://winedarksea.github.io/AutoTS/build/source/tutorial.html) for a more detailed guide to features!
 
-# How to Contribute:
+## How to Contribute:
 * Give feedback on where you find the documentation confusing
 * Use AutoTS and...
 * Report errors and request features by adding Issues on GitHub
````
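The README above expects 'long' input: one row per observation, with date, series id, and value columns whose names are passed to `.fit()`. As a standalone sketch of what that shape means (plain pandas on made-up data, not AutoTS itself), the long-to-wide reshaping AutoTS performs internally is essentially a pivot:

```python
import pandas as pd

# Toy long-format data: the three columns the README describes.
df_long = pd.DataFrame({
    'datetime': pd.to_datetime(['2020-01-01', '2020-01-02'] * 2),
    'series_id': ['a', 'a', 'b', 'b'],
    'value': [1.0, 2.0, 3.0, 4.0],
})

# Wide format: one row per date, one column per series.
df_wide = df_long.pivot(index='datetime', columns='series_id', values='value')
print(df_wide.shape)  # (2, 2)
```

The column names here are only the README's example names; any names work as long as they are passed to `.fit()`.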

TODO.md

Lines changed: 164 additions & 57 deletions
```diff
@@ -7,68 +7,172 @@
 * New models need only be sometimes applicable
 * Fault tolerance: it is perfectly acceptable for model parameters to fail on some datasets, the higher level API will pass over and use others.
 
+Latest:
+Added Github Pages documentation
+Changed default for `series_id` so it is no longer required if univariate
+Changed default of `subset` to None.
+Removed `weighted` parameter, now passing weights to .fit() alone is sufficient.
+Fixed a bug where 'One or more series is 90% or more NaN' was printing when it shouldn't
+Fixed (or more accurately, reduced) a bug where multiple initial runs were counting as validation runs.
+Fixed bug where validation subsetting was behaving oddly
+Fixed bug where regressor wasn't being passed to validation.
+Renamed preord_ to future_ regressor.
+Renamed sample datasets.
+Allowed export of result_file as .pickle along with more complete object.
+Added model_interrupt parameter to allow for manually skipping models when enabled.
+Made serious efforts to make the code prettier with pylint, still lots to do, however...
+Improved genetic recombination so optimal models should be reached more quickly
+Improved Point to Probabilistic methods:
+'historic_quantile' more stable quantile-based error ranges
+'inferred normal' Bayesian-inspired method
+Metrics:
+Added Scaled Pinball Loss (SPL)
+Removed upper/lower MAE
+Improved ensembling with new parameter options
+Recursive ensembling (ensemble of ensembles) now enabled
+Validation:
+Added 'seasonal' validation method
+Categorical transformer improved, now tolerant to leaving bounds.
+Added remove_leading_zeroes option for convenience.
+
+Added a number of new Transformer options
+Multiple new Sklearn-sourced transformers (QuantileTransformer, etc)
+SinTrend
+DifferencedDetrend
+CumSumTransformer
+PctChangeTransformer
+PositiveShift Transformer
+Log
+IntermittentOccurrence
+SeasonalDetrend
+bkfilter and cffilter
+DatepartRegression
+Entirely changed the general transformer to add ~~three~~ four levels of transformation.
+Allowed context_slicer to receive direct integer inputs
+Added new 'Detrend' options to allow more sklearn linear models.
+
+GLM
+Error where it apparently won't tolerate any zeroes was compensated for.
+Speed improvement.
+RollingRegression
+Added SVM model
+Added option to tune some model parameters to sklearn
+Added new feature construction parameters
+Added RNNs with Keras
+GluonTS:
+fixed the use of context_length, added more options to that param
+Dynamic Factor added uncertainty from Statsmodels Statespace
+VARMAX added uncertainty from Statsmodels Statespace
+
+New models:
+SeasonalNaive model
+VAR from Statsmodels (faster than VARMAX statespace)
+MotifSimulation
+WindowRegression
+TensorflowSTS
+TFPRegression
+ComponentAnalysis
+
 # Errors:
-'Detrend' transformation is still buggy (can't convert to Series)
-raise AttributeError(("Model String '{}' not recognized").format(model)) -> turn to an allowable exception with a printed warning
-Holiday not (always) working
+DynamicFactor holidays Exceptions 'numpy.ndarray' object has no attribute 'values'
+VECM does not recognize exog to predict
+ARIMA with User or Holiday ValueError('Can only compare identically-labeled DataFrame objects',)
+Drop Most Recent does not play well logically with added external (future) regressors.
+FastICA 'array must not contain infs or NaNs'
+How do fillna methods handle datasets that have entirely NaN series?
+VAR ValueError('Length of passed values is 4, index implies 9',)
+WindowRegression + KerasRNN + 1step + univariate = ValueError('Length mismatch: Expected axis has 54 elements, new values have 9 elements',)
+Is Template Eval Error: ValueError('array must not contain infs or NaNs',) related to Point to Probability HISTORIC QUANTILE?
+'Fake Date' doesn't work on entirely NaN series - ValueError('Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required.',)
+
+
+### Ignored Errors:
+xgboost poisson loss does not accept negatives
+GluonTS not accepting quite a lot of frequencies
+KerasRNN errors due to parameters not working on all dataset
+Tensorflow GPU backend may crash on occasion.
+
+## General Tasks
+* test submission
+* test whether bottom up significantly overestimates on rollup
+* store level hierarchial
 
 # To-Do
-* Get the sphinx (google style) documentation and readthedocs.io website up
-* Better point to probabilistic (uncertainty of naive last-value forecast) - linear reg of abs error of samples - simulations
-* get_prediction for Statsmodels Statespace models to include confidence interval where possible
+* drop duplicates as function of TemplateEvalObject
+* fake date dataset of many series to improve General Template
+* better document ensembling
+* 'fast' option for RandomTransformations generator
+* optimize randomtransform probabilities
+* Add to template: Gluon, Motif, WindowRegression
+* Convert 'Holiday' regressors into Datepart + Holiday 2d
+* best per series to validation template even if poor on score overall
+* Bring GeneralTransformer to higher level API.
+* wide_to_long and long_to_wide in higher-level API
+* Option to use full traceback in errors in table
+* Hierarchial
+* every level must be included in forecasting data
+* 'bottom-up' and 'mid' levels
+* one level. User would have to specify all as based on lowest-level keys if wanted sum-up.
+* Better point to probabilistic (uncertainty of naive last-value forecast)
+* linear reg of abs error of samples - simulations
+* Data, pct change, find window with % max change pos, and neg then avg. Bound is first point + that percent, roll from that first point and adjust if points cross, variant where all series share knowledge
+* Bayesian posterior update of prior
+* variance of k nearest neighbors
+* Data, split, normalize, find distribution exponentially weighted to most recent, center around forecast, shared variant
+* Data quantile, recentered around median of forecast.
+* Categorical class probabilities as range for RollingRegression
+* get_forecast for Statsmodels Statespace models to include confidence interval where possible
 * migrate arima_model to arima.model
-* Check how fillna methods handle datasets that are entirely NaN
-* Better X_maker:
-* use feature selection on TSFresh features - autocorrelation lag n, fft/cwt coefficients (abs), abs_energy
-* date part and age/expanding regressors
+* uncomp with uncertainty intervals
+* Window regression
+* transfer learning
+* RollingRegression
+* Better X_maker:
+* 1d and 2d variations
+* .cov, .skew, .kurt, .var
+* https://link.springer.com/article/10.1007/s10618-019-00647-x/tables/1
+* Probabilistic:
+https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_quantile.html
+* other sequence models
+* Categorical classifier, ComplementNB
+* PCA or similar -> Univariate Series (Unobserved Components)
+* Simple performance:
+* replace try/except with if/else in some cases
 * GluonTS
-* Add support for preord_regressor
+* Add support for future_regressor
+* Make sure of rolling regression setup
 * Modify GluonStart if lots of NaN at start of that series
 * GPU and CPU ctx
-* Print methods for prediction/model objects that give simple readme how-to's
-* Get Tsfresh working with small dataset (short, 2 columns) (check feature importance again)
+* motif simulation, remove all those for loops
+* implement 'borrow' Genetic Recombination for ComponentAnalysis
+* Regressor to TensorflowSTS
 * Relative/Absolute Imports and reduce package reloading messages
-* Format of Regressor - allow multiple input to at least sklearn models
+* Allow FillNA to be None
+* Replace OrdinalEncoder with non-external code
 * 'Age' regressor as an option in addition to User/Holiday in ARIMA, etc.
-* Handle categorical forecasts where forecast leaves range of known values, then add to upper/lower forecasts
 * Speed improvements, Profiling
 * Parallelization, and Distributed options (Dask) for general greater speed
 * Improve usability on rarer frequenices (ie monthly data where some series start on 1st, others on 15th, etc.)
 * Figures: Add option to output figures of train/test + forecast, other performance figures
-* Pre-clustering on many time series
 * If all input are Int, convert floats back to int
-* Trim whitespace/case-desensitize on string inputs
-* Option to print % improvement of best over last value naive
-* Hierachial correction (bottom-up to start with)
-* Because I'm a biologist, incorporate more genetics and such. Also as a neuro person, there must be a way to fit networks in...
-* Improved verbosity controls and options.
 * Replace most 'print' with logging.
-* Export as simpler code (as TPOT)
-* set up the lower-level API to be usable as pipelines
-* allow stand-alone pipeline for transformation with format for export data format to other package requirements (use AutoTS just for preprocessing)
-* AIC metric, other accuracy metrics
-* MAE of upper and lower forecasts, balance with Containment
-* Metric to measure if the series follows the same shape (Contour)
-* Potentially % change between n and n-1, compare this % change between forecast and actual
-* One if same direction, 0 otherwise (sum/len)
 * Analyze and return inaccuracy patterns (most inaccurate periods out, days of week, most inaccurate series)
 * Development tools:
-* Add to Conda distribution as well as pip
+* Add to Conda (Forge) distribution as well as pip
 * Continuous integration
 * Code/documentation quality checkers
-* Option to drop series which haven't had a value in last N days
-* More thorough use of setting random seed, verbose, n_jobs
-* For monthly data account for number of days in month
-* add constant to GLM
 * Ability to automatically add external datasets of parallel time series of global usability (ie from FRED or others)
+* make datetime input optional, just allow dataframes of numbers
 * Option to import either long or wide data
 * Infer column names for df_long to wide based on which is datetime, which is string, and which is numeric
 
+### Links
+* https://link.springer.com/article/10.1007/s10618-019-00647-x/tables/1
+* https://github.com/gantheory/TPA-LSTM
+* https://github.com/huseinzol05/Stock-Prediction-Models/tree/master/deep-learning
+
 ### Faster Convergence / Faster in General
 * Only search useful parameters, highest probability for most likely effective parameters
-* 'Expert' starting templates to try most likley combinations first
-* Recombine best two of each model parameters, if two or more present (plus option to disable this)
-* Recombination of transformations
 * Remove parameters that are rarely/never useful from get_new_params
 * Don't apply transformations to Zeroes naive, possibly other naives
 * Option to run generations until generations no longer see improvement of at least X % over n generations
@@ -77,35 +181,38 @@
 * potentially a method = 'deep' to get_new_params used after n generations
 * no unlock, but simply very low-probability deep options in get_new_params
 * Exempt or reduce slow models from unnecessary runs, particularly with different transformations
-* Numba and Cythion acceleration (metrics might be easy to start with)
+* Numba and Cython acceleration (metrics might be easy to start with)
+* GPU - xgboost, GluontTS
+
+#### New datasets:
+Second level data that is music (like a radio stream)
+Ecological data
 
 #### New Ensembles:
-best 3 (unique algorithms not just variations of same)
-forecast distance 30/30/30
-best per series ensemble ('horizontal ensemble')
-best point with best probalistic containment
+Best N combined with Decision Tree
+
 #### New models:
-Seasonal Naive
-Last Value + Drift Naive
-Simple Decomposition forecasting
-Statespace variant of ETS which has Confidence Intervals
-Tensorflow Probability Structural Time Series
-Pytorch Simple LSTM/GRU
+Croston, SBA, TSB, ADIDA, iMAPA
+Local Linear/Piecewise Regression Model
 Simulations
-XGBoost (doesn't support multioutput directly)
-Sklearn + TSFresh
-Sktime
 Ta-lib
+Pyflux
 tslearn
-Multivariate GARCH
+GARCH (arch library seems best maintained, none have multivariate)
 pydlm - baysesian dynamic linear
-Isotonic regression
-Survival Analysis
 MarkovAutoRegression
-Motif discovery, and repeat
+hmmlearn
 TPOT if it adds multioutput functionality
 https://towardsdatascience.com/pyspark-forecasting-with-pandas-udf-and-fb-prophet-e9d70f86d802
-Compressive Transformer, if they go anywhere
+Compressive Transformer
+Reinforcement Learning
 
 #### New Transformations:
-Test variations on 'RollingMean100thN'
+Sklearn iterative imputer
+lag and beta to DifferencedTransformer to make it more of an AR process
+Weighted moving average
+Symbolic aggregate approximation (SAX) and (PAA) (basically these are just binning)
+Shared discretization (all series get same shared binning)
+Last Value Centering
+Constraint as a transformation parameter
+
```
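The TODO changelog adds Scaled Pinball Loss (SPL) to the metrics. As a rough, dependency-free sketch of the idea, not the actual AutoTS implementation: pinball (quantile) loss penalizes over- and under-forecasts asymmetrically by the quantile level, and the "scaled" variant divides by the in-sample naive error, the same normalization trick MASE uses.

```python
def pinball_loss(actuals, forecasts, quantile):
    """Average pinball (quantile) loss for one quantile level."""
    total = 0.0
    for y, f in zip(actuals, forecasts):
        if y >= f:
            total += quantile * (y - f)
        else:
            total += (1 - quantile) * (f - y)
    return total / len(actuals)

def scaled_pinball_loss(train, actuals, forecasts, quantile):
    """Pinball loss scaled by the mean absolute one-step naive
    error on the training data (MASE-style scaling)."""
    naive_error = sum(
        abs(b - a) for a, b in zip(train[:-1], train[1:])
    ) / (len(train) - 1)
    return pinball_loss(actuals, forecasts, quantile) / naive_error

# A perfect forecast has zero loss at any quantile level:
print(scaled_pinball_loss([1, 2, 3, 4], [5, 6], [5, 6], 0.9))  # 0.0
```

At quantile 0.9 an under-forecast costs nine times as much as an equally large over-forecast, which is what makes the metric suitable for scoring upper/lower forecast bounds.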

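The changelog describes 'historic_quantile' only as "more stable quantile-based error ranges". A plausible minimal sketch of such a point-to-probabilistic method (an illustration of the general idea, not the AutoTS code): take empirical quantiles of the historic one-step changes and offset the point forecast by them.

```python
def historic_quantile_bounds(history, point_forecast, prediction_interval=0.9):
    """Sketch: upper/lower bounds from empirical quantiles of
    historic one-step changes, added to the point forecast."""
    changes = sorted(b - a for a, b in zip(history[:-1], history[1:]))

    def quantile(vals, q):
        # linear-interpolation quantile over a sorted list
        idx = q * (len(vals) - 1)
        lo = int(idx)
        hi = min(lo + 1, len(vals) - 1)
        frac = idx - lo
        return vals[lo] * (1 - frac) + vals[hi] * frac

    lo_q = (1 - prediction_interval) / 2
    lower = [f + quantile(changes, lo_q) for f in point_forecast]
    upper = [f + quantile(changes, 1 - lo_q) for f in point_forecast]
    return lower, upper

lower, upper = historic_quantile_bounds([1, 3, 2, 4, 3, 5], [6, 7], 0.8)
```

Because the offsets come from observed history rather than a fitted distribution, the ranges stay stable even when the point model itself is poorly calibrated, which fits the "more stable" description above.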
autots/__init__.py

Lines changed: 10 additions & 8 deletions
```diff
@@ -3,19 +3,21 @@
 
 https://github.com/winedarksea/AutoTS
 """
-from autots.datasets import load_toy_hourly
-from autots.datasets import load_toy_daily
-from autots.datasets import load_toy_monthly
-from autots.datasets import load_toy_yearly
-from autots.datasets import load_toy_weekly
+from autots.datasets import load_hourly
+from autots.datasets import load_daily
+from autots.datasets import load_monthly
+from autots.datasets import load_yearly
+from autots.datasets import load_weekly
 
 from autots.evaluator.auto_ts import AutoTS
+from autots.tools.transform import GeneralTransformer
+from autots.tools.shaping import long_to_wide
 
-__version__ = '0.1.5'
+__version__ = '0.2.0'
 
 
-__all__ = ['load_toy_daily','load_toy_monthly', 'load_toy_yearly', 'load_toy_hourly', 'load_toy_weekly',
-           'AutoTS']
+__all__ = ['load_daily','load_monthly', 'load_yearly', 'load_hourly', 'load_weekly',
+           'AutoTS', 'GeneralTransformer', 'long_to_wide']
 
 # import logging
 # logger = logging.getLogger(__name__)
```
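The `__init__.py` diff widens `__all__` to export `GeneralTransformer` and `long_to_wide` at the package top level. For reference, `__all__` is what controls wildcard imports; a self-contained toy module (not autots) shows the effect:

```python
import sys
import types

# Build a throwaway module in memory to demonstrate __all__ semantics.
mod = types.ModuleType('toy_pkg')
mod.AutoTS = 'model-class'
mod._private = 'hidden'
mod.__all__ = ['AutoTS']   # only names listed here survive a wildcard import
sys.modules['toy_pkg'] = mod

ns = {}
exec('from toy_pkg import *', ns)
print('AutoTS' in ns, '_private' in ns)  # True False
```

Plain `import toy_pkg` is unaffected by `__all__`; only `from toy_pkg import *` consults it, so widening the list in AutoTS 0.2.0 makes the new helpers part of the advertised public API.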
