# AutoTS
Unstable prototype: version 0.0.1
### Project CATS (Catlin Automated Time Series)
(or maybe eventually: Clustered Automated Time Series)
#### Model Selection for Multiple Time Series

Simple package for comparing and predicting with open-source time series implementations.
For other time series needs, check out the package list here: https://github.com/MaxBenChrist/awesome_time_series_in_python

`pip install autots`
#### Requirements:
	Python >= 3.5 (typing) >= 3.6 (GluonTS)
	pandas
	sklearn >= 0.20.0 (ColumnTransformer)
	statsmodels
	holidays

`pip install autots['additional models']`
#### Additional Requirements:
	fbprophet
	fredapi (example datasets)

## Basic Use
Input data is expected in a 'long' format with three columns: date (ideally already parsed as a pandas datetime), value, and series ID. The column name for each of these is passed to .fit(). For a single time series, series_id can be None.
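For illustration only, a hypothetical long frame matching the column names used in the example below might look like this:
```
import pandas as pd

# Hypothetical long-format input: one row per (date, series) observation
df_long = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-01', '2020-01-02']),
    'value': [10.0, 12.0, 300.0, 290.0],
    'series_id': ['series_a', 'series_a', 'series_b', 'series_b'],
})
```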

```
from autots.datasets import load_toy_daily
df_long = load_toy_daily()

from autots import AutoTS
model = AutoTS(forecast_length = 14, frequency = 'infer',
               prediction_interval = 0.9, ensemble = True, weighted = False,
               max_generations = 5, num_validations = 2, validation_method = 'even')
model = model.fit(df_long, date_col = 'date', value_col = 'value', id_col = 'series_id')

# Print the name of the best model
print(model.best_model['Model'].iloc[0])

prediction = model.predict()
# point forecasts dataframe
forecasts_df = prediction.forecast
# accuracy of all tried model results (not including cross validation)
model_results = model.main_results.model_results
```

## Underlying Process
AutoTS works in the following way at present:
* It begins by taking long data and converting it to a wide dataframe with a DateTimeIndex
* An initial train/test split is generated, where the test set is the most recent data, of length forecast_length (see the sketch after this list)
* A random template of models is generated and tested on the initial train/test split
  * Models consist of a pre-transformation step (fill-NA options, outlier removal options, etc.), an algorithm (e.g., ETS), and model parameters (trend, damped, etc.)
* The top models (selected by a combination of SMAPE, MAE, and RMSE) are recombined with random mutations for max_generations generations
* A handful of the best models from this process go to cross validation, where they are re-assessed on new train/test splits
* The best model in validation is selected as best_model and used in the .predict() method to generate forecasts
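A minimal sketch of the first two steps in plain pandas (an illustration of the idea, not the package internals):
```
import pandas as pd

# Long -> wide: one column per series, rows indexed by a DateTimeIndex
df_wide = df_long.pivot(index='date', columns='series_id', values='value')

# Initial train/test split: the test set is the most recent forecast_length rows
forecast_length = 14
df_train = df_wide.iloc[:-forecast_length]
df_test = df_wide.iloc[-forecast_length:]
```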

## Caveats and Advice

#### Short Training History
How much data is 'too little' depends on the seasonality and volatility of the data.
But less than half a year of daily data or less than two years of monthly data are both going to be tight.
Minimal training data most severely impacts the ability to do proper cross validation. Set num_validations = 0 in such cases.
Since ensembles are based on the test dataset, it would also be wise to set ensemble = False if num_validations = 0.
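That advice as a sketch, reusing the constructor parameters from the Basic Use example:
```
from autots import AutoTS

# With very little history: skip cross validation, and skip ensembling,
# since an ensemble would rest on a single test split
model = AutoTS(forecast_length = 14, frequency = 'infer',
               num_validations = 0, ensemble = False)
```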

#### Too Much Training Data
Too much data is already handled to some extent by 'context_slicer' in the transformations, which tests using less training data.
That said, large datasets will be slower and more memory intensive. For high frequency data (say, hourly), it can often be advisable to roll that up to a lower frequency (daily, weekly, etc.).
Rollup can be accomplished by setting frequency to the desired rollup frequency and agg_func to 'sum', 'mean', or another appropriate statistic.
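The same rollup can also be done manually in pandas before fitting; a sketch for hourly-to-daily sums:
```
# Resample each series from hourly to daily totals before passing to .fit()
df_daily = (df_long.set_index('date')
                   .groupby('series_id')['value']
                   .resample('D').sum()
                   .reset_index())
```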

#### Lots of NaN in data
Various NaN filling techniques are tested in the transformation step. Rolling up data to a lower frequency may also help deal with NaNs.

#### More than one preord regressor
'Preord' regressor stands for 'preordained' regressor, to make it clear this is data that will be known with high certainty about the future.
Such data about the future is rare; one example might be the number of stores that are planned to be open on each given day in the future when forecasting sales.
Since many algorithms do not handle more than one regressor, only one is handled here. If you would like to use more than one,
manually select the best variable or use dimensionality reduction to collapse the features to one dimension (see the sketch below).
However, the model can handle quite a lot of parallel time series, so additional regressors can instead be passed through as additional time series to forecast,
and the regression models here can utilize the information they provide to help improve forecast quality.
To prevent these additional series from weighing too heavily on accuracy-based model selection, pass in series weights that lower or remove their forecast accuracy from consideration.
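A sketch of the dimensionality reduction route; PCA is just one possible choice, and future_regressors is a hypothetical array of several future-known variables:
```
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical (n_dates, n_features) array of future-known regressors
future_regressors = np.random.rand(100, 5)

# Collapse the regressors down to the single column the models accept
preord_regressor = PCA(n_components=1).fit_transform(future_regressors).ravel()
```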

#### Categorical Data
Categorical data is handled, but it is handled poorly. For example, optimization metrics do not currently include any categorical accuracy metrics.
For categorical data that has a meaningful order (e.g., 'low', 'medium', 'high') it is best for the user to encode that data before passing it in,
thus properly capturing the relative sequence (e.g., 'low' = 1, 'medium' = 2, 'high' = 3).
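A sketch of that manual encoding, assuming the ordered categories sit in the long frame's value column:
```
# Map ordered categories to integers before calling .fit()
order = {'low': 1, 'medium': 2, 'high': 3}
df_long['value'] = df_long['value'].map(order)
```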

#### Custom Metrics
Implementing new metrics is rather difficult. However, the internal 'Score' that compares models can easily be adjusted by passing through custom metric weights.
Higher weighting increases the importance of that metric.
`metric_weighting = {'smape_weighting' : 9, 'mae_weighting' : 1, 'rmse_weighting' : 5, 'containment_weighting' : 1, 'runtime_weighting' : 0.5}`
sMAPE is generally the most versatile across multiple series, but doesn't handle forecasts with lots of zeroes well.
Containment measures the percent of test data that falls between the upper and lower forecasts.
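A sketch of passing those weights through, assuming metric_weighting is accepted as an AutoTS constructor parameter:
```
from autots import AutoTS

metric_weighting = {'smape_weighting' : 9, 'mae_weighting' : 1, 'rmse_weighting' : 5,
                    'containment_weighting' : 1, 'runtime_weighting' : 0.5}
# Assumption: metric_weighting is a constructor parameter
model = AutoTS(forecast_length = 14, metric_weighting = metric_weighting)
```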

## To-Do
* Smaller
  * Recombine best two of each model, if two or more present
  * Duplicates still seem to be occurring in the genetic template runs
  * Inf appearing in MAE and RMSE (possibly all NaN in test)
  * NA tolerance for test in simple_train_test_split
  * Relative/absolute imports and reduce package reloading
  * User regressor to sklearn model regression_type
  * Import/export template
  * ARIMA + Detrend fails
* Things needing testing:
  * Confirm per_series weighting works properly
  * Passing in start dates - (test)
  * Different frequencies
  * Various verbose inputs
  * Test holidays on non-daily data
  * Handle categorical forecasts where the forecast leaves known values
* Speed improvements, profiling, parallelization, and distributed options for generally greater speed
* Generate list of functional frequencies, and improve usability on rarer frequencies
* Warning/handling if lots of NaN in the most recent (test) part of data
* Figures: add option to output figures of train/test + forecast, and other performance figures
* Input and output saved templates as .csv and .json
* 'Check Package' to check if optional model packages are installed
* Pre-clustering on many time series
* If all inputs are int, convert floats back to int
* Trim whitespace on string inputs
* Hierarchical correction (bottom-up to start with)
* Improved verbosity controls and options. Replace most 'print' statements with logging.
* Export as simpler code (as TPOT does)
* AIC metric, other accuracy metrics
* Analyze and return inaccuracy patterns (most inaccurate periods out, days of week, most inaccurate series)
* Use saved results to resume a search partway through
* Generally improved probabilistic forecasting
* Option to drop series which haven't had a value in the last N days
* Option to change which metric is used for model selection
* Use a quantile of the training data to provide upper/lower forecasts for Last Value Naive (so the upper forecast might be the 95th percentile of observed values)
* More thorough use of setting the random seed
* For monthly data, account for the number of days in each month
* Option to keep running generations until improvement of at least X% is no longer seen over the last n generations

#### New Ensembles:
	best 3 (unique algorithms, not just variations)
	forecast distance 30/30/30
	best per series ensemble
	best point with best probabilistic containment
#### New models:
	Seasonal Naive
	Last Value + Drift Naive
	Simple Decomposition forecasting
	GluonTS Models
	Simulations
	Sklearn + TSFresh
	Sklearn + polynomial features
	Sktime
	Ta-lib
	tslearn
	pydlm
	Isotonic regression
	TPOT if it adds multioutput functionality