
Commit 2819f79

0.2.0
* Update TODO.md
* change default id_col to None
* the Canadian one
* what fools these mortals be
* gotta have sin
* magic
* mess
* its tedious to make up stuff
* dreariness to suit
* time to plant peas
* it's snowing again...
* pct_change endless is
* what's up dude?
* made a change and no bugs. I don't like it, it's too easy.
* neeeeed fooooooddd
* twit
* nothing much really
* April snowstorm
* zwiftly
* cleaning up
* sleeeeeeeep
* two commas and the pain they caused
* Skyfall
* always a mistake somewhere
* Update gluonts.py
* improved genetic recombination
* so many params
* lost in transformerlandia
* sprucing up Ensembling
* ensembling continues
* horizontal ensembling begins to work!
* recursion is strong with the dark side
* oddment, blubber
* april showers
* Finally an RNN
* ongoing RNN work
* Build me a Death Star!
* A Death Star Worthy of Mordor
* dandelions bloom
* tfp part 1
* TFP headaches + validation wasn't working
* another stab at probabilistic inference
* a few that missed the cut
* passing at same time
* Scaled Pinball Loss
* spl bug has been smashed
* constraint, it's like a quarantine for models
* whether tis nobler
* ensembling now with validation
* a rather late frost
* seasonal validation and improved cat
* removed annoying kbins out of index error
* added detrend regression types
* sleeeeeeeeeeeepppppppyyyyyyyyyy
* improving template in/out
* pacified
* and I thought of a bug to fix
* 0.2.0a1
* parameter tuning
* spring cleaning
* A Memory of Light
* non-docs stuff
* docs rst
* docs rst githubpages maybe
* github pages 2
* Delete standalone.py
* Delete functional_environments.md
* replacing names begins
* some more renames and bumped max_iter
* rename preord_ to future_ regressor
* always another bug hiding in plain sight
* training intervals, in the program, on the bike
* confused I am
* clean up
* +1
* 0.2.0a2
* cleaning shaping
* interrupt and result save improvements
* not quite
* 0.2.0a3
* darkness falls, if late
* always another alpha
* terra cotta
* unpack error
* 0.2.0a4
* getting ready for a major release
* versioning change
1 parent 7cdbfc7 commit 2819f79

88 files changed (+29905 / -4403 lines)

.gitignore

Lines changed: 5 additions & 0 deletions
```diff
@@ -1,3 +1,7 @@
+# supporting files
+standalone.py
+functional_environments.md
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
@@ -65,6 +69,7 @@ instance/
 
 # Sphinx documentation
 docs/_build/
+!docs/build/
 
 # PyBuilder
 target/
```

README.md

Lines changed: 20 additions & 16 deletions
````diff
@@ -2,14 +2,14 @@
 
 ![AutoTS Logo](/img/autots_logo.png)
 
-#### Model Selection for Multiple Time Series
+**Model Selection for Multiple Time Series**
 
 Simple package for comparing and predicting with open-source time series implementations.
 
 For other time series needs, check out the list [here](https://github.com/MaxBenChrist/awesome_time_series_in_python).
 
 ## Features
-* Fourteen available model classes, with thousands of possible hyperparameter configurations
+* Twenty available model classes, with tens of thousands of possible hyperparameter configurations
 * Finds optimal time series models by genetic programming
 * Handles univariate and multivariate/parallel time series
 * Point and probabilistic forecasts
@@ -18,7 +18,7 @@ For other time series needs, check out the list [here](https://github.com/MaxBen
 * Allows automatic ensembling of best models
 * Multiple cross validation options
 * Subsetting and weighting to improve search on many multivariate series
-* Option to use one or a combination of SMAPE, RMSE, MAE, and Runtime for model selection
+* Option to use one or a combination of metrics for model selection
 * Ability to upsample data to a custom frequency
 * Import and export of templates allowing greater user customization
 
@@ -36,32 +36,36 @@ Input data is expected to come in a 'long' format with three columns:
 The column name for each of these is passed to .fit().
 
 ```
-from autots.datasets import load_toy_monthly # also: _daily _yearly or _hourly
-df_long = load_toy_monthly()
+
+# also: _hourly, _daily, _weekly, or _yearly
+from autots.datasets import load_monthly
+df_long = load_monthly()
 
 from autots import AutoTS
-model = AutoTS(forecast_length = 3, frequency = 'infer',
-               prediction_interval = 0.9, ensemble = False, weighted = False,
-               drop_data_older_than_periods = 240,
-               max_generations = 5, num_validations = 2, validation_method = 'even')
-model = model.fit(df_long, date_col = 'datetime', value_col = 'value', id_col = 'series_id')
+model = AutoTS(forecast_length=3, frequency='infer',
+               prediction_interval=0.9, ensemble=None,
+               model_list='superfast',
+               max_generations=5, num_validations=2,
+               validation_method='even')
+model = model.fit(df_long, date_col='datetime',
+                  value_col='value', id_col='series_id')
 
 # Print the name of the best model
 print(model)
 
 prediction = model.predict()
 # point forecasts dataframe
 forecasts_df = prediction.forecast
-# accuracy of all tried model results (not including cross validation)
-model_results = model.initial_results.model_results
-# and including cross validation
-validation_results = model.validation_results.model_results
+# accuracy of all tried model results
+model_results = model.results()
+# and aggregated from cross validation
+validation_results = model.results("validation")
 
 ```
 
-Check out [extended_tutorial.md](https://github.com/winedarksea/AutoTS/blob/master/extended_tutorial.md) for a more detailed guide to features!
+Check out [extended_tutorial.md](https://winedarksea.github.io/AutoTS/build/source/tutorial.html) for a more detailed guide to features!
 
-# How to Contribute:
+## How to Contribute:
 * Give feedback on where you find the documentation confusing
 * Use AutoTS and...
 * Report errors and request features by adding Issues on GitHub
````
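The README above expects 'long' input: one row per observation, with date, series id, and value columns whose names are passed to `.fit()`. As a standalone sketch of what that shape means (plain pandas on made-up data, not AutoTS itself), the long-to-wide reshaping AutoTS performs internally is essentially a pivot:

```python
import pandas as pd

# Toy long-format data: the three columns the README describes.
df_long = pd.DataFrame({
    'datetime': pd.to_datetime(['2020-01-01', '2020-01-02'] * 2),
    'series_id': ['a', 'a', 'b', 'b'],
    'value': [1.0, 2.0, 3.0, 4.0],
})

# Wide format: one row per date, one column per series.
df_wide = df_long.pivot(index='datetime', columns='series_id', values='value')
print(df_wide.shape)  # (2, 2)
```

The column names here are only the README's example names; any names work as long as they are passed to `.fit()`.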

TODO.md

Lines changed: 164 additions & 57 deletions
```diff
@@ -7,68 +7,172 @@
 * New models need only be sometimes applicable
 * Fault tolerance: it is perfectly acceptable for model parameters to fail on some datasets, the higher level API will pass over and use others.
 
+Latest:
+Added Github Pages documentation
+Changed default for `series_id` so it is no longer required if univariate
+Changed default of `subset` to None.
+Removed `weighted` parameter, now passing weights to .fit() alone is sufficient.
+Fixed a bug where 'One or more series is 90% or more NaN' was printing when it shouldn't
+Fixed (or more accurately, reduced) a bug where multiple initial runs were counting as validation runs.
+Fixed bug where validation subsetting was behaving oddly
+Fixed bug where regressor wasn't being passed to validation.
+Renamed preord_ to future_ regressor.
+Renamed sample datasets.
+Allowed export of result_file as .pickle along with more complete object.
+Added model_interrupt parameter to allow for manually skipping models when enabled.
+Made serious efforts to make the code prettier with pylint, still lots to do, however...
+Improved genetic recombination so optimal models should be reached more quickly
+Improved Point to Probabilistic methods:
+'historic_quantile' more stable quantile-based error ranges
+'inferred normal' Bayesian-inspired method
+Metrics:
+Added Scaled Pinball Loss (SPL)
+Removed upper/lower MAE
+Improved ensembling with new parameter options
+Recursive ensembling (ensemble of ensembles) now enabled
+Validation:
+Added 'seasonal' validation method
+Categorical transformer improved, now tolerant to leaving bounds.
+Added remove_leading_zeroes option for convenience.
+
+Added a number of new Transformer options
+Multiple new Sklearn-sourced transformers (QuantileTransformer, etc)
+SinTrend
+DifferencedDetrend
+CumSumTransformer
+PctChangeTransformer
+PositiveShift Transformer
+Log
+IntermittentOccurrence
+SeasonalDetrend
+bkfilter and cffilter
+DatepartRegression
+Entirely changed the general transformer to add ~~three~~ four levels of transformation.
+Allowed context_slicer to receive direct integer inputs
+Added new 'Detrend' options to allow more sklearn linear models.
+
+GLM
+Error where it apparently won't tolerate any zeroes was compensated for.
+Speed improvement.
+RollingRegression
+Added SVM model
+Added option to tune some model parameters to sklearn
+Added new feature construction parameters
+Added RNNs with Keras
+GluonTS:
+fixed the use of context_length, added more options to that param
+Dynamic Factor added uncertainty from Statsmodels Statespace
+VARMAX added uncertainty from Statsmodels Statespace
+
+New models:
+SeasonalNaive model
+VAR from Statsmodels (faster than VARMAX statespace)
+MotifSimulation
+WindowRegression
+TensorflowSTS
+TFPRegression
+ComponentAnalysis
+
 # Errors:
-'Detrend' transformation is still buggy (can't convert to Series)
-raise AttributeError(("Model String '{}' not recognized").format(model)) -> turn to an allowable exception with a printed warning
-Holiday not (always) working
+DynamicFactor holidays Exceptions 'numpy.ndarray' object has no attribute 'values'
+VECM does not recognize exog to predict
+ARIMA with User or Holiday ValueError('Can only compare identically-labeled DataFrame objects',)
+Drop Most Recent does not play well logically with added external (future) regressors.
+FastICA 'array must not contain infs or NaNs'
+How do fillna methods handle datasets that have entirely NaN series?
+VAR ValueError('Length of passed values is 4, index implies 9',)
+WindowRegression + KerasRNN + 1step + univariate = ValueError('Length mismatch: Expected axis has 54 elements, new values have 9 elements',)
+Is Template Eval Error: ValueError('array must not contain infs or NaNs',) related to Point to Probability HISTORIC QUANTILE?
+'Fake Date' doesn't work on entirely NaN series - ValueError('Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required.',)
+
+
+### Ignored Errors:
+xgboost poisson loss does not accept negatives
+GluonTS not accepting quite a lot of frequencies
+KerasRNN errors due to parameters not working on all dataset
+Tensorflow GPU backend may crash on occasion.
+
+## General Tasks
+* test submission
+* test whether bottom up significantly overestimates on rollup
+* store level hierarchial
 
 # To-Do
-* Get the sphinx (google style) documentation and readthedocs.io website up
-* Better point to probabilistic (uncertainty of naive last-value forecast) - linear reg of abs error of samples - simulations
-* get_prediction for Statsmodels Statespace models to include confidence interval where possible
+* drop duplicates as function of TemplateEvalObject
+* fake date dataset of many series to improve General Template
+* better document ensembling
+* 'fast' option for RandomTransformations generator
+* optimize randomtransform probabilities
+* Add to template: Gluon, Motif, WindowRegression
+* Convert 'Holiday' regressors into Datepart + Holiday 2d
+* best per series to validation template even if poor on score overall
+* Bring GeneralTransformer to higher level API.
+* wide_to_long and long_to_wide in higher-level API
+* Option to use full traceback in errors in table
+* Hierarchial
+* every level must be included in forecasting data
+* 'bottom-up' and 'mid' levels
+* one level. User would have to specify all as based on lowest-level keys if wanted sum-up.
+* Better point to probabilistic (uncertainty of naive last-value forecast)
+* linear reg of abs error of samples - simulations
+* Data, pct change, find window with % max change pos, and neg then avg. Bound is first point + that percent, roll from that first point and adjust if points cross, variant where all series share knowledge
+* Bayesian posterior update of prior
+* variance of k nearest neighbors
+* Data, split, normalize, find distribution exponentially weighted to most recent, center around forecast, shared variant
+* Data quantile, recentered around median of forecast.
+* Categorical class probabilities as range for RollingRegression
+* get_forecast for Statsmodels Statespace models to include confidence interval where possible
 * migrate arima_model to arima.model
-* Check how fillna methods handle datasets that are entirely NaN
-* Better X_maker:
-* use feature selection on TSFresh features - autocorrelation lag n, fft/cwt coefficients (abs), abs_energy
-* date part and age/expanding regressors
+* uncomp with uncertainty intervals
+* Window regression
+* transfer learning
+* RollingRegression
+* Better X_maker:
+* 1d and 2d variations
+* .cov, .skew, .kurt, .var
+* https://link.springer.com/article/10.1007/s10618-019-00647-x/tables/1
+* Probabilistic:
+https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_quantile.html
+* other sequence models
+* Categorical classifier, ComplementNB
+* PCA or similar -> Univariate Series (Unobserved Components)
+* Simple performance:
+* replace try/except with if/else in some cases
 * GluonTS
-* Add support for preord_regressor
+* Add support for future_regressor
+* Make sure of rolling regression setup
 * Modify GluonStart if lots of NaN at start of that series
 * GPU and CPU ctx
-* Print methods for prediction/model objects that give simple readme how-to's
-* Get Tsfresh working with small dataset (short, 2 columns) (check feature importance again)
+* motif simulation, remove all those for loops
+* implement 'borrow' Genetic Recombination for ComponentAnalysis
+* Regressor to TensorflowSTS
 * Relative/Absolute Imports and reduce package reloading messages
-* Format of Regressor - allow multiple input to at least sklearn models
+* Allow FillNA to be None
+* Replace OrdinalEncoder with non-external code
 * 'Age' regressor as an option in addition to User/Holiday in ARIMA, etc.
-* Handle categorical forecasts where forecast leaves range of known values, then add to upper/lower forecasts
 * Speed improvements, Profiling
 * Parallelization, and Distributed options (Dask) for general greater speed
 * Improve usability on rarer frequenices (ie monthly data where some series start on 1st, others on 15th, etc.)
 * Figures: Add option to output figures of train/test + forecast, other performance figures
-* Pre-clustering on many time series
 * If all input are Int, convert floats back to int
-* Trim whitespace/case-desensitize on string inputs
-* Option to print % improvement of best over last value naive
-* Hierachial correction (bottom-up to start with)
-* Because I'm a biologist, incorporate more genetics and such. Also as a neuro person, there must be a way to fit networks in...
-* Improved verbosity controls and options.
 * Replace most 'print' with logging.
-* Export as simpler code (as TPOT)
-* set up the lower-level API to be usable as pipelines
-* allow stand-alone pipeline for transformation with format for export data format to other package requirements (use AutoTS just for preprocessing)
-* AIC metric, other accuracy metrics
-* MAE of upper and lower forecasts, balance with Containment
-* Metric to measure if the series follows the same shape (Contour)
-* Potentially % change between n and n-1, compare this % change between forecast and actual
-* One if same direction, 0 otherwise (sum/len)
 * Analyze and return inaccuracy patterns (most inaccurate periods out, days of week, most inaccurate series)
 * Development tools:
-* Add to Conda distribution as well as pip
+* Add to Conda (Forge) distribution as well as pip
 * Continuous integration
 * Code/documentation quality checkers
-* Option to drop series which haven't had a value in last N days
-* More thorough use of setting random seed, verbose, n_jobs
-* For monthly data account for number of days in month
-* add constant to GLM
 * Ability to automatically add external datasets of parallel time series of global usability (ie from FRED or others)
+* make datetime input optional, just allow dataframes of numbers
 * Option to import either long or wide data
 * Infer column names for df_long to wide based on which is datetime, which is string, and which is numeric
 
+### Links
+* https://link.springer.com/article/10.1007/s10618-019-00647-x/tables/1
+* https://github.com/gantheory/TPA-LSTM
+* https://github.com/huseinzol05/Stock-Prediction-Models/tree/master/deep-learning
+
 ### Faster Convergence / Faster in General
 * Only search useful parameters, highest probability for most likely effective parameters
-* 'Expert' starting templates to try most likley combinations first
-* Recombine best two of each model parameters, if two or more present (plus option to disable this)
-* Recombination of transformations
 * Remove parameters that are rarely/never useful from get_new_params
 * Don't apply transformations to Zeroes naive, possibly other naives
 * Option to run generations until generations no longer see improvement of at least X % over n generations
@@ -77,35 +181,38 @@
 * potentially a method = 'deep' to get_new_params used after n generations
 * no unlock, but simply very low-probability deep options in get_new_params
 * Exempt or reduce slow models from unnecessary runs, particularly with different transformations
-* Numba and Cythion acceleration (metrics might be easy to start with)
+* Numba and Cython acceleration (metrics might be easy to start with)
+* GPU - xgboost, GluontTS
+
+#### New datasets:
+Second level data that is music (like a radio stream)
+Ecological data
 
 #### New Ensembles:
-best 3 (unique algorithms not just variations of same)
-forecast distance 30/30/30
-best per series ensemble ('horizontal ensemble')
-best point with best probalistic containment
+Best N combined with Decision Tree
+
 #### New models:
-Seasonal Naive
-Last Value + Drift Naive
-Simple Decomposition forecasting
-Statespace variant of ETS which has Confidence Intervals
-Tensorflow Probability Structural Time Series
-Pytorch Simple LSTM/GRU
+Croston, SBA, TSB, ADIDA, iMAPA
+Local Linear/Piecewise Regression Model
 Simulations
-XGBoost (doesn't support multioutput directly)
-Sklearn + TSFresh
-Sktime
 Ta-lib
+Pyflux
 tslearn
-Multivariate GARCH
+GARCH (arch library seems best maintained, none have multivariate)
 pydlm - baysesian dynamic linear
-Isotonic regression
-Survival Analysis
 MarkovAutoRegression
-Motif discovery, and repeat
+hmmlearn
 TPOT if it adds multioutput functionality
 https://towardsdatascience.com/pyspark-forecasting-with-pandas-udf-and-fb-prophet-e9d70f86d802
-Compressive Transformer, if they go anywhere
+Compressive Transformer
+Reinforcement Learning
 
 #### New Transformations:
-Test variations on 'RollingMean100thN'
+Sklearn iterative imputer
+lag and beta to DifferencedTransformer to make it more of an AR process
+Weighted moving average
+Symbolic aggregate approximation (SAX) and (PAA) (basically these are just binning)
+Shared discretization (all series get same shared binning)
+Last Value Centering
+Constraint as a transformation parameter
+
```
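The TODO changelog adds Scaled Pinball Loss (SPL) to the metrics. As a rough, dependency-free sketch of the idea, not the actual AutoTS implementation: pinball (quantile) loss penalizes over- and under-forecasts asymmetrically by the quantile level, and the "scaled" variant divides by the in-sample naive error, the same normalization trick MASE uses.

```python
def pinball_loss(actuals, forecasts, quantile):
    """Average pinball (quantile) loss for one quantile level."""
    total = 0.0
    for y, f in zip(actuals, forecasts):
        if y >= f:
            total += quantile * (y - f)
        else:
            total += (1 - quantile) * (f - y)
    return total / len(actuals)

def scaled_pinball_loss(train, actuals, forecasts, quantile):
    """Pinball loss scaled by the mean absolute one-step naive
    error on the training data (MASE-style scaling)."""
    naive_error = sum(
        abs(b - a) for a, b in zip(train[:-1], train[1:])
    ) / (len(train) - 1)
    return pinball_loss(actuals, forecasts, quantile) / naive_error

# A perfect forecast has zero loss at any quantile level:
print(scaled_pinball_loss([1, 2, 3, 4], [5, 6], [5, 6], 0.9))  # 0.0
```

At quantile 0.9 an under-forecast costs nine times as much as an equally large over-forecast, which is what makes the metric suitable for scoring upper/lower forecast bounds.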

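The changelog describes 'historic_quantile' only as "more stable quantile-based error ranges". A plausible minimal sketch of such a point-to-probabilistic method (an illustration of the general idea, not the AutoTS code): take empirical quantiles of the historic one-step changes and offset the point forecast by them.

```python
def historic_quantile_bounds(history, point_forecast, prediction_interval=0.9):
    """Sketch: upper/lower bounds from empirical quantiles of
    historic one-step changes, added to the point forecast."""
    changes = sorted(b - a for a, b in zip(history[:-1], history[1:]))

    def quantile(vals, q):
        # linear-interpolation quantile over a sorted list
        idx = q * (len(vals) - 1)
        lo = int(idx)
        hi = min(lo + 1, len(vals) - 1)
        frac = idx - lo
        return vals[lo] * (1 - frac) + vals[hi] * frac

    lo_q = (1 - prediction_interval) / 2
    lower = [f + quantile(changes, lo_q) for f in point_forecast]
    upper = [f + quantile(changes, 1 - lo_q) for f in point_forecast]
    return lower, upper

lower, upper = historic_quantile_bounds([1, 3, 2, 4, 3, 5], [6, 7], 0.8)
```

Because the offsets come from observed history rather than a fitted distribution, the ranges stay stable even when the point model itself is poorly calibrated, which fits the "more stable" description above.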
autots/__init__.py

Lines changed: 10 additions & 8 deletions
```diff
@@ -3,19 +3,21 @@
 
 https://github.com/winedarksea/AutoTS
 """
-from autots.datasets import load_toy_hourly
-from autots.datasets import load_toy_daily
-from autots.datasets import load_toy_monthly
-from autots.datasets import load_toy_yearly
-from autots.datasets import load_toy_weekly
+from autots.datasets import load_hourly
+from autots.datasets import load_daily
+from autots.datasets import load_monthly
+from autots.datasets import load_yearly
+from autots.datasets import load_weekly
 
 from autots.evaluator.auto_ts import AutoTS
+from autots.tools.transform import GeneralTransformer
+from autots.tools.shaping import long_to_wide
 
-__version__ = '0.1.5'
+__version__ = '0.2.0'
 
 
-__all__ = ['load_toy_daily','load_toy_monthly', 'load_toy_yearly', 'load_toy_hourly', 'load_toy_weekly',
-           'AutoTS']
+__all__ = ['load_daily','load_monthly', 'load_yearly', 'load_hourly', 'load_weekly',
+           'AutoTS', 'GeneralTransformer', 'long_to_wide']
 
 # import logging
 # logger = logging.getLogger(__name__)
```
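The `__init__.py` diff widens `__all__` to export `GeneralTransformer` and `long_to_wide` at the package top level. For reference, `__all__` is what controls wildcard imports; a self-contained toy module (not autots) shows the effect:

```python
import sys
import types

# Build a throwaway module in memory to demonstrate __all__ semantics.
mod = types.ModuleType('toy_pkg')
mod.AutoTS = 'model-class'
mod._private = 'hidden'
mod.__all__ = ['AutoTS']   # only names listed here survive a wildcard import
sys.modules['toy_pkg'] = mod

ns = {}
exec('from toy_pkg import *', ns)
print('AutoTS' in ns, '_private' in ns)  # True False
```

Plain `import toy_pkg` is unaffected by `__all__`; only `from toy_pkg import *` consults it, so widening the list in AutoTS 0.2.0 makes the new helpers part of the advertised public API.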
