
Commit 830156b

Merge pull request #251 from winedarksea/dev
0.6.16
2 parents 6e473e8 + 7e97114 commit 830156b


70 files changed: +8026 −981 lines changed

README.md

Lines changed: 37 additions & 3 deletions
@@ -25,6 +25,7 @@ A combination of metrics and cross-validation options, the ability to apply subs
 * [Installation](https://github.com/winedarksea/AutoTS#installation)
 * [Basic Use](https://github.com/winedarksea/AutoTS#basic-use)
 * [Tips for Speed and Large Data](https://github.com/winedarksea/AutoTS#tips-for-speed-and-large-data)
+* [Flowchart](https://github.com/winedarksea/AutoTS#autots-process)
 * Extended Tutorial [GitHub](https://github.com/winedarksea/AutoTS/blob/master/extended_tutorial.md) or [Docs](https://winedarksea.github.io/AutoTS/build/html/source/tutorial.html)
 * [Production Example](https://github.com/winedarksea/AutoTS/blob/master/production_example.py)

@@ -59,10 +60,10 @@ df = load_daily(long=long)
 
 model = AutoTS(
     forecast_length=21,
-    frequency='infer',
+    frequency="infer",
     prediction_interval=0.9,
-    ensemble='auto',
-    model_list="fast",  # "superfast", "default", "fast_parallel"
+    ensemble=None,
+    model_list="superfast",  # "fast", "default", "fast_parallel"
     transformer_list="fast",  # "superfast",
     drop_most_recent=1,
     max_generations=4,
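
The README's basic-use example continues past this hunk with fitting and forecasting; a minimal sketch of those next steps, using the documented AutoTS API (the column arguments match the long-format `load_daily` output and are unneeded for wide data):

```python
# Sketch of the steps that follow the AutoTS(...) setup in the basic-use example.
model = model.fit(df, date_col="datetime", value_col="value", id_col="series_id")
prediction = model.predict()
forecast = prediction.forecast  # wide DataFrame of point forecasts
```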
@@ -133,4 +134,37 @@ Also take a look at the [production_example.py](https://github.com/winedarksea/A
 * And, of course, contributing to the codebase directly on GitHub.
 
 
+## AutoTS Process
+```mermaid
+flowchart TD
+    A[Initiate AutoTS Model] --> B[Import Template]
+    B --> C[Load Data]
+    C --> D[Split Data Into Initial Train/Test Holdout]
+    D --> E[Run Initial Template Models]
+    E --> F[Evaluate Accuracy Metrics on Results]
+    F --> G[Generate Score from Accuracy Metrics]
+    G --> H{Max Generations Reached or Timeout?}
+
+    H -->|No| I[Evaluate All Previous Templates]
+    I --> J[Genetic Algorithm Combines Best Results and New Random Parameters into New Template]
+    J --> K[Run New Template Models and Evaluate]
+    K --> G
+
+    H -->|Yes| L[Select Best Models by Score for Validation Template]
+    L --> M[Run Validation Template on Additional Holdouts]
+    M --> N[Evaluate and Score Validation Results]
+    N --> O{Create Ensembles?}
+
+    O -->|Yes| P[Generate Ensembles from Validation Results]
+    P --> Q[Run Ensembles Through Validation]
+    Q --> N
+
+    O -->|No| R[Export Best Models Template]
+    R --> S[Select Single Best Model]
+    S --> T[Generate Future Time Forecast]
+    T --> U[Visualize Results]
+
+    R --> B[Import Best Models Template]
+```
 
 *Also known as Project CATS (Catlin's Automated Time Series) hence the logo.*
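
The flowchart's outer loop (export a best-models template, import it on a later run) maps onto existing template methods; a minimal sketch, assuming the `model` and `df` from the basic-use example above:

```python
# Sketch of the "Export Best Models Template" -> "Import Template" loop.
model.export_template("best_models.csv", models="best", n=15)

# A later run starts its genetic search from that template rather than from scratch:
model = AutoTS(forecast_length=21, model_list="superfast", max_generations=4)
model.import_template("best_models.csv", method="addon")
model = model.fit(df)
```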

TODO.md

Lines changed: 28 additions & 11 deletions
@@ -13,17 +13,34 @@
 * Forecasts are desired for the future immediately following the most recent data.
 * trimmed_mean to AverageValueNaive
 
-# 0.6.15 🇺🇦 🇺🇦 🇺🇦
-* Constraint transformer added
-* historical_growth constraint method added
-* fft as multivariate_feature for Cassandra
-* None trend_window now searched as part of Cassandra
-* "quarterlydayofweek" method added for datepart
-* threshold_method arg to AlignLastValue
-* general template updated
-* slight change to MATSE metric, now only abs values for scaling
-* additional args to DatepartRegression
-* bug fixes
+# 0.6.16 🇺🇦 🇺🇦 🇺🇦
+* export_template added focus_models option
+* added OneClassSVM and GaussianMixture anomaly model options
+* added plot_unpredictability_score
+* added a few more NeuralForecast search options
+* bounds_only to Constraint transformer
+* updates for deprecated upstream args
+* FIRFilter transformer added
+* mle and imle downscaled to reduce score imbalance issues with these two in generate_score
+* SectionalMotif now more robust to forecast lengths longer than history
+* new transformer and metric options for SectionalMotif
+* NaN robustness to MATSE
+* 'round' option to Constraint
+* minor change to mosaic min style ensembles to remove edge case errors
+* 'mosaic-profile', 'filtered', 'unpredictability_adjusted', and 'median' style mosaics added
+* updated profiler, and improved feature generation for horizontal generalization
+* changepoint style trend as an option to GLM and GLS
+* added ShiftFirstValue, which is only a minor variation on the PositiveShift transformer
+* added BasicLinearModel model
+* datepart_method, scale, and fourier encoding to WindowRegression
+* trimmed_mean and more date part options to SeasonalityMotif
+* some additional options to MultivariateRegression
+* added ThetaTransformer
+* added TVVAR model (time-varying VAR)
+* added ChangepointDetrend transformer
+* added MeanPercentSplitter transformer
+* updated load_daily with more recent history
+* added support for passing a custom metric
 
 ### Unstable Upstream Packages (those that are frequently broken by maintainers)
 * Pytorch-Forecasting
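
A hypothetical sketch of the new `focus_models` option on `export_template`: the argument name comes from the changelog entry above, and its exact semantics (here assumed to be a list of model names to emphasize in the exported template) are not confirmed from this commit's code.

```python
from autots import AutoTS
from autots.datasets import load_daily

model = AutoTS(forecast_length=21, model_list="superfast", max_generations=1)
model = model.fit(load_daily(long=False))
# focus_models: named in the changelog; list-of-model-names usage is an assumption.
model.export_template(
    "best_models.csv", models="best", n=15, focus_models=["SeasonalityMotif"]
)
```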

autots/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@
 from autots.models.cassandra import Cassandra
 
 
-__version__ = '0.6.15'
+__version__ = '0.6.16'
 
 TransformTS = GeneralTransformer
 
autots/datasets/_base.py

Lines changed: 85 additions & 11 deletions
@@ -12,16 +12,54 @@
 def load_daily(long: bool = True):
     """Daily sample data.
 
+    ```
+    # most of the wiki data was chosen to show holidays or holiday-like patterns
     wiki = [
-        "Germany", "Thanksgiving", 'all', 'Microsoft',
-        "Procter_%26_Gamble", "YouTube", "United_States", "Elizabeth_II",
-        "William_Shakespeare", "Cleopatra", "George_Washington",
-        "Chinese_New_Year", "Standard_deviation", "Christmas",
-        "List_of_highest-grossing_films",
-        "List_of_countries_that_have_gained_independence_from_the_United_Kingdom",
-        "Periodic_table"
+        'United_States',
+        'Germany',
+        'List_of_highest-grossing_films',
+        'Jesus',
+        'Michael_Jackson',
+        'List_of_United_States_cities_by_population',
+        'Microsoft_Office',
+        'Google_Chrome',
+        'Periodic_table',
+        'Standard_deviation',
+        'Easter',
+        'Christmas',
+        'Chinese_New_Year',
+        'Thanksgiving',
+        'List_of_countries_that_have_gained_independence_from_the_United_Kingdom',
+        'History_of_the_hamburger',
+        'Elizabeth_II',
+        'William_Shakespeare',
+        'George_Washington',
+        'Cleopatra',
+        'all'
     ]
 
+    df2 = load_live_daily(
+        observation_start="2017-01-01", weather_years=7, trends_list=None,
+        gov_domain_list=None, wikipedia_pages=wiki,
+        fred_series=['DGS10', 'T5YIE', 'SP500', 'DEXUSEU'], sleep_seconds=10,
+        fred_key="93873d40f10c20fe6f6e75b1ad0aed4d",
+        weather_data_types=["WSF2", "PRCP"],
+        weather_stations=["USW00014771"],  # looking for intermittent
+        tickers=None, london_air_stations=None,
+        weather_event_types=None, earthquake_min_magnitude=None,
+    )
+    data_file_name = join("autots", "datasets", 'data', 'holidays.zip')
+    df2.to_csv(
+        data_file_name,
+        index=True,
+        compression={
+            'method': 'zip',
+            'archive_name': 'holidays.csv',
+            'compresslevel': 9,  # maximum compression level (0-9)
+        },
+    )
+    ```
+
     Sources: Wikimedia Foundation
 
     Args:
@@ -224,8 +262,8 @@ def load_live_daily(
     tickers: list = ["MSFT"],
     trends_list: list = ["forecasting", "cycling", "microsoft"],
     trends_geo: str = "US",
-    weather_data_types: list = ["AWND", "WSF2", "TAVG"],
-    weather_stations: list = ["USW00094846", "USW00014925"],
+    weather_data_types: list = ["AWND", "WSF2", "TAVG", "PRCP"],
+    weather_stations: list = ["USW00094846", "USW00014925", "USW00014771"],
     weather_years: int = 5,
     london_air_stations: list = ['CT3', 'SK8'],
     london_air_species: str = "PM25",
@@ -769,14 +807,42 @@ def load_artificial(long=False, date_start=None, date_end=None):
         date_end = date_end.date()
     if date_start is None:
         if isinstance(date_end, datetime.date):
-            date_start = date_end - datetime.timedelta(days=720)
+            date_start = date_end - datetime.timedelta(days=740)
         else:
-            date_start = datetime.datetime.now().date() - datetime.timedelta(days=720)
+            date_start = datetime.datetime.now().date() - datetime.timedelta(days=740)
     if isinstance(date_start, datetime.datetime):
         date_start = date_start.date()
     dates = pd.date_range(date_start, date_end)
     size = dates.size
+    new_size = int(size / 10)
     rng = np.random.default_rng()
+    holiday = pd.Series(
+        np.arange(size) * 0.025
+        + rng.normal(0, 0.2, size)
+        + (np.sin((np.pi / 7) * np.arange(size)) * 0.5),
+        index=dates,
+        name='holiday',
+    )
+    # January 1st
+    holiday[(holiday.index.month == 1) & (holiday.index.day == 1)] += 10
+    # December 25th
+    holiday[(holiday.index.month == 12) & (holiday.index.day == 25)] += -4
+    # Second Tuesday of April
+    second_tuesday_of_april = (
+        (holiday.index.month == 4)
+        & (holiday.index.weekday == 1)
+        & (holiday.index.day >= 8)
+        & (holiday.index.day <= 14)
+    )
+    holiday[second_tuesday_of_april] += 10
+    # Last Monday of August
+    last_monday_of_august = (
+        (holiday.index.month == 8)
+        & (holiday.index.weekday == 0)
+        & ((holiday.index + pd.Timedelta(7, unit='D')).month == 9)
+    )
+    holiday[last_monday_of_august] += 12
 
     df_wide = pd.DataFrame(
         {
@@ -810,6 +876,13 @@ def load_artificial(long=False, date_start=None, date_end=None):
                 / 2,
             ),
             "linear": np.arange(size) * 0.025,
+            "flat": 1,
+            "new_product": np.concatenate(
+                [
+                    np.zeros(int(size - new_size)),
+                    np.random.choice(a=[-0.8, 0, 0.8], size=new_size).cumsum(),
+                ]
+            ),
             "sine_wave": np.sin(np.arange(size)),
             "sine_seasonality_monthweek": (
                 (np.sin((np.pi / 7) * np.arange(size)) * 0.25 + 0.25)
@@ -902,6 +975,7 @@ def load_artificial(long=False, date_start=None, date_end=None):
         },
         index=dates,
     )
+    df_wide = df_wide.merge(holiday, left_index=True, right_index=True)
 
     if not long:
         return df_wide
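
A quick way to inspect the artificial series added in this commit ("flat", "new_product", and the merged "holiday" column), using the existing public loader:

```python
from autots.datasets import load_artificial

# wide format: one column per artificial series, DatetimeIndex
df = load_artificial(long=False)
print(df[["flat", "new_product", "holiday"]].tail())
```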

autots/datasets/data/holidays.zip

48.2 KB
Binary file not shown.

autots/evaluator/anomaly_detector.py

Lines changed: 31 additions & 8 deletions
@@ -147,7 +147,7 @@ def detect(self, df):
         self.anomalies[mask_replace] = 1
         return self.anomalies, self.scores
 
-    def plot(self, series_name=None, title=None, plot_kwargs={}):
+    def plot(self, series_name=None, title=None, marker_size=None, plot_kwargs={}):
         import matplotlib.pyplot as plt
 
         if series_name is None:
@@ -162,7 +162,14 @@ def plot(self, series_name=None, title=None, plot_kwargs={}):
         series_anom = self.anomalies[series_name]
         i_anom = series_anom[series_anom == -1].index
         if len(i_anom) > 0:
-            ax.scatter(i_anom.tolist(), self.df.loc[i_anom, :][series_name], c="red")
+            if marker_size is None:
+                marker_size = max(20, fig.dpi * 0.45)
+            ax.scatter(
+                i_anom.tolist(),
+                self.df.loc[i_anom, :][series_name],
+                c="red",
+                s=marker_size,
+            )
 
     def fit(self, df):
         return self.detect(df)
@@ -230,8 +237,8 @@ def get_new_params(method="random"):
 
     if preforecast or method_choice == "prediction_interval":
         forecast_params = random_model(
-            model_list=['LastValueNaive', 'GLS', 'RRVAR'],
-            model_prob=[0.8, 0.1, 0.1],
+            model_list=['LastValueNaive', 'GLS', 'RRVAR', "SeasonalityMotif"],
+            model_prob=[0.8, 0.1, 0.05, 0.05],
             transformer_max_depth=5,
             transformer_list="superfast",
             keyword_format=True,
@@ -256,8 +263,9 @@ def __init__(
         use_wkdeom_holidays=True,
         use_lunar_holidays=True,
         use_lunar_weekday=False,
-        use_islamic_holidays=True,
-        use_hebrew_holidays=True,
+        use_islamic_holidays=False,
+        use_hebrew_holidays=False,
+        use_hindu_holidays=False,
         output: str = "multivariate",
         n_jobs: int = 1,
     ):
@@ -292,6 +300,7 @@
         self.use_lunar_weekday = use_lunar_weekday
         self.use_islamic_holidays = use_islamic_holidays
         self.use_hebrew_holidays = use_hebrew_holidays
+        self.use_hindu_holidays = use_hindu_holidays
         self.n_jobs = n_jobs
         self.output = output
         self.anomaly_model = AnomalyDetector(
@@ -313,6 +322,7 @@ def detect(self, df):
             self.lunar_weekday,
             self.islamic_holidays,
             self.hebrew_holidays,
+            self.hindu_holidays,
         ) = anomaly_df_to_holidays(
             self.anomaly_model.anomalies,
             splash_threshold=self.splash_threshold,
@@ -328,6 +338,7 @@
             use_lunar_weekday=self.use_lunar_weekday,
             use_islamic_holidays=self.use_islamic_holidays,
             use_hebrew_holidays=self.use_hebrew_holidays,
+            use_hindu_holidays=self.use_hindu_holidays,
         )
 
     def plot_anomaly(self, kwargs={}):
@@ -338,6 +349,7 @@ def plot(
         series_name=None,
         include_anomalies=True,
         title=None,
+        marker_size=None,
         plot_kwargs={},
         series=None,
     ):
@@ -355,6 +367,8 @@
         )
         fig, ax = plt.subplots()
         self.df[series_name].plot(ax=ax, title=title, **plot_kwargs)
+        if marker_size is None:
+            marker_size = max(20, fig.dpi * 0.45)
         if include_anomalies:
             # directly copied from above
             if self.anomaly_model.output == "univariate":
@@ -366,13 +380,21 @@
             i_anom = series_anom[series_anom == -1].index
             if len(i_anom) > 0:
                 ax.scatter(
-                    i_anom.tolist(), self.df.loc[i_anom, :][series_name], c="red"
+                    i_anom.tolist(),
+                    self.df.loc[i_anom, :][series_name],
+                    c="red",
+                    s=marker_size,
                 )
         # now the actual holidays
         i_anom = self.dates_to_holidays(self.df.index, style="series_flag")[series_name]
         i_anom = i_anom.index[i_anom == 1]
         if len(i_anom) > 0:
-            ax.scatter(i_anom.tolist(), self.df.loc[i_anom, :][series_name], c="green")
+            ax.scatter(
+                i_anom.tolist(),
+                self.df.loc[i_anom, :][series_name],
+                c="green",
+                s=marker_size,
+            )
 
     def dates_to_holidays(self, dates, style="flag", holiday_impacts=False):
         """Populate date information for a given pd.DatetimeIndex.
@@ -400,6 +422,7 @@ def dates_to_holidays(self, dates, style="flag", holiday_impacts=False):
             lunar_weekday=self.lunar_weekday,
             islamic_holidays=self.islamic_holidays,
             hebrew_holidays=self.hebrew_holidays,
+            hindu_holidays=self.hindu_holidays,
         )
 
     def fit(self, df):
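
A minimal sketch exercising the new `marker_size` argument, following the detect-then-plot flow visible in this diff; when left as None, it falls back to `max(20, fig.dpi * 0.45)`:

```python
from autots.datasets import load_daily
from autots.evaluator.anomaly_detector import HolidayDetector

df = load_daily(long=False)
detector = HolidayDetector(output="multivariate")
detector.detect(df)
# anomalies plot red, detected holidays green, both at the requested marker size
detector.plot(series_name=df.columns[0], marker_size=30)
```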
