Commit 75dbfb3

feat(feature): add auto time features
1 parent 287169a commit 75dbfb3

71 files changed: 5177 additions, 660 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@

 ## v.0.0.15
 - fix classification/anomaly detection
+- fix from_pretrained


 ## v0.0.13

docker/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 From tensorflow/tensorflow:2.16.1-gpu

 RUN apt-get update
-RUN apt-get install -y libgl1-mesa-dev wget vim python3.8
+RUN apt-get install -y libgl1-mesa-dev wget vim python3.9

 RUN pip install --no-cache-dir tfts

docs/source/index.rst

Lines changed: 37 additions & 16 deletions
@@ -9,23 +9,37 @@ TFTS Documentation

 <a class="github-button" href="https://github.com/LongxingTan/Time-series-prediction" data-icon="octicon-star" data-size="large" data-show-count="true" aria-label="Star LongxingTan/Time-series-prediction on GitHub">GitHub</a>

-TFTS (TensorFlow Time Series) supports state-of-the-art deep learning time series models for both business cases and data competitions. The package provides:
+TFTS (TensorFlow Time Series) supports state-of-the-art deep learning time series models for production, research and data competitions. Specifically, the package provides:

-* Flexible and powerful design for time series task
-* Advanced SOTA deep learning models
-* TFTS documentation lives at `time-series-prediction.readthedocs.io <https://time-series-prediction.readthedocs.io>`_
+* Flexible and powerful modular design for time series task
+* Easy-to-use advanced SOTA deep learning models
+* Allow training on CPUs, single and multiple GPUs, TPU


 Quick Start
 -----------------
-The tfts could accept any time series data of 3D data format as model input: ``(num_examples, train_sequence_length, num_features)``,
-and the model supported by tfts outputs 3D data as model output: ``(num_examples, predict_sequence_length, num_outputs)``

+1. Requirements
+~~~~~~~~~~~~~~~~~~

-Visit :ref:`Quick start <quick-start>` to learn more about the package.
+To get started with `tfts`, follow the steps below:
+
+* Python 3.7 or higher
+* `TensorFlow 2.x <https://www.tensorflow.org/install/pip>`_ installation instructions
+
+
+2. Installation
+~~~~~~~~~~~~~~~~~~
+Now you are ready, proceed with
+
+.. code-block:: shell

-- :ref:`detailed installation instructions<installation>`
-- :ref:`how to use it<usage>`
+    $ pip install tfts
+
+2. Learn more
+~~~~~~~~~~~~~~~~~~
+
+Visit :ref:`Quick start <quick-start>` to learn more about the package.


 Tutorials
@@ -38,18 +52,16 @@ The :ref:`Tutorials <tutorials>` section provides guidance on

 Models
 ---------
-The tfts library supports the SOTA deep learning models for time series.

-- `TFTS BERT model <https://github.com/LongxingTan/KDDCup2022-Baidu>`_ wins the 3rd place in `Baidu KDD Cup 2022 <https://aistudio.baidu.com/aistudio/competition/detail/152/0/introduction>`_
-- `TFTS Seq2Seq model <https://github.com/LongxingTan/Data-competitions/tree/master/tianchi-enso-prediction>`_ wins the 4th place in `Alibaba Tianchi ENSO prediction <https://tianchi.aliyun.com/competition/entrance/531871/introduction>`_
-- :ref:`Learn more models <models>`
+1. Design a Custom Model with TFTS
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+

 .. code-block:: python

     import tensorflow as tf
     from tfts import AutoConfig, AutoModel

-
     def build_model(use_model, input_shape):
         inputs = tf.keras.layers.Input(input_shape)
         config = AutoConfig.for_model(use_model)
@@ -64,13 +76,22 @@ The tfts library supports the SOTA deep learning models for time series.
         model.compile(optimizer, loss_fn)
         return model

-
     model = build_model(use_model="bert", input_shape=(24, 3))
     model.summary()


+2. More highlights
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The tfts library supports the SOTA deep learning models for time series.
+
+- `TFTS BERT model <https://github.com/LongxingTan/KDDCup2022-Baidu>`_ — 3rd place in `Baidu KDD Cup 2022 <https://aistudio.baidu.com/aistudio/competition/detail/152/0/introduction>`_
+- `TFTS Seq2Seq model <https://github.com/LongxingTan/Data-competitions/tree/master/tianchi-enso-prediction>`_ — 4th place in `Alibaba Tianchi ENSO prediction <https://tianchi.aliyun.com/competition/entrance/531871/introduction>`_
+- :ref:`Learn more models <models>`
+
+
 Tricks
--------
+----------
 Visit :ref:`Tricks <tricks>` if you want to know more tricks to improve the prediction performance.


docs/source/models.rst

Lines changed: 6 additions & 1 deletion
@@ -11,7 +11,7 @@ Some experiments of tfts in Kaggle Dataset
 Models supported
 ------------------

-You can you below models in ``AutoModel``
+You can use below models with ``AutoModel``

 * RNN
 * Seq2seq
@@ -23,3 +23,8 @@ You can you below models in ``AutoModel``
 * NBeats
 * AutoFormer
 * Informer
+
+.. code-block:: python
+
+    config = AutoConfig.for_model("seq2seq")
+    model = AutoModel.from_config(config, predict_sequence_length=predict_sequence_length)

docs/source/quick-start.rst

Lines changed: 4 additions & 4 deletions
@@ -74,6 +74,9 @@ The general setup for training and testing a model is

 3.1 Prepare the data
 ~~~~~~~~~~~~~~~~~~~~~~~~
+The tfts could accept any time series data of 3D data format as model input: ``(num_examples, train_sequence_length, num_features)``,
+and the model supported by tfts outputs 3D data as model output: ``(num_examples, predict_sequence_length, num_outputs)``
+
 Before training, ensure your raw data is preprocessed into a 3D format with the shape ``(batch_size, train_steps, features)``. Perform any necessary data cleaning, normalization, or transformation steps to ensure the data is ready for training.

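The input and output shapes described in this hunk can be illustrated with plain NumPy. The sketch below is not part of the commit: `make_windows` is a hypothetical helper, and treating the first feature column as the prediction target is an assumption made only for the example.

```python
import numpy as np


def make_windows(series, train_sequence_length, predict_sequence_length):
    """Slice a (time, num_features) array into 3D inputs and 3D targets."""
    total = train_sequence_length + predict_sequence_length
    num_examples = series.shape[0] - total + 1
    # Inputs: every window of `train_sequence_length` consecutive steps.
    x = np.stack([series[i : i + train_sequence_length] for i in range(num_examples)])
    # Targets: the steps right after each input window.
    # Assumption: column 0 is the value to predict, so num_outputs == 1.
    y = np.stack([series[i + train_sequence_length : i + total, :1] for i in range(num_examples)])
    return x, y


series = np.arange(40, dtype=np.float32).reshape(20, 2)  # 20 time steps, 2 features
x, y = make_windows(series, train_sequence_length=12, predict_sequence_length=3)
print(x.shape)  # (num_examples, train_sequence_length, num_features)
print(y.shape)  # (num_examples, predict_sequence_length, num_outputs)
```

With 20 time steps, a 12-step input and a 3-step horizon, six complete windows fit, so `x` has shape `(6, 12, 2)` and `y` has shape `(6, 3, 1)`.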

@@ -121,11 +124,8 @@ Run with pretrained weights
     model = AutoModel.from_pretrained("tfts-model")


-3.3 Evaluate the model
-~~~~~~~~~~~~~~~~~~~~~~~
-

-3.4 Serve the model
+3.3 Serve the model
 ~~~~~~~~~~~~~~~~~~~~~~~
 Once the model is trained and evaluated, deploy it for inference. Ensure the model is saved in a format compatible with your serving environment (e.g., TensorFlow SavedModel, ONNX, etc.). Set up an API or service to handle incoming requests, preprocess input data, and return predictions in real-time.

docs/source/tutorials.rst

Lines changed: 19 additions & 0 deletions
@@ -44,6 +44,25 @@ Feed the input data into the model
 - array for single variable prediction


+Features
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- datetime features
+- static features
+- dynamic features
+
+.. code-block:: python
+
+    from tfts.features import feature_registry, registry
+
+    feature_registry = feature_registry
+    feature_registry.register(["some features"])
+
+    @registry
+    def add_custom_features():
+        return
+
+
 .. _train_models:

 Train the models

examples/notebooks/multi_steps_sales_prediction.ipynb

Lines changed: 31 additions & 25 deletions
@@ -72,10 +72,12 @@
 "source": [
 "import logging\n",
 "from typing import List, Optional, Union\n",
+"\n",
 "import numpy as np\n",
 "import pandas as pd\n",
 "import tensorflow as tf\n",
-"from tfts import AutoModel, AutoConfig, KerasTrainer"
+"\n",
+"from tfts import AutoConfig, AutoModel, KerasTrainer"
 ]
 },
 {
@@ -103,7 +105,7 @@
 "class CFG:\n",
 "    input_dir = \"/kaggle/input/china-vehicle-sales-data/china_vehicle_sales_data.csv\"\n",
 "    train_sequence_length = 12\n",
-"    predict_sequence_length = 3\n"
+"    predict_sequence_length = 3"
 ]
 },
 {
@@ -316,7 +318,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": null,
 "id": "ad6327cc",
 "metadata": {
 "execution": {
@@ -340,6 +342,7 @@
 "\n",
 "logger = logging.getLogger(__name__)\n",
 "\n",
+"\n",
 "def add_lagging_feature(\n",
 "    data: pd.DataFrame,\n",
 "    groupby_column: Union[str, List[str]],\n",
@@ -364,9 +367,6 @@
 "        for lag in lags:\n",
 "            feature_col_name = f\"{column}_lag{lag}\"\n",
 "            feature_columns.append(feature_col_name)\n",
-"            logger.debug(\n",
-"                f\"Creating lagging feature: {feature_col_name} for column '{column}' with lag {lag} and groupby '{groupby_column}'.\"\n",
-"            )\n",
 "            data[feature_col_name] = data.groupby(groupby_column)[column].shift(lag)\n",
 "    return data"
 ]
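As a quick check on what `add_lagging_feature` computes in the hunk above, here is a minimal sketch of the same `groupby(...).shift(lag)` pattern on a toy frame. The frame and its values are made up for illustration; only the column names mirror the notebook. It assumes the rows are already sorted by time within each group, because `shift` works on row order.

```python
import pandas as pd

# Toy data (illustrative values): two groups, three time steps each.
data = pd.DataFrame(
    {
        "model": ["a", "a", "a", "b", "b", "b"],
        "salesVolume": [10, 20, 30, 1, 2, 3],
    }
)

for lag in (1, 2):
    # Shift within each group so one series never leaks into another.
    data[f"salesVolume_lag{lag}"] = data.groupby("model")["salesVolume"].shift(lag)

print(data)
```

The first `lag` rows of every group have no earlier value to borrow, so they come out as `NaN`. The notebook's batch generator later replaces those with zeros through `np.nan_to_num`.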
@@ -759,7 +759,13 @@
 "source": [
 "feature_columns = []\n",
 "\n",
-"data = add_lagging_feature(data, groupby_column=[\"provinceId\", \"model\"], value_columns=[\"salesVolume\"], lags=list(range(1, 12)), feature_columns=feature_columns)\n",
+"data = add_lagging_feature(\n",
+"    data,\n",
+"    groupby_column=[\"provinceId\", \"model\"],\n",
+"    value_columns=[\"salesVolume\"],\n",
+"    lags=list(range(1, 12)),\n",
+"    feature_columns=feature_columns,\n",
+")\n",
 "\n",
 "data"
 ]
@@ -854,7 +860,9 @@
 ],
 "source": [
 "grouped_sequence = data.groupby([\"provinceId\", \"model\"]).apply(\n",
-"    lambda x: x.sort_values('Date')[[\"salesVolume\", \"salesVolume_lag1\", \"salesVolume_lag2\", \"salesVolume_lag3\"]].to_numpy()\n",
+"    lambda x: x.sort_values(\"Date\")[\n",
+"        [\"salesVolume\", \"salesVolume_lag1\", \"salesVolume_lag2\", \"salesVolume_lag3\"]\n",
+"    ].to_numpy()\n",
 ")\n",
 "\n",
 "data_3d = np.stack(grouped_sequence.values)\n",
@@ -902,27 +910,25 @@
 "        self.total_samples = self.num_ids * self.samples_per_id\n",
 "\n",
 "        # Precompute all valid (id, start_idx) pairs\n",
-"        self.indices = [\n",
-"            (i, j)\n",
-"            for i in range(self.num_ids)\n",
-"            for j in range(self.samples_per_id)\n",
-"        ]\n",
-"        \n",
+"        self.indices = [(i, j) for i in range(self.num_ids) for j in range(self.samples_per_id)]\n",
+"\n",
 "    def __getitem__(self, index):\n",
-"        # batch-wise item \n",
-"        batch_indices = self.indices[index * self.batch_size:(index + 1) * self.batch_size]\n",
-"        \n",
+"        # batch-wise item\n",
+"        batch_indices = self.indices[index * self.batch_size : (index + 1) * self.batch_size]\n",
+"\n",
 "        x_batch = []\n",
 "        y_batch = []\n",
 "\n",
 "        for id_idx, start_idx in batch_indices:\n",
-"            x = self.data[id_idx, start_idx:start_idx + self.train_seq_len, 1:]\n",
-"            y = self.data[id_idx, start_idx + self.train_seq_len:start_idx + self.train_seq_len + self.pred_seq_len, 0]\n",
+"            x = self.data[id_idx, start_idx : start_idx + self.train_seq_len, 1:]\n",
+"            y = self.data[\n",
+"                id_idx, start_idx + self.train_seq_len : start_idx + self.train_seq_len + self.pred_seq_len, 0\n",
+"            ]\n",
 "            x_batch.append(x)\n",
 "            y_batch.append(y)\n",
 "\n",
 "        return np.nan_to_num(np.array(x_batch)), np.nan_to_num(np.array(y_batch))\n",
-"    \n",
+"\n",
 "    def __len__(self):\n",
 "        # depends on how many samples you want to extract from 1 ID\n",
 "        return int(np.ceil(len(self.indices) / self.batch_size))"
@@ -1086,14 +1092,14 @@
 "source": [
 "def build_model():\n",
 "    inputs = tf.keras.Input(shape=(CFG.train_sequence_length, 3))\n",
-"    \n",
+"\n",
 "    config = AutoConfig()(\"rnn\")\n",
 "    config.rnn_type = \"lstm\"\n",
 "    backbone = AutoModel.from_config(config=config)\n",
-"    \n",
+"\n",
 "    outputs = backbone(inputs)\n",
 "    model = tf.keras.Model(inputs=inputs, outputs=outputs)\n",
-"    model.compile(loss=tf.keras.losses.MeanAbsoluteError(), optimizer=tf.keras.optimizers.Adam(), metrics = ['mae'])\n",
+"    model.compile(loss=tf.keras.losses.MeanAbsoluteError(), optimizer=tf.keras.optimizers.Adam(), metrics=[\"mae\"])\n",
 "    return model\n",
 "\n",
 "\n",
@@ -1165,8 +1171,8 @@
 }
 ],
 "source": [
-"history = model.fit(train_dataset, validation_data=valid_dataset, epochs=10) \n",
-"model.save_weights('./sales_model.weights.h5')"
+"history = model.fit(train_dataset, validation_data=valid_dataset, epochs=10)\n",
+"model.save_weights(\"./sales_model.weights.h5\")"
 ]
 }
 ],
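The batch generator touched earlier in this notebook diff precomputes `(id, start_idx)` pairs and slices one window per pair. A standalone sketch of that indexing, outside Keras, might look like the following. The array sizes are invented, `get_batch` is a hypothetical stand-in for `__getitem__`, and `samples_per_id` is assumed to be the number of full windows that fit in one series, since its definition lies outside the shown hunks.

```python
import numpy as np

train_seq_len, pred_seq_len, batch_size = 12, 3, 4

# Illustrative data: (num_ids, time_steps, 1 target column + 3 lag features).
data = np.random.default_rng(0).normal(size=(2, 20, 4))

# Assumption: one sample per start position that leaves room for input + horizon.
samples_per_id = data.shape[1] - train_seq_len - pred_seq_len + 1
indices = [(i, j) for i in range(data.shape[0]) for j in range(samples_per_id)]


def get_batch(index):
    """Return the index-th batch as (inputs, targets) arrays."""
    batch_indices = indices[index * batch_size : (index + 1) * batch_size]
    # Inputs use the feature columns 1:, targets use column 0, as in the notebook.
    x = np.array([data[i, j : j + train_seq_len, 1:] for i, j in batch_indices])
    y = np.array(
        [data[i, j + train_seq_len : j + train_seq_len + pred_seq_len, 0] for i, j in batch_indices]
    )
    return np.nan_to_num(x), np.nan_to_num(y)


num_batches = int(np.ceil(len(indices) / batch_size))
x0, y0 = get_batch(0)
print(num_batches, x0.shape, y0.shape)
```

Keeping only index pairs in memory and slicing on demand avoids materializing every window up front, which is presumably why the notebook uses a generator rather than one large array.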
Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 import unittest

-from tfts.datasets.get_data import get_air_passengers, get_data, get_sine
+from tfts.data.get_data import get_air_passengers, get_data, get_sine



 class GetDataTest(unittest.TestCase):
