Description
Issue Type
Documentation Bug
Source
binary
Keras Version
3.7.0
Custom Code
No
OS Platform and Distribution
Ubuntu 22.04
Python version
3.12.8
GPU model and memory
RTX 5000 Ada
Current Behavior?
Running the code from this time series example does not produce the same number of parameters as the example output in the documentation, and the model does not achieve the stated accuracy.
The Colab link has the same issue.
What is strange is that the doc's final dense layer has ~64K params, while running the code produces 2x the MLP unit input (which is 128). I tried increasing it to see if that would fix the problem, but something appears to be structurally different between how this code runs on 2.4 versus 3.7.0.
I expected output close to what is documented on the page.
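For reference, the per-block attention parameter count in the summary below (7,169) is consistent with the tutorial's hyperparameters (head_size=256, num_heads=4) applied to a 1-channel input. This is a rough sanity-check sketch, not Keras's internal accounting; mha_params is a hypothetical helper:

```python
# Hypothetical param count for a MultiHeadAttention layer with bias,
# assuming d_in-channel input, the tutorial's head_size and num_heads.
def mha_params(d_in, head_size, num_heads):
    # Q, K, V projections: kernel (d_in, num_heads, head_size) + bias
    qkv = 3 * (d_in * num_heads * head_size + num_heads * head_size)
    # Output projection: kernel (num_heads, head_size, d_in) + bias
    out = num_heads * head_size * d_in + d_in
    return qkv + out

print(mha_params(1, 256, 4))  # 7169, matching each attention block below
```

This matches all four MultiHeadAttention entries in the summary, so the attention blocks themselves appear to be built the same way in both versions; the discrepancy seems to be downstream of the pooling layer.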
Standalone code to reproduce the issue or tutorial link
You can run the colab example:
* https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/timeseries/ipynb/timeseries_classification_transformer.ipynb
or run the code located here:
* https://github.com/keras-team/keras-io/blob/master/examples/timeseries/timeseries_classification_transformer.py
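One possible explanation for the ~64K figure, offered here as a guess: the documented summary is consistent with the pooling layer emitting shape (None, 500) rather than the (None, 1) seen below, since 500 inputs into a 128-unit dense layer give 500*128 + 128 = 64,128 parameters, while a 1-dimensional input gives only 256. The arithmetic, with a hypothetical dense_params helper:

```python
# Dense layer parameter count (kernel + bias) for the two candidate
# pooling output shapes; 128 is the tutorial's mlp_units value.
def dense_params(d_in, units):
    return d_in * units + units

print(dense_params(500, 128))  # 64128 -- roughly the doc's ~64K
print(dense_params(1, 128))    # 256 -- what the (None, 1) pooling output yields
```

If this guess is right, the structural difference between 2.4 and 3.7.0 is in which axis GlobalAveragePooling1D reduces for this model, not in the transformer blocks.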
Relevant log output
(tcap) travis@travis-p1-g6:~/projects/tetra_capital$ python scripts/example_classifications.py
Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ input_layer (InputLayer) │ (None, 500, 1) │ 0 │ - │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ multi_head_attention │ (None, 500, 1) │ 7,169 │ input_layer[0][0], │
│ (MultiHeadAttention) │ │ │ input_layer[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_1 (Dropout) │ (None, 500, 1) │ 0 │ multi_head_attention[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization │ (None, 500, 1) │ 2 │ dropout_1[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add (Add) │ (None, 500, 1) │ 0 │ layer_normalization[0][0], │
│ │ │ │ input_layer[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d (Conv1D) │ (None, 500, 4) │ 8 │ add[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_2 (Dropout) │ (None, 500, 4) │ 0 │ conv1d[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d_1 (Conv1D) │ (None, 500, 1) │ 5 │ dropout_2[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization_1 │ (None, 500, 1) │ 2 │ conv1d_1[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add_1 (Add) │ (None, 500, 1) │ 0 │ layer_normalization_1[0][… │
│ │ │ │ add[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ multi_head_attention_1 │ (None, 500, 1) │ 7,169 │ add_1[0][0], add_1[0][0] │
│ (MultiHeadAttention) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_4 (Dropout) │ (None, 500, 1) │ 0 │ multi_head_attention_1[0]… │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization_2 │ (None, 500, 1) │ 2 │ dropout_4[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add_2 (Add) │ (None, 500, 1) │ 0 │ layer_normalization_2[0][… │
│ │ │ │ add_1[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d_2 (Conv1D) │ (None, 500, 4) │ 8 │ add_2[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_5 (Dropout) │ (None, 500, 4) │ 0 │ conv1d_2[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d_3 (Conv1D) │ (None, 500, 1) │ 5 │ dropout_5[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization_3 │ (None, 500, 1) │ 2 │ conv1d_3[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add_3 (Add) │ (None, 500, 1) │ 0 │ layer_normalization_3[0][… │
│ │ │ │ add_2[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ multi_head_attention_2 │ (None, 500, 1) │ 7,169 │ add_3[0][0], add_3[0][0] │
│ (MultiHeadAttention) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_7 (Dropout) │ (None, 500, 1) │ 0 │ multi_head_attention_2[0]… │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization_4 │ (None, 500, 1) │ 2 │ dropout_7[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add_4 (Add) │ (None, 500, 1) │ 0 │ layer_normalization_4[0][… │
│ │ │ │ add_3[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d_4 (Conv1D) │ (None, 500, 4) │ 8 │ add_4[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_8 (Dropout) │ (None, 500, 4) │ 0 │ conv1d_4[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d_5 (Conv1D) │ (None, 500, 1) │ 5 │ dropout_8[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization_5 │ (None, 500, 1) │ 2 │ conv1d_5[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add_5 (Add) │ (None, 500, 1) │ 0 │ layer_normalization_5[0][… │
│ │ │ │ add_4[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ multi_head_attention_3 │ (None, 500, 1) │ 7,169 │ add_5[0][0], add_5[0][0] │
│ (MultiHeadAttention) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_10 (Dropout) │ (None, 500, 1) │ 0 │ multi_head_attention_3[0]… │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization_6 │ (None, 500, 1) │ 2 │ dropout_10[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add_6 (Add) │ (None, 500, 1) │ 0 │ layer_normalization_6[0][… │
│ │ │ │ add_5[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d_6 (Conv1D) │ (None, 500, 4) │ 8 │ add_6[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_11 (Dropout) │ (None, 500, 4) │ 0 │ conv1d_6[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d_7 (Conv1D) │ (None, 500, 1) │ 5 │ dropout_11[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization_7 │ (None, 500, 1) │ 2 │ conv1d_7[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add_7 (Add) │ (None, 500, 1) │ 0 │ layer_normalization_7[0][… │
│ │ │ │ add_6[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ global_average_pooling1d │ (None, 1) │ 0 │ add_7[0][0] │
│ (GlobalAveragePooling1D) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense (Dense) │ (None, 2048) │ 4,096 │ global_average_pooling1d[… │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_12 (Dropout) │ (None, 2048) │ 0 │ dense[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense_1 (Dense) │ (None, 2) │ 4,098 │ dropout_12[0][0] │
└───────────────────────────────┴───────────────────────────┴─────────────────┴────────────────────────────┘
Total params: 36,938 (144.29 KB)
Trainable params: 36,938 (144.29 KB)
Non-trainable params: 0 (0.00 B)
Epoch 1/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 22s 284ms/step - loss: 0.6932 - sparse_categorical_accuracy: 0.5079 - val_loss: 0.6927 - val_sparse_categorical_accuracy: 0.5354
Epoch 2/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 9s 85ms/step - loss: 0.6931 - sparse_categorical_accuracy: 0.5024 - val_loss: 0.6925 - val_sparse_categorical_accuracy: 0.5354
Epoch 3/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 84ms/step - loss: 0.6932 - sparse_categorical_accuracy: 0.5005 - val_loss: 0.6926 - val_sparse_categorical_accuracy: 0.5354
Epoch 4/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6931 - sparse_categorical_accuracy: 0.5031 - val_loss: 0.6925 - val_sparse_categorical_accuracy: 0.5354
Epoch 5/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6929 - sparse_categorical_accuracy: 0.5155 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 6/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6933 - sparse_categorical_accuracy: 0.5004 - val_loss: 0.6924 - val_sparse_categorical_accuracy: 0.5354
Epoch 7/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6931 - sparse_categorical_accuracy: 0.5078 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 8/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5096 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 9/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6929 - sparse_categorical_accuracy: 0.5131 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 10/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6928 - sparse_categorical_accuracy: 0.5196 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 11/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6932 - sparse_categorical_accuracy: 0.5021 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 12/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6934 - sparse_categorical_accuracy: 0.4936 - val_loss: 0.6924 - val_sparse_categorical_accuracy: 0.5354
Epoch 13/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6928 - sparse_categorical_accuracy: 0.5176 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 14/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6933 - sparse_categorical_accuracy: 0.4975 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 15/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5098 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 16/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5078 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 17/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6927 - sparse_categorical_accuracy: 0.5171 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 18/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6929 - sparse_categorical_accuracy: 0.5118 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 19/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5079 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 20/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 84ms/step - loss: 0.6932 - sparse_categorical_accuracy: 0.5029 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 21/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5075 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 22/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6929 - sparse_categorical_accuracy: 0.5145 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 23/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5101 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 24/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5090 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 25/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5079 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
42/42 ━━━━━━━━━━━━━━━━━━━━ 5s 58ms/step - loss: 0.6925 - sparse_categorical_accuracy: 0.5264 0