Description
Issue Type
Documentation Bug
Source
binary
Keras Version
3.7.0
Custom Code
No
OS Platform and Distribution
Ubuntu 22.04
Python version
3.12.8
GPU model and memory
RTX 5000 Ada
Current Behavior?
Running the code from this time series example does not produce the same number of parameters as the example output in the documentation, and the model does not achieve the stated accuracy.
The Colab link has the same issue.
What is strange is that the doc's final dense layer has ~64K params, while running the code produces 2x the MLP unit input (which is 128). I tried increasing it to see if that would fix the problem, but something appears to be structurally different between how this code runs on 2.4 versus 3.7.0.
I expected output close to what is documented on the page.
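For reference, the per-block attention parameter count in the summary below (7,169) is consistent with the tutorial's hyperparameters (head_size=256, num_heads=4) applied to a 1-channel input. This is a rough sanity-check sketch, not Keras's internal accounting; mha_params is a hypothetical helper:

```python
# Hypothetical param count for a MultiHeadAttention layer with bias,
# assuming d_in-channel input, the tutorial's head_size and num_heads.
def mha_params(d_in, head_size, num_heads):
    # Q, K, V projections: kernel (d_in, num_heads, head_size) + bias
    qkv = 3 * (d_in * num_heads * head_size + num_heads * head_size)
    # Output projection: kernel (num_heads, head_size, d_in) + bias
    out = num_heads * head_size * d_in + d_in
    return qkv + out

print(mha_params(1, 256, 4))  # 7169, matching each attention block below
```

This matches all four MultiHeadAttention entries in the summary, so the attention blocks themselves appear to be built the same way in both versions; the discrepancy seems to be downstream of the pooling layer.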
Standalone code to reproduce the issue or tutorial link
You can run the colab example:
* https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/timeseries/ipynb/timeseries_classification_transformer.ipynb
or run the code located here:
* https://github.com/keras-team/keras-io/blob/master/examples/timeseries/timeseries_classification_transformer.py
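One possible explanation for the ~64K figure, offered here as a guess: the documented summary is consistent with the pooling layer emitting shape (None, 500) rather than the (None, 1) seen below, since 500 inputs into a 128-unit dense layer give 500*128 + 128 = 64,128 parameters, while a 1-dimensional input gives only 256. The arithmetic, with a hypothetical dense_params helper:

```python
# Dense layer parameter count (kernel + bias) for the two candidate
# pooling output shapes; 128 is the tutorial's mlp_units value.
def dense_params(d_in, units):
    return d_in * units + units

print(dense_params(500, 128))  # 64128 -- roughly the doc's ~64K
print(dense_params(1, 128))    # 256 -- what the (None, 1) pooling output yields
```

If this guess is right, the structural difference between 2.4 and 3.7.0 is in which axis GlobalAveragePooling1D reduces for this model, not in the transformer blocks.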
Relevant log output
(tcap) travis@travis-p1-g6:~/projects/tetra_capital$ python scripts/example_classifications.py
Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ input_layer (InputLayer) │ (None, 500, 1) │ 0 │ - │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ multi_head_attention │ (None, 500, 1) │ 7,169 │ input_layer[0][0], │
│ (MultiHeadAttention) │ │ │ input_layer[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_1 (Dropout) │ (None, 500, 1) │ 0 │ multi_head_attention[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization │ (None, 500, 1) │ 2 │ dropout_1[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add (Add) │ (None, 500, 1) │ 0 │ layer_normalization[0][0], │
│ │ │ │ input_layer[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d (Conv1D) │ (None, 500, 4) │ 8 │ add[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_2 (Dropout) │ (None, 500, 4) │ 0 │ conv1d[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d_1 (Conv1D) │ (None, 500, 1) │ 5 │ dropout_2[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization_1 │ (None, 500, 1) │ 2 │ conv1d_1[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add_1 (Add) │ (None, 500, 1) │ 0 │ layer_normalization_1[0][… │
│ │ │ │ add[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ multi_head_attention_1 │ (None, 500, 1) │ 7,169 │ add_1[0][0], add_1[0][0] │
│ (MultiHeadAttention) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_4 (Dropout) │ (None, 500, 1) │ 0 │ multi_head_attention_1[0]… │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization_2 │ (None, 500, 1) │ 2 │ dropout_4[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add_2 (Add) │ (None, 500, 1) │ 0 │ layer_normalization_2[0][… │
│ │ │ │ add_1[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d_2 (Conv1D) │ (None, 500, 4) │ 8 │ add_2[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_5 (Dropout) │ (None, 500, 4) │ 0 │ conv1d_2[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d_3 (Conv1D) │ (None, 500, 1) │ 5 │ dropout_5[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization_3 │ (None, 500, 1) │ 2 │ conv1d_3[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add_3 (Add) │ (None, 500, 1) │ 0 │ layer_normalization_3[0][… │
│ │ │ │ add_2[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ multi_head_attention_2 │ (None, 500, 1) │ 7,169 │ add_3[0][0], add_3[0][0] │
│ (MultiHeadAttention) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_7 (Dropout) │ (None, 500, 1) │ 0 │ multi_head_attention_2[0]… │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization_4 │ (None, 500, 1) │ 2 │ dropout_7[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add_4 (Add) │ (None, 500, 1) │ 0 │ layer_normalization_4[0][… │
│ │ │ │ add_3[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d_4 (Conv1D) │ (None, 500, 4) │ 8 │ add_4[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_8 (Dropout) │ (None, 500, 4) │ 0 │ conv1d_4[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d_5 (Conv1D) │ (None, 500, 1) │ 5 │ dropout_8[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization_5 │ (None, 500, 1) │ 2 │ conv1d_5[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add_5 (Add) │ (None, 500, 1) │ 0 │ layer_normalization_5[0][… │
│ │ │ │ add_4[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ multi_head_attention_3 │ (None, 500, 1) │ 7,169 │ add_5[0][0], add_5[0][0] │
│ (MultiHeadAttention) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_10 (Dropout) │ (None, 500, 1) │ 0 │ multi_head_attention_3[0]… │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization_6 │ (None, 500, 1) │ 2 │ dropout_10[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add_6 (Add) │ (None, 500, 1) │ 0 │ layer_normalization_6[0][… │
│ │ │ │ add_5[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d_6 (Conv1D) │ (None, 500, 4) │ 8 │ add_6[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_11 (Dropout) │ (None, 500, 4) │ 0 │ conv1d_6[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ conv1d_7 (Conv1D) │ (None, 500, 1) │ 5 │ dropout_11[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ layer_normalization_7 │ (None, 500, 1) │ 2 │ conv1d_7[0][0] │
│ (LayerNormalization) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ add_7 (Add) │ (None, 500, 1) │ 0 │ layer_normalization_7[0][… │
│ │ │ │ add_6[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ global_average_pooling1d │ (None, 1) │ 0 │ add_7[0][0] │
│ (GlobalAveragePooling1D) │ │ │ │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense (Dense) │ (None, 2048) │ 4,096 │ global_average_pooling1d[… │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_12 (Dropout) │ (None, 2048) │ 0 │ dense[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense_1 (Dense) │ (None, 2) │ 4,098 │ dropout_12[0][0] │
└───────────────────────────────┴───────────────────────────┴─────────────────┴────────────────────────────┘
Total params: 36,938 (144.29 KB)
Trainable params: 36,938 (144.29 KB)
Non-trainable params: 0 (0.00 B)
Epoch 1/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 22s 284ms/step - loss: 0.6932 - sparse_categorical_accuracy: 0.5079 - val_loss: 0.6927 - val_sparse_categorical_accuracy: 0.5354
Epoch 2/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 9s 85ms/step - loss: 0.6931 - sparse_categorical_accuracy: 0.5024 - val_loss: 0.6925 - val_sparse_categorical_accuracy: 0.5354
Epoch 3/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 84ms/step - loss: 0.6932 - sparse_categorical_accuracy: 0.5005 - val_loss: 0.6926 - val_sparse_categorical_accuracy: 0.5354
Epoch 4/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6931 - sparse_categorical_accuracy: 0.5031 - val_loss: 0.6925 - val_sparse_categorical_accuracy: 0.5354
Epoch 5/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6929 - sparse_categorical_accuracy: 0.5155 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 6/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6933 - sparse_categorical_accuracy: 0.5004 - val_loss: 0.6924 - val_sparse_categorical_accuracy: 0.5354
Epoch 7/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6931 - sparse_categorical_accuracy: 0.5078 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 8/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5096 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 9/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6929 - sparse_categorical_accuracy: 0.5131 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 10/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6928 - sparse_categorical_accuracy: 0.5196 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 11/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6932 - sparse_categorical_accuracy: 0.5021 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 12/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6934 - sparse_categorical_accuracy: 0.4936 - val_loss: 0.6924 - val_sparse_categorical_accuracy: 0.5354
Epoch 13/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6928 - sparse_categorical_accuracy: 0.5176 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 14/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6933 - sparse_categorical_accuracy: 0.4975 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 15/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5098 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 16/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5078 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 17/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6927 - sparse_categorical_accuracy: 0.5171 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 18/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6929 - sparse_categorical_accuracy: 0.5118 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 19/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5079 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 20/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 84ms/step - loss: 0.6932 - sparse_categorical_accuracy: 0.5029 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 21/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5075 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 22/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6929 - sparse_categorical_accuracy: 0.5145 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 23/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5101 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
Epoch 24/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5090 - val_loss: 0.6923 - val_sparse_categorical_accuracy: 0.5354
Epoch 25/150
45/45 ━━━━━━━━━━━━━━━━━━━━ 4s 85ms/step - loss: 0.6930 - sparse_categorical_accuracy: 0.5079 - val_loss: 0.6922 - val_sparse_categorical_accuracy: 0.5354
42/42 ━━━━━━━━━━━━━━━━━━━━ 5s 58ms/step - loss: 0.6925 - sparse_categorical_accuracy: 0.5264 0