Description
System information
- Custom Code: YES
- OS: SUSE Linux Enterprise High Performance Computing 15 SP5
- TensorFlow installed from: DOCKER (tensorflow/tensorflow:2.16.1-gpu-jupyter)
- TensorFlow version: v2.16.1-0-g5bc9d26649c 2.16.1
- Python version: 3.11
- GPU model and memory: NVIDIA A100-PCIE-40GB
- Code to reproduce: see below
Describe the problem
I have a model consisting almost entirely of LSTM layers. If I load the same weights into two copies of the model, one instantiated to run on the CPU and one on the GPU, the results differ.
This issue disappears (the GPU results change to match the CPU) if I make any of these changes:
- Move from SLES + NVIDIA A100 + Driver Version: 550.54.14 + CUDA Version: 12.4 to Ubuntu 22.04.4 LTS + NVIDIA V100 + Driver Version: 535.161.07 + CUDA Version: 12.2
- Set `keras.backend.set_floatx('float64')`
- Use Keras 3 instead of tf-keras
In all these cases I'm running the same official Docker image; my only modification has been to install tf-keras==2.16.0 and plotly.
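As a quick sanity check of the environment described above, something like the following can be run inside the container; it is a minimal sketch added for illustration (not part of the original repro) and relies only on standard TensorFlow / tf-keras attributes:

```python
# Sanity check: confirm TF / tf-keras versions, GPU visibility, and the CUDA
# version TensorFlow was built against.
import tensorflow as tf
import tf_keras

print("TensorFlow:", tf.__version__)                   # expect 2.16.1
print("tf-keras:", tf_keras.__version__)                # expect 2.16.0
print("GPUs:", tf.config.list_physical_devices("GPU"))
print("Built with CUDA:", tf.sysconfig.get_build_info().get("cuda_version"))
```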
Standalone code to reproduce the issue.
```
!pip install plotly
!pip install tf-keras==2.16.0
```

```python
import os
import tensorflow as tf
import numpy as np

USE_TF_KERAS = True

if USE_TF_KERAS:
    import tf_keras as keras
    from tf_keras import layers
    from tf_keras import initializers
    from tf_keras import backend as K
else:
    import keras
    from keras import layers
    from keras import initializers
    from keras import backend as K

# Setting float64 as default dtype removes the discrepancy between CPU and GPU!
# keras.backend.set_floatx('float64')

from plotly import graph_objects as go

ROOT_DIR = os.getcwd()

n_time_steps = 800
theta = np.linspace(0, 2 * np.pi, n_time_steps).reshape(1, -1)

np.random.seed(42)
tf.random.set_seed(42)

dummy_input_dict = {
    "input_a": 800
    * np.stack((np.cos(theta), np.sin(theta)), axis=-1).astype(np.float32),
    "input_b": np.random.rand(1, n_time_steps, 5).astype(np.float32),
}


def build_model():
    input_a = layers.Input(shape=(n_time_steps, 2), name="input_a")
    input_b = layers.Input(shape=(n_time_steps, 5), name="input_b")
    x = layers.Concatenate()([input_a, input_b])
    for idx in range(8):
        lstm_layer = layers.LSTM(
            1024,
            kernel_initializer=initializers.RandomNormal(seed=42 + idx),
            recurrent_initializer=initializers.RandomNormal(seed=52 + idx),
            return_sequences=True,
        )
        x = lstm_layer(x)
    y = layers.Dense(1)(x)
    model = keras.Model(inputs=[input_a, input_b], outputs=y)
    return model


def main(device):
    with tf.device(device):
        model = build_model()
        model.load_weights("my_initial_weights.h5")
        features = ["input_a", "input_b"]
        dummy_input = [dummy_input_dict[k] for k in features]
        preds = model.predict(dummy_input)
    return preds


# Save one set of weights, so that we can compare the weights of the two models
with tf.device("/device:CPU:0"):
    model = build_model()
    model.save_weights("my_initial_weights.h5")

tf.config.list_logical_devices()

cpu_preds = main("/device:CPU:0")
gpu_preds = main("/device:GPU:0")

cpu_output = cpu_preds[0, :, 0]
gpu_output = gpu_preds[0, :, 0]

fig = go.Figure()
fig.add_trace(go.Scatter(y=cpu_output, name="CPU"))
fig.add_trace(go.Scatter(y=gpu_output, name="GPU"))
fig.show()
```
Resulting plot (the CPU and GPU traces visibly diverge):
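The divergence can also be quantified numerically rather than just visually; a minimal check added here for illustration, operating on the `cpu_output` / `gpu_output` arrays from the script above:

```python
# Quantify how far the GPU prediction drifts from the CPU prediction.
import numpy as np

abs_diff = np.abs(cpu_output - gpu_output)
print("max abs diff:", abs_diff.max())
print("mean abs diff:", abs_diff.mean())
print("allclose (atol=1e-3):", np.allclose(cpu_output, gpu_output, atol=1e-3))
```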
As mentioned at the beginning, any of the following works around the issue, and the GPU prediction then matches the CPU prediction:
- changing the host to my V100 machine
- uncommenting `# keras.backend.set_floatx('float64')`
- setting `USE_TF_KERAS = False`
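For completeness, the float64 workaround can be verified within the same script; a minimal sketch, assuming the repro above has already been executed (per the report, the final check should then print True):

```python
# Sketch of the float64 workaround: switch the Keras default dtype *before*
# rebuilding the models, then rerun the CPU/GPU comparison from the repro above.
keras.backend.set_floatx("float64")

cpu_preds_f64 = main("/device:CPU:0")
gpu_preds_f64 = main("/device:GPU:0")

# According to the report, the discrepancy disappears with float64.
print(np.allclose(cpu_preds_f64, gpu_preds_f64))
```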
I also reiterate that all of this was run in the official tensorflow/tensorflow:2.16.1-gpu-jupyter container, on both hosts.