
Failed to reproduce the results #2

@zhuyy0810

Description

I attempted to reproduce the experiment in Section 5.2.1 of the paper, where DeepONet is used to solve the advection equation. I used the same model architecture and training parameters as described in the paper: a trunk net of size 4×512, a branch net of size 2×512, and 250,000 training iterations. However, the training time and memory usage differ significantly from the results reported in the paper. With mixed-precision training, the training time was 2680.338209 s and the memory usage was 736 MB; with fp32 training, the training time was 3127.268575 s and the memory usage was also 736 MB.

My environment: TensorFlow 2.13.1, DeepXDE 1.10.1, running on an NVIDIA GeForce RTX 3090 GPU.

When I ran advec_mixed_prec.py, I encountered the error 'The global policy can only be set in TensorFlow 2 or if V2 dtype behavior has been set. To enable V2 dtype behavior, call "tf.compat.v1.keras.layers.enable_v2_dtype_behavior()".' I therefore added tf.compat.v1.keras.layers.enable_v2_dtype_behavior() before policy = mixed_precision.Policy('mixed_float16'). Apart from the training parameter settings in the main function, the rest of advec_mixed_prec.py and Advection.py is identical to the versions on GitHub. The main function I used to train DeepONet with mixed precision and fp32 is shown after the mixed-precision sketch below.
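For reference, a minimal sketch of the mixed-precision setup with that workaround (assuming mixed_precision here is tensorflow.keras.mixed_precision and that the global policy is applied with set_global_policy; the actual advec_mixed_prec.py may differ slightly):

import tensorflow as tf
from tensorflow.keras import mixed_precision

# Workaround for "The global policy can only be set in TensorFlow 2 or if
# V2 dtype behavior has been set" when Keras layers run with V1 dtype behavior.
tf.compat.v1.keras.layers.enable_v2_dtype_behavior()

policy = mixed_precision.Policy("mixed_float16")
mixed_precision.set_global_policy(policy)  # float16 compute, float32 variables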

import numpy as np
import deepxde as dde

# get_data is defined earlier in the script (unchanged from Advection.py /
# advec_mixed_prec.py on GitHub); it loads the (branch input, trunk input, output) data.
nt = 40
nx = 40
x_train, y_train = get_data("/home/zhuyiyan/mixed-precision-sciml-main/Dataset/DeepONEt/Advection_equation_dataset/train_IC2.npz")
x_test, y_test = get_data("/home/zhuyiyan/mixed-precision-sciml-main/Dataset/DeepONEt/Advection_equation_dataset/test_IC2.npz")
data = dde.data.TripleCartesianProd(x_train, y_train, x_test, y_test)

# Branch net: 2 x 512 (input size nx); trunk net: 4 x 512 (input size 2).
net = dde.maps.DeepONetCartesianProd(
    [nx, 512, 512], [2, 512, 512, 512, 512], "relu", "Glorot normal"
)

model = dde.Model(data, net)
# model.callbacks.append(time_callback(verbose=1))
model.compile(
    "adam",
    lr=1e-3,
    decay=("inverse time", 1, 1e-4),
    metrics=["mean l2 relative error"],
)

# IC1
# losshistory, train_state = model.train(epochs=100000, batch_size=None)
# IC2
# time_callback = TimeCallback()
losshistory, train_state = model.train(epochs=250000, batch_size=None)

# Save the first test sample's prediction, reference, and pointwise error.
y_pred = model.predict(data.test_x)
np.savetxt("y_pred_deeponet.dat", y_pred[0].reshape(nt, nx))
np.savetxt("y_true_deeponet.dat", data.test_y[0].reshape(nt, nx))
np.savetxt("y_error_deeponet.dat", (y_pred[0] - data.test_y[0]).reshape(nt, nx))
