Replies: 8 comments 1 reply
-
I tried changing the backend (not sure why the default was "tensorflow_compat_v1"):
(base) chaztikov@priority:~$ python3
WARNING:tensorflow:From /home/chaztikov/anaconda3/lib/python3.9/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
WARNING:tensorflow:From /home/chaztikov/anaconda3/lib/python3.9/site-packages/deepxde/nn/initializers.py:118: The name tf.keras.initializers.he_normal is deprecated. Please use tf.compat.v1.keras.initializers.he_normal instead.
-
Changing the backend to "tensorflow" did not seem to resolve my issue:
Using backend: tensorflow
Enable just-in-time compilation with XLA.
Set the default float type to float64
Warning: epochs is deprecated and will be removed in a future version. Use iterations instead.
2022-09-08 12:44:19.775379: I tensorflow/compiler/xla/service/service.cc:170] XLA service 0x564f3badd8a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
-
I tried the example Burgers_RAR.py. It did not run to completion, and I am not sure what is wrong. Here is the output from that example:
Using backend: tensorflow
Enable just-in-time compilation with XLA.
2022-09-08 12:45:38.861163: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
Training model...
WARNING:tensorflow:AutoGraph could not transform <function at 0x7f3b7e31fb80> and will run it as-is.
coding=utf-8
lambda x, on: np.array([on_boundary(x[i], on[i]) for i in range(len(x))])
coding=utf-8
lambda x, on: np.array([on_boundary(x[i], on[i]) for i in range(len(x))])
-
If I deactivate conda and remove it and its associated lines from .bashrc, I get a different error for the Burgers_RAR example:
chaztikov@priority:~/git/deepxde/examples/pinn_forward$ python3 Burgers_RAR.py
Enable just-in-time compilation with XLA.
2022-09-08 13:29:07.863662: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
Training model...
WARNING:tensorflow:AutoGraph could not transform <function at 0x7f63ec563250> and will run it as-is.
coding=utf-8
lambda x, on: np.array([on_boundary(x[i], on[i]) for i in range(len(x))])
coding=utf-8
lambda x, on: np.array([on_boundary(x[i], on[i]) for i in range(len(x))])
-
I'm still not sure what's wrong here; any advice is appreciated. It could be something very obvious. I didn't have this trouble on my last install, with Ubuntu 18.04 and Python 3.6.
-
Is there a more appropriate place to ask this question? Perhaps the TensorFlow repo?
-
This is most likely due to an installation/configuration issue with TensorFlow. You can try disabling XLA: https://deepxde.readthedocs.io/en/latest/modules/deepxde.html#deepxde.config.disable_xla_jit
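For reference, a minimal sketch of the XLA suggestion above, assuming DeepXDE is importable as `deepxde` and that `dde.config.disable_xla_jit()` (the function the linked documentation page describes) is called early in the script, before the model is compiled. The helper name `disable_xla_jit_if_possible` is hypothetical, and the `find_spec` guard is only there so the snippet also runs on a machine without DeepXDE:

```python
import importlib.util


def disable_xla_jit_if_possible():
    """Disable XLA JIT through DeepXDE's config.

    Returns False (and does nothing) if DeepXDE is not installed.
    The helper name is hypothetical; only dde.config.disable_xla_jit
    comes from the DeepXDE documentation linked above.
    """
    if importlib.util.find_spec("deepxde") is None:
        return False
    import deepxde as dde

    # Assumption: this should run before the model is compiled.
    dde.config.disable_xla_jit()
    return True


xla_disabled = disable_xla_jit_if_possible()
print("XLA JIT disabled via DeepXDE:", xla_disabled)
```

If the libdevice error goes away with XLA disabled, that would suggest the problem lies in the CUDA toolkit setup that XLA relies on rather than in DeepXDE itself.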
-
Hi, I'm not sure why I get the error messages below. I very recently installed Ubuntu 22.04 along with Python 3.10, TensorFlow, Torch, and DeepXDE. Torch and TensorFlow seem to work and identify CUDA and my GPU.
The script I am trying to run below worked on my previous installation. Any idea what this output indicates? It seems to point to an issue with DeepXDE finding the GPU. Thanks.
chaztikov@priority:~/git/PINNs/main/examples/IBPINN/projects/forward_models/FOSLS/LinearElasticityFOSLS/HE_neohookean_planestress$ python3 singlerun_neohookean_hyperelastic_FOSLS_2D.py
Using backend: tensorflow.compat.v1
WARNING:tensorflow:From /home/chaztikov/.local/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Enable just-in-time compilation with XLA.
WARNING:tensorflow:From /home/chaztikov/.local/lib/python3.10/site-packages/deepxde/nn/initializers.py:118: The name tf.keras.initializers.he_normal is deprecated. Please use tf.compat.v1.keras.initializers.he_normal instead.
Set the default float type to float32
current_file_name
singlerun_neohookean_hyperelastic_FOSLS_2D
/home/chaztikov/git/PINNs/main/examples/IBPINN/projects/forward_models/FOSLS/LinearElasticityFOSLS/HE_neohookean_planestress/saved/
81 28 32
289 60 64
1024 124 124
3969 248 248
15876 500 500
num_points,num_boundary_points 1681 160
/home/chaztikov/git/PINNs/main/examples/IBPINN/projects/forward_models/FOSLS/LinearElasticityFOSLS/HE_neohookean_planestress/saved/
Warning: 10 points required, but 16 points sampled.
mexclusions
[]
Compiling model...
Building feed-forward neural network...
/home/chaztikov/.local/lib/python3.10/site-packages/deepxde/nn/tensorflow_compat_v1/fnn.py:103: UserWarning: `tf.layers.dense` is deprecated and will be removed in a future version. Please use `tf.keras.layers.Dense` instead.
return tf.layers.dense(
'build' took 0.281838 s
2022-09-08 10:46:34.216905: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-08 10:46:34.765001: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2022-09-08 10:46:34.765048: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9921 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1
'compile' took 2.370059 s
Warning: epochs is deprecated and will be removed in a future version. Use iterations instead.
Initializing variables...
2022-09-08 10:46:36.332096: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
Training model...
2022-09-08 10:46:36.637194: I tensorflow/compiler/xla/service/service.cc:170] XLA service 0x7fa368009950 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-09-08 10:46:36.637232: I tensorflow/compiler/xla/service/service.cc:178] StreamExecutor device (0): NVIDIA GeForce GTX 1080 Ti, Compute Capability 6.1
2022-09-08 10:46:36.662312: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:263] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2022-09-08 10:46:37.251477: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-08 10:46:37.252078: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-08 10:46:37.252130: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-09-08 10:46:37.253146: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-08 10:46:37.253219: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] INTERNAL: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2022-09-08 10:46:37.255705: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-08 10:46:37.255731: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-09-08 10:46:37.256383: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-08 10:46:37.256444: W tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:640] INTERNAL: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Setting XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda or modifying $PATH can be used to set the location of ptxas
This message will only be logged once.
2022-09-08 10:46:37.387348: W tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
./cuda_sdk_lib
/usr/local/cuda-11.2
/usr/local/cuda
.
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2022-09-08 10:46:37.388380: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:330] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2022-09-08 10:46:37.391976: I tensorflow/compiler/jit/xla_compilation_cache.cc:478] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
2022-09-08 10:46:37.392913: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at xla_ops.cc:462 : INTERNAL: libdevice not found at ./libdevice.10.bc
Traceback (most recent call last):
File "/home/chaztikov/.local/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1377, in _do_call
return fn(*args)
File "/home/chaztikov/.local/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1360, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "/home/chaztikov/.local/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1453, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) INTERNAL: libdevice not found at ./libdevice.10.bc
[[{{node cluster_0_1/xla_compile}}]]
[[cluster_0_1/merge_oidx_1/_3]]
(1) INTERNAL: libdevice not found at ./libdevice.10.bc
[[{{node cluster_0_1/xla_compile}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/chaztikov/git/PINNs/main/examples/IBPINN/projects/forward_models/FOSLS/LinearElasticityFOSLS/HE_neohookean_planestress/singlerun_neohookean_hyperelastic_FOSLS_2D.py", line 3190, in <module>
losshistory, train_state = model.train(epochs=mepochs + 1, batch_size=mbatch_size,display_every=mdisp, disregard_previous_best=mdisregard_previous_best, callbacks=mcallback_list, model_save_path=save_path)
File "/home/chaztikov/.local/lib/python3.10/site-packages/deepxde/utils/internal.py", line 22, in wrapper
result = f(*args, **kwargs)
File "/home/chaztikov/.local/lib/python3.10/site-packages/deepxde/model.py", line 561, in train
self._test()
File "/home/chaztikov/.local/lib/python3.10/site-packages/deepxde/model.py", line 693, in _test
) = self._outputs_losses(
File "/home/chaztikov/.local/lib/python3.10/site-packages/deepxde/model.py", line 471, in _outputs_losses
return self.sess.run(outputs_losses, feed_dict=feed_dict)
File "/home/chaztikov/.local/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 967, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File "/home/chaztikov/.local/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1190, in _run
results = self._do_run(handle, final_targets, final_fetches,
File "/home/chaztikov/.local/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1370, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File "/home/chaztikov/.local/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1396, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:
2 root error(s) found.
(0) INTERNAL: libdevice not found at ./libdevice.10.bc
[[{{node cluster_0_1/xla_compile}}]]
[[cluster_0_1/merge_oidx_1/_3]]
(1) INTERNAL: libdevice not found at ./libdevice.10.bc
[[{{node cluster_0_1/xla_compile}}]]
0 successful operations.
0 derived errors ignored.
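The log above itself suggests setting `XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda`. A minimal sketch of doing that from Python, assuming a CUDA toolkit containing `nvvm/libdevice/libdevice.10.bc` is installed somewhere on the machine. The candidate directories and the helper names `find_cuda_data_dir` and `set_xla_cuda_dir` are hypothetical, and the call is assumed to run before `tensorflow` or `deepxde` is imported:

```python
import os
from pathlib import Path

# Hypothetical candidates. The first two appear in the log's search list;
# the last one is a guess. Adjust to wherever the CUDA toolkit actually lives.
CANDIDATE_CUDA_DIRS = ["/usr/local/cuda", "/usr/local/cuda-11.2", "/usr/lib/cuda"]


def find_cuda_data_dir(candidates=CANDIDATE_CUDA_DIRS):
    """Return the first directory containing nvvm/libdevice/libdevice.10.bc, else None."""
    for root in candidates:
        if (Path(root) / "nvvm" / "libdevice" / "libdevice.10.bc").is_file():
            return str(root)
    return None


def set_xla_cuda_dir(candidates=CANDIDATE_CUDA_DIRS, env=os.environ):
    """Append --xla_gpu_cuda_data_dir to XLA_FLAGS if a CUDA directory is found."""
    cuda_dir = find_cuda_data_dir(candidates)
    if cuda_dir is not None:
        flag = f"--xla_gpu_cuda_data_dir={cuda_dir}"
        existing = env.get("XLA_FLAGS", "")
        env["XLA_FLAGS"] = f"{existing} {flag}".strip()
    return cuda_dir


# Assumption: this runs before tensorflow or deepxde is imported.
cuda_dir = set_xla_cuda_dir()
print("CUDA data dir for XLA:", cuda_dir)
```

If no candidate matches, `cuda_dir` is `None` and the environment is left unchanged; in that case the CUDA toolkit (which also ships `ptxas`) may simply not be installed.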