generated from saturncloud/dask-saturn
Open
Description
Hi, I'm new to Saturn Cloud. Sorry for opening an issue here; I know this is not the right place for user questions, but I couldn't find anywhere else to ask. I checked the saturncloud/images repo, but it has no issues section for reporting.
First, I created a JupyterLab server to train a model in TensorFlow, using public.ecr.aws/saturncloud/saturn-python-tensorflow:2023.05.01 as the Docker image for the environment. I expected the code to run smoothly in this environment, but when I ran model.fit(), the following errors appeared:
2023-10-11 09:29:01.001362: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at xla_ops.cc:296 : INTERNAL: libdevice not found at ./libdevice.10.bc
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
Cell In[36], line 1
----> 1 history = model.fit(X_train, Y_train, epochs=5, batch_size=8, validation_split=0.2, callbacks=[early_stopping], )
2 # history = model.fit(X_train, Y_train, epochs=5, batch_size=8, validation_split=0.2)
File /opt/saturncloud/envs/saturn/lib/python3.9/site-packages/keras/utils/traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
I suspect the problem is in the environment. I searched for the error and tried to fix the environment with conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0 (see here).
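A possible way to narrow this down, sketched under assumptions: the "libdevice not found at ./libdevice.10.bc" message usually means XLA could not locate the CUDA data directory, which it expects to contain nvvm/libdevice/libdevice.10.bc. The helper below (find_cuda_data_dir is a hypothetical name, not part of TensorFlow) searches an environment prefix for that layout and builds the XLA_FLAGS value. The fallback prefix /opt/saturncloud/envs/saturn is taken from the traceback above; whether the file actually exists there in this image is not confirmed.

```python
import os
from pathlib import Path
from typing import Optional


def find_cuda_data_dir(root: str) -> Optional[str]:
    """Return a directory containing nvvm/libdevice/libdevice.10.bc, or None.

    Hypothetical helper: it only walks the filesystem under `root`.
    """
    for path in Path(root).rglob("libdevice.10.bc"):
        libdevice_dir = path.parent
        nvvm_dir = libdevice_dir.parent
        # XLA expects <cuda_data_dir>/nvvm/libdevice/libdevice.10.bc
        if libdevice_dir.name == "libdevice" and nvvm_dir.name == "nvvm":
            return str(nvvm_dir.parent)
    return None


def xla_flags_for(cuda_data_dir: str) -> str:
    """Build the XLA_FLAGS value that points XLA at the CUDA data dir."""
    return f"--xla_gpu_cuda_data_dir={cuda_data_dir}"


if __name__ == "__main__":
    # Assumption: the conda environment prefix seen in the traceback.
    prefix = os.environ.get("CONDA_PREFIX", "/opt/saturncloud/envs/saturn")
    data_dir = find_cuda_data_dir(prefix)
    if data_dir is None:
        print(f"No nvvm/libdevice/libdevice.10.bc found under {prefix}")
    else:
        # Must be set before TensorFlow is imported in the process.
        os.environ["XLA_FLAGS"] = xla_flags_for(data_dir)
        print("XLA_FLAGS =", os.environ["XLA_FLAGS"])
```

If the search finds nothing, libdevice.10.bc may be present in a flat location (for example directly under the environment's lib directory) rather than in the nvvm/libdevice layout, in which case it would need to be copied into that layout before the flag can help.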