
Error with environment to run tensorflow model. #13

@hungpham3112


Hi, I'm new to Saturn Cloud. Sorry for opening an issue here; I know this isn't the right place for user questions, but I couldn't find anywhere else to ask. I checked the saturncloud/images repo, but it has no issues section for reporting.

First, I created a JupyterLab server to train a model in TensorFlow, using public.ecr.aws/saturncloud/saturn-python-tensorflow:2023.05.01 as the Docker image for the environment. I expected the code to run smoothly in this environment, but when I ran model.fit(), the following error appeared:

2023-10-11 09:29:01.001362: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at xla_ops.cc:296 : INTERNAL: libdevice not found at ./libdevice.10.bc
---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
Cell In[36], line 1
----> 1 history = model.fit(X_train, Y_train, epochs=5, batch_size=8, validation_split=0.2, callbacks=[early_stopping], )
      2 # history = model.fit(X_train, Y_train, epochs=5, batch_size=8, validation_split=0.2)

File /opt/saturncloud/envs/saturn/lib/python3.9/site-packages/keras/utils/traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     65 except Exception as e:  # pylint: disable=broad-except
     66   filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67   raise e.with_traceback(filtered_tb) from None
     68 finally:
     69   del filtered_tb

I suspect the problem is with the environment. I searched for the error and tried to fix the environment with conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0, see here
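For anyone hitting the same "libdevice not found" message, here is a rough diagnostic sketch, not a confirmed fix for this image. It assumes the error means XLA could not locate libdevice.10.bc, and that pointing XLA at a directory containing nvvm/libdevice/libdevice.10.bc through the --xla_gpu_cuda_data_dir flag in XLA_FLAGS would help. The helper names find_cuda_data_dir and configure_xla_flags are hypothetical, and the search root is only a guess taken from the path in the traceback above.

```python
import os
from pathlib import Path
from typing import Optional


def find_cuda_data_dir(search_root: str) -> Optional[str]:
    """Return a directory D where D/nvvm/libdevice/libdevice.10.bc exists, else None."""
    root = Path(search_root)
    if not root.is_dir():
        return None
    for path in root.rglob("libdevice.10.bc"):
        # XLA looks for <cuda_data_dir>/nvvm/libdevice/libdevice.10.bc,
        # so only accept files that sit in that layout.
        if path.parent.name == "libdevice" and path.parent.parent.name == "nvvm":
            return str(path.parent.parent.parent)
    return None


def configure_xla_flags(search_root: str) -> bool:
    """Append --xla_gpu_cuda_data_dir to XLA_FLAGS if libdevice is found.

    Returns True when the flag was set, False when nothing was found.
    Hypothetical helper; call it before importing TensorFlow to be safe.
    """
    cuda_data_dir = find_cuda_data_dir(search_root)
    if cuda_data_dir is None:
        return False
    flag = f"--xla_gpu_cuda_data_dir={cuda_data_dir}"
    existing = os.environ.get("XLA_FLAGS", "")
    # Keep any flags that were already set and add ours at the end.
    os.environ["XLA_FLAGS"] = f"{existing} {flag}".strip()
    return True
```

A possible use would be calling configure_xla_flags(os.environ.get("CONDA_PREFIX", "/opt/saturncloud/envs/saturn")) at the top of the notebook, before importing tensorflow. If it returns False, the file is probably not installed in that environment at all, which would point to a missing CUDA toolkit component rather than a path problem.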

