This repository was archived by the owner on Oct 14, 2024. It is now read-only.

Training of generated model (from the generated gene) gives tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed #4

@amew0

I have been going through the implementation in the NASCaps repo. To understand how the algorithm searches for an architecture, I followed the README.md there and ran "main.py" with the arguments it lists, and I ran into the issue described below.
Once a gene is generated and the corresponding CapsNet model is built, training that model to evaluate the population (call chain evaluate_population > wrap_train_test > train) fails with the following error:

File "/home/ak11263/miniconda3/envs/tf-1.13-gpu/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
  File "/home/ak11263/miniconda3/envs/tf-1.13-gpu/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/home/ak11263/miniconda3/envs/tf-1.13-gpu/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/ak11263/miniconda3/envs/tf-1.13-gpu/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/home/ak11263/miniconda3/envs/tf-1.13-gpu/lib/python3.7/site-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/home/ak11263/miniconda3/envs/tf-1.13-gpu/lib/python3.7/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/home/ak11263/miniconda3/envs/tf-1.13-gpu/lib/python3.7/site-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/ak11263/miniconda3/envs/tf-1.13-gpu/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/ak11263/nascaps/nsga/main.py", line 893, in train
    callbacks=[timeout_call, log, checkpoint, lr_decay])
  File "/home/ak11263/nascaps/nsga/main.py", line 652, in wrap_train_test
    runid, _ = train(model=model, data=((x_train_current, y_train), (x_test_current, y_test)), args=args)
  File "/home/ak11263/nascaps/nsga/main.py", line 525, in evaluate_population
    p["runid"], train_acc = wrap_train_test(p["gene"])
  File "/home/ak11263/nascaps/nsga/main.py", line 711, in run_NSGA2
    evaluate_population(parent)
  File "/home/ak11263/nascaps/nsga/main.py", line 1065, in <module>
    rets = run_NSGA2(metrics=["accuracy_drop", "energy", "memory", "latency"], inshape=inshape, p_size=args.population, q_size=args.offsprings, generations=args.generations)
  File "/home/ak11263/miniconda3/envs/tf-1.13-gpu/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ak11263/miniconda3/envs/tf-1.13-gpu/lib/python3.7/runpy.py", line 193, in _run_module_as_main (Current frame)
    "__main__", mod_spec)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(100, 160), b.shape=(160, 784), m=100, n=784, k=160
	 [[{{node decoder/dense_1/MatMul}}]]
	 [[{{node loss/decoder_loss/Mean_3}}]]
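
As a quick sanity check on the shapes reported in the error message, here is a minimal sketch (the helper name gemm_shapes_compatible is made up for illustration and is not part of NASCaps or TensorFlow). The inner dimensions match, which suggests the failure is not a plain shape mismatch in the decoder's dense layer:

```python
def gemm_shapes_compatible(a_shape, b_shape):
    """Hypothetical helper: check whether a (m, k) x (k, n) product is well-formed."""
    (m, k_a), (k_b, n) = a_shape, b_shape
    # A matrix product needs the inner dimensions to agree.
    return k_a == k_b, (m, n)


# Shapes copied from the error: a.shape=(100, 160), b.shape=(160, 784)
ok, out_shape = gemm_shapes_compatible((100, 160), (160, 784))
print(ok, out_shape)  # True (100, 784)
```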

After disabling (commenting out) the training and testing of the generated model and replacing them with a dummy model that produces a random test_acc, the program runs successfully.
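
The dummy replacement was roughly along these lines. This is only a sketch of the idea, not the exact code I ran: dummy_wrap_train_test, the "accuracy" key, and the placeholder genes are illustrative names, and the real evaluate_population in main.py does more than this.

```python
import random
import uuid


def dummy_wrap_train_test(gene, rng=random):
    """Hypothetical stand-in for wrap_train_test: no model is built or trained."""
    runid = uuid.uuid4().hex[:8]   # made-up run identifier
    acc = rng.uniform(0.0, 1.0)    # random accuracy instead of a measured one
    return runid, acc


def evaluate_population(population, evaluate=dummy_wrap_train_test):
    # Mirrors the call shape seen in the traceback:
    #   p["runid"], train_acc = wrap_train_test(p["gene"])
    for p in population:
        p["runid"], p["accuracy"] = evaluate(p["gene"])
    return population


# Placeholder genes; the real gene encoding in NASCaps is different.
population = [{"gene": [1, 2, 3]}, {"gene": [4, 5, 6]}]
evaluate_population(population)
```

With this stub in place no GPU kernels are launched during evaluation, which is consistent with the error only appearing once real training starts.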

Searching online, I found suggestions that the use of TensorFlow v1 is causing the issue (I also get plenty of deprecation warnings).

I have also started migrating the project to TensorFlow 2, though not very successfully so far.

I would be grateful for any suggestions.
