topological sort failed and error about model-170000 #1

@dkrmsptlfk

Hi,
I'm learning about image dehazing, so I was testing your code.
The open datasets were too large, so I used only 500 image pairs (hazy and real images) from RESIDEv0-SOTS_indoor (the gt and hazy folders). I copied each real image 10 times to make real-hazy image pairs. When I then ran train_DHSGAN_generator.sh, I got "topological sort failed" errors like this:

Optimization starts!!!
2019-08-08 15:38:28.198116: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-08-08 15:38:28.271352: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-08-08 15:38:28.646927: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-08-08 15:38:28.701832: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.

However, after waiting a few minutes, optimization started and training ran normally.

The real problem happened when I ran train_DHSGAN.sh.
In train_DHSGAN.sh the checkpoint is set to model-170000, but when I ran it after train_DHSGAN_generator.sh had finished, I got an error like this:

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key discriminator/discriminator_unit/dense_layer_1/dense/bias not found in checkpoint
[[node save_1/RestoreV2 (defined at main.py:224) ]]

I changed the model number from 170000 to 200000 (the max iter), but the same error occurred.
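
For reference, here is a minimal sketch of how one could check which variable names a checkpoint actually contains and compare them with the names the graph tries to restore. It assumes `tf.train.list_variables` is available in the installed TensorFlow 1.x; the helper names and the `generator/conv_1/kernel` entry are hypothetical and only for illustration. It would show whether the discriminator variables are simply absent from the checkpoint saved by the generator-only run.

```python
def missing_checkpoint_keys(graph_var_names, checkpoint_var_names):
    """Return the graph variable names that the checkpoint does not contain."""
    return sorted(set(graph_var_names) - set(checkpoint_var_names))


def list_checkpoint_keys(checkpoint_path):
    """List the variable names stored in a checkpoint (hypothetical helper)."""
    # Imported lazily so the comparison helper above works without TensorFlow.
    import tensorflow as tf
    # tf.train.list_variables returns (name, shape) pairs for the checkpoint.
    return [name for name, _shape in tf.train.list_variables(checkpoint_path)]


# Illustrative names only: the first entry is made up, the second is the key
# from the error message above.
graph_names = [
    "generator/conv_1/kernel",
    "discriminator/discriminator_unit/dense_layer_1/dense/bias",
]
ckpt_names = ["generator/conv_1/kernel"]

print(missing_checkpoint_keys(graph_names, ckpt_names))
```

With a real checkpoint, `ckpt_names` would come from `list_checkpoint_keys(...)` pointed at the saved model-170000 files, and `graph_names` from the variables the training script builds (for example `[v.op.name for v in tf.global_variables()]`).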

To sum up,

  1. Is the problem with model-170000 caused by the "topological sort failed" error?
  2. If not, how should I set the checkpoint model?

I'm using Python 3.6.8, TensorFlow GPU 1.13.1, CUDA 10.0, and cuDNN 7.6.0 on Windows 10.

Thank you
