Hello, I found a performance issue in the definition of _build in luminoth/models/ssd/proposal.py: tf.shape(all_anchors)[0] is recomputed on every iteration of the loop, which reduces efficiency. I think it should be created once, before the loop.
The same issue exists in tf.cast(total_anchors, tf.float32).
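To illustrate the pattern, here is a minimal sketch that does not use the actual luminoth code. ToyGraph, shape0, cast_float, and num_classes are hypothetical stand-ins I made up for this example: ToyGraph only mimics the way a graph-mode call such as tf.shape or tf.cast adds a new node each time it is invoked from a Python loop.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph: every op call appends a node."""

    def __init__(self):
        self.ops = []

    def shape0(self, tensor):
        # Mimics tf.shape(tensor)[0]; records one new node per call.
        self.ops.append("shape")
        return len(tensor)

    def cast_float(self, value):
        # Mimics tf.cast(value, tf.float32); records one new node per call.
        self.ops.append("cast")
        return float(value)


def build_in_loop(graph, all_anchors, num_classes):
    # Reported pattern: loop-invariant ops are recreated on every iteration.
    results = []
    for _ in range(num_classes):
        total_anchors = graph.shape0(all_anchors)
        results.append(graph.cast_float(total_anchors))
    return results


def build_hoisted(graph, all_anchors, num_classes):
    # Suggested change: create the invariant ops once, before the loop.
    total_anchors = graph.shape0(all_anchors)
    total_anchors_float = graph.cast_float(total_anchors)
    return [total_anchors_float for _ in range(num_classes)]
```

Both functions return the same values, but the hoisted version records two nodes in total, while the in-loop version records two nodes per iteration.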
Looking forward to your reply. By the way, I would be glad to open a PR to fix it if you are too busy.