What should we add?
I have recently written code implementing two decomposition methods for arbitrary 2-qubit unitaries that require fewer than three 2-qubit gates on average. This is significant because the common CNOT-based decomposition requires three CNOT gates for a generic 2-qubit unitary. Decomposing with fewer 2-qubit gates can shorten circuits and reduce the error accumulated from 2-qubit operations. The decomposition methods are detailed in the two papers listed below:
B-Gate decomposition - Minimum construction of two-qubit quantum operations
SQiSW decomposition - Algorithm 1 in Quantum Instruction Set Design for Performance
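For readers unfamiliar with the two basis gates named above, here is a minimal sketch that builds their 4x4 matrices and runs two sanity checks. It is not the proposed decomposition code. It assumes the usual conventions B = exp(i·π/4·XX)·exp(i·π/8·YY) and SQiSW = exp(i·π/8·(XX+YY)), which may differ from the papers by sign or qubit ordering. The helper names (`kron`, `matmul`, `dagger`, `pauli_exp`, `close`) are hypothetical, and only the standard library is used.

```python
import math

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def pauli_exp(theta, p):
    """exp(i*theta*P) for a matrix P with P*P = I: cos(theta)*I + i*sin(theta)*P."""
    n = len(p)
    return [[math.cos(theta) * (i == j) + 1j * math.sin(theta) * p[i][j]
             for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    """Element-wise comparison within an absolute tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
XX, YY = kron(X, X), kron(Y, Y)
I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

# Assumed convention for the B gate: exp(i*pi/4*XX) * exp(i*pi/8*YY).
B_GATE = matmul(pauli_exp(math.pi / 4, XX), pauli_exp(math.pi / 8, YY))

# Assumed convention for SQiSW: exp(i*pi/8*(XX+YY)); XX and YY commute,
# so the exponential factors into a product.
SQISW = matmul(pauli_exp(math.pi / 8, XX), pauli_exp(math.pi / 8, YY))

ISWAP = [[1, 0, 0, 0], [0, 0, 1j, 0], [0, 1j, 0, 0], [0, 0, 0, 1]]

# Sanity checks: SQiSW squared is iSWAP, and B is unitary.
assert close(matmul(SQISW, SQISW), ISWAP)
assert close(matmul(dagger(B_GATE), B_GATE), I4)
```

A real implementation would more likely use NumPy arrays. Plain lists are used here only to keep the sketch dependency-free.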