Simply defining a loss function in terms of a global observable (one that measures all qubits) leads to barren plateaus even for shallow circuits with sharp priors76, whereas local observables (those comparing quantum states at the single-qubit level) avoid this issue. These barren plateaus are due not to a poor inductive bias but rather to the fact that comparing objects in an exponentially large Hilbert space requires exponential precision, as their overlap is usually exponentially small.
This is closely related to the fact that supervised quantum models can be viewed as kernel methods, whose predictions are built from precisely such state overlaps.
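As a rough illustration of the contrast between global and local observables, the following sketch (assuming PennyLane, with an illustrative hardware-efficient ansatz and Pauli-Z observables rather than the specific constructions of ref. 76) compares the spread of parameter-shift gradients over random initializations for a global cost, Z measured on every qubit, and a local cost, Z on a single qubit.

```python
import functools

import numpy as np
import pennylane as qml

n_qubits = 6
n_layers = 2
dev = qml.device("default.qubit", wires=n_qubits)


def ansatz(params):
    # Illustrative hardware-efficient circuit: RY rotations followed by a CNOT ladder.
    for layer in range(n_layers):
        for w in range(n_qubits):
            qml.RY(params[layer, w], wires=w)
        for w in range(n_qubits - 1):
            qml.CNOT(wires=[w, w + 1])


@qml.qnode(dev)
def global_cost(params):
    ansatz(params)
    # Global observable: Z on every qubit simultaneously.
    obs = functools.reduce(lambda a, b: a @ b, [qml.PauliZ(w) for w in range(n_qubits)])
    return qml.expval(obs)


@qml.qnode(dev)
def local_cost(params):
    ansatz(params)
    # Local observable: Z on a single qubit.
    return qml.expval(qml.PauliZ(0))


def shift_gradient(cost, params, layer=0, wire=0):
    # Parameter-shift gradient with respect to one RY angle.
    shift = np.zeros_like(params)
    shift[layer, wire] = np.pi / 2
    return 0.5 * (cost(params + shift) - cost(params - shift))


rng = np.random.default_rng(0)
samples = [rng.uniform(0, 2 * np.pi, size=(n_layers, n_qubits)) for _ in range(50)]
print("gradient variance, global cost:", np.var([shift_gradient(global_cost, p) for p in samples]))
print("gradient variance, local cost: ", np.var([shift_gradient(local_cost, p) for p in samples]))
```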
Here, the issue arises when the visible qubits of the QNN (those that are measured at the QNN’s output) are entangled with a large number of qubits in the hidden layers. Owing to this entanglement, information about the state is stored in non-local correlations across all qubits, and the reduced state of the visible qubits therefore concentrates around the maximally mixed state. This type of barren plateau can be mitigated by taming the entanglement generated across the QNN.
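The concentration of the visible qubits' reduced state can be seen in a toy numpy calculation, where a Haar-random pure state stands in (as an assumption, not the QNN itself) for the output of a deep, highly entangling circuit: as the number of hidden qubits grows, the purity of the reduced state approaches that of the maximally mixed state.

```python
import numpy as np


def haar_random_state(n_qubits, rng):
    # Haar-random pure state, used here as a stand-in for a highly entangling QNN output.
    dim = 2 ** n_qubits
    psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return psi / np.linalg.norm(psi)


def reduced_visible_state(psi, n_visible, n_hidden):
    # Reshape into (visible, hidden) indices and trace out the hidden subsystem.
    m = psi.reshape(2 ** n_visible, 2 ** n_hidden)
    return m @ m.conj().T


rng = np.random.default_rng(1)
n_visible = 2
for n_hidden in (2, 6, 10):
    rho = reduced_visible_state(haar_random_state(n_visible + n_hidden, rng), n_visible, n_hidden)
    purity = np.trace(rho @ rho).real
    print(f"hidden qubits = {n_hidden:2d}: purity of visible state = {purity:.4f} "
          f"(maximally mixed: {1 / 2 ** n_visible:.4f})")
```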
The first quantum advantages in QML will likely arise from the extraction of hidden parameters from quantum data, for instance in quantum sensing or in quantum state classification and regression. Fundamentally, we know from the theory of optimal measurement that non-local quantum measurements can extract hidden parameters using fewer samples. Using QML, one can form and search over a parameterization of hypotheses for such measurements.
In particular, in quantum sensing and metrology, the variational approach may be able to find the optimal entanglement, exposure and measurement scheme that filters the signal from the noise.
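As a hedged sketch of searching over a parameterization of measurement hypotheses, the following numpy example optimizes the angle of a pre-measurement rotation to distinguish two noisy single-qubit states; the states, the depolarizing noise level and the grid search are illustrative assumptions rather than a scheme from the literature.

```python
import numpy as np

I2 = np.eye(2)
P0 = np.diag([1.0, 0.0])  # projector onto |0>
P1 = np.diag([0.0, 1.0])  # projector onto |1>


def ry(theta):
    return np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                     [np.sin(theta / 2),  np.cos(theta / 2)]])


def noisy_state(theta, noise=0.3):
    # Probe prepared as RY(theta)|0>, then corrupted by depolarizing noise.
    psi = ry(theta) @ np.array([1.0, 0.0])
    rho = np.outer(psi, psi)
    return (1 - noise) * rho + noise * I2 / 2


def success_prob(phi, rho0, rho1):
    # Measurement hypothesis: rotate by RY(-phi), measure Z, and assign
    # outcome |0> to class 0 and outcome |1> to class 1 (equal priors).
    u = ry(-phi)
    p0 = np.trace(u @ rho0 @ u.T @ P0).real
    p1 = np.trace(u @ rho1 @ u.T @ P1).real
    return 0.5 * (p0 + p1)


rho0, rho1 = noisy_state(0.2), noisy_state(1.1)
angles = np.linspace(0, 2 * np.pi, 400)
scores = [success_prob(phi, rho0, rho1) for phi in angles]
best = int(np.argmax(scores))
print(f"best measurement angle = {angles[best]:.3f} rad, success probability = {scores[best]:.3f}")
```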
More broadly, models need an appropriate inductive bias: by the no-free-lunch theorem, no model can perform well across all possible problems without assumptions matched to the task at hand.
An important point is not to lose track of the symmetries of the classical data when we encode it into a quantum state.
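A small numpy sketch of this point (with a hypothetical permutation-symmetric learning problem and a simple angle encoding): a permutation of the input features becomes a permutation of the qubits, so a permutation-invariant observable preserves the symmetry of the data, whereas an observable singling out one qubit silently discards it.

```python
import itertools

import numpy as np

n = 3
Z = np.diag([1.0, -1.0])
I2 = np.eye(2)


def ry(theta):
    return np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                     [np.sin(theta / 2),  np.cos(theta / 2)]])


def angle_encode(x):
    # Product-state angle encoding: |psi(x)> = RY(x_1)|0> ⊗ ... ⊗ RY(x_n)|0>.
    psi = np.array([1.0])
    for xi in x:
        psi = np.kron(psi, ry(xi) @ np.array([1.0, 0.0]))
    return psi


def z_on(wire):
    # Pauli Z acting on one qubit, identity elsewhere.
    ops = [Z if w == wire else I2 for w in range(n)]
    out = ops[0]
    for op in ops[1:]:
        out = np.kron(out, op)
    return out


sym_obs = sum(z_on(w) for w in range(n))   # invariant under qubit permutations
asym_obs = z_on(0)                         # singles out qubit 0

x = np.array([0.3, 1.2, 2.0])
for perm in itertools.permutations(range(n)):
    xp = x[list(perm)]
    psi = angle_encode(xp)
    print(perm,
          f"symmetric observable: {psi @ sym_obs @ psi:+.4f}",
          f"non-symmetric: {psi @ asym_obs @ psi:+.4f}")
```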