Adding PQN Support

PQN (Parallelised Q-Network) is becoming a popular off-policy RL algorithm, and it appears to respond differently to its hyperparameters than DQN does. It also trains much faster and does not require a replay buffer. This makes it an interesting candidate for HPO studies alongside the currently implemented algorithms.