Implement grading of answers

Implement a small database to gather feedback on the quality of answers, so that we can compare models.
Feedback is a simple three-way rating (thumbs up, middle, or thumbs down), with the results stored in that database.
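
As a starting point for discussion, here is a minimal sketch of what the feedback store could look like, assuming SQLite via Python's standard `sqlite3` module. The table name `feedback`, its columns, the helper names `open_db`, `record_feedback` and `summarize`, and the mapping of up/middle/down to 1/0/-1 are all assumptions made for illustration, not decisions taken in this issue.

```python
import sqlite3

# Assumed encoding of the three-way rating; not specified in the issue.
RATINGS = {"up": 1, "middle": 0, "down": -1}


def open_db(path=":memory:"):
    """Open (or create) the feedback database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS feedback (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            model      TEXT NOT NULL,
            question   TEXT NOT NULL,
            answer     TEXT NOT NULL,
            rating     INTEGER NOT NULL CHECK (rating IN (-1, 0, 1)),
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def record_feedback(conn, model, question, answer, rating):
    """Store one rating ('up', 'middle' or 'down') for a model's answer."""
    if rating not in RATINGS:
        raise ValueError(f"rating must be one of {sorted(RATINGS)}")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO feedback (model, question, answer, rating) "
            "VALUES (?, ?, ?, ?)",
            (model, question, answer, RATINGS[rating]),
        )


def summarize(conn):
    """Return the number of ratings and the mean rating per model."""
    rows = conn.execute(
        "SELECT model, COUNT(*), AVG(rating) FROM feedback "
        "GROUP BY model ORDER BY model"
    ).fetchall()
    return {model: {"count": n, "mean": mean} for model, n, mean in rows}
```

Comparing models then comes down to calling `summarize(conn)` and looking at the mean rating per model. A single file-backed SQLite database is probably enough at this scale; the same schema would carry over to another engine if needed.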

When several models answer in parallel, we could also store a counter of how many times one model was ranked above another (possibly per pair, i.e. which model beat which).
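
One possible way to keep the per-pair counter is sketched below, again assuming SQLite via Python's `sqlite3` module. The table `pairwise_wins` and the helpers `open_pairwise_db`, `record_ranking` and `wins` are hypothetical names, and the sketch assumes the user supplies a strict ranking (best first) with no ties.

```python
import sqlite3


def open_pairwise_db(path=":memory:"):
    """Open (or create) the database holding one counter per ordered model pair."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS pairwise_wins (
            winner TEXT NOT NULL,
            loser  TEXT NOT NULL,
            wins   INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (winner, loser)
        )
        """
    )
    return conn


def record_ranking(conn, ranked_models):
    """Record one comparison; ranked_models lists model names, best first.

    Every model is counted as having beaten each model ranked after it.
    """
    with conn:  # commits on success, rolls back on error
        for i, winner in enumerate(ranked_models):
            for loser in ranked_models[i + 1:]:
                # Create the row if this pair has not been seen yet...
                conn.execute(
                    "INSERT OR IGNORE INTO pairwise_wins (winner, loser, wins) "
                    "VALUES (?, ?, 0)",
                    (winner, loser),
                )
                # ...then increment its counter.
                conn.execute(
                    "UPDATE pairwise_wins SET wins = wins + 1 "
                    "WHERE winner = ? AND loser = ?",
                    (winner, loser),
                )


def wins(conn, winner, loser):
    """Return how many times `winner` was ranked above `loser`."""
    row = conn.execute(
        "SELECT wins FROM pairwise_wins WHERE winner = ? AND loser = ?",
        (winner, loser),
    ).fetchone()
    return row[0] if row else 0
```

Storing one row per ordered pair answers both "how often did this model win" (sum over `winner`) and "which one vs which one" (a single row lookup). How to treat ties is left open here.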