Let's find some evaluation tasks that we can use to reliably gauge the performance of the identified models. Ideally, the tasks should be scoped to the medical or biological domain (for example, term recognition), but we should first see what is available.