A big issue with AI screening is that AI trends towards consensus
As such, it is unlikely to effectively implement new theoretical screening rules well, and instead default back to consensus framing.
This is problematic for literature reviews/meta-analysis, as the author may be trying to prove a novel point. Even if they are not, AI is likely to be overly confident in its categorization at times.
A possible solution is to allow the screening to categorize studies as "Unsure", in which case a human would perform the final review on a subset of studies.
Combined with stricter prompting (e.g. only categorize as this type of domain if its one of these tasks: [enumerate tasks]. If a task seem relevant but is not explicilty named, catevorize as "unsure").
A big issue with AI screening is that AI trends towards consensus
As such, it is unlikely to effectively implement new theoretical screening rules well, and instead default back to consensus framing.
This is problematic for literature reviews/meta-analysis, as the author may be trying to prove a novel point. Even if they are not, AI is likely to be overly confident in its categorization at times.
A possible solution is to allow the screening to categorize studies as "Unsure", in which case a human would perform the final review on a subset of studies.
Combined with stricter prompting (e.g. only categorize as this type of domain if its one of these tasks: [enumerate tasks]. If a task seem relevant but is not explicilty named, catevorize as "unsure").