AI and AI4L represent a significant advance in creating evidence reviews and provide much-needed transparency on the interventions in question, but it is important to be aware of their limitations.
Due to the heuristic nature of AI, an audit may pass without identifying all issues in the document, even after multiple passes.
AI4L's audits are based on an extensive checklist of criteria, and a 100% pass rate indicates that the document has met all the criteria on the checklist.
However, this does not necessarily mean that every claim in the document is medically correct or that there are no issues at all. The checklist may not cover every possible issue, and nuances or complexities in medical information may not be captured by it. Therefore, while a clean audit is a positive indicator, it does not guarantee that the document is free of errors or inaccuracies.
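To make the meaning of a 100% pass rate concrete, here is a minimal Python sketch. It is not AI4L's actual implementation: the `CriterionResult` structure, the `pass_rate` function, and the three example criteria are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    """Outcome of one checklist criterion (hypothetical structure)."""
    criterion: str
    passed: bool


def pass_rate(results: list[CriterionResult]) -> float:
    """Return the fraction of checklist criteria that passed, from 0.0 to 1.0."""
    if not results:
        raise ValueError("cannot compute a pass rate for an empty checklist")
    return sum(r.passed for r in results) / len(results)


# Hypothetical audit in which every criterion on the checklist passes.
# The criteria below are invented examples, not AI4L's real checklist.
audit = [
    CriterionResult("every claim cites a source", True),
    CriterionResult("cited sources are reachable", True),
    CriterionResult("dosages include units", True),
]

print(f"pass rate: {pass_rate(audit):.0%}")
```

The rate is computed only over the criteria that appear on the checklist. An error that no criterion checks for, such as a correctly cited but medically wrong claim, leaves the result unchanged.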
When the models rely on their built-in tools, such as web search or fetch, the quality of the reviews and audits depends on whether those tools are available and whether the websites they query are accessible. This can lead to variability in the quality of the audits and fixes.
We found tremendous variability in the quality of reviews, audits, and fixes across models. This is a critical consideration when using AI4L, as the choice of model can significantly impact the results.
For our experience with individual models, see the discussions section; for our general findings, see Lessons Learned.
AI4L's evidence reviews are designed to support informed decision-making and provide transparency on the interventions in question. However, they are not a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of your physician or other qualified health provider with any questions you may have regarding a medical condition.