Artificial Intelligence in Evidence Synthesis Reviews

There is growing interest in, and use of, artificial intelligence (AI) in evidence synthesis reviews. AI in evidence synthesis is an active area of research and development and has been for some time (e.g. Agency for Healthcare Research and Quality 2012). While research teams create custom solutions for their review projects, we are also seeing growth in access to off-the-shelf solutions (Cierco et al. 2022). For instance, the NIH Library offers free access to Covidence to streamline evidence synthesis reviews.

Teams should thoughtfully and carefully discuss the use of AI in their review projects, considering the potential for AI to introduce errors and bias into the review, the evidence (or lack thereof) supporting their tool selection and use of AI, and the potential time savings. (Note: time savings can also be achieved in ways other than AI; see Clark et al. 2020.)

Keep in mind, the review is not only about the final product. The process is also an opportunity for team members to learn about the topic of the review and evidence synthesis methods. The hands-on experience of conducting an evidence synthesis review can help researchers determine if, when, and how to apply AI in future reviews.  

If you choose to incorporate AI, it’s critical to report it transparently. Use the PRISMA 2020 guidelines (Page et al. 2021a) to report how and what you did. Refer to and use preliminary guidance on AI use, e.g. Responsible AI in Evidence SynthEsis (RAISE): Guidance and Recommendations. Finally, cite the specific software and AI used (Chue et al. 2019, Katz et al. 2021). Reporting items include the name and related details for identifying the AI system, details on tool use, and justification for its use. Documentation and transparency are key for readers seeking to appraise and apply the results of your review, as well as for building evidence for and trust in AI for evidence synthesis reviews (O’Connor et al. 2019). For examples of documentation, see the PRISMA 2020 Explanation and Elaboration paper (Page et al. 2021b).  

When planning a review project, and before submitting the manuscript for review, check the journal and/or publisher guidelines for authors. They may have additional reporting requirements and limitations on permitted uses of AI.

Remember, the appropriateness of AI for evidence synthesis depends on several factors, including  

Topic of Review

The topic of the review may impact the utility of AI. For instance, reviews of drug interventions reported in randomized controlled trials (RCTs) may be more amenable to AI implementation. Reporting in RCTs tends to be more standardized, and tools have been developed to filter RCTs from other study types.

Intended Use of Review

We hope our final evidence synthesis reviews are impactful and useful! As you consider how your review might be used, ask what impact errors and bias introduced by AI could have.

Step in Review Process

Covidence now includes relevancy ranking to accelerate screening. This use of AI is aimed at speeding up the review process and aiding humans in title-abstract screening, not at replacing human effort. Covidence also now includes an RCT tagger, which can be used to aid or replace human efforts in screening. This narrow, focused use of AI to identify the study type is appropriate for only certain review topics, e.g., reviews of intervention efficacy that use RCTs as an inclusion criterion. Other evidence synthesis software programs incorporate AI in other steps of the review. For more information, see the table Tools & Their Use of AI.

Researchers are exploring the use of AI for other steps of the review process, including search (Malhotra et al. 2023), data extraction, and risk of bias assessment (Arno et al. 2022, Gartlehner et al. 2024, Goldkuhle et al. 2018, Marshall et al. 2016, Wang et al. 2022).

Type of AI

  • Automation Tools: AI-powered systems that facilitate deduplication, systematic review management, and citation screening. Automation tools may use one or more types of AI.  
  • Natural Language Processing (NLP): Used for text mining, entity recognition, and summarization (e.g., AI-assisted abstract screening tools like ASReview).
  • Machine Learning (ML) Models: Supervised, semi-supervised and unsupervised algorithms learn from data without explicit programming. These models are used for study classification, relevance prediction, and data extraction.
  • Generative AI: Tools like ChatGPT assist with summarizing, translating, and drafting but require careful verification for accuracy and bias. Publishers and journals may have requirements or restrictions around the use of Generative AI. For more on generative AI, visit the Generative AI at NIH guide, which includes a toolkit for crafting your GenAI usage strategy.  
  • Large Language Models (LLMs): AI models trained on text data and capable of generating text, as in summarizing articles and answering questions.
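To make the "Automation Tools" category concrete, here is a minimal, illustrative sketch of one common deduplication technique: matching citations on a normalized title. The function names and record format are hypothetical, chosen for illustration; real tools combine several signals (DOI, authors, year) and fuzzier matching.

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    title = re.sub(r"[^a-z0-9 ]", "", title.lower())
    return re.sub(r"\s+", " ", title).strip()

def deduplicate(records):
    """Keep the first record seen for each normalized title."""
    seen = set()
    unique = []
    for rec in records:
        key = normalize_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

A rule like this catches records that differ only in capitalization or punctuation, which is why exported search results from multiple databases can often be collapsed automatically before screening begins.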

For instance, Covidence uses machine learning. During screening, the model learns from team members’ decisions what is and is not relevant. This is an example of supervised machine learning. Covidence’s RCT tagger is also an example of supervised machine learning. This model was trained on an external dataset, consisting of titles and abstracts labeled according to whether they reported on an RCT or not.
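The supervised-learning idea described above can be sketched in a few lines: learn per-word weights from the team's include/exclude decisions, then rank unscreened records so the likeliest-relevant ones surface first. This is a deliberately simplified, stdlib-only illustration of relevance ranking in general; it is not Covidence's actual model, and all names here are hypothetical.

```python
import math
from collections import Counter

def train(labeled):
    """Learn per-word weights from (text, is_relevant) screening decisions."""
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in labeled:
        (relevant if is_relevant else irrelevant).update(text.lower().split())
    vocab = set(relevant) | set(irrelevant)
    # Smoothed log-odds that a word signals a relevant record
    return {w: math.log((relevant[w] + 1) / (irrelevant[w] + 1)) for w in vocab}

def rank(weights, records):
    """Order unscreened records by summed word weights, highest first."""
    def score(text):
        return sum(weights.get(w, 0.0) for w in text.lower().split())
    return sorted(records, key=score, reverse=True)
```

Note that the ranking only reorders the queue; a human still screens every record, which is exactly the aid-not-replace role described above.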

Closing Thoughts

As noted above, this is an active space for research and development, and we anticipate further guidance and recommendations on the responsible use of AI in evidence synthesis. The evidence base for the use of AI in evidence synthesis reviews continues to grow, and the models and tools continue to evolve. There are opportunities to contribute to this evolution, by examining the evidence for your tool(s) of choice, reporting transparently on your use of AI tools in reviews, and contributing to research through validation studies and SWARs (Studies Within a Review) (Devane et al. 2022).

References & Further Reading