Data Collection

What It Is | How to Collect | What to Collect | Pilot Test | Tools | Manuals & Guidance

What It Is

Data collection is the process by which evidence synthesis review teams obtain the necessary information about characteristics and findings from the studies included in the review. This step may be called “data extraction”, “data collection”, or, for a scoping review, “data charting”.  

In general, you collect data to describe your included studies. It is also important to collect the same data in the same way from each study to enable later synthesis.

How to Collect

Two review team members should independently collect data from each included study. Define your process for resolving discrepancies (e.g., consensus discussion, a separate third review team member). Unlike risk of bias assessment, for which you should use an established tool, data collection requires your review team to develop a tool specific to your review. Describe these steps in your protocol and pilot test them before officially starting (see Pilot Test below).
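As an illustration of independent dual collection, the sketch below compares two reviewers' records for the same study and lists the data items that need resolution. The item names, values, and the `find_discrepancies` helper are hypothetical and not part of any particular tool; most review software offers its own comparison view.

```python
# Hypothetical records collected independently by two reviewers for one study.
reviewer_a = {"study_design": "RCT", "sample_size": 120, "funding": "NIH"}
reviewer_b = {"study_design": "RCT", "sample_size": 102, "funding": "NIH"}


def find_discrepancies(a, b):
    """Return {item: (value_a, value_b)} for each data item where the records differ.

    An item missing from one record is treated as None, so it is flagged
    whenever the other reviewer recorded a value.
    """
    items = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in items if a.get(k) != b.get(k)}


discrepancies = find_discrepancies(reviewer_a, reviewer_b)
for item, (value_a, value_b) in discrepancies.items():
    print(f"{item}: reviewer A = {value_a!r}, reviewer B = {value_b!r}")
```

The flagged items would then go to whichever resolution process your protocol defines, such as a consensus discussion or a third team member.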

As part of your protocol, you may develop a codebook that defines each data item and describes how to collect it, how to clean it, and any additional steps taken during data analysis.

What to Collect

The data collected must help you to answer your research question. Do not collect more data than you need. Depending on your research question, you might collect the following in addition to other data relevant to your specific question (see Cochrane Handbook Chapter 5.3):

  • Study information (first author, publication year, title, DOI)
  • Population demographics (age, sex/gender, ethnicity/race, disease/condition, other characteristics related to the intervention/exposure/outcomes)
  • Methodology (study design/type, location/setting, participant recruitment/selection/allocation, study quality/sources of bias)
  • Intervention/Exposure (quantity, dose, route of administration, duration, setting, length of exposure, follow-up, diagnostic criteria used)
  • Outcomes (quantitative, qualitative)
  • Funding sources

To help with collecting data on the intervention and then reporting it in your eventual manuscript, consider using the Template for Intervention Description and Replication (TIDieR) checklist.

If a meta-analysis is planned, collect additional information such as sample sizes, effect sizes, dependent variables, reliability measures, follow-up data, and statistical tests used. See Chapter 5 (Collecting data) and Chapter 10 (Analysing data and undertaking meta-analyses) of the Cochrane Handbook for more details.

Additionally, consider whether to use open-ended or closed-ended answer options for the data. Using closed-ended options for concepts with discrete categories makes data collection and cleaning easier later (see Cochrane Handbook Chapter 5). If an open-ended option is necessary, provide more instructions for the data collectors on where to look in the article, whether to include page numbers, or to copy the information verbatim from the article.
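A minimal sketch of how closed-ended and open-ended items might be checked during collection is shown below. The `CODEBOOK` contents, the item names, and the `validate_entry` helper are assumptions made for illustration; your own codebook defines the real items and options.

```python
# Hypothetical codebook: each data item is either closed-ended (a set of
# allowed options) or open-ended (None).
CODEBOOK = {
    "study_design": {"RCT", "cohort", "case-control", "cross-sectional", "other"},
    "setting": {"hospital", "community", "other", "not reported"},
    "intervention_description": None,  # open-ended: copy verbatim, note the page
}


def validate_entry(item, value, page=None):
    """Return a list of problems with one collected value (empty if none)."""
    if item not in CODEBOOK:
        return [f"unknown data item: {item}"]
    options = CODEBOOK[item]
    if options is not None:
        # Closed-ended item: the value must be one of the predefined options.
        if value not in options:
            return [f"{item}: {value!r} is not one of {sorted(options)}"]
        return []
    # Open-ended item: require some text and the page it was copied from.
    problems = []
    if not str(value).strip():
        problems.append(f"{item}: open-ended text is empty")
    if page is None:
        problems.append(f"{item}: page number missing")
    return problems


print(validate_entry("study_design", "randomised trial"))
print(validate_entry("intervention_description", "Weekly 60-minute sessions", page=4))
```

Closed-ended items can be checked mechanically, which is part of why they simplify cleaning later. Open-ended items can only be checked for completeness, so the instructions given to data collectors carry more weight.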

Qualitative Data Collection

If you will collect and include qualitative data, you must consider additional points about the theoretical framework, method for collection, and potential review team member bias.  

Pilot Test

Pilot testing the data collection process is critical to the success of your review and is recommended by both the Cochrane Collaboration and JBI. Piloting allows the team to refine and clarify the following, so that reviewers understand and collect the data consistently:

  • The data items you plan to collect
  • How you will collect them (e.g., open-ended vs. closed-ended)
  • Where the data may be reported in the article

It is also an important opportunity to try out the data collection form and software to make sure it works for the reviewers.

How many records to use?

This depends on the review topic. JBI recommends at least 2-3 articles, but 5 or more are often needed to really test out the data collection process. All team members conducting data collection should participate in the pilot including whoever will resolve discrepancies in the collected data. Ideally, the entire team should participate in the pilot.  

Afterwards, the team should meet to discuss and resolve discrepancies, answer questions, and make any needed modifications to the data collection software. Resolving discrepancies may change how specific data are collected. Make sure these changes are documented in the protocol.
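One way to prepare for that meeting is to tally, for each data item, how often the two reviewers recorded the same value across the pilot articles. The sketch below assumes pilot results are available as pairs of records; the data and the `agreement_by_item` helper are hypothetical, and simple percent agreement is used here only as an illustration, not as a recommended statistic.

```python
# Hypothetical pilot results: one (reviewer A, reviewer B) pair per article.
pilot = [
    ({"study_design": "RCT", "setting": "hospital"},
     {"study_design": "RCT", "setting": "hospital"}),
    ({"study_design": "cohort", "setting": "community"},
     {"study_design": "cohort", "setting": "clinic"}),
    ({"study_design": "RCT", "setting": "hospital"},
     {"study_design": "RCT", "setting": "not reported"}),
    ({"study_design": "cohort", "setting": "community"},
     {"study_design": "cohort", "setting": "community"}),
]


def agreement_by_item(pairs):
    """Return {item: fraction of articles where both reviewers' values match}.

    Expects at least one pair. An item missing from both records counts
    as a match, because both lookups return None.
    """
    items = sorted({k for a, b in pairs for k in set(a) | set(b)})
    return {
        item: sum(a.get(item) == b.get(item) for a, b in pairs) / len(pairs)
        for item in items
    }


agreement = agreement_by_item(pilot)
for item, share in agreement.items():
    print(f"{item}: {share:.0%} agreement")
```

Items with low agreement are candidates for clearer definitions, closed-ended options, or more specific instructions on where to look in the article.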

Tools

A wide variety of tools can be used for data collection. Choose one that your team has access to and is familiar with, and that best fits your data collection needs.

NIH and HHS staff should check with their NIH Institute/Center IT office, the NIH Center for Information Technology (CIT), or their HHS IT office before using freely available, open source software for their work.

Manuals & Guidance

For more in-depth guidance on how to conduct data collection for your review, please refer to the resources listed below. You can also ask an NIH Librarian for additional help.

For any type of review, especially systematic reviews and meta-analyses:

For scoping reviews:  
