Data Wrangling in R is the third class in the NIH Library Introduction to R Series. A basic understanding of R and R data types is expected. This class provides a basic overview of manipulating, analyzing, and exporting data with the R tidyverse. R is a programming language and open source environment for statistical computing and graphics. The R class series, offered by the NIH Library Data Services Program, is a comprehensive collection of training sessions designed to teach non-programmers how to write modular code and to introduce best practices for using R for data analysis and data visualization. Each class combines evidence-based best practices for programming with practical hands-on lessons.
By the end of this class, students should be able to: describe the purpose of the tidyverse packages; select specific columns or rows in a data frame; describe the function of the pipe operator; add new columns to a data frame that are functions of existing columns; use the split-apply-combine concept for data analysis; use summarize, group_by, and count to split a data frame into groups of observations, apply summary statistics to each group, and then combine the results; describe the wide and long table formats and the purposes for which each is useful; describe the function of key-value pairs; reshape a data frame using the gather command from the tidyr package; and export a data frame to a .csv file.
Students are encouraged to install R and RStudio before the class so that they can follow along with the instructor. Attendees will need to download the class data before the class.