Data Cleaning

Raw datasets are rarely pretty, and few, if any, are ready to be analyzed right out of the box. Missing data, typos, irregular capitalization, and inconsistent date/time and lat/long formatting can make it impossible to analyze or summarize your data. In this workshop, participants learn how to fix, clean, standardize, and reorganize their raw data until it is ready for analysis.

  • Session 1 covers basic principles of data manipulation using subsetting and filtering techniques.
  • Session 2 covers text cleaning using the stringr package.
  • Session 3 covers date/time cleaning using the lubridate package.
  • Session 4 covers advanced cleaning techniques, such as conditional statements and for loops.
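
As a rough illustration of the kind of work these sessions describe, the sketch below cleans a small made-up data frame in R. The data, the column names (site, date, count), and the specific cleaning choices are assumptions invented for this example, not material taken from the workshop itself.

```r
library(stringr)    # text cleaning
library(lubridate)  # date parsing

# Hypothetical raw data, invented for illustration only
raw <- data.frame(
  site  = c("  north creek", "NORTH CREEK", "South  ridge ", NA),
  date  = c("2021-03-04", "03/05/2021", "March 6, 2021", "2021-03-07"),
  count = c(12, NA, 7, 3),
  stringsAsFactors = FALSE
)

# Subsetting and filtering: keep only rows that have a site name
clean <- raw[!is.na(raw$site), ]

# Text cleaning: trim and collapse whitespace, then standardize capitalization
clean$site <- str_to_title(str_squish(clean$site))

# Date cleaning with a for loop and a conditional statement:
# ISO-style strings are parsed as year-month-day, and everything else
# is assumed (for this example) to be month-day-year
parsed <- rep(as.Date(NA), nrow(clean))
for (i in seq_len(nrow(clean))) {
  d <- clean$date[i]
  if (str_detect(d, "^\\d{4}-\\d{2}-\\d{2}$")) {
    parsed[i] <- ymd(d)
  } else {
    parsed[i] <- mdy(d)
  }
}
clean$date <- parsed

print(clean)
```

The loop is written out explicitly to mirror the conditional statements and for loops mentioned for Session 4. In practice a vectorized approach may be preferable, and the month-day-year assumption would need to be checked against the real data.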

Course Length

  • 12 hours in 4 sessions


Prerequisites

  • Zero to Hero in R, or previous experience with data frames and the tidyverse packages in R.