Instructions for the class

Disclaimer

This repository is built on the work of Garrett Grolemund from Posit. In particular, it reuses a substantial part of the material he developed for tidyverse-related workshops, which is available at https://github.com/rstudio-education/remaster-the-tidyverse under the Creative Commons BY-SA 4.0 license.

Material

The main webpage is at https://astamm.github.io/data-science-with-r/.

Outline

Data wrangling with R

The class is organised in 9 parts, each of which has its own set of slides and exercises. The slides are available in the Data Wrangling - Slides tab above and the exercises in the Data Wrangling - Labs tab above. The slides are written partly in Keynote (exported as PDFs) and partly as Quarto revealjs presentations. The exercises are written in Quarto.

Part  Title                  Slides  Exercises  Suppl. Material
1     Introduction           PDF     Quarto
2     Visualize Data         PDF     Quarto
3     Transform Data         PDF     Quarto     CSV
4     Model Data             PDF     Quarto     ZIP
5     Communicate Data       PDF     Quarto     Quarto
6     Tidy Data              PDF     Quarto
7     Join Data              PDF     Quarto
8     Manipulate Data Types  PDF     Quarto
9     Manipulate Lists       PDF     Quarto

Exploratory Data Analysis with R

The class is organised in 4 parts, each of which has its own set of slides and exercises. The slides are available in the Exploratory Data Analysis - Slides tab above and the exercises in the Exploratory Data Analysis - Labs tab above. The slides are written as Quarto revealjs presentations. The exercises are written in Quarto.

Part  Title                         Slides     Exercises  Suppl. Material
1     Hypothesis Testing            Quarto     Quarto
2     Linear Regression             Quarto     Quarto     ZIP
3     Principal Component Analysis  RMarkdown  Quarto     ZIP

Requirements

  • R: https://www.r-project.org

  • RStudio: https://posit.co/download/rstudio-desktop/ or Positron: https://positron.posit.co/download.html

  • Quarto: https://quarto.org/docs/get-started/

  • Tidyverse: https://www.tidyverse.org

  • Specific R packages by theme:

    • Data sets:

      • {babynames}: a data set of frequency of baby names in the US from 1880 to 2017.
      • {nycflights13}: information about all flights that departed from NYC (i.e. EWR, JFK and LGA) to destinations in the United States, Puerto Rico and the American Virgin Islands in 2013: 336,776 flights in total.
    • Data wrangling:

      • {janitor}: simple functions for examining and cleaning dirty data.
      • {skimr}: a frictionless approach to summary statistics which conforms to the principle of least surprise, displaying summary statistics the user can skim quickly to understand their data.
      • {tidyverse}: an opinionated collection of R packages designed for data science.
    • Extra data visualization packages:

      • {ggcorrplot}: easily visualize a correlation matrix using {ggplot2}.
      • {ggfortify}: unified plotting tools for statistics commonly used, such as GLM, time series, PCA families, clustering and survival analysis.
      • {plotly}: an interactive plotting library.
    • Reporting:

      • {DT}: an R interface to the JavaScript library DataTables.
      • {gt}: produce nice-looking display tables.
      • {kableExtra}: construct complex tables with knitr::kable() and the pipe operator (|>).
    • Model summaries:

      • {broom}: a package that provides tidy summaries of model outputs.
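
As a convenience, the R packages listed above can be installed in one go. The following is only a minimal sketch, not an official setup script for the class: the helper name missing_packages() is hypothetical, and the code assumes that every package is available from your configured CRAN mirror. It installs only the packages that are not already present, and only when run in an interactive session.

```r
# Packages listed in the requirements above.
required <- c(
  "babynames", "nycflights13",           # data sets
  "janitor", "skimr", "tidyverse",       # data wrangling
  "ggcorrplot", "ggfortify", "plotly",   # extra data visualization
  "DT", "gt", "kableExtra",              # reporting
  "broom"                                # model summaries
)

# Hypothetical helper: return the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(utils::installed.packages()))
}

to_install <- missing_packages(required)

# Install only what is missing, and only in an interactive session, so that
# sourcing this file from a script never triggers downloads by surprise.
if (interactive() && length(to_install) > 0) {
  utils::install.packages(to_install)
}
```

Running the snippet a second time should be a no-op, since missing_packages() then returns an empty character vector.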