Instructions for the class
New announcement about the homework assignment. Check out the homework assignment page.
Disclaimer
This repository is built on the work of Garrett Grolemund from Posit. In particular, it reuses a substantial part of the material he developed for tidyverse-related workshops, which is available at https://github.com/rstudio-education/remaster-the-tidyverse under the Creative Commons BY-SA 4.0 license.
Material
The main webpage is at https://astamm.github.io/data-science-with-r/.
Outline
Data wrangling with R
The class is organised in 9 parts, each of which has its own set of slides and exercises. The slides are available in the Data Wrangling - Slides tab above and the exercises in the Data Wrangling - Labs tab above. The slides are written partly in Keynote (exported as PDFs) and partly as Quarto revealjs slides. The exercises are written in Quarto.
Part | Title | Slides | Exercises | Suppl. Material |
---|---|---|---|---|
1 | Introduction | Quarto | | |
2 | Visualize Data | Quarto | | |
3 | Transform Data | Quarto | CSV | |
4 | Model Data | Quarto | ZIP | |
5 | Communicate Data | Quarto | Quarto | |
6 | Tidy Data | Quarto | | |
7 | Join Data | Quarto | | |
8 | Manipulate Data Types | Quarto | | |
9 | Manipulate Lists | Quarto | | |
Exploratory Data Analysis with R
The class is organised in 4 parts, each of which has its own set of slides and exercises. The slides are available in the Exploratory Data Analysis - Slides tab above and the exercises in the Exploratory Data Analysis - Labs tab above. The slides are written as Quarto revealjs slides. The exercises are written in Quarto.
Part | Title | Slides | Exercises | Suppl. Material |
---|---|---|---|---|
1 | Hypothesis Testing | Quarto | Quarto | |
2 | Linear Regression | Quarto | Quarto | ZIP |
3 | Principal Component Analysis | Quarto | Quarto | |
4 | Clustering | Quarto | Quarto | |
Requirements
Quarto Drop extension: https://github.com/r-wasm/quarto-drop
Tidyverse: https://www.tidyverse.org
Specific R packages by theme:
Data sets:
- {babynames}: a data set of baby name frequencies in the US from 1880 to 2017.
Data visualization:
Model summaries:
- {broom}: a package that provides tidy summaries of model outputs.
- {modelr}: a package that provides functions for modelling within the tidyverse.
- {jtools}: a package that provides functions for summarizing and visualizing model outputs and main effects.
- {interactions}: a package that provides functions for visualizing the effect of interactions in regression models.
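
The R packages listed above can be installed in one step. The following is a minimal sketch, assuming that all of them are available from CRAN under the names given above and that a CRAN mirror is configured; the variable names `pkgs` and `to_install` are illustrative only.

```r
# Packages named in the requirements above (assumed to be available on CRAN).
pkgs <- c("tidyverse", "babynames", "broom", "modelr", "jtools", "interactions")

# Keep only the packages that are not installed yet.
to_install <- setdiff(pkgs, rownames(installed.packages()))

# Install whatever is missing; nothing happens if everything is already present.
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

Checking against `installed.packages()` first avoids reinstalling packages that are already present, so the snippet can be rerun safely.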