Instructions for the class

Final Exam (2025-01-16)

Exercises: The main document that you have to work on is here.
Data Sets: All data sets are available here. The ZIP file contains two files: an RDS file and a CSV file.
Deliverable: Send the QMD file, with your answers and your name as author, by email to the instructor by 12:25pm on 2025-01-16.

Homework Assignment (2025-01-01)

New announcement about the homework assignment.

Deliverable is expected to be a ZIP archive containing the two .qmd files (report and dashboard) along with all the necessary data files.

Please ensure that both files compile without error.

Check out the homework assignment page.

Disclaimer

This repository is built on the work of Garrett Grolemund from Posit. In particular, it reuses a significant part of the material he developed for tidyverse-related workshops, which is available at https://github.com/rstudio-education/remaster-the-tidyverse under the Creative Commons BY-SA 4.0 license.

Material

The main webpage is at https://astamm.github.io/data-science-with-r/.

Outline

Data wrangling with R

The class is organised in 9 parts, each of which has its own set of slides and exercises. The slides are available in the Data Wrangling - Slides tab above and the exercises in the Data Wrangling - Labs tab above. The slides are written partly in Keynote (exported as PDFs) and partly as Quarto revealjs slides. The exercises are written in Quarto.

Part	Title	Slides	Exercises	Suppl. Material
1	Introduction	PDF	Quarto
2	Visualize Data	PDF	Quarto
3	Transform Data	PDF	Quarto	CSV
4	Model Data	PDF	Quarto	ZIP
5	Communicate Data	PDF	Quarto	Quarto
6	Tidy Data	PDF	Quarto
7	Join Data	PDF	Quarto
8	Manipulate Data Types	PDF	Quarto
9	Manipulate Lists	PDF	Quarto

Exploratory Data Analysis with R

The class is organised in 4 parts, each of which has its own set of slides and exercises. The slides are available in the Exploratory Data Analysis - Slides tab above and the exercises in the Exploratory Data Analysis - Labs tab above. The slides are written in Quarto revealjs slides. The exercises are written in Quarto.

Part	Title	Slides	Exercises	Suppl. Material
1	Hypothesis Testing	Quarto	Quarto
2	Linear Regression	Quarto	Quarto	ZIP
3	Principal Component Analysis	RMarkdown	Quarto	ZIP
4	Clustering	Quarto	Quarto

Requirements

R: https://www.r-project.org
RStudio: https://posit.co/download/rstudio-desktop/
Quarto: https://quarto.org/docs/get-started/
Tidyverse: https://www.tidyverse.org
Specific R packages by theme:
- Data sets:
  - {babynames}: a data set of frequency of baby names in the US from 1880 to 2017.
- Data visualization:
  - {ggplot2}: a package that implements the grammar of graphics.
  - {plotly}: an interactive plotting library.
  - {gt}: a package for creating presentation-ready display tables.
- Model summaries:
  - {broom}: a package that provides tidy summaries of model outputs.
  - {modelr}: a package that provides functions for modelling within the tidyverse.
  - {jtools}: a package that provides functions for summarizing and visualizing model outputs and main effects.
  - {interactions}: a package that provides functions for visualizing the effect of interactions in regression models.
- PCA:
  - {FactoMineR}: a package that provides ready-to-use implementations of standard statistical methods for data analysis.
  - {factoextra}: a package that provides functions for extracting and visualizing the results of multivariate data analyses.
  - {corrplot}: a package that provides functions for visualizing correlation matrices.
  - {viridis}: a package that provides color palettes that are perceptually uniform in both color and black-and-white.
  - {huxtable}: a package that provides functions for creating tables in HTML documents.
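
To check which of the packages listed above are still missing from a local R installation, something along these lines may help. This is a minimal sketch, not an official setup script for the class: it assumes all packages are available from CRAN, and `course_packages` and `find_missing` are hypothetical names introduced here for illustration.

```r
# Packages named in the Requirements list above.
course_packages <- c(
  "tidyverse", "babynames", "ggplot2", "plotly", "gt",
  "broom", "modelr", "jtools", "interactions",
  "FactoMineR", "factoextra", "corrplot", "viridis", "huxtable"
)

# Return the packages from `pkgs` that are not yet installed.
# `find_missing` is a hypothetical helper, not part of any package.
find_missing <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

missing_packages <- find_missing(course_packages)

if (length(missing_packages) > 0) {
  message("Missing packages: ", paste(missing_packages, collapse = ", "))
  # Install only in an interactive session (e.g. the RStudio console),
  # so that rendering a document never triggers an installation.
  if (interactive()) {
    install.packages(missing_packages)
  }
} else {
  message("All course packages are installed.")
}
```

Run from the RStudio console, this installs whatever is missing; run non-interactively (for instance while rendering a Quarto document), it only reports the missing packages.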