Make sure that the .csv file is in the same directory as the .qmd file you’re currently working on. Then import the babynames.csv data set. Give it the name babynames. Then copy the import code into the code chunk below. Does it run?
babynames
# A tibble: 1,924,665 × 5
year sex name n prop
<dbl> <chr> <chr> <int> <dbl>
1 1880 F Mary 7065 0.0724
2 1880 F Anna 2604 0.0267
3 1880 F Emma 2003 0.0205
4 1880 F Elizabeth 1939 0.0199
5 1880 F Minnie 1746 0.0179
6 1880 F Margaret 1578 0.0162
7 1880 F Ida 1472 0.0151
8 1880 F Alice 1414 0.0145
9 1880 F Bertha 1320 0.0135
10 1880 F Sarah 1288 0.0132
# ℹ 1,924,655 more rows
dplyr
Your Turn 2
Alter the code to select just the n column:
select(babynames, name, prop)
# A tibble: 1,924,665 × 2
name prop
<chr> <dbl>
1 Mary 0.0724
2 Anna 0.0267
3 Emma 0.0205
4 Elizabeth 0.0199
5 Minnie 0.0179
6 Margaret 0.0162
7 Ida 0.0151
8 Alice 0.0145
9 Bertha 0.0135
10 Sarah 0.0132
# ℹ 1,924,655 more rows
Quiz
Which of these is NOT a way to select the name and n columns together?
select(babynames, -c(year, sex, prop))
# A tibble: 1,924,665 × 2
name n
<chr> <int>
1 Mary 7065
2 Anna 2604
3 Emma 2003
4 Elizabeth 1939
5 Minnie 1746
6 Margaret 1578
7 Ida 1472
8 Alice 1414
9 Bertha 1320
10 Sarah 1288
# ℹ 1,924,655 more rows
select(babynames, name:n)
# A tibble: 1,924,665 × 2
name n
<chr> <int>
1 Mary 7065
2 Anna 2604
3 Emma 2003
4 Elizabeth 1939
5 Minnie 1746
6 Margaret 1578
7 Ida 1472
8 Alice 1414
9 Bertha 1320
10 Sarah 1288
# ℹ 1,924,655 more rows
select(babynames, starts_with("n"))
# A tibble: 1,924,665 × 2
name n
<chr> <int>
1 Mary 7065
2 Anna 2604
3 Emma 2003
4 Elizabeth 1939
5 Minnie 1746
6 Margaret 1578
7 Ida 1472
8 Alice 1414
9 Bertha 1320
10 Sarah 1288
# ℹ 1,924,655 more rows
Use filter, babynames, and the logical operators to find:
All of the names where prop is greater than or equal to 0.08
All of the children named “Sea”
Your Turn 4
Use Boolean operators to return only the rows that contain:
Boys named Sue
Names that were used by exactly 5 or 6 children in 1880
Names that are one of Acura, Lexus, or Yugo
Help Me
What is the smallest value of n? What is the largest?
Your Turn 5
Use |> to write a sequence of functions that:
Filters babynames to just the girls that were born in 2017, then…
Selects the name and n columns, then…
Arranges the results so that the most popular names are near the top.
Your Turn 6
Trim babynames to just the rows that contain your name and your sex
Trim the result to just the columns that will appear in your graph (not strictly necessary, but useful practice)
Plot the results as a line graph with year on the x axis and prop on the y axis
Your Turn 7
Cpmplete the code below to extract the rows where name == "Khaleesi". Then use summarise() and sum() and min() to find:
The total number of children named Khaleesi
The first year Khaleesi appeared in the data
Make sure to turn the eval option to true before running the code.
(Hint: Be sure to remove each _ before running the code)
babynames ___ filter(_______________________) ______________(total = ________, first = _______)
Your Turn 8
Use group_by(), summarise(), and arrange() to display the ten most popular names. Compute popularity as the total number of children of a single gender given a name.
Make sure to turn the eval option to true before running the code.
(Hint: Be sure to remove each _ before running the code)