A06 - Project: Bird Species


Kaplan, Daniel & Matthew Beckman. (2021). Data Computing. 2nd Ed.


19 Jun 2023

Programming Environment


Get the data

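# Preview the raw Month and Day columns
OrdwayBirds %>%
  select(Month, Day) %>%
  head()
A data.frame: 6 × 2
# Keep the three variables of interest and convert Month and Day to numbers
OrdwayBirds <-
  OrdwayBirds %>%
  select(SpeciesName, Month, Day) %>%
  mutate(
    Month = as.numeric(as.character(Month)),
    Day   = as.numeric(as.character(Day))
  )
OrdwayBirds %>%
  head()
A data.frame: 6 × 3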
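
The as.numeric(as.character(...)) pattern above matters when a column is stored as a factor. The sketch below uses a made-up factor (not the real OrdwayBirds values) to show the difference; whether Month and Day are actually factors in a given version of dcData is an assumption here.

```r
# Toy factor standing in for a Month column (hypothetical values).
month_factor <- factor(c("7", "10", "7", "12"))

# as.numeric() on a factor returns the internal level codes, not the labels.
codes <- as.numeric(month_factor)

# Converting to character first recovers the numbers as they were written.
months <- as.numeric(as.character(month_factor))

print(codes)
print(months)
```

If the columns are already character or numeric, the double conversion is harmless, which is why it is a safe default.
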
Task 1

[1] Including misspellings, how many different species are there in the OrdwayBirds data?

There are 275 unique values of the variable SpeciesName. This reduces to 268 after dropping the following invalid values:

  • ''

  • '-lost-'

  • '-missing-'

  • '13:00:00'

  • '[Nothing, just dashes]'

  • 'lost'

  • 'none'

[2] Consider the OrdwaySpeciesNames data frame, also found in the dcData package. How many distinct species are there in the SpeciesNameCleaned variable in OrdwaySpeciesNames? You will find it helpful to use n_distinct(), a reduction function that counts the number of unique values in a variable.

There are 108 unique values of the variable SpeciesNameCleaned once the missing value NA is excluded (109 if NA is counted as a value).

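# One row per distinct SpeciesName, with its number of cases
OrdwayBirds %>%
  count(SpeciesName)
A data.frame: 275 × 2
# Number of distinct values of SpeciesName
OrdwayBirds %>%
  select(SpeciesName) %>%
  n_distinct()
# One row per distinct SpeciesNameCleaned (NA included)
OrdwaySpeciesNames %>%
  count(SpeciesNameCleaned)
A data.frame: 109 × 2
# Number of distinct values of SpeciesNameCleaned (NA included)
OrdwaySpeciesNames %>%
  select(SpeciesNameCleaned) %>%
  n_distinct()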
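
The gap between 109 and 108 comes from how n_distinct() treats missing values. A minimal sketch on a made-up vector (the names are illustrative, not taken from OrdwaySpeciesNames):

```r
library(dplyr)

# Hypothetical cleaned names, including one missing value.
cleaned <- c("Song Sparrow", "Field Sparrow", NA, "Song Sparrow", "Tree Swallow")

# By default NA is counted as its own distinct value.
with_na <- n_distinct(cleaned)

# With na.rm = TRUE the NA is dropped before counting.
without_na <- n_distinct(cleaned, na.rm = TRUE)

print(c(with_na = with_na, without_na = without_na))
```
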

Task 2

Use the OrdwaySpeciesNames table to create a new data frame that corrects the misspellings in SpeciesName. This can be done easily using the inner_join() data verb. Look at the names of the variables in OrdwaySpeciesNames and OrdwayBirds.

[1] Which variable(s) was used for matching cases?

The variable SpeciesName was used for matching cases.

[2] Which variable(s) will be added?

The join adds the variable SpeciesNameCleaned, which is then renamed to Species. Month and Day are already in OrdwayBirds, so they are carried through by select() rather than added by the join.

Corrected <-
  OrdwayBirds %>%
    inner_join(y = OrdwaySpeciesNames) %>%
    select(Species = SpeciesNameCleaned, Month, Day) %>%
    na.omit()
Corrected %>%
  head()
A data.frame: 6 × 3
   Species        Month  Day
1  Song Sparrow       7   16
3  Song Sparrow       7   16
4  Field Sparrow      7   16
5  Field Sparrow      7   16
6  Field Sparrow      7   16
7  Field Sparrow      7   16
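
To see which variable is used for matching and which is added, the join can be tried on two tiny made-up tables. The sketch below assumes a hypothetical misspelling ("Song Sparow") and an unmatched name; neither is taken from the real data.

```r
library(dplyr)

# Hypothetical capture records, one with a misspelled name and one unmatched.
birds <- data.frame(
  SpeciesName = c("Song Sparrow", "Song Sparow", "Unknown bird"),
  Month       = c(7, 7, 8),
  Day         = c(16, 16, 1)
)

# Hypothetical lookup table mapping raw names to cleaned names.
lookup <- data.frame(
  SpeciesName        = c("Song Sparrow", "Song Sparow"),
  SpeciesNameCleaned = c("Song Sparrow", "Song Sparrow")
)

# inner_join() keeps only rows whose SpeciesName appears in both tables.
joined <- birds %>%
  inner_join(lookup, by = "SpeciesName")

# Columns present after the join that were not in the left table.
added_columns <- setdiff(names(joined), names(birds))

print(joined)
print(added_columns)
```

The unmatched row is dropped, and the only new column is the one that exists in the lookup table but not in the capture records.
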

Task 3

Count the bird captures for each corrected species, and name the variable that contains the total count. Arrange this into descending order from the species with the most birds, and look through the list. (Hint: Remember n(). Also, one of the arguments to one of the data verbs will be desc(count) to arrange the cases into descending order.) Display the top 10 species in terms of the number of bird captures.

Define for yourself a “major species” as a species with more than a particular threshold count. Set your threshold so that there are 5 or 6 species designated as major. Filter to produce a data frame with only the birds that belong to a major species. Save the output in a table called Majors. (Hint: Remember that summary functions can be used case-by-case when filtering or mutating a data frame that has been grouped.)

[1] How many bird captures are reported for each of the corrected species?

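The top 10 species by capture count are shown below. A major species is defined here as one with at least 1000 captures (count >= 1000), which gives 5 major species.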

Corrected %>%
  count(Species) %>%
  arrange(desc(n)) %>%
  head(n = 10)
A data.frame: 10 × 2
Corrected %>%
  group_by(Species) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  head(n = 10)
A tibble: 10 × 2
Majors <-
  Corrected %>%
    group_by(Species) %>%
    summarize(count = n()) %>%
    arrange(desc(count)) %>%
    filter(count >= 1000)
A tibble: 5 × 2
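
The hint in the task, that summary functions can be used case-by-case on a grouped data frame, suggests an alternative that keeps the individual captures instead of a summary table. A minimal sketch on made-up data, with a hypothetical threshold of 2:

```r
library(dplyr)

# Hypothetical captures: species A has 3, B has 2, C has 1.
captures <- data.frame(
  Species = c("A", "A", "A", "B", "B", "C"),
  Month   = c(1, 2, 3, 1, 2, 5)
)

threshold <- 2  # hypothetical cutoff for a "major" species

# Within each group, n() is the group size, so filter() keeps or drops
# every case of a species together.
majors_by_case <- captures %>%
  group_by(Species) %>%
  filter(n() >= threshold) %>%
  ungroup()

print(majors_by_case)
```

This form returns one row per capture, whereas the Majors table above has one row per species.
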
Task 4

When you have correctly produced Majors, write a command that produces the month-by-month count of each of the major species. Call this table ByMonth. Display this month-by-month count with a bar chart arranged in a way that you think tells the story of what time of year the various species appear. You can use mplot() to explore different possibilities. (Warning: mplot() and similar interactive functions should not appear in your Rmd file; they need to be used interactively from the console. Use the “Show Expression” button in mplot() to create an expression that you can cut and paste into a chunk in your Rmd document, so that the graph gets created when you compile it.) Once you have the graph, use it to answer these questions:

[1] Which species are present year-round?

  • American Goldfinch (11-12 mo)

  • Black-capped Chickadee (12 mo)

[2] Which species are migratory, that is, primarily present in one or two seasons?

  • Field Sparrow (6 mo)

  • Slate-colored Junco (8-9 mo)

  • Tree Swallow (3-5 mo)

[3] What is the peak month for each major species?

  • American Goldfinch: month 10

  • Black-capped Chickadee: month 11

  • Field Sparrow: month 05

  • Slate-colored Junco: month 10

  • Tree Swallow: month 06

[4] Which major species are seen in good numbers for at least 6 months of the year? (Hint: n_distinct() and >= 6.)

Arguably, the only major species that is not seen in good numbers for at least 6 months of the year is the Tree Swallow.

ByMonth <-
  OrdwayBirds %>%
    group_by(SpeciesName, Month = as.integer(Month)) %>%
    summarize(count = n()) %>%
    filter(SpeciesName %in% Majors$Species)
A grouped_df: 47 × 3
# For each species, count the months with captures and flag six or more
ByMonth %>%
  group_by(SpeciesName) %>%
  summarize(
    MonthsPerYear   = n(),
    SixMonthsOrMore = n_distinct(Month) >= 6
  )
A tibble: 5 × 3
# Stacked bar chart of monthly counts, one fill color per species
ByMonth %>%
  ggplot() +
    geom_bar(
      mapping     = aes(x = Month, y = count, fill = SpeciesName),
      na.rm       = FALSE,
      position    = 'stack',
      show.legend = TRUE,
      stat        = 'identity'
    ) +
    scale_x_continuous(breaks = 1:12)
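
The peak months reported for question [3] can also be read from a table rather than the chart. The sketch below assumes a table shaped like ByMonth (SpeciesName, Month, count) and uses made-up counts for two placeholder species.

```r
library(dplyr)

# Hypothetical month-by-month counts in the same shape as ByMonth.
by_month <- data.frame(
  SpeciesName = c("A", "A", "A", "B", "B"),
  Month       = c(4, 5, 6, 10, 11),
  count       = c(10, 30, 20, 5, 8)
)

# Within each species, keep the row(s) whose count equals the group maximum.
peaks <- by_month %>%
  group_by(SpeciesName) %>%
  filter(count == max(count)) %>%
  ungroup()

print(peaks)
```

If two months tie for the maximum, both rows are kept, which is usually what one wants when reporting peaks.
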