A06 - Project: Bird Species

Kaplan, Daniel & Matthew Beckman. (2021). Data Computing. 2nd Ed. Home.


Revised: 19 Jun 2023


Programming Environment

library(dcData)
library(tidyverse)

sessionInfo()
── Attaching core tidyverse packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
 dplyr     1.1.2      readr     2.1.4
 forcats   1.0.0      stringr   1.5.0
 ggplot2   3.4.3      tibble    3.2.1
 lubridate 1.9.2      tidyr     1.3.0
 purrr     1.0.2     
── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
 dplyr::filter() masks stats::filter()
 dplyr::lag()    masks stats::lag()
 Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 14.4.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.9.2 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.2    
 [5] purrr_1.0.2     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1   
 [9] ggplot2_3.4.3   tidyverse_2.0.0 dcData_0.1.0   

loaded via a namespace (and not attached):
 [1] gtable_0.3.3     jsonlite_1.8.5   compiler_4.3.0   crayon_1.5.2    
 [5] tidyselect_1.2.0 IRdisplay_1.1    scales_1.2.1     uuid_1.1-0      
 [9] fastmap_1.1.1    IRkernel_1.3.2   R6_2.5.1         generics_0.1.3  
[13] munsell_0.5.0    pillar_1.9.0     tzdb_0.4.0       rlang_1.1.1     
[17] utf8_1.2.3       stringi_1.7.12   repr_1.1.6       timechange_0.2.0
[21] cli_3.6.1        withr_2.5.0      magrittr_2.0.3   digest_0.6.31   
[25] grid_4.3.0       base64enc_0.1-3  hms_1.1.3        pbdZMQ_0.3-9    
[29] lifecycle_1.0.3  vctrs_0.6.3      evaluate_0.21    glue_1.6.2      
[33] fansi_1.0.4      colorspace_2.1-0 tools_4.3.0      pkgconfig_2.0.3 
[37] htmltools_0.5.5 

Get the data

?dcData::OrdwayBirds
?dcData::OrdwaySpeciesNames
OrdwayBirds %>%
  select(Month, Day) %>%
  head()
A data.frame: 6 × 2
  Month Day
  <chr> <chr>
3 7     16
4
5 7     16
6 7     16
7 7     16
8 7     16
OrdwayBirds <-
  OrdwayBirds %>%
  select(SpeciesName, Month, Day) %>%
  mutate(
    Month = as.numeric(as.character(Month)),
    Day   = as.numeric(as.character(Day))
  )
OrdwayBirds %>%
  head()
A data.frame: 6 × 3
  SpeciesName   Month Day
  <chr>         <dbl> <dbl>
3 Song Sparrow  7     16
4               NA    NA
5 Song Sparrow  7     16
6 Field Sparrow 7     16
7 Field Sparrow 7     16
8 Song Sparrow  7     16
OrdwaySpeciesNames %>%
  filter(is.na(SpeciesNameCleaned))
A data.frame: 50 × 2
SpeciesName             SpeciesNameCleaned
<chr>                   <chr>
                        NA
-lost-                  NA
-missing-               NA
[Nothing, just dashes]  NA
13:00:00                NA
Bank Swallow            NA
Barn Swallow            NA
Bay-breasted Warbler    NA
Blackpoll Warbler       NA
Blue Jay                NA
Blue-headed Vireo       NA
Blue-winged Warbler     NA
Bluebird                NA
Boreal Chickadee        NA
Brewer's Sparrow        NA
Brown Creeper           NA
Brown Thrasher          NA
Brown Towhee            NA
Cactus Wren             NA
Common Crow             NA
Common Grackle          NA
Common Nighthawk        NA
Common Redpoll          NA
Common Yellowthroat     NA
Connecticut Warbler     NA
Downy Woodpecker        NA
E Bluebird              NA
Eastern Bluebird        NA
Eastern Kingbird        NA
Eastern Meadowlark      NA
Eastern Robin           NA
Flicker                 NA
Fox Sparrow             NA
Goldfinch               NA
Grackle                 NA
Green Heron             NA
Ground Dove             NA
Hairy Woodpecker        NA
Hermit Thrush           NA
Horned Lark             NA
House Finch             NA
House Sparrow           NA
Inca Dove               NA
Indigo Bunting          NA
Killdeer                NA
Kingbird                NA
Kiskadee F.C.           NA
Magnolia Warbler        NA
Mockingbird             NA
Rough-winged Swallow    NA

Task 1

[1] Including misspellings, how many different species are there in the OrdwayBirds data?

There are 275 unique values of the variable SpeciesName. This reduces to 268 after dropping the following invalid values:

  • ''

  • '-lost-'

  • '-missing-'

  • '13:00:00'

  • '[Nothing, just dashes]'

  • 'lost'

  • 'none'
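
The reduction from all distinct labels to valid ones can be sketched by filtering out the invalid labels before counting. The example below uses a small made-up table (`toy_birds` is hypothetical, not the Ordway data) purely to show the pattern; the `invalid` vector repeats the labels listed above.

```r
library(dplyr)

# Hypothetical toy data standing in for OrdwayBirds$SpeciesName
toy_birds <- tibble(
  SpeciesName = c("Song Sparrow", "Song Sparrow", "", "-lost-",
                  "none", "Field Sparrow")
)

# Labels that do not name a species (copied from the list above)
invalid <- c("", "-lost-", "-missing-", "13:00:00",
             "[Nothing, just dashes]", "lost", "none")

# Distinct labels before any cleaning
n_all <- n_distinct(toy_birds$SpeciesName)

# Distinct labels after dropping the invalid ones
n_valid <- toy_birds %>%
  filter(!SpeciesName %in% invalid) %>%
  summarize(n = n_distinct(SpeciesName)) %>%
  pull(n)
```

Replacing `toy_birds` with `OrdwayBirds` applies the same filter to the real data.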

[2] Consider the OrdwaySpeciesNames data frame, also found in the dcData package. How many distinct species are there in the SpeciesNameCleaned variable in OrdwaySpeciesNames? You will find it helpful to use n_distinct(), a reduction function that counts the number of unique values in a variable.

There are 108 unique values of the variable SpeciesNameCleaned, not counting NA (109 if NA is counted as a value).
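
The difference between counting and not counting NA comes from how n_distinct() treats missing values: by default NA is one more distinct value, and `na.rm = TRUE` leaves it out. A minimal sketch on a made-up vector (`cleaned` is hypothetical):

```r
library(dplyr)

# Hypothetical toy vector: two species plus missing values
cleaned <- c("Robin", "Catbird", "Robin", NA, NA)

with_na    <- n_distinct(cleaned)                # NA counts as one distinct value
without_na <- n_distinct(cleaned, na.rm = TRUE)  # NA is ignored
```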

OrdwayBirds %>%
  count(SpeciesName)
A data.frame: 275 × 2
SpeciesName                n
<chr>                      <int>
                           4
-lost-                     1
-missing-                  1
13:00:00                   1
Acadian Flycatcher         1
American Gold Finch        50
American Goldfinch         1153
American Golf Finch        1
American Redfinch          1
American Redstart          3
American Robin             4
Arkansas Kingbird          1
Baltimore Oriole           206
Bank Swallow               21
Barn Swallow               23
Batimore Oriole            1
Bay-breasted Warbler       2
Blac-capped Chickadee      1
Black and White Warbler    9
Black-Capped Chickadee     13
Black-and-white Warbler    1
Black-billed Cookoo        1
Black-billed Cuckoo        15
Black-capeed Chickadee     1
Black-capped Chicakdee     1
Black-capped Chickadee     1110
Black-capped Chikadee      1
Black-capped chickadee     187
Black-throat Sparrow       31
Black-throat-Sparrow       1
⋮
White-breast Nuthatch      23
White-breasted Nuthatch    236
White-crown Sparrow        17
White-crowned Sparrow      78
White-eyed Vireo           1
White-thorat Sparrow       1
White-throat Sparrow       86
White-throated Sparrow     229
White-winged Junco         2
Wht-brstd Nuthatch         1
Wilson Warbler             4
Wilson's Warbler           22
Winter Wren                1
Wood Pewee                 37
Wood Thrush                3
Woodcock                   1
Wren                       2
Yellow Flicker             1
Yellow Shafted Flicker     4
Yellow Warbler             19
Yellow-bellied Flycatcher  7
Yellow-bellied Sapsucker   3
Yellow-shaft Flicker       6
Yellow-shafted Flicker     34
Yellow-shafted flicker     6
Yellow-tailed Oriole       1
Yellowthroat               107
[Nothing, just dashes]     1
lost                       1
none                       2
OrdwayBirds %>%
  select(SpeciesName) %>%
  n_distinct()
275
OrdwaySpeciesNames %>%
  count(SpeciesNameCleaned)
A data.frame: 109 × 2
SpeciesNameCleaned         n
<chr>                      <int>
Acadian Flycatcher         1
American Goldfinch         3
American Redfinch          1
American Redstart          1
American Robin             1
Arkansas Kingbird          1
Baltimore Oriole           3
Black and White Warbler    2
Black-billed Cookoo        2
Black-capped Chickadee     8
Black-throat Sparrow       2
Brown-headed Cowbird       2
Cardinal                   2
Carolina Chickadee         1
Catbird                    4
Cedar Waxwing              2
Chestnut-backed Chickadee  1
Chestnut-sided Warbler     1
Chickadee                  2
Chipping Sparrow           4
Clay-colored Sparrow       3
Cowbird                    1
Curve-billed Thrasher      2
Eastern Phoebe             2
Eastern Wood Pewee         2
Field Sparrow              2
Golden-Crowned Kinglet     3
Gray - cheeked Thrush      4
Great Crested Flycatcher   3
Harris's Sparrow           3
⋮
Tennessee Warbler          2
Traill's Flycatcher        1
Tree L                     1
Tree Swallow               4
Tufted Titmouse            1
Unknown                    1
Varied Thrush              1
Veery                      1
Vesper Sparrow             1
Warbling Vireo             1
White-Crested Sparrow      1
White-Fronted Dove         1
White-breasted Nuthatch    5
White-crowned Sparrow      2
White-eyed Vireo           1
White-throat Sparrow       5
White-winged Junco         1
Wilson's Warbler           2
Winter Wren                1
Wood Pewee                 1
Wood Thrush                1
Woodcock                   1
Wren                       1
Yellow Shafted Flicker     5
Yellow Warbler             1
Yellow-bellied Flycatcher  1
Yellow-bellied Sapsucker   1
Yellow-tailed Oriole       1
Yellowthroat               1
NA                         50
OrdwaySpeciesNames %>%
  select(SpeciesNameCleaned) %>%
  n_distinct()
109

Task 2

Use the OrdwaySpeciesNames table to create a new data frame that corrects the misspellings in SpeciesName. This can be done easily using the inner_join() data verb. Look at the names of the variables in OrdwaySpeciesNames and OrdwayBirds.

[1] Which variable(s) were used for matching cases?

The variable SpeciesName was used for matching cases.

[2] Which variable(s) will be added?

The join adds the variable SpeciesNameCleaned from OrdwaySpeciesNames (renamed to Species in the result). Month and Day are already in OrdwayBirds and are carried through unchanged.
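
The roles of the matching variable and the added variable can be seen on a pair of small made-up tables (`birds` and `lookup` are hypothetical stand-ins for OrdwayBirds and OrdwaySpeciesNames). The sketch also checks the lookup table for duplicated keys, since a key that appears more than once multiplies the matching rows in the join, which is the situation a many-to-many warning from inner_join() points to.

```r
library(dplyr)

# Hypothetical toy tables mirroring the structure of the two data frames
birds <- tibble(
  SpeciesName = c("Robin", "Robbin", "Catbird", "lost"),
  Month       = c(5, 5, 6, 7),
  Day         = c(1, 2, 3, 4)
)
lookup <- tibble(
  SpeciesName        = c("Robin", "Robbin", "Catbird", "lost"),
  SpeciesNameCleaned = c("Robin", "Robin", "Catbird", NA)
)

# Keys that occur more than once in the lookup table (none in this toy case)
dup_keys <- lookup %>%
  count(SpeciesName) %>%
  filter(n > 1)

# Naming the key explicitly documents which variable is matched
joined <- birds %>%
  inner_join(lookup, by = "SpeciesName")

# Variables contributed by the lookup table
added <- setdiff(names(joined), names(birds))
```

If `dup_keys` had any rows, applying `distinct()` to the lookup table before joining would be one way to avoid counting the same capture twice.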

Corrected <-
  OrdwayBirds %>%
    inner_join(y = OrdwaySpeciesNames) %>%
    select(Species = SpeciesNameCleaned, Month, Day) %>%
    na.omit()
Corrected %>%
  head()
Joining with `by = join_by(SpeciesName)`
Warning message in inner_join(., y = OrdwaySpeciesNames):
“Detected an unexpected many-to-many relationship between `x` and `y`.
 Row 4 of `x` matches multiple rows in `y`.
 Row 211 of `y` matches multiple rows in `x`.
 If a many-to-many relationship is expected, set `relationship = "many-to-many"` to silence this warning.”
A data.frame: 6 × 3
  Species       Month Day
  <chr>         <dbl> <dbl>
1 Song Sparrow  7     16
3 Song Sparrow  7     16
4 Field Sparrow 7     16
5 Field Sparrow 7     16
6 Field Sparrow 7     16
7 Field Sparrow 7     16

Task 3

Name the variable that contains the total count. Arrange the result in descending order, starting from the species with the most birds, and look through the list. (Hint: Remember n(). Also, one of the arguments to one of the data verbs will be desc(count) to arrange the cases into descending order.) Display the top 10 species in terms of the number of bird captures.

Define for yourself a “major species” as a species with more than a particular threshold count. Set your threshold so that there are 5 or 6 species designated as major. Filter to produce a data frame with only the birds that belong to a major species. Save the output in a table called Majors. (Hint: Remember that summary functions can be used case-by-case when filtering or mutating a data frame that has been grouped.)

[1] How many bird captures are reported for each of the corrected species?

The counts are shown below. Setting the major-species threshold at 1000 or more captures yields five major species.
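
The hint about using a summary function inside filter() on a grouped data frame can be sketched as follows. This is a variant that keeps one row per bird rather than one row per species; `captures` and the threshold of 3 are made up for illustration.

```r
library(dplyr)

# Hypothetical toy captures: one row per bird
captures <- tibble(
  Species = c(rep("Junco", 4), rep("Robin", 2), "Veery"),
  Month   = c(1, 1, 2, 10, 5, 6, 5)
)

threshold <- 3  # illustrative only

# n() is evaluated within each Species group, so a group is kept or
# dropped as a whole
major_rows <- captures %>%
  group_by(Species) %>%
  filter(n() >= threshold) %>%
  ungroup()
```

Here only the four Junco rows survive, because Junco is the only species with at least three captures.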

Corrected %>%
  count(Species) %>%
  arrange(desc(n)) %>%
  head(n = 10)
A data.frame: 10 × 2
   Species                n
   <chr>                  <int>
1  Slate-colored Junco    2732
2  Tree Swallow           1537
3  Black-capped Chickadee 1327
4  American Goldfinch     1204
5  Field Sparrow          1164
6  Lincoln's Sparrow      790
7  Robin                  608
8  Catbird                554
9  Song Sparrow           512
10 House Wren             460
Corrected %>%
  group_by(Species) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  head(n = 10)
A tibble: 10 × 2
Species                count
<chr>                  <int>
Slate-colored Junco    2732
Tree Swallow           1537
Black-capped Chickadee 1327
American Goldfinch     1204
Field Sparrow          1164
Lincoln's Sparrow      790
Robin                  608
Catbird                554
Song Sparrow           512
House Wren             460
Majors <-
  Corrected %>%
    group_by(Species) %>%
    summarize(count = n()) %>%
    arrange(desc(count)) %>%
    filter(count >= 1000)
Majors
A tibble: 5 × 2
Species                count
<chr>                  <int>
Slate-colored Junco    2732
Tree Swallow           1537
Black-capped Chickadee 1327
American Goldfinch     1204
Field Sparrow          1164

Task 4

When you have correctly produced Majors, write a command that produces the month-by-month count of each of the major species. Call this table ByMonth. Display this month-by-month count with a bar chart arranged in a way that you think tells the story of what time of year the various species appear. You can use mplot() to explore different possibilities. (Warning: mplot() and similar interactive functions should not appear in your Rmd file; they need to be used interactively from the console. Use the “Show Expression” button in mplot() to create an expression that you can cut and paste into a chunk in your Rmd document, so that the graph gets created when you compile it.) Once you have the graph, use it to answer these questions:
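
One possible shape for this step is sketched below on made-up data (`corrected` and `majors` are hypothetical stand-ins for a table of corrected captures and a table of major species). It starts from corrected names, so spelling variants of a species are counted together, and it draws one panel per species as an alternative to stacking.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy stand-ins: one row per capture with corrected names,
# and a table listing the major species
corrected <- tibble(
  Species = c("Junco", "Junco", "Junco", "Swallow", "Swallow", "Veery"),
  Month   = c(1, 1, 10, 6, 6, 5)
)
majors <- tibble(Species = c("Junco", "Swallow"))

# Keep captures of major species, then count per species and month
by_month <- corrected %>%
  semi_join(majors, by = "Species") %>%
  count(Species, Month, name = "count")

# One panel per species, sharing the same month axis
p <- ggplot(by_month, aes(x = Month, y = count)) +
  geom_col() +
  scale_x_continuous(breaks = 1:12) +
  facet_wrap(~ Species, ncol = 1)
```

Faceting trades the single combined picture of a stacked chart for a clearer view of each species' seasonal pattern; which reads better depends on the question being asked.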

[1] Which species are present year-round?

  • American Goldfinch (11-12 mo)

  • Black-capped Chickadee (12 mo)

[2] Which species are migratory, that is, primarily present in one or two seasons?

  • Field Sparrow (6 mo)

  • Slate-colored Junco (8-9 mo)

  • Tree Swallow (3-5 mo)

[3] What is the peak month for each major species?

  • American Goldfinch: October (month 10)

  • Black-capped Chickadee: November (month 11)

  • Field Sparrow: May (month 5)

  • Slate-colored Junco: October (month 10)

  • Tree Swallow: June (month 6)
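
Peak months like these can also be read off programmatically instead of from the chart. A minimal sketch, assuming a month-by-month table with columns Species, Month, and count (the `by_month` values here are made up):

```r
library(dplyr)

# Hypothetical toy month-by-month counts
by_month <- tibble(
  Species = c("Junco", "Junco", "Junco", "Robin", "Robin"),
  Month   = c(3, 4, 10, 5, 6),
  count   = c(20, 70, 120, 9, 30)
)

# Keep the row with the largest count within each species
peaks <- by_month %>%
  group_by(Species) %>%
  slice_max(count, n = 1, with_ties = FALSE) %>%
  ungroup()
```

Setting `with_ties = FALSE` returns a single month per species even when two months share the maximum.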

[4] Which major species are seen in good numbers for at least 6 months of the year? (Hint: n_distinct() and >= 6.)

Arguably, the only major species that is not seen in good numbers for at least 6 months of the year is the Tree Swallow, which appears in only 5 months.
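
What counts as "good numbers" is a judgment call. One way to make it explicit is to count only months whose captures reach a cutoff; the sketch below uses a hypothetical cutoff (`min_count`) and made-up counts.

```r
library(dplyr)

# Hypothetical toy month-by-month counts
by_month <- tibble(
  Species = c(rep("Chickadee", 7), rep("Swallow", 4)),
  Month   = c(1, 2, 3, 9, 10, 11, 12, 4, 5, 6, 7),
  count   = c(56, 140, 96, 66, 173, 271, 165, 2, 11, 171, 16)
)

min_count <- 10  # hypothetical cutoff for "good numbers"

# Count the months per species that reach the cutoff
good_months <- by_month %>%
  filter(count >= min_count) %>%
  group_by(Species) %>%
  summarize(
    Months          = n_distinct(Month),
    SixMonthsOrMore = Months >= 6
  )
```

A species with no month at or above the cutoff drops out of the result entirely, so a missing species should be read as zero qualifying months.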

ByMonth <-
  OrdwayBirds %>%
    group_by(SpeciesName, Month = as.integer(Month)) %>%
    summarize(count = n()) %>%
    filter(SpeciesName %in% Majors$Species)
ByMonth
`summarise()` has grouped output by 'SpeciesName'. You can override using the `.groups` argument.
A grouped_df: 47 × 3
SpeciesName             Month count
<chr>                   <int> <int>
American Goldfinch      1     10
American Goldfinch      2     51
American Goldfinch      3     48
American Goldfinch      4     21
American Goldfinch      5     125
American Goldfinch      6     63
American Goldfinch      7     67
American Goldfinch      8     70
American Goldfinch      9     151
American Goldfinch      10    364
American Goldfinch      11    180
American Goldfinch      12    3
Black-capped Chickadee  1     56
Black-capped Chickadee  2     140
Black-capped Chickadee  3     96
Black-capped Chickadee  4     51
Black-capped Chickadee  5     48
Black-capped Chickadee  6     20
Black-capped Chickadee  7     13
Black-capped Chickadee  8     11
Black-capped Chickadee  9     66
Black-capped Chickadee  10    173
Black-capped Chickadee  11    271
Black-capped Chickadee  12    165
Field Sparrow           4     83
Field Sparrow           5     197
Field Sparrow           6     15
Field Sparrow           7     79
Field Sparrow           8     64
Field Sparrow           9     74
Field Sparrow           10    69
Field Sparrow           11    1
Slate-colored Junco     1     113
Slate-colored Junco     2     61
Slate-colored Junco     3     188
Slate-colored Junco     4     694
Slate-colored Junco     5     1
Slate-colored Junco     8     1
Slate-colored Junco     9     35
Slate-colored Junco     10    1178
Slate-colored Junco     11    272
Slate-colored Junco     12    174
Tree Swallow            4     2
Tree Swallow            5     11
Tree Swallow            6     171
Tree Swallow            7     16
Tree Swallow            11    1
ByMonth %>%
  group_by(SpeciesName) %>%
  summarize(
    MonthsPerYear   = n(),
    SixMonthsOrMore = n_distinct(Month) >= 6
  )
A tibble: 5 × 3
SpeciesName             MonthsPerYear SixMonthsOrMore
<chr>                   <int>         <lgl>
American Goldfinch      12            TRUE
Black-capped Chickadee  12            TRUE
Field Sparrow           8             TRUE
Slate-colored Junco     10            TRUE
Tree Swallow            5             FALSE
ByMonth %>%
  ggplot() +
    geom_bar(
      mapping     = aes(x = Month, y = count, fill = SpeciesName),
      na.rm       = FALSE,
      position    = 'stack',
      show.legend = TRUE,
      stat        = 'identity'
    ) +
    scale_x_continuous(breaks = 1:12)
[Figure: stacked bar chart of capture counts by month (x-axis: Month, 1 to 12; y-axis: count), with bars filled by SpeciesName]