Forecasting: Principles and Practice 3#

Hyndman, Rob J. & George Athanasopoulos. (2021). Forecasting: Principles and Practice. 3rd Ed. OTexts. Home.


Revised

14 Jun 2023


Programming Environment#

library(fpp3)
library(GGally)
library(latex2exp)

sessionInfo()
── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── fpp3 0.5 ──
 tibble      3.2.1      tsibble     1.1.3
 dplyr       1.1.2      tsibbledata 0.4.1
 tidyr       1.3.0      feasts      0.3.1
 lubridate   1.9.2      fable       0.3.3
 ggplot2     3.4.3      fabletools  0.3.3
── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── fpp3_conflicts ──
 lubridate::date()    masks base::date()
 dplyr::filter()      masks stats::filter()
 tsibble::intersect() masks base::intersect()
 tsibble::interval()  masks lubridate::interval()
 dplyr::lag()         masks stats::lag()
 tsibble::setdiff()   masks base::setdiff()
 tsibble::union()     masks base::union()
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 14.4.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] latex2exp_0.9.6   GGally_2.1.2      fable_0.3.3       feasts_0.3.1     
 [5] fabletools_0.3.3  tsibbledata_0.4.1 tsibble_1.1.3     ggplot2_3.4.3    
 [9] lubridate_1.9.2   tidyr_1.3.0       dplyr_1.1.2       tibble_3.2.1     
[13] fpp3_0.5         

loaded via a namespace (and not attached):
 [1] rappdirs_0.3.3       utf8_1.2.3           generics_0.1.3      
 [4] anytime_0.3.9        stringi_1.7.12       digest_0.6.31       
 [7] magrittr_2.0.3       evaluate_0.21        grid_4.3.0          
[10] timechange_0.2.0     RColorBrewer_1.1-3   pbdZMQ_0.3-9        
[13] fastmap_1.1.1        plyr_1.8.8           jsonlite_1.8.5      
[16] reshape_0.8.9        purrr_1.0.2          fansi_1.0.4         
[19] scales_1.2.1         cli_3.6.1            rlang_1.1.1         
[22] crayon_1.5.2         ellipsis_0.3.2       munsell_0.5.0       
[25] base64enc_0.1-3      withr_2.5.0          repr_1.1.6          
[28] tools_4.3.0          uuid_1.1-0           colorspace_2.1-0    
[31] IRdisplay_1.1        vctrs_0.6.3          R6_2.5.1            
[34] lifecycle_1.0.3      stringr_1.5.0        pkgconfig_2.0.3     
[37] pillar_1.9.0         gtable_0.3.3         glue_1.6.2          
[40] Rcpp_1.0.10          tidyselect_1.2.0     rstudioapi_0.15.0   
[43] IRkernel_1.3.2       farver_2.1.1         htmltools_0.5.5     
[46] compiler_4.3.0       distributional_0.3.2

2 - Time series graphics#

2.1 tsibble objects#

[Time Series]

A time series can be thought of as a list of numbers (the measurements), along with some information about what times those numbers were recorded (the index). This information can be stored as a tsibble object in R.

[Tsibble]

  • three components

    • index: time information about the observation

    • one or more keys: optional unique identifiers for each series

    • one or more measured variables: number of interest

tsibble objects extend tidy data frames (tibble objects) by introducing temporal structure.

A tsibble allows storage and manipulation of multiple time series in R.

A tsibble allos multiple time series to be stored in a single object.

For observations that are more frequent than once per year, we need to use a time class function on the index.

  • yearmonth

  • start:end annual

  • yearquarter() quarterly

  • yearmonth() monthly

  • yearweek() weekly

  • as_date(), ymd() daily

  • as_datetime(), ymd_hms() sub daily

head(tsibbledata::global_economy)
A tbl_ts: 6 × 9
CountryCodeYearGDPGrowthCPIImportsExportsPopulation
<fct><fct><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
AfghanistanAFG1960 537777811NANA 7.024793 4.1322338996351
AfghanistanAFG1961 548888896NANA 8.097166 4.4534439166764
AfghanistanAFG1962 546666678NANA 9.349593 4.8780519345868
AfghanistanAFG1963 751111191NANA16.863910 9.1716019533954
AfghanistanAFG1964 800000044NANA18.055555 8.8888939731361
AfghanistanAFG19651006666638NANA21.41280311.2582799938414
head(tsibble::tourism)
A tbl_ts: 6 × 5
QuarterRegionStatePurposeTrips
<qtr><chr><chr><chr><dbl>
1998 Q1AdelaideSouth AustraliaBusiness135.0777
1998 Q2AdelaideSouth AustraliaBusiness109.9873
1998 Q3AdelaideSouth AustraliaBusiness166.0347
1998 Q4AdelaideSouth AustraliaBusiness127.1605
1999 Q1AdelaideSouth AustraliaBusiness137.4485
1999 Q2AdelaideSouth AustraliaBusiness199.9126
y <- tsibble(
  Year        = 2015:2019,
  Observation = c(123, 39, 78, 52, 110),
  index       = Year
)
y
A tbl_ts: 5 × 2
YearObservation
<int><dbl>
2015123
2016 39
2017 78
2018 52
2019110
z <- tibble(
  Month       = c('2019 Jan', '2019 Feb', '2019 Mar', '2019 Apr', '2019 May'),
  Observation = c(50, 23, 34, 30, 25)
)
z

z %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(index = Month)
A tibble: 5 × 2
MonthObservation
<chr><dbl>
2019 Jan50
2019 Feb23
2019 Mar34
2019 Apr30
2019 May25
A tbl_ts: 5 × 2
MonthObservation
<mth><dbl>
2019 Jan50
2019 Feb23
2019 Mar34
2019 Apr30
2019 May25
tsibbledata::olympic_running %>%
  head()

tsibbledata::olympic_running %>%
  distinct(Length, Sex) %>%
  head()
A tbl_ts: 6 × 4
YearLengthSexTime
<int><int><chr><dbl>
1896100men12.0
1900100men11.0
1904100men11.0
1908100men10.8
1912100men10.8
1916100men NA
A tibble: 6 × 2
LengthSex
<int><chr>
100men
100women
200men
200women
400men
400women
tsibbledata::PBS %>%
  head()

tsibbledata::PBS %>%
  filter(ATC2 == 'A10') %>%
  head()

tsibbledata::PBS %>%
  filter(ATC2 == 'A10') %>%
  select(Month, Concession, Type, Cost) %>%
  head()

tsibbledata::PBS %>%
  filter(ATC2 == 'A10') %>%
  select(Month, Concession, Type, Cost) %>%
  summarize(TotalC = sum(Cost)) %>%
  head()

tsibbledata::PBS %>%
  filter(ATC2 == 'A10') %>%
  select(Month, Concession, Type, Cost) %>%
  summarize(TotalC = sum(Cost)) %>%
  mutate(Cost = TotalC / 1e6) %>%
  head()

a10 <- tsibbledata::PBS %>%
  filter(ATC2 == 'A10') %>%
  select(Month, Concession, Type, Cost) %>%
  summarize(TotalC = sum(Cost)) %>%
  mutate(Cost = TotalC / 1e6)
A tbl_ts: 6 × 9
MonthConcessionTypeATC1ATC1_descATC2ATC2_descScriptsCost
<mth><chr><chr><chr><chr><chr><chr><dbl><dbl>
1991 JulConcessionalCo-paymentsAAlimentary tract and metabolismA01STOMATOLOGICAL PREPARATIONS1822867877
1991 AugConcessionalCo-paymentsAAlimentary tract and metabolismA01STOMATOLOGICAL PREPARATIONS1532757011
1991 SepConcessionalCo-paymentsAAlimentary tract and metabolismA01STOMATOLOGICAL PREPARATIONS1477555020
1991 OctConcessionalCo-paymentsAAlimentary tract and metabolismA01STOMATOLOGICAL PREPARATIONS1538057222
1991 NovConcessionalCo-paymentsAAlimentary tract and metabolismA01STOMATOLOGICAL PREPARATIONS1437152120
1991 DecConcessionalCo-paymentsAAlimentary tract and metabolismA01STOMATOLOGICAL PREPARATIONS1502854299
A tbl_ts: 6 × 9
MonthConcessionTypeATC1ATC1_descATC2ATC2_descScriptsCost
<mth><chr><chr><chr><chr><chr><chr><dbl><dbl>
1991 JulConcessionalCo-paymentsAAlimentary tract and metabolismA10ANTIDIABETIC THERAPY897332092878
1991 AugConcessionalCo-paymentsAAlimentary tract and metabolismA10ANTIDIABETIC THERAPY771011795733
1991 SepConcessionalCo-paymentsAAlimentary tract and metabolismA10ANTIDIABETIC THERAPY762551777231
1991 OctConcessionalCo-paymentsAAlimentary tract and metabolismA10ANTIDIABETIC THERAPY786811848507
1991 NovConcessionalCo-paymentsAAlimentary tract and metabolismA10ANTIDIABETIC THERAPY705541686458
1991 DecConcessionalCo-paymentsAAlimentary tract and metabolismA10ANTIDIABETIC THERAPY758141843079
A tbl_ts: 6 × 4
MonthConcessionTypeCost
<mth><chr><chr><dbl>
1991 JulConcessionalCo-payments2092878
1991 AugConcessionalCo-payments1795733
1991 SepConcessionalCo-payments1777231
1991 OctConcessionalCo-payments1848507
1991 NovConcessionalCo-payments1686458
1991 DecConcessionalCo-payments1843079
A tbl_ts: 6 × 2
MonthTotalC
<mth><dbl>
1991 Jul3526591
1991 Aug3180891
1991 Sep3252221
1991 Oct3611003
1991 Nov3565869
1991 Dec4306371
A tbl_ts: 6 × 3
MonthTotalCCost
<mth><dbl><dbl>
1991 Jul35265913.526591
1991 Aug31808913.180891
1991 Sep32522213.252221
1991 Oct36110033.611003
1991 Nov35658693.565869
1991 Dec43063714.306371
prison <- readr::read_csv('https://OTexts.com/fpp3/extrafiles/prison_population.csv')
prison <- prison %>%
  mutate(Quarter = yearquarter(Date)) %>%
  select(-Date) %>%
  as_tsibble(key = c(State, Gender, Legal, Indigenous), index = Quarter)
head(prison)
Rows: 3072 Columns: 6
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): State, Gender, Legal, Indigenous
dbl  (1): Count
date (1): Date
 Use `spec()` to retrieve the full column specification for this data.
 Specify the column types or set `show_col_types = FALSE` to quiet this message.
A tbl_ts: 6 × 6
StateGenderLegalIndigenousCountQuarter
<chr><chr><chr><chr><dbl><qtr>
ACTFemaleRemandedATSI02005 Q1
ACTFemaleRemandedATSI12005 Q2
ACTFemaleRemandedATSI02005 Q3
ACTFemaleRemandedATSI02005 Q4
ACTFemaleRemandedATSI12006 Q1
ACTFemaleRemandedATSI12006 Q2

2.2 Time plots#

melsyd_economy <- tsibbledata::ansett %>%
  filter(Airports == 'MEL-SYD', Class == 'Economy') %>%
  mutate(Passengers = Passengers / 100)

autoplot(melsyd_economy, Passengers) +
  labs(
    title    = "Ansett airlines economy class",
    subtitle = 'Melbourne-Sydney',
    y        = "Passengers ('000)"
  )
../../../_images/ae26f237baa40a1c027ee8a4173e0585e90d4507d818228e0493d0a5d881c55e.png
autoplot(a10, Cost) +
  labs(
    y     = '$ (millions)',
    title = 'Australian antidiabetic drug sales'
  )
../../../_images/d8cf5dfa5b8fa64b743d87a864cb05bab18ce415269415006a750fc8a61a3a8a.png

2.4 Seasonal plots#

a10 %>%
  gg_season(Cost, labels = 'both') +
    labs(
      y     = '$ (millions)',
      title = 'Seasonal plot: Antidiabetic drug sales'
    )
../../../_images/711cdfb5c62f5752d2946a4cb1a54a0257e95b87f5065b951e948a1968b3395b.png
tsibbledata::vic_elec %>%
  gg_season(Demand, period = 'day') +
    theme(legend.position = 'none') +
    labs(
      y     = 'MWh',
      title = 'Electricity demand: Victoria'
    )
../../../_images/ccf50dd88e4754128770b0b0c3f53a7179ffa60ffa11444635fe4bf702d90f5c.png
tsibbledata::vic_elec %>%
  gg_season(Demand, period = 'week') +
    theme(legend.position = 'none') +
    labs(
      y     = 'MWh',
      title = 'Electricity demand: Victoria'
    )
../../../_images/95d74dd41bb4b8d66a18b4859037b874487e5f33b82c685a4a8007d39391bbbb.png
tsibbledata::vic_elec %>%
  gg_season(Demand, period = 'year') +
    labs(
      y     = 'MWh',
      title = 'Electricity demand: Victoria'
    )
../../../_images/50a6f57c56677433ad035bd2c8b762e29bb992d706bc7afca17488c279294555.png

2.5 Seasonal subseries plots#

a10 %>%
  gg_subseries(Cost) +
    labs(
      y     = '$ (millions)',
      title = 'Australian antidiabetic drug sales'
    )
../../../_images/40961a773cdff6603931354ef2b3dcbb54af62bfe20d7a371700e24811dc5d10.png
holidays <- tsibble::tourism %>%
  filter(Purpose == 'Holiday') %>%
  group_by(State) %>%
  summarize(Trips = sum(Trips))
head(holidays)
A tbl_ts: 6 × 3
StateQuarterTrips
<chr><qtr><dbl>
ACT1998 Q1196.2186
ACT1998 Q2126.7706
ACT1998 Q3110.6796
ACT1998 Q4170.4722
ACT1999 Q1107.7792
ACT1999 Q2124.6442
autoplot(holidays, Trips) +
  labs(
    y     = "Overnight trips ('000)",
    title = 'Australian domestic holidays'
  )
../../../_images/469c214f424b9800b7559ffc58ca042dfffe03dd00e15286c8014cccf6113109.png
gg_season(holidays, Trips) +
  labs(
    y     = "Overnight trips ('000)",
    title = 'Australian domestic holidays'
  )
../../../_images/921fb2297970e35c980200710f837ade77fcdefffa0076aa5146f99b4e1f96c3.png
gg_subseries(holidays, Trips) +
  labs(
    y     = "Overnight trips ('000)",
    title = 'Australian domestic holidays'
  )
../../../_images/ca15bfbd74b55916e2944e4c6eb4d1359cd1e3330bd6d824df91b7a1e76ccb87.png

2.6 Scatterplots#

tsibbledata::vic_elec %>%
  filter(year(Time) == 2014) %>%
  autoplot(Demand) +
    labs(
      y     = 'GW',
      title = 'Half-hourly electricity demand: Victoria'
    )
../../../_images/d5f0cee810eae8c854883ccf787f2393e70f7bf32b68d66076bc9448a23b6818.png
tsibbledata::vic_elec %>%
  filter(year(Time) == 2014) %>%
  autoplot(Temperature) +
    labs(
      y     = 'Degrees Celsius',
      title = 'Half-hourly temperatures: Melbourne, Australia'
    )
../../../_images/5dea00cdf5cfa40f87a16e4a5d518653d00988604f8f176ac56973d5599036a2.png
tsibbledata::vic_elec %>%
  filter(year(Time) == 2014) %>%
  ggplot(aes(x = Temperature, y = Demand)) +
    geom_point() +
    labs(
      x = 'Temperature (degrees Celsius)',
      y = 'Electricity demand (GW)'
    )
../../../_images/043341e1ce68f74d6ccb11cfa490dd95f51a1ded1b1ab7851efeb155a1ac31c3.png
visitors <- tsibble::tourism %>%
  group_by(State) %>%
  summarize(Trips = sum(Trips))
visitors %>%
  ggplot(aes(x = Quarter, y = Trips)) +
    geom_line() +
    facet_grid(vars(State), scales = 'free_y') +
    labs(
      title = 'Australian domestic tourism',
      y     = "Overnight trips ('000)"
    )
../../../_images/914618ae5b4370c92b6c6205864624789a2fcdbe418a7dc9ad7a13182a4999b2.png
visitors %>%
  pivot_wider(values_from = Trips, names_from = State) %>%
  GGally::ggpairs(columns = 2:9)
../../../_images/cf2649991c2ddd556a8b835ebfadef8db7515434f4c89df1e7a2da5b4a53fa0e.png

2.7 Lag plots#

recent_production <- tsibbledata::aus_production %>%
  filter(year(Quarter) >= 2000)
recent_production %>%
  gg_lag(Beer, geom = 'point') +
    labs(
      x = 'lag(Beer, k)'
    )
../../../_images/131ac6558fdb5e16ae8de20915f45f69524b1ecb4a4353d426200f4c563ae4b2.png

2.8 Autocorrelation#

recent_production %>%
  ACF(Beer, lag_max = 9)
A tbl_cf: 9 × 2
lagacf
<cf_lag><dbl>
1Q-0.052981076
2Q-0.758175440
3Q-0.026233757
4Q 0.802204530
5Q-0.077471204
6Q-0.657451271
7Q 0.001194922
8Q 0.707254078
9Q-0.088756255
recent_production %>%
  ACF(Beer) %>%
  autoplot() +
    labs(
      title = 'Australian beer production'
    )
../../../_images/8381cb0335980a2b22610b87ea899085b94a5fd1725943b06e6e3c6747ee0e8e.png
a10 %>%
  ACF(Cost, lag_max = 48) %>%
  autoplot() +
    labs(
      title = 'Australian antidiabetic drug sales'
    )
../../../_images/a157e4a2a66bdc871936d6ef33c79e132d9a908170b6ac8b7b9284dc6b9e98fb.png

2.9 White noise#

set.seed(30)

y <- tsibble(
  sample = 1:50,
  wn     = rnorm(50),
  index  = sample
)
y %>%
  autoplot(wn) +
    labs(
      title = 'White noise',
      y     = ''
    )
../../../_images/95b7120cd3672ffd0147e6c8e1493ef07a0f7a06cf50c2cbe31a7e669d929b0e.png
y %>%
  ACF(wn) %>%
  autoplot() +
    labs(
      title = 'White noise'
    )
../../../_images/254c20dbffca51e92a790bdf036d639456631a1dae4efa8a9f88809e8db87b41.png

3 - Time series decomposition#

3.1 Transformations and adjustments#

tsibbledata::global_economy %>%
  filter(Country == 'Australia') %>%
  autoplot(GDP/Population) +
    labs(
      title = 'GDP per capita',
      y     = '$US'
    )
../../../_images/c8105e351a03281e9dcaaa047a09500d8156e89220d9a7efa489b62160ade1d6.png
print_retail <- tsibbledata::aus_retail %>%
  filter(Industry == 'Newspaper and book retailing') %>%
  group_by(Industry) %>%
  index_by(Year = year(Month)) %>%
  summarize(Turnover = sum(Turnover))
  
aus_economy <- tsibbledata::global_economy %>%
  filter(Code == 'AUS')
print_retail %>%
  left_join(aus_economy, by = 'Year') %>%
  mutate(Adjusted_turnover = Turnover / CPI * 100) %>%
  pivot_longer(c(Turnover, Adjusted_turnover), values_to = 'Turnover') %>%
  mutate(name = factor(name, levels = c('Turnover', 'Adjusted_turnover'))) %>%
  ggplot(aes(x = Year, y = Turnover)) +
    geom_line() +
    facet_grid(name ~ ., scales = 'free_y') +
    labs(
      title = 'Turnover: Australian print media industry',
      y     = '$AU'
    )
Warning message:
“Removed 1 row containing missing values (`geom_line()`).”
../../../_images/e563a6342f53ffd43beb2c912faf9ec2d7cb7239d7e2a8a742c3dd1415e5e380.png
lambda <- tsibbledata::aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
tsibbledata::aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(
    y = '',
    title = latex2exp::TeX(paste0('Transformed gas production with $\\lambda$ = ', round(lambda, 2)))
  )
../../../_images/a103eba2d7b8b3adf58d27c7cc81b87096e40a3fb5da11e7252e51847e578215.png

Bibliography#

Hyndman, Rob J. & George Athanasopoulos. (2021). Forecasting: Principles and Practice. 3rd Ed. OTexts. Home.