統計データ解析

第13講サンプルコード

Published

January 10, 2025

準備

以下で利用する共通パッケージを読み込む．

library(conflicted) # 関数名の衝突を警告
conflicts_prefer(   # 優先的に使う関数を指定
    dplyr::filter(),
    dplyr::select(),
    dplyr::lag(),
    )
library(tidyverse)
library(fable)
library(tsibble)
library(feasts)
library(gt)
library(patchwork)
#' macOSのための日本語表示の設定
if(Sys.info()["sysname"] == "Darwin") { # macOSか調べる
    theme_update(text = element_text(family = "HiraMaruProN-W4"))
    label_family <- "HiraMaruProN-W4"
} else {label_family <- NULL}

自己相関

問題

以下の問に答えなさい．

同じAR過程のモデルから生成した時系列の自己相関を比較しなさい．(前回の練習問題を利用すればよい)
MA過程についても同様な比較を行いなさい．
ARMA過程についても同様な比較を行いなさい．

解答例

前回作成した自作の関数 my_arma() を利用しても良いが，今回は関数 stats::arima.sim() を用いた方法を紹介する．

Tmax <- 500 # 時系列の長さ t=1,..,Tmax
K <- 4 # 表示する時系列の数 (4つを並べて比較する)
ts_ar <- ts(replicate(K, arima.sim(list(ar = c(0.67, 0.26)),
                                   n = Tmax)))
ts_ma <- ts(replicate(K, arima.sim(list(ma = c(0.44, -0.28)),
                                   n = Tmax)))
ts_arma <- ts(replicate(K, arima.sim(list(ar = c(0.8, -0.64),
                                          ma = c(-0.5)),
                                     n = Tmax)))

AR(2)モデルの自己相関を計算する．

ts_ar |> as_tsibble() |> ACF(value) |> autoplot()

ts_ar |> as_tsibble() |> ACF(value) |>
    autoplot() + # 格子状に並べるには facet を指定する
    facet_wrap(key ~ .) + # 適宜調整してくれる
    labs(title = "AR(2)")

MA(2)モデルの自己相関を計算する．

ts_ma |> as_tsibble() |> ACF(value) |>
    autoplot() + 
    facet_wrap(key ~ .) +
    labs(title = "MA(2)")

ARMA(2,1)モデルの自己相関を計算する．

ts_arma |> as_tsibble() |> ACF(value) |>
    autoplot() + 
    facet_wrap(key ~ .) +
    labs(title = "ARMA(2,1)")

同じホワイトノイズでAR,MA,ARMAを生成するには以下のようにすればよい．

epsilon <- rnorm(Tmax)
toy_ar <- arima.sim(list(ar = c(0.67, 0.26)),
                    n = Tmax,
                    innov = epsilon)
toy_ma <- arima.sim(list(ma = c(0.44, -0.28)),
                    n = Tmax,
                    innov = epsilon)
toy_arma <- arima.sim(list(ar = c(0.8, -0.64),
                           ma = c(-0.5)),
                      n = Tmax,
                      innov = epsilon)
ts(bind_cols(`1.AR`=toy_ar, `2.MA`=toy_ma, `3.ARMA`=toy_arma)) |> 
    as_tsibble() |> ACF(value) |> autoplot()

なお，facet はそのままだとアルファベット順に並べられるので，ここでは簡単に番号を付けて回避している．

AR/ARMAモデルの推定

問題

以下の問に答えなさい．

AR過程を生成し，関数 fable::AR() を用いて係数を推定しなさい．
ARMA過程を生成し，関数 fable::ARIMA() を用いて係数を推定しなさい．
推定結果の妥当性を残差の自己相関係数を調べることによって確認しなさい．

解答例

AR(2)過程に従うデータを生成する．

toy_ar <- arima.sim(model = list(ar = c(0.67, 0.26)),
                    n = 5000) |> # 時系列の長さも自由に変更せよ
    as_tsibble()

関数 fabletools::model() + fable::AR() を用いて推定を行う．

toy_ar_fit <- # 自動的に推定されたモデルを保存
    toy_ar |> model(AR(value)) 
toy_ar_fit |> report()   # 推定されたモデル

Series: value 
Model: AR(2) 

Coefficients:
     ar1    ar2
  0.6796  0.246

sigma^2 estimated as 0.9891
AIC = -8670.2   AICc = -8670.2  BIC = -8657.16

toy_ar_fit |> tidy()     # 推定された係数とその評価

# A tibble: 2 × 6
  .model    term  estimate std.error statistic  p.value
  <chr>     <chr>    <dbl>     <dbl>     <dbl>    <dbl>
1 AR(value) ar1      0.680    0.0137      49.6 0       
2 AR(value) ar2      0.246    0.0137      17.9 7.52e-70

toy_ar_fit |> accuracy() # 推定されたモデルのあてはまりの評価

# A tibble: 1 × 10
  .model    .type          ME  RMSE   MAE   MPE  MAPE  MASE RMSSE     ACF1
  <chr>     <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
1 AR(value) Training -0.00527 0.995 0.794 -3.79  279. 0.944 0.945 -0.00396

toy_ar_fit |> glance()   # 推定されたモデルの情報量規準など

# A tibble: 1 × 6
  .model    sigma2    AIC   AICc    BIC   dof
  <chr>      <dbl>  <dbl>  <dbl>  <dbl> <int>
1 AR(value)  0.989 -8670. -8670. -8657.  4998

ある程度長い系列であれば良い推定が得られる． n = 1000 などとして推定結果がどのように変わるか試してみよ．

ARMA(2,1)過程に従うデータを生成する．

toy_arma <- arima.sim(model = list(ar = c(0.67, 0.26), ma = c(-0.5)),
                      n = 2000) |> 
    as_tsibble()

関数 fabletools::model() + fable::ARIMA() を用いて推定を行う．

toy_arma_fit <- # 自動的に推定されたモデルを保存
    toy_arma |> model(
                    model0 = ARIMA(value ~ pdq(2,0,1)), # 正しいモデル
                    model1 = ARIMA(value ~ pdq(3,0,1)), # 間違ったモデル
                    model2 = ARIMA(value ~ pdq(2,0,2)), # 間違ったモデル
                    auto0  = ARIMA(value ~ pdq(d = 0)), # ARMAで自動推定(階差を取らない)
                    auto1  = ARIMA(value), # ARIMA全体で自動推定(モデルとしてはかなり冗長)
                    )
toy_arma_fit                             # 推定されたモデルの表示

# A mable: 1 x 5
          model0         model1         model2          auto0          auto1
         <model>        <model>        <model>        <model>        <model>
1 <ARIMA(2,0,1)> <ARIMA(3,0,1)> <ARIMA(2,0,2)> <ARIMA(2,0,1)> <ARIMA(2,0,1)>

toy_arma_fit |> glance()                 # 推定されたモデルの評価

# A tibble: 5 × 8
  .model sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
  <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
1 model0  0.989  -2825. 5659. 5659. 5681. <cpl [2]> <cpl [1]>
2 model1  0.989  -2825. 5661. 5661. 5689. <cpl [3]> <cpl [1]>
3 model2  0.989  -2825. 5661. 5661. 5689. <cpl [2]> <cpl [2]>
4 auto0   0.989  -2825. 5659. 5659. 5681. <cpl [2]> <cpl [1]>
5 auto1   0.989  -2825. 5659. 5659. 5681. <cpl [2]> <cpl [1]>

toy_arma_fit |> glance() |> arrange(AIC) # AIC順に並べる

# A tibble: 5 × 8
  .model sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
  <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
1 model0  0.989  -2825. 5659. 5659. 5681. <cpl [2]> <cpl [1]>
2 auto0   0.989  -2825. 5659. 5659. 5681. <cpl [2]> <cpl [1]>
3 auto1   0.989  -2825. 5659. 5659. 5681. <cpl [2]> <cpl [1]>
4 model1  0.989  -2825. 5661. 5661. 5689. <cpl [3]> <cpl [1]>
5 model2  0.989  -2825. 5661. 5661. 5689. <cpl [2]> <cpl [2]>

対数尤度 (log_lik) は大きい方が観測データへのあてはまりは良い． AIC は小さい方が良い予測となることが期待される．自動推定では必ずしも正しいモデルが推定される訳ではないことに注意する．特に短い時系列では推定が難しい場合が多い．生成する系列の長さを変えて実験してみよ．

残差を評価する．

toy_arma_fit |> select(model0) |> # 正しいモデル
    gg_tsresiduals()

toy_arma_fit |> select(auto1) |>  # 自動推定されたARMAモデル
    gg_tsresiduals()

残差は無相関になっていることが確認できる．

東京の気温の時系列モデル

問題

東京の気候データを用いて以下の問に答えなさい．

tw_data <- read_csv("data/tokyo_weather.csv")

気温のデータを tsibble クラスに変換しなさい．
気温のデータおよびその階差の性質を検討しなさい．
ARIMAモデルを作成しなさい．

解答例

データを整理する．

tw_data <- read_csv("data/tokyo_weather.csv")
tw_tsbl <- tw_data |>
    mutate(date = as.Date(paste(year, month, day, sep = "-"))) |>
    select(date, temp) |>
    as_tsibble(index = date) # date を時系列の index に指定

データの視覚化を行う．

tw_tsbl |>
    autoplot(temp, colour = "red") +
    labs(title = "Temperature in Tokyo",
         x = "date", y = "temperature")

tw_tsbl |> # 一部を切り出して視覚化する
    filter_index("2023-06-01" ~ "2023-07-31") |>
    autoplot(temp, colour = "red") +
    labs(title = "Temperature in Tokyo",
         x = "date", y = "temperature")

データの性質を確認する．

tw_tsbl |> ACF(temp) |> # 自己相関
    autoplot()          # 減衰が遅いので差分をとった方が良さそう

tw_tsbl |> 
    autoplot(difference(temp), colour = "orange")

tw_tsbl |> ACF(difference(temp)) |> # 階差系列の自己相関
    autoplot()

階差系列 (d=1) にARMAモデルをあてはめる．

tw_fit <- tw_tsbl |>
    model(ARIMA(temp ~ pdq(d = 1) + PDQ(D = 0)))
report(tw_fit)       # 推定されたモデルの仕様を表示

Series: temp 
Model: ARIMA(1,1,4) 

Coefficients:
         ar1      ma1      ma2     ma3     ma4
      0.9864  -1.1857  -0.1704  0.1426  0.2260
s.e.  0.0115   0.0521   0.0816  0.0774  0.0501

sigma^2 estimated as 3.676:  log likelihood=-751.59
AIC=1515.19   AICc=1515.42   BIC=1538.57

tw_fit |>            # 残差の評価
    gg_tsresiduals() # そこそこあてはまりは良さそう

tw_fit |>            # データとあてはめ値の比較
    augment() |>
    autoplot(temp) +
    geom_line(aes(y = .fitted), # y軸にあてはめ値を指定
              colour = "blue", alpha = 0.3) +
    labs(title = "Fitted by ARIMA model",
         x = "date", y = "temperature")

時系列の予測

問題

以下の問に答えなさい．

COVID-19のデータを用いて分析・予測を行いなさい． https://covid19.mhlw.go.jp/public/opendata/newly_confirmed_cases_daily.csv
datasets::AirPassengers データを用いて分析・予測を行いなさい．

解答例

厚生労働省のCOVID-19の感染者数データ

以下は解析事例で紹介した内容となる．

データを取得する．

cp_tbl <-
    read_csv("https://covid19.mhlw.go.jp/public/opendata/newly_confirmed_cases_daily.csv") |>
    select(1:2) |>
    set_names(c("date","patients")) |>
    mutate(date = as_date(date))

時系列データ(tsibbleクラス)へ変更する．

cp_tsbl <-
    cp_tbl |>
    as_tsibble(index = date)

データ(全国・全期間)を表示する．

cp_tsbl |>
    autoplot(patients, colour = "skyblue") +
    geom_col(fill = "skyblue") + # 塗り潰しを行う
    labs(title = "COVID-19 patients in Japan",
         x = "date",
         y = "number of patients")

第3波 (2020/9/15-2021/1/31) に着目する．

cp_3rd_train <- # 訓練データ
    cp_tsbl |>
    filter_index("2020-09-15" ~ "2020-11-30")
cp_3rd_test <-  # 試験データ
    cp_tsbl |>
    filter_index("2020-12-01" ~ "2021-01-31")

データを表示する．

bind_rows(cp_3rd_train,cp_3rd_test) |>
    autoplot(patients, colour = "skyblue") +
    geom_col(fill = "skyblue") + # 塗り潰しを行う
    labs(title = "COVID-19 patients in Japan",
         x = "date",
         y = "number of patients")

階差系列を描画する．

cp_3rd_train |>
    gg_tsdisplay(difference(patients),
                 plot_type = "partial")

7日周期の影響があることがわかる．

対数変換をした階差系列を描画する．

cp_3rd_train |>
    gg_tsdisplay(difference(log(patients)),
                 plot_type = "partial")

やはり7日周期の影響があることがわかる．

7日の周期(季節成分)での階差系列を描画する．

cp_3rd_train |>
    gg_tsdisplay(difference(difference(log(patients), lag = 7)),
                 lag_max = 21,
                 plot_type = "partial")

7日の相関は負になっているので，季節階差は取らなくても良さそう．

以下では対数変換した系列に対してARIMAモデルをあてはめる．

cp_3rd_arima <-
    cp_3rd_train |>
    model(ARIMA(log(patients) ~ PDQ(D = 0)))

推定結果を検討する．

cp_3rd_arima |> report()

Series: patients 
Model: ARIMA(1,1,1)(2,0,0)[7] 
Transformation: log(patients) 

Coefficients:
         ar1      ma1    sar1    sar2
      0.4493  -0.8309  0.3709  0.4232
s.e.  0.1635   0.0981  0.1212  0.1353

sigma^2 estimated as 0.03811:  log likelihood=15.04
AIC=-20.07   AICc=-19.21   BIC=-8.42

cp_3rd_arima |> accuracy()

# A tibble: 1 × 10
  .model                  .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
  <chr>                   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1 ARIMA(log(patients) ~ … Trai…  20.9  193.  111. 0.400  13.4 0.521 0.616 0.0355

cp_3rd_arima |> glance()

# A tibble: 1 × 8
  .model                      sigma2 log_lik   AIC  AICc   BIC ar_roots ma_roots
  <chr>                        <dbl>   <dbl> <dbl> <dbl> <dbl> <list>   <list>  
1 ARIMA(log(patients) ~ PDQ(… 0.0381    15.0 -20.1 -19.2 -8.42 <cpl>    <cpl>

データとあてはめ値を表示する．

cp_3rd_arima |>
    augment() |>
    autoplot(patients, colour = "skyblue") +
    geom_line(aes(y = .fitted), colour = "orange") +
    labs(title = "Fitted by ARIMA model",
         x = "date", y = "log(patients)")

残差の診断を行う．

cp_3rd_arima |> gg_tsresiduals()

残差の相関はだいぶ消えているので，そこそこ良いモデルといえそう．

後半を予測してみる．

cp_3rd_arima |> 
    forecast(h = nrow(cp_3rd_test)) |>
    autoplot(cp_3rd_train, level = 80) +
    autolayer(cp_3rd_test, .vars = patients, colour = "red") +
    labs(title = "Prediction by ARIMA model")

感染が収まるまでの増加傾向を上手く推定できていることがわかる．

第8波 (2022/11/01-2023/1/31) に着目する．

cp_8th_train <- # 訓練データ
    cp_tsbl |>
    filter_index("2022-11-01" ~ "2022-11-30")
cp_8th_test <-  # 試験データ
    cp_tsbl |>
    filter_index("2022-12-01" ~ "2023-01-31")

データの表示を行う．

bind_rows(cp_8th_train,cp_8th_test) |>
    autoplot(patients, colour = "skyblue") +
    geom_col(fill = "skyblue") + # 塗り潰しを行う
    labs(title = "COVID-19 patients in Japan",
         x = "date",
         y = "number of patients")

階差系列を描画する．

cp_8th_train |>
    gg_tsdisplay(difference(patients),
                 plot_type = "partial")

対数変換をした階差系列を描画する．

cp_8th_train |>
    gg_tsdisplay(difference(log(patients)),
                 plot_type = "partial")

7日の周期(季節成分)での階差系列を描画する．

cp_8th_train |>
    gg_tsdisplay(difference(difference(log(patients), lag = 7)),
                 lag_max = 21,
                 plot_type = "partial")

対数変換した系列に対してARIMAモデルをあてはめる．

cp_8th_arima <-
    cp_8th_train |>
    model(ARIMA(log(patients) ~ PDQ(D = 0)))

推定結果を検討する．

cp_8th_arima |> report()

Series: patients 
Model: ARIMA(0,1,1)(1,0,0)[7] 
Transformation: log(patients) 

Coefficients:
          ma1    sar1
      -0.8737  0.8784
s.e.   0.0687  0.0680

sigma^2 estimated as 0.04256:  log likelihood=0.19
AIC=5.62   AICc=6.58   BIC=9.72

cp_8th_arima |> accuracy()

# A tibble: 1 × 10
  .model               .type    ME   RMSE    MAE    MPE  MAPE  MASE RMSSE   ACF1
  <chr>                <chr> <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
1 ARIMA(log(patients)… Trai… 2081. 14315. 10219. -0.867  13.8 0.636 0.744 -0.147

cp_8th_arima |> glance()

# A tibble: 1 × 8
  .model                      sigma2 log_lik   AIC  AICc   BIC ar_roots ma_roots
  <chr>                        <dbl>   <dbl> <dbl> <dbl> <dbl> <list>   <list>  
1 ARIMA(log(patients) ~ PDQ(… 0.0426   0.192  5.62  6.58  9.72 <cpl>    <cpl>

データとあてはめ値を表示する．

cp_8th_arima |>
    augment() |>
    autoplot(patients, colour = "skyblue") +
    geom_line(aes(y = .fitted), colour = "orange") +
    labs(title = "Fitted by ARIMA model",
         x = "date", y = "log(patients)")

残差の診断を行う．

cp_8th_arima |> gg_tsresiduals()

後半を予測する．

cp_8th_arima |> 
    forecast(h = nrow(cp_8th_test)) |>
    autoplot(cp_8th_train, level = 80) +
    autolayer(cp_8th_test, .vars = patients, colour = "red") +
    labs(title = "Prediction by ARIMA model")

package::fable に実装されているいくつかの方法で予測してみる．

cp_8th_models <-
    cp_8th_train |>
    model( # 自動推定
        naive  = NAIVE(log(patients)),  # random walk model
        arima  = ARIMA(log(patients) ~ pdq() + PDQ(0,0,0)),
        sarima = ARIMA(log(patients) ~ pdq() + PDQ(D = 0)),
        ets    = ETS(log(patients)),    # exponential smoothing
        )
cp_8th_models |>
    forecast(h = nrow(cp_8th_test)) |>
    autoplot(cp_8th_train, level = NULL) +
    autolayer(cp_8th_test, .vars = patients, colour = "gray") +
    labs(title = "Prediction by various models")

AirPassengersデータの分析

データの時間に関する情報(月ごとのデータ)を表示する．

AirPassengers |> tsp()

[1] 1949.000 1960.917   12.000

データを表示する．

AirPassengers |> as_tsibble() |>
    autoplot(value) +
    labs(title = "AirPassengers",
         x = "Year", y = "Passengers/1000")

ほぼ線形のトレンドと増大する分散変動があることがわかる．

対数変換したデータを表示する．

AirPassengers |> log() |> as_tsibble() |>
    autoplot(value) +
    labs(title = "AirPassengers",
         x = "Year", y = "log(Passengers/1000)")

対数変換により分散変動が安定化していることがわかる．以下では対数変換したデータを扱う．

ap_tsbl <- AirPassengers |> as_tsibble()
ap_train <- ap_tsbl |> filter_index(~ "1958-12")   # 訓練データ
ap_test  <- ap_tsbl |> filter_index("1959-01" ~ .) # 試験データ(2年分)

まずトレンド(明らかな上昇傾向)について考察する．階差を取ることにより定常化できるか検討する．

ap_train |> 
    gg_tsdisplay(difference(log(value)),
                 plot_type = "partial")

lag=12(1年),24(2年)に強い自己相関(季節成分)がある．また lag=12(1年) に偏自己相関は残っている．

季節成分について考察するために， 12ヶ月で階差を取って同様に検討する．

ap_train |> 
    gg_tsdisplay(difference(difference(log(value), lag = 12)),
                 lag_max = 36,
                 plot_type = "partial")

lag=1,3,12(1年) に若干偏自己相関が残っている．また lag=36(3年) まで見ると上記以外の偏自己相関は誤差内といえる．

SARIMAモデルを作成する．

階差および季節成分の自己相関・偏自己相関から，階差系列については ARMA(pまたはq=1-3)，季節成分については ARMA(pまたはq=1-2) あたりを考える必要がありそう．いくつかのARIMAモデルの推定を行う．

ap_fit <-
    ap_train |>
    model( # 名前に特殊な文字を含む場合は``で括る
        `arima(0,1,2)(0,1,1)` = ARIMA(log(value) ~ pdq(0,1,2) + PDQ(0,1,1)),
        `arima d=1 D=1` = ARIMA(log(value) ~ pdq(d = 1) + PDQ(D = 1)),
        `arima auto` = ARIMA(log(value)),
        `ets` = ETS(log(value)),
        )

次数を指定したモデル order=pdq(0,1,2), seasonal=PDQ(0,1,1) を検討する．

ap_fit |> select(`arima(0,1,2)(0,1,1)`) |> report() # 推定されたモデルの概要

Series: value 
Model: ARIMA(0,1,2)(0,1,1)[12] 
Transformation: log(value) 

Coefficients:
          ma1      ma2     sma1
      -0.3425  -0.0224  -0.5406
s.e.   0.0978   0.0947   0.0877

sigma^2 estimated as 0.001445:  log likelihood=197.54
AIC=-387.07   AICc=-386.68   BIC=-376.38

ap_fit |> select(`arima(0,1,2)(0,1,1)`) |>
    gg_tsresiduals() # 残差の診断(モデルの診断)

ap_fit |> select(`arima(0,1,2)(0,1,1)`) |>
    augment() |> features(.innov, ljung_box, lag = 12) # Ljung-Box検定

# A tibble: 1 × 3
  .model              lb_stat lb_pvalue
  <chr>                 <dbl>     <dbl>
1 arima(0,1,2)(0,1,1)    6.19     0.906

残差の自己相関は小さいが残差の正規性は低い．残差に関する Ljung-Box 検定は

帰無仮説 : 残差は無作為
対立仮説 : 残差は無作為でない

であるので，p値が高い方が良い．

階差・季節成分の階差のみを指定したモデルを確認する．

ap_fit |> select(`arima d=1 D=1`) |> report()

Series: value 
Model: ARIMA(0,1,1)(0,1,1)[12] 
Transformation: log(value) 

Coefficients:
          ma1     sma1
      -0.3424  -0.5405
s.e.   0.1009   0.0877

sigma^2 estimated as 0.001432:  log likelihood=197.51
AIC=-389.02   AICc=-388.78   BIC=-381

ap_fit |> select(`arima d=1 D=1`) |>
    gg_tsresiduals()

ap_fit |> select(`arima d=1 D=1`) |>
    augment() |> features(.innov, ljung_box, lag = 12)

# A tibble: 1 × 3
  .model        lb_stat lb_pvalue
  <chr>           <dbl>     <dbl>
1 arima d=1 D=1    6.47     0.891

自動的に次数の選択を行ったモデルを確認する．

ap_fit |> select(`arima auto`) |> report()

Series: value 
Model: ARIMA(2,0,0)(0,1,1)[12] w/ drift 
Transformation: log(value) 

Coefficients:
         ar1     ar2     sma1  constant
      0.6159  0.2356  -0.5562    0.0179
s.e.  0.0944  0.0965   0.0898    0.0017

sigma^2 estimated as 0.001382:  log likelihood=201.77
AIC=-393.53   AICc=-392.95   BIC=-380.12

いずれにせよAIC最小のモデルは以下となる． order = pdq(0,1,1), seasonal = PDQ(0,1,1) 以降，このモデルを利用する．

予測値と信頼区間を描画する．

ap_fit |> select(`arima auto`) |>
    forecast(h = nrow(ap_test)) |>
    autoplot(ap_train) +
    autolayer(ap_test, value, colour = "purple") +
    labs(title = "Forecast from SARIMA model",
         x = "Year", y = "Passengers/1000")

時系列の分解(トレンド(level+slope) + 季節(12ヶ月周期) + ランダム)を行う ETS (exponential smoothing) モデルを確認する．

ap_fit |> select(`ets`) |> report()

Series: value 
Model: ETS(M,A,M) 
Transformation: log(value) 
  Smoothing parameters:
    alpha = 0.5480859 
    beta  = 0.0001017353 
    gamma = 0.0001002554 

  Initial states:
     l[0]       b[0]      s[0]     s[-1]     s[-2]    s[-3]   s[-4]    s[-5]
 4.798138 0.01002451 0.9812161 0.9595451 0.9858188 1.012039 1.03667 1.037824
    s[-6]     s[-7]     s[-8]   s[-9]    s[-10]    s[-11]
 1.020521 0.9969978 0.9984623 1.00612 0.9815133 0.9832722

  sigma^2:  0

      AIC      AICc       BIC 
-196.6814 -190.6814 -149.2941

ap_fit |> select(`ets`) |> gg_tsresiduals()

ap_fit |> select(`ets`) |> augment() |> features(.innov, ljung_box, lag = 12)

# A tibble: 1 × 3
  .model lb_stat lb_pvalue
  <chr>    <dbl>     <dbl>
1 ets       18.8    0.0941

残差に周期構造が残っているので帰無仮説が棄却される．

予測値と信頼区間を描画する．

ap_fit |> select(`ets`) |>
    forecast(h = nrow(ap_test)) |>
    autoplot(ap_train) +
    autolayer(ap_test, value, colour = "purple") +
    labs(title = "Forecast from ETS model",
         x = "Year", y = "Passengers/1000")

ARIMAとETSを比較する．

ap_train |>
    autoplot(value) +
    geom_line(aes(y = .fitted, colour = .model),
              data = ap_fit |> select(`arima auto`,`ets`) |> augment()) +
    labs(title = "ARIMA vs ETS (fitted)",
         x = "Year", y = "Passengers/1000")

ap_fit |> select(`arima auto`,`ets`) |>
    forecast(h = nrow(ap_test)) |>
    autoplot(ap_train, level = NULL) +
    autolayer(ap_test, value, colour = "violet") +
    labs(title = "ARIMA vs ETS (forecast)",
         x = "Year", y = "Passengers/1000")