Holt Winters Forecasting with CVG Temperature Data

Microsoft Office stock photo

Introduction

Data

The goal of this exercise is to use Holt-Winters forecasting, also known as triple exponential smoothing, to predict future monthly average temperatures in the Cincinnati area. The data was taken from the National Weather Service, and contains the monthly temperature averages in Fahrenheit from January 2000 through December 2022.

I used the excellent e-textbook Forecasting: Principles and Practice by Hyndman and Athanasopoulos to guide my methodology.

Holt-Winters

Exponential smoothing is a forecasting method that weights previous observations to produce its predicted values; as observations get older, their weights decay exponentially. The original (simple) method does not account for trend or seasonality, but it was extended by Holt, and later by Winters, to capture both, hence the name Holt-Winters.

Model parameters, as explained by Hyndman and Athanasopoulos:

“The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations — one for the level ℓt, one for the trend bt, and one for the seasonal component st, with corresponding smoothing parameters α, β∗ and γ.”

α, β, and γ control how heavily recent observations are weighted when updating the level (which absorbs noise/randomness), the trend, and the seasonal component, respectively.
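For reference, here are the equations for the additive method in the book's notation, where m is the seasonal period (12 for monthly data) and k is the integer part of (h − 1)/m; the bracketed terms are subscripts:

Forecast: ŷ[t+h|t] = ℓ[t] + h·b[t] + s[t+h−m(k+1)]
Level: ℓ[t] = α(y[t] − s[t−m]) + (1 − α)(ℓ[t−1] + b[t−1])
Trend: b[t] = β∗(ℓ[t] − ℓ[t−1]) + (1 − β∗)b[t−1]
Seasonal: s[t] = γ(y[t] − ℓ[t−1] − b[t−1]) + (1 − γ)s[t−m]

In words, the forecast projects the current level and trend forward h steps and adds the most recent seasonal estimate for the target month.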

Setup

Reshaping data

library(reshape2)
library(dplyr)
library(lubridate)
library(data.table)
library(readxl)
library(forecast)
library(ggplot2)
library(seasonal)

Loading temperature data:

temps <- read_excel("temps.xlsx", col_names = FALSE)
head(temps)
## # A tibble: 6 × 13
##   ...1  ...2  ...3   ...4  ...5  ...6  ...7  ...8  ...9  ...10 ...11 ...12 ...13
##   <chr> <chr> <chr>  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Year  Jan   Feb    Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec  
## 2 2000  27    37.5   45.9  51.4  64.9… 71.5… 72.4… 71.3  64.8  57.1  41    23.3 
## 3 2001  28.7  35.29… 38.1  56.8  63.6  71.0… 74.3  75.2  64.4… 55.7  49.6  38.4 
## 4 2002  35.6  35.9   41.3  54.6  58.7  73.5  77.9… 76.0… 70.7  53.5  41.4  32.4 
## 5 2003  22.6  26.9   43.2  54.9  60.8  67.3  73.4… 73.5… 64.4… 52.8  47.3  33.7…
## 6 2004  24.2  31.9   43.5  52.6  66.8  70.0… 73.5… 70.5… 68    55    46.2  32.6

We need to get this into time series format. Transposing and using reshape2 to melt the data frame:

#Transpose
temps_t <- transpose(temps)

#Fix column names
names(temps_t) <- temps_t[1,]
temps_t <- temps_t[-1,]
colnames(temps_t)[1] <- "Month"

#Melt
molten <- melt(temps_t, id.vars = c("Month"))
head(molten)
##   Month variable              value
## 1   Jan     2000                 27
## 2   Feb     2000               37.5
## 3   Mar     2000               45.9
## 4   Apr     2000               51.4
## 5   May     2000 64.900000000000006
## 6   Jun     2000 71.599999999999994

Next, converting to time series:

molten$date <- paste(molten$Month, molten$variable,  sep="-")

df <- molten %>% select(-c(Month,variable))
df$date <- my(df$date)
df$value <- as.numeric(df$value)

temps.ts <- ts(df$value, frequency = 12, start = c(2000, 1))

plot(temps.ts)
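As a quick sanity check (not strictly necessary), we can confirm the series has the expected structure: 276 monthly observations ending in December 2022.

#Optional sanity checks on the time series object
length(temps.ts)                      #276 observations (23 years x 12 months)
frequency(temps.ts)                   #12 (monthly)
window(temps.ts, start = c(2022, 1))  #the final year of observations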

Time series analysis

Time series decomposition

We can decompose the time series to extract specific elements such as noise, trend, and seasonality. The method used is X11 decomposition.

temps.ts %>% seas(x11="") -> fit
autoplot(fit) +
  ggtitle("X11 decomposition of average monthly Cincinnati temperatures")

Here, we can see no clear trend, strong seasonality, and a moderate amount of randomness.
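To put numbers on that impression, we can compute the strength-of-trend and strength-of-seasonality measures from Hyndman and Athanasopoulos. This is a rough sketch, assuming the forecast package's component extractors (trendcycle(), seasonal(), remainder()) behave as expected on the seas() fit; values near 0 indicate a weak component and values near 1 a strong one.

#Extract the X11 components
Tt <- trendcycle(fit)
St <- seasonal(fit)
Rt <- remainder(fit)

#Strength of trend and seasonality (0 = weak, 1 = strong)
max(0, 1 - var(Rt, na.rm = TRUE) / var(Tt + Rt, na.rm = TRUE))
max(0, 1 - var(Rt, na.rm = TRUE) / var(St + Rt, na.rm = TRUE))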

Holt-Winters implementation

Model selection

The ets() function from the forecast package will automatically select a model based on the corrected Akaike information criterion, or AICc.

fit <- ets(temps.ts)
summary(fit)
## ETS(A,N,A) 
## 
## Call:
##  ets(y = temps.ts) 
## 
##   Smoothing parameters:
##     alpha = 0.004 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 53.8914 
##     s = -19.0083 -9.7741 1.8205 14.0466 20.593 21.6879
##            18.3794 9.9589 -0.1465 -11.2616 -21.8917 -24.4044
## 
##   sigma:  3.6412
## 
##      AIC     AICc      BIC 
## 2280.224 2282.070 2334.530 
## 
## Training set error measures:
##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 0.115928 3.547672 2.727518 -0.6958386 6.442588 0.6933699
##                    ACF1
## Training set 0.09159075

From the output, we can see that ets() selected an ETS(A,N,A) model: additive errors, no trend component, and additive seasonality. Accordingly, only the α (level) and γ (seasonal) smoothing parameters are estimated; there is no β because the model has no trend term.

Per Hyndman and Athanasopoulos,

“The additive method is preferred when the seasonal variations are roughly constant through the series, while the multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series. With the additive method, the seasonal component is expressed in absolute terms in the scale of the observed series, and in the level equation the series is seasonally adjusted by subtracting the seasonal component. Within each year, the seasonal component will add up to approximately zero. With the multiplicative method, the seasonal component is expressed in relative terms (percentages), and the series is seasonally adjusted by dividing through by the seasonal component.”

The omission of the trend component (and hence β) lines up with the decomposition, which did not show any clear trend.
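As a quick cross-check (not part of the original model selection), we can fit an explicit additive Holt-Winters model with a trend term using forecast::hw() and compare AICc values; if the trendless ETS(A,N,A) model scores at least as well, that supports dropping β.

#Additive Holt-Winters with a trend term, for comparison
hw_fit <- hw(temps.ts, seasonal = "additive", h = 12)
hw_fit$model$aicc   #AICc of the trended Holt-Winters model
fit$aicc            #AICc of the ETS(A,N,A) model chosen by ets()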

Forecasting

Predicting 2023’s temperatures:

fit %>% forecast(h=12) %>%
  autoplot() + xlim(2020,2025) + ylab("Temperature (F)")

model <- fit %>% forecast(h=12)

Text output of forecast:

summary(model)
## 
## Forecast method: ETS(A,N,A)
## 
## Model Information:
## ETS(A,N,A) 
## 
## Call:
##  ets(y = temps.ts) 
## 
##   Smoothing parameters:
##     alpha = 0.004 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 53.8914 
##     s = -19.0083 -9.7741 1.8205 14.0466 20.593 21.6879
##            18.3794 9.9589 -0.1465 -11.2616 -21.8917 -24.4044
## 
##   sigma:  3.6412
## 
##      AIC     AICc      BIC 
## 2280.224 2282.070 2334.530 
## 
## Error measures:
##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 0.115928 3.547672 2.727518 -0.6958386 6.442588 0.6933699
##                    ACF1
## Training set 0.09159075
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2023       29.61491 24.94850 34.28133 22.47824 36.75158
## Feb 2023       32.12756 27.46111 36.79402 24.99084 39.26429
## Mar 2023       42.75761 38.09112 47.42410 35.62083 49.89439
## Apr 2023       53.87266 49.20613 58.53919 46.73582 61.00950
## May 2023       63.97825 59.31168 68.64481 56.84135 71.11514
## Jun 2023       72.39888 67.73228 77.06548 65.26193 79.53583
## Jul 2023       75.70724 71.04060 80.37388 68.57023 82.84425
## Aug 2023       74.61246 69.94578 79.27913 67.47539 81.74952
## Sep 2023       68.06591 63.39920 72.73262 60.92879 75.20303
## Oct 2023       55.84010 51.17335 60.50685 48.70292 62.97728
## Nov 2023       44.24518 39.57840 48.91197 37.10795 51.38242
## Dec 2023       35.01068 30.34386 39.67751 27.87339 42.14798

Let’s check back in a year and see the accuracy of 2023’s forecast!
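When the observed 2023 monthly averages are available, the forecast can be scored against them with accuracy() from the forecast package. The actuals_2023 values below are placeholders to be filled in with the real numbers once they are published.

#Placeholder: replace the NAs with the twelve observed 2023 monthly averages
actuals_2023 <- ts(rep(NA_real_, 12), frequency = 12, start = c(2023, 1))

#Out-of-sample error measures (RMSE, MAE, etc.) appear in the "Test set" row
accuracy(model, actuals_2023)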