{"id":1724,"date":"2024-02-23T23:30:00","date_gmt":"2024-02-24T04:30:00","guid":{"rendered":"http:\/\/aristotle2digital.blogwyrm.com\/?p=1724"},"modified":"2024-02-19T08:01:44","modified_gmt":"2024-02-19T13:01:44","slug":"time-series-3-exponential-smoothing-and-holt-winter","status":"publish","type":"post","link":"https:\/\/aristotle2digital.blogwyrm.com\/?p=1724","title":{"rendered":"Time Series 3 &#8211; Exponential Smoothing and Holt-Winter"},"content":{"rendered":"\n<p>In the last post, we examined the Holt-Winter scheme for tracking the level, trend, and seasonal variations in a time series in a sequential fashion with some synthetic data designed to illustrate the algorithm in as clean a way as possible.&nbsp; In this post, we\u2019ll try the Holt-Winter method against real world data for US housing sales and will set some of the context for why the method works by comparing it to a related technique called the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Moving_average\">moving average<\/a>.<\/p>\n\n\n\n<p>The data analyzed here were obtained from RedFin (<a href=\"https:\/\/www.redfin.com\/news\/data-center\/\">https:\/\/www.redfin.com\/news\/data-center\/<\/a>) but it isn\u2019t clear for how long RedFin will continue to make these data public as they list the data as being \u2018temporarily released\u2019.\u00a0 As a result, I\u2019ve linked the data file I\u2019ve used <a href=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/Redfin_home_sales.csv\">here<\/a>.<\/p>\n\n\n\n<p>We\u2019re going to approach these data in two ways.&nbsp; The first is by taking a historical look at the patterns in the data from the vantage point of hindsight on the entire span of home sales having been collected.&nbsp; In the second approach, we imagine what an agent working in the past thinks as the data come in one record at a time.&nbsp;<\/p>\n\n\n\n<p>The historical look starts with an overview of the number of homes sold in the time period starting 
in Feb 2012 and ending at May 2023.<\/p>\n\n\n\n\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"648\" height=\"360\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_overview.png\" alt=\"\" class=\"wp-image-1723\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_overview.png 648w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_overview-300x167.png 300w\" sizes=\"auto, (max-width: 648px) 100vw, 648px\" \/><\/figure><\/div>\n\n\n\n<p>These data show both seasonal and overall trend variations and so our expectation might be that Holt-Winter would do a good job but note two things: First, with the exception of the first pandemic year of 2020, each of the years shows the same pattern: sales are low in the winter months and strong in the summer ones. 
Second, the trend (most easily seen by focusing on the summer peak) shows four distinct regions: a) from 2012-2017 there is an overall upward trend, b) from 2017-2020 the trend is now downward with a much shallower slope, c) the start of the pandemic lockdowns in 2020 breaks the smoothness of the trend and then the trend again has a positive slope over 2020-2021, and d) the trend is strongly downward afterwards.&nbsp; These data exhibit a real-world richness that the contrived data used in the last post did not and they should prove a solid test for a time series analysis agent\/algorithm.<\/p>\n\n\n\n<p>Depending on how \u2018intelligent\u2019 we want our analysis agent to be, we could look at a variety of other factors to explain or inform these features.&nbsp; For our purposes, we\u2019ll content ourselves with looking at one other parameter, the median home sales price, mostly to satisfy our human curiosity.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"648\" height=\"360\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_median_prices_overview.png\" alt=\"\" class=\"wp-image-1722\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_median_prices_overview.png 648w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_median_prices_overview-300x167.png 300w\" sizes=\"auto, (max-width: 648px) 100vw, 648px\" \/><\/figure><\/div>\n\n\n\n<p>These data look much more orderly in their trend and seasonal variation over the time span from 2012-2020.&nbsp; Afterwards, there isn\u2019t a clear pattern in terms of trend and season.&nbsp;<\/p>\n\n\n\n<p>Our final historical analysis will be to try to understand the overall pattern of the data using a moving average defined as:<\/p>\n\n\n\n<p>\\[ {\\bar x}_{k,n} = \\frac{1}{n} 
\\sum_{i = k-(n-1)\/2}^{k+(n-1)\/2} x_i \\; . \\]<\/p>\n\n\n\n<p>The index $k$ specifies the point of the underlying data on which the average is centered and $n$ the number of points to be used in the moving average.&nbsp; Here $n$ is taken to be odd so that there are as many points before the $k$th one as there are after; this prevents the moving average from introducing a bias which shifts a peak in the average away from the place in the data where it occurs.&nbsp; In addition, there is an art in the selection of the value of $n$ between it being too small, thereby failing to smooth out unwanted fluctuations, and being too large, which smears out the desired patterns.&nbsp; For these data, $n = 5$.&nbsp; The resulting moving average (in solid black overlaying the original data in the red dashed line) is:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"648\" height=\"360\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_CMA.png\" alt=\"\" class=\"wp-image-1721\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_CMA.png 648w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_CMA-300x167.png 300w\" sizes=\"auto, (max-width: 648px) 100vw, 648px\" \/><\/figure><\/div>\n\n\n\n<p>Any agent using this technique would clearly be able to describe the data as having a period of one year with a peak in the middle and perhaps an overall upward trend from 2012 to 2022 but then a sharp decline afterwards.&nbsp; But two caveats are in order.&nbsp; First, and most important, the agent employing this technique to estimate a smoothed value on the $k$th time step must wait until $(n-1)\/2$ additional future points have come in.&nbsp; This requirement usually precludes being able to perform predictions in real time.&nbsp; 
The second is that the moving average is computationally burdensome when $n$ is large.<\/p>\n\n\n\n<p>By contrast, the Holt-Winter method can be used by an agent needing to analyze in real time and it is computationally clean.&nbsp; At the heart of the Holt-Winter algorithm is the notion of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Exponential_smoothing\">exponential smoothing<\/a> where the smoothed value at the $k$th step, $s_k$, is determined by the previous smoothed value $s_{k-1}$ and the current raw value $x_k$ according to<\/p>\n\n\n\n<p>\\[ s_k = \\alpha x_k + (1-\\alpha) s_{k-1} \\; . \\]<\/p>\n\n\n\n<p>Since $s_{k-1}$ was determined from a similar expression at the time point $k-1$, one can back substitute (taking $s_0 = x_0$) to eliminate all the smoothed values $s$ on the right-hand side in favor of the raw ones $x$ to get<\/p>\n\n\n\n<p>\\[ s_k = \\alpha x_k + \\alpha (1-\\alpha) x_{k-1} + \\alpha (1-\\alpha)^2 x_{k-2} + \\cdots + \\alpha (1-\\alpha)^{k-1} x_1 + (1-\\alpha)^k x_0 \\; . \\]<\/p>\n\n\n\n<p>This expression shows that the smoothed value $s_k$ is a weighted average of all the previous points, making it analogous to the <a href=\"https:\/\/aristotle2digital.blogwyrm.com\/?p=1700\">sequential averaging<\/a> discussed in a previous post, but the exponential weighting by $(1-\\alpha)^j$ of the point $j$ steps in the past makes the resulting sequence $s_k$ look more like the moving average.&nbsp; In some sense, the exponential smoothing straddles the sequential and moving averages, giving the computational convenience of the former while providing the latter\u2019s ability to follow variations and trends.<\/p>\n\n\n\n<p>How closely the exponentially smoothed sequence matches a given $n$-point moving average depends on the selection of the value of $\\alpha$.&nbsp; For example, with $\\alpha = 0.2$ the exponentially smoothed curve gives<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"648\" height=\"360\" 
src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_exp_0_2.png\" alt=\"\" class=\"wp-image-1719\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_exp_0_2.png 648w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_exp_0_2-300x167.png 300w\" sizes=\"auto, (max-width: 648px) 100vw, 648px\" \/><\/figure><\/div>\n\n\n\n<p>whereas $\\alpha = 0.4$ gives<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"648\" height=\"360\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_exp_0_4.png\" alt=\"\" class=\"wp-image-1720\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_exp_0_4.png 648w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_exp_0_4-300x167.png 300w\" sizes=\"auto, (max-width: 648px) 100vw, 648px\" \/><\/figure><\/div>\n\n\n\n<p>Of the two of these, the one with $\\alpha=0.4$ much more closely matches the 5-point moving average used above.&nbsp;<\/p>\n\n\n\n<p>The Holt-Winter approach uses three separate applications of exponential smoothing, hence the need for the three specified parameters $\\alpha$, $\\beta$, and $\\gamma$.&nbsp; Leslie Major presents a method for optimizing the selection of these three parameters in her video <em><a href=\"https:\/\/www.youtube.com\/watch?v=eQ9szvszg9E&amp;t=284s\">How to Holts Winters Method in Excel &amp; optimize Alpha, Beta &amp; Gamma<\/a><\/em>.&nbsp;<\/p>\n\n\n\n<p>We\u2019ll skip this step and simply use some values informed by the best practices that Major (and other YouTubers) note.<\/p>\n\n\n\n<p>The long-term predictions given by our real time agent 
are pretty good in the time span 2013-2018.&nbsp; For example, a 24-month prediction made in February 2013 looks like<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"648\" height=\"360\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_Feb_2013_long_forecast.png\" alt=\"\" class=\"wp-image-1718\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_Feb_2013_long_forecast.png 648w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_Feb_2013_long_forecast-300x167.png 300w\" sizes=\"auto, (max-width: 648px) 100vw, 648px\" \/><\/figure><\/div>\n\n\n\n<p>Likewise, a 24-month prediction in June 2017 looks like<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"648\" height=\"360\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_Jun_2017_long_forecast.png\" alt=\"\" class=\"wp-image-1717\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_Jun_2017_long_forecast.png 648w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_Jun_2017_long_forecast-300x167.png 300w\" sizes=\"auto, (max-width: 648px) 100vw, 648px\" \/><\/figure><\/div>\n\n\n\n<p>Both have good agreement with a few areas of over or under-estimation.&nbsp; The most egregious error is the significant overshoot in 2019 which is absent in the 12-month prediction made a year later.&nbsp;<\/p>\n\n\n\n<p>All told, the real time agent does an excellent job of predicting in the moment but it isn\u2019t perfect as is seen by how the one-month predictions falter when the pandemic 
hit.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"648\" height=\"360\" src=\"http:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_short_forecast.png\" alt=\"\" class=\"wp-image-1716\" srcset=\"https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_short_forecast.png 648w, https:\/\/aristotle2digital.blogwyrm.com\/wp-content\/uploads\/2024\/02\/A2D_02Feb_2024_RedFin_home_sales_and_short_forecast-300x167.png 300w\" sizes=\"auto, (max-width: 648px) 100vw, 648px\" \/><\/figure><\/div>\n","protected":false},"excerpt":{"rendered":"<p>In the last post, we examined the Holt-Winter scheme for tracking the level, trend, and seasonal variations in a time series in a sequential fashion with some synthetic data designed&#8230; <a class=\"read-more-button\" href=\"https:\/\/aristotle2digital.blogwyrm.com\/?p=1724\">Read more 
&gt;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1724","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/1724","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1724"}],"version-history":[{"count":3,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/1724\/revisions"}],"predecessor-version":[{"id":1729,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/1724\/revisions\/1729"}],"wp:attachment":[{"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1724"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1724"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aristotle2digital.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1724"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}