Forecasting Continued: Using Simulation to Create Prediction Intervals Around Holt-Winters

NOTE: In this post we’ll be implementing prediction intervals around the Holt-Winters forecast using Monte Carlo simulation on the error correction/state space model equations for Holt-Winters. Read this post first! I’d highly recommend following along in the spreadsheet for this post. And don’t forget to sign up for the newsletter to hear about future Analytics Made Skeezy tutorials.

“You lied to me, Alex,” said Victor. He spoke quietly but with an undeniable hint of anger.

My mind immediately went to the worst. He knew about my work with the feds. Victor knew he was getting screwed.

“What’d I lie about?” I asked thinking about which door I should make a run for if things got ugly. There were body guards posted outside the house, and I may look like Prefontain but I run like Phoebe Buffay. If things were about to go down, I was likely dead. Even if I did make if out of the house, I’d been blindfolded on the way here, so I had no idea which way to head for safety.

“The forecast you made for my Janaury meth demand. It was wrong!” he said.

He brought his voice up to a boom with the word “wrong,” and it echoed quickly throughout the empty living room of the McMansion where we sat. Victor’s appearence was unkempt, and his usual cool tone had been replaced with a frazzled edge.

Victor was feeling pressure; Agent Bestroff was up to something, and I could guarantee it didn’t have anything to do with meth demand in Flint michigan. This was about something else, but forecast accuracy was what he was going to take his frustration out on.

I decided to push back a little on him.

“Of course the forecast was wrong,” I said.

Victor raised an eyebrow, “What do you mean, ‘of course?’”

“The only thing you can ever guarantee with a forecast is that it will be wrong. That’s why forecasting is often called ‘organized ignorance.’ It’s tough to predict the future, especially things that might be impacted by unpredictable forces. Like drug addicts. They’re inherently unpredictable. As is the economy and the weather,” I said.

“How off was the forecast, exactly?” I added.

He shook his head and answered, “You told me I would sell 72.5 kilos. I only sold 68 kilos.”

I nodded, “Yeah, that’ll happen with a forecast.”

“Well then why bother? If I can’t trust it, what’s the point?” asked Victor.

“Hey, it’s better than nothing right? If you didn’t have a seasonally adjusted forecast, then you might just be planning off your gut instead,” I answered.

Victor sighed, “It’s not good enough. I want a worst case scenario to plan for that way I don’t have excess drugs sitting on shelves waiting for the cops to find.”

“Ah,” I answered, “Now a ‘worst case scenario’ is something we can do. We can create something called a prediction interval around the forecast. Similar to a confidence interval (but nitpickingly different!), we can put a 95% prediction interval around the forecast so that you’ll have a likely band of where the demand will fall.”

“And we can do that with the forecast we’ve already made?” asked Victor.

“Yes and no,” I answered, “We can do it with the Holt-Winters forecast and a bit of Monte Carlo simulation, but we’ll need to use some slightly different but equivalent level, trend, and seasonality update equations to make it happen.”

“What do you mean?” asked Victor.

“Well, there’s this alternative way of doing an exponential smoothing forecast, Holt-Winters included, called the error correction form. The numbers come out the same, but they’re written by forecasting from each period one step into the next, denoting the error between actual demand and that one-step forecast, and correcting the level, trend. and seasonality using that error.”

“And what does that method get us?” asked Victor.

“It will allow us to quantify the distribution of the error when doing a one-step forecast. We can use that distribution to simulate future forecast errors, and from those simulated errors we can actually back out an entire demand scenario. By generating a thousand or so demand scenarios based on random errors, we can come up with worst and best case scenarios.”

“OK,” said Victor, “So where do we start?”

He stood from the table where he sat and walked into the empty kitchen of the house. From an almost barren cabinet in the kitchen he pulled out a gigantic tub of kettlecorn and popped it open.

“You want any?” he asked.

“Uh, do I want any kettlecorn?” I asked, “No, Victor, I’m good…what is this house? Why was I blindfolded on the way here? And why does the place seem cleared out except for a random tin of kettlecorn?”

Victor permitted himself a brief chuckle between scowls.

“Recently I’ve had some security issues,” he said, “You wouldn’t know anything about that would you?”

“What kind of issues?” I asked.

“Forget it. My security problems always have a way of working themselves out,” he said and chomped down hard on a fistful of kettlecorn, “Let’s get back to the problem at hand.”

I was, for better or worse, growing used to the violent implications in Victor’s phrasing. I wished briefly that someone would just arrest the guy, so I could go back to my normal life. But then would they want me to testify? Oh lordy. That thought scared the mess out of me. He’d just whack me from prison or something.

I pushed that thought away and looked at Victor, “Can you pull up the spreadsheet we did the demand forecast in?”

He pulled out his laptop and threw the sheet open.

“OK, so if you’ll remember, last time we finished with this ‘HW Exp Smoothing’ sheet and a 12 month forecast,” I said, “What’d I’d like to do is make a new sheet in the workbook called ‘HW Error Correction’ in which we’ll paste row 1 from the smoothing sheet. We can clear out the MAPE value. We’re not going to need that. Also, let’s paste in the time series data from the previous tab and the initial level, trend, and seasonal values in columns D:F.

This gave us a sheet that resembled the one in the figure below:

“Now, to this sheet I’m going to add another couple of calculations in columns G and H. In G, let’s put the one-step ahead forecast from our initial values into January 2008,” I said, “And if you’ll remember from last time how to make such a forecast, for one period forward it’s just the previous level plus the previous trend times the last relevent seasonal adjustment which is F4 in our sheet.”

“So that’s just (D15+E15)*F4?” Victor asked.

“Yep, and we’re going to throw that forecast into G16,” I said, “and then in column H, we can calculate the one-step error between the forecast in G16 and the actual demand in C16.”

This gave the figure show below:

 

“OK, now how do we roll through time like we did using the previous equations?” asked Victor.

“We need to update the level, trend, and seasonality values and drag everything through all the months until we’re caught up,” I said, “So we’re going to use some update equations that are equivalent to the originals but rely on our one-step error calculation. First, we’ll update the level as just the previous level plus the previous trend plus the level smoothing factor times the one-step error divided by the appropriate seasonal factor to deseasonalize the error.”

I threw the calculation into the sheet as seen below:

Victor nodded, “So the higher the smoothing factor is, the more of the one-step error we adjust for when we update the level.”

“Exactly,” I said, “And the trend update works almost exactly the same way. It’s the previous trend plus the trend smoothing factor times the level smoothing factor times the deseasonalized error. You can think of it as the trend incorporating some of the error the level incorporated to adjust just the trend component.”

I put that calculation into the sheet as well:

“And note how those two values, 50.51 and 1.15 are the exact same values we got using the other update equations. All we’re doing is writing everything in a way that’s going to be handy to us later on,” I said.

Victor nodded, “So then how do we update the seasonality?” 

“It’s very similar to the trend update although kindof a mirror image. The seasonality update is just the previous relevant seasonality factor plus the seasonal smoothing factor times the error not incorporated into the level divided through by the previous level and trend to put it on a seasonal multiplier scale,” I said.

Victor looked slightly confused.

“To put it differently, it’s the seasonal factor from 12 months ago plus the seasonal smoothing factor times 1 minus the level smoothing factor times the error divided by the previous level and trend,” I said, “It’ll make more sense in the spreadsheet.”

And so I threw the calculation into the workbook as shown below:

“And now, since we’ve used absolute references on the smoothing parameters,” I added, “We can just drag values D16:H16 down through the last month of demand.”

Dragging the calculations down, the sheet filled.

“Here’s the cool part though,” I said, tapping column H on the screen, “We’ve now isolated this one-step ahead error column, and assuming our forecast is unbiased, we can calculate the standard deviation of the error, also called the standard error, and use that bell curve to simulate future errors.”

“And what is the standard deviation of our errors?” asked Victor.

“Well, up top in cell I1 let’s calculate the sum of the squared error (SSE) as =SUMPRODUCT(H16:H75,H16:H75) and then the standard error in this context is customarily calculated as the square root of the SSE divided by the number of data points minus the number of smoothing parameters. In our case that’s 60 minus 3,” I said and added the calculations to the top of the sheet:


 ”OK, so we’ve got a standard error that’s a little over 5, meaning 68% of one-step errors are within 5 and a quarter kilos of the forecast,” I said, “which, by the way, means that my January forecast error wasn’t all that bad.”

Victor nodded, “Fine, but I still didn’t know that.”

“True,” I said, “So what I’d like to now do is add the months in 2013 to the bottom of the sheet here and simulate future forecast error values by drawing them randomly from a normal distribution with mean 0 and standard error 5.24. To do that we use the formula =NORMINV(RAND(),0,K$1).”

I tossed the formula in cell H76 for January 2013 and dragged it down, yielding the sheet below:

“And now that we have our error, what can we do?” asked Victor.

“This is where the analytics gets pretty badass,” I said, “We can drag our calculations in D through G down through the future months just as if the demand had happened. And then using the error plus the one-step forecast, we can back out what the simulated demand is for that future month.”

I pulled the formulas in D:G down through row 87 and then backed out demand in C76 as G76+H76 and dragged that down as well, giving the sheet in the image below:

“All right!” I said, clapping my hands together, “That’s one possible future demand scenario.”

“And we need to find out what’s the worst case of these futures?” asked Victor.

“Yeah,” I nodded, “We need to generate a ton of these scenarios and discover how they spread out.”

“So how do we generate multiple scenarios?” asked Victor.

“Copy paste,” I said smiling, “Well, not quite, but sortof…lemme show you. Down a bit on the sheet on row 93 I’m going to write ‘Demand Scenarios 2013′ and then below that I’m going to paste the months of the year going across the columns. Then below that I’m going to copy 2013′s simulated demand, and I’m going to paste-special the values transposed into row 95.”

That ended up looking like this:

“And when you paste a scenario, the random numbers above update themselves,” said Victor.

“Exactly,” I answered.

“So you can just copy paste another scenario,” he said.

“Right, but honestly, that takes too long, so instead we’re going to record a macro to do it for us,” I said. (Some of you may need to show the developer tab on the ribbon in Excel to do this step.)

I navigated to the record macro button on the developer tab of the Excel ribbon. And pressing the record button, I named the macro and assigned it a shortcut key for fast access later:

Pressing OK, I did the following steps:

  1. Inserted a new, blank row 95
  2. Copied the 2013 simulated demand data in column C
  3. Did a paste-special values transposed onto my new blank row 95
  4. Pressed stop on the macro recording

“And now that we have our macro. We just need to press the shortcut key a lot to generate scenarios. Actually, you can just hold the buttons down and it’ll run through it on its own,” I said and pressed option + command + z on Victor’s Mac for a few minutes until I had just over 1000 scenarios.

Then on row 89, I calculated the 97.5th percentile of each of the simulated columns, on row 90, I pasted the transposed forecast values from the previous tab, and on row 91, I pulled the 2.5th percentile, giving me the following:

“So here we now have our forecast and upper and lower 95% interval bounds on it that we’ve found by simulating the forecast error,” I said, “So you can be quite confident that whatever demand you experience over the next year will fall inside these bounds.”

Victor smiled wide for the first time that day, “Great!”

“And we can highlight these three rows, select the area chart in Excel, and then format the bottom series with a white fill in order to get what folks in the forecasting business call a fan chart,” I said and inserted the chart for him.

“And note how the error compounds over time. Also, due to the multiplicative nature of the forecast, the absolute width of the interval increases in high seasonal demand periods,” I added.

“Wow, that’s nice,” said Victor.

It was my turn to smile, “Consider your ignorance about the futre quantified.”

Big Data-ish Tip

Credit goes to Hyndman for developing the state equations for Holt-Winters that make this whole simulation approach possible. As part of that work, he also derived closed-form calculations for the prediction intervals, but I’m extremely partial to this monte carlo approach because it’s so intuitive (and intuition is really what exponential smoothing is all about).

Hyndman wrote all the good forecasting packages in R, and I highly recommend checking out his blog as well as his (unfinished?) online textbook. Personally, I find Bowerman’s 2004 textbook to be the best for learning a lot of this stuff.

  • jaedee

    Great post, glad to see a new one.

    A small improvement and a (possible) correction:

    1. Instead of just running your simulations by holding the option+command for minutes (it does take a long time), you can record the macros, then add a loop, i.e.:

    Sub Autorun_My_Freaking_Macros()


    ‘ Keyboard Shortcut: Option+Cmd+z

    Dim i, j, k As Long

    i = 1
    j = 22
    For i = 1 To Range(“k2″)

    Calculate

    Calculate

    Next i
    End Sub

    Then just set k2 to the number of simulations (i.e. 1000). You can also just hard code the numer by replacing the Range(“k2″) with a number. You can reference your macros in the new one, or add the relevant code before and after.

    2. Correction (?): I believe you want to copy the demand in column C, not B. (See the step 2 where you “Copied the 2013 simulated demand data in column B”).

    • http://twitter.com/John4man John Foreman

      Thanks for catching the column B/C typo! Fixed it now I think.

      And you’re totally right that you can just loop the macro in VBA. I just didn’t want to force people to write code who were gun-shy.

  • Steve Wessels

    Great post (as usual) with many of my favorite methods: HW, Monte Carlo, prediction intervals and fan charts! I’ll keep an eye out for your book.

    One quick Excel tool that I’ve found useful for Monte Carlo type approaches is using the Data Table approach. Here’s one example: http://office.microsoft.com/en-us/excel-help/introduction-to-monte-carlo-simulation-HA001111893.aspx. I’m always looking for more techniques though. It’s likely faster in many situations to apply some VBA. Thanks!

  • Rob in Colorado Springs

    John, are you using R for some of your work?