Forecasting Made Skeezy: Projecting Meth Demand using Exponential Smoothing

NOTE: I’d highly recommend following along in the spreadsheet for this post. And don’t forget to sign up for the newsletter to hear about future Analytics Made Skeezy tutorials.

It’s been three days since I last saw Victor, and I’ve been lying low ever since. I know that if he wanted me dead, I’d be dead already. But I can’t help it. I’m freaked. Screw the DEA.

It was Tuesday night when Andre whisked me away to the freaking Marietta Dave & Busters. I couldn’t believe it. Victor was just sitting there at a table in the midst of the arcade games. Around him kids and teens ran from game to game. The din and the lights were a constant aggravation, but there he sat in the middle of it, unbothered, like some drug-dealing Buddhist monk with a laptop. I nearly burst into laughter at the sight of him, but his expression made me rethink my levity.

I took a seat next to him and immediately felt something hard and cold press against my leg. I went weak in the knees. I knew I shouldn’t have cooperated with the DEA and installed that bug on Victor’s laptop. Now I was good and dead.

Victor just smiled at me. His face betrayed none of the situation below the table.

“Tell me, Alex,” he said, “Have you been talking to anyone about our little tutorials?”

He ran a hand along his laptop, and for the first time I noticed that it wasn’t the same laptop as before. It was a small MacBook Air. And all the USB ports had little rubber nubs in them to prevent drives from being inserted. That last bit had me really and truly panicked.

“N-No,” I stammered, “Why would I do that? I’m not exactly innocent myself.”

He stared at me for a time. Suddenly, a skee-ball machine blared, and I jumped out of my chair. I thought Victor had pulled the trigger.

Victor just burst into laughter and tucked the gun back into a holster on his ankle.

“You sure are a jumpy one, Alex,” he said with a grin.

My heart was beating like a machine gun, and I felt nauseous. Victor waved me back into a seat.

“I’m not used to this,” I said, “I-I’m not familiar with this sort of world.”

Victor just kept smiling.

“And keep it that way,” he said, “I’m not interested in you learning this world. I’m interested in learning the techniques from yours. Then I pay you. Then you leave and say nothing. Understood?”

“Understood,” I said, half relieved that I wasn’t dead, half despondent that I was dealing with Victor instead of playing games like the kids around me. I was rethinking all my bad choices in a big way.

“You want something to eat before we get started?” asked Victor, “Chicken fingers, potato skins, and other vile stuff?”

I forced a smile, “I’m all set. Why don’t we just dive in?”

Victor slapped me on the leg, “Good, my boy. That’s what I like to hear. Especially today, because today I have all sorts of troubles.”

“What’s going on?” I asked.

“My meth supplier wants me to submit orders for the entire next year by month. In the past, I could just order everything the month before, but he says the feds are making life hard for him, so he needs more time to plan,” said Victor, “That means that I have to forecast demand for each city I sell in for all of 2013 at the monthly level.”

“OK, well, that’s not so bad,” I said, “What have you tried so far?”

Victor opened his computer and pushed the screen my way.

“Here are the demand numbers in kilos for Flint, Michigan for the last five years,” he said.

“So I just graphed it, fitted a line to it, and I’m going to use that trend line for projections,” he said and showed me the graph.

I nodded, “OK, and that’d work fine if you were just doing a yearly projection for total demand, but your supplier wants demand projections by month, right?”

“Exactly,” said Victor, “And I know that line is going to be terrible for that. I already know that demand peaks in the summer and lags in the winter in Flint, but that line isn’t going to account for it. Hell, I was thinking about just sketching out the shape of the next year. Just using my eyeballs to do the projection.”

I laughed, “Well, you wouldn’t be wrong in noting that visual inspection is one of the best ways to identify unknown patterns, but I think in this case we can do better than that. Plus, I’m guessing Flint isn’t the only city you’re going to have to do these projections for.”

“No,” said Victor, “and in fact, it’s not the only product either. I hear that some of my other suppliers are moving in this direction.”

“Well, in that case, I’m going to show you a really simple forecasting technique called Triple Exponential Smoothing or Holt-Winters Exponential Smoothing that is perfect for this problem,” I said.

I created a new tab in Victor’s Excel workbook called “HW Exp Smoothing,” and leaving the first couple of rows blank, I pasted the demand data in on row 3. I also added a fake year before 2008 called “initial” so that my data ended up looking like this:

“Now, let me explain a little bit about how Triple Exponential Smoothing works. The ‘Triple’ refers to the fact that we’re going to be more or less splitting this forecast into three components. We have seasonal adjustment factors which scoot the forecast up and down by month based on the historical data. So we’d expect the December factor to bump the forecast down and the July factor to bump it up. Then we have the trend component which helps us project out without considering the seasonal piece. This captures whether your demand is generally growing or shrinking. We also have what’s called the level which you can think of as the constant part of the forecast once we strip out a trend and seasonal variation. It’s a baseline.”

“Ok, so how do we start to break those out?” asked Victor.

“Well, I’m going to start in our fictional time just before the beginning of 2008, and the first thing I’m going to do is initialize the level as the demand from January 2008,” I said, “I’m also going to initialize the trend as the average of monthly increases in demand using same-month pairs from 2008 and 2009 to avoid seasonal wonkiness. So for instance, demand increased by .43 kg a month if I compare January 2009 with January 2008 (that’s the year-over-year jump divided by the 12 months in between). I’m going to average those across all 12 pairs in ’08 and ’09.”

I added columns to the sheet for level and trend and used this array formula to initialize the trend:

When I’d set the two initial values, the spreadsheet now looked like this:
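
Spelled out away from the spreadsheet, those two starting values amount to the following. This is only a rough Python sketch of the same logic: the demand list and the function name are stand-ins, with demand holding the 60 monthly kilo figures from the sheet, January 2008 first.

    def initialize_level_and_trend(demand):
        """Starting level and trend, following the logic described above.

        demand: a list of monthly kilos, January 2008 first.
        """
        # The level starts out as the first observation, January 2008.
        level = demand[0]
        # The trend starts as the average of the twelve same-month changes
        # between 2008 and 2009, each spread over the twelve months between them.
        monthly_changes = [(demand[m + 12] - demand[m]) / 12 for m in range(12)]
        trend = sum(monthly_changes) / 12
        return level, trend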

“But what about the seasonal adjustment?” asked Victor.

“Well, for that we actually need 12 initial values. One for each month we’re adjusting. And this is where things get slightly more funky. The first thing I’m going to do is take the average monthly demand for each year in your data and calculate how each month in that year deviates from the average. For instance, January 2008 here had 81% of the average monthly demand for 2008,” I said.

“Ah,” said Victor, “If January is 81% of the average monthly demand, then I could use that value to adjust.”

“Yes,” I said, smiling, “But we’ve got five years worth of data, so we’re going to average these values across all five years to get our adjustment factors.”

I added an “Average Monthly Demand for Year” column into the sheet and calculated the value for each year. In the next column, I then added the month-by-month variations from that average monthly demand.

I then added a column for my seasonal adjustment factors where I initialized them as averages of the values from the previous column, using an AVERAGEIF() to make sure I only averaged the correct months.
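
Again as a rough Python sketch of the same logic rather than the exact AVERAGEIF() setup, the initial seasonal factors fall out of the history like so, with demand the same stand-in list of monthly kilos as before:

    def initialize_seasonal_factors(demand, period=12):
        """Initial seasonal factors: each month's share of its year's average
        monthly demand, averaged across every year of history."""
        n_years = len(demand) // period
        factors = []
        for month in range(period):
            ratios = []
            for year in range(n_years):
                year_demand = demand[year * period:(year + 1) * period]
                year_average = sum(year_demand) / period
                # e.g. January 2008 came out to 81% of 2008's monthly average.
                ratios.append(year_demand[month] / year_average)
            factors.append(sum(ratios) / n_years)
        return factors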

I clapped my hands together and rubbed them like I was heating them over a fire, “Now, we’ve got a starting place for these factors.”

“But what do we do with them?” asked Victor.

“Well, we’re going to roll them over the entire horizon of data here, refining them as each month ticks by. We’re going to take some percentage of the factor from the previous month combined with some percentage of a factor calculated from the data at the current month until we reach the very end of our data. Then we’ll have our final estimates for level, trend, and seasonal values,” I said.

Victor looked confused, “What do you mean you’re going to take some percentage of the value from the previous period?”

“Good question,” I said, “I’ll back up a bit. This is where this technique gets the term ‘smoothing’ from. I’m going to introduce three arbitrary terms called smoothing factors. One for level, another for trend, and a third for seasonal. And initially I’m just going to set them at 50%, so as we roll over the data, we’re going to take 50% of the previous estimate for level and 50% of the estimate dictated by the current month’s data, yeah?”

I added in the smoothing factors at the top of the sheet.

“So to see how this works,” I said, “let me go ahead and roll one month down into January 2008. The first thing we’re going to do is set the January 2008 value of the level. Since our level smoothing factor is 50%, we’re going to take 50% of the previous level value plus one month of trend, and we’re going to combine it with 50% of the current month’s demand adjusted using the most recent seasonal factor we have for January.”

I added the value to the sheet:

“As for trend, that’s just 50% of the previous trend value and 50% of the difference between the current level value and the previous one,” I said and added the value to the sheet.

“And the new seasonal variation for January is just 50% of the previous seasonal variation and 50% of January 2008’s demand over the current level,” I continued.

“And here’s the cool part,” I said, “Now that we have these first smoothed values set, we just copy them all the way down the sheet.”
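
Copying the formulas down the sheet is the spreadsheet version of a loop. Sketched in Python with the stand-in helpers above, one pass over the whole history looks roughly like this:

    def smooth(demand, level, trend, seasonals, alpha=0.5, beta=0.5, gamma=0.5):
        """Roll the level, trend, and seasonal estimates across the history.

        alpha, beta, and gamma are the level, trend, and seasonal smoothing
        factors -- 50% each to start, just like on the sheet.
        """
        seasonals = list(seasonals)          # copied so we can update as we go
        for t, y in enumerate(demand):
            s = seasonals[t % 12]            # most recent factor for this month
            last_level = level
            # New level: blend the deseasonalized demand with where the old
            # level plus one month of trend said we'd land.
            level = alpha * (y / s) + (1 - alpha) * (last_level + trend)
            # New trend: blend the latest change in level with the old trend.
            trend = beta * (level - last_level) + (1 - beta) * trend
            # New seasonal factor: blend demand-over-level with the old factor.
            seasonals[t % 12] = gamma * (y / level) + (1 - gamma) * s
        return level, trend, seasonals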

“But what do we do with them?” asked Victor.

I smiled, “Now we’re ready to project into the future. And it’s super easy.”

I added some rows on the sheet for 2013.

“Let’s take June 2013 for example,” I said, “We can forecast that as the final estimate we have for the level plus six months of trend, since our data ends at December 2012. Then we take that value and adjust it with June’s seasonal factor. Bam, we have a forecast.”
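
In the same sketchy Python terms, projecting past the end of the history is a one-liner: the last level, plus the right number of months of trend, scaled by that calendar month's seasonal factor.

    def forecast(level, trend, seasonals, months_ahead):
        """Project months_ahead months past the end of the history.

        The month index below assumes the history ends in December, so one
        month ahead is January, six months ahead is June, and so on.
        """
        return (level + months_ahead * trend) * seasonals[(months_ahead - 1) % 12]

    # For example, June 2013 is six months past December 2012:
    # june_2013 = forecast(level, trend, seasonals, months_ahead=6)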

Victor looked pleased, “And that’s it? I can use these numbers as my forecast?”

I shrugged, “You could. They wouldn’t be terrible. But that 50/50 split we’ve been doing with the smoothing factors, we really should change that.”

“To what?” asked Victor.

“Well, honestly, I don’t know. How much do you let past data influence you, and how much does your current period’s data matter? That varies from forecast to forecast. But there is a way we can have the computer figure it out for us,” I said.

“How’s that?” asked Victor.

“Well, we can forecast 2012 as if we were sitting at the end of 2011, compare the forecast with actuals, and find the smoothing values which do the best. In this way, we’re training the model like you would any machine learning model,” I said.

“Ah, OK, let’s do it,” he said.

I forecasted 2012 (below in yellow), calculated the absolute percentage error (APE) between the forecast and the actuals for each month, and then averaged the APEs at the top of the sheet to get a Mean Absolute Percentage Error (MAPE).
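
As a formula, the error measure is nothing fancy; here is the same calculation as a small Python sketch.

    def mape(forecasts, actuals):
        """Mean Absolute Percentage Error: the average of |forecast - actual| / actual."""
        errors = [abs(f - a) / a for f, a in zip(forecasts, actuals)]
        return sum(errors) / len(errors)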

“Now, all I have to do is find the values of my smoothing factors between 0 and 1 that minimize my MAPE,” I said.

“So you use Solver?” asked Victor.

I laughed, “You’re catching on to my love of Solver, Victor. Absolutely.”

I opened up Solver and set up the problem, “We’re going to minimize the MAPE where our decisions are to change the three smoothing factors subject to keeping them between 0 and 1. And this time let’s select the evolutionary solver from the menu since this problem is so nonlinear and ugly.”
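
Outside of Excel, you could hand the same search to any evolutionary optimizer. Here is one way it might look in Python, with SciPy's differential_evolution standing in for Solver's evolutionary engine; it leans on the stand-in helpers sketched earlier, and the 48/12 split assumes the history runs January 2008 through December 2012.

    from scipy.optimize import differential_evolution

    def make_holdout_objective(demand):
        """Build the objective for the smoothing-factor search: fit on
        2008-2011, forecast 2012, and score against the 2012 actuals."""
        train, holdout = demand[:48], demand[48:]    # 2008-2011 vs. 2012

        def holdout_mape(params):
            alpha, beta, gamma = params
            level, trend = initialize_level_and_trend(train)
            seasonals = initialize_seasonal_factors(train)
            level, trend, seasonals = smooth(train, level, trend, seasonals,
                                             alpha, beta, gamma)
            forecasts = [forecast(level, trend, seasonals, h) for h in range(1, 13)]
            return mape(forecasts, holdout)

        return holdout_mape

    # Search the unit cube for the alpha, beta, gamma that minimize holdout MAPE:
    # result = differential_evolution(make_holdout_objective(demand), bounds=[(0, 1)] * 3)
    # best_alpha, best_beta, best_gamma = result.x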

I pressed solve and let the thing roll. The MAPE dropped over various trial solutions until finally it stopped at just 2.6%.

“2.6%. That’s a 10x improvement on the MAPE we had before Solver ran,” I said.

“Awesome,” said Victor, “And now I have a final forecast?”

“Yes,” I said, “you’re done. Let’s graph it real quick.”

I popped the values into a chart so we could look at them.

“Looks great!” said Victor, and then he paused, “but wait, what about the price adjustments we talked about putting in place in Flint?”

“Ah,” I said, “You’re right! We haven’t accounted for the fact that you moved your pricing strategy from an average of 5% less than your competitors to 8% more than your competitors. That means this historical data is going to over-estimate demand at your newly raised prices. We’ll need to shift it down.”

“But by how much?” asked Victor.

“Well, if you’ll remember, we found in that previous conversation that for every 1% you increased prices over the competition, you lost .42 kilograms of demand for the month,” I said and added a “Price Sensitive Forecast” column to the sheet. I took the existing forecast and subtracted from it the 5.46 kg that our demand model anticipated forfeiting: a 13-point swing, from 5% below the competition to 8% above, at .42 kg per point.
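
The adjustment itself is just the elasticity figure from that earlier conversation times the size of the pricing swing. A quick sketch of the arithmetic:

    # 0.42 kg of monthly demand lost for each point priced above the competition,
    # per the elasticity estimate from the earlier conversation.
    kg_lost_per_point = 0.42
    old_gap, new_gap = -5, 8        # percent versus competitors, before and after
    adjustment = (new_gap - old_gap) * kg_lost_per_point    # 13 points * 0.42 = 5.46 kg

    # price_sensitive_forecast = [f - adjustment for f in forecast_2013]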

“There,” I said, “Done.”

Victor clapped me on the back and shook my hand before sliding me a massive wad of bills.

“Don’t spend it all on one video game, OK?” he said, smiling.

“And remember what we talked about,” he added with a twinkle in his eye, “If you talk about these conversations with anyone, you’ll need to adjust your forecasted lifespan a great deal lower.”

I nodded and stammered out a reply, but Victor had already turned away to flag down a server.

“I want more of those Jalapeno poppers,” he muttered to himself.

Big Data-ish Tip

A lot of software packages these days come with Holt-Winters built in. It’s an old method but extremely accurate for how simple it is, oftentimes beating ARIMA models in actual business settings. I wouldn’t try anything more complex than HW until I’d given it a whirl first as a baseline for comparison.
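
In Python, for instance, statsmodels ships a Holt-Winters implementation. A minimal sketch, assuming demand is a pandas Series of monthly values like the Flint history above:

    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    def holt_winters_next_year(demand):
        """demand: a pandas Series of monthly values, like the Flint history."""
        # Additive trend, multiplicative monthly seasonality, as in the post.
        model = ExponentialSmoothing(demand, trend="add", seasonal="mul",
                                     seasonal_periods=12)
        fit = model.fit()            # the smoothing factors get chosen for you
        return fit.forecast(12)      # the next twelve months past the history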

For more reading on forecasting, I’d recommend anything by Rob Hyndman.

The price-sensitive forecasting at the end of this post is simple, but believe it or not, it’s not too far from how many Fortune 500 companies combine price elasticity models and forecasts to create price optimization models.

This post, as written, is decidedly “small data.” Indeed, we don’t get into big data until we start thinking about someone like Amazon where you may need to forecast demand across zillions of SKUs. Then you’ve got a nightmare on your hands.

Luckily, that problem is highly parallelizable. And one of the reasons why I prefer exponential smoothing to something more exotic (like machine learning with exogenous variables included) is that if you’ve got this stuff running in a highly automated system across scads of products, you’re not necessarily going to notice when a forecast gets screwed up. I prefer to avoid methods that need constant babysitting and retraining. For me, Holt-Winters strikes the right balance of simplicity and effectiveness that many businesses want when they talk about “set it and forget it.”

  • http://magicbeanlab.com mediaczar

    Fascinating post – as ever. The intricacies of underworld analytics are gripping.

    I came across a single hurdle — which I’m sure I could work around were I cleverer — the OS X version of Excel doesn’t handle array formulae. I ended up trying this in R with satisfying results — but results that were different from those of your protagonist.

    I’d really like to follow along with the spreadsheets (it would make me a stronger Excel user for one thing, for another I’d get a better understanding of what’s happening), but the differences between the OS X/Windows software make this tricky.

    This isn’t so much a gripe (and if it were, it wouldn’t be with your post) as a cry for help.

    • http://twitter.com/John4man John Foreman

      Interestingly enough, I’ve created all these posts using Excel for Mac 2011 which does support array formula calculations. Now, perhaps some of the older versions of Excel on Mac didn’t support array formulas? Hmmm…. mind telling me what version you’re running?

      • http://magicbeanlab.com mediaczar

        Aaah. Thank you. Just saved me £££s.

  • nicolahery

    Great post and blog! A couple questions came to mind:

    - What if “Day of Week” is also a seasonal factor in my data (and I’m forecasting at the daily level)? Could we imagine a “Quadruple Exponential Smoothing”? Would it work the same?

    - The data in this example has five years of data (60 data points), and a lot of “academic” examples are similar: nice data to start off with. But often in the “real world” we’re asked to forecast stuff with little (to no!) data. How would you handle it? Switch to simpler models (like Moving Average or other)? Try to find similar things with more data and use that?

    Thanks. Can’t wait for the next post!

    Cheers,
    Nicolas

    • http://twitter.com/John4man John Foreman

      Quadruple exponential smoothing…I wish! I’ve never seen it in the literature. A move to some kind of multiple regression might be necessary where DOW can be just another variable in the model.

      Or maybe hack together a forecast at the higher level (say, weekly?) and then have a second pass on it to break it down to daily. You could calculate daily factors that’d split a weekly number into buckets by searching over a set of values in [0-1] that best split past weekly actuals into daily actuals, with a weighted preference toward recent weeks <-- you could do this w/ solver.

      Hmmmm… I'll need to go check out some books here…

      Concerning your second question, yeah, if you don’t have enough seasons to do seasonal adjustments then you can do your best to set those coefficients with the one year’s worth of data you have (just calculate them similarly to how we did our initial factors in the post perhaps), but the power of exp. smoothing won’t be available to you. If you literally have no data, maybe you can steal it from a similar product or industry to make estimates.

      Might I also suggest D&D dice for forecasting when there's no data?

      • nicolahery

        Yup, the way I’ve seen the “double seasonality” done so far was with Multiple Regression: 11 dummy variables for Month and 6 for Day Of Week.

        Interesting thoughts on doing a Weekly Aggregate forecast first and then trying to find the best split on Day Of Week with LP…

        Guess the idea would be to try different solutions and figure out which one has the best “accuracy/complexity” ratio (I like your closing comment on “right balance of simplicity and effectiveness”).

        For the “very small data” forecasting issue, the thing is that on problems I’ve come across, you’re forecasting a bunch of different items: some with plenty of data, some with just a couple of points, and some with no data. My idea, since you want this to be as hands-off as possible, would be to have a “detection” system (simply count the data points available?), and give each item the model you expect to work best: HW Exp Smoothing, Next period = Last period, D&D dice :) Just thinking aloud here…

        Cheers

  • http://profiles.google.com/ghuiber Gabi Huiber

    I’m late, sorry, but I do have two questions: in that very first array formula where you initialize the trend, you take the average of monthly differences and then divide it by 12 again. Why? Simply taking the average would give you an initial trend value of 3.73. You also say that monthly demand goes up by .43kg/month when you compare January 2009 to January 2008. But this is again because you divide the difference by 12. Isn’t the actual difference in demand 5.183kg?

    • john4man

      I’m taking the average y-o-y differences for each pair of matching months in the first two years. This is to eliminate seasonality. Then I divide by 12 to make the trend a monthly one. Between Jan 08 and Jan 09 demand does indeed increase by 5.183, but that’s *for the year.* Dividing by 12 makes it .43kg/mo.

      • http://profiles.google.com/ghuiber Gabi Huiber

        Makes perfect sense now. We compare the same month, so the time elapsed is one whole year. I shouldn’t do this late at night. Thank you.

  • shaldengeki

    Whoa. It’s kind of surreal reading this post, because a year or so back I wanted to forecast active users on a website from month to month, and I ended up independently working out almost exactly the method described here. My “smoothing” method was a little different, though, so maybe it’s time to revisit the topic and see if exponential smoothing gives better results!

    I love the idea of the blog, by the way. I’ve learned a lot of stuff in just a few short minutes!

  • http://datamules.com/ Kevin T. Ryan

    Definitely late to the party, but had a quick question for you on the supplied spreadsheet. I would have assumed that adding the seasonality factors across a year would have equaled 12 (i.e., if there was no seasonality, each factor would be equal to 1). Instead, it seems like they add up to “around 12” (sometimes more, sometimes less). Is that a problem (e.g., does that imply that the trend is being “double counted”)? Thanks!