Monday, March 3, 2014

Quick Regressions in Excel or How to Predict the Future the Easy Way

Data becomes information when it’s analyzed. Information becomes knowledge when it’s communicated.
I like to tell people who ask what a crime analyst does that my job is provide the information that people at my agency need to make good decisions.

This may be on a tactical level such as the information a detective needs to investigate a case he or she is working on, or it may be on a more strategic level such as to help my Chief plan for future staffing levels.

One scenario that often applies with that last example is making a prediction about the number of calls, crimes, accidents, etc. that will occur at some point in the future. For example, at the agency where I work we are faced with the challenge of being located in an area that is consistently one of the 10 fastest growing areas of the United States. When I started with my agency in 1991, our population was about 57,000. Now just over twenty years later we are over 130,000.

To keep up with this we try to maintain a consistent officer to population ratio. But as any police chief will tell you, hiring police officers is a long process. From the time you announce you’re hiring, to offering the test for police officer, background investigations, hiring boards, selection, police academy, etc. until your officer actually pins on his or her badge might take a year.

This doesn’t even cover the year it may take to get new police officer positions included in an upcoming fiscal budget. All told, it may take two years from the time you decide to add an officer position to the police budget until they actually hit the street.

One way to help your Chief plan for the future is to make a prediction about the metrics your agency uses to measure the amount of work. At my agency, we look at population served, calls for service, crime stats, etc. to try and understand the amount of work we do.

But just how do you make a prognostication about the future?

One way I’ve seen people make an estimate of a future year’s Calls for Service is to take the percentage increase or decrease from the previous year and apply that to a future year. In other words, if the difference between the number of Calls last year as compared to the year before was a 3% increase. Then just assume that you’ll have a similar difference this year and add 3% to last year’s numbers. Some people get a bit more sophisticated and average the difference for several previous years and then apply that difference to the current year.

But there is a better and easier way, that would be to use a statistical technique called a Regression.

Back when I was in junior high school I found out that math was not my strong suit. When I first started in law enforcement, one thing I was relieved to find out was there isn’t a significant amount of higher level math involved. I know that some of you are like me and when I mention the word “statistics” your eyes begin to glaze over. Don’t panic. This is going to be easy.

My go to tool for crime analysis is Microsoft Excel. You can do a tremendous amount of analysis in Excel. A crime analysis unit that had no software other than Microsoft Office can still be a very capable unit. Don’t let the fact that your agency may not have all the specialized crime analysis tools that a larger agency has discourage you. I am very fortunate that my agency has provided some really great tools. However, even with all these tools I spend more time in Excel than anything else except maybe my email client.

The first step is to get your data collected and entered into Excel. For the purpose of this post I am going to use Uniform Crime Reports Murder data from 1995-2012 to make a prediction about the numbers for 2013 which have not been released yet.

In this example I have arranged the data with one column for the Year and one column for the number of murders for that year. I find that the vertical columns are less unwieldy than putting it into rows but that is your choice.

Once you’ve got the data entered select the columns of data and on the Windows version of Excel you can hit the Insert Chart button on the ribbon. On the Mac version select the Insert menu the the Chart command. For chart type, select a line chart. Once you’ve done this you should have something like this:

Now that you’ve got your chart Select the Chart Layout menu and look for the Add Trendline command. Using the Mac version of Excel it looks like this (the Windows version is slightly different).

Now you have a couple of options; you can pick a Linear, Exponential or Moving Average trendline. For now let’s pick Linear as it’s probably the best one to start with. We’ll fine tune this in just a bit. We do want to get to Trendline Options. You can either select the Linear trendline and then right click on the trendline itself and select Trendline Options or you can just select Trendline Options from the very beginning. Once you get that look for a checkbox that says “Display R-squared value on chart”.

The R-squared value is the coefficient of correlation. It is a measurement of how closely the trendline fits the existing data points. The R-squared value is a decimal number. The closer that number is to 1, the closer the trendline fits the data and the more accurate a prediction will likely be. This R-squared number will help us choose which trendline to use. (If you need to you can click on the R-squared number to highlight it and drag it to another part of the chart to make it easier to see.)

In my example table, a Linear trendline gives an R-squared value of 0.52987. Not bad, but it could be better. If I change the trendline to Logarithmic I get an R-squared value of 0.73639. This is much better and should make for a pretty accurate forecast.

Since I selected the year 2013 when I created the chart, Excel automatically extended the trendline to 2013 even though the number of murders cell is empty. If you didn’t leave an empty cell and stopped at 2012 you can select the option “Forecast Forward __ Periods” and fill in the number of periods you want the trendline to forecast. In this case, one would extend the trendline to 2013.

Now to come up with an approximate number for our 2013 Murder forecast you can select the Gridlines option in Chart Layout to add both Major and Minor gridlines. I’ve also changed the color of the trendline and thickened it to make it easier to see.

The final chart looks like this:

Looking at the trendline we can see the Regression we added is forecasting just under 15,000 murders for 2013.

A couple of notes, the more past data your chart has, the more accurate your Regression will be forecasting forward. Also, shorter forecasts will be more accurate than one farther out. One or two periods is best. Trying to forecast more than about 4 or 5 periods could get problematic using this method.

Spend some time playing with the charts, trendlines and options to get comfortable. Then apply this to some real world data from your agency. What metric would you like to predict?

No comments:

Post a Comment

I reserve the right to remove defamatory, libelous, inappropriate or otherwise stupid comments. If you are a spammer or are link baiting in the comments, a pox be upon you. The same goes for people trying to sell stuff. Your comment will be deleted without mercy.