Monday 16 November 2009

Dealing with variables that depend on each other.

This article attempts to follow the effect of the recession on the games industry using a graph. It seems that industry growth has been in decline since a peak in Sept 07, and that since March sales have been shrinking dramatically. I think it's actually an interesting example of how particular graphing tools are only useful in particular situations.

The graph plots year-on-year change in sales figures. Year-on-year figures are useful because they allow you to normalise for the variation across the year. Suppose you're running a games company. Games traditionally sell very poorly in the summer and rocket off the shelves around Christmas. If you plot a simple graph of sales figures over the year, it really doesn't tell you much about how your business has been doing, because the variation you see is almost entirely down to the expected seasonal peaks and troughs. By comparing this year's figures to last year's figures for the same month, however, you can see whether your company had a better summer.
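As a rough sketch of that calculation, here it is in Python. The monthly sales numbers are invented purely for illustration (they are not taken from the article's data), and the function name `year_on_year_change` is my own.

```python
def year_on_year_change(this_year, last_year):
    """Percentage change of each month against the same month a year earlier."""
    return [100.0 * (t - l) / l for t, l in zip(this_year, last_year)]


# Invented monthly sales, Jan to Dec, in arbitrary units:
# a weak summer and a big Christmas peak in both years.
last_year = [100, 90, 80, 70, 60, 50, 50, 60, 80, 110, 150, 200]
this_year = [110, 99, 88, 77, 66, 55, 55, 66, 88, 121, 165, 220]

yoy = year_on_year_change(this_year, last_year)

# Raw sales swing by a factor of four across the year...
seasonal_swing = max(this_year) / min(this_year)

# ...but the year-on-year figure is the same steady growth every month.
print(seasonal_swing, yoy)
```

The raw series is dominated by the seasonal swing, while the year-on-year series shows only the underlying growth, which is the whole point of the normalisation.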

My issue with the graph is that it plots these figures over several years, and therefore the data points for different years are actually dependent upon each other. If you have an unusually good month this year, the same month next year will look bad by comparison. The chart tells us that Sept 07 was an incredible 80% better than Sept 06. In turn, Sept 08 looks pretty bad, because of the huge sales the year before. In fact, Sept 08 could've been a pretty good month.
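To make the dependence concrete, here is a small Python sketch. Only the 80% rise for Sept 07 comes from the chart; the Sept 06 index of 100 and the Sept 08 value are hypothetical numbers I've picked to show the effect.

```python
def pct_change(value, baseline):
    """Percentage change of value relative to baseline."""
    return 100.0 * (value - baseline) / baseline


sept_06 = 100.0  # arbitrary index value, an assumption
sept_07 = 180.0  # 80% up on Sept 06, the figure quoted from the chart
sept_08 = 170.0  # hypothetical: a strong month, slightly below Sept 07

yoy_07 = pct_change(sept_07, sept_06)  # +80%
yoy_08 = pct_change(sept_08, sept_07)  # negative, so it looks like a bad month
vs_06 = pct_change(sept_08, sept_06)   # +70% against the Sept 06 baseline

print(yoy_07, yoy_08, vs_06)
```

With these made-up numbers, Sept 08 shows up as a decline on the year-on-year line even though it sits far above Sept 06.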

A more informative way to represent this data would be as a series of individual year-on-year graphs, highlighting that the areas above the zero line are periods of growth. With a bit of cropping, I get this:

Sales were actually up fairly consistently for the year Sept 07 to Sept 08. Contrary to what the downward slope suggests, this was a growth period; it's just that growth in the summer wasn't quite as impressive as the growth at the start of the period, and January was more or less static. Here it is with some colour, and with the scale for decline extended to match the scale for growth:

Now we have to look separately at Sept 08 to Sept 09, and remember that we're comparing it to the year before. It's telling us that sales in the winter of 2008 were even better than they were in the winter of 2007, but summer sales have dropped by comparison.

The trouble with sticking these two charts next to each other is that a single graph ends up presenting two different sets of figures as though they related to the same thing. Each of the two charts above uses the previous year as its baseline, so you can't really read them as one continuous line. Presenting them as separate charts, and applying colour coding to indicate the importance of positive and negative values and not just the movement of the line, clarifies this.

To really grasp what's going on over the whole period, we have to remove the dependence between the two charts, and give them the same baseline. To do this I'll take Sept 06 to Sept 07 as our starting year and plot the change compared to that year throughout.
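If all you have are the year-on-year percentages, one way to put the second year on the same baseline is to compound the two changes month by month. This is a sketch of that idea in Python, not necessarily how the chart here was produced; the function name and the three months of figures are invented for illustration.

```python
def change_vs_base_year(yoy_first, yoy_second):
    """Compound two successive year-on-year changes (in percent), month by
    month, to get the second year's change relative to the base year."""
    return [
        ((1 + a / 100.0) * (1 + b / 100.0) - 1) * 100.0
        for a, b in zip(yoy_first, yoy_second)
    ]


# Invented figures for three months.
yoy_first = [80.0, 40.0, 10.0]    # year 1 compared with the base year
yoy_second = [-20.0, -10.0, 5.0]  # year 2 compared with year 1

rebased = change_vs_base_year(yoy_first, yoy_second)

# Roughly +44%, +26% and +15.5% against the base year.
print([round(x, 1) for x in rebased])
```

In this made-up case the first two months of the second year are down year-on-year, yet every month is still well above the base year once the two series share a baseline.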

For most of the year, sales in Sept 08 to Sept 09 were actually better than those in Sept 06 to Sept 07. The growth in the summer of 2008 concealed this in the original graph, implying that sales this year had fallen far below 2007 levels. Unwittingly building in this dependence between the different parts of the line made a big difference to how the results were presented.

Given that the years 2006 and 2007 were both huge success stories, selling at that rate is objectively impressive. It's been pointed out to me that my sales analysis is rather naive, though. Games publishers have investors, and they expect to see the publisher sell games at an ever faster rate with each passing year as an indication that the company is growing. Even if the company's still selling well, that growth has to be there. Correspondingly, a year-on-year shrink isn't good news, even if it's a shrink to sales volumes which were once unprecedented.