A Chemist in Theory: July 2013

I'm working on getting the blog back up and running; rather than the ad hoc approach of last time, I'm planning a writing project that should keep the blog fed with content at least once a month. I'll fill you all in in due course. In the meantime, I've finished reviewing Nate Silver's The Signal and the Noise, and the review is below.

Nate Silver has shot to fame as the oracular figure who decoded political polling data into plain English and successfully predicted the US presidential election. His debut book brings him back down to earth. It uses familiar examples as diverse as moneyball and warfare to show how badly we lack, and need, better prediction in our lives, and it points to critical thinking and Bayesian reasoning as the path to improvement.

Each chapter uses a particular area of prediction to teach broader lessons. The book opens with great momentum, using the financial crisis as a set of unambiguous examples of how not to make predictions before drilling into the all-too-human reasons that political commentators make poor election forecasts. There are good lessons here about how the need to feel confident and a single-minded focus on a few issues can lead one astray; he turns back to the financial crisis to emphasise the same failures there.

It's not all about the human factors, though, and Silver then turns to "moneyball" - statistics-based sports recruitment - to provide an overview of the more technical aspects of the art of prognostication. The idea of a predictive model is well articulated and applied to common-sense issues with surprising complications. With the reader warmed up, he spends several chapters digging into the fundamental reasons why level-headed, critical-thinking scientists still cannot predict earthquakes. Some things - weather, disease, tectonic plates - are inherently challenging to forecast for interesting reasons, and he is equally quick to emphasise the technical traps that researchers can fall into when building their models.

The heart of the book, however, is Bayesian reasoning: the idea that we should treat new evidence as an adjustment to whatever our existing prediction said, as a sort of rolling improvement to our models. As a simple illustration, a positive test result for a rare disease should be weighed against the low probability that one had the disease before the result came in. Even if the test is 95% accurate, if the disease affects only one in a million people, then the odds that one actually has the condition are far, far lower than 95%.
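To put numbers on that illustration, here is a minimal Python sketch of Bayes' theorem. It assumes that "95% accurate" means 95% of sick people test positive and 95% of healthy people test negative (a 5% false-positive rate). The function name `posterior` and its parameters are my own illustrative choices, not anything from the book.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """Probability of having the disease given a positive test, via Bayes' theorem."""
    # Total probability of a positive result:
    # P(pos) = P(pos | sick) * P(sick) + P(pos | healthy) * P(healthy)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(sick | pos) = P(pos | sick) * P(sick) / P(pos)
    return sensitivity * prior / p_positive


if __name__ == "__main__":
    # Assumption: "95% accurate" = 95% sensitivity and a 5% false-positive rate.
    p1 = posterior(prior=1e-6, sensitivity=0.95, false_positive_rate=0.05)
    print(f"After one positive test:  {p1:.6%}")

    # The "rolling improvement": yesterday's posterior becomes today's prior.
    # Assumption: the second test's errors are independent of the first's.
    p2 = posterior(prior=p1, sensitivity=0.95, false_positive_rate=0.05)
    print(f"After two positive tests: {p2:.6%}")
```

Under these assumptions a single positive result leaves only about a 0.0019% chance of having the disease. A second, independent positive raises it, but it remains well under 1%. The prior does most of the work.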

This is the tool Silver uses in the latter half of the book to show the way to better predictions, while still taking the time to illuminate other forecasting challenges. Whether it's poker or chess, the stock market or the battlefield, building a good model and refining it with new data is the key to victory. He lays out how problems arise in these fields, be it a new raft of human frailties or the hefty challenge of trying to beat the "wisdom of the crowds", sets out how these failures of prediction can be capitalised on by agents good or bad, and suggests Bayesian solutions.

A chapter on climate change in a book aimed at those in big business has huge potential to be a train wreck, but Silver manages to steer a fairly acceptable course through the problem. This chapter draws the book together, combining the issues of complex models, noisy new data and incentives to mislead, with Bayesian reasoning as the knight in shining armour. The overall theme is that climate models are difficult to build for fundamental reasons, and that the warming consensus that has emerged from those models has stood up to new results - despite the claims of think tanks who wish it otherwise.

This section has annoyed commentators on both sides of the issue. Silver makes good points without falling into the many rhetorical traps that the denialist movement has laid in any writer's path, but he's never particularly strong on the issue either. I liked the unspoken conclusion that less-confident predictions - 95% confidence rather than 99%, say - are more resilient to contradictory data in a Bayesian world, and Silver avoids false equivalences and is unambiguous in accepting the reality of global warming. However, this is not a strong introduction to climate science, nor a real challenge to many of the incorrect claims made by denialists.

Truth be told, this is a deliberate stylistic choice, and a potential issue, throughout the book. Silver avoids controversies over the fundamental results that feed forecasts, except where they are directly relevant to a chapter's lesson. In the section on the financial crisis, human incentives are raised as a source of bias, but the humans responsible are hardly taken to task. If you want to find out about the failures of reasoning that permitted the 9/11 attacks, you'll have to read elsewhere. (Donald Rumsfeld appears, but only as a lead-in to the "unknown unknowns" idea.) The implications of Scott Armstrong's work with the notoriously vociferous climate-sceptic Heartland Institute are left for readers to discover on their own.

This will variously come across as refreshingly expedient, frustratingly wishy-washy, focussed or cowardly depending on your reading preferences and ideological views. Consider yourself forewarned and take the book on its own terms.

The Signal and the Noise is certainly cleanly written and well structured. Silver's introduction sets the book up as a toolbox, first outlining the failures of prediction and their causes before moving on to the successes and the processes that enable them. In truth, though, he lets the book digress around the broader themes raised in each chapter, be it the problems and benefits of the "wisdom of the crowds" or the failure to account properly and quantitatively for the uncertainty in a prediction. These digressions are brief and enlightening, and they echo back and forth between the chapters to make a more cohesive whole.

With the aforementioned caveat, this is a superb route into the whole issue of modelling and forecasting. It's accessible, clearly written, technically sound and meticulously reasoned. I recommend it as reading on a difficult subject, although it probably won't prove to be the definitive work.

(If you want a primer on thinking about statistics before you dig into this, I strongly recommend Darrell Huff's "How to Lie with Statistics". It's inexpensive, funny and brief, and makes a good companion piece.)

Saturday 27 July 2013

The Signal and the Noise