The Decision Maker's Handbook to Data Science: Book Review
Businesses struggle to make the most out of their data scientists... this book offers limited help.
For the uninitiated, data scientists can seem like mystic oracles bringing esoteric wisdom from the slopes of Mount Parnassus. Unfortunately, this opacity is leading to growing skepticism about the field’s effectiveness: fewer than 9% of businesses can actually quantify the impact of their data science investment, according to a 2018 survey by Domino Data Lab, and 85% of businesses’ big data projects fail, according to 2018 Gartner research. Data science has enormous potential when done right, but the costs of failure are extremely high.
In his recent book, The Decision Maker's Handbook to Data Science, Stylianos Kampakis gives several examples of where businesses go wrong in hiring data scientists and running data science projects. Most of his advice amounts to letting the oracles do their work: don’t interfere with a data scientist’s process, make sure the tasks they’re asked to do are complex enough to hold their interest, and if you want a data-driven organisation, don’t argue with the results. In my view this underestimates how much data scientists need excellent domain knowledge in order to understand the possible interpretations of what the data is telling them. Far from acting as mystics revealing unattainable wisdom, data scientists need to work in very close partnership with their peers if businesses are to make the most of their work.
Kampakis gives an example early in the book from his own career, where a model showed that athletes were more likely to be injured on Mondays. On investigation, it turned out that athletes were just as likely to be injured on Fridays, but far less likely to report the injury on that day, to avoid being kept in rehab facilities over the weekend. Another commonly cited example in this area is a study that apparently showed judges delivered harsher sentences before lunchtime. The conclusion? Hungry judges are meaner. However, not only did subsequent studies fail to replicate those results, it turned out that unrepresented prisoners were consistently scheduled in the courtroom right before breaks. Domain knowledge would have explained the implausible results right away, yet those results have by now wormed their way intractably into pop-psychology lore about human behaviour.
The same thing happens in businesses all the time. Business managers who perceive their data science colleagues as aloof, unwilling to consider their ideas, and interested only in fun data puzzles rather than the sticky if boring problems at the heart of most businesses are unlikely to share enough information to make the most of those colleagues’ skills. Equally, data scientists who are unwilling to see the value in their colleagues’ suggestions, or to revisit results that seem unintuitive, are likely to fall into the same trap that Kampakis fell into: missing some of the picture. Asking businesses to put blind faith in their data scientists, without the tempering intuition and insight of domain experts, is asking for trouble.
There is some sound advice in the book, particularly on the collection and storage of data and on how failing to capture the right data in the right way will undermine any data scientist’s efforts. Many of the examples are drawn from Dr Kampakis’s own career. Several are also cited from Wikipedia.
For busy executives who want a surface-level overview of the issues around data science and hiring and managing data scientists without doing their own Wikipedia legwork, this book pulls it all together in one place.