Data Science As Chess: Don’t Blunder

As many others have during the Covid-19 pandemic, I picked up chess as a new hobby. I had always wanted to be good at chess, but had never understood how to be good at it. The only learning strategies that I knew of were 1) improvement through random play and 2) memorization of openings. 

I think most data scientists follow similar strategies for learning how to be “good” at Data Science. They tend to emphasize 1) improvement through accumulation of experience and 2) memorization of ML or statistical tactics. Both of these learning strategies have their place but today I will argue for a third path: developing principles that are generalizable across situations yet specific enough to be useful. 

To illustrate, I’ll take one of the chess principles I picked up from John Bartholomew and map it to a data science principle that I’ve found useful. One of the most important and most difficult principles to follow in chess is: “Don’t blunder.” As John shows in his YouTube chess fundamentals video, you can defeat a ton of lower-ranked players simply by not blundering your pieces away.

In data science, the “Don’t blunder” principle does a lot of work too. It sounds simple and obvious, yet, as with chess, it is hard to do consistently and can carry you a very long way. To blunder in data science is to present analysis findings that contain errors. Example: you accidentally overfit an ML model to your evaluation set and present performance metrics that are better than what the model will actually achieve in the wild. Another example: you forget to apply the correct filters when defining your treatment and control populations, so your experiment effects are computed on the wrong data. Such mistakes are not only personally embarrassing; they can damage your credibility.
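One way to guard against the eval-set blunder is to carve out a test split that you score exactly once, after all tuning is done. Here is a minimal sketch using scikit-learn; the dataset, hyperparameter grid, and variable names are illustrative, not from the post:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy classification data standing in for a real problem.
X, y = make_classification(n_samples=500, random_state=0)

# Hold out a test set that is touched exactly once, at the end.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

# Tune hyperparameters against the validation set only.
best_model, best_val_score = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    val_score = model.score(X_val, y_val)
    if val_score > best_val_score:
        best_model, best_val_score = model, val_score

# Score the test set once; repeatedly peeking at it during tuning
# is exactly the overfitting blunder described above.
test_score = best_model.score(X_test, y_test)
```

The point of the structure is that the test score is a fresh estimate: nothing about `best_model` was chosen by looking at `X_test`.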

It can be really hard (often impossible) to avoid blunders entirely, but there are principles that help. In chess, one way to avoid blunders is to make sure you never leave undefended pieces on the board. In data science, you can avoid blunders by sanity-checking your work. Example: calculate a simpler metric that should closely approximate your more complicated one and check that the two agree. After a while, sanity checking becomes second nature and the extra work feels less cumbersome. Over time, you will earn the trust of your team and become a more effective data scientist.
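The simpler-metric check above can be sketched concretely. Below is a hypothetical example, assuming a per-user experiment table (the column names and data are made up): the per-group conversion rates come out of a groupby pipeline, and the sanity check recomputes the overall rate two independent ways and confirms they match before anyone trusts the per-group numbers.

```python
import pandas as pd

# Hypothetical experiment data: one row per user.
df = pd.DataFrame({
    "group": ["treatment", "treatment", "control", "control", "treatment", "control"],
    "converted": [1, 0, 0, 1, 1, 0],
})

# The "complicated" metric: per-group conversion rates.
rates = df.groupby("group")["converted"].mean()

# Sanity check: the overall rate derived from the grouped sums
# should equal the trivially computed overall rate.
overall_from_groups = df.groupby("group")["converted"].sum().sum() / len(df)
overall_simple = df["converted"].mean()
assert abs(overall_from_groups - overall_simple) < 1e-12
```

If the assertion fails, something upstream (a filter, a join, a dropped row) has gone wrong, and you catch it before presenting the numbers.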
