How to Choose Great Data Science Projects: Focus on the Obvious

Many factors can help you decide which data science projects to prioritize. In this post, I share a simple principle that I have seen work many times over:

Focus on the obvious. The ideas with the best chance of making a big impact are those that address obvious problems and/or propose obviously better solutions.

Examples

Example of an obviously good idea: suppose you are working on improving a production machine learning model. If the production model was trained on noisy labels and you now have access to higher-quality labels, training a new model that incorporates the clean labels is obviously a good idea. The idea is obvious because it’s the common-sense thing to do and has an obviously good chance of succeeding: 9 times out of 10, there is a way to improve the original model using the clean labels.
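To make the clean-label example concrete, here is a toy sketch of the check you might run before swapping models: train one model on the old noisy labels and one on the clean labels, then compare both on a held-out set with trusted labels. Everything here is an assumption for illustration: the data is synthetic, the label noise is a deliberate systematic flip, and the nearest-centroid classifier is only a stand-in for whatever your production model is.

```python
from statistics import mean


def fit_centroids(xs, ys):
    """Return the mean feature value for each class label."""
    return {
        label: mean(x for x, y in zip(xs, ys) if y == label)
        for label in set(ys)
    }


def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def accuracy(centroids, xs, ys):
    """Fraction of points whose predicted label matches the given label."""
    hits = sum(predict(centroids, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


# Hypothetical 1-D training data: the true label is 1 when x >= 10.
train_x = [float(i) for i in range(20)]
clean_y = [int(x >= 10) for x in train_x]

# Hypothetical noisy labels: points 10..13 were wrongly labeled 0.
noisy_y = [0 if 10 <= x <= 13 else y for x, y in zip(train_x, clean_y)]

# Held-out evaluation set with trusted labels.
test_x = [i + 0.25 for i in range(20)]
test_y = [int(x >= 10) for x in test_x]

old_acc = accuracy(fit_centroids(train_x, noisy_y), test_x, test_y)
new_acc = accuracy(fit_centroids(train_x, clean_y), test_x, test_y)

print(f"trained on noisy labels: {old_acc:.2f}")
print(f"trained on clean labels: {new_acc:.2f}")
```

The point of the sketch is the shape of the comparison, not the numbers: same data, same model, same evaluation set, with only the labels changed. That keeps the experiment as obvious as the idea itself.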

Example of a not-so-obviously good idea: suppose you are still trying to improve that machine learning model. You want to try out some new techniques that ensemble multiple models trained separately from different random initial weights. This is not an obviously good idea: it’s complicated, involves way more compute, makes the production setup more fragile, and has very little chance of making a big impact. Maybe in some rare circumstances it can help, where the loss surface has a lot of

Advantages of Focusing on the Obvious

  • Obvious ideas work: they tend to be more robust and have a higher chance of succeeding
  • Easy to explain: it will take much less effort to persuade others that your idea is good
  • Prevents self-delusion: the human brain has an amazing ability to delude itself into believing almost anything, especially that an idea it created is a good one. Focusing on the obvious keeps your own fanciful thinking from hijacking your time
