part of Course 121 Navigating a Data Science Career


There’s nothing quite as intimate as choosing your next data science project, but if it’s not being handed to you by an instructor or a program manager, then the world can feel just too wide. How do you choose what to work on next? What data to explore? What problems to tackle? It can be overwhelming.

Like all questions of importance, I don’t have the answer. But if you are feeling well and truly lost, here are some tricks that may get you unstuck.

Find a question to answer

Applied machine learning and data science of all varieties are exercises in answering a question using data. No matter how much data you have, you can’t exercise your data science skills without a question to answer. Choosing a project is the act of finding a question to answer. Focus on this.

Goals drive questions

Asking a question is the act of getting closer to a goal. If my goal is to find a warm loaf of bread, then my question becomes "How do I get to the bakery?" Goals drive questions. Modeling and analysis are a form of locomotion. They move your understanding forward. They propel you from the known into the not yet known. But if you don’t have a destination in mind, you run the risk of driving in circles.

If you don’t have a burning question already, the fastest way to find one is to choose a goal. What do you want? This is very personal. Do you want to make a room full of money? Stop the spread of COVID-19? Build a company with 1000 employees? Understand the origin of the universe? Find a city to move to?

What you want defines who you are. Nobody can tell you what that is. If you are able to see deep enough into yourself to figure out what you want, then you will have a never-ending fountain of questions. Not all of these will be amenable to analysis, or have appropriate data available, but you will have direction, and you will always be able to find an analysis or modeling question that gets you a little closer to that goal.

Your goal doesn’t need to be anything profound. It can be to tell the story of your local minor-league baseball team or to estimate your household carbon footprint. And it doesn’t need to be a long-term commitment. You can change goals with every project. But it’s important that you have some purpose in mind.

Find any question at all

If finding a grand goal is a tall order for now, then find any question at all. It doesn’t matter what it is. The easiest way to do this is to steal someone else’s. Re-create their Tidy Tuesday data analysis. Implement a paper from scratch. Find a machine learning textbook and answer the questions posed at the end of each chapter. Stealing questions from others is 100% ethical. Not only that, it’s flattering to the original question asker, and as a bonus it gives you someone else’s answers to compare against. It lets you stretch your legs and get comfortable with new tools. It is a great way to learn to walk before you start to run.

Keep in mind, however, that all of this is practice. The most important thing you will do as a data scientist or machine learning practitioner is to choose the right questions to ask.

How do I choose a good question?

Just as "Good decisions come from experience, and experience comes from bad decisions," also "Useful questions come from experience, and experience comes from useless questions." There is no field guide, no manual, no cheat sheet for learning to ask good questions. It comes through trial and error. The best way to climb this learning curve is to take the first step.

I wish you success in your search for answers. More importantly, I wish you success in your search for questions.