Cracking Business Case Interviews for Data Scientists: Part 1


Image by Minha Hwang

Business case interview questions are often the most difficult part of the interview process, especially for early-career data scientists without job experience. Even after landing a job, it helps to have a structured way of formulating and solving business problems raised by other teams and stakeholders.

Based on what I have learned as an ex-McKinsey consultant and data scientist over 10 years, I am sharing a “7 steps of problem-solving” approach. This structured problem-solving process, with its related frameworks and templates, will help you become more comfortable approaching and solving new business problems.

In this article, I will discuss:

(1) Overview of a “7 steps of problem-solving” approach

(2) Step 1: Define a problem — How to avoid Type III error by solving a proper problem

(3) Step 2: Structure a problem — How to break down a problem to manageable sub-problems

(4) Step 3: Prioritize issues — How to prioritize sub-problems to solve

Overview of a “7 steps of problem-solving” approach

Solving a business problem is inherently difficult. A structured problem-solving approach can make it more manageable, and even enjoyable, for data scientists. This approach is still not taught well in universities or graduate schools. Top management consulting firms such as McKinsey have developed it, and it is one of the secret sauces behind their success. In the spirit of sharing, I will describe the approach in detail in this two-part series. This article will be useful (1) for data scientists preparing for business case interviews, (2) for data scientists and their business partners who want to solve business problems and deliver business impact, (3) for MBAs preparing for case interviews at management consulting firms, and (4) for Ph.D. students and researchers who want to adopt a hypothesis-driven approach and become more efficient in their research. The 7 key steps of business problem-solving are as follows:

(1) Define a problem

(2) Structure a problem

(3) Prioritize issues

(4) Develop issue analysis and work plan

(5) Conduct analyses

(6) Synthesize findings

(7) Develop recommendations

This is not a linear process. You will need to iterate through the steps a few times to reach a better solution, stopping when you are convinced that the benefit of further iteration is limited. It is very important to revise and re-evaluate periodically. These steps sound intuitive and simple, and at this point you may wonder whether they would even help. The devil is in the details. Let me now dive deep into each step to show how this can work. I have summarized the “7 steps of problem-solving” in Figure 1.

Figure 1: Image by Minha Hwang

Step 1: Define a Problem

The most important step in solving a business case, and one that is often overlooked, is “problem definition.” What is the point of perfectly solving a problem with a sophisticated natural language processing model if the machine learning engineer solved the wrong problem? Believe it or not, this “Type III error” (solving the wrong problem) happens a lot in practice. It can be avoided with a proper problem definition stage.

Before describing the key elements of this stage, I would like to point out that a “well-defined problem” that can be solved is very different from what we usually mean by a “problem.” In day-to-day interactions, a “problem” usually means the recognition that “something is not right.” There is a gap between what happened and what is supposed to happen (i.e., a goal), or between what happened and what could happen (i.e., an opportunity). Recognizing a “problem” is difficult and anxiety-producing. The result tends to be very “general,” and it can be just a “statement of fact.” It is merely the recognition of a situation that calls for action.

A “well-defined problem” that can be solved has the following properties:

  • A thought-provoking question, not a fact
  • Specific, not general
  • Measurable
  • Action-oriented
  • Relevant (to the key problem)
  • Time-bound

Moreover, it is

  • Debatable (not a statement of fact or non-disputable assertion)
  • Focused on what the decision-maker needs to move forward

These properties of well-defined problems are often summarized by a “SMART” principle. This principle is shown in Figure 2.

Figure 2: Image by Minha Hwang

To make a “well-defined problem” more concrete, let me use a simple toy example to contrast it with a “problem” in the everyday sense. The context for this toy example, John Octopus, is summarized below.

  • What happened: John went to see “Ready Player One” (a movie about virtual reality, made with computer-generated imagery, or CGI) with his friends last Saturday, and he loved it.
  • Goal: John was so impressed by the movie that he wants to become a Hollywood CGI movie director.
  • Situation calling for action (a problem in the typical sense):
      • John has no clue how to create a CGI movie.
      • He does not even own a computer.
      • To get started, he first needs to figure out how to buy a computer.

Do we have enough information to properly define the problem? No, not yet! I have summarized what information we have so far in Figure 3.

Figure 3: Image by Minha Hwang

Let me introduce the first tool for problem definition, to clarify what additional information we need to “properly define” the problem. The “problem definition worksheet” is a good checklist to ensure that we have all the required information to properly define the problem. The key elements are:

(1) Problem definition: Please force yourself to put this as a “question.”

(2) Contexts: It is important to know historical and organizational contexts around the problem.

(3) Key stakeholders: Who are the key decision-makers? Who will be affected by the decision?

(4) Criteria for success: Without this, how can you even know whether you have successfully solved the business problem? Please make sure that you properly define “success metrics” before solving the problem. You can’t solve a problem that can’t be measured.

(5) Constraints: It is also helpful to write down what should not even be considered.

(6) Scope of solution space: Often, there are clear requirements for the geographic or business line focus of the problem. It is also good to check timing requirements (i.e., are answers needed in 4 weeks, 3 months, or 1 year?). Sometimes, 80/20 directional answers are desired. Other times, very precise answers (a 1% increase in website traffic versus a 1.1% increase) are required. Clarity on the required accuracy helps you judge whether the proposed approach or solution will be appropriate.

For future use, this first tool, the “problem definition worksheet,” is shown in Figure 4.

Figure 4: Image by Minha Hwang
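The worksheet can also be kept as a small data structure, so that missing elements are easy to spot. The sketch below is only an illustration: the class name, the field names, and the draft statement are assumptions of mine and are not taken from the figure.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ProblemDefinition:
    """Hypothetical container for the six worksheet elements."""
    question: str                                          # (1) problem definition
    context: str = ""                                      # (2) contexts
    stakeholders: list[str] = field(default_factory=list)  # (3) key stakeholders
    success_criteria: list[str] = field(default_factory=list)  # (4)
    constraints: list[str] = field(default_factory=list)   # (5)
    scope: str = ""                                        # (6) scope of solution space

    def gaps(self) -> list[str]:
        """Return the names of empty elements, plus a note if the
        problem statement is not phrased as a question."""
        missing = [f.name for f in fields(self) if not getattr(self, f.name)]
        if self.question and not self.question.strip().endswith("?"):
            missing.append("question is not phrased as a question")
        return missing


# Illustrative first draft for John Octopus, before more information is gathered.
draft = ProblemDefinition(
    question="John needs to buy a computer",
    stakeholders=["John"],
)
print(draft.gaps())
```

Running the check on an early draft lists the elements that still need to be gathered before the problem can be considered well defined.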

Now, let’s gather additional information for the John Octopus toy example:

  • Timing: John would like to start practicing design in August, during summer vacation, which is only six months away.
  • Type of computers and price ranges for computers:
      • John found that a MacBook Pro is good for CGI use.
      • A used MacBook Pro can be purchased for about $900.
  • Other contexts:
      • John does not want to borrow or rent a computer.
      • John has $140 in savings. His parents give him an allowance of $100 per month, and he earns $15 per hour walking the neighbor’s dog once a week, which amounts to $60 per month.
      • John spends $80 per month, on average.
      • If John could wait 10 months, he could buy the computer simply by saving his money.
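The arithmetic behind these numbers is worth checking explicitly. The short sketch below uses only the figures listed above; the function and variable names are illustrative, and it assumes John's income and spending stay constant from month to month.

```python
import math

# Figures from the John Octopus example.
PRICE = 900        # approximate price of a used MacBook Pro
SAVINGS = 140      # current savings
ALLOWANCE = 100    # allowance per month
DOG_WALKING = 60   # dog-walking income per month
SPENDING = 80      # average spending per month


def monthly_net_saving():
    """Money John can set aside each month at his current habits."""
    return ALLOWANCE + DOG_WALKING - SPENDING


def months_to_afford(price=PRICE, savings=SAVINGS):
    """Whole months of saving needed to cover the remaining gap."""
    return math.ceil((price - savings) / monthly_net_saving())


def shortfall_after(months, price=PRICE, savings=SAVINGS):
    """Amount still missing after saving for the given number of months."""
    return max(0, price - (savings + months * monthly_net_saving()))


print(monthly_net_saving())   # 80
print(months_to_afford())     # 10
print(shortfall_after(6))     # 280
```

At his current pace John saves $80 per month, so he needs 10 months to cover the $760 gap. With only six months available, he is about $280 short, which is the gap the problem definition needs to address.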

With this additional information, we can properly define the problem for John Octopus, as summarized in Figure 5.

Figure 5: Image by Minha Hwang

It is often helpful to see bad examples of problem definition. Figure 6 shows three examples of flawed problem definitions for the same John Octopus example.

Figure 6: Image by Minha Hwang

As a final exercise, we can consider one more example, this time drawn from a business case rather than from personal experience.

Figure 7 shows examples of flawed problem definitions for this business case, Onion Bank.

Figure 7: Image by Minha Hwang

In contrast, Figure 8 shows an example of a well-defined problem for Onion Bank.

Figure 8: Image by Minha Hwang

I hope this helps you understand the key elements of a well-defined business problem. A good problem definition accounts for more than 60% of the problem-solving effort. Please make sure that you define the problem properly, both in data scientist interviews and in your day-to-day work as a data scientist.

Step 2: Structure a Problem

Once you have properly defined a problem, the next step is “structuring a problem.” Often, even a well-defined problem is too big to solve efficiently. Breaking it down into smaller, manageable sub-problems (i.e., issues) makes problem-solving feasible and efficient. It also allows the work to be divided among team members with clear responsibilities. Moreover, this step is the foundation for the subsequent step of issue prioritization, where priorities are set in terms of where to focus the problem-solving efforts. A “logic tree” (Tool #2) is very useful in the process of problem structuring. Figure 9 introduces a logic tree.

Figure 9: Image by Minha Hwang

A “logic tree” (Tool #2) helps us maintain the integrity of the problem. By checking that the discrete chunks (i.e., sub-problems) are mutually exclusive and collectively exhaustive (i.e., “MECE”), we can ensure that solving the parts will really solve the whole problem in the end. Since the parts do not overlap, we avoid duplicate effort. Since there are no gaps (collectively exhaustive), we do not leave out anything important. A logic tree also helps as a communication device. Creating one as a team builds a common understanding, which can then be shared in a structured way outside the team. Finally, it helps to focus the use of frameworks and theories. Validating proposed frameworks and theories against the tree often reveals gaps in logic or aspects that have not been considered yet.

There are two different types of logic trees. In the initial phase of problem-solving, an “issue tree” (what/how tree) is more useful for thinking about the entire solution space. Once you know enough about the problem (e.g., after a few iterations of exploratory analysis, initial secondary data research, or interviews and meetings), a “hypothesis-driven tree” (why tree) becomes more relevant. This tree is better suited to focusing problem-solving efforts on hypotheses, and it serves as the foundation for prioritization. Building good hypotheses early and prioritizing efforts properly are secrets behind the success of data science leaders. Figure 10 summarizes the two types of logic trees.

Figure 10: Image by Minha Hwang

To make the use of a “logic tree” more concrete, let’s practice with the simple case of John Octopus. Figure 11 is a sample “issue tree” that you can create to solve the problem for John Octopus. Please note that we are asking “how” questions to develop this tree. Figure 11 shows the top 2 levels. Since the goal is to come up with $900 for a used MacBook Pro in 6 months, we can start by dividing the problem into increasing income vs. reducing spending. To develop the branches below, we have to think about how John Octopus earns his income and where he spends most of his money.

Figure 11: Image by Minha Hwang

Once we have developed the top levels, we can further develop the issue tree for both the upper and lower branches. If you want to practice, I recommend doing this without looking at the solutions below. After the practice, you may realize that developing a “MECE” tree takes a good deal of thinking, even for a simple problem like this. Figure 12 and Figure 13 show potential issue trees for the upper and lower branches, respectively.

Figure 12: Image by Minha Hwang
Figure 13: Image by Minha Hwang

What would be considered a bad “issue tree”? Seeing bad examples helps clarify what a good solution requires. Figure 14 shows a flawed issue tree for the John Octopus problem. It is inconsistent because it mixes levels: increasing income sits at the same level as many spending category breakdowns. It is also not “MECE,” since important spending categories such as coffee and candy are missing.

Figure 14: Image by Minha Hwang
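A MECE check can be made mechanical when the items under a branch can be listed. The sketch below treats each branch as a set of items and reports overlaps (not mutually exclusive) and gaps (not collectively exhaustive). The function name, the branch names, and the spending items other than coffee and candy are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def mece_report(universe, branches):
    """Check branches (name -> set of items) against a universe of items.

    Returns (overlaps, gaps):
      overlaps: pairs of branch names mapped to the items they share
      gaps: items in the universe that no branch covers
    """
    overlaps = {
        (a, b): branches[a] & branches[b]
        for a, b in combinations(branches, 2)
        if branches[a] & branches[b]
    }
    covered = set().union(*branches.values())
    gaps = set(universe) - covered
    return overlaps, gaps


# Hypothetical list of where John's money goes.
spending_items = {"coffee", "candy", "comics", "games"}

# A flawed breakdown that leaves out coffee and candy.
flawed = {"entertainment": {"comics", "games"}}
print(mece_report(spending_items, flawed))

# A breakdown that covers every item exactly once.
better = {
    "food and drink": {"coffee", "candy"},
    "entertainment": {"comics", "games"},
}
print(mece_report(spending_items, better))
```

In real problems the items are rarely this easy to enumerate, so the check is mostly a thinking aid: for every branch, ask what it shares with its siblings and what the siblings together leave out.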

So far, we have seen an example of an “issue tree” (how/what tree) for the John Octopus problem. If you create a “hypothesis-driven tree” (why tree), how would it be different from the issue tree that you created? Figure 15 shows a “hypothesis-driven tree” example for the John Octopus problem. Please note that this can be developed by asking a series of “why” questions.

Figure 15: Image by Minha Hwang

Now, let’s practice creating an “issue tree” with a more business-oriented example. Assume that you have been hired as a management consultant by the Coca-Cola Company, which is trying to solve a problem of declining global profitability. There can be many ways to develop issue trees, and each will help you consider different dimensions and aspects of the given problem. Figure 16, Figure 17, and Figure 18 show three different ways of developing an issue tree for this problem: by profit drivers, by geographies, and by lines of business.

Figure 16: Image by Minha Hwang
Figure 17: Image by Minha Hwang
Figure 18: Image by Minha Hwang

Finally, let’s consider one more example: a data science problem in A/B testing. The problem we are trying to solve is “how to increase the sensitivity of A/B testing.” This is an important problem for many large tech and internet companies, which rely on A/B tests to make data-driven decisions about product feature releases. By increasing metric sensitivity, companies can make precise inferences with less data (i.e., shorter experiment durations or smaller sample sizes). A typical A/B test uses an independent two-sample t-test. An intuitive understanding of what drives the magnitude of the test statistic will help you develop an issue tree and brainstorm ways to increase metric sensitivity. As a reminder, the test statistic for A/B testing is shown below. Looking at the formula, you can see that three things matter: the effect size (i.e., the difference in means between the treatment and control groups), the variance (i.e., the noise level), and the sample size. Please note that, to facilitate intuition, I have simplified the more general formula shown at the top by assuming the same variance in the treatment and control groups (i.e., pooled variance) and the same sample size in both groups.

Figure 19: Image by Minha Hwang
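For readers who cannot see the image, one standard way to write the two-sample t statistic in text form is given below. The notation is assumed and may differ from the figure: the subscripts T and C denote treatment and control, \bar{x} is a sample mean, s is a sample standard deviation, and n is a sample size.

```latex
% General form: separate variances and sample sizes per group
\[
t = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{\dfrac{s_T^2}{n_T} + \dfrac{s_C^2}{n_C}}}
\]

% Simplified form, assuming s_T = s_C = s and n_T = n_C = n
\[
t = \frac{\bar{x}_T - \bar{x}_C}{s\sqrt{2/n}}
  = \frac{\Delta}{s}\sqrt{\frac{n}{2}},
\qquad \Delta = \bar{x}_T - \bar{x}_C
\]
```

In the simplified form the three drivers are easy to read off: the statistic grows with the effect size \Delta, shrinks with the noise level s, and grows with the square root of the sample size n.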

Figure 20 shows a potential “issue tree” for the problem of increasing A/B test sensitivity.

Figure 20: Image by Minha Hwang
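To build intuition for the three levers, the sketch below evaluates the simplified statistic (equal variance and equal group sizes) for a baseline and then changes one driver at a time. The function name and the baseline numbers are arbitrary assumptions made for illustration, not results from a real experiment.

```python
import math


def t_statistic(effect, sd, n_per_group):
    """Simplified two-sample t statistic, assuming equal variance
    and equal sample size in the treatment and control groups."""
    return effect / (sd * math.sqrt(2.0 / n_per_group))


# Arbitrary baseline values, for illustration only.
base = t_statistic(effect=0.5, sd=10.0, n_per_group=1000)

scenarios = {
    "double the effect size": t_statistic(1.0, 10.0, 1000),
    "halve the standard deviation": t_statistic(0.5, 5.0, 1000),
    "quadruple the sample size": t_statistic(0.5, 10.0, 4000),
}

# Each change roughly doubles the test statistic relative to the baseline.
for name, value in scenarios.items():
    print(f"{name}: {value / base:.2f}x baseline")
```

The comparison shows why sample size is the most expensive lever: it takes four times the data to achieve what halving the noise or doubling the effect achieves, because the statistic grows only with the square root of the sample size.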

I hope you now have a good feel for how to apply a logic tree to break down a problem into more manageable chunks (i.e., sub-problems). In summary, good logic trees are (1) consistent, (2) relevant, and (3) “MECE”. Figure 21 makes this point clearer in a picture.

Figure 21: Image by Minha Hwang

Finally, I will close this section by providing a few tips on “how to make a logic tree.” Figure 22 shows the tips, together with rationales (i.e. why’s).

Figure 22: Image by Minha Hwang

Step 3: Prioritize Sub-Problems

Once you have broken down a problem into sub-problems, the next step is “prioritization”. This is essentially cutting off branches on the issue tree or hypothesis-driven tree to focus on what is most important. Figure 23 shows this process as a graphic.

Figure 23: Image by Minha Hwang

What criteria can we use for this prioritization? Potential (business) impact, technical or execution feasibility, and risk are often useful criteria. Personal or corporate values also help as a guide. On the practical side, it is not a bad idea to reference the top management agenda to ensure that your projects will receive the required support from leadership. Checking OKRs (objectives and key results) helps in this regard. Figure 24 summarizes potential criteria that you can consider.

Figure 24: Image by Minha Hwang

Figure 25 shows “Tool #3”, which is useful for this prioritization step. A prioritization matrix (a 2 x 2 matrix) with potential impact on one axis and feasibility on the other is a useful visualization tool for deciding which sub-problems to focus on. In Figure 25, I have used the John Octopus problem to make it more concrete.

Figure 25: Image by Minha Hwang
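The logic of the matrix can be sketched in a few lines. Everything specific here is an assumption for illustration: the quadrant labels, the 0 to 1 scoring scale, the threshold, and the candidate sub-problems and their scores are hypothetical and are not taken from Figure 25.

```python
def quadrant(impact, feasibility, threshold=0.5):
    """Place a sub-problem in a 2 x 2 prioritization matrix.

    Scores are assumed to lie between 0 and 1; the labels are illustrative.
    """
    high_impact = impact >= threshold
    high_feasibility = feasibility >= threshold
    if high_impact and high_feasibility:
        return "do first"
    if high_impact:
        return "plan carefully"
    if high_feasibility:
        return "quick win, low payoff"
    return "deprioritize"


# Hypothetical sub-problems with (impact, feasibility) scores.
candidates = {
    "walk more dogs": (0.8, 0.7),
    "ask for a bigger allowance": (0.6, 0.3),
    "cut small daily spending": (0.3, 0.8),
    "sell old toys": (0.2, 0.2),
}

# List candidates from most to least attractive by impact x feasibility.
ranked = sorted(candidates.items(), key=lambda kv: kv[1][0] * kv[1][1], reverse=True)
for name, (impact, feasibility) in ranked:
    print(f"{name}: {quadrant(impact, feasibility)}")
```

The scores themselves are judgment calls, so the value of the exercise lies less in the numbers than in making the team state, and debate, why a branch is believed to be high or low on each axis.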

To become more familiar with prioritization, step 3 of problem-solving, let’s use the John Octopus problem again. Figure 26 shows the upper branches of the issue tree and how you can apply prioritization to them. Figure 27 shows the same exercise for the lower branches.

Figure 26: Image by Minha Hwang
Figure 27: Image by Minha Hwang

So far, I have introduced the first 3 steps of a structured problem-solving approach, (1) define a problem, (2) structure a problem, and (3) prioritize issues, with examples to make them more concrete.

In the subsequent article (Part 2), I will describe the remaining parts of the problem-solving process in detail.

* The John Octopus example in this article is adapted from “Problem Solving 101: A Simple Book for Smart People” by Ken Watanabe. I would recommend that book, especially for school children.

Disclaimer: I am not representing McKinsey in describing this problem-solving approach. I am only sharing my own opinion.
