Do you only use data points like ‘mean’ or ‘ratio’ to measure your business? If so, you are probably measuring your business wrong.
I’ll use two cases to show you why focusing on only a single data point can be dangerous, and what you should do instead.
We use averages a lot when analyzing product and business performance, but using the average alone creates blind spots. There is always variation, whether from different segments of the market or from pure randomness, and the average value doesn’t tell you how much the underlying stories vary.
Example: How many products do our customers buy on average?
A company is trying to understand the average number of items purchased by a customer. For New York and LA, they found that the average number of items purchased per customer is the same (45 items).
Now, based on the plot below, should we apply the same marketing strategy for the customers in New York and LA?
In LA (green line), 85% of customers purchased 40–50 items, which means the average (45) represents most customers’ behavior. You might only need one big campaign to target the majority.
However, in New York, the average value can only represent 50% of the customers’ behavior. The majority of customers, say 85%, fall anywhere between 10 and 80 items purchased, which we can observe from the large ‘spread’ of the data shown by the orange ‘dumbbell’-shaped line.
This means the customers in New York have more variance than those in LA, and you probably need multiple campaign strategies for New York, where customers’ behaviors are more diverse.
What should we do?
Find out the range around the mean by calculating variance.
Typically, data scientists report a Confidence Interval (CI), which estimates the range where the true average lies with a given probability. To describe where most customers fall, report a range of the data itself, such as the values that cover the middle 85% of customers. (This link can help you construct a Confidence Interval, and you can create it within Excel.)
An example for reporting is: the mean number of items purchased per customer in New York is 45, and 85% of customers purchased between 10 and 80 items.
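To make the difference between the two cities concrete, here is a minimal Python sketch. The purchase counts are made-up numbers chosen so that both cities average 45 items; they are not the data behind the plot. The helper names (`percentile`, `summarize`) are also my own. For each city it reports the mean, the standard deviation, a range covering the middle 85% of customers, and a rough 95% confidence interval for the mean (using a normal approximation, which is crude for samples this small).

```python
import math
import statistics


def percentile(sorted_values, p):
    """Linearly interpolated percentile of pre-sorted data, with p in [0, 1]."""
    pos = p * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def summarize(purchases, coverage=0.85, z=1.96):
    """Summarize items purchased per customer: center, spread, and ranges."""
    data = sorted(purchases)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample standard deviation

    # Range of the data itself: where the middle `coverage` share of customers falls.
    tail = (1 - coverage) / 2
    central_range = (percentile(data, tail), percentile(data, 1 - tail))

    # Confidence interval for the mean (normal approximation, an assumption here).
    margin = z * sd / math.sqrt(len(data))
    return {
        "mean": mean,
        "sd": sd,
        "central_range": central_range,
        "mean_ci": (mean - margin, mean + margin),
    }


# Hypothetical purchase counts: same mean, very different spread.
la = [40, 42, 44, 45, 45, 46, 48, 50]
ny = [8, 10, 12, 15, 75, 78, 80, 82]

for city, purchases in (("LA", la), ("New York", ny)):
    s = summarize(purchases)
    low, high = s["central_range"]
    print(f"{city}: mean={s['mean']:.1f}, sd={s['sd']:.1f}, "
          f"middle 85% of customers: {low:.1f} to {high:.1f}")
```

Both cities print the same mean, but the New York range is many times wider, which is exactly the information the average hides.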
A ratio metric is built from at least two metrics; for example, Click-Through Rate (CTR) is Clicks divided by Views. Because each of those metrics varies on its own, the variation of a ratio metric is more complicated, and it doesn’t follow any common distribution.
Let’s look at the table below first. You are measuring Click-Through Rate, and from this table it looks like Click-Through Rate increased from Jan to Feb. Sounds great?
Well, actually both Clicks and Views decreased; the rate went up only because Views decreased more. So this increase is probably not what you want.
Now, let’s look at 4 more scenarios to see how Click-Through Rate changes when we control one variable and change the other. Can we trust the ratio with the same level of certainty in each scenario?
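Here is a small Python sketch of that situation. The Jan and Feb numbers are hypothetical, picked only to reproduce the pattern (they are not the values from the table), and `ctr` and `pct_change` are illustrative helper names.

```python
def ctr(clicks, views):
    """Click-Through Rate: clicks divided by views."""
    return clicks / views


def pct_change(before, after):
    """Relative change from `before` to `after` (e.g. -0.2 means down 20%)."""
    return (after - before) / before


# Hypothetical monthly totals: both metrics fall, but views fall faster.
jan = {"clicks": 1000, "views": 20000}
feb = {"clicks": 800, "views": 10000}

print(f"CTR Jan: {ctr(**jan):.1%}, CTR Feb: {ctr(**feb):.1%}")
print(f"Clicks change: {pct_change(jan['clicks'], feb['clicks']):+.0%}")
print(f"Views change:  {pct_change(jan['views'], feb['views']):+.0%}")
```

Reporting the change in Clicks and Views next to the CTR makes it obvious when a ‘better’ ratio is really a shrinking audience.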
The Left Table shows that if the denominator (Views) is stable, the ratio metric moves proportionally with the numerator (Clicks). The uncertainty of the data is easy to estimate, and its scale doesn’t change much.
In the Right Table, when the denominator (Views) is large enough, as shown in the first few rows, the ratio (CTR) is very stable, with only 1–2% uncertainty. However, if you look at the bottom rows, the ratio becomes unstable and very sensitive to changes when the denominator is small! When this is the case, it’s better to monitor Views and Clicks directly and expect a wide range of scenarios when you make decisions.
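One way to put a number on this is an interval around the CTR that widens as Views shrink. The sketch below uses the Wilson score interval, which is my choice for illustration rather than the method behind the tables, and it assumes each view is an independent yes/no trial. The view counts and the 5% CTR are hypothetical.

```python
import math


def wilson_interval(clicks, views, z=1.96):
    """Approximate 95% Wilson score interval for a rate such as CTR."""
    if views <= 0:
        raise ValueError("views must be positive")
    p = clicks / views
    denom = 1 + z**2 / views
    center = (p + z**2 / (2 * views)) / denom
    half_width = z * math.sqrt(p * (1 - p) / views + z**2 / (4 * views**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Same CTR (5%) at very different traffic levels.
for views in (100_000, 10_000, 1_000, 100, 20):
    clicks = round(views * 0.05)
    low, high = wilson_interval(clicks, views)
    print(f"views={views:>7}  clicks={clicks:>5}  "
          f"CTR={clicks / views:.3f}  interval=({low:.3f}, {high:.3f})")
```

The CTR is identical on every row, yet the interval goes from a fraction of a percentage point wide to tens of points wide as the denominator gets small.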
What should we do?
- Set a threshold for the minimum acceptable value of the denominator. As the ratio can have great variance when the denominator is small, we only trust the ratio when the denominator is large enough. If you have to use the ratio metric to make decisions when the denominator is small, make sure you report a range that covers the fluctuation.
- Monitor the actual values (numerator, denominator) that we use for the ratio calculation. Understand the range of the ratio by simulating different scenarios of the numerator and denominator.
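The two suggestions above can be sketched together in a few lines of Python. Everything here is an assumption for illustration: the `MIN_VIEWS` threshold, the size of the shifts, and the function names. The idea is to nudge Clicks and Views up and down by a fixed amount, see how far the CTR moves, and flag the ratio as untrusted when Views fall below the threshold.

```python
from itertools import product

MIN_VIEWS = 1000  # hypothetical threshold; pick one that fits your own traffic


def scenario_range(clicks, views, click_shift=5, view_shift=50):
    """Lowest and highest CTR when clicks and views each move by a fixed amount."""
    click_options = [max(clicks + d, 0) for d in (-click_shift, 0, click_shift)]
    view_options = [max(views + d, 1) for d in (-view_shift, 0, view_shift)]
    # Clicks can never exceed views, so cap each scenario at a CTR of 1.
    ratios = [min(c, v) / v for c, v in product(click_options, view_options)]
    return min(ratios), max(ratios)


def report_ctr(clicks, views, min_views=MIN_VIEWS):
    """Report the raw counts, the ratio, its scenario range, and a trust flag."""
    return {
        "clicks": clicks,
        "views": views,
        "ctr": clicks / views,
        "scenario_range": scenario_range(clicks, views),
        "trusted": views >= min_views,
    }


for clicks, views in ((5000, 100_000), (5, 100)):
    r = report_ctr(clicks, views)
    low, high = r["scenario_range"]
    print(f"views={views:>7}  CTR={r['ctr']:.3f}  "
          f"range=({low:.3f}, {high:.3f})  trusted={r['trusted']}")
```

Both rows have a 5% CTR, but the same small shift in Clicks and Views barely moves the first one and swings the second one across a wide range, so only the first would pass the threshold.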
Data analytics is not just calculation, it’s also the measurement of uncertainty
While summary statistics like the mean or ratio metrics help us ‘Zoom Out’ and see the big picture of our data and our business, we also need to ‘Zoom In’ on the range and shape of the data to make sure we understand the uncertainty associated with the metrics.
- A data point is NOT enough! Create a range around it, and use variance to estimate the uncertainty or to spot different segments in the data.
- If your metric is a ratio like Click-Through Rate, analyze different scenarios to see how the metric changes as the denominator and numerator change. Be careful if your denominator is small: the ratio becomes more sensitive to changes in the data and might not be reliable!
- Visualize the data to make sure you don’t miss any patterns or outliers.