Simpson’s Paradox – are you telling the “right” story with your data?
Sorry to disappoint Homer Simpson’s fans, but this is about “Simpson’s Paradox”, an effect driven by a lurking variable that all of us should keep in mind as we grab company data and present our findings. The “viewpoint” you choose, that is, how the data is grouped or aggregated, can decide whether you draw the proper conclusion (or not).
1) A simple example: Lisa and Bart are rated on the percentage of New Online Customers they retain. Lisa claims that she has the best retention rate. Bart also claims that he has the best retention rate. Who is right?
New Online Customer retention percentage dashboard.
At first glance, Bart seems to be outperforming Lisa on retention in each campaign (the Wk1 and Wk2 columns). But is that the correct conclusion?
New Online Customer retention rate with “equal” sample size shown on the right.
When the “viewpoint” is expanded to include the total numbers (equal sample size), the true picture emerges: Lisa outperformed Bart overall, even though Bart had the higher rate in each individual week.
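The reversal is easier to believe once you see the arithmetic. The sketch below uses hypothetical counts (they are illustrative assumptions, not the values from the dashboard above), chosen so that each rep handles 10 customers in total but splits them differently across the two weeks. The names `counts`, `weekly_rate` and `overall_rate` are made up for this example.

```python
from fractions import Fraction

# Hypothetical (retained, total) counts per rep and week.
# These are illustrative assumptions, NOT the dashboard's real numbers.
counts = {
    "Lisa": {"Wk1": (0, 3), "Wk2": (5, 7)},
    "Bart": {"Wk1": (1, 7), "Wk2": (3, 3)},
}


def weekly_rate(rep, week):
    """Retention rate for one rep in one week."""
    retained, total = counts[rep][week]
    return Fraction(retained, total)


def overall_rate(rep):
    """Retention rate for one rep with both weeks pooled together."""
    retained = sum(r for r, _ in counts[rep].values())
    total = sum(t for _, t in counts[rep].values())
    return Fraction(retained, total)


for week in ("Wk1", "Wk2"):
    for rep in counts:
        print(f"{week} {rep}: {float(weekly_rate(rep, week)):.0%}")

for rep in counts:
    print(f"Overall {rep}: {float(overall_rate(rep)):.0%}")
```

With these made-up counts Bart has the higher rate in both weeks, yet Lisa has the higher pooled rate. The lurking variable is how many customers each rep handled in the easy week versus the hard week.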
2) A real example: In the 1970s, UC Berkeley faced allegations of sex discrimination because the graduate school admitted only 35% of the women who applied, compared with 44% of the men.
But the investigators found that women were slightly “more favored”. How can that be?
The data from the six largest departments
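By breaking the data out by department, the investigators concluded that women were slightly more favored, and that the women “tended to apply to competitive departments with low rates of admission” whereas “men tended to apply to less-competitive departments with high rates of admission”. The overall rate for each group is an average of the department rates weighted by where that group applied, so a group concentrated in low-admission departments can trail overall while matching or beating the other group inside every department.
We need to look at the data from multiple “viewpoints”, both aggregated and broken out by any likely lurking variable, so that we do not fall into this trap.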
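A small sketch can make the department effect concrete. It treats each group's overall admission rate as a weighted average of department rates, where the weights are the share of that group's applications going to each department. The two departments and all counts below are hypothetical assumptions for illustration, not the Berkeley figures, and the function names are invented for this example.

```python
from fractions import Fraction

# Hypothetical (applied, admitted) counts per department and group.
# Illustrative assumptions only, NOT the actual Berkeley data.
applications = {
    "Easy": {"men": (800, 480), "women": (100, 65)},
    "Hard": {"men": (200, 40), "women": (900, 225)},
}


def dept_rate(dept, group):
    """Admission rate for one group within one department."""
    applied, admitted = applications[dept][group]
    return Fraction(admitted, applied)


def application_share(dept, group):
    """Fraction of a group's applications that went to this department."""
    group_total = sum(applications[d][group][0] for d in applications)
    return Fraction(applications[dept][group][0], group_total)


def overall_rate(group):
    """Overall rate as the application-weighted average of department rates."""
    return sum(
        application_share(d, group) * dept_rate(d, group) for d in applications
    )


for dept in applications:
    for group in ("men", "women"):
        print(f"{dept} {group}: {float(dept_rate(dept, group)):.0%}")

for group in ("men", "women"):
    print(f"Overall {group}: {float(overall_rate(group)):.0%}")
```

In this made-up data women have the higher admission rate in both departments, but 90% of their applications go to the low-admission department, so their overall rate ends up well below the men's.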