Statistics Can Be Tricky

Hi everyone! In this post I’d like to talk about Simpson’s Paradox. The Wikipedia article is a helpful starting point if you want to know more: http://en.wikipedia.org/wiki/Simpson’s_paradox

What is Simpson’s Paradox? In my own words, it is what happens when a trend that shows up in the combined data reverses once the data is split into groups, so the conclusion you would draw flips when you look at the data more carefully. Consider this real-life example, which I took from Wikipedia, about the passage of the Civil Rights Act of 1964 in the United States. Overall, a larger fraction of Republican legislators voted in favor of the Act than Democrats. However, when the congressional delegations from the northern and southern states are considered separately, a larger fraction of Democrats voted in favor of the Act in both regions.

House      Democrat         Republican
Northern   94% (145/154)    85% (138/162)
Southern    7% (7/94)        0% (0/10)
Both       61% (152/248)    80% (138/172)
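
To check the arithmetic, here is a minimal Python sketch that recomputes the fractions from the House table. The vote counts are copied from the table; the names `votes`, `rate`, and `pooled` are just labels I made up for this illustration.

```python
from fractions import Fraction

# (yes votes, total votes) per region and party, copied from the House table
votes = {
    "Northern": {"Democrat": (145, 154), "Republican": (138, 162)},
    "Southern": {"Democrat": (7, 94), "Republican": (0, 10)},
}


def rate(yes, total):
    """Exact fraction of legislators who voted in favor."""
    return Fraction(yes, total)


def pooled(party):
    """Add up yes votes and totals for one party across both regions."""
    yes = sum(votes[region][party][0] for region in votes)
    total = sum(votes[region][party][1] for region in votes)
    return yes, total


# Within each region, compare the two parties
for region, parties in votes.items():
    dem = rate(*parties["Democrat"])
    rep = rate(*parties["Republican"])
    print(f"{region}: Democrat {float(dem):.0%}, Republican {float(rep):.0%}")

# Across both regions combined, the comparison flips
dem_all = rate(*pooled("Democrat"))
rep_all = rate(*pooled("Republican"))
print(f"Both: Democrat {float(dem_all):.0%}, Republican {float(rep_all):.0%}")
```

The regional rows favor the Democrats, yet the pooled row favors the Republicans. The reason is the weighting: 94 of the 248 Democrats came from the South, where support was very low, while only 10 of the 172 Republicans did.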

 

We discussed this in my Statistics 1 class, and I suppose all professional statisticians know about it. What can we take away from this? At the very least, we can be more careful when we read statistical reports in the news or anywhere else. Coming back to the earlier example: if I were a journalist given that data, I would have two ways to present it, depending on which way I wanted to influence the public (toward the Democrats or toward the Republicans). I could report the combined figures, which favor the Republicans, or the regional figures, which favor the Democrats. As far as I know, statisticians and journalists do this all the time, i.e. they slice the data in a particular way and use it to support a particular opinion or claim.

Another common issue in statistics, which my engineering professor always mentioned in class, is significance. I’m sure we’ve all seen articles saying something like “Chocolate lovers have a lower risk of heart attack” or “Contrary to popular belief, [a product or anything] is actually [the new claim]”. Often they mention that a study was done at some university, that a certain number of participants took part, and that the results are statistically significant in favor of the new claim. But sometimes they do not tell you what the significance level is. The significance level is normally denoted by the Greek letter alpha (α), and common values are 1%, 5%, and 10%. Different significance levels can lead to different conclusions: claim A may be significant at the 10% level but not at the 1% level. Again, statistics can be tricky, and we should be a little more careful!
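
The effect of the significance level can be sketched in a few lines of Python. This is only an illustration under assumptions: the test statistic `z = 1.8` is a made-up number, not taken from any real study, and I assume a two-sided z-test with a standard normal null distribution. `NormalDist` comes from Python's standard `statistics` module (Python 3.8 or later); `two_sided_p_value` is a helper name I chose.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


z = 1.8  # hypothetical test statistic, NOT from a real study
p = two_sided_p_value(z)

# The same p-value is judged against three common significance levels
decisions = {alpha: p < alpha for alpha in (0.01, 0.05, 0.10)}

for alpha, significant in decisions.items():
    verdict = "significant" if significant else "not significant"
    print(f"alpha = {alpha:.0%}: {verdict} (p = {p:.3f})")
```

With this made-up statistic the p-value is about 0.072, so the very same result counts as significant at the 10% level but not at the 5% or 1% level. That is why a report that says "significant" without stating alpha leaves out something important.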

  1. April 8, 2012 at 7:33 pm

    Statistics has always been, and always will be, tricky and controversial. In fact, statistics is the main science where it is so easy to cheat or lie, masking the lie as a very reasonable and solid explanation of scientific research. Just remember how often we read “recently, a group of British scientists from Cambridge discovered…blah, blah, blah…” People have a tendency to read and believe everything, without thinking critically about how real the information is. Many companies exploit the roughness of statistical facts to make a profit. For example, a tobacco company organized a whole scientific laboratory that was supposed to explore all the harmful effects of tobacco. Do you know what kind of research they did? It was similar to: “Recent studies of the Academy of Tobacco found that smoking worsens the medical condition of a person with Parkinson’s disease”. Great job, guys! Why don’t you mention or talk about some real problems associated with tobacco, such as a 50% increase in the probability of lung cancer by the age of 40, or the realities of cancer incidence among kids who smoke? Statistics, as a science, has many different and sometimes not obvious tricks to present information in the way that a particular group of people is interested in. It does not mean that it is lying; it just emphasizes some parts of studies more than others. A person should be very careful when reading and thinking about published statistical data, and always think about who benefits from the way the information is presented.

  2. Leon
    April 23, 2012 at 1:35 am

    From a journalistic perspective, it’s a mistake (or dishonest) to involve party politics in a summary of the final passage of the bill, as clearly geography had a lot more to do with it than party. The fact that it is also an example of Simpson’s Paradox is a very interesting aside, but really not the key historical feature. If you want a fair journalistic summary, I’d say “Democrats wrote, sponsored, and got the Civil Rights Act of 1964 to a vote, and then the north voted for it, and the south voted against it.”

