Monday, August 25, 2014

Causing Correlation



One of the biggest misconceptions about data is that any relationship between two variables is a meaningful one.  We are quick to assume that when two variables seem related (e.g. murder rates rise as temperatures rise), one must cause the other.  This is all too common!
Often this relationship is merely coincidental; if you look at enough combinations of variables, eventually (by chance) you will find a few that happen to correlate.  Other times, the two variables really are related, but a “confounding” variable is the true “cause”.
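To see the "by chance" part in action, here is a minimal sketch using only Python's standard library. Everything in it is made up for illustration: the 40 variables, the 10 data points each, the 0.7 cutoff for a "strong" correlation, and the random seed are all my own assumptions. Each series is pure random noise, so any strong correlation it finds is a coincidence by construction.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative assumptions: 40 unrelated variables, 10 observations each.
rng = random.Random(0)
n_variables, n_points, threshold = 40, 10, 0.7
series = [[rng.gauss(0, 1) for _ in range(n_points)]
          for _ in range(n_variables)]

# Compare every variable against every other one and keep the "strong" pairs.
strong = []
for i in range(n_variables):
    for j in range(i + 1, n_variables):
        r = pearson(series[i], series[j])
        if abs(r) > threshold:
            strong.append((i, j, r))

n_pairs = n_variables * (n_variables - 1) // 2
print(f"{len(strong)} of {n_pairs} unrelated pairs have |r| > {threshold}")
```

With only 40 variables there are already 780 pairs to compare, which is why a handful of them end up looking related even though none of them are.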
A made-up example: you might find a negative relationship between years of education and life expectancy, implying that the more schooling you get, the younger you will die, when in fact the confounding factor may be that those with more education, such as doctors, tend to work longer hours and carry more stress in their careers.
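The same idea can be sketched with simulated data. This is a toy model under loud assumptions, not real data: a hidden variable `z` (think "stress") pushes `x` up and `y` down, while `x` and `y` have no direct effect on each other. The names, effect sizes, sample size, and seed are all hypothetical. The raw correlation between `x` and `y` comes out clearly negative, but once the straight-line effect of `z` is removed from both, it should drop to roughly zero.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(ys, zs):
    """What is left of ys after removing a least-squares line fitted on zs."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_z = sum(zs) / n
    slope = (sum((z - mean_z) * (y - mean_y) for y, z in zip(ys, zs))
             / sum((z - mean_z) ** 2 for z in zs))
    return [(y - mean_y) - slope * (z - mean_z) for y, z in zip(ys, zs)]


# Hypothetical toy model: z drives both x and y; x never touches y directly.
rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]    # z pushes x up
y = [-zi + rng.gauss(0, 1) for zi in z]   # z pushes y down

raw_r = pearson(x, y)
adjusted_r = pearson(residuals(x, z), residuals(y, z))
print(f"raw correlation of x and y:      {raw_r:.2f}")
print(f"after accounting for hidden z:   {adjusted_r:.2f}")
```

The catch, of course, is that in real life you have to know (or guess) what `z` is before you can account for it.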
Although my favorite is #10, this list brings together a multitude of examples demonstrating why you should never conclude that a relationship is meaningful just because a graph suggests one!
P.S. Make sure to Google the “flying spaghetti monster”

Does anyone have any good examples of causation implied by correlation?
