{"id":312221,"date":"2021-07-12T13:05:10","date_gmt":"2021-07-12T11:05:10","guid":{"rendered":"https:\/\/www.scribbr.nl\/?p=312221"},"modified":"2023-06-22T09:58:05","modified_gmt":"2023-06-22T07:58:05","slug":"correlation-vs-causation","status":"publish","type":"post","link":"https:\/\/www.scribbr.com\/methodology\/correlation-vs-causation\/","title":{"rendered":"Correlation vs. Causation | Difference, Designs & Examples"},"content":{"rendered":"
Correlation<\/strong> means there is a statistical association between variables. Causation<\/strong> means that a change in one variable causes a change in another variable.<\/p>\n In research, you might have come across the phrase \u201ccorrelation doesn\u2019t imply causation.\u201d Correlation and causation are two related ideas, but understanding their differences will help you critically evaluate sources<\/a> and interpret scientific research.<\/p>\n <\/p>\n Correlation<\/strong> describes an association between types of variables<\/a>: when one variable changes, so does the other. A correlation is a statistical indicator<\/a> of the relationship between variables. These variables change together: they covary. But this covariation isn\u2019t necessarily due to a direct or indirect causal link.<\/p>\n Causation<\/strong> means that changes in one variable bring about changes in the other; there is a cause-and-effect relationship between variables. The two variables are correlated with each other and there is also a causal link between them.<\/p>\n There are two main reasons why correlation isn\u2019t causation. Identifying these problems is important for drawing sound scientific conclusions from research.<\/p>\n The third variable problem<\/strong> means that a confounding variable<\/a> affects both variables to make them seem causally related when they are not. For example, ice cream sales and violent crime rates are closely correlated, but they are not causally linked with each other. Instead, hot temperatures, a third variable, affect both variables separately. Failing to account for third variables can allow research biases<\/a> to creep into your work.<\/p>\n The directionality problem<\/strong> occurs when two variables correlate and might actually have a causal relationship, but it\u2019s impossible to conclude which variable causes changes in the other. 
For example, vitamin D levels are correlated with depression, but it\u2019s not clear whether low vitamin D causes depression, or whether depression causes reduced vitamin D intake.<\/p>\n You\u2019ll need to use an appropriate research design<\/a> to distinguish between correlational and causal relationships:<\/p>\n In a correlational research design, you collect data on your variables without manipulating them.<\/p>\n You find that physical activity level is positively correlated with self esteem: lower levels of physical activity are associated with lower self esteem, while higher levels of physical activity are associated with higher self esteem.<\/figure>\n Correlational research is usually high in external validity<\/a>, so you can generalize<\/a> your findings to real-life settings. But these studies are low in internal validity<\/a>, which makes it difficult to causally connect changes in one variable to changes in the other.<\/p>\n These research designs are commonly used when it\u2019s unethical, too costly, or too difficult to perform controlled experiments. They are also used to study relationships that aren\u2019t expected to be causal.<\/p>\n You find a positive correlation between the variables: children who spend more time playing violent video games have higher rates of aggressive behavior.<\/figure>\n Without controlled experiments, it\u2019s hard to say whether the variable you\u2019re interested in caused the changes in another variable. Extraneous variables<\/a> are any third variables<\/strong> or omitted variables<\/a>, other than your variables of interest, that could affect your results.<\/p>\n Limited control<\/a> in correlational research means that extraneous or confounding variables can serve as alternative explanations for the results. 
Confounding variables can make it seem as though a correlational relationship is causal when it isn\u2019t.<\/p>\n But it\u2019s not something you control for, so you can only draw a conclusion of correlation between your main variables.<\/figure>\n When two variables are correlated, all you can say is that changes in one variable occur alongside changes in the other.<\/p>\n Regression to the mean<\/a><\/strong> (RTM) is observed when variables that are extremely high or extremely low relative to the average on the first measurement move closer to the average on the second measurement. Particularly in research that intentionally focuses on the most extreme cases or events, RTM should always be considered as a possible cause of an observed change.<\/p>\n Players or teams featured on the cover of SI<\/em> have earned their place by performing exceptionally well. But athletic success is a mix of skill and luck, and even the best players don\u2019t always win.<\/p>\n Chances are that good luck will not continue indefinitely, and neither can exceptional success.<\/p>\n In other words, due to RTM, a great performance is more likely to be followed by a mediocre one than by another great one, giving the impression that appearing on the cover brings bad luck.<\/figure>\n A spurious correlation<\/strong> occurs when two variables appear to be related because of hidden third variables or simply by coincidence.<\/p>\n The Theory of the Stork<\/a> draws a simple causal link between the variables to argue that storks physically deliver babies. This satirical study shows why you can\u2019t conclude causation from correlational research alone.<\/p>\n In reality, the correlation may be explained by third variables (such as weather patterns, environmental developments, etc.) 
that caused an increase in both the stork and human populations, or the link may be purely coincidental.<\/figure>\n When you analyze correlations in a large dataset with many variables, the chances of finding at least one statistically significant<\/a> result are high. In this case, you\u2019re more likely to make a type I error<\/a>. This means erroneously concluding there is a true correlation between variables in the population<\/a> based on skewed<\/a> sample data.<\/p>\n To demonstrate causation, you need to show a directional relationship<\/strong> with no alternative explanations. This relationship can be unidirectional, with one variable impacting the other, or bidirectional, where both variables impact each other.<\/p>\n A correlational design won\u2019t be able to distinguish between any of these possibilities, but an experimental design can test each possible direction, one at a time.<\/p>\n In correlational research, the directionality of a relationship is unclear because there is limited researcher control. You might risk concluding reverse causality, the wrong direction of the relationship.<\/p>\n Causal links between variables can only be truly demonstrated with controlled experiments<\/a>. 
Experiments test formal predictions, called hypotheses<\/a>, to establish causality in one direction at a time.<\/p>\n Experiments are high in internal validity<\/a>, so cause-and-effect relationships can be demonstrated with reasonable confidence.<\/p>\n You can establish directionality in one direction because you manipulate an independent variable<\/a> before measuring the change in a dependent variable.<\/p>\n To test whether this relationship is bidirectional, you\u2019ll need to design a new experiment assessing whether self esteem can impact physical activity level.<\/figure>\n In a controlled experiment, you can also eliminate the influence of third variables by using random assignment and control groups.<\/p>\n Random assignment<\/a> helps distribute participant characteristics evenly between groups so that they\u2019re similar and comparable. A control group<\/a> lets you compare the experimental manipulation to a similar treatment or no treatment (or a placebo, to control for the placebo effect<\/a>).<\/p>\n The control group receives an unrelated, comparable intervention, while the experimental group receives the physical activity intervention. By keeping all variables constant between groups, except for your independent variable treatment, any differences between groups can be attributed to your intervention.<\/figure>\n If you want to know more about statistics<\/a>, methodology<\/a>, or research bias<\/a>, make sure to check out some of our other articles with explanations and examples.<\/p>\n <\/em>Statistics<\/strong><\/p>\n <\/em> Methodology<\/strong><\/p>\n <\/em> Research bias<\/strong><\/p>\n A correlation<\/a> reflects the strength and\/or direction of the association between two or more variables.<\/p>\n Correlation<\/b> describes an association between <\/span>variables<\/span><\/a>: when one variable changes, so does the other. 
A correlation is a statistical indicator of the relationship between variables.<\/span><\/p>\n Causation<\/b><\/a> means that changes in one variable bring about changes in the other (i.e., there is a cause-and-effect relationship between variables). The two variables are correlated with each other, and there\u2019s also a causal link between them.<\/span><\/p>\n While causation and correlation can exist simultaneously, correlation does not imply causation. In other words, correlation is simply a relationship where A relates to B, but A doesn\u2019t necessarily cause B to happen (or vice versa). Mistaking correlation for causation is a common error and can lead to the <\/span>false cause fallacy<\/b><\/a>.<\/p>\n\n <\/div>\n <\/dd>\n <\/div>\n The third variable and directionality problems are two main reasons why correlation isn\u2019t causation<\/a>.<\/p>\n The third variable<\/strong> problem means that a confounding variable<\/a> affects both variables to make them seem causally related when they are not.<\/p>\n The directionality problem<\/strong> occurs when two variables correlate and might actually have a causal relationship, but it\u2019s impossible to conclude which variable causes changes in the other.<\/p>\n\n <\/div>\n <\/dd>\n <\/div>\n Controlled experiments<\/a> establish causality, whereas correlational studies<\/a> only show associations between variables.<\/p>\n In general, correlational research is high in external validity<\/a> while experimental research is high in internal validity<\/a>.<\/p>\n\n <\/div>\n <\/dd>\n <\/div>\n <\/dl>\n","protected":false},"excerpt":{"rendered":" Correlation means there is a statistical association between variables. Causation means that a change in one variable causes a change in another variable. 
In research, you might have come across the phrase \u201ccorrelation doesn\u2019t imply causation.\u201d Correlation and causation are two related ideas, but understanding their differences will help you critically evaluate sources and interpret […]<\/p>\n","protected":false},"author":115,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"","_relevanssi_noindex_reason":""},"categories":[23650],"tags":[],"acf":[],"yoast_head":"What\u2019s the difference?<\/h2>\n
Why doesn\u2019t correlation mean causation?<\/h2>\n
\n
Correlational research<\/h2>\n
Third variable problem<\/h2>\n
Regression to the mean<\/h2>\n
Spurious correlations<\/h2>\n
Directionality problem<\/h2>\n
\n
Causal research<\/h2>\n
Other interesting articles<\/h2>\n
\n
\n
\n
Frequently asked questions about correlation and causation<\/h2>\n
\n
\n
\n