Correlation Coefficients
Correlation coefficients are essential descriptive statistical tools that quantify the strength and direction of the relationship between two variables. When evaluating two variables, it is important first to determine what kind of relationship, if any, they have:
- Linear relationship: the two variables change together at a constant rate, either in the same direction (both increasing or both decreasing) or in opposite directions (one increasing while the other decreases). A linear relationship is a trend in the data that can be modeled by a straight line.
- Nonlinear monotonic relationship: as one variable increases, the other consistently moves in one direction (always increasing or always decreasing), but not at a constant, steady rate.
- No relationship: knowing the value of one variable provides no information about the value of the other variable.
Scatter Plot
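One way to get a general idea of whether two variables are related is to plot them against each other on a “scatter plot,” with one variable on the horizontal (x) axis and the other on the vertical (y) axis, so that each point represents one pair of measurements.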
Selecting the Correlation Coefficient
Examining the distribution and pattern of data points in scatter plots is crucial for choosing the appropriate correlation test.
- When a scatter plot shows data points that roughly follow a straight line pattern with relatively even distribution around that line, Pearson correlation is appropriate. Look for data that appears to cluster around an imaginary straight line, whether it slopes upward (positive correlation) or downward (negative correlation). There are online calculators for the Pearson correlation. Spreadsheet software can calculate the Pearson correlation using the formula =PEARSON(array1, array2).
- If a scatter plot reveals a curved relationship that still moves consistently in one direction, such as an exponential growth pattern, Spearman rank correlation is more suitable because it can detect these monotonic but nonlinear patterns. There are online calculators for the Spearman rank correlation, and spreadsheet software can also calculate it in two steps. First, rank each data set individually. To rank values in Excel, use the formula =RANK.EQ(number, ref, [order]), where number is the value being ranked, ref is the range of values to rank against, and order sets the ranking direction (1 for ascending; 0 or omitted for descending). Rank both data sets in the same direction. If a data set contains tied values, =RANK.AVG(number, ref, [order]) gives each tied value the average of the ranks it spans, which is the standard way to handle ties in a Spearman correlation. Once each data set has been ranked, use the formula =PEARSON(array1, array2) with the ranked data.
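The two spreadsheet routes above can be reproduced in a short Python sketch, which may be handy for double-checking a calculator or spreadsheet result. The helper names pearson, average_ranks, and spearman are hypothetical (not from any library), and the data are made-up values mimicking exponential growth. The ranking step ranks in ascending order and gives tied values the average of their ranks, as =RANK.AVG does. Python 3.10 and later also provide statistics.correlation for the Pearson coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient (the value =PEARSON(array1, array2) returns)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means, and each variable's sum of squares
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variation has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson coefficient computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: a population that doubles each day (curved but always rising)
days = [1, 2, 3, 4, 5, 6]
counts = [2, 4, 8, 16, 32, 64]
print("Pearson: ", round(pearson(days, counts), 3))
print("Spearman:", round(spearman(days, counts), 3))
```

On this curved but steadily rising data, the Pearson coefficient is roughly 0.91 while the Spearman coefficient is exactly 1.00, because the ranks of the two variables line up perfectly even though the points do not lie on a straight line.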
Interpreting Correlation Coefficients
Correlation Coefficients Communicate the Direction of a Relationship:
- If a correlation coefficient is a negative number, there is an inverse (indirect), negative relationship between the two variables. A negative relationship means that as values of one variable increase (go up), values of the other variable tend to decrease (go down) in a predictable manner.
- If a correlation coefficient is a positive number, there is a direct, positive relationship between the two variables. A positive relationship means that both variables tend to move in the same direction in a predictable manner: if one increases, so does the other, and if one decreases, so does the other.
Correlation Coefficients Always Fall Between -1.00 and +1.00:
- A correlation coefficient of -1.00 indicates a perfect negative relationship between the two variables. This means that as values of one variable increase, there is a perfectly predictable decrease in values of the other variable (for the Pearson correlation, every point on the scatter plot falls exactly on a downward-sloping straight line). In other words, as one variable goes up, the other goes in the opposite direction (it goes down).
- A correlation coefficient of +1.00 indicates a perfect positive relationship between the two variables. This means that as values of one variable increase, there is a perfectly predictable increase in values of the other variable (for the Pearson correlation, every point falls exactly on an upward-sloping straight line). In other words, as one variable goes up, so does the other.
- The closer a correlation coefficient approaches plus or minus 1.00, the stronger the relationship is and the more accurately one can predict what happens to one variable based on the knowledge of the other variable.
- Generally, values from 0.7 to 1.0 (or -0.7 to -1.0) indicate strong correlations, values from 0.3 to 0.7 (or -0.3 to -0.7) suggest moderate correlations, and values from 0.0 to 0.3 (or 0.0 to -0.3) represent weak correlations. In biological systems, even moderate correlations can be biologically meaningful because of the complexity of living organisms and their environments.
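- A correlation coefficient of 0.00 indicates zero correlation: no linear relationship (for the Pearson correlation) or no monotonic relationship (for the Spearman rank correlation) between the two variables. In other words, knowing whether one variable goes up or down tells you nothing about which way the other variable tends to move. A coefficient near zero does not rule out every kind of relationship; for example, a symmetric U-shaped pattern on a scatter plot can give a Pearson correlation close to 0.00.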
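The direction and strength rules above can be condensed into a small lookup, sketched here in Python. The function name describe_correlation is hypothetical, the cutoffs are the general guideline from the text rather than a universal standard, and placing a value that sits exactly on 0.3 or 0.7 into the stronger band is an assumption of this sketch.

```python
def describe_correlation(r):
    """Label the direction and strength of a correlation coefficient r.

    Cutoffs follow the rule of thumb in the text; putting a value that sits
    exactly on 0.3 or 0.7 into the stronger band is an assumption of this sketch.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1.00 and +1.00")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"  # the sign gives the direction
    size = abs(r)  # the magnitude gives the strength
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


# Hypothetical coefficients, for illustration only
for r in (-1.0, -0.85, 0.45, 0.12, 0.0):
    print(r, describe_correlation(r))
```

As the text notes, a "moderate" label from a rule like this can still describe a biologically meaningful relationship.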
Making Statistical Inferences from Correlation Coefficients:
The correlation coefficient can be either a descriptive or an inferential statistic, depending on how it is used. It is a descriptive statistic when it simply summarizes the strength and direction of the relationship between two variables within a specific dataset. It is an inferential statistic when it is used to decide whether an observed correlation could simply be a chance occurrence in the sample or whether it likely reflects a true relationship in the population. As with other inferential statistical tests, testing the significance of a correlation involves two hypotheses:
- Null Hypothesis (H0): "There is no significant correlation between the two variables; any observed trend or relationship may be due to chance and sampling error." For example:
- Alternative Hypothesis (H1): "There is a significant correlation between the two variables; the observed trend or relationship is most likely not due to chance or sampling error." For example:
Using the calculated correlation coefficient and the number of pairs of data being correlated, online calculators can determine whether the correlation is statistically significant (in other words, whether it is likely to generalize to the larger population). The calculator returns a "p-value," which represents the probability of obtaining a correlation at least as strong as the one calculated purely by chance, assuming there is actually no true relationship between the variables.
- A low p-value (typically below 0.05) suggests the correlation is statistically significant: a correlation this strong would be unlikely to occur by chance if there were no real relationship, so the data support a real relationship between the variables. Reject the null hypothesis.
- A high p-value (typically 0.05 or above) suggests the correlation is not statistically significant: it could have occurred by chance, so the data do not provide convincing evidence of a real relationship between the variables. Fail to reject the null hypothesis.
A weak correlation coefficient (such as r = 0.3) might still be statistically significant if you have a large sample size, while a stronger correlation (such as r = 0.7) might not reach statistical significance with a very small sample. This is why both the magnitude of the correlation coefficient and its associated p-value matter when interpreting results.
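The sample-size effect described in the paragraph above can be checked with the t-test commonly used to assess a correlation coefficient: t = r × √((n - 2) / (1 - r²)), compared against Student's t distribution with n - 2 degrees of freedom. This Python sketch assumes that test is what an online calculator does (check the calculator's own documentation); it assumes roughly normally distributed data for the Pearson correlation and is only an approximation for the Spearman rank correlation with small samples. To stay free of external libraries it integrates the t density numerically with Simpson's rule, and the function names are hypothetical. If SciPy is available, scipy.stats.pearsonr also returns a two-sided p-value for the Pearson correlation.

```python
import math


def t_two_sided_p(t, df, steps=10_000):
    """Two-sided p-value P(|T| >= |t|) for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule (steps must be even).
    """
    t = abs(t)
    if math.isinf(t):
        return 0.0
    # Log of the density's normalizing constant, via lgamma to avoid overflow
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = density(0.0) + density(t)
    for i in range(1, steps):
        # Simpson weights alternate 4, 2, 4, ... for the interior points
        total += (4 if i % 2 else 2) * density(i * h)
    area_0_to_t = total * h / 3
    # The t density is symmetric about 0, so P(|T| >= t) = 1 - 2 * P(0 <= T <= t)
    return max(0.0, 1.0 - 2.0 * area_0_to_t)


def correlation_p_value(r, n):
    """Two-sided p-value for a correlation r computed from n pairs (t-test, df = n - 2)."""
    if n < 3:
        raise ValueError("need at least 3 pairs of data")
    if abs(r) > 1.0:
        raise ValueError("a correlation coefficient must lie between -1.00 and +1.00")
    if abs(r) == 1.0:
        return 0.0  # perfect correlation: the t statistic is infinite
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t_two_sided_p(t, df)


# Illustrative inputs matching the r = 0.3 / r = 0.7 example in the text
for r, n in [(0.3, 100), (0.7, 5)]:
    p = correlation_p_value(r, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"r = {r}, n = {n}: p = {p:.4f} ({verdict} at the 0.05 level)")
```

With these illustrative inputs, r = 0.3 from 100 pairs gives a p-value well below 0.05, while r = 0.7 from only 5 pairs gives a p-value of roughly 0.19, so only the weaker correlation counts as statistically significant.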
Remember that statistical significance doesn't automatically mean biological significance. Findings may not have meaningful implications for understanding biological processes even if they meet the mathematical criteria for significance.