Karl Pearson’s coefficient of correlation, also known as Pearson’s correlation coefficient or simply the Pearson correlation, is a measure of the linear relationship between two continuous variables. It quantifies the strength and direction of the linear association between the variables.
Pearson’s correlation coefficient, denoted by “r,” ranges between -1 and +1. The value of “r” indicates the strength and direction of the linear relationship as follows:
- If “r” is close to +1, it indicates a strong positive linear relationship, implying that as one variable increases, the other variable tends to increase as well.
- If “r” is close to -1, it indicates a strong negative linear relationship, meaning that as one variable increases, the other variable tends to decrease.
- If “r” is close to 0, it suggests little or no linear relationship between the variables.
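For a sample of n paired observations (xᵢ, yᵢ), r is the sum of the cross-products of deviations from the means divided by the square root of the product of the sums of squared deviations: r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]. The following is a minimal sketch of that formula in plain Python. The function name `pearson_r` and the sample data are illustrative assumptions, not part of any library.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (proportional to the covariance)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations (proportional to each variance)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        # r is undefined when either variable does not vary
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0: perfect positive line
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0: perfect negative line
print(pearson_r([1, 2, 3, 4, 5], [3, 1, 4, 1, 5]))   # about 0.35: weak positive
```

The n − 1 (or n) factors in the covariance and the two standard deviations cancel, which is why the sketch works directly with sums of deviations.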
Properties of Correlation:
- Range: The correlation coefficient “r” always ranges between -1 and +1, inclusive.
- Symmetry: The correlation coefficient is symmetric, meaning that the correlation between variable A and variable B is the same as the correlation between variable B and variable A.
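- Independence of Scale and Origin: Correlation is unaffected by changes in the units of measurement of the variables. It remains the same if a constant is added to either variable or if either variable is multiplied by a positive constant (as in a unit conversion). Multiplying one variable by a negative constant leaves the magnitude of “r” unchanged but reverses its sign.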
- Linearity: Pearson’s correlation coefficient measures only the linear relationship between variables. It can understate or entirely miss a strong nonlinear relationship, such as a U-shaped one.
- Sensitivity to Outliers: Because “r” is built from deviations from the means, a few extreme observations can substantially inflate or deflate it, for example producing a strong correlation where the bulk of the data shows none.
- No Causation: Correlation does not imply causation. A strong correlation between two variables does not by itself show that changes in one variable cause changes in the other; the association may instead reflect a third variable or coincidence. Correlation measures only the degree of association.
- Sample Dependency: The calculated correlation coefficient is based on a sample of data and may vary from sample to sample. Larger sample sizes generally provide more reliable estimates of the true population correlation.
- Multivariate Relationships: Pearson’s correlation coefficient measures the pairwise relationship between two variables. It does not capture the influence of other variables or account for multivariate relationships.
It is important to interpret correlation coefficients in conjunction with other statistical measures and consider the context of the data and research question to draw meaningful conclusions about the relationship between variables.