Correlation Analysis

Correlation analysis is a statistical technique for measuring the strength and direction of the relationship between two or more variables. Karl Pearson’s Coefficient of Correlation, usually called Pearson’s correlation coefficient (r), is one of the most widely used measures of correlation between two continuous variables. The Rank Method, by contrast, is used when the data are ranks or other ordinal values.

Karl Pearson’s Coefficient of Correlation (r):

Pearson’s correlation coefficient (r) measures the linear relationship between two continuous variables, X and Y. It provides a value between -1 and 1, where:

r = 1 indicates a perfect positive linear correlation (as X increases, Y increases).
r = -1 indicates a perfect negative linear correlation (as X increases, Y decreases).
r = 0 indicates no linear correlation between X and Y (a non-linear relationship may still exist).

The formula for Pearson’s correlation coefficient (r) is:

$r = \frac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{[n \sum X^{2} - (\sum X)^{2}] [n \sum Y^{2} - (\sum Y)^{2}]}}$

Where:

n is the number of data points.
Σ represents the sum.
X and Y are the individual data points for the two variables.
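
As an illustration, the following Python sketch computes r directly from the raw sums n, ΣX, ΣY, ΣXY, ΣX² and ΣY², taking the square root of the product in the denominator. The function name pearson_r and the sample data (hours, scores) are made up for this example and are not part of any library.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from raw sums (hypothetical helper for illustration)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # Happens when X or Y is constant, so r is undefined.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Made-up sample data: five paired observations.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 4))  # 0.7746
```

For this sample the numerator is 5(66) - (15)(20) = 30 and the denominator is √(50 × 30) ≈ 38.73, giving r ≈ 0.77, a fairly strong positive linear correlation.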

Pearson’s correlation coefficient is sensitive to outliers and captures only linear association, so it can understate a strong non-linear relationship. It is also not meaningful for categorical data.

Rank Method:

The Rank Method is used when the data are ranks or other ordinal values, or when the data do not meet the assumptions of Pearson’s correlation coefficient. This method is also known as Spearman’s Rank Correlation Coefficient (ρ or rs).

The Rank Method involves the following steps:

Rank the data: Assign ranks to each data point for both variables, X and Y. If there are ties (i.e., multiple data points with the same value), assign an average rank to those tied values.
Calculate the differences: For each data point, calculate the difference between the ranks of X and Y (d = rank(X) - rank(Y)).
Square the differences: Square each of the differences (d^2).
Sum the squared differences: Sum up all the squared differences.
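Calculate the Rank Correlation Coefficient: Use the formula below, which is exact when there are no tied ranks (with ties, apply Pearson’s formula to the ranks instead):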

$\rho = 1 - \frac{6 \sum d^{2}}{n (n^{2} - 1)}$

Where:

ρ (rho) is the Rank Correlation Coefficient.
Σ represents the sum.
d is the difference between ranks.
n is the number of data points.
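
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the helper names average_ranks and spearman_rho and the sample data are assumptions made for this example. Tied values receive the average of their ranks, as described in the first step, but the 6Σd² formula itself is exact only when no ranks are tied.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j are 0-based; ranks are 1-based.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    rank_x = average_ranks(xs)                    # step 1: rank the data
    rank_y = average_ranks(ys)
    d = [rx - ry for rx, ry in zip(rank_x, rank_y)]  # step 2: differences
    sum_d2 = sum(di ** 2 for di in d)             # steps 3 and 4: square and sum
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))    # step 5: apply the formula


# Made-up sample data with no ties.
xs = [10, 20, 30, 40, 50]
ys = [1, 3, 2, 5, 4]
rho = spearman_rho(xs, ys)
print(round(rho, 4))  # 0.8
```

Here the ranks of Y are 1, 3, 2, 5, 4, so the differences are 0, -1, 1, -1, 1 and Σd² = 4, which gives ρ = 1 - 24/120 = 0.8.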

The Rank Correlation Coefficient (ρ) takes a value between -1 and 1, like Pearson’s correlation coefficient, but it quantifies the strength and direction of the monotonic relationship between the two variables rather than the linear one. A positive ρ indicates a positive monotonic relationship, a negative ρ indicates a negative monotonic relationship, and ρ = 0 indicates no monotonic relationship. Values of 1 and -1 indicate that Y always increases, or always decreases, as X increases.

The Rank Method is more robust to outliers and does not assume linearity, making it suitable for non-linear relationships or ordinal data.
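
The contrast between the two coefficients can be seen on small made-up data sets. The sketch below is only an illustration under stated assumptions: the helper names pearson, simple_ranks and spearman are hypothetical, the ranking helper assumes no tied values, and Spearman’s ρ is computed as Pearson’s r applied to the ranks, which agrees with the 6Σd² formula when there are no ties.

```python
import math


def pearson(xs, ys):
    """Pearson's r in its mean-centred form."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simple_ranks(values):
    """Ranks from 1 upward; assumes all values are distinct (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman(xs, ys):
    """Spearman's rho as Pearson's r computed on the ranks."""
    return pearson(simple_ranks(xs), simple_ranks(ys))


xs = list(range(1, 11))

# Case 1: a monotonic but non-linear relationship, Y = X^3.
cubic = [x ** 3 for x in xs]
print("cubic:   r =", round(pearson(xs, cubic), 3),
      " rho =", round(spearman(xs, cubic), 3))

# Case 2: Y = X except for one extreme outlier in the last observation.
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, -40]
print("outlier: r =", round(pearson(xs, with_outlier), 3),
      " rho =", round(spearman(xs, with_outlier), 3))
```

In the first case ρ equals 1 because the ordering of Y matches the ordering of X exactly, while r falls below 1 because the points do not lie on a straight line. In the second case the single outlier is enough to make r negative, whereas ρ stays positive because the outlier changes only one rank.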