Correlation analysis is used to quantify the degree to which two variables are related, based on a set of paired observations. You compute a correlation coefficient that tells you how closely one variable tends to change when the other variable changes. Correlation analysis can show you a linear relationship between variables.
The correlation coefficient, r, can range from -1 (a perfect negative correlation) to +1 (a perfect positive correlation). The closer the r value is to 0, the weaker the correlation, whether negative or positive.
As an example, we can consider weight and diastolic BP. We can document weight in kilograms as a continuous variable and diastolic BP in mm Hg as a continuous variable. We can then explore whether weight and diastolic BP are correlated. If diastolic BP decreases by a fixed amount for every unit increase in weight, we would find a perfect negative correlation: higher weight associated with lower diastolic BP. If diastolic BP increases by a fixed amount for every unit increase in weight, we would find a perfect positive correlation: higher weight associated with higher diastolic BP. In practice, perfect negative or perfect positive correlations are rare; what we usually get is somewhere in between. If weight and diastolic BP show no pattern of relation, then we may find an r value close to 0.
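To make the weight and diastolic BP example concrete, here is a minimal sketch of how r could be computed from paired observations using the standard product-moment formula. The function name `pearson_r` and the six data points are assumptions made up for illustration; they are not real patient data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of paired values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations (invented for illustration only)
weight_kg = [60, 65, 70, 75, 80, 85]
diastolic_mmhg = [72, 75, 74, 80, 82, 85]

r = pearson_r(weight_kg, diastolic_mmhg)
print(f"r = {r:.2f}")
```

For these invented values r comes out at roughly 0.96: strongly positive, but short of a perfect +1 because the points do not fall exactly on a straight line.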
Correlation does not imply causation
Pearson’s r is probably one of the most frequently used measures of agreement for continuous variables in the biomedical literature, and it is also one of the least appropriate for that purpose.
Pearson’s r
- The variables that are considered for Pearson’s correlation analysis preferably have a continuous structure.
- It is an index of linear association, but a high r does not necessarily mean good agreement. It is insensitive to systematic differences between two observers or readings.
- The value of r is sensitive to the range of values and is usually higher when the spread of values is wider. Pearson’s r is also very sensitive to extreme values (outliers), which can change the r value substantially.
- The r can give you an idea of the strength (weak, strong) and direction (positive, negative, none) of the relationship between the two variables.
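Two of the points above, insensitivity to systematic differences and sensitivity to outliers, can be illustrated with a short sketch. The helper `pearson_r` and all readings below are hypothetical and chosen only to make the effect visible.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of paired values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# 1) Systematic difference: reader B always records 10 mm Hg more than reader A.
reader_a = [70, 74, 78, 82, 86, 90]          # invented readings
reader_b = [value + 10 for value in reader_a]
r_offset = pearson_r(reader_a, reader_b)
mean_diff = sum(b - a for a, b in zip(reader_a, reader_b)) / len(reader_a)
print(f"r = {r_offset:.2f}, mean difference = {mean_diff:.0f} mm Hg")

# 2) Outlier: a single extreme pair added to weakly related data.
x = [1, 2, 3, 4, 5, 6]                       # invented values
y = [3, 1, 4, 2, 5, 3]
r_without = pearson_r(x, y)
r_with = pearson_r(x + [30], y + [30])
print(f"without outlier: {r_without:.2f}, with outlier: {r_with:.2f}")
```

In the first case r is 1 even though the two readers never agree; in the second, one extreme pair moves r from a weak value (about 0.38) to a value close to 1 (about 0.99).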
To use Pearson’s r, the following assumptions must be met:
- Each variable must be continuous
- Both variables must be normally distributed
- The two variables are assumed to have a linear relationship
- The observations are paired observations
- There are no significant outliers
Spearman’s Correlation, Spearman’s ρ (pronounced rho), or rs
This is a nonparametric measure of rank correlation, that is, the correlation between the rankings or orderings of two variables. It explores how well the relationship between two variables can be described using a monotonic function. The variables can be continuous or ordinal, and the relationship is assessed based on the ranked values for each variable rather than the raw data.
What is a monotonic relationship?
- As the value of one variable increases, the value of the other variable increases
- As the value of one variable increases, the other variable value decreases
In either case, the change need not happen at a constant rate, whereas in a linear relationship the rate of increase or decrease is constant.
Spearman’s ρ can thus be used for both linear and non-linear relationships, as long as they are monotonic.
Spearman’s ρ can be used with data that are normally distributed and with data that are not normally distributed.
Spearman’s ρ works with rank-ordered variables and not with raw data values; hence it measures the strength and direction of the monotonic relationship between the two ranked or ordered variables.
Spearman’s ρ is less affected by outliers and hence can be used even in the presence of outlier values.
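Because Spearman’s coefficient works on ranks, one way to sketch it is as Pearson’s r applied to the ranked values, with tied values sharing their average rank. The helpers `average_ranks`, `pearson_r`, and `spearman_rho` and the data are hypothetical names and values chosen for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of paired values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 upward; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but non-linear relationship (invented data): y rises as the cube of x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [value ** 3 for value in x]

print(f"Pearson r    = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

Here the ranks of x and y match exactly, so the rank correlation is 1, while Pearson’s r on the raw values falls below 1 because the relationship is not a straight line.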
Assumptions for Spearman’s ρ
- The two variables must be measured on an ordinal, interval or ratio scale
- The variables represent paired observations
- There is a monotonic relationship between the variables (can be checked using a scatterplot)
Caution in interpreting p values around r values
- Even when the relationship is weak (r = 0.3 or 0.4, for example), the corresponding p value may be statistically significant if the sample size is reasonably large, and this may be misinterpreted as showing a strong relationship.
- It is more meaningful to look at and interpret the confidence limits of r than to interpret the p value associated with the r value.
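The effect of sample size on the p value, and what the confidence limits add, can be sketched with the Fisher z transformation. This is a large-sample approximation, not necessarily what a given statistics package reports (many use a t-based test for the p value); the function name `fisher_z_summary` and the sample sizes are assumptions for illustration.

```python
import math
from statistics import NormalDist

def fisher_z_summary(r, n, confidence=0.95):
    """Approximate two-sided p value (null: true correlation is 0) and
    confidence limits for r, using the Fisher z transformation."""
    if n <= 3:
        raise ValueError("the Fisher z approximation needs more than 3 paired observations")
    z = math.atanh(r)                  # Fisher z transform of r
    se = 1 / math.sqrt(n - 3)          # approximate standard error of z
    std_normal = NormalDist()
    p_value = 2 * (1 - std_normal.cdf(abs(z) / se))
    crit = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    lower = math.tanh(z - crit * se)   # back-transform the limits to the r scale
    upper = math.tanh(z + crit * se)
    return p_value, lower, upper

# The same weak correlation, r = 0.3, at two hypothetical sample sizes
p_large, lo_large, hi_large = fisher_z_summary(0.3, 100)
p_small, lo_small, hi_small = fisher_z_summary(0.3, 20)

print(f"n=100: p = {p_large:.4f}, 95% CI = ({lo_large:.2f}, {hi_large:.2f})")
print(f"n=20:  p = {p_small:.4f}, 95% CI = ({lo_small:.2f}, {hi_small:.2f})")
```

With n = 100 the p value falls well below 0.05, yet the confidence limits (roughly 0.11 to 0.47) show that the true correlation could be anywhere from very weak to moderate. With n = 20 the same r is not significant and the interval spans zero.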