Correlation Analysis in Healthcare

Correlation analysis is used to quantify the degree to which two variables are related, based on a set of paired observations. You compute a correlation coefficient that tells you how closely, and in which direction, changes in one variable track changes in the other. The most commonly used coefficient, Pearson’s r, measures the linear relationship between the variables.

The correlation coefficient, r, can range from -1 (a perfect negative correlation) to +1 (a perfect positive correlation). The closer r is to 0, the weaker the correlation, whether negative or positive.

As an example, we can consider weight and diastolic BP. We can document weight in kilograms as a continuous variable and diastolic BP in mm Hg as a continuous variable, and then explore whether the two are correlated. If diastolic BP falls by a fixed amount for every unit increase in weight, we find a perfect negative correlation: higher weight is associated with lower diastolic BP. If diastolic BP rises by a fixed amount for every unit increase in weight, we find a perfect positive correlation: higher weight is associated with higher diastolic BP. In practice, perfect negative or perfect positive correlations are rare; what we usually get is somewhere in between. If weight and diastolic BP show no pattern of relation, we may find an r value close to 0.
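
The weight and diastolic BP example can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the helper name `pearson_r` and the ten paired readings are made up for the example and are not real patient data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two paired sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations (illustrative only)
weight_kg = [52, 58, 63, 67, 71, 76, 80, 85, 91, 98]
diastolic_mmhg = [70, 74, 72, 78, 80, 79, 84, 88, 86, 92]

r = pearson_r(weight_kg, diastolic_mmhg)
print(f"Pearson r for weight vs diastolic BP: {r:.2f}")

# Points lying exactly on a straight line give r = +1 or r = -1
print(pearson_r([1, 2, 3], [2, 4, 6]))   # rising line
print(pearson_r([1, 2, 3], [6, 4, 2]))   # falling line
```

With these made-up readings the coefficient is strongly positive but below 1, which matches the point that perfect correlations are rare in practice.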

Correlation does not imply causation

Pearson’s r is probably one of the most frequently used measures of agreement for continuous variables in the biomedical literature, and also one of the least appropriate for that purpose.

Pearson’s r

  1. The variables that are considered for Pearson’s correlation analysis preferably have a continuous structure.
  2. It is an index of linear association but does not necessarily mean good agreement. It is insensitive to systematic differences between two observers or readings.
  3. The value of r is sensitive to the range of values and is usually higher when the spread of values is higher. Pearson’s r is very sensitive to extreme values (outliers) which can change the r values significantly.
  4. The r can give you an idea of the strength (weak, strong) and direction (positive, negative, none) of the relationship between the two variables.
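
Two of the points above can be illustrated with a short sketch: r ignores a systematic difference between two observers, and a single extreme value can change r considerably. The helper `pearson_r` and all readings are hypothetical and chosen only to make the effect visible.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two paired sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical case: observer B always reads 10 mm Hg higher than observer A
observer_a = [70, 74, 78, 82, 86, 90]
observer_b = [reading + 10 for reading in observer_a]

r_observers = pearson_r(observer_a, observer_b)
mean_difference = sum(b - a for a, b in zip(observer_a, observer_b)) / len(observer_a)
print(f"r = {r_observers:.2f}, mean difference = {mean_difference:.1f} mm Hg")

# Hypothetical case: a weak relationship, then the same data plus one extreme pair
x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 4, 2, 5, 3]
r_without_outlier = pearson_r(x, y)
r_with_outlier = pearson_r(x + [30], y + [30])
print(f"without outlier: {r_without_outlier:.2f}, with outlier: {r_with_outlier:.2f}")
```

The two observers correlate perfectly even though they never agree, and the single extreme pair turns a weak correlation into an apparently strong one.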

To compute Pearson’s r, the following assumptions must be met:

  • Each variable must be continuous
  • Both variables must be normally distributed
  • The two variables are assumed to have a linear relationship
  • The observations are paired observations
  • There are no significant outliers

Spearman’s Correlation or Spearman’s ρ (pronounced rho) or rs

This is a nonparametric measure of rank correlation, that is, the correlation between the rankings or orderings of two variables. It explores how well the relationship between two variables can be described using a monotonic function. The variables can be continuous or ordinal, and the relationship is assessed on the ranked values of each variable rather than the raw data.

What is a monotonic relationship?

  • As the value of one variable increases, the value of the other variable increases
  • As the value of one variable increases, the other variable value decreases

In a monotonic relationship, the change need not occur at a constant rate, whereas in a linear relationship the rate of increase or decrease is constant.

Spearman’s ρ can thus be used for both linear and non-linear relationships, as long as they are monotonic.

Spearman’s ρ can be used with data that are normally distributed and with data that are not.

Spearman’s ρ works with rank-ordered values rather than raw data values, and hence measures the strength and direction of the monotonic relationship between the two ranked or ordered variables.

Spearman’s ρ is less affected by outliers and hence can be used even in the presence of outlier values.
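
The difference between a linear and a monotonic relationship can be shown with a small sketch that computes Spearman’s ρ as Pearson’s r on ranks, with tied values given the average of their ranks. The helper names (`pearson_r`, `average_ranks`, `spearman_rho`) and the dose and response values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two paired sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: response always rises with dose, but not at a constant rate
dose = [1, 2, 3, 4, 5, 6, 7, 8]
response = [d ** 3 for d in dose]

r_linear = pearson_r(dose, response)
rho = spearman_rho(dose, response)
print(f"Pearson r = {r_linear:.2f}, Spearman rho = {rho:.2f}")
```

Because the ordering of the two variables matches exactly, ρ reaches 1, while r stays below 1 since the points do not lie on a straight line.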

Assumptions for Spearman’s ρ

  • The two variables must be measured on an ordinal, interval or ratio scale
  • The variables represent paired observations
  • There is a monotonic relationship between the variables (can be checked using a scatterplot)

Caution in interpreting p values around r values

  • Even when the relationship is weak (r = 0.3 or 0.4, for example), the corresponding p value may be significant if the sample size is reasonably large, and this may be misinterpreted as showing a strong or clinically important relationship.
  • It is more meaningful to look at and interpret the confidence limits of r than to interpret the p value associated with it.
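
The effect of sample size can be sketched with the Fisher z-transformation, one common approximation for the confidence limits of Pearson’s r. This is an assumption-laden illustration: it presumes roughly bivariate normal data, the p value is the large-sample normal approximation rather than the exact t-test, and the function name `fisher_ci_and_p`, the value r = 0.3, and the two sample sizes are chosen only for the example.

```python
import math
from statistics import NormalDist

def fisher_ci_and_p(r, n, confidence=0.95):
    """Approximate confidence limits and two-sided p value for Pearson's r.

    Uses the Fisher z-transformation (a normal approximation).
    """
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                    # Fisher z-transform of r
    se = 1 / math.sqrt(n - 3)            # approximate standard error of z
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    low = math.tanh(z - z_crit * se)     # back-transform the limits to the r scale
    high = math.tanh(z + z_crit * se)
    p_value = 2 * (1 - NormalDist().cdf(abs(z) / se))
    return low, high, p_value

# Same weak correlation, two hypothetical sample sizes
small = fisher_ci_and_p(0.3, 30)
large = fisher_ci_and_p(0.3, 500)

for label, (low, high, p) in (("n = 30", small), ("n = 500", large)):
    print(f"{label}: 95% CI {low:.2f} to {high:.2f}, p = {p:.4f}")
```

In the larger sample the p value becomes very small, yet the confidence limits still describe a weak correlation, which is why the limits are the more informative quantity to report.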

Written by Dr Praveen Nirmalan

Dr. Nirmalan completed his basic medical education in Thrissur, Kerala, followed by a PG Diploma in Ophthalmology from Aravind Eye Care System, Madurai, and a Vitreo-retinal Fellowship in Mumbai. Subsequently, he completed his MPH and a Public Health Ophthalmology Fellowship at the Johns Hopkins School of Public Health in the USA. He has led community-based and clinical research in some of the top eye care institutes of India and has also led a clinical research program at a top-tier obstetric and neonatal institute. He has experience chairing Ethics Committees and has helped set up Institutional Review Boards. Besides mentoring clinical faculty, he has mentored DNB and PhD students through their dissertation work and research methods.