Introduction to biostatistics: Part 6, correlation and regression

Monica L. Gaddis, Gary Gaddis

Research output: Contribution to journal › Article

30 Scopus citations


Correlation and regression analysis are applied to data to define and quantify the relationship between two variables. Correlation analysis is used to estimate the strength of a relationship between two variables. The correlation coefficient r is a dimensionless number ranging from -1 to +1. A value of -1 signifies a perfect negative, or indirect (inverse), relationship. A value of +1 signifies a perfect positive, or direct, relationship. The coefficient r can be calculated as the Pearson product-moment r, using normally distributed interval or ratio data, or as the Spearman rank r, using non-normally distributed data that are not interval or ratio in nature. Linear regression analysis yields the equation of a line (Y = mX + b), which mathematically describes the line of best fit for the relationship between the X and Y variables. This equation can then be used to predict additional dependent variable values (Ŷ), based on the value of the independent variable X, the slope m, and the Y-intercept b. Interpretation of the correlation coefficient r involves use of r², which indicates the proportion of the variability in Y that is attributable to X. Tests of significance for linear regression are conceptually similar to significance testing using analysis of variance. Multiple correlation and regression, more complex analytical methods that define relationships between three or more variables, are not covered in this article. Closing comments for this final installment of the introduction to biostatistics series are presented.
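As an illustration of the quantities named in the abstract, the following is a minimal Python sketch, not code from the article. It computes the Pearson r, the Spearman rank r (here taken as the Pearson r of average ranks), the least-squares slope m and intercept b, a predicted value Ŷ, and r². The function names (`pearson_r`, `ranks`, `spearman_r`, `fit_line`) and the sample data are hypothetical and chosen only for demonstration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_r(x, y):
    """Spearman rank correlation: the Pearson r of the ranked data."""
    return pearson_r(ranks(x), ranks(y))


def fit_line(x, y):
    """Least-squares slope m and intercept b for the line Y = mX + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Made-up illustrative data (not taken from the article).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
rho = spearman_r(x, y)
m, b = fit_line(x, y)
y_hat = m * 6 + b        # predicted Y for a new X value of 6
r_squared = r ** 2       # proportion of variability in Y attributable to X

print(f"Pearson r = {r:.4f}, Spearman r = {rho:.4f}")
print(f"Y = {m:.3f} X + {b:.3f}, r^2 = {r_squared:.4f}, Y-hat(6) = {y_hat:.3f}")
```

For simple linear regression with one predictor, squaring the Pearson r gives the same r² as the regression itself, which is why the sketch computes it directly from r.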

Original language: English
Pages (from-to): 1462-1468
Number of pages: 7
Journal: Annals of Emergency Medicine
Issue number: 12
State: Published - Dec 1990


  • biostatistics
