TY - JOUR
T1 - Introduction to biostatistics
T2 - Part 6, correlation and regression
AU - Gaddis, Monica L.
AU - Gaddis, Gary M.
PY - 1990/12
Y1 - 1990/12
N2 - Correlation and regression analysis are applied to data to define and quantify the relationship between two variables. Correlation analysis is used to estimate the strength of a relationship between two variables. The correlation coefficient r is a dimensionless number ranging from -1 to +1. A value of -1 signifies a perfect negative, or indirect (inverse) relationship. A value of +1 signifies a perfect positive, or direct relationship. The r can be calculated as the Pearson-product r, using normally distributed interval or ratio data, or as the Spearman rank r, using non-normally distributed data that are not interval or ratio in nature. Linear regression analysis results in the formation of an equation of a line (Y = mX + b), which mathematically describes the line of best fit for a data relationship between X and Y variables. This equation can then be used to predict additional dependent variable values (Ŷ), based on the value of the independent variable X, the slope m, and the Y-intercept b. Interpretation of the correlation coefficient r involves use of r², which implies the degree of variability of Y due to X. Tests of significance for linear regression are similar conceptually to significance testing using analysis of variance. Multiple correlation and regression, more complex analytical methods that define relationships between three or more variables, are not covered in this article. Closing comments for this final installment of this introduction to biostatistics series are presented.
AB - Correlation and regression analysis are applied to data to define and quantify the relationship between two variables. Correlation analysis is used to estimate the strength of a relationship between two variables. The correlation coefficient r is a dimensionless number ranging from -1 to +1. A value of -1 signifies a perfect negative, or indirect (inverse) relationship. A value of +1 signifies a perfect positive, or direct relationship. The r can be calculated as the Pearson-product r, using normally distributed interval or ratio data, or as the Spearman rank r, using non-normally distributed data that are not interval or ratio in nature. Linear regression analysis results in the formation of an equation of a line (Y = mX + b), which mathematically describes the line of best fit for a data relationship between X and Y variables. This equation can then be used to predict additional dependent variable values (Ŷ), based on the value of the independent variable X, the slope m, and the Y-intercept b. Interpretation of the correlation coefficient r involves use of r², which implies the degree of variability of Y due to X. Tests of significance for linear regression are similar conceptually to significance testing using analysis of variance. Multiple correlation and regression, more complex analytical methods that define relationships between three or more variables, are not covered in this article. Closing comments for this final installment of this introduction to biostatistics series are presented.
KW - biostatistics
UR - http://www.scopus.com/inward/record.url?scp=0025204683&partnerID=8YFLogxK
U2 - 10.1016/S0196-0644(05)82622-8
DO - 10.1016/S0196-0644(05)82622-8
M3 - Article
C2 - 2240762
AN - SCOPUS:0025204683
SN - 0196-0644
VL - 19
SP - 1462
EP - 1468
JO - Annals of emergency medicine
JF - Annals of emergency medicine
IS - 12
ER -