Adjusted R-Squared, R-Squared Adjusted – A version of R-squared that has been adjusted for the number of predictors in the model. R-squared tends to overestimate the strength of the association, especially when the model has more than one independent variable. (See R-Squared Adjusted.)
C_{p} Statistic – C_{p} measures the difference between a fitted regression model and a true model, along with the random error. When a regression model with p independent variables contains only random differences from a true model, the average value of C_{p} is (p+1), the number of parameters. Thus, in evaluating many alternative regression models, our goal is to find models whose C_{p} is close to or below (p+1). (Statistics for Managers, page 917.)
C_{p} Statistic formula:

C_{p} = ((1 − R_{p}^{2})(n − T) / (1 − R_{T}^{2})) − [n − 2(p+1)]

p = number of independent variables included in a regression model
T = total number of parameters (including the intercept) to be estimated in the full regression model
R_{p}^{2} = coefficient of multiple determination for a regression model that has p independent variables
R_{T}^{2} = coefficient of multiple determination for a full regression model that contains all T estimated parameters.
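As an illustration (not part of the original tutorial), the C_{p} formula can be evaluated directly. The function name and example values below are made up for demonstration:

```python
def mallows_cp(r2_p, r2_full, n, p, T):
    """C_p = ((1 - R_p^2)(n - T) / (1 - R_T^2)) - [n - 2(p + 1)].

    r2_p    : R-squared of the candidate model with p independent variables
    r2_full : R-squared of the full model with T estimated parameters
    n       : number of observations
    """
    return (1 - r2_p) * (n - T) / (1 - r2_full) - (n - 2 * (p + 1))

# Sanity check: for the full model itself (p = T - 1), C_p equals p + 1 = T.
print(mallows_cp(0.80, 0.80, 50, 4, 5))  # 5.0
```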
Confidence Interval – The lower endpoint on a confidence interval is called the lower bound or lower limit. The lower bound is the point estimate minus the margin of error. The upper bound is the point estimate plus the margin of error.
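As a small sketch (not from the original text), the point estimate ± margin-of-error construction looks like this for a sample mean, assuming the normal approximation with z* = 1.96; the function name and data are illustrative:

```python
import math

def mean_confidence_interval(data, z_star=1.96):
    """Approximate 95% CI for the mean: point estimate +/- margin of error."""
    n = len(data)
    mean = sum(data) / n                                          # point estimate
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))   # sample st. dev.
    margin = z_star * s / math.sqrt(n)     # margin of error = z* x standard error
    return mean - margin, mean + margin    # (lower bound, upper bound)

lower, upper = mean_confidence_interval([1, 2, 3, 4, 5])
```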
Coefficient of Determination – In general, the coefficient of determination measures the amount of variation of the response variable that is explained by the predictor variable(s). The coefficient of simple determination is denoted by r-squared, and the coefficient of multiple determination is denoted by R-squared. (See r^{2}.)
Coefficient of Variation – In general, the coefficient of variation measures the amount of variation of the response variable relative to its mean. If this value is small, the data are considered ill-conditioned.
Correlation Coefficient, Pearson's r – Measures the strength of linear association between two numerical variables. (See r.)
DFITS, DFFITS – Combines leverage and studentized residuals (deleted t residuals) into one overall measure of how unusual an observation is. DFITS is the difference between the fitted values calculated with and without the i^{th} observation, scaled by stdev(Ŷ_{i}). Belsley, Kuh, and Welsch suggest that observations with DFITS > 2√(p/n) should be considered unusual. (Minitab, page 29.)
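One way to compute DFFITS with numpy is sketched below (this is an illustration, not the tutorial's own code; the function name is made up). It combines the leverages h_{i} with the deleted-t residuals exactly as the definition above describes:

```python
import numpy as np

def dffits(X, y):
    """DFFITS for each observation of an OLS fit.
    X is the (n, p) design matrix, including a column of ones for the intercept."""
    n, p = X.shape
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)            # leverages
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                                          # residuals
    s2 = e @ e / (n - p)                                      # MSE
    s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)    # leave-one-out MSE
    t = e / np.sqrt(s2_del * (1 - h))                         # deleted-t residuals
    return t * np.sqrt(h / (1 - h))

# Flag observations with |DFFITS| > 2 * sqrt(p / n), per Belsley, Kuh, and Welsch.
```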
Error – In general, the error is the difference between the observed and estimated value of a parameter.

Error in Regression: e_{i} = Y_{i} − Ŷ_{i} = error in the prediction for the i^{th} observation (actual Y minus predicted Y)
Errors, Residuals – In regression analysis, the error is the difference between the observed Y values and the predicted Y values that result from using the regression model.
F-test – An F-test is usually a ratio of two numbers, where each number estimates a variance. An F-test can be used in the test of equality of two population variances. An F-test is also used in analysis of variance (ANOVA), where it tests the hypothesis of equality of means for two or more groups. For instance, in an ANOVA test, the F statistic is usually a ratio of the Mean Square for the effect of interest and the Mean Square Error. The F-statistic is very large when the MS for the factor is much larger than the MS for error. In such cases, reject the null hypothesis that the group means are equal. The p-value helps to determine the statistical significance of the F-statistic. (Vogt, page 117)
The F-test statistic can be used in simple linear regression to assess the overall fit of the model.

F = test statistic for the ANOVA for regression = MSR/MSE, where MSR = Mean Square Regression and MSE = Mean Square Error

F has df_{SSR} degrees of freedom for the numerator and df_{SSE} for the denominator
The null and alternative hypotheses for the F-test statistic in simple linear regression are

H_{o}: b_{1} = 0, where b_{1} is the coefficient for x (i.e., the slope of x)
H_{a}: b_{1} ≠ 0

p-value = the probability that the random variable F exceeds the value of the test statistic. This value is found by using an F table where F has df_{SSR} for the numerator and df_{SSE} for the denominator.
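The F = MSR/MSE computation above can be sketched in a few lines of numpy (an illustration, not the tutorial's code; the function name and example data are made up):

```python
import numpy as np

def f_statistic(x, y):
    """Overall F = MSR / MSE for simple linear regression, df = (1, n - 2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x
    msr = ((y_hat - y.mean()) ** 2).sum() / 1     # SSR / df_SSR, df_SSR = 1
    mse = ((y - y_hat) ** 2).sum() / (n - 2)      # SSE / df_SSE, df_SSE = n - 2
    return msr / mse

print(f_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # approximately 4.5
```

The resulting value would then be compared against an F table (or a software p-value) with (1, n − 2) degrees of freedom.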
Leverages, Leverage Points – An extreme value in the independent (explanatory) variable(s), as compared with an outlier, which is an extreme value in the dependent (response) variable.
The hat matrix is H = X (X'X)^{−1} X', where X is the design matrix. The leverage of the i^{th} observation is the i^{th} diagonal element, h_{i} (also called v_{ii} and r_{ii}), of H. Note that h_{i} depends only on the predictors; it does not involve the response Y. If h_{i} is large, the i^{th} observation has unusual predictors (X_{1i}, X_{2i}, ..., X_{ki}). Many people consider h_{i} to be large enough to merit checking if it is more than 2p/n or 3p/n, where p is the number of predictors (including one for the constant). Such an observation will have a large influence in determining the regression coefficients.
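The h_{i} described above are easy to extract with numpy (an illustrative sketch, not the tutorial's code; the function name and data are made up):

```python
import numpy as np

def leverages(X):
    """Diagonal elements h_i of the hat matrix H = X (X'X)^-1 X'.
    X is the design matrix, including a column of ones for the constant."""
    return np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

X = np.column_stack([np.ones(10), np.arange(10.0)])  # intercept + one predictor
h = leverages(X)
flagged = h > 2 * X.shape[1] / X.shape[0]            # the 2p/n rule of thumb
```

A useful check: the leverages always sum to p, the number of columns of X, since that is the trace of H.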
Mean Square Error
MSE = Mean Square Error = Error Mean Square = Residual Mean Square = SSE / df_{errors}
Mean Square Regression
MSR = MSRegression = Mean Square of Regression = SSR / df_{SSR}
Multiple Correlation Coefficient, R – A measure of the amount of correlation between more than two variables. As in multiple regression, one variable is the dependent variable and the others are independent variables. The positive square root of R-squared. (See R.)
Prediction Interval – In regression analysis, a range of values that estimates the value of the dependent variable for given values of one or more independent variables. Comparing prediction intervals with confidence intervals: prediction intervals estimate a random value, while confidence intervals estimate population parameters.
r, Correlation Coefficient, Pearson's r – Measures the strength of linear association between two numerical variables.
R, Coefficient of Multiple Correlation – A measure of the amount of correlation between more than two variables. As in multiple regression, one variable is the dependent variable and the others are independent variables. The positive square root of R-squared.
r^{2}, r-squared, Coefficient of Simple Determination – The percent of the variance in the dependent variable that can be explained by the independent variable.

r^{2} = SSRegression / SSTotal = (explained variation) / (total variation) = percent of the variation of Y that is explained by the model.
R-squared, Coefficient of Multiple Determination – The percent of the variance in the dependent variable that can be explained by all of the independent variables taken together.

R^{2} = SSRegression / SSTotal = 1 − SSE / SSTotal = 1 − the percent of variation that is not explained by the model = the percent of variation that is explained by the model.

For simple linear regression, R^{2} reduces to r^{2}.

Note: The coefficient of simple (multiple) determination is the square of the simple (multiple) correlation coefficient.
R-Squared Adjusted, Adjusted R-Squared – A version of R-squared that has been adjusted for the number of predictors in the model. R-squared tends to overestimate the strength of the association, especially when the model has more than one independent variable.

R^{2}_{adjusted} = 1 − (1 − R^{2})(n − 1) / (n − k)

where R = multiple correlation coefficient and k = the number of coefficients in the regression equation. Note that k includes the constant coefficient. For simple linear regression, when you fit the y-intercept, k = 2. If you do not fit the y-intercept (i.e., let the y-intercept be zero), then k = 1.

From this formulation we can see the relationship between the two statistics and how R-squared Adjusted "adjusts" for the number of variables in the model. For simple linear regression, when you do not fit the y-intercept, k = 1 and the formula for R-squared Adjusted simplifies to R-squared: if k = 1, then (n − 1)/(n − k) = 1, so

R^{2}_{adjusted} = 1 − (1 − R^{2}) = R^{2}.
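The adjustment is a one-line computation; as an illustrative sketch (the function name is made up, and the formula is the standard one with k counting all coefficients, including the constant):

```python
def adjusted_r_squared(r2, n, k):
    """R^2_adjusted = 1 - (1 - R^2)(n - 1)/(n - k).
    k counts all coefficients in the regression equation, including the constant."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# With k = 1 (no fitted y-intercept) the adjustment factor (n - 1)/(n - k) is 1,
# so the adjusted value equals R-squared, as the entry above notes.
```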
Regression SS (see SSRegression) – The sum of squares that is explained by the regression equation; analogous to the between-groups sum of squares in analysis of variance.
Standard Deviation – A statistic that shows the square root of the average squared distance of the data points from the mean.

s = √( Σ(x_{i} − x̄)^{2} / (n − 1) ) for a sample
σ = √( Σ(x_{i} − μ)^{2} / N ) for a population
Standard Error, Standard Error of the Regression, Standard Error of the Mean, Standard Error of the Estimate – In regression, the standard error of the estimate is the standard deviation of the observed y-values about the predicted y-values. In general, the standard error is a measure of sampling error; it refers to error in estimates resulting from random fluctuations in samples. The standard error is the standard deviation of the sampling distribution of a statistic. Typically, the smaller the standard error, the better the sample statistic estimates the population parameter. As N goes up, the standard error goes down.
Formula for the Standard Error of Estimate:

s_{est} = √( SSE / df_{errors} )

df_{errors} = number of observations − number of independent variables in the model − 1

For simple linear regression: df_{errors} = n − 1 − 1 = n − 2 when fitting the y-intercept. (Two degrees of freedom are lost, since we are estimating the slope and the y-intercept.)
Standardized Residuals – Standardized residuals are of the form (residual) / (square root of the Mean Square Error). Standardized residuals have variance 1. A standardized residual larger than 2 in absolute value is usually considered large. (Minitab.)
Sum Square Errors
SSE = SSErrors = Sum Square of Errors = Error Sum of Squares = SSResidual = Sum Square of Residuals = Residual Sum of Squares = Σ(Y_{i} − Ŷ_{i})^{2}

An alternative computational formula for SSE: SSE = SST − SSR.
Sum Square Regression
SSR = SSRegression = Sum Square of Regression = Σ(Ŷ_{i} − Ȳ)^{2}, the sum of the squares of the differences between the predicted value of Y and the average value of Y. This tells how far the predicted value is from the average value.
Sum Square Total
SST = SSTotal = Sum Square of Total Variation of Y = Σ(Y_{i} − Ȳ)^{2}, the sum of the squared deviations of Y from the mean of Y.

SST = SSE + SSR = unexplained variation + explained variation
Note: the explained deviation Ŷ_{i} − Ȳ has a definite pattern, but the residual Y_{i} − Ŷ_{i} is the error, and it should be random.
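The SST = SSE + SSR decomposition above can be verified numerically for simple linear regression; the sketch below is illustrative (function name and data are made up):

```python
import numpy as np

def anova_decomposition(x, y):
    """Return (SSE, SSR, SST) for a simple linear regression fit."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    y_hat = y.mean() + b1 * (x - x.mean())       # fitted values
    sse = ((y - y_hat) ** 2).sum()               # unexplained variation
    ssr = ((y_hat - y.mean()) ** 2).sum()        # explained variation
    sst = ((y - y.mean()) ** 2).sum()            # total variation
    return sse, ssr, sst                         # sse + ssr == sst

sse, ssr, sst = anova_decomposition([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```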
Variance Inflation Factor (VIF) – A statistic used to measure the possible collinearity of the explanatory variables. Let X_{1}, X_{2}, ..., X_{k} be the k predictors. Regress X_{j} on the remaining k − 1 predictors and let RSQ_{j} be the R-squared from this regression. Then the variance inflation factor for X_{j} is 1/(1 − RSQ_{j}). When X_{j} is highly correlated with the remaining predictors, its variance inflation factor will be very large. When X_{j} is orthogonal to the remaining predictors, its variance inflation factor will be 1. (Minitab)
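The regress-each-predictor-on-the-rest recipe above translates directly into code; this is an illustrative sketch (function name is made up), assuming X holds only the predictor columns:

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor 1 / (1 - RSQ_j) for predictor column j.
    X holds the k predictor columns (no constant column)."""
    y = X[:, j]                                   # treat X_j as the response
    Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # regress X_j on the others
    resid = y - Z @ beta
    rsq = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return 1 / (1 - rsq)
```

For orthogonal predictors the function returns 1, matching the entry above; for nearly collinear columns it blows up, which is exactly the warning signal VIF is meant to give.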
Y_{i} = actual value of Y for observation i
Ŷ_{i} = predicted or estimated value of Y based on the given X values
Ȳ = average of the original Y-values