
ST430 Introduction to Regression Analysis
ST430: Introduction to Regression Analysis, Ch. 2, Sec. 2.1-2.4
Luo Xiao
August 26, 2015
Introduction to Regression Analysis
Modeling a response
A regression model describes how a dependent variable (or response) $Y$ is affected, on average, by one or more independent variables (or factors, or covariates) $X_1, X_2, \ldots, X_p$.
Example: Baltimore Longitudinal Study on Aging
$Y$ = daily physical activity
$X_1$ = sex
$X_2$ = age
$X_3$ = employment
The average value of $Y$, $E(Y)$, depends on $X_1, X_2, \ldots, X_p$, so it is a function of them:
$$E(Y) = f(X_1, X_2, \ldots, X_p) = f(\mathbf{X}).$$
We may know the general form of $f(\mathbf{X})$, but it may contain constants $\beta_0, \beta_1, \ldots, \beta_p$ whose values are unknown.
So, more completely,
$$E(Y) = f(X_1, X_2, \ldots, X_p; \beta_0, \beta_1, \ldots, \beta_p) = f(\mathbf{X}, \boldsymbol{\beta}).$$
This equation is a regression model.
$Y$ is a random variable: in any given measurement, $Y$ will generally differ from $E(Y)$. The difference
$$\varepsilon = Y - E(Y)$$
is called the random error, and, since $E(Y)$ is a constant, clearly
$$E(\varepsilon) = E(Y) - E(Y) = 0.$$
We can then write the regression model as
$$Y = E(Y) + \varepsilon = f(\mathbf{X}, \boldsymbol{\beta}) + \varepsilon.$$
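To make the decomposition of $Y$ into a mean part and a random error concrete, here is a minimal Python sketch. It assumes a single covariate, a straight-line mean function with made-up values $\beta_0 = 1$ and $\beta_1 = 2$, and normally distributed errors with standard deviation 1; the names `f`, `draw_y` and `BETA` are hypothetical. Each simulated $Y$ differs from $E(Y)$, but the errors average out to roughly zero.

```python
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
BETA = (1.0, 2.0)  # (beta0, beta1)


def f(x: float, beta: tuple[float, float]) -> float:
    """Mean function E(Y) = f(x, beta); here a straight line in one covariate."""
    return beta[0] + beta[1] * x


def draw_y(x: float, rng: random.Random, sigma: float = 1.0) -> float:
    """One measurement Y = f(x, beta) + eps, with eps ~ N(0, sigma^2) (assumed)."""
    return f(x, BETA) + rng.gauss(0.0, sigma)


rng = random.Random(430)  # fixed seed so the run is reproducible
x = 3.0
ys = [draw_y(x, rng) for _ in range(10_000)]
errors = [y - f(x, BETA) for y in ys]  # eps = Y - E(Y)

print(f"E(Y) at x = {x}: {f(x, BETA)}")
print(f"first three draws of Y: {[round(y, 2) for y in ys[:3]]}")
print(f"average error: {statistics.fmean(errors):.3f}")  # close to 0
```

Individual draws scatter around $E(Y) = 7$, while the sample average of the errors is close to $0$, matching $E(\varepsilon) = 0$.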
The simplest model is a linear function:
$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3.$$
Example: Baltimore Longitudinal Study on Aging
$$E(\text{daily physical activity}) = \beta_0 + \beta_1 \times \text{age} + \beta_2 \times \text{sex} + \beta_3 \times \text{employment}.$$
Interpretation of parameters
How should $\beta_1$ be interpreted?
If sex = 0 for female and sex = 1 for male, how should $\beta_2$ be interpreted?
If employment = 0 for unemployed and employment = 1 for employed, how should $\beta_3$ be interpreted?
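One way to explore these questions is to plug numbers into the model and compare predicted means. The Python sketch below uses made-up coefficient values (they are not estimates from the Baltimore study) and a hypothetical helper `mean_activity`; each difference isolates one coefficient because the other covariates are held fixed.

```python
# Hypothetical coefficient values; NOT estimates from the BLSA data.
B0, B1, B2, B3 = 300.0, -2.5, 15.0, 20.0


def mean_activity(age: float, sex: int, employment: int) -> float:
    """E(daily physical activity) under the linear model.

    sex: 0 = female, 1 = male; employment: 0 = unemployed, 1 = employed.
    """
    return B0 + B1 * age + B2 * sex + B3 * employment


# beta1: change in mean activity for a one-year increase in age,
# holding sex and employment fixed.
age_effect = mean_activity(61, 1, 0) - mean_activity(60, 1, 0)
# beta2: male minus female mean activity, at the same age and employment.
sex_effect = mean_activity(60, 1, 1) - mean_activity(60, 0, 1)
# beta3: employed minus unemployed mean activity, at the same age and sex.
emp_effect = mean_activity(60, 0, 1) - mean_activity(60, 0, 0)

print(age_effect, sex_effect, emp_effect)  # -2.5 15.0 20.0
```

Because this model is additive with no interaction terms, each difference is the same whatever fixed values are chosen for the other covariates.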
Origin of “Regression”
Francis Galton studied the inheritability of physical characteristics such as height.
Consider the deviation of an individual’s height from the average for that individual's gender. Suppose that the deviation $Y$ of a son's height is, on average, linearly related to the average deviation $X$ of his parents' heights:
$$E(Y) = \beta_0 + \beta_1 X.$$
The intercept $\beta_0$ measures the overall increase in height between generations, which is interesting but not related to inheritability.
If $\beta_1 = 1$, the son inherits the full characteristic of his parents.
If $\beta_1 = 0$, there is no inheritability.
Galton observed $\beta_1 \approx 2/3$ and described this as regression to the mean.
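A small worked example shows why a slope below 1 produces regression to the mean. The Python sketch assumes $\beta_1 = 2/3$ (Galton's approximate value) and, purely for illustration, $\beta_0 = 0$; deviations are taken to be in inches, and the function name is hypothetical. Exact fractions keep the arithmetic readable.

```python
from fractions import Fraction

# Illustrative values: Galton's approximate slope 2/3, and an ASSUMED
# intercept of 0 (no overall change in height between generations).
BETA0 = Fraction(0)
BETA1 = Fraction(2, 3)


def expected_son_deviation(parent_dev: Fraction) -> Fraction:
    """E(Y) = beta0 + beta1 * X, with X and Y deviations from the gender average."""
    return BETA0 + BETA1 * parent_dev


for parent_dev in (Fraction(-6), Fraction(0), Fraction(3), Fraction(6)):
    print(parent_dev, "->", expected_son_deviation(parent_dev))
# -6 -> -4
# 0 -> 0
# 3 -> 2
# 6 -> 4
```

Sons of unusually tall (or short) parents are expected to be on the same side of the average, but closer to it: a 6-inch parental deviation becomes an expected 4-inch deviation in the son.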
See Francis Galton, “Regression towards mediocrity in hereditary stature”, The Journal of the Anthropological Institute of Great Britain and Ireland, Vol. 15, pages 246–263 (or Wikipedia!).
The term “regression” has since been used for any such analysis, whether it involves one or more variables and linear or nonlinear relationships, and mostly with no connection to inheritability.
Estimation
In a regression context, we sample from many populations, one for each combination of values of the independent variables.
For example, in the Baltimore Longitudinal Study on Aging, we observe subjects with many different combinations of sex, age and employment status. Each subject's measured data are drawn from the population corresponding to that subject's combination.
The constants $\beta_0, \beta_1, \ldots, \beta_p$ are parameters of that collection of populations.
We need to make inferences about them, in the form of:
point estimates;
interval estimates;
hypothesis tests.
We shall get point estimates using the method of least squares.
For other inferences, we need to know the distribution of the errors $\varepsilon$, and we shall assume that they are normally distributed.
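For the straight-line model $E(Y) = \beta_0 + \beta_1 X$, the least squares estimates have a closed form: $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1 \bar{x}$, where $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$ and $S_{xx} = \sum_i (x_i - \bar{x})^2$. The Python sketch below computes them on simulated data with normal errors; the parameter values ($\beta_0 = 0$, $\beta_1 = 2/3$, error standard deviation 2), the seed and the function name `least_squares` are assumptions for illustration only.

```python
import random
import statistics


def least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least squares estimates (b0, b1) for E(Y) = beta0 + beta1 * X."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Simulated data under ASSUMED values beta0 = 0, beta1 = 2/3 and
# normally distributed errors with standard deviation 2.
rng = random.Random(1885)
xs = [rng.gauss(0.0, 2.0) for _ in range(5_000)]
ys = [0.0 + (2 / 3) * x + rng.gauss(0.0, 2.0) for x in xs]

b0, b1 = least_squares(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # b1 should be close to 2/3
```

On Python 3.10 or later, `statistics.linear_regression(xs, ys)` should return the same slope and intercept and makes a convenient cross-check.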
Observational and experimental data
In some investigations, the independent variables $X_1, X_2, \ldots, X_p$ can be controlled; that is, held at desired values.
For example, assignment to treatment or control in a clinical trial.
The resulting data are called experimental.
In other cases, the independent variables cannot be controlled, and their
values are simply observed.
For example, Galton’s data on the heights of parents and sons.
The resulting data are called observational.
Observational data show how the value of the response is associated with
values of the independent variables, but generally cannot reveal cause and
effect.
George Box: “To find out what happens to a system when you interfere
with it, you have to interfere with it (not just passively observe it).”
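Box's point can be illustrated with a small simulation. It assumes a hypothetical lurking variable $Z$ that drives both $X$ and $Y$, while $X$ itself has no effect on $Y$. When $X$ is merely observed, the least squares slope of $Y$ on $X$ is clearly nonzero (about 0.5 under the variances assumed here); when $X$ is set at random, as in an experiment, the slope is near zero. All names, variances and the seed are illustrative choices.

```python
import random
import statistics


def slope(xs: list[float], ys: list[float]) -> float:
    """Least squares slope of ys on xs."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


rng = random.Random(42)
n = 5_000
# Hypothetical lurking variable Z that affects both X and Y.
z = [rng.gauss(0.0, 1.0) for _ in range(n)]


def response(zi: float) -> float:
    """Y depends on Z only; X has NO effect on Y in either scenario."""
    return zi + rng.gauss(0.0, 1.0)


# Observational: X is simply observed, and it tracks Z.
x_obs = [zi + rng.gauss(0.0, 1.0) for zi in z]
y_obs = [response(zi) for zi in z]

# Experimental: X is set at random by the investigator, independently of Z.
x_exp = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_exp = [response(zi) for zi in z]

print(f"observational slope: {slope(x_obs, y_obs):.2f}")  # near 0.5: association only
print(f"experimental slope:  {slope(x_exp, y_exp):.2f}")  # near 0: X has no effect
```

The observational slope reflects the shared dependence on $Z$, not any effect of $X$; randomizing $X$ breaks that link.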