ST430: Introduction to Regression Analysis, Ch2, Sec 2.1-2.4
Luo Xiao
August 26, 2015

Introduction to Regression Analysis

Modeling a response

A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates) X1, X2, ..., Xp.

Example: Baltimore Longitudinal Study of Aging
Y = daily physical activity
X1 = sex
X2 = age
X3 = employment

The average value of Y, E(Y), depends on X1, X2, ..., Xp, so it is a function of them:

E(Y) = f(X1, X2, ..., Xp) = f(X).

We may know the general form of f(X), but it may contain constants β0, β1, ..., βp whose values are unknown. So more completely,

E(Y) = f(X1, X2, ..., Xp; β0, β1, ..., βp) = f(X, β).

This equation is a regression model.

Y is a random variable. In any given measurement, Y will differ from E(Y). The difference ε = Y − E(Y) is called the random error, and clearly

E(ε) = E(Y) − E(Y) = 0.

We can then write the regression model as

Y = E(Y) + ε = f(X, β) + ε.

The simplest model is a linear function:

E(Y) = β0 + β1 X1 + β2 X2 + β3 X3.

Example: Baltimore Longitudinal Study of Aging

E(daily physical activity) = β0 + β1 × age + β2 × sex + β3 × employment.

Interpretation of parameters

How to interpret β1?
If sex = 0 for female and sex = 1 for male: how to interpret β2?
If employment = 0 for unemployed and employment = 1 for employed: how to interpret β3?
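One way to approach the interpretation questions above is to compare the mean response at two covariate settings that differ in a single variable. The sketch below does this for the linear model. The coefficient values and the helper name mean_activity are made up for illustration; they are not estimates from the study.

```python
# Hypothetical coefficients, invented for illustration only
# (not estimates from the Baltimore study).
BETA = {"intercept": 50.0, "age": -0.4, "sex": 3.0, "employment": 5.0}


def mean_activity(age, sex, employment, beta=BETA):
    """E(Y) = b0 + b1*age + b2*sex + b3*employment at one covariate setting."""
    return (beta["intercept"]
            + beta["age"] * age
            + beta["sex"] * sex
            + beta["employment"] * employment)


# beta1: change in mean activity for one extra year of age,
# with sex and employment held fixed.
age_effect = mean_activity(61, 1, 0) - mean_activity(60, 1, 0)

# beta2: mean for males (sex = 1) minus mean for females (sex = 0),
# at the same age and employment status.
sex_effect = mean_activity(60, 1, 1) - mean_activity(60, 0, 1)

# beta3: mean for employed (1) minus mean for unemployed (0),
# at the same age and sex.
employment_effect = mean_activity(60, 0, 1) - mean_activity(60, 0, 0)

print(age_effect, sex_effect, employment_effect)
```

Each difference recovers the corresponding coefficient (up to floating-point rounding), whatever values the other variables are held at. That is what "holding the other variables fixed" means in a linear model without interaction terms.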
Origin of “Regression”

Francis Galton studied inheritability of physical characteristics such as height.

Consider the deviation of an individual’s height from the gender average. Suppose that the deviation height Y of a son is, on average, linearly related to the average deviation height X of his parents:

E(Y) = β0 + β1 X

The intercept β0 measures the overall increase in height between generations, which is interesting but not related to inheritability.

If β1 = 1, the son inherits the full characteristic of his parents.
If β1 = 0, there is no inheritability.
Galton observed β1 ≈ 2/3, and described this as a regression to the mean.

See Francis Galton, “Regression towards mediocrity in hereditary stature”, The Journal of the Anthropological Institute of Great Britain and Ireland, Vol 15, pages 246–263 (or Wikipedia!).

The term “regression” has since been used for any such analysis, involving one or more variables, and involving linear and nonlinear relationships, mostly having no connection with inheritability.

Estimation

In a regression context, we sample from many populations. For example, in the Baltimore Longitudinal Study of Aging, we observe many subjects for each combination of sex, age and employment.

Each time, the measured data is drawn from some population. The constants β0, β1, ..., βp are parameters of that collection of populations.

We need to make inferences about them, in the form of:
point estimates;
interval estimates;
hypothesis tests.

We shall get point estimates using the method of least squares.
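As a minimal sketch of what least squares produces for the one-variable model E(Y) = β0 + β1 X, the closed-form estimates can be computed directly. The data are invented for illustration (they are not Galton's measurements), chosen so that the fitted slope lands close to 2/3. The function name least_squares_line is hypothetical.

```python
def least_squares_line(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of x about its mean, and cross-products of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1


# Made-up height deviations (inches): parents' average and son.
parent_dev = [-3.0, -1.5, 0.0, 1.5, 3.0]
son_dev = [-2.1, -0.9, 0.2, 0.8, 2.0]

b0_hat, b1_hat = least_squares_line(parent_dev, son_dev)

# Residuals are the observed values minus the fitted means.
residuals = [yi - (b0_hat + b1_hat * xi) for xi, yi in zip(parent_dev, son_dev)]

print("intercept:", b0_hat)
print("slope:", b1_hat)
print("sum of residuals:", sum(residuals))
```

The residuals sum to zero up to rounding. This is a general property of a least squares fit that includes an intercept, and it mirrors the model assumption that the random error has mean zero.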
For other inferences, we need to know the distribution of the errors ε, and we shall assume that they are normally distributed.

Observational and experimental data

In some investigations, the independent variables X1, X2, ..., Xp can be controlled; that is, held at desired values. For example, assignment to the treatment or control group in a clinical trial.

The resulting data are called experimental.

In other cases, the independent variables cannot be controlled, and their values are simply observed. For example, Galton’s heights of parents and sons.

The resulting data are called observational.

Observational data show how the value of the response is associated with values of the independent variables, but generally cannot reveal cause and effect.

George Box: “To find out what happens to a system when you interfere with it, you have to interfere with it (not just passively observe it).”
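A small simulation can illustrate why observational data may mislead about cause and effect. This is a sketch under invented assumptions, not something taken from the course material: a hidden variable Z drives both X and Y, while X itself has no effect on Y. All names and numbers below are hypothetical.

```python
import random


def slope(x, y):
    """Least squares slope of y on x (model with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 2000
TRUE_EFFECT_OF_X = 0.0   # assumption: X has no causal effect on Y


def respond(x, z):
    """Response driven by the hidden variable z, not by x."""
    return 2.0 * z + TRUE_EFFECT_OF_X * x + rng.gauss(0.0, 1.0)


# Observational setting: X is merely observed and tracks the hidden Z.
z_obs = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_obs = [zi + rng.gauss(0.0, 1.0) for zi in z_obs]
y_obs = [respond(xi, zi) for xi, zi in zip(x_obs, z_obs)]

# Experimental setting: X is set by the investigator, independently of Z.
z_exp = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_exp = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_exp = [respond(xi, zi) for xi, zi in zip(x_exp, z_exp)]

slope_obs = slope(x_obs, y_obs)
slope_exp = slope(x_exp, y_exp)

print("observational slope:", slope_obs)
print("experimental slope:", slope_exp)
```

Under these assumptions the observational slope comes out far from zero, because X and Y share the hidden cause Z. The experimental slope comes out near zero, matching the true (absent) effect of X. Setting X independently of everything else is what "interfering with the system" achieves.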