ST430 Introduction to Regression Analysis

ST430: Introduction to Regression Analysis, Ch4, Sec 4.11-4.12
Luo Xiao
September 23, 2015

Multiple Linear Regression

Quadratic models

We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic model by adding a second term to the equation:

$$E(Y) = \beta_0 + \beta_1 X + \beta_2 X^2.$$

This is a special case of the two-variable model $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$ with $X_1 = X$ and $X_2 = X^2$.

[Figure: Quadratic functions: concave upward or downward? Two panels of quadratic curves plotted against x.]

The curve is concave upward when $\beta_2 > 0$ and concave downward when $\beta_2 < 0$.

Example: human immune system and exercise

Example 4.7 in textbook.
X = maximal oxygen uptake (VO2 max, mL/(kg · min));
Y = immunoglobulin level (IgG, mg/dL);
data for 30 subjects (AEROBIC.Rdata).
Get the data and plot them (next slide). The slight curvature suggests that a straight-line model may not fit well.
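To see concretely that the quadratic model is just a two-variable linear model, one can inspect the design matrix R builds for it. The sketch below uses a few made-up values of a hypothetical predictor `x` (not the AEROBIC data); the columns named "x" and "I(x^2)" play the roles of $X_1 = X$ and $X_2 = X^2$.

```r
# Hypothetical predictor values, for illustration only (not the AEROBIC data)
x <- c(40, 50, 60, 70)

# Design matrix for E(Y) = b0 + b1*x + b2*x^2:
# column "x" is X1 = X and column "I(x^2)" is X2 = X^2
X <- model.matrix(~ x + I(x^2))
print(X)
```

Least squares then treats the two columns exactly as it would treat any two predictors; the quadratic shape comes only from how the second column was constructed.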
[Figure: Scatter plot of IGG against MAXOXY.]

How to fit a quadratic model in R

For the human immune system data, use the R code:

fit = lm(IGG ~ MAXOXY + I(MAXOXY^2), data = AEROBIC)

To add the quadratic term to the model formula, we wrap it in I(): I(MAXOXY^2). Simply adding MAXOXY^2 to the model formula will not work, because inside a formula the ^ operator denotes crossing (interaction) of terms rather than arithmetic squaring, so MAXOXY^2 reduces to MAXOXY.

[Figure: Quadratic fit in R: IGG against MAXOXY, showing the fitted quadratic curve and the least squares line.]

See the text file for R output.
The global F-test shows that the model is useful.
The quadratic term for 'MAXOXY' is significant, so we reject the null hypothesis $H_0\colon \beta_2 = 0$, i.e., that the linear model is adequate.
The estimated coefficient of the quadratic term is negative, which is consistent with the downward concavity of the curve.
The other two t-ratios (for the intercept and the linear term) test irrelevant hypotheses, because the quadratic term is important.

Caution with the quadratic model

Extrapolation: the fitted curve has a maximum at

$$\text{MAXOXY} = -\frac{\hat\beta_1}{2\hat\beta_2} = \frac{88.3071}{2 \times 0.5362} \approx 82$$

and declines for higher values of 'MAXOXY', which seems unlikely to represent the real relationship. Extrapolation can be dangerous, and the quadratic model might not be realistic for this data.

An alternative analysis

The graph of 'IGG' against 'log(MAXOXY)' is more linear (next slide). Fit the corresponding model, $E(Y) = \beta_0 + \beta_1 \log(X)$ (see the text file for output).
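The two points above (why I() is needed, and where the fitted maximum lies) can be checked directly in base R. This is a minimal sketch: it only inspects the formulas, so it needs no data, and it plugs in the coefficient values quoted in the caution above. The helper `vertex()` is a hypothetical name introduced here for illustration.

```r
# In a model formula, ^ means crossing of terms, not arithmetic:
# MAXOXY^2 of a single variable collapses to the main effect MAXOXY.
labels_wrong <- attr(terms(IGG ~ MAXOXY^2), "term.labels")
print(labels_wrong)   # "MAXOXY"

# I() protects the arithmetic meaning, giving a genuine squared term
labels_right <- attr(terms(IGG ~ MAXOXY + I(MAXOXY^2)), "term.labels")
print(labels_right)   # "MAXOXY" "I(MAXOXY^2)"

# Hypothetical helper: the turning point of b0 + b1*x + b2*x^2
# is at x = -b1 / (2 * b2). With a fitted model one could call
#   vertex(coef(fit)[["MAXOXY"]], coef(fit)[["I(MAXOXY^2)"]])
vertex <- function(b1, b2) -b1 / (2 * b2)

# Estimates quoted above (the quadratic coefficient is negative)
x_max <- vertex(88.3071, -0.5362)
print(x_max)   # about 82.35
```

Because $\hat\beta_2 < 0$, this turning point is a maximum; the fitted curve decreases for MAXOXY beyond it.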
[Figure: IGG plotted against log(MAXOXY).]

[Figure: Graph of the two models: fitted quadratic and logarithmic curves of IGG against MAXOXY.]

Comparison of the two models

The adjusted R-squared values are similar: .933 for the quadratic model and .932 for the logarithmic model.
The logarithmic curve (blue in the plot) continues to increase indefinitely, but with diminishing slope.

Qualitative variables

A qualitative variable (or factor) is one that indicates membership in one of several categories.
E.g., a person's 'gender' = 'male' or 'female': a qualitative variable with two levels, indicating membership in one of two categories.
E.g., package 'type' = 'Fragile', 'Semifragile', or 'Durable': three levels, corresponding to three categories.

We code a qualitative variable using indicator (dummy) variables:
Choose one level to use as a base or reference level, say 'male' or 'Durable'.
For each other level $j$, create a variable

$$X_j = \begin{cases} 1 & \text{if this item is in category } j, \\ 0 & \text{otherwise.} \end{cases}$$

For gender, there is only one other category, so the only indicator variable is

$$X = \begin{cases} 1 & \text{for a female}, \\ 0 & \text{for a male.} \end{cases}$$

For packages, there are two other categories, so the indicator variables are

$$X_1 = \begin{cases} 1 & \text{for a fragile package}, \\ 0 & \text{otherwise,} \end{cases} \qquad X_2 = \begin{cases} 1 & \text{for a semifragile package}, \\ 0 & \text{otherwise.} \end{cases}$$

For any item, at most one of the indicator variables is non-zero, indicating a non-base category; if they are all zero, the item belongs to the base category.
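The coding rule for the package example can be sketched in R, assuming the labels 'Fragile', 'SemiFrag', and 'Durable' used in the shipping data. The sketch builds X1 and X2 by hand and compares them with the columns R creates for a factor under its default treatment contrasts; the variable name `pkg_type` is hypothetical.

```r
# One package of each type; 'Durable' is listed first so it is the base level
pkg_type <- factor(c("Fragile", "SemiFrag", "Durable"),
                   levels = c("Durable", "Fragile", "SemiFrag"))

# Indicator variables built by hand
X1 <- as.numeric(pkg_type == "Fragile")    # 1 for a fragile package, else 0
X2 <- as.numeric(pkg_type == "SemiFrag")   # 1 for a semifragile package, else 0
print(cbind(X1, X2))

# R's default treatment contrasts produce the same two indicator columns
M <- model.matrix(~ pkg_type)
print(M)
```

The 'Durable' row has both indicators equal to 0, which is exactly how the base category is identified.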
Example: cost of shipping packages

Example 4.9 in textbook.
Y: cost of package (dollars).
Qualitative variable: package type (fragile, semifragile, or durable).
X1: indicator for a fragile package (see the previous slide).
X2: indicator for a semifragile package (see the previous slide).
Data: see next slide.
Model: $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$.

      COST  CARGO     X1  X2
  1   17.2  Fragile    1   0
  2   11.1  Fragile    1   0
  3   12.0  Fragile    1   0
  4   10.9  Fragile    1   0
  5   13.8  Fragile    1   0
  6    6.5  SemiFrag   0   1
  7   10.0  SemiFrag   0   1
  8   11.5  SemiFrag   0   1
  9    7.0  SemiFrag   0   1
 10    8.5  SemiFrag   0   1
 11    2.1  Durable    0   0
 12    1.3  Durable    0   0
 13    3.4  Durable    0   0
 14    7.5  Durable    0   0
 15    2.0  Durable    0   0

Box plots

Box plots are useful for plotting a response against a categorical variable. For the cost of shipping packages example, use the R code:

boxplot(COST ~ CARGO, data = CARGO)

[Figure: Box plot (R output) of COST for the Durable, Fragile, and SemiFrag package types.]

See the text file for R output.
The global F-test shows that the model is useful; that is, the mean cost is not the same for all three package types ($H_0\colon \beta_1 = \beta_2 = 0$ is rejected).
Note that the intercept is the fitted value for X1 = X2 = 0; that is, the mean cost of 'Durable' packages.
The coefficient of X1 measures the mean difference between 'Fragile' and 'Durable' packages.
The coefficient of X2 measures the mean difference between 'SemiFrag' and 'Durable' packages.

Fitting models with qualitative variables in R

An alternative and simpler way:

fit = lm(COST ~ CARGO, data = CARGO)

Compare with:

fit = lm(COST ~ X1 + X2, data = CARGO)

The two fits agree here because R's default coding of a factor uses its first level ('Durable', first in alphabetical order) as the base level and creates indicator variables for 'Fragile' and 'SemiFrag'.
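A minimal sketch that reproduces both fits from the 15 observations listed in this example and checks the interpretation of the coefficients against the group means. The data values are copied from the listing; the object names `fit_factor`, `fit_dummy`, and `group_means` are hypothetical, and the factor levels are set explicitly so that 'Durable' is the base level regardless of locale.

```r
# Shipping-cost data from Example 4.9 (values copied from the listing)
CARGO <- data.frame(
  COST  = c(17.2, 11.1, 12.0, 10.9, 13.8,
             6.5, 10.0, 11.5,  7.0,  8.5,
             2.1,  1.3,  3.4,  7.5,  2.0),
  CARGO = factor(rep(c("Fragile", "SemiFrag", "Durable"), each = 5),
                 levels = c("Durable", "Fragile", "SemiFrag"))
)
# Hand-built indicators with 'Durable' as the base level
CARGO$X1 <- as.numeric(CARGO$CARGO == "Fragile")
CARGO$X2 <- as.numeric(CARGO$CARGO == "SemiFrag")

fit_factor <- lm(COST ~ CARGO, data = CARGO)   # R builds the indicators
fit_dummy  <- lm(COST ~ X1 + X2, data = CARGO) # indicators supplied by hand

print(coef(fit_factor))   # 3.26, 9.74, 5.44
print(coef(fit_dummy))    # same values, different names

# Intercept = mean of 'Durable'; slopes = differences from that mean
group_means <- tapply(CARGO$COST, CARGO$CARGO, mean)
print(group_means)        # Durable 3.26, Fragile 13.0, SemiFrag 8.7
```

The slopes match the mean differences: 13.0 - 3.26 = 9.74 for 'Fragile' and 8.7 - 3.26 = 5.44 for 'SemiFrag'.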