ST430: Introduction to Regression Analysis
Case Study 3
Luo Xiao
October 26, 2015

Deregulation of the trucking industry

What was the impact of deregulation on trucking prices in Florida? What is a good model for predicting prices?

Get the data and plot them (see "output1.pdf"):

setwd("~/Dropbox/teaching/2015Fall/R_datasets/Cases")
load("TRUCKING.Rdata")
pairs(TRUCKING[, c("LNPRICE", "DISTANCE", "WEIGHT", "ORIGIN", "DEREG")])

Data: 134 observations.

Y: natural logarithm of price;
X1: weight of product shipped (in 1,000 pounds);
X2: miles traveled (in hundreds);
X3: indicator variable, 1 for deregulation and 0 for regulation;
X4: indicator variable, 1 if the shipment originates in Miami and 0 if in Jacksonville.

Stepwise regression (output in "output2.txt"):

truck = list()
truck$Y = TRUCKING$LNPRICE
truck$X1 = TRUCKING$WEIGHT
truck$X2 = TRUCKING$DISTANCE
truck$X3 = TRUCKING$DEREG
truck$X4 = TRUCKING$ORIGIN
truck$X5 = TRUCKING$PCTLOAD
truck$X6 = TRUCKING$MARKET
truck = as.data.frame(truck)
start = lm(Y ~ 1, data = truck)
firstOrder = Y ~ X1 + X2 + X3 + X4 + X5 + X6
summary(step(start, scope = firstOrder))

The (first-order) stepwise regression identifies:

X1, 'WEIGHT';
X2, 'DISTANCE';
X3, the 'DEREG' indicator;
X4, the 'ORIGIN' indicator.
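In equation form, the scope of the stepwise search (the full first-order model, with X5 = PCTLOAD and X6 = MARKET) is:

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4
    + \beta_5 X_5 + \beta_6 X_6 + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^2).
```

At each step, 'step()' adds or removes a single term, choosing the move that most reduces the AIC, and stops when no single move improves it.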
Stepping down from the full first-order model, instead of stepping up from the empty model, finds the same variables.

R code:

summary(step(lm(firstOrder, truck), firstOrder))

The study continues with the full second-order model (Model 1) in X1, X2, X3 and X4:

Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + β4 X1^2 + β5 X2^2
  + β6 X3 + β7 X4 + β8 X3 X4
  + β9 X1 X3 + β10 X1 X4 + β11 X1 X3 X4
  + β12 X2 X3 + β13 X2 X4 + β14 X2 X3 X4
  + β15 X1 X2 X3 + β16 X1 X2 X4 + β17 X1 X2 X3 X4
  + β18 X1^2 X3 + β19 X1^2 X4 + β20 X1^2 X3 X4
  + β21 X2^2 X3 + β22 X2^2 X4 + β23 X2^2 X3 X4.

R code (output in "output3.txt"):

lm1 <- lm(Y ~ (X1*X2 + I(X1^2) + I(X2^2)) * X3 * X4, truck)
summary(lm1)

Note that none of the 8 squared terms are significant; try the model without the quadratic terms (i.e., terms involving X1^2 or X2^2), denoted Model 2:

Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + β6 X3 + β7 X4 + β8 X3 X4
  + β9 X1 X3 + β10 X1 X4 + β11 X1 X3 X4
  + β12 X2 X3 + β13 X2 X4 + β14 X2 X3 X4
  + β15 X1 X2 X3 + β16 X1 X2 X4 + β17 X1 X2 X3 X4.

R code (output in "output4.txt"):

lm2 <- lm(Y ~ X1*X2*X3*X4, truck)
summary(lm2)
anova(lm2, lm1)  # compare nested models

R^2 drops substantially, and F is highly significant, so the simpler Model 2 is rejected.

Next try dropping, from the full second-order model (Model 1), the interactions between the quantitative and qualitative variables (Model 3):

Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + β4 X1^2 + β5 X2^2 + β6 X3 + β7 X4 + β8 X3 X4.

R code (output in "output5.txt"):

lm3 = lm(Y ~ X1 + X2 + X1:X2 + I(X1^2) + I(X2^2) + X3 * X4, data = truck)
summary(lm3)
anova(lm3, lm1)

Again F is significant, and the simpler Model 3 is rejected.
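The 'anova()' calls above carry out partial (nested-model) F-tests. Writing SSE_R and SSE_C for the error sums of squares of the reduced and complete models, with residual degrees of freedom df_R and df_C, the test statistic is:

```latex
F = \frac{(SSE_R - SSE_C)/(df_R - df_C)}{SSE_C / df_C}.
```

Under the null hypothesis that the dropped coefficients are all zero, F has an F distribution with df_R − df_C numerator and df_C denominator degrees of freedom; a significant F rejects the reduced model in favor of the complete one.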
Next, drop from Model 1 only the interactions of the qualitative variables with the squared terms (Model 4):

Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + β4 X1^2 + β5 X2^2
  + β6 X3 + β7 X4 + β8 X3 X4
  + β9 X1 X3 + β10 X1 X4 + β11 X1 X3 X4
  + β12 X2 X3 + β13 X2 X4 + β14 X2 X3 X4
  + β15 X1 X2 X3 + β16 X1 X2 X4 + β17 X1 X2 X3 X4.

R code:

lm4 = lm(Y ~ X1*X2*X3*X4 + I(X1^2) + I(X2^2), truck)
summary(lm4)
anova(lm4, lm1)

Success! R^2 drops only a little, and the adjusted R^2 actually increases; also F is not significant. This simpler Model 4 is not rejected.

Next, explore whether X4 can be dropped from Model 4 (Model 5):

Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + β4 X1^2 + β5 X2^2 + β6 X3
  + β9 X1 X3 + β12 X2 X3 + β15 X1 X2 X3.

R code:

lm5 = lm(Y ~ X1*X2*X3 + I(X1^2) + I(X2^2), truck)
summary(lm5)
anova(lm5, lm4)

F is highly significant, so we reject the simpler Model 5.

Next, explore whether X3 can be dropped (Model 6):

Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + β4 X1^2 + β5 X2^2 + β7 X4
  + β10 X1 X4 + β13 X2 X4 + β16 X1 X2 X4.

R code:

lm6 = lm(Y ~ X1*X2*X4 + I(X1^2) + I(X2^2), truck)
summary(lm6)
anova(lm6, lm4)

Again, F is highly significant, so we reject the simpler Model 6.

Finally, explore whether X3 interacts with X4 by dropping their interaction terms (Model 7):

Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + β4 X1^2 + β5 X2^2 + β6 X3 + β7 X4
  + β9 X1 X3 + β10 X1 X4 + β12 X2 X3 + β13 X2 X4
  + β15 X1 X2 X3 + β16 X1 X2 X4.

R code:

lm7 = lm(Y ~ X1*X2*(X3 + X4) + I(X1^2) + I(X2^2), truck)
summary(lm7)
anova(lm7, lm4)

This time, F is not significant, so the simpler Model 7, without the X3-X4 interactions, is not rejected.
Model-building with step()

Suppose we begin with the full second-order model and simplify it using 'step()' with the BIC penalty.

R code:

stepLm1 = step(lm1, direction = "both", k = log(nrow(truck)))
summary(stepLm1)
lmBIC = lm(Y ~ X1*X2*X3*X4 + I(X2^2), truck)
summary(lmBIC)

Note that the 'step' function adds or drops only one term at a time, so it cannot evaluate dropping several terms at once. Let's drop the interaction of X3 and X4 (Model 8) manually and compare BIC:

R code:

lm8 = lm(Y ~ X1*X2*(X3 + X4) + I(X2^2), truck)
extractAIC(lmBIC, k = log(nrow(truck)))
extractAIC(lm8, k = log(nrow(truck)))

Now we see that Model 8 is preferred, and we continue with the 'step' function, making sure that X2 and X2^2 are always included in the models:

R code:

lower = Y ~ X2 + I(X2^2)
upper = Y ~ X1*X2*(X3 + X4) + I(X2^2)
stepLm2 = step(lm8, scope = list(lower = lower, upper = upper),
               direction = "both", k = log(nrow(truck)))
summary(stepLm2)
lm9 = lm(Y ~ X1*X2*X3 + X1*X4 + I(X2^2), truck)
summary(lm9)
extractAIC(lm9, k = log(nrow(truck)))

Now we end up with a model (Model 9) that cannot be simplified further:

Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + β5 X2^2 + β6 X3 + β7 X4
  + β9 X1 X3 + β10 X1 X4 + β12 X2 X3 + β15 X1 X2 X3.

Which model to use, Model 7 or Model 9? Let's look at their AIC/BIC.

R code:

# compare BIC
extractAIC(lm9, k = log(nrow(TRUCKING)))
extractAIC(lm7, k = log(nrow(TRUCKING)))
# compare AIC
extractAIC(lm9, k = 2)
extractAIC(lm7, k = 2)

Model 9 has the smaller AIC and BIC and is thus preferred. Estimated Model 9:

Y = 12.131 + 0.002 X1 − 0.578 X2 + 0.674 X3 − 0.666 X4 + 0.085 X2^2
  − 0.012 X1 X2 − 0.026 X1 X3 − 0.273 X2 X3 − 0.031 X1 X4 + 0.013 X1 X2 X3.
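For a fitted linear model, 'extractAIC()' computes the criterion from the residual sum of squares. With n observations and p estimated coefficients (the equivalent degrees of freedom), the value returned is, up to an additive constant,

```latex
\text{criterion} = n \log\!\left(\frac{SSE}{n}\right) + k\,p,
\qquad
\text{BIC: } k = \log n, \qquad \text{AIC: } k = 2.
```

Smaller values are better. Because the BIC penalty log(n) exceeds 2 whenever n > 7 (here log 134 ≈ 4.9), BIC favors smaller models than AIC does.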
Effect of deregulation

If X3 = 0:

Y = 12.131 + 0.002 X1 − 0.578 X2 − 0.666 X4 + 0.085 X2^2
  − 0.012 X1 X2 − 0.031 X1 X4.

If X3 = 1:

Y = (12.131 + 0.674) + (0.002 − 0.026) X1 − (0.578 + 0.273) X2 − 0.666 X4
  + 0.085 X2^2 − 0.031 X1 X4 + (0.013 − 0.012) X1 X2.

Effect of deregulation:

E(Y | X1, X2, X4, X3 = 1) − E(Y | X1, X2, X4, X3 = 0)
  = 0.674 − 0.026 X1 − 0.273 X2 − 0.012 X1 X2.

X4 does not appear in the above formula: X3 and X4 do not interact. If we plug the observed values of (X1, X2) from the data into this formula, about 95% of the resulting values are negative; positive values are obtained only when X1 and X2 are both small.

[Figure: the (X1, X2) plane divided into a region of positive deregulation effect (X1 and X2 both small, near the origin) and a region of negative effect; X1 ranges from 0 to 20, X2 from 0 to 10.]
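The boundary between the two regions in the figure is where the estimated effect equals zero; solving for X2 (using the rounded Model 9 coefficients, so the curve is approximate):

```latex
0.674 - 0.026\,X_1 - 0.273\,X_2 - 0.012\,X_1 X_2 = 0
\;\Longrightarrow\;
X_2 = \frac{0.674 - 0.026\,X_1}{0.273 + 0.012\,X_1}.
```

For example, at X1 = 0 the effect changes sign at X2 ≈ 2.5; points below the curve have a positive estimated effect, points above it a negative one.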