Social Interactions under Incomplete Information: Games, Equilibria, and Expectations

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By
Chao Yang, B.A., M.A.
Graduate Program in Economics
The Ohio State University
2015

Dissertation Committee:
Lung-fei Lee, Advisor
Jason Blevins
Stephen Cosslett
Lixin Ye

© Copyright by Chao Yang 2015

Abstract

My dissertation research investigates how agents’ behaviors interact through social networks when some information is not shared publicly. It focuses on solutions to a series of challenging problems in empirical research, including heterogeneous expectations and multiple equilibria.

The first chapter, “Social Interactions under Incomplete Information with Heterogeneous Expectations”, extends the current literature on social interactions by devising econometric models and estimation tools that allow private information not only in the idiosyncratic shocks but also in some exogenous covariates. For example, when analyzing peer effects in class performance, it was previously assumed that all control variables, including individual IQ and SAT scores, are known to the whole class, which is unrealistic. This chapter allows such exogenous variables to be private information and models agents’ behaviors as outcomes of a Bayesian Nash Equilibrium in an incomplete information game. The distribution of equilibrium outcomes can be described by the equilibrium conditional expectations, which are unique when the parameters are within a reasonable range, as implied by the contraction mapping theorem in function spaces. The equilibrium conditional expectations are heterogeneous in both exogenous characteristics and private information, which makes estimation in this model more demanding than in previous ones.
This problem is solved in a computationally efficient way by combining the quadrature method with nested fixed point maximum likelihood estimation. In Monte Carlo experiments, if some exogenous characteristics are private information and the model is estimated under the misspecified hypothesis that they are publicly known, the estimates are biased. Applying this model to municipal public spending in North Carolina, significant negative correlations between contiguous municipalities are found, indicating free-riding effects.

The second chapter, “A Tobit Model with Social Interactions under Incomplete Information”, applies the first chapter to censored outcomes, which arise when agents’ behaviors are subject to binding restrictions. In an empirical analysis of property tax rates set by North Carolina municipal governments, a significant positive correlation among nearby municipalities is found. Additionally, a municipal government uses some private information about its own residents to predict others’ tax rates, which enriches current empirical work on tax competition.

The third chapter, “Social Interactions under Incomplete Information with Multiple Equilibria”, extends the first chapter by investigating effective estimation methods when the condition for a unique equilibrium may not be satisfied. With multiple equilibria, the previous model is incomplete because equilibrium selection is unobservable. Neither conventional likelihoods nor moment conditions can be used to estimate parameters without further specifications. Although the current literature offers some solutions to this issue, they rest on strong assumptions, such as that agents with the same observable characteristics play the same strategy.
This chapter relaxes those assumptions and extends the all-solution method used to estimate discrete choice games to a setting with both discrete and continuous choices, bounded and unbounded outcomes, and a general form of incomplete information, where the existence of a pure strategy equilibrium has been an open question for a long time. Using differential topology and functional analysis, it is found that when all exogenous characteristics are public information, there are a finite number of equilibria. With privately known exogenous characteristics, the equilibria can be represented by a compact set in a Banach space and approximated by a finite set. As a result, a finite-state probability mass function can be used to specify a probability measure for equilibrium selection, which completes the model. Monte Carlo experiments on two types of binary choice models show that assuming equilibrium uniqueness can introduce estimation bias when the true interaction intensity is large and there are multiple equilibria in the data generating process.

This is dedicated to my beloved Father, Xianmao Yang, and Mother, Fuyun Liao.

Acknowledgments

I owe great thanks to my advisor, Lung-fei Lee, for his guidance in my economic research and career development. Professor Lee is an experienced economist with deep insights into econometric theory. His expertise greatly helped me turn ideas inspired by real life into concrete theoretical research questions and apply mathematical tools in rigorous analysis. Moreover, I was greatly impressed by his passion for developing practical econometric tools and his pursuit of rigor and perfection, which led me to be a responsible and active researcher.

I would like to thank Jason Blevins. Professor Blevins introduced me to a large body of recent literature in microeconometrics, which helped me advance my research and reach the research frontier.
I also learned a great deal from him about computation.

I would also like to thank Stephen Cosslett. I have learned a great deal from his lectures, which covered a broad range of econometric theories. Additionally, while serving as the grader for Professor Cosslett, I gained knowledge of both econometric theory and teaching from his well-designed problem sets and exams.

I am also indebted to Lixin Ye for his guidance and encouragement. As my work combines game theory and econometric theory, it is important to build game-based models from real life problems. Professor Ye gave me many constructive suggestions, which helped me build models with solid microeconomic foundations and practical applications.

I appreciate Robert De Jong, Javier Donna, Daeho Kim, Maryam Saeedi, and Bruce Weinberg. Their valuable comments and encouragement led me to pursue research in Econometrics and Microeconometrics. I also appreciate Yaron Azrieli, David Blau, Lucia Dunn, Bill Dupor, Paul J. Healy, John Kagel, Aubhik Khan, Pok-sang Lam, Dan Levin, Hu McCulloch, James Peck, Julia Thomas, and Huanxing Yang. Their courses provided me with rigorous training in economic science, and their comments helped me make progress.

I send my appreciation to The Ohio Supercomputer Center for its support in allocating computing time, which facilitated the investigation of the performance of my new estimation methods.

I would like to thank Hajime Miyazaki. His great devotion to administrative work allows graduate students to concentrate their time and energy on research and teaching. In particular, his encouragement and guidance on speaking English, teaching skills, and career development greatly helped me make a firm decision to become a productive and competent economist.

I am grateful to all the staff in the Department of Economics for their efficient work and warm-hearted help.

I also owe thanks to my friends and classmates.
As an international student, I could not have had such a pleasant time during my graduate studies without their help.

Last but not least, I would like to thank my beloved mother for her strong support, deep concern, and continuing encouragement.

Vita

1982: Born - Wuhan, China
2005: B.A. Economics and Mathematics, Wuhan University
2010: M.A. Economics, The Ohio State University
2009-present: Graduate Teaching Associate, The Ohio State University

Publications

Research Publications

Chao Yang, Lian-sheng Wu, and Xiaohui Bo. “Career Concern and Tax Preparer Fraud”. Annals of Economics and Finance, 11(2):355–379, 2010.

Fields of Study

Major Field: Economics

Table of Contents

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

1. Introduction

2. Social Interactions under Incomplete Information with Heterogeneous Expectations
  2.1 Introduction
  2.2 Models
    2.2.1 A Model Framework
    2.2.2 Continuous and Discrete Choices and Information Structures
    2.2.3 Game Theoretical Explanation
  2.3 Equilibrium Analysis
    2.3.1 Equilibrium and Expectations
    2.3.2 Existence and Uniqueness
  2.4 Equilibrium Solution and Computation
    2.4.1 All Characteristics are Publicly Known
    2.4.2 Characteristics are Self-Known
    2.4.3 Socially-Known Characteristics
  2.5 Identification, Likelihood, and Estimation
    2.5.1 Identification
    2.5.2 Estimation
  2.6 Monte Carlo Experiments
  2.7 An Empirical Application
  2.8 Conclusion and Remarks

3. A Tobit Model with Social Interactions under Incomplete Information
  3.1 Introduction
  3.2 The Model
    3.2.1 The Model Framework
    3.2.2 Game Theoretical Foundation
  3.3 Equilibrium Analysis
    3.3.1 Equilibrium and Expectations
    3.3.2 Unique Equilibrium
    3.3.3 Equilibrium Computation
  3.4 Identification
  3.5 Estimation
  3.6 Extensions
  3.7 Monte Carlo Experiments
  3.8 Empirical Application
  3.9 Conclusion

4. Social Interactions under Incomplete Information with Multiple Equilibria
  4.1 Introduction
  4.2 Models
    4.2.1 A Model Framework
    4.2.2 Examples
  4.3 Game, Equilibrium, and Expectations
    4.3.1 Game Explanations
    4.3.2 Equilibrium and Expectations
  4.4 Estimation with Publicly Known Characteristics
    4.4.1 Characterization of the Equilibrium Set
    4.4.2 Selection Rule and Complete Likelihood
    4.4.3 Identification
    4.4.4 Computation and Estimation
  4.5 Estimation with Self-Known Characteristics
    4.5.1 Discrete Private Characteristics
    4.5.2 Continuous Private Characteristics
  4.6 Discussions and Extensions
    4.6.1 Group Unobservables
    4.6.2 Peer Effects
    4.6.3 Deterministic Rule
  4.7 Binary Choice Models: Analysis and Experiments
    4.7.1 Binary Choice Model I
    4.7.2 Binary Choice Model II
  4.8 Conclusion

5. Conclusion

Bibliography

Appendices

A. Appendix to Chapter 2
  A.1 Proofs
  A.2 Numerical Methods
  A.3 Identification
  A.4 Unobserved Group Random Effects
  A.5 Identification: Bayesian Analysis

B. Appendix to Chapter 3
  B.1 Equilibrium Analysis
  B.2 Identification Proofs

C. Appendix to Chapter 4
  C.1 Expectations, Equilibria, and Functions
  C.2 Proofs for Equilibrium Characterizations with Public Characteristics
  C.3 Proofs for Identification with Public Characteristics
  C.4 Equilibrium with Privately Known Characteristics
  C.5 Equilibrium for Peer Effects
  C.6 Proofs for Equilibrium Set Characterization in Binary Choice Models

List of Tables

A.1 Linear Model with Discrete Characteristics for Independent Groups
A.2 Binary Choice with Discrete Characteristics for Independent Groups
A.3 Linear Model with Continuous Characteristics for Independent Groups
A.4 Binary Choice with Continuous Characteristics for Independent Groups
A.5 Linear Model with Discrete Characteristics and Constant Friend Number for a Single Group
A.6 Linear Model with Discrete Characteristics and Random Friend Number for a Single Group
A.7 Binary Choice with Discrete Characteristics and Constant Friend Number for a Single Group
A.8 Binary Choice with Discrete Characteristics and Random Friend Number for a Single Group
A.9 Linear Model with Continuous Characteristics and Constant Friend Number for a Single Group
A.10 Linear Model with Continuous Characteristics and Random Friend Number for a Single Group
A.11 Binary Choice with Continuous Characteristics and Constant Friend Number for a Single Group
A.12 Binary Choice with Continuous Characteristics and Random Friend Number for a Single Group
A.13 Linear Model with Discrete Characteristics and Unobserved Group Random Effects
A.14 Sample Statistics: Municipalities in North Carolina
A.15 Empirical Analysis
B.1 Tobit Model with Discrete Characteristics for Independent Groups
B.2 Tobit Model with Continuous Characteristics for Independent Groups
B.3 Tobit Model with Discrete Characteristics and Constant Friend Number for a Single Group
B.4 Tobit Model with Discrete Characteristics and Random Friend Number for a Single Group
B.5 Tobit Model with Continuous Characteristics and Constant Friend Number for a Single Group
B.6 Tobit Model with Continuous Characteristics and Random Friend Number for a Single Group
B.7 Tobit Model with Discrete Characteristics and Unobserved Group Random Effects
B.8 Sample Statistics
B.9 Tobit Model for Property Tax Competition
C.1 Binary Choice I: Estimation Comparison
C.2 Binary Choice II: Estimation Comparison for Moderate Interactions
C.3 Binary Choice II: Estimation Comparison for Large Interactions

List of Figures

A.1 Identification for Continuous Choices
A.2 Identification for Binary Choices
C.1 The Haar Basis Functions
C.2 Equilibrium Illustration for Binary Choice I with Influences from Peers A
C.3 Equilibrium Illustration for Binary Choice I with Influences from Peers B
C.4 Equilibrium Illustration for Binary Choice I with Influences from Peers C
C.5 Equilibrium Illustration for Binary Choice I with Influences from Peers D
C.6 Equilibrium Illustration for Binary Choice I with General Social Relations
C.7 Equilibrium Outcomes for Binary Choice I with General Social Relations
C.8 Equilibrium Illustration for Binary Choice II with Influences from Peers A
C.9 Equilibrium Illustration for Binary Choice II with Influences from Peers B
C.10 Equilibrium Illustration for Binary Choice II with Influences from Peers C
C.11 Equilibrium Illustration for Binary Choice II with Influences from Peers D
C.12 Equilibrium Illustration for Binary Choice II with General Social Relations
C.13 Equilibrium Outcomes for Binary Choice II with General Social Relations
C.14 Homotopic Mappings on Sphere for Binary Choices

Chapter 1: Introduction

With the rapid development of social media, social networks have been influencing more and more aspects of human life. There is also a boom in real life cases and data sets, which helps researchers investigate how social networks influence people’s behaviors. However, empirical research faces some challenges because social interaction models differ from traditional models of agents’ behaviors. In previous analyses, agents make independent decisions or their direct interdependence is negligible in a large market. When analyzing people’s behaviors in a social group, by contrast, the interactions between closely related agents should be taken into account. This type of interaction implies coordination or conflict among individual behaviors, which in turn relates the estimation of social interactions to the estimation of games.
Then several issues arise, such as the choice of equilibrium concept, the relationship between the expectations and/or outcomes of two different individuals, and equilibrium multiplicity. The goal of my thesis is to build models on the foundation of game theory and find effective estimation methods that handle individual heterogeneity and multiple equilibria.

The thesis focuses on the setting where agents make decisions simultaneously and some individual characteristics and idiosyncratic shocks are not shared as public information in a social group. Agents’ behaviors in this setting can be viewed as the equilibrium outcomes of a simultaneous move game with incomplete information. This is a natural setting for course enrollment in a class and for market entry by a number of firms in the same industry. Modeling and estimation in this setting have been analyzed by Manski (1993), Brock and Durlauf (2001), and Lee, Li, and Lin (2014). However, their models assume that all exogenous characteristics that the econometricians can observe in the data are public information to agents in a social group. Only the variables unobserved by the econometricians are private information. Under this assumption, when investigating students’ interactions in class performance, exogenous characteristics used by the econometricians, such as individual IQ and GPA, are viewed as public information known to every student in a class. Obviously, this assumption is unrealistic. This thesis relaxes it, so that there is private information not only in the variables unobservable from the data sets but also in exogenous characteristics that the econometricians can observe. The framework here is very general. It includes many models that are frequently used in empirical studies, such as the linear model for continuous choices, the model for binary choices, and the tobit model for truncated outcomes.
Although this new information structure is more realistic, it makes estimation and computation more challenging than in the previous models. With incomplete information, a Bayesian Nash Equilibrium (BNE) is equivalent to the equilibrium conditional expectations on agents’ behaviors. When all exogenous characteristics are public information and independent of the unobservables, every agent has the same expectations as the econometricians do. It is then safe to assume that individuals have rational expectations. When agents are symmetric, as shown by Manski (1993) and Brock and Durlauf (2001), an equilibrium can be represented by a scalar satisfying an equation. Lee, Li, and Lin (2014) take into account heterogeneity in individual characteristics and allow the expected outcomes of two different agents to differ. In their model, an equilibrium can then be represented by a vector that solves a system of nonlinear equations. However, in this thesis, with asymmetric private information about exogenous characteristics, there is another type of heterogeneity: two agents may form different expectations on the behaviors of the same third agent. As the values of expectations vary with the information used to make predictions, in this general framework an equilibrium can be represented by a vector-valued function satisfying a system of functional equations. Identification and estimation depend on analyzing these functional equations, which is where the difficulty lies.

Identifying and estimating model primitives requires characterizing the distribution of observed outcomes, which entails an investigation of the equilibria. Two fundamental issues about the equilibria of a game are existence and uniqueness. If a unique equilibrium can be guaranteed, then, because the equilibrium can be computed from the model primitives, the distribution of observed outcomes can be characterized without further assumptions.
However, if equilibrium multiplicity is possible, the equilibrium that is actually played is not observed, so the distribution of the observed outcomes is not complete without additional assumptions. This thesis first searches for a reasonable assumption ensuring the existence of a unique equilibrium and investigates effective methods to compute it. It then extends the methods for game estimation with multiple equilibria by Bajari et al. (2010b, 2010c) to a more general context and delves into efficient computation and estimation.

The investigation of the equilibrium set relies on mathematical theories about function spaces. An expedient way to find sufficient conditions for equilibrium uniqueness is to use the contraction mapping theorem in Banach spaces. It is found that there is a unique equilibrium when the intensity of social interactions is bounded within a range that depends on the behaviors being modeled and the network structure. For large interaction effects, where this restriction is not necessarily satisfied, this thesis applies intersection theory in differential topology and the Schauder fixed point theorem to characterize the set of equilibria. It is found that when all exogenous characteristics are public information, there are a finite number of equilibria under some general regularity conditions. When some exogenous characteristics are private information, the set of equilibria is compact in a Banach space and can be approximated by a finite number of equilibria. Therefore, it is possible to specify a parametric random equilibrium selection rule and use a finite probability mass function for the distribution of equilibrium selection to complete the model. This extends the estimation method of Bajari et al. (2010b, 2010c). In their setting, there are a finite number of players, each of whom has a finite number of choices. It is well established that there are a finite number of Nash equilibria.
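
To illustrate how a large interaction intensity can generate multiple equilibria, the following is a minimal sketch and not the homotopy continuation method used in this thesis. It assumes a symmetric binary choice game in which every agent's equilibrium choice probability p solves p = Λ(a + λp), with Λ the logistic CDF. The function names `logistic` and `symmetric_equilibria`, the parameters `a` and `lam`, and all numerical values are hypothetical and chosen only for illustration. A simple grid scan with bisection stands in for an all-solution solver.

```python
import math


def logistic(z):
    """Logistic CDF, used here as an assumed shock distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def symmetric_equilibria(a, lam, grid=2000, tol=1e-12):
    """Return all regular solutions p in [0, 1] of p = logistic(a + lam * p).

    The interval is scanned for sign changes of
    g(p) = logistic(a + lam * p) - p, and each bracket is refined by
    bisection. Tangency (non-regular) solutions are not detected.
    """
    def g(p):
        return logistic(a + lam * p) - p

    roots = []
    for i in range(grid):
        lo, hi = i / grid, (i + 1) / grid
        g_lo, g_hi = g(lo), g(hi)
        if g_lo == 0.0:
            # A grid point happens to be an exact solution.
            roots.append(lo)
            continue
        if g_lo * g_hi < 0.0:
            # Sign change: refine the bracket by bisection.
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


# Hypothetical parameter values: a weak and a strong interaction intensity.
weak = symmetric_equilibria(a=-0.9, lam=2.0)
strong = symmetric_equilibria(a=-2.9, lam=6.0)
print(len(weak), len(strong))
```

Under these assumed values, the weak interaction yields a single equilibrium and the strong interaction yields three, which is the situation in which an equilibrium selection rule is needed to complete the model.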
When the set of actions is not finite, however, the existence of a pure strategy equilibrium has been an open question for a long period of time. Following the seminal papers by Milgrom and Weber (1985) and Radner and Rosenthal (1982), Khan and Sun (1995) and Khan and Zhang (2014) derive existence theorems when the set of actions is compact. Although their conditions apply to a general information structure, restricting outcomes to be bounded is not reasonable for many empirical studies. The equilibrium characterization in this thesis complements that large literature by analyzing a specific reduced-form BNE with unbounded outcomes.

Computation is the main challenge that this all-solution method faces. Computing the equilibrium set (or approximating it when some exogenous characteristics are private information) can be converted into deriving all the solutions to a system of nonlinear equations. This ambitious goal is nonetheless achievable by using the homotopy continuation method proposed by Garcia and Zangwill (1981), provided that some generic regularity conditions are satisfied. Through detailed discussions and Monte Carlo experiments on two types of binary choice models, this thesis finds that this computationally intensive estimation method is rewarded by good performance in terms of estimation bias and log likelihood when there are multiple equilibria.

In the recent literature on the estimation of social interaction models, there are some estimation methods that do not require solving for all the equilibria and can derive consistent estimates under special model structures or additional assumptions. The basis of these methods is the two-step algorithm proposed by Bajari et al. (2010a) for a game with discrete choices and incomplete information on individual unobservables.
They first derive consistent nonparametric estimates of individual choice probabilities and then use them to estimate structural parameters. If there are a large number of repetitions of the same game, then under the assumption that the same equilibrium is played in all the repetitions, the individual choice probabilities can be estimated consistently in the first step. However, unlike empirical work in industrial organization, where the data sets usually contain a long time series for a small group of agents, empirical work on social interactions often relies on cross-section data sets with a large number of heterogeneous agents. This two-step algorithm then cannot be applied directly without special model structures and further assumptions. Bisin et al. (2011) consider the case in which individual outcomes are influenced by a global equilibrium aggregate. It is then possible to first estimate the equilibrium aggregate and then recover the other parameters. This algorithm can also be used if equilibria are refined. Leung (2013) focuses on one particular type of equilibrium, in which individuals with the same observable characteristics play the same strategy. However, this two-step algorithm would be invalid in the presence of unobserved group heterogeneity that cannot be fully explained by observed characteristics. As a result, although the all-solution method in this thesis is more computationally intensive, it can ensure consistent and efficient estimates in a general model framework without imposing stringent assumptions on the data set or the data generating process.

In addition to the discussions of equilibrium solution and estimation in a general framework, there are detailed analyses of identification, estimation, and empirical applications for three frequently used models that are incorporated in this framework: the linear model for continuous choices, the model for binary choices, and the tobit model for truncated outcomes.
For binary choices, two types of models are analyzed because their equilibrium sets have different characteristics. The entry problem is a good example of the type I binary choice model, where firms choose whether or not to enter a market and firms that do not enter have no effect on others’ profits ex post. The type II binary choice model, like that in Brock and Durlauf (2001), allows interactions from agents choosing either action, because individual utilities depend on the coordination or conflict of actions. The tobit model with social interactions corresponds to the case in which the actions that socially associated agents can take are subject to some bounds. As censored and/or truncated outcomes are frequently observed in data sets, the tobit model has gained more and more attention both empirically and theoretically. Recent theoretical research includes the papers by Kumar (2012) and Abrevaya and Shen (2014). This thesis contributes to this literature by incorporating interactive influences between individuals into the model.

The estimation methods are applied to analyze interactions in policy-making among municipalities in North Carolina. It is found that when neighboring cities increase their public safety spending, a city reduces its own public safety spending, indicating a “free-riding” effect. When the property tax rates of nearby cities are lowered, however, the property tax rate of a city is also reduced, showing competition effects.

This thesis proceeds as follows. Chapter 2 investigates computation and estimation of social interaction models with a general form of incomplete information when there is a unique equilibrium. It discusses the linear model for continuous choices and the binary choice model in detail. Chapter 3 applies the theoretical results in Chapter 2 to the tobit model for truncated outcomes. Chapter 4 extends the previous analysis to investigate estimation with multiple equilibria.
Chapter 5 concludes and discusses future research.

Chapter 2: Social Interactions under Incomplete Information with Heterogeneous Expectations

2.1 Introduction

It is natural to believe that an agent may not know the features of all other members in her social group. For example, a college student knows the SAT scores of her close friends in a class but not those of others. Because of different knowledge, even two students with the same characteristics may have different predictions on the academic performance of a third student. Hence, heterogeneity in private information may influence an individual’s actions through social interactions. However, this type of heterogeneity has not been analyzed in the existing literature. In models like Manski (1993), Brock and Durlauf (2001), and Lee, Li, and Lin (2014), under the assumption of “rational expectations”, two agents form the same expectation on a third individual. Although this assumption simplifies the equilibrium solution, it is not suitable when there is asymmetric private information and decisions are not made repeatedly.

In this paper, we relate social interactions with private information to a simultaneous move game under incomplete information and adopt the solution concept of Bayesian Nash Equilibrium (BNE). In our behavioral model, an agent’s action depends on her expectations on the behaviors of her linked agents, based on her information about group features, social relationships, and some specific personal characteristics. We can view this model as a simultaneous move game under incomplete information. A strategy of an agent is a mapping from her characteristics and private information to an action. We show a correspondence between a BNE and the equilibrium conditional expectations on group members’ behaviors. The equilibrium expectation function shows two types of heterogeneity.
On the one hand, the expectations on the behaviors of two different agents are generally different, for they have different traits. On the other hand, if the private information of two agents differs, their expectations on a third person may not be the same. As the equilibrium conditional expectation is a vector-valued function of privately known individual characteristics, we embed conditional expectations in a function space and use a functional contraction mapping to derive a sufficient condition for the existence of a unique equilibrium. It is shown that for some frequently used models, such as linear and binary choice models, after row-normalizing the social weighting matrix, this condition reduces to the intensity of social interaction not being too large.

Except for some special cases where a closed-form solution of equilibrium conditional expectations exists, in general, the equilibrium must be solved numerically. We illustrate the solution process when privately known characteristics are discretely distributed with a finite support. When they are continuously distributed, since expectations are integrals, we approximate them by Gaussian quadratures. We can first solve for the values of the expectation functions at fixed quadrature abscissae by contraction mapping and then approximate the expectation functions. The solution of equilibrium conditional expectations is nested in calculating sample log likelihood functions for estimation, which is the “nested fixed point” (NFXP) ML estimation originating in Rust (1987) for the estimation of dynamic discrete choice models. The NFXP ML estimation performs well for both the linear and binary choice models.

In Monte Carlo experiments, we also compare estimation under various information structures. It is found that estimation under a true information structure generally results in a higher maximized sample log likelihood than that under misspecified information structures. Hence, the maximized log likelihood may provide a criterion for model selection.
We also extend the model by incorporating unobserved group effects. Applying the model to the correlation of public safety expenditure for neighboring municipalities in North Carolina, we find significantly negative spatial correlation under various regression settings, indicating possible “free-riding” effects.

Including linear models and binary choice models as special cases, the model is related to the “linear-in-means” model of social interactions by Manski (1993) and Brock and Durlauf (2001). In those models, an individual makes predictions on other agents’ outcomes. After imposing symmetry and rational expectations, the expectation reduces to a scalar. Relaxing symmetry in exogenous characteristics, Lee, Li, and Lin (2014) allow the expectation on one agent’s behavior to be different from those on others. As personal characteristics are common knowledge for all members in a group, by imposing rational expectations, any two agents will have the same expectation on the action of a third agent. So those two types of models are special cases of ours.

Our paper also connects to the literature on the estimation of games under incomplete information, such as Bajari et al. (2010a) and Aradillas (2010). In Bajari et al. (2010a), a finite number of agents choose one action from a finite action set simultaneously. Private information resides in the idiosyncratic shocks. All personal characteristics are public information. Identification and estimation are based on the one-to-one correspondence between choice probabilities and expected payoffs. Incomplete information about both exogenous characteristics and idiosyncratic shocks is discussed in a two-player binary choice game in Aradillas (2010). The author focuses on one type of equilibrium where the expectations of both players are based on a commonly observed public signal. As a result, expectations do not vary with asymmetric private information in that paper.
Therefore, our model is an extension of the above two papers in the form of incomplete information.

Methodologies in game estimation have been used to investigate social interactions recently. In Bisin et al. (2011), individual behaviors interact socially in a setting where all exogenous characteristics are public information for group members and idiosyncratic shocks are privately known. In Leung (2013), endogenous formation of social networks is formulated as a game where agents choose social associations simultaneously. In that model, whether an agent connects with another depends on individual features, a pair-specific match value, and an idiosyncratic shock. Only the idiosyncratic shock is privately known. Discussions are focused on the case in which individuals with identical attributes take the same strategies ex ante. Then the Bayesian equilibrium reduces to a single function of exogenous characteristics. In both papers, a two-stage method similar to that in Bajari et al. (?) is used for estimation.[1] So our model is more general in terms of incomplete information.

[1] In Bisin et al. (2011), an alternative estimation method is to solve all equilibria for each parameter value and choose the one that maximizes the log likelihood.

The paper proceeds as follows. In Section 2.2, we introduce the model framework, as well as the relationship between the model and a simultaneous move game with incomplete information. In Section 2.3, we first show the one-to-one correspondence between a BNE and a consistent conditional expectation function and then discuss equilibrium existence and uniqueness. A detailed discussion on the computation of equilibrium is in Section 2.4. We investigate identification and estimation in Section 2.5. Monte Carlo experiments are conducted and results are presented in Section 2.6. After an empirical analysis in Section 2.7, we conclude in Section 2.8. Proofs of equilibrium analysis and numerical methods
for equilibrium computation are put in Appendix A.1 and A.2, respectively. In Appendix A.3, we discuss in detail sufficient conditions for identification. An extension to unobserved group random effects and the Bayesian analysis of identification are elaborated, respectively, in Appendix A.4 and A.5.

2.2 Models

2.2.1 A Model Framework

Consider a group of socially related agents of size $n$.[2] The $n \times n$ matrix, $W$, is used to represent their (exogenous) social relations. For example, $W$ represents friendship. For any $i \neq j$, $W_{i,j} = 1$ if individual $i$ is associated with individual $j$, and $W_{i,j} = 0$ otherwise. For any $i$, $W_{i,i} = 0$, as is usual in a network. More generally, $W_{i,j}$ may depend on geographic, economic, or social distances between $i$ and $j$. We require that the elements of $W$ are all non-negative, $W_{i,j} \geq 0$ for any $i, j$. So $W_{i,j}$ can represent not only whether $i$ and $j$ are associated but also the strength of their relation. Note that $W$ may be asymmetric or symmetric.

To incorporate possible discrete choices in addition to continuous outcomes, the model is presented with a latent variable and its relationship with observed outcomes. The observed outcome, or agent’s behavior, $y_i$, depends on the latent variable, $y_i^*$, as

$$y_i = h_i(y_i^*), \tag{2.2.1}$$

where $h_i(\cdot) : \mathbb{R} \to \mathbb{R}$ is a real-valued function, which can be linear or nonlinear. The latent variable for $i$ interacts with her expectations about her linked agents’ outcomes:

$$y_i^* = u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\, E[y_j \mid X^p_{J_i}, Z] - \epsilon_i. \tag{2.2.2}$$

[2] Theoretically, we can analyze social interactions in a single group. In empirical applications, however, we may have independent groups. In the presence of many groups for estimation, we will use the subscript $g$ as each group’s index.

$u(X_i)$ represents the direct effect of exogenous covariates.
$X_i = (X^g, X_i^c, X_i^p)$ is composed of three types of exogenous characteristics: the group features, $X^g$; some commonly known individual characteristics, $X_i^c$; and privately known information, $X_i^p$. $\epsilon_i$ is the idiosyncratic shock. The $\epsilon_i$’s are i.i.d. and independent of all the exogenous characteristics and social relations. Their distribution is characterized by a pdf, $f(\cdot)$, with CDF $F(\cdot)$. Assume that $\epsilon_i$ is known by individual $i$ herself, but not by other agents or econometricians.

The innovations of our model reside in the middle term on the right-hand side of (2.2.2). It shows two features of social interactions. First, $y_i^*$ is related to the behavior of agent $j$ only if $i$ is associated with $j$, i.e., $W_{i,j} \neq 0$. Second, $j$ influences $i$ through $i$’s expectation on the true outcome of $j$’s behavior. This is intuitively appealing, because agents make decisions simultaneously before actual outcomes are realized. In this model, expectations are based on public information on social relations in $W$, group features, $X^g$, commonly known individual characteristics, $X_j^c$’s, and private information about exogenous characteristics, $X_j^p$’s.[3]

The information structure can be fully described by specifying for every agent the subset of agents whose $X_j^p$’s are known to her. Therefore, for each $i$, we define an $n \times 1$ vector, $J_i$, such that $J_i(j) = 1$ if $i$ knows $X_j^p$ and $J_i(j) = 0$ otherwise, for each $1 \leq j \leq n$. Hence, the information structure in a group can be represented by an $n^2 \times 1$ vector, $J = (J_1', \cdots, J_n')'$. For every $i$, we define by $X^p_{J_i}$ the vector formed by the random variables, $X_j^p$’s, which are known to $i$, $X^p_{J_i} = (X_j^{p\prime} : J_i(j) = 1)'$. Suppose that $X_j^p$ is of dimension $k_p$. Then the dimension of $X^p_{J_i}$ is $M_i = \left(\sum_{j=1}^{n} J_i(j)\right) k_p$. For instance, if agent 1 only knows $X_1^p$, $X_2^p$ and $X_3^p$, then $J_1(j) = 1$ for $j = 1, 2, 3$ and $J_1(j) = 0$ for $j > 3$, and $X^p_{J_1} = (X_1^{p\prime}, X_2^{p\prime}, X_3^{p\prime})'$. We assume that such an information structure $J$ is publicly known.
However, the realizations of those random variables are private information. For example, it is publicly known that agent 1 knows her own features and those of agents 2 and 3. But the realizations of $X_1^p$, $X_2^p$ and $X_3^p$ are unknown to agent 4.

[3] Here, $W$ represents the vector formed by all its elements, $(W_{1,1}, \cdots, W_{1,n}, \cdots, W_{n,1}, \cdots, W_{n,n})'$, which shows all social relations in the group.

To simplify notation, we summarize all the publicly known variables in one vector,

$$Z = (X^{g\prime}, X_1^{c\prime}, \cdots, X_n^{c\prime}, W_{1,1}, \cdots, W_{1,n}, \cdots, W_{n,1}, \cdots, W_{n,n}, J_1', \cdots, J_n')'. \tag{2.2.3}$$

In sum, $i$’s expectations about other agents’ behaviors are based on realizations of two random vectors, one about private information, $X^p_{J_i}$, and the other about public information, $Z$. Two agents have the same private information if $J_i = J_j$, so that $X^p_{J_i} = X^p_{J_j}$.

Although it is natural to assume that an agent’s prediction may also be based on the realization of her idiosyncratic shock, it can be excluded from her private information. Due to independence across idiosyncratic shocks, $\{\epsilon_i\}$’s, and also their independence from exogenous characteristics, $\{X_i\}$’s, an agent’s conditional expectations will not be changed if her $\epsilon_i$ is also used to make predictions.[4] Therefore, the information framework in which all exogenous characteristics are public information and idiosyncratic shocks are private information, which is frequently used in the estimation of incomplete information games (such as Bajari et al. (2010a)), is included in our model as a special case.

Although the presence of unobserved group effects may complicate identification and estimation, because they are observed by group members, they do not influence the theoretical analysis of social interactions. So we focus on a model without unobserved group effects first and include them in the model as an extension in Appendix A.4.

We allow $X^p_{J_i}$ to vary across agents.
As a result, there are two types of heterogeneity in the $E[y_j \mid X^p_{J_i}, Z]$’s. On the one hand, since agents $j$ and $j'$ may be different in personal characteristics, i.e., $X_j \neq X_{j'}$, their expected outcomes may differ in $i$’s predictions. That is, in general,

$$E[y_j \mid X^p_{J_i}, Z] \neq E[y_{j'} \mid X^p_{J_i}, Z], \quad \text{for } i \neq j,\ i \neq j', \text{ and } j \neq j'.$$

[4] For any $k \neq i$, since $\epsilon_k$ is independent of $\epsilon_i$ and all $X$, $E[y_i \mid X^p_{J_k}, Z, \epsilon_k] = E[h_i(u(X_i) + \lambda \sum_{j \neq i} W_{i,j} E[y_j \mid X^p_{J_i}, Z, \epsilon_i] - \epsilon_i) \mid X^p_{J_k}, Z, \epsilon_k] = E[h_i(u(X_i) + \lambda \sum_{j \neq i} W_{i,j} E[y_j \mid X^p_{J_i}, Z, \epsilon_i] - \epsilon_i) \mid X^p_{J_k}, Z]$.

On the other hand, because two agents may have different private information on an individual, their expectations on the behavior of the same person might differ; that is, $E[y_j \mid X^p_{J_i}, Z] \neq E[y_j \mid X^p_{J_k}, Z]$ when $J_i \neq J_k$ in general. Those two types of heterogeneity distinguish the current model from previous ones in Manski (1993), Brock and Durlauf (2001), and Lee, Li, and Lin (2014). In Manski (1993), expectations are taken on the average behavior of the whole group based on commonly known group features. In the discrete choice model by Brock and Durlauf (2001), every agent has the same expectation of any other agent in her group. Lee, Li, and Lin (2014) introduce asymmetry in personal characteristics and allow expectations on the choices of different agents to differ. However, with rational expectations, any two agents have the same expectation on another agent. Therefore, our model extends existing research in social interactions with incomplete information by allowing two types of heterogeneity in conditional expectations.

The intensity of social interaction effects is captured by the parameter, $\lambda$, in (2.2.2). If $\lambda > 0$, the social interaction effect is positive. If $\lambda < 0$, outcomes are negatively related. The case of $\lambda = 0$ represents the absence of social interactions.
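The indicator vectors $J_i$ and the dimensions $M_i$ introduced in this subsection can be made concrete with a small sketch. It reproduces the example in which agent 1 knows the private characteristics of agents 1, 2 and 3. The helper names `build_J` and `private_info_vector`, the group size, and the numerical values are illustrative assumptions only, and agents are indexed from 0 in the code.

```python
import numpy as np

def build_J(known, n):
    """Return an n x n 0/1 matrix whose i-th row is the indicator vector J_i.

    known[i] lists the agents (0-based) whose private X_j^p agent i observes.
    (Hypothetical helper, not part of the original text.)
    """
    J = np.zeros((n, n), dtype=int)
    for i, js in enumerate(known):
        J[i, list(js)] = 1
    return J

def private_info_vector(J, Xp, i):
    """Stack the rows X_j^p with J_i(j) = 1 into the vector X^p_{J_i}."""
    return Xp[J[i] == 1].ravel()

n, k_p = 4, 2                                  # assumed group size and dimension of X_j^p
known = [{0, 1, 2}, {1}, {2}, {3}]             # agent 1 knows agents 1, 2, 3; others know only themselves
J = build_J(known, n)

Xp = np.arange(n * k_p, dtype=float).reshape(n, k_p)   # made-up realizations of X_1^p, ..., X_n^p
M = J.sum(axis=1) * k_p                        # M_i = (sum_j J_i(j)) * k_p
x_J1 = private_info_vector(J, Xp, 0)           # realization of X^p_{J_1}, of length M_1 = 6

# Two agents hold the same private information only if their indicator rows coincide.
same_info_2_3 = np.array_equal(J[1], J[2])
```

Flattening `J` row by row gives the $n^2 \times 1$ vector that enters the public information vector $Z$.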
2.2.2 Continuous and Discrete Choices and Information Structures

The model, (2.2.1) and (2.2.2), provides a framework broad enough to incorporate many interesting cases, in particular, a linear model with continuous choices and a discrete choice model with social interactions.

1. (Continuous Choices) $h_i(z) = h(z) = z$ for any $z \in \mathbb{R}$, and $y_i = y_i^*$; i.e., the latent variable, $y_i^*$, can be perfectly observed.

2. (Binary Choice) $h_i(\cdot) = h(\cdot)$, where $h(\cdot)$ can take only two values, 0 and 1. That is, $y_i = I(y_i^* > 0)$. The observed outcome is a binary variable, whose value depends on the sign of the latent variable.

Specifications on the $X^p_{J_i}$’s reflect information structures. Several examples follow.

1. (Publicly Known Characteristics) All exogenous group and individual characteristics, social relations, and the information structure are publicly known. In that case, $X^p_{J_i}$ includes all $X_j^p$’s, i.e., for all $i$, $X^p_{J_i} = (X_1^{p\prime}, \cdots, X_n^{p\prime})'$.

2. (Self-Known Characteristics) $X_i^p$ is observed by $i$ but not by any other group members. That is, for each $i$, $X^p_{J_i} = X_i^p$.

3. (Socially-Known Characteristics) $i$ knows $X_j^p$ if $i$ is associated with $j$, i.e., $W_{i,j} \neq 0$. In this case, for each $i$, $X^p_{J_i} = (X_j^{p\prime} : j = i \text{ or } W_{i,j} \neq 0)'$.

They will be discussed in detail in subsequent sections.

2.2.3 Game Theoretical Explanation

In the model, (2.2.1) and (2.2.2), behaviors of agents in a group interact when they are uncertain about others’ attributes. This is just like a simultaneous move game with incomplete information. According to Harsanyi (1967a; 1967b), such interactions can be modeled as a Bayesian game. By assuming that agents’ payoffs are related to a randomly determined “state,” an agent’s uncertainty comes from the fact that her signal does not completely recover the true state. The game form can be set up following Osborne and Rubinstein (1994). There are $n$ players, $i = 1, \cdots, n$. Let $S_i$ represent the support of $X_i^p$.
The set of states, $\prod_{i=1}^{n} S_i$, is defined as the set of all possible $X_i^p$’s for all players. In this case, player $i$’s “type” is her private information, $X^p_{J_i}$. Accordingly, her set of types is the support of those characteristics, $T_i = \prod_{k: J_i(k)=1} S_k$. The signal function is a mapping from the states to her type, $\tau_i : \prod_{i=1}^{n} S_i \to \prod_{k: J_i(k)=1} S_k$. Her prior belief on the set of states is the joint distribution of the $X_i^p$’s, $F_p(\cdot)$. The prior belief is the same across all players. For each player, $i = 1, \cdots, n$, there is an action set, $Y_i$. For player $i$, a strategy is a contingent plan specifying her actions for each of her types, $s_i : \prod_{k: J_i(k)=1} S_k \to Y_i$. For a profile of actions, $y = (y_1, \cdots, y_n) \in \prod_{i=1}^{n} Y_i$, every player receives a payoff. Because the actions taken by players depend on their types, which are related to the uncertain state, the payoff of a player is uncertain before actions are taken. We assume that a player’s preference over uncertain payoffs can be expressed by expected utilities. The specification for a continuous choice model differs from that for a discrete choice one. So we discuss payoffs, strategies, and equilibria separately for continuous choice and binary choice.

1. (Continuous Choice) With a strategy profile, $s = (s_1(\cdot), \cdots, s_n(\cdot))$, given public information, $Z = z$, if the realized types are $x^p_{J_1}, \cdots, x^p_{J_n}$, the payoff for player $i$ is

$$r(s_i(x^p_{J_i}), s_{-i}(x^p_{-J_i})) = q(x_i) + (u(x_i) - \epsilon_i)\, s_i(x^p_{J_i}) + \lambda\, s_i(x^p_{J_i}) \sum_{j \neq i} W_{i,j}\, s_j(x^p_{J_j}) - \frac{1}{2} (s_i(x^p_{J_i}))^2,$$

where $s_{-i}(\cdot)$ denotes the strategies of players other than $i$, and $x^p_{-J_i}$ represents their realized types.[5]

[5] We use uppercase letters to represent random variables and lowercase letters for their realizations.

Given public information $Z = z$, the expected payoff of player $i$ when her type is $x^p_{J_i}$ is[6]

$$E[r(s_i(X^p_{J_i}), s_{-i}(X^p_{-J_i})) \mid X^p_{J_i} = x^p_{J_i}, Z = z] = q(x_i) + (u(x_i) - \epsilon_i)\, s_i(x^p_{J_i}) + \lambda\, s_i(x^p_{J_i}) \sum_{j \neq i} W_{i,j}\, E[s_j(X^p_{J_j}) \mid X^p_{J_i} = x^p_{J_i}, Z = z] - \frac{1}{2} (s_i(x^p_{J_i}))^2.$$
This is a quadratic function of the action, $s_i(x^p_{J_i})$. To maximize expected utility, she chooses

$$s_i(x^p_{J_i}) = u(x_i) + \lambda \sum_{j \neq i} W_{i,j}\, E[s_j(X^p_{J_j}) \mid X^p_{J_i} = x^p_{J_i}, Z = z] - \epsilon_i. \tag{2.2.4}$$

Eq. (2.2.4) shows player $i$’s best response to other players’ strategies, $s_{-i}(\cdot)$. Therefore, a profile of strategies, $s = (s_1(\cdot), \cdots, s_n(\cdot))$, is a BNE in this game if it satisfies (2.2.4) for any $i = 1, \cdots, n$ and $x^p_{J_i} \in \prod_{k: J_i(k)=1} S_k$.[7] Suppose that our observed actions are derived from a BNE. With public information, $Z$, and privately known characteristics, $X^p_{J_1}, \cdots, X^p_{J_n}$, we have the following relation,

$$y_i = u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\, E[y_j \mid X^p_{J_i}, Z] - \epsilon_i,$$

which is just the case of (2.2.1) and (2.2.2), where $h_i(\cdot)$ is an identity function.

[6] Because the $\epsilon_i$’s are i.i.d. and are also independent of the characteristics, $X$’s, the expectation does not change if $\epsilon_i$ is used to make predictions, as we discussed before.

[7] According to Mas-Colell, Whinston, and Green (1995), for a player $i$, we only need (2.2.4) to hold on $\prod_{k: J_i(k)=1} S_k$ almost everywhere with respect to the probability distribution of $X^p_{J_i}$.

2. (Binary Choice) Consider the case in which agent $i$’s action can only take two values, 0 and 1. Namely, $Y_i = \{0, 1\}$. In this case, for a realization of her type, player $i$’s strategy, $s_i$, specifies whether to take action 1 or 0. Given a strategy profile, $s = (s_1, \cdots, s_n)$, the payoff associated with action 0 is normalized to 0 for any player and any type. When public information is $Z = z$ and the realized types are $x^p_{J_1}, \cdots, x^p_{J_n}$, if player $i$ chooses 1, her payoff will be

$$r_i(s_i(x^p_{J_i}), s_{-i}(x^p_{-J_i})) = u(x_i) + \lambda \sum_{j \neq i} W_{i,j}\, s_j(x^p_{J_j}) - \epsilon_i.$$

Thus, her expected payoff when taking action 1 is

$$E[r_i(s_i(X^p_{J_i}), s_{-i}(X^p_{-J_i})) \mid X^p_{J_i} = x^p_{J_i}, Z = z] = u(x_i) + \lambda \sum_{j \neq i} W_{i,j}\, E[s_j(X^p_{J_j}) \mid X^p_{J_i} = x^p_{J_i}, Z = z] - \epsilon_i.$$

It is optimal to choose $s_i(x^p_{J_i}) = 1$ if and only if

$$u(x_i) + \lambda \sum_{j \neq i} W_{i,j}\, E[s_j(X^p_{J_j}) \mid X^p_{J_i} = x^p_{J_i}, Z = z] > \epsilon_i,$$

and $s_i(x^p_{J_i}) = 0$ otherwise.
Thus, the following equation system,

$$s_i(x^p_{J_i}) = I\Big(u(x_i) + \lambda \sum_{j \neq i} W_{i,j}\, E[s_j(X^p_{J_j}) \mid X^p_{J_i} = x^p_{J_i}, Z = z] - \epsilon_i > 0\Big), \tag{2.2.5}$$

defines a BNE. The observed binary choices, $y = (y_1, \cdots, y_n)$, can be viewed as actions in equilibrium. That is to say, given public information, $Z$, and $X^p_{J_1}, \cdots, X^p_{J_n}$,

$$y_i = I\Big(u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\, E[y_j \mid X^p_{J_i}, Z] - \epsilon_i > 0\Big),$$

which corresponds to (2.2.1) and (2.2.2), where $h_i(\cdot)$ is an indicator.

From the above two examples, the inter-dependence of behaviors in our model, (2.2.1) and (2.2.2),

$$y_i = h_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\, E[y_j \mid X^p_{J_i}, Z] - \epsilon_i\Big), \tag{2.2.6}$$

can be viewed as an outcome of a reduced-form Bayesian game where a BNE satisfies

$$s_i(X^p_{J_i}) = h_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\, E[s_j(X^p_{J_j}) \mid X^p_{J_i}, Z] - \epsilon_i\Big), \tag{2.2.7}$$

such that $y_i = s_i(X^p_{J_i})$.

2.3 Equilibrium Analysis

2.3.1 Equilibrium and Expectations

Given that observed outcomes are realizations of an equilibrium satisfying (2.2.7), we solve for equilibrium strategies via solving for $\{E[y_j \mid X^p_{J_i}, Z]\}$. Pick any $i$ and $k$ such that $i \neq k$. By consistency, we get that

$$E[y_i \mid X^p_{J_k}, Z] = E\Big[h_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\, E[y_j \mid X^p_{J_i}, Z] - \epsilon_i\Big) \,\Big|\, X^p_{J_k}, Z\Big]. \tag{2.3.1}$$

As defined, $Z$ is a vector representing public information on group and individual features, social relations, and information structures. $X^p_{J_i}$ is a vector representing the private information of $i$, composed of privately known personal characteristics. According to our assumption, $k$ only knows $J_i$ but not the realization of $X^p_{J_i}$. For her, $X^p_{J_i}$ is a random vector, and $E[y_j \mid X^p_{J_i}, Z]$ is a random variable. As a result, given the distribution of $X^p$, she has to integrate over all possible realizations of $X^p_{J_i}$ conditioning on realizations of her own information, $X^p_{J_k}$ (and $Z$). The conditional expectation, $E[y_i \mid X^p_{J_k}, Z]$, is a function of the random vector $X^p_{J_k}$. Conditional expectations for behaviors of all group members, $E[y_1 \mid \cdot, Z], \cdots, E[y_n \mid \cdot, Z]$, are such functions.
This distinguishes our model from previous research by Manski (1993), Brock and Durlauf (2001), and Lee, Li, and Lin (2014). Adopting BNE, with heterogeneity in private information, expectations of $y_j$ vary with the private information used to make predictions.

Conditional on $Z = z$, let $(\Omega, \mathcal{F}, P)$ denote the probability space on which the $X_j^p$’s are defined. When the dimension of each $X_k^p$ is $k_p$, $X^p_{J_j}$ is a measurable function from $(\Omega, \mathcal{F})$ to $(\mathbb{R}^{M_j}, \mathcal{B}_{\mathbb{R}}^{M_j})$, where $M_j = \left(\sum_{k=1}^{n} J_j(k)\right) k_p$ is the dimension of the vector $X^p_{J_j}$ and $\mathcal{B}_{\mathbb{R}}^{M_j}$ is the corresponding Borel $\sigma$-algebra. Then for each $J_j$, we define a measurable function, $\psi^e_{i,J_j}$, from $(\mathbb{R}^{M_j}, \mathcal{B}_{\mathbb{R}}^{M_j})$ to $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ as $\psi^e_{i,J_j}(x) = E[y_i \mid X^p_{J_j} = x, z]$, for any $x \in \mathbb{R}^{M_j}$. Then the composite function $\psi^e_{i,J_j}(X^p_{J_j}) : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}}, m_{\mathbb{R}})$ is a random variable with $\psi^e_{i,J_j}(X^p_{J_j}) = E[y_i \mid X^p_{J_j}, z]$. The function $\psi^e_{i,J_j}$ shows how the conditional expectation on $y_i$ varies with the realization of $X^p_{J_j}$. Thus, we can characterize how the conditional expectation changes with the random vector on which the expectation is based by defining a function on a random vector.

With the network connections in our model, an agent needs only to predict the behaviors of agents with whom she is associated. Therefore, for each $i$, we only need to consider the conditional expectations on $i$’s behavior held by agents who are associated with her. We begin with the domain defined by $\mathcal{A}_i = \{X^p_{J_j} : W_{j,i} \neq 0\}$. For each $1 \leq i \leq n$ and each $A_i \in \mathcal{A}_i$, there is a $j$ such that $W_{j,i} \neq 0$ and $A_i = X^p_{J_j}$.[8] Denote the space of all random variables on $(\Omega, \mathcal{F}, P)$ by $C$. Define for each $i$, $\psi_i^e : \mathcal{A}_i \to C$, such that, for any $A_i \in \mathcal{A}_i$,

$$\psi_i^e(A_i) = \psi^e_{i,J_j}(X^p_{J_j}) = E[y_i \mid X^p_{J_j}, z], \tag{2.3.2}$$

when $A_i = X^p_{J_j}$ for some $j = 1, \cdots, n$. By collecting those functions into a vector-valued function, we have $\psi^e : \prod_{i=1}^{n} \mathcal{A}_i \to C^n$, such that for any $A = (A_1, \cdots, A_n) \in \prod_{i=1}^{n} \mathcal{A}_i$, $\psi^e(A) = (\psi_1^e(A_1), \cdots, \psi_n^e(A_n))$.
Expectation functions are defined as conditional expectations of actions in (2.2.6). We now investigate their relationship to a BNE. For simplicity, we will write $E[y_i \mid X^p_{J_j}, z]$ instead of $E[y_i \mid X^p_{J_j}, Z = z]$ in the discussions below.

[8] If no one connects to $i$, that is, $W_{j,i} = 0$ for any $j$, others’ expectations on $i$’s behavior do not affect outcomes. For example, let $n = 3$, $W_{2,1} = W_{1,2} = W_{3,2} = 1$ and $W_{i,j} = 0$ otherwise. Agents’ behaviors are influenced by $E[y_1 \mid X^p_{J_2}]$, $E[y_2 \mid X^p_{J_1}]$, and $E[y_2 \mid X^p_{J_3}]$, and not by $E[y_3 \mid X^p_{J_1}]$ or $E[y_3 \mid X^p_{J_2}]$. Therefore, although we can define $\psi_3^e$, it is not relevant for the outcome of the model. To simplify notation, for the general analysis of model equilibrium, we define conditional expectations for behaviors of the whole sample and keep in mind that we only need to do that for agents whose expected actions can influence the outcomes.

Proposition 2.3.1 Conditional on public information $Z = z$, $s = (s_1, \cdots, s_n) : \prod_{i=1}^{n} T_i \to \prod_{i=1}^{n} Y_i$ is a profile of strategies. Define the vector-valued function, $\psi^e : \prod_{i=1}^{n} \mathcal{A}_i \to C^n$, such that

$$\psi^e(A)_i = E[s_i(X^p_{J_i}) \mid A_i, z], \tag{2.3.3}$$

for any $i = 1, 2, \cdots, n$, and $A = (A_1, \cdots, A_n) \in \prod_{i=1}^{n} \mathcal{A}_i$, where $A_i = X^p_{J_j}$ for some $j$ with $W_{j,i} \neq 0$. Suppose that $s$ is a BNE in the model, i.e., it satisfies (2.2.7). Then $\psi^e$ satisfies the consistency condition:

$$\psi^e(A)_i = \psi_i^e(A_i) = E\Big[h_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\, \psi_j^e(X^p_{J_i}) - \epsilon_i\Big) \,\Big|\, A_i, z\Big], \tag{2.3.4}$$

for any $i$. Additionally, $\psi^e(A)_i = E[y_i \mid A_i, z]$, where the $y_i$’s are the actions of this equilibrium.

Conversely, suppose that the vector-valued function $\psi^e : \prod_{i=1}^{n} \mathcal{A}_i \to C^n$ satisfies $\psi^e(A)_i = \psi_i^e(A_i)$ for any $i = 1, 2, \cdots, n$ and $A = (A_1, \cdots, A_n) \in \prod_{i=1}^{n} \mathcal{A}_i$, and that condition (2.3.4) holds. Then the strategy profile, $s = (s_1, \cdots, s_n)$, defined by $s_i(X^p_{J_i}) = h_i\big(u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\, \psi_j^e(X^p_{J_i}) - \epsilon_i\big)$ is a BNE; i.e., it satisfies (2.2.7). Moreover, (2.3.2) holds for actions associated with this equilibrium.

Proof. See Appendix A.1.
Hence, a BNE corresponds to an expectation function satisfying (2.3.4). We can analyze the equilibrium via analyzing this functional equation.

2.3.2 Existence and Uniqueness

From the above discussion, we see that there is a BNE in our model if and only if there is a vector-valued expectation function, $\psi^e : \prod_{i=1}^{n} \mathcal{A}_i \to C^n$, satisfying (2.3.4). We investigate possible existence and uniqueness of such a mapping by functional analysis. The following are primitive assumptions in our model.

Assumption 2.3.1 $f(\epsilon) > 0$ for all $-\infty < \epsilon < \infty$. That is, the support for all $\epsilon_i$’s is $\mathbb{R}$.

Assumption 2.3.2 $u(\cdot)$ is continuous.

Assumption 2.3.3 For any real number $x$, $H_i(x) = E[h_i(x - \epsilon)] < \infty$ and is differentiable with respect to $x$. Moreover, the derivative of $H_i(x)$ is uniformly bounded in $x$ and $i$; i.e., $\max_{1 \leq i \leq n} \sup_x \left| \frac{dH_i(x)}{dx} \right| < \infty$.

Suppose that conditional on $Z = z$, the joint distribution of $X^p = (X_1^p, \cdots, X_n^p)$ is $F_p(\cdot \mid Z = z)$.[9] As $Z$ will be a fixed realization for a group, we will suppress it in subsequent analysis. Let $\Psi$ be a set of functions such that any $\psi \in \Psi$ is a vector-valued function which maps a profile $A = (A_1, \cdots, A_n) \in \prod_{i=1}^{n} \mathcal{A}_i$ to a random vector in $C^n$. It has the form $\psi(A) = (\psi_1(A_1), \cdots, \psi_n(A_n))'$, where for each $i = 1, \cdots, n$, $\psi_i : \mathcal{A}_i \to C$ satisfies the following integrability condition:

$$\max_{1 \leq i \leq n}\ \max_{\{j : W_{j,i} \neq 0\}} \int |\psi_i(x^p_{J_j})|\, dF_p(x^p) < \infty. \tag{2.3.5}$$

That is, $\psi$ is integrable with respect to the distribution of private personal characteristics, conditional on public information in the group, $Z = z$. From previous discussions, the conditional expectation in equilibrium will be in $\Psi$ once it meets the integrability requirement (2.3.5).

Define sum and scalar product operations of functions in $\Psi$ in the conventional way. That is, for any $\psi, \psi' \in \Psi$ and $\alpha \in \mathbb{R}^1$, $(\psi + \psi')(A_1, \cdots, A_n) = \psi(A_1, \cdots, A_n) + \psi'(A_1, \cdots, A_n)$, and $(\alpha\psi)(A_1, \cdots, A_n) = \alpha\psi(A_1, \cdots, A_n)$ for any $(A_1, \cdots, A_n) \in \prod_{i=1}^{n} \mathcal{A}_i$.
Zero is a function which is equal to 0 almost everywhere with respect to $F_p(\cdot)$. It is easy to see that with these operations, $\Psi$ is a linear space. In addition, define

$$\|\psi\| = \max_{1 \leq i \leq n}\ \max_{\{j : W_{j,i} \neq 0\}} \int |(\psi)_i(x^p_{J_j})|\, dF_p(x^p), \tag{2.3.6}$$

which is a norm by the following lemma.

Lemma 2.3.1 $\|\cdot\|$ is a well-defined norm on $\Psi$.

Proof. See Appendix A.1.

[9] We allow correlation between public information, $Z$, and private characteristics, $(X_1^p, \cdots, X_n^p)$. When they are independent, the conditional distribution will be the same as the unconditional distribution.

Moreover, $(\Psi, \|\cdot\|)$ is a complete linear normed space, as the following lemma shows.

Lemma 2.3.2 The linear normed space $(\Psi, \|\cdot\|)$ is a Banach space.

Proof. See Appendix A.1.

Define an operator $T$ on $\Psi$, such that

$$(T(\psi))_i(A_1, \cdots, A_n) = (T(\psi))_i(A_i) = E\Big[H_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\, \psi_j(X^p_{J_i})\Big) \,\Big|\, A_i, z\Big], \tag{2.3.7}$$

where $H_i(x) = \int h_i(x - \epsilon) f(\epsilon)\, d\epsilon$. We can see that if $H_i\big(u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\, \psi_j(X^p_{J_i})\big)$ is integrable with respect to $F_p(\cdot)$, then $T(\psi) \in \Psi$ for any $\psi \in \Psi$. If $\psi^e \in \Psi$, it is a fixed point of $T$. Additionally, if $\psi \in \Psi$ is a fixed point of $T$, it satisfies the consistency condition. Thus, a fixed point of $T$ is an equilibrium conditional expectation (vector-valued) function. That is to say, if we focus on functions in $(\Psi, \|\cdot\|)$, there is a one-to-one correspondence between equilibrium conditional expectation functions and fixed points of the operator $T$. The following proposition shows that $T$ can be a contraction mapping under a mild condition.

Proposition 2.3.2 Under the condition that

$$|\lambda|\, \|W\|_\infty\, D < 1, \tag{2.3.8}$$

where $D = \max_i \sup_x \left| \frac{dH_i(x)}{dx} \right|$ is well-defined under Assumption 2.3.3, $T : (\Psi, \|\cdot\|) \to (\Psi, \|\cdot\|)$ is a contraction mapping. Thus, there is a unique BNE in the model.

Proof. See Appendix A.1.

For spatial networks, it is conventional to row-normalize $W$ such that $\|W\|_\infty = 1$. As a result, how stringent the condition (2.3.8) is depends on the upper bound, $D$.
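The sufficient condition (2.3.8) is easy to check numerically for a given network. The sketch below is only an illustration: the function names `row_normalize` and `contraction_bound` are hypothetical, the small network is made up, and the bound $D$ must be supplied by the user. The two values of $D$ used here (1 for an identity $h_i$, and the peak of the standard normal density for a probit-type shock) are assumptions of the example.

```python
import numpy as np

def row_normalize(W):
    """Divide each row of W by its row sum; rows of zeros are left as zeros."""
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

def contraction_bound(lam, W, D):
    """Return |lambda| * ||W||_inf * D, the left-hand side of condition (2.3.8)."""
    W_inf = np.abs(np.asarray(W, dtype=float)).sum(axis=1).max()  # maximum absolute row sum
    return abs(lam) * W_inf * D

# A made-up network of three agents; the third agent has no links.
W_raw = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [0, 0, 0]])
W = row_normalize(W_raw)

D_linear = 1.0                          # h_i(x) = x, so dH_i/dx = 1
D_probit = 1.0 / np.sqrt(2.0 * np.pi)   # peak of the standard normal density

ok_linear = contraction_bound(0.8, W, D_linear) < 1   # condition holds
ok_probit = contraction_bound(2.0, W, D_probit) < 1   # condition holds
too_large = contraction_bound(3.0, W, D_probit) < 1   # condition fails
```

Because the condition is only sufficient, a value of $\lambda$ that fails the check does not by itself rule out a unique equilibrium.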
We take the popular continuous choice and binary choice models as examples below.

1. (Continuous Choice) In this case, $h_i(x) = x$ and $H_i(x) = \int (x - \epsilon) f(\epsilon)\, d\epsilon = x - E[\epsilon]$. Thus, $D = 1$.

2. (Binary Choice) When $h_i(x) = I(x > 0)$, $H_i(x) = F(x)$. Thus, $D = \sup_x |f(x)|$. In this case, the uniform boundedness of $\left| \frac{dH_i(x)}{dx} \right|$ is due to the boundedness of $\epsilon$’s density. For the binary choice probit and logit models, the corresponding densities are, respectively, the standard normal density with $\sup_x |f(x)| \leq \frac{1}{\sqrt{2\pi}}$, and the logistic density with $\sup_x |f(x)| \leq \frac{1}{4}$.

2.4 Equilibrium Solution and Computation

Proposition 2.3.2 is a key result for the general model framework. It not only provides a sufficient condition for existence and uniqueness of BNE, but also suggests numerical methods for estimation. That is because the fixed point of a contraction mapping can be derived by recursive iterations beginning with an arbitrary initial guess. In this section, we consider three different information structures and illustrate how equilibrium expectations can be solved either analytically or numerically in each case. For all the analysis that follows, condition (2.3.8) is assumed to be satisfied, so there is one and only one BNE.

2.4.1 All Characteristics are Publicly Known

When all personal characteristics are publicly known, $X^p_{J_i} = (X_1^p, \cdots, X_n^p)$ for all $i$. An agent has the same expectation as that of econometricians, so for any $i \neq j$, $E[y_i \mid X^p_{J_j}, z] = E[y_i \mid X_1^p, \cdots, X_n^p, z]$. The equilibrium conditional expectation is a vector,

$$\psi^e = (E[y_1 \mid X_1^p, \cdots, X_n^p, z], \cdots, E[y_n \mid X_1^p, \cdots, X_n^p, z])',$$

which is characterized by the consistency condition (2.3.4). That condition now reduces to

$$\psi_i^e = E\Big[h_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\, \psi_j^e - \epsilon_i\Big) \,\Big|\, X_1^p, \cdots, X_n^p, z\Big], \tag{2.4.1}$$

for all $i$. This is the case in Lee, Li, and Lin (2014). For the linear model, as $E[\epsilon_i \mid X_1^p, \cdots, X_n^p, z] = 0$, we have a linear system $\psi_i^e = u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\, \psi_j^e$.
P 25 The equilibrium expectation vector is 0 ψ e = (I − λW )−1 (u(X1 ), · · · , u(Xn )) . 2.4.2 (2.4.2) Characteristics are Self-Known In this case, each Xip is known only to i herself. We have XJpi = Xip and Ai = n o p Xjp : Wj,i 6= 0 . Xi,g ’s might be correlated across individuals. As they have the same support, we consider the situation that they are “exchangeable” conditional on group public information, Z = z. Assumption 2.4.1 Conditional on public information, Z = z, Xip ’s have the same support, Sp . Their conditional joint distribution, f p (X1p , · · · , Xnp |Z = z),10 is exchangeable, i.e., for any permutation, s : {1, · · · , n} → {1, · · · , n}, p p f p (X1p , · · · , Xnp |Z = z) = f p (Xs(1) , · · · , Xs(n) |Z = z). This includes the obvious case that Xip ’s are independent of each other. More generally, exchangeability of random variables can be represented by Xip = α + pi , where α captures some unobservable common features or shocks to all group members and pi are some i.i.d. idiosyncratic shocks. This notion of exchangeability has been characterized by De Finetti (1975). Under Assumption 2.4.1, f (Xip |Xkp = x, Z = z) = f (Xip |Xkp0 = x, Z = z), for any k 6= i 0 and k 6= i. Hence, Z X e ψi,J (x) = Hi (u(X g , Xic , y) + λ Wi,j ψje (y))fp (y|XkP = x, Z = z)dy k j6=i Z = Hi (u(X g , Xic , y) + λ X Wi,j ψje (y))fp (y|XkP0 = x, Z = z)dy (2.4.3) j6=i e =ψi,J k 0 (x). 10 When Xip ’s are discrete random variables, f p (·|Z = z) is the probability mass function. If Xip ’s are continuous, f p (·|Z = z) represents the distribution density. 26 where Hi (x) = R h(x − )f ()d. Thus, with Assumption 3.3.2, the identity of private characteristics used to make predictions does not matter. It is its realization that influences conditional expectations. Owing to this fact, we can directly define ψie as a mapping from e (x) = <kp (kp is the dimension of Xjp ’s) to <1 . 
To be specific, for any $i$, define $\psi_i^e(x)=\psi^e_{i,J_k}(x)=E[y_i\mid X_k^p=x,z]$, which is invariant to the choice of $k\neq i$. Similarly, define $\psi^e:\mathbb{R}^{nk_p}\to\mathbb{R}^n$ as $\psi^e(x_1,\dots,x_n)_i=\psi_i^e(x_i)$. Because this is a special case of the general framework, our previous analysis of equilibrium existence and uniqueness applies here. In this case, consistency condition (2.3.4) reduces to

$$\psi_i^e(x)=\int H_i\Big(u(X^g,X_i^c,y)+\lambda\sum_{j\neq i}W_{i,j}\,\psi_j^e(y)\Big)f_p(y\mid x)\,dy, \tag{2.4.4}$$

where $f_p(y\mid x)$ is the density of $X_i^p$ conditional on $X_j^p=x$ for any $j\neq i$ and $Z=z$. So with exchangeability of the $X_i^p$'s, one needs to consider the solution of $n$ functions, $\psi_1^e,\dots,\psi_n^e$, characterized by (2.4.4). We investigate possible quantitative solutions in the three different cases that follow.

Independent Characteristics

Given $Z=z$, if the characteristics $X_j^p$'s are independent of each other, $f_p(y\mid x)=f_p(X_k^p=y\mid X_i^p=x,z)$ is the same as $f_p(y)=f_p(X_k^p=y\mid z)$, which does not involve $x$. Hence, $\psi_i^e(x)=\int H_i\big(u(X^g,X_i^c,y)+\lambda\sum_{j\neq i}W_{i,j}\,\psi_j^e(y)\big)f_p(y)\,dy$. This means that $\psi_i^e(x)$ is a constant function of $x$, whose value will be denoted as $\psi_i^e$. The equilibrium expectation vector, $\psi^e=(\psi_1^e,\dots,\psi_n^e)'$, is characterized by

$$\psi_i^e=\int H_i\Big(u(X^g,X_i^c,y)+\lambda\sum_{j\neq i}W_{i,j}\,\psi_j^e\Big)f_p(y)\,dy, \tag{2.4.5}$$

for $i=1,\dots,n$. For the binary choice model, it can be solved by contraction mapping iteration similar to that in Lee, Li, and Lin (2014). For the linear model, (2.4.5) is a linear equation system which has an analytical solution,

$$\psi^e=(I-\lambda W)^{-1}\big(E[u(X^g,X_1^c,y)],\dots,E[u(X^g,X_n^c,y)]\big)'. \tag{2.4.6}$$

Compared with (2.4.2), expectations are used in (2.4.6) instead of true realizations. That is because $X_i^p$ is not observed by any agent other than $i$ in the group. If there are no publicly known individual characteristics, $X_i^c$, the expectations on the right-hand side of (2.4.6) would be the same.
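The two solution routes for independent self-known characteristics, iteration on (2.4.5) for a binary choice model and the closed form (2.4.6) for the linear model, can be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation used in the chapter: $X^p$ is taken to have a two-point support, $H_i$ is the standard normal CDF, $u$ is additive with $\beta_2 = 1$, and all names and numerical values (`solve_probit`, `solve_linear`, `W_example`, `v`, `probs`) are hypothetical.

```python
import math
import numpy as np

def norm_cdf(z):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z) / math.sqrt(2.0)))

def solve_linear(W, Eu, lam):
    """Closed form (2.4.6): psi = (I - lam * W)^{-1} E[u]."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - lam * W, Eu)

def solve_probit(W, u_vals, probs, lam, tol=1e-12, max_iter=1000):
    """Iterate on (2.4.5) with H_i equal to the standard normal CDF.

    u_vals[i, l] = u(X^g, X_i^c, x^l); probs[l] = Pr(X^p = x^l | Z = z).
    """
    psi = np.full(W.shape[0], 0.5)  # arbitrary starting guess
    for _ in range(max_iter):
        # psi is constant in x here, so W @ psi is added to every support point
        new = norm_cdf(u_vals + lam * (W @ psi)[:, None]) @ probs
        if np.max(np.abs(new - psi)) < tol:
            return new
        psi = new
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical three-agent example (all values are assumptions).
W_example = np.array([[0.0, 0.5, 0.5],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
v = np.array([0.2, -0.1, 0.4])         # public part of u for each agent
x_support = np.array([0.0, 1.0])       # support of X^p
probs = np.array([0.6, 0.4])           # Pr(X^p = 0), Pr(X^p = 1)
u_vals = v[:, None] + 1.0 * x_support[None, :]

psi_probit = solve_probit(W_example, u_vals, probs, lam=0.3)
psi_linear = solve_linear(W_example, u_vals @ probs, lam=0.3)
```

Because condition (2.3.8) holds for these values, the iteration converges from any starting guess; the constant 0.5 is only a convenient choice.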
We can then rewrite (2.4.6) as $\psi^e=E[u(X^g,y)]\,(I-\lambda W)^{-1}l_n$, where $n$ is the group population and $l_n$ is the $n\times 1$ vector of ones. Therefore, if $W$ were row-normalized, $\psi^e=E[u(X^g,y)]\,l_n/(1-\lambda)$, and all the components of $\psi^e$ would be equal. This would be the case in Brock and Durlauf (2001). If $W$ were not row-normalized and all its entries were 0 or 1, $\psi^e$ would be proportional to the vector of "centrality measures," as in Ballester et al. (2010).

Correlated Discrete Characteristics

In general, the $X_i^p$'s may be correlated. In this subsection, we focus on the case in which $X_i^p$ has a finite support. As a result, $\psi_i^e(\cdot)$ is a function defined on a finite set and can be viewed as a vector. Suppose that the support of $X_i^p$ is $\{x^l: 1\le l\le m\}$, where $x^l$ is a vector with specific values and $m$ is the number of support points. The conditional probability function, $f_p(y\mid x)$, is fully captured by the transition matrix,

$$P=\begin{pmatrix}p_{11} & p_{12} & \cdots & p_{1m}\\ \vdots & \vdots & \ddots & \vdots\\ p_{m1} & p_{m2} & \cdots & p_{mm}\end{pmatrix},$$

where $p_{ll'}=\mathrm{prob}(X_i^p=x^{l'}\mid X_k^p=x^l)$ for $k\neq i$. With $P$, (2.4.4) becomes

$$\psi_i^e(x^l)=\sum_{l'=1}^m p_{ll'}\,H_i\Big(u(X^g,X_i^c,x^{l'})+\lambda\sum_{j\neq i}W_{i,j}\,\psi_j^e(x^{l'})\Big),$$

which characterizes the equilibrium expectations vector,

$$\psi^e=\big(\psi_1^e(x^1),\dots,\psi_n^e(x^1),\dots,\psi_1^e(x^m),\dots,\psi_n^e(x^m)\big)'. \tag{2.4.7}$$

For the linear model, $\psi^e$ is an $nm\times 1$ vector and can be solved analytically from the linear system,

$$\psi_i^e(x^l)=\sum_{l'=1}^m p_{ll'}\Big(u(X^g,X_i^c,x^{l'})+\lambda\sum_{j\neq i}W_{i,j}\,\psi_j^e(x^{l'})\Big), \tag{2.4.8}$$

for $i=1,\dots,n$ and $l=1,\dots,m$. To be specific, define

$$u=\big(u(X^g,X_1^c,x^1),\dots,u(X^g,X_n^c,x^1),\dots,u(X^g,X_1^c,x^m),\dots,u(X^g,X_n^c,x^m)\big)'.$$

Then (2.4.8) can be written as $\psi^e=(P\otimes I_n)u+\lambda(P\otimes W)\psi^e$, where $I_n$ is the $n\times n$ identity matrix. As $\|P\otimes W\|_\infty\le\|P\|_\infty\|W\|_\infty=\|W\|_\infty$, we have $|\lambda|\,\|P\otimes W\|_\infty<1$ under (2.3.8) with $D=1$. So $I-\lambda(P\otimes W)$ is invertible. Hence,

$$\psi^e=\big(I-\lambda(P\otimes W)\big)^{-1}(P\otimes I_n)\,u. \tag{2.4.9}$$

Correlated Continuous Characteristics

Now we consider the case where the $X_i^p$'s are continuous random variables. In this case, the equilibrium expectations, $\psi_i^e$ for $i=1,\dots,n$, in (2.4.4) are functions. Under some stochastic structure for the simpler linear model, analytical solutions may still be possible. Here we consider a linear model where the expectation of $X_i^p$ conditional on $X_j^p$ and $Z$ is linear in $X_j^p$.

Assumption 2.4.2 $h_i(x)=x$ for any $x$ and all $i$; i.e., the model is linear.

Assumption 2.4.3 $E[\epsilon_i]=0$ for any $i$.

Assumption 2.4.4 $u(X^g,X_i^c,X_i^p)=v(X^g,X_i^c)+X_i^{p\prime}\beta$. That is, the individual own effect is separable in public and private information, and is linear in the self-known characteristics, $X_i^p$.

Assumption 2.4.5 In addition to Assumption 2.4.1, $E[X_i^p\mid X_j^p,z]=\mu+CX_j^p$ for any $i\neq j$, where $X_i^p$ is $k_p\times 1$, $\mu$ is a $k_p\times 1$ vector, and $C$ is a $k_p\times k_p$ matrix. Both $\mu$ and $C$ may depend on $Z=z$.

One multivariate distribution that satisfies Assumption 2.4.5 is the joint normal distribution. Suppose that conditional on $Z=z$, $(X_i^p,X_j^p)$ with $i\neq j$ are jointly normal with mean $(\tilde\mu',\tilde\mu')'$ and variance-covariance matrix $\begin{pmatrix}\Sigma_1 & \Sigma_2\\ \Sigma_2' & \Sigma_1\end{pmatrix}$. Then, $E[X_i^p\mid X_j^p,z]=(I-\Sigma_2\Sigma_1^{-1})\tilde\mu+\Sigma_2\Sigma_1^{-1}X_j^p$.

Proposition 2.4.1 Under Assumptions 2.4.1 - 2.4.5, the unique equilibrium conditional expectation is linear, i.e.,

$$\psi_i^e(X)=a_i+b_i'X, \tag{2.4.10}$$

for any $i=1,\dots,n$. $b=(b_1',\dots,b_n')'$ and $a=(a_1,\dots,a_n)'$ are characterized by

$$b=\big(I_{nk_p}-\lambda(W\otimes C')\big)^{-1}(l_n\otimes C'\beta), \tag{2.4.11}$$

and

$$a=(I_n-\lambda W)^{-1}\big(v+(\mu'\beta)\,l_n+\lambda(W\otimes\mu')\,b\big), \tag{2.4.12}$$

where $v=\big(v(X^g,X_1^c),\dots,v(X^g,X_n^c)\big)'$ and $l_n$ is an $n\times 1$ vector of ones.

Proof. See Appendix A.1.

In general, the equilibrium expectation cannot be solved analytically, and we need to use numerical methods. When self-known characteristics are continuously distributed, the equilibrium expectation is a vector-valued function.
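For the linear case covered by Proposition 2.4.1, the coefficients in (2.4.11) and (2.4.12) can be evaluated directly with Kronecker products. The following sketch only illustrates those two formulas; the function name `linear_equilibrium` and every numerical value ($W$, $v$, $\beta$, $\mu$, $C$, $\lambda$) are assumptions chosen for the example.

```python
import numpy as np

def linear_equilibrium(W, v, beta, mu, C, lam):
    """Return (a, B) with psi_i(x) = a[i] + B[i] @ x, following (2.4.11)-(2.4.12)."""
    n, kp = W.shape[0], C.shape[0]
    ones = np.ones((n, 1))
    Ct_beta = (C.T @ beta).reshape(kp, 1)                 # C' beta
    rhs_b = np.kron(ones, Ct_beta).ravel()                # l_n (x) C' beta
    b = np.linalg.solve(np.eye(n * kp) - lam * np.kron(W, C.T), rhs_b)
    rhs_a = (v + (mu @ beta) * np.ones(n)
             + lam * (np.kron(W, mu.reshape(1, kp)) @ b))  # (W (x) mu') b
    a = np.linalg.solve(np.eye(n) - lam * W, rhs_a)
    return a, b.reshape(n, kp)                            # row i of B is b_i'

# Hypothetical inputs (assumptions for illustration only).
W_example = np.array([[0.0, 0.5, 0.5],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
v = np.array([0.2, -0.1, 0.4])
beta = np.array([1.0, -0.5])
mu = np.array([0.6, 0.2])
C = np.array([[0.4, 0.1],
              [0.0, 0.3]])
lam = 0.3

a, B = linear_equilibrium(W_example, v, beta, mu, C, lam)

# Consistency check at an arbitrary point x: psi_i(x) must equal
# v_i + E[X_i^p | x]' beta + lam * sum_j W_ij * psi_j(E[X_i^p | x]).
x = np.array([0.7, -1.3])
m = mu + C @ x
lhs = a + B @ x
rhs = v + m @ beta + lam * (W_example @ (a + B @ m))
```

The final three lines re-derive the consistency condition at a single point, which is a convenient way to confirm that a computed pair $(a, b)$ is indeed a fixed point.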
Inspecting model specifics, we find that for each $i=1,\dots,n$, $\psi_i^e(\cdot)$ is expressed as an integral, which can be approximated by Gaussian quadrature. Suppose that there are $K$ abscissae, $x_1,\dots,x_K$. We first solve for the vector $\big(\psi_i^e(x_1),\dots,\psi_i^e(x_K)\big)'$ by contraction mapping, as we did for discrete characteristics in the last section. Then we approximate $\psi_i^e(x)$ at any $x$ in the support by associating those abscissae with the corresponding weights. A detailed discussion of numerical methods is in Appendix A.2. In the Monte Carlo experiments, we take $K=8$.

2.4.3 Socially-Known Characteristics

Consider the case in which $X_i^p$ is known to $i$ herself and to any $k$ who is linked to $i$, $W_{k,i}\neq 0$. The information structure in this case is $J_i(l)=1$ if $l=i$ or $W_{i,l}\neq 0$, and $J_i(l)=0$ otherwise. Use $X^p_{J_i}$ to represent the vector of $X_j^p$'s which are known to $i$. Consistency for the equilibrium conditional expectation can be written as follows: for any $k\neq i$ with $W_{k,i}\neq 0$,

$$\begin{aligned}
\psi_i^e(X^p_{J_k}) &= \psi^e_{i,J_k}(X^p_{J_k})\\
&= \psi^e_{i,J_k}\big(X_k^p,X_i^p,(X_l^p: l\neq i,\ l\neq k,\ W_{k,l}\neq 0)\big)\\
&= E\Big[H_i\Big(u(X_i)+\lambda\sum_{j\neq i}W_{i,j}\,\psi_j^e(X^p_{J_i})\Big)\,\Big|\,X^p_{J_k}\Big]\\
&= \int H_i\Big(u(X^g,X_i^c,X_i^p)+\lambda\sum_{j\neq i}W_{i,j}\,\psi^e_{j,J_i}\big(X_i^p,(X^p_{l'}: W_{i,l'}\neq 0)\big)\Big)\\
&\qquad\cdot f_p\big(X^p_{l'}: l'\neq k,\ W_{i,l'}\neq 0,\ W_{k,l'}=0\ \big|\ X_k^p,X_i^p,(X_l^p: l\neq i,\ l\neq k,\ W_{k,l}\neq 0),\ Z=z\big)\\
&\qquad\ d\big(X^p_{l'}: l'\neq k,\ W_{i,l'}\neq 0,\ W_{k,l'}=0\big).
\end{aligned} \tag{2.4.13}$$

To solve for the equilibrium conditional expectation is to solve for those functions for all $i$ and $k$ who are linked, together with their associated information structures. When all the $X_i^p$'s are discrete, we can solve for them as vectors. For continuous variables in $X_i^p$, we can either discretize the $X_i^p$'s or use our Gaussian quadrature approximation. What differs from the previous discussion of private information is that the number of such functions increases, because every agent $i$ can be associated with many agents who have different social relations and information structures.
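One way the quadrature idea for continuous self-known characteristics could be realized is sketched below. It is not the scheme of Appendix A.2, which is not reproduced here and may differ. The sketch assumes a scalar $X_i^p$ that is normal with mean $\mu$, standard deviation $\eta$, and pairwise correlation $\rho$, so that $X_i^p \mid X_j^p = x$ is normal with mean $\rho x + (1-\rho)\mu$ and variance $(1-\rho^2)\eta^2$. Gauss-Hermite abscissae for the marginal distribution are reweighted by the ratio of conditional to marginal density, and each row of weights is normalized to sum to one so that the discretized operator inherits the contraction property. All function names and numerical values are hypothetical.

```python
import math
import numpy as np

def norm_cdf(z):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z) / math.sqrt(2.0)))

def norm_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def quadrature_weights(mu, eta, rho, K=8):
    """Abscissae x_k and a K x K weight matrix P with P[l, k] ~ Pr(x_k | x_l)."""
    t, w = np.polynomial.hermite.hermgauss(K)   # nodes/weights for exp(-t^2)
    x = mu + math.sqrt(2.0) * eta * t           # abscissae for N(mu, eta^2)
    cond_mean = rho * x + (1.0 - rho) * mu      # conditional mean given x_l
    cond_sd = eta * math.sqrt(1.0 - rho ** 2)
    ratio = (norm_pdf(x[None, :], cond_mean[:, None], cond_sd)
             / norm_pdf(x[None, :], mu, eta))   # conditional over marginal density
    P = w[None, :] * ratio
    return x, P / P.sum(axis=1, keepdims=True)  # normalize each row (assumption)

def solve_probit_quadrature(W, v, beta2, lam, x, P, tol=1e-12, max_iter=1000):
    """Solve (2.4.4) at the abscissae; psi[i, l] approximates psi_i(x_l)."""
    u = v[:, None] + beta2 * x[None, :]         # u_i evaluated at X_i^p = x_k
    psi = np.full(u.shape, 0.5)
    for _ in range(max_iter):
        new = norm_cdf(u + lam * (W @ psi)) @ P.T
        if np.max(np.abs(new - psi)) < tol:
            return new
        psi = new
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical three-agent example (values are assumptions).
W_example = np.array([[0.0, 0.5, 0.5],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
v = np.array([0.2, -0.1, 0.4])
x_nodes, P_nodes = quadrature_weights(mu=1.0, eta=2.0, rho=0.4, K=8)
psi_nodes = solve_probit_quadrature(W_example, v, 1.0, 0.3, x_nodes, P_nodes)
```

Treating the normalized weights as a transition matrix makes the continuous problem formally identical to the discrete one, so the same iteration applies; the price is an approximation error that depends on $K$.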
Independent Characteristics

When the $X_i^p$'s are independent of each other, conditional on $Z=z$, the consistency condition (2.4.13) can be simplified as

$$\begin{aligned}
\psi_i^e(X^p_{J_k}) &= \psi^e_{i,J_k}\big(X_k^p,X_i^p,(X_l^p: l\neq i,\ l\neq k,\ W_{k,l}\neq 0)\big)\\
&= \int H_i\Big(u(X^g,X_i^c,X_i^p)+\lambda\sum_{j\neq i}W_{i,j}\,\psi^e_{j,J_i}\big(X_i^p,(X^p_{l'}: W_{i,l'}\neq 0)\big)\Big)\\
&\qquad\cdot f_p\big(X^p_{l'}: l'\neq k,\ W_{i,l'}\neq 0,\ W_{k,l'}=0\ \big|\ Z=z\big)\ d\big(X^p_{l'}: l'\neq k,\ W_{i,l'}\neq 0,\ W_{k,l'}=0\big).
\end{aligned} \tag{2.4.14}$$

We see that the right-hand side of the above equation is a function of those $X_l^p$'s that are known to both $i$ and $k$. Hence, effectively, the function $\psi^e_{i,J_k}$ may have fewer arguments than $\psi_i^e(X^p_{J_k})=\psi^e_{i,J_k}\big(X_i^p,X_k^p,(X_l^p: l\neq i,\ l\neq k,\ W_{k,l}\neq 0)\big)$. Actually, it can be written as $\psi^e_{i,J_k}\big(X_i^p,X_k^p,(X_l^p: l\neq i,\ l\neq k,\ W_{i,l}\neq 0,\ W_{k,l}\neq 0)\big)$.

Hence, for each $i$ and $k$ with $W_{k,i}\neq 0$, we only need to know how $k$'s expectation of $y_i$ varies with the $X_l^p$'s which are known to both $i$ and $k$. This feature reduces the dimension of the function domains and hence can alleviate the computational burden.

To demonstrate the equilibrium solution, we provide an illustration of a simpler case. Consider a simple network made of $n=3$ agents. The social matrix $W$ is such that $W_{i,i}=0$ for all $i=1,2,3$; for $i\neq j$, $W_{2,3}=W_{3,1}=0$ and $W_{i,j}\neq 0$ otherwise. Thus, $J_1=(1,1,1)'$, $J_2=(1,1,0)'$, and $J_3=(0,1,1)'$. Namely, agent 1 knows all agents. Agent 2 knows the most informed agent, agent 1. However, agent 3 knows only agent 2. To solve for the equilibrium, we solve the following equation system:$^{11}$

$$\psi^e_{2,J_1}(X_1^p,X_2^p)=H_2\big(u(X^g,X_2^c,X_2^p)+\lambda W_{2,1}\,\psi^e_{1,J_2}(X_1^p,X_2^p)\big), \tag{2.4.15}$$

$$\psi^e_{3,J_1}(X_2^p,X_3^p)=H_3\big(u(X^g,X_3^c,X_3^p)+\lambda W_{3,2}\,\psi^e_{2,J_3}(X_2^p)\big), \tag{2.4.16}$$

$$\psi^e_{1,J_2}(X_1^p,X_2^p)=\int H_1\big(u(X^g,X_1^c,X_1^p)+\lambda W_{1,2}\,\psi^e_{2,J_1}(X_1^p,X_2^p)+\lambda W_{1,3}\,\psi^e_{3,J_1}(X_2^p,X_3^p)\big)\, f_p(X_3^p\mid Z=z)\,dX_3^p, \tag{2.4.17}$$

and

$$\psi^e_{2,J_3}(X_2^p)=\int H_2\big(u(X^g,X_2^c,X_2^p)+\lambda W_{2,1}\,\psi^e_{1,J_2}(X_1^p,X_2^p)\big)\, f_p(X_1^p\mid Z=z)\,dX_1^p. \tag{2.4.18}$$

[Footnote 11: We apply two properties of equilibrium expectation functions here. First, $i$ makes a prediction on $j$ only if $i$ links herself to $j$; i.e., $W_{i,j}\neq 0$. Second, when $k$ is making predictions on $i$, only the characteristics which they both know affect the expectation.]

This is an equation system of four functions. Although the $X_i^p$'s are independent of each other, the equilibrium conditional expectations are quite different from those when the $X_i^p$'s are only self-known. We need to solve for the expectation under each possible information structure, $\psi^e_{i,J_k}$ with $W_{k,i}\neq 0$, as a function.

For computation of equilibrium expectations, consider the case in which the $X_i^p$'s are discrete. Suppose that conditional on $Z=z$, the $X_i^p$'s are i.i.d., taking $m$ possible values, $x^1,\dots,x^m$, with probabilities $p_l=\mathrm{prob}(X_i^p=x^l\mid Z=z)$ for any $i=1,\dots,n$ and $l=1,\dots,m$. Then (2.4.15), (2.4.16), (2.4.17), and (2.4.18) can be written as follows:

$$\psi^e_{2,J_1}(x^l,x^{l'})=H_2\big(u(X^g,X_2^c,x^{l'})+\lambda W_{2,1}\,\psi^e_{1,J_2}(x^l,x^{l'})\big), \tag{2.4.19}$$

$$\psi^e_{3,J_1}(x^{l'},x^{l^*})=H_3\big(u(X^g,X_3^c,x^{l^*})+\lambda W_{3,2}\,\psi^e_{2,J_3}(x^{l'})\big), \tag{2.4.20}$$

$$\psi^e_{1,J_2}(x^l,x^{l'})=\sum_{l^*=1}^m p_{l^*}\,H_1\big(u(X^g,X_1^c,x^l)+\lambda W_{1,2}\,\psi^e_{2,J_1}(x^l,x^{l'})+\lambda W_{1,3}\,\psi^e_{3,J_1}(x^{l'},x^{l^*})\big), \tag{2.4.21}$$

and

$$\psi^e_{2,J_3}(x^{l'})=\sum_{l=1}^m p_l\,H_2\big(u(X^g,X_2^c,x^{l'})+\lambda W_{2,1}\,\psi^e_{1,J_2}(x^l,x^{l'})\big). \tag{2.4.22}$$

We can define the $\psi^e_{i,J_k}$'s as vectors, for instance,

$$\psi^e_{1,J_2}=\big(\psi^e_{1,J_2}(x^1,x^1),\dots,\psi^e_{1,J_2}(x^1,x^m),\dots,\psi^e_{1,J_2}(x^m,x^1),\dots,\psi^e_{1,J_2}(x^m,x^m)\big)',$$

and solve for them by contraction mapping iteration. For the case of continuous characteristics, other approaches, such as the quadrature method, are possible.

Correlated Characteristics

It is possible that the $X_i^p$'s are correlated with each other. We illustrate the equations of equilibrium expectations by using the same network relations as in the last section. In that small group made up of $n=3$ agents, $W_{1,1}=W_{2,2}=W_{2,3}=W_{3,1}=W_{3,3}=0$ and $W_{i,j}\neq 0$ otherwise.
So $J_1=(1,1,1)'$, $J_2=(1,1,0)'$, and $J_3=(0,1,1)'$. Then we have the following equations for the equilibrium conditional expectations:

$$\psi^e_{2,J_1}(X_1^p,X_2^p)=H_2\big(u(X^g,X_2^c,X_2^p)+\lambda W_{2,1}\,\psi^e_{1,J_2}(X_1^p,X_2^p)\big), \tag{2.4.23}$$

$$\psi^e_{3,J_1}(X_2^p,X_3^p)=H_3\big(u(X^g,X_3^c,X_3^p)+\lambda W_{3,2}\,\psi^e_{2,J_3}(X_2^p,X_3^p)\big), \tag{2.4.24}$$

$$\psi^e_{1,J_2}(X_1^p,X_2^p)=\int H_1\big(u(X^g,X_1^c,X_1^p)+\lambda W_{1,2}\,\psi^e_{2,J_1}(X_1^p,X_2^p)+\lambda W_{1,3}\,\psi^e_{3,J_1}(X_2^p,X_3^p)\big)\, f_p(X_3^p\mid X_1^p,X_2^p,Z=z)\,dX_3^p, \tag{2.4.25}$$

and

$$\psi^e_{2,J_3}(X_2^p,X_3^p)=\int H_2\big(u(X^g,X_2^c,X_2^p)+\lambda W_{2,1}\,\psi^e_{1,J_2}(X_1^p,X_2^p)\big)\, f_p(X_1^p\mid X_2^p,X_3^p,Z=z)\,dX_1^p. \tag{2.4.26}$$

Compared with (2.4.15), (2.4.16), (2.4.17), and (2.4.18), we see that, contrary to the independent case, $\psi^e_{2,J_3}$ varies with not only $X_2^p$ but also $X_3^p$. Hence, the dimension of the domain of an equilibrium conditional expectation can be higher with correlated characteristics than with independent characteristics.

To show how to solve for the equilibrium conditional expectation, let us consider the case in which the $X_i^p$'s are discrete random variables. We keep assuming "exchangeability". The joint distribution of those discrete random variables can be represented as $P(X_1^p,\dots,X_n^p)=\int P(X_1^p\mid\eta)\cdots P(X_n^p\mid\eta)\,dF_\eta(\eta)$, where $\eta$ is some common factor. Then we can simplify notation for the conditional probabilities. Assuming that the support of each $X_i^p$ conditional on $Z=z$ is $x^1,\dots,x^m$, we denote

$$p_{l^*|ll'}=\mathrm{Prob}\big(X^p_{i_3}=x^{l^*}\ \big|\ X^p_{i_1}=x^{l},\ X^p_{i_2}=x^{l'},\ Z=z\big)$$

for any $i_1,i_2,i_3=1,2,3$ with $i_1\neq i_2$, $i_1\neq i_3$, $i_2\neq i_3$, and $l,l',l^*=1,\dots,m$.
In this case, (2.4.23), (2.4.24), (2.4.25), and (2.4.26) can be written as

$$\psi^e_{2,J_1}(x^l,x^{l'})=H_2\big(u(X^g,X_2^c,x^{l'})+\lambda W_{2,1}\,\psi^e_{1,J_2}(x^l,x^{l'})\big), \tag{2.4.27}$$

$$\psi^e_{3,J_1}(x^{l'},x^{l^*})=H_3\big(u(X^g,X_3^c,x^{l^*})+\lambda W_{3,2}\,\psi^e_{2,J_3}(x^{l'},x^{l^*})\big), \tag{2.4.28}$$

$$\psi^e_{1,J_2}(x^l,x^{l'})=\sum_{l^*=1}^m p_{l^*|ll'}\,H_1\big(u(X^g,X_1^c,x^l)+\lambda W_{1,2}\,\psi^e_{2,J_1}(x^l,x^{l'})+\lambda W_{1,3}\,\psi^e_{3,J_1}(x^{l'},x^{l^*})\big), \tag{2.4.29}$$

and

$$\psi^e_{2,J_3}(x^{l'},x^{l^*})=\sum_{l=1}^m p_{l|l'l^*}\,H_2\big(u(X^g,X_2^c,x^{l'})+\lambda W_{2,1}\,\psi^e_{1,J_2}(x^l,x^{l'})\big). \tag{2.4.30}$$

Regarding the $\psi^e_{i,J_k}$'s as vectors, they can be solved analytically for linear models or numerically by contraction mapping iteration for binary choice models.

2.5 Identification, Likelihood, and Estimation

2.5.1 Identification

For identification, we focus on how to recover the model primitives, namely the function $u(\cdot)$, the intensity of social interaction effects, $\lambda$, and the distribution of the exogenous variables, $F_{X,\epsilon}(\cdot)$, from the data. In the following discussion, we use $Y$, $X^c$, and $X^g$ to denote the matrices of observations of the endogenous and exogenous variables for a whole group. The vector $X^g$ represents features of a group. Individual observations are represented by $Y_i$, $X_i^c$, and $X_i^p$. All the exogenous characteristics of a group are denoted as $X=(l_n\otimes X^{g\prime},X^c,X^p)$, where $n$ is the size of the group; $l_n$ is an $n\times 1$ vector of ones; $X^c=(X_1^{c\prime},\dots,X_n^{c\prime})'$ and $X^p=(X_1^{p\prime},\dots,X_n^{p\prime})'$.

Analogous to Brock and Durlauf (2007), observational equivalence is defined based on the distribution of observed variables and consistency of equilibrium expectations.

Definition 2.5.1 Given social relations, $W=\bar W$, $(u,\lambda,F_{\epsilon,X})$ is observationally equivalent to $(\tilde u,\tilde\lambda,\tilde F_{\epsilon,X})$ at $\bar W$ if they imply the same distribution of observables,

$$F_{Y,X|\bar W}(\cdot,\cdot;u,\lambda,F_{\epsilon,X})=F_{Y,X|\bar W}(\cdot,\cdot;\tilde u,\tilde\lambda,\tilde F_{\epsilon,X}). \tag{2.5.1}$$

Following from (2.5.1), they are consistent with the same equilibrium expectation, $\psi^e(\cdot)$,

$$\begin{aligned}
\psi_i^e(X^p_{J_k}) &= \int h_i\Big(u(X_i)+\lambda\sum_{j\neq i}\bar W_{i,j}\,\psi_j^e(X^p_{J_i})-\epsilon_i\Big)\, f_{\epsilon,X^p_{J_i}|X^p_{J_k},X^g,X^c}(\epsilon_i,X^p_{J_i})\,d\epsilon_i\,dX^p_{J_i}\\
&= \int h_i\Big(\tilde u(X_i)+\tilde\lambda\sum_{j\neq i}\bar W_{i,j}\,\psi_j^e(X^p_{J_i})-\epsilon_i\Big)\, \tilde f_{\epsilon,X^p_{J_i}|X^p_{J_k},X^g,X^c}(\epsilon_i,X^p_{J_i})\,d\epsilon_i\,dX^p_{J_i},
\end{aligned} \tag{2.5.2}$$

for any $Y$ and $X$ in their support. For some $\bar X$ in the support of $X$, we say that $(u,\lambda,F_{\epsilon,X})$ and $(\tilde u,\tilde\lambda,\tilde F_{\epsilon,X})$ are observationally equivalent at $(\bar X,\bar W)$ if

$$F_{Y|\bar X,\bar W}(\cdot,\cdot;u,\lambda,F_{\epsilon,X})=F_{Y|\bar X,\bar W}(\cdot,\cdot;\tilde u,\tilde\lambda,\tilde F_{\epsilon,X}), \tag{2.5.1$'$}$$

and (2.5.2) holds for $X=\bar X$.

Definition 2.5.2 $(u,\lambda,F_{\epsilon,X})$ is identifiable at $\bar W$ if any $(\tilde u,\tilde\lambda,\tilde F_{\epsilon,X})\neq(u,\lambda,F_{\epsilon,X})$ is not observationally equivalent to $(u,\lambda,F_{\epsilon,X})$ at $W=\bar W$. $(u,\lambda,F_{\epsilon,X})$ is identifiable at $(\bar X,\bar W)$ if any $(\tilde u,\tilde\lambda,\tilde F_{\epsilon,X})\neq(u,\lambda,F_{\epsilon,X})$ is not observationally equivalent to $(u,\lambda,F_{\epsilon,X})$ at $X=\bar X$ and $W=\bar W$.

Our discussion of identification is based on the following assumptions.

Assumption 2.5.1 $\epsilon$ is independent of $X$. Moreover, $\epsilon$ has full support.

Assumption 2.5.2 $F_X$ can be identified from observations of $X$.

Assumption 2.5.3 $u(\cdot)$ is a linear function of the form $u(X_i)=\beta_0+X_i^{c\prime}\beta_1+X_i^{p\prime}\beta_2+X^{g\prime}\beta_3$.

We focus on identification of two models, the continuous choice model, $h_i(z)=z$, and the binary choice model, $h_i(z)=I(z>0)$, for all $i$ and $z$. For each model, we first discuss the special case in which all exogenous characteristics are public information to group members, and then the general structure of incomplete information. Additionally, since $X^g$ is constant for all members within the same group, if there is just one single group, it will be absorbed by the constant term. As a result, we categorize the sample network structures into two classes, depending on whether $W$ is block-diagonal or not. The block-diagonal case corresponds to a sample consisting of several independent groups.
Variation across different groups helps identify the effect of group-level features. For a single network, however, the effect of group-level features is unidentified.

Linear Model for Continuous Choices

We focus our analysis on a single network $\bar W$ and then comment on the case with many different networks. In a single network, $X^g$ is absorbed by the constant term. By Assumption 2.5.3, $u(X_i)=\beta_0+X_i^{c\prime}\beta_1+X_i^{p\prime}\beta_2$.

If all exogenous characteristics are publicly known to group members, $u(X_i)=\beta_0+X_i^{c\prime}\beta_1$. Given the network structure, $\bar W$, and the commonly known individual characteristics, $X^c=\bar X^c$, we try to identify the true parameters, $\beta_0^*$, $\beta_1^*$, $\lambda^*$, and $F_\epsilon^*$, from the observable data on $Y$. To ensure identification, we impose additional assumptions on $\bar X^c$ and $\bar W$.

Assumption 2.5.4 $(l_n,\bar W l_n,\bar X^c,\bar W\bar X^c)$ has full column rank, where $n$ is the size of this group and $l_n$ is an $n\times 1$ vector of ones. When $\bar W$ is row-normalized, $(l_n,\bar X^c,\bar W\bar X^c)$ has full column rank.

Proposition 2.5.1 Suppose that all exogenous variables are public information. For $\bar X^c$ in the support of $X^c$, and a social network matrix, $W=\bar W$, of a single group, if Assumptions 2.3.8, 2.5.1, 2.5.2, 2.5.3, and 2.5.4 hold, $(\beta_0^*,\beta_1^*,\lambda^*,F_\epsilon^*)$ is identified at $(\bar W,\bar X^c)$ for the linear model.

Proof. See Appendix A.3.

In the case that some characteristics are privately known, we need to impose restrictions on $X^p$ in order to ensure identification.

Assumption 2.5.5 Given an information structure, $J=\bar J$, $X^c=\bar X^c$, $X^p=\bar X^p$, and $W=\bar W$, $E[Y_i\mid J=\bar J,W=\bar W,X^c=\bar X^c,X^p_{J_k}=\bar X^p_{J_k}]$ can be identified (nonparametrically) for any $i,k=1,\dots,n$, $i\neq k$.

Assumption 2.5.6 $(l_n,\bar X^c,\bar X^p,E^p)$ has full column rank, where $E^p$ is an $n\times 1$ vector whose $i$-th component is $\sum_{j\neq i}\bar W_{i,j}\,E[Y_j\mid J=\bar J,W=\bar W,X^c=\bar X^c,X^p_{J_i}=\bar X^p_{J_i}]$.
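Rank conditions of this kind are straightforward to check numerically for a given design. The sketch below illustrates Assumption 2.5.4 only; the helper `has_full_column_rank`, the five-agent ring network, and the regressor values are hypothetical. Checking Assumption 2.5.6 in the same way would additionally require the vector $E^p$ of conditional expectations, which is not constructed here.

```python
import numpy as np

def has_full_column_rank(*blocks):
    """Stack the given vectors/matrices column-wise and test for full column rank."""
    M = np.column_stack(blocks)
    return bool(np.linalg.matrix_rank(M) == M.shape[1])

# Hypothetical ring network: agent i is linked only to agent i + 1 (mod n).
n = 5
W_ring = np.roll(np.eye(n), 1, axis=1)      # row-normalized by construction
l_n = np.ones(n)
X_c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # a single public regressor

# With a row-normalized W, W l_n = l_n, so the four-block matrix is rank deficient.
general_form_ok = has_full_column_rank(l_n, W_ring @ l_n, X_c, W_ring @ X_c)
# The reduced condition stated for row-normalized W drops the redundant column.
row_normalized_form_ok = has_full_column_rank(l_n, X_c, W_ring @ X_c)
```

The example shows why the assumption is stated separately for row-normalized networks: the column $\bar W l_n$ duplicates $l_n$ in that case and must be dropped before the rank is assessed.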
Proposition 2.5.2 Given an information structure, $\bar J$, $\bar X^c$ in the support of $X^c$, $\bar X^p$ in the support of $X^p$, and a social network matrix, $W=\bar W$, if Assumptions 2.3.8, 2.5.1, 2.5.2, 2.5.3, 2.5.5, and 2.5.6 hold, $(\beta_0^*,\beta_1^*,\lambda^*,F_\epsilon^*)$ is identified at $(\bar J,\bar W,\bar X^c,\bar X^p)$ for the linear model.

Proof. See Appendix A.3.

From Proposition 2.5.2, the key to identification is to make sure that the relevant conditional expectations are not linearly dependent on the relevant exogenous characteristics. In some special cases, more elementary sufficient conditions can be derived. For example, when $X_i^p$ is known only to $i$ and is a discrete random variable with a finite support, the conditional expectation can be solved analytically as in (2.4.9). Then we can derive a new set of sufficient conditions for identification. For $W=\bar W$ and $X^c=\bar X^c$, suppose that $X^p$ has a finite support, $X^{p,s}=(X^{p,1\prime},\dots,X^{p,m\prime})'$, where $X^{p,1},\dots,X^{p,m}$ are the $m$ possible values that $X^p$ may take. The corresponding transition matrix is $P$. Define $\hat X=(l_{nm},\ l_m\otimes\bar X^c,\ X^{p,s}\otimes l_n)$.

Assumption 2.5.7 The matrix $\big((P\otimes I_n)\hat X,\ (P^2\otimes\bar W)\hat X\big)$ has full column rank. When $\bar W$ is row-normalized, $\big(l_{nm},\ l_m\otimes\bar X^c,\ (PX^{p,s})\otimes l_n,\ l_m\otimes(\bar W\bar X^c),\ (P^2X^{p,s})\otimes l_n\big)$ has full column rank.

Proposition 2.5.3 Suppose that $X_i^p$ is only revealed to $i$, for $i=1,\dots,n$, and the distribution of the $X_i^p$'s is exchangeable with a finite support, $X^{p,s}$, and a transition matrix, $P$. For $\bar X^c$ in the support of $X^c$, and a social network matrix for a single group, $W=\bar W$, if Assumptions 2.3.8, 2.5.1, 2.5.2, 2.5.3, and 2.5.7 hold, $(\beta_0^*,\beta_1^*,\lambda^*,F_\epsilon^*)$ is identified at $(\bar W,\bar X^c)$ for the linear model.

Proof. See Appendix A.3.

In Section 2.4, we discussed another special case in which the $X_i^p$'s are continuously distributed. According to Proposition 2.4.1, when $E[X_i^p\mid X_j^p,z]=\mu+CX_j^p$, the equilibrium expectations, $\psi_i^e(x)=a_i+b_i'x$, $i=1,\dots,n$, are linear in the private characteristics $X_i^p$, which are random.
For $X^c=\bar X^c$ and $W=\bar W$, $a=(a_1,\dots,a_n)'=(I_n-\lambda^*\bar W)^{-1}\big(\beta_0^*l_n+\bar X^c\beta_1^*+(\mu'\beta_2^*)l_n+\lambda^*(\bar W\otimes\mu')b\big)$, and $b=(b_1',\dots,b_n')'=\big(I_n\otimes I_{k_p}-\lambda^*(\bar W\otimes C')\big)^{-1}(l_n\otimes C'\beta_2^*)$. We provide sufficient conditions for identification in this special case. Denote $B=(b_1,\dots,b_n)$, $G_n=(I_n-\lambda^*\bar W)^{-1}\bar W$, and $G_{c,n}=(I_n\otimes I_{k_p}-\lambda^*\bar W\otimes C')^{-1}(\bar W\otimes C')$. We consider the following two conditions.

Assumption 2.5.8 $\big(l_n\otimes C',\ G_{c,n}(l_n\otimes C')\beta_2^*\big)$ and $(l_n,\bar X^c)$ have full column ranks.

Assumption 2.5.9 $l_n\otimes C'$ and $\big(l_n,\ \bar X^c,\ G_n[l_n\beta_0^*+\bar X^c\beta_1^*+(l_n\beta_2^{*\prime}+B')\mu]\big)$ have full column ranks.

Proposition 2.5.4 Suppose that $X_i^p$ is only revealed to $i$, for $i=1,\dots,n$, and the distribution of the $X_i^p$'s is exchangeable with a continuous density satisfying Assumption 2.4.5. For $\bar X^c$ in the support of $X^c$, and a social network matrix, $W=\bar W$, $(\beta_0^*,\beta_1^*,\lambda^*,F_\epsilon^*)$ is identified at $(\bar W,\bar X^c,\bar X^p)$ for the linear model, if Assumptions 2.3.8, 2.5.1, 2.5.2, 2.5.3, and 2.5.8 hold, or Assumptions 2.3.8, 2.5.1, 2.5.2, 2.5.3, and 2.5.9 hold.

Proof. See Appendix A.3.

The rank conditions in Assumptions 2.5.8 and 2.5.9 can be simplified when $\bar W$ is row-normalized. In that case, $\bar W l_n=l_n$, so $G_n l_n=l_n/(1-\lambda^*)$, and

$$\begin{aligned}
G_{c,n}(l_n\otimes C') &= (I_{nk_p}-\lambda^*\bar W\otimes C')^{-1}(\bar W\otimes C')(l_n\otimes C')\\
&= (I_{nk_p}-\lambda^*\bar W\otimes C')^{-1}(l_n\otimes C')C'\\
&= (l_n\otimes C')(I_{k_p}-\lambda^*C')^{-1}C',
\end{aligned}$$

because

$$\begin{aligned}
(I_{nk_p}-\lambda^*\bar W\otimes C')^{-1}(l_n\otimes C') &= \Big(\sum_{m=0}^{\infty}(\lambda^*\bar W\otimes C')^m\Big)(l_n\otimes C')\\
&= (l_n\otimes C')\sum_{m=0}^{\infty}(\lambda^*C')^m\\
&= (l_n\otimes C')(I_{k_p}-\lambda^*C')^{-1}.
\end{aligned}$$

When $\bar W$ is row-normalized, the first part of Assumption 2.5.8 does not hold. This is because the rank of $(l_n\otimes C')\big(I_{k_p},\ (I_{k_p}-\lambda^*C')^{-1}C'\beta_2^*\big)$ cannot exceed $k_p$, so the matrix, which has $k_p+1$ columns, does not have full column rank. In this case, identification depends on Assumption 2.5.9, which simplifies to the requirement that $l_n\otimes C'$ and

$$\Big(l_n,\ \bar X^c,\ l_n(\beta_0^*+\mu'\beta_2^*)/(1-\lambda^*)+G_n(\bar X^c\beta_1^*+B'\mu)\Big)$$

have full column ranks.
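The two identities used above for a row-normalized network, $G_n l_n = l_n/(1-\lambda^*)$ and $G_{c,n}(l_n\otimes C') = (l_n\otimes C')(I_{k_p}-\lambda^*C')^{-1}C'$, can be confirmed numerically. The following sketch does so for one randomly generated network; the seed, the dimensions, and the matrix $C$ are assumptions made only for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
n, kp, lam = 5, 2, 0.3

# Hypothetical row-normalized network with a zero diagonal.
A = rng.random((n, n))
np.fill_diagonal(A, 0.0)
W = A / A.sum(axis=1, keepdims=True)

C = np.array([[0.4, 0.1],
              [0.0, 0.3]])                 # hypothetical C from Assumption 2.4.5
l_n = np.ones((n, 1))

# G_n = (I_n - lam W)^{-1} W
G_n = np.linalg.solve(np.eye(n) - lam * W, W)

# G_{c,n} = (I_{n kp} - lam W (x) C')^{-1} (W (x) C')
WC = np.kron(W, C.T)
G_cn = np.linalg.solve(np.eye(n * kp) - lam * WC, WC)

L = np.kron(l_n, C.T)                      # l_n (x) C', of size (n kp) x kp
lhs = G_cn @ L
rhs = L @ np.linalg.solve(np.eye(kp) - lam * C.T, C.T)

first_identity_holds = np.allclose(G_n @ l_n, l_n / (1.0 - lam))
second_identity_holds = np.allclose(lhs, rhs)
```

Both identities rely only on $\bar W l_n = l_n$ and on the mixed-product rule for Kronecker products, so they hold for any row-normalized network for which the inverses exist.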
In the special case in which the $X_j^p$'s are independent of each other, $B=0$ and $G_{c,n}=0$. The identification conditions in Assumptions 2.5.8 and 2.5.9 would then fail. This is because only some linear combinations of parameters can be identified through the equilibrium conditional expectations. From the proof in Appendix A.3, it is clear that in this case (A.3.1) is not useful. Through (A.3.2), one can identify $\beta_1^*$, $\lambda^*$, and the linear combination $\beta_0^*+\mu'\beta_2^*$. This becomes more transparent in the implied equilibrium, (2.4.6), which has

$$\psi^e=(I-\lambda^*\bar W)^{-1}(l_n\beta_0^*+\bar X^c\beta_1^*+l_n\mu'\beta_2^*)=(I-\lambda^*\bar W)^{-1}\big[l_n(\beta_0^*+\mu'\beta_2^*)+\bar X^c\beta_1^*\big].$$

However, identification of each parameter can be achieved by considering (2.5.1$'$), because in that case

$$\begin{aligned}
Y_n &= l_n\beta_0^*+\bar X^c\beta_1^*+\bar X^p\beta_2^*+\lambda^*\bar W\psi^e-\epsilon\\
&= l_n\beta_0^*+\bar X^c\beta_1^*+\bar X^p\beta_2^*+\lambda^*G_n\big[l_n(\beta_0^*+\mu'\beta_2^*)+\bar X^c\beta_1^*\big]-\epsilon.
\end{aligned}$$

Under the condition that

$$\Big(l_n,\ \bar X^c,\ \bar X^p,\ G_n\big(l_n(\beta_0^*+\mu'\beta_2^*)+\bar X^c\beta_1^*\big)\Big) \tag{2.5.3}$$

has full column rank, the parameters can be identified. We note that in this case, the presence of $X^c$ plays a key role in identification. If $X^c$ were not relevant, i.e., $\beta_1^*=0$, then when either $\lambda^*=0$ or $\bar W$ is row-normalized, the equilibrium expectations become a constant vector, $\psi^e=(I_n-\lambda^*\bar W)^{-1}l_n(\beta_0^*+\mu'\beta_2^*)=l_n(\beta_0^*+\mu'\beta_2^*)/(1-\lambda^*)$. Then the model is simply

$$Y=l_n\big[\beta_0^*+\lambda^*(\beta_0^*+\mu'\beta_2^*)/(1-\lambda^*)\big]+X^p\beta_2^*-\epsilon.$$

It can be seen that $\lambda^*$ would not be revealed in this case.

When the whole sample is formed by many independent groups, social relationships of the whole sample can be represented by a block-diagonal matrix, $\bar W$. $\beta_3^*$ can be identified from the variation of $X^g$ among different groups.

Binary Choices

For the binary choice model, since a lot of information about $y^*$ is missing, identification is more difficult. Our discussion will focus on the case with a parametric distribution from a location-scale family, which is used frequently in empirical studies.
Assumption 2.5.10 The idiosyncratic shocks, $\epsilon_i$'s, are i.i.d. with support $\mathbb{R}$. The distribution belongs to a location-scale family, $f_\epsilon(\epsilon)=\frac{1}{\sigma}f_s\big(\frac{\epsilon-\mu}{\sigma}\big)$, where $\mu$ and $\sigma$ are known and normalized to be equal to 0 and 1, respectively. $f_s(\cdot)$ is a known standard density function in this family.

Since $f_s(\cdot)$ is a known function, $F_s(\cdot)$ is known. By applying the inverse, $F_s^{-1}$, to choice probabilities which are identifiable from the data set, we can derive a linear equation. That helps us derive sufficient conditions for identification of the model parameters, $\beta^*$ and $\lambda^*$.

Proposition 2.5.5 Suppose that all exogenous variables are public information. For $\bar X^c$ in the support of $X^c$, and a social network matrix, $W=\bar W$, if Assumptions 2.3.8, 2.5.1, 2.5.2, 2.5.3, and 2.5.10 hold and $\big(l_n,\ \bar X^c,\ \bar W E[Y\mid W=\bar W,X^c=\bar X^c]\big)$ has full column rank, $(\beta_0^*,\beta_1^*,\lambda^*)$ is identified at $(\bar W,\bar X^c)$ for the binary choice model.

Proof. See Appendix A.3.

Proposition 2.5.6 Given a structure of incomplete information, $\bar J$, $\bar X^c$ in the support of $X^c$, $\bar X^p$ in the support of $X^p$, and a social network matrix, $W=\bar W$, if Assumptions 2.3.8, 2.5.1, 2.5.2, 2.5.3, 2.5.5, 2.5.6, and 2.5.10 hold, $(\beta_0^*,\beta_1^*,\beta_2^*,\lambda^*)$ is identified at $(\bar J,\bar W,\bar X^c,\bar X^p)$ for the binary choice model.

Proof. See Appendix A.3.

In addition to identification for classical inference, one may also consider identification via Bayesian inference. Bayesian analysis is concerned with how sample information can improve the posterior distribution of parameters over the prior distributions. The posterior distribution can be simulated by the Metropolis-Hastings algorithm. Details are in Appendix A.5.

2.5.2 Estimation

For identification and estimation, we consider a parametric model where the direct utility, $u(X_i;\beta)$, is a parametric function with parameter $\beta$, and $f_\epsilon(\epsilon;\sigma)$ and $f_p(X^p;\eta,Z)$ are parametric density functions with parameters $\sigma$ and $\eta$.
From the econometrician's point of view, exogenous characteristics are observable, and observed outcomes,

$$y_i=h_i\Big(u(X_i;\beta)+\lambda\sum_{j\neq i}W_{i,j}\,E[y_j\mid X^p_{J_i},Z]-\epsilon_{i,g}\Big),$$

are stochastic due to the unobserved $\epsilon_i$. As the $\epsilon_i$'s are i.i.d. within a group, conditional on exogenous characteristics, the likelihoods of individual outcomes are independent of each other. Therefore, the log likelihood is

$$\log L_c(\beta,\lambda,\sigma,\eta\mid y_1,\dots,y_n,X_1,\dots,X_n)=\sum_i\log L(y_i\mid X,\beta,\lambda,\sigma,\eta). \tag{2.5.4}$$

In the case that all $X$'s are public information in the group, an agent does not need to integrate over unknown characteristics when making predictions. Then, the distribution of $y_i$, for $i=1,\dots,n$, conditional on $X$, does not depend on $\eta$. When the $X_i^p$'s are privately known, the conditional likelihood function will depend on $\eta$. For econometricians, all the $X_i^p$'s are observable, so $\eta$ can be inferred from the sample of $X^p$. Hence, econometricians can plug the estimated value of $\eta$ into the likelihood (2.5.4). Alternatively, if the distribution of $X^p$ is known, one may set up the full likelihood function of $Y$ and $X^p$ for estimation, $\sum_i\log L(y_i\mid X,\beta,\lambda,\sigma,\eta)+\log L(X_1^p,\dots,X_n^p\mid\eta)$. The full likelihood may achieve efficiency, but optimization will be over a higher-dimensional parameter space. The two-step estimation procedure can be computationally simpler. The estimate will be consistent, but there may be some loss of efficiency.

When calculating the likelihood (2.5.4), we need to solve for the equilibrium expectations, $E[y_j\mid X^p_{J_i},Z]$, as a fixed point. For estimation, we nest the fixed point solution within ML estimation. We solve for the unique fixed point at a given parameter vector and calculate the likelihood at that parameter vector. Then we search for the vector which maximizes the likelihood. This method is a "nested fixed point" (NFXP) algorithm, frequently used in estimating dynamic discrete choice models, such as in Rust (1987).
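The nesting of the two loops can be sketched as follows for the simplest case, a probit model in which all characteristics are public, so that the inner fixed point is (2.4.1) with $H_i$ equal to the standard normal CDF. This is only an illustration of the NFXP structure: the data are simulated, a coarse grid search stands in for a proper numerical optimizer, and all names (`solve_fixed_point`, `log_likelihood`, `nfxp_grid_search`) and values are hypothetical.

```python
import itertools
import math
import numpy as np

def norm_cdf(z):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z) / math.sqrt(2.0)))

def solve_fixed_point(u, W, lam, tol=1e-12, max_iter=1000):
    """Inner loop: iterate psi = Phi(u + lam * W psi) to convergence."""
    psi = np.full(u.shape[0], 0.5)
    for _ in range(max_iter):
        new = norm_cdf(u + lam * (W @ psi))
        if np.max(np.abs(new - psi)) < tol:
            return new
        psi = new
    raise RuntimeError("fixed-point iteration did not converge")

def log_likelihood(theta, y, Xc, W):
    """Probit log likelihood evaluated at the equilibrium implied by theta."""
    beta0, beta1, lam = theta
    u = beta0 + beta1 * Xc
    psi = solve_fixed_point(u, W, lam)
    p = np.clip(norm_cdf(u + lam * (W @ psi)), 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def nfxp_grid_search(y, Xc, W, grid):
    """Outer loop: pick the grid point with the highest log likelihood."""
    return max(grid, key=lambda theta: log_likelihood(theta, y, Xc, W))

# Simulated data under assumed true values (beta0, beta1, lam) = (0, 1, 0.3).
rng = np.random.default_rng(0)
n = 200
W = np.zeros((n, n))
for i in range(n):
    friends = rng.choice(np.delete(np.arange(n), i), size=3, replace=False)
    W[i, friends] = 1.0 / 3.0
Xc = rng.standard_normal(n)
u_true = 0.0 + 1.0 * Xc
psi_true = solve_fixed_point(u_true, W, 0.3)
y = (u_true + 0.3 * (W @ psi_true) - rng.standard_normal(n) > 0).astype(float)

grid = list(itertools.product([-0.5, 0.0, 0.5], [0.5, 1.0, 1.5], [0.0, 0.3, 0.6]))
theta_hat = nfxp_grid_search(y, Xc, W, grid)
ll_hat = log_likelihood(theta_hat, y, Xc, W)
```

Every candidate $\lambda$ on the grid satisfies condition (2.3.8) for the probit model, which keeps the inner loop convergent; a general-purpose optimizer would need the same restriction imposed on its search region.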
By using NFXP, we fully solve for the (conditional) expectation functions. Since there is a one-to-one correspondence between a (conditional) expectation function and a BNE in the underlying simultaneous-move game with incomplete information, we fully solve for the equilibrium when estimating the model.12 Since the likelihoods of the $y_i$’s are independent of each other, consistency and asymptotic normality of the estimator can be derived in the conventional way for cross-sectional data.

12 Our method differs from some indirect approaches for estimating incomplete information games. For the discrete choice case, first, some semi- or non-parametric methods are used to derive a consistent estimate of choice probabilities, and then those estimates are substituted back to estimate model parameters, such as Bajari et al. (?) and Aradillas-Lopez (2010). However, for such indirect approaches, a consistent estimate of choice probabilities often requires many repetitions of the same game played by the same people. It is hard to meet that requirement in the context of social interactions, as network links are specific to a group of players. For panels, networks might change over time.

2.6 Monte Carlo Experiments

We investigate finite sample properties of the MLE by Monte Carlo experiments. As usual, $u(\cdot)$ is assumed to be a linear function, $u(X_i) = \beta_0 + X_i^c\beta_1 + X_i^p\beta_2$, where the observed group feature, $X^g$, is absent. The idiosyncratic shocks, $\epsilon_i$’s, are i.i.d. with pdf $\phi(\cdot;\sigma)$, where $\phi(\cdot;\sigma)$ is a normal density with zero mean and standard deviation $\sigma > 0$. The true parameter values are $\beta_0 = 0$, $\beta_1 = 1$, $\beta_2 = 1$, $\lambda = 0.3$, and $\sigma = 1$. The social weighting matrix, $W$, represents network relations, and is composed of 0’s and 1’s and then row-normalized. We consider two types of networks. In the first case, $W$ can be organized as a block diagonal matrix, as the whole sample is a collection of a number of independent groups. In our experiments, each group has the same size of $n = 20$. Within a group, for every agent, 3 other agents are randomly selected to be linked to her, represented as $F = 3$. The number of independent groups, $G$, increases from 100 to 500. The second case has a single network with either $n = 200$ or $n = 1000$ nodes. We try two different settings for the number of links an agent can make. Under the first setting, everyone links with a fixed number of agents, $F$. We choose $F = 30$ for $n = 200$. When $n = 1000$, we look at $F = 30$ and $F = 150$. Under the second setting, the number of associations an agent can make is randomly determined. We allow that number to take any integer from 0 to an upper bound, $UF$, with equal probabilities. $UF = 59$ for $n = 200$. For $n = 1000$, we investigate cases when $UF = 59$ and $UF = 299$. The commonly known individual characteristics, $X^c$, are generated as independent standard normal variables. For $X_i^p$, we consider two cases. In one case, it is a discrete variable. In the other case, it is a continuous variable. In the discrete case, $X_i^p$ is dichotomous, taking values 0 or 1. The realizations of the $X_i^p$’s are determined in the following way: 40% of agents in a group (or block if $W$ is block diagonal) are picked randomly to have $X_i^p = 1$; otherwise, $X_i^p = 0$. For this design, the distribution of the $X_i^p$’s is “exchangeable”, with a transition (conditional) probability matrix:
\[ P = \begin{pmatrix} P(X_2^p = 1 \mid X_1^p = 1) & P(X_2^p = 0 \mid X_1^p = 1) \\ P(X_2^p = 1 \mid X_1^p = 0) & P(X_2^p = 0 \mid X_1^p = 0) \end{pmatrix} = \begin{pmatrix} \frac{n_1 - 1}{n-1} & \frac{n - n_1}{n-1} \\ \frac{n_1}{n-1} & \frac{n - n_1 - 1}{n-1} \end{pmatrix}, \]
where $n$ is the number of agents in the group and $n_1 = 0.4n$. Members know the joint distribution of the $X_i^p$’s. Because both $n$ and $n_1$ are observable, econometricians can infer that distribution from data. Thus, we focus on the likelihood of the $y_i$’s conditional on regressors for estimation.
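The discrete design can be reproduced with a short sketch. It assumes, as the text states, that exactly $n_1$ of the $n$ agents are drawn without replacement to have $X_i^p = 1$; the function names are illustrative only.

```python
import random
from fractions import Fraction

def exchangeable_transition(n, n1):
    """Conditional probabilities of X_2^p given X_1^p.

    Rows: X_1^p = 1, then X_1^p = 0. Columns: X_2^p = 1, then X_2^p = 0.
    """
    d = n - 1
    return [[Fraction(n1 - 1, d), Fraction(n - n1, d)],
            [Fraction(n1, d), Fraction(n - n1 - 1, d)]]

def draw_private_types(n, n1, rng):
    """Pick n1 of the n agents at random to have X_i^p = 1."""
    chosen = set(rng.sample(range(n), n1))
    return [1 if i in chosen else 0 for i in range(n)]

# Group size and share used in the block-diagonal experiments: n = 20, 40% ones.
P = exchangeable_transition(20, 8)
draw = draw_private_types(20, 8, random.Random(0))
```

Using exact fractions makes it easy to confirm that each row of the matrix sums to one.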
For the case with continuous $X_i^p$’s, we adopt the framework that $X^p_{i,g} = \alpha_g + \epsilon^p_{i,g}$, for $g = 1,\cdots,G$ and $i = 1,\cdots,n$. Suppose that the $\alpha_g$’s are i.i.d. normal with mean $\mu$ and variance $\sigma_1^2$ across $g$. The $\epsilon^p_{i,g}$’s are i.i.d. normal with zero mean and variance $\sigma_2^2$ for all $i$ and $g$. They are also independent of the $\alpha_g$’s. Then within a group, $g$, $(X^p_{1,g},\cdots,X^p_{n,g})'$ is jointly normal with mean $(\mu,\cdots,\mu)'$ and variance-covariance matrix
\[ \eta^2 \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}, \tag{2.6.1} \]
where $\eta^2 = \sigma_1^2 + \sigma_2^2$ and $\rho = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}$. For any $i \neq j$ and given $X^p_{j,g} = x$, $X^p_{i,g}$ is normally distributed with mean $\rho x + (1-\rho)\mu$ and variance $(1-\rho^2)\eta^2$. We choose $\mu = 1$, $\eta = 2$ and $\rho = 0.4$. While values of those parameters are known to agents, econometricians need to estimate them from observed data. In this situation, we adopt a two-step algorithm in estimation. We first estimate $\mu$, $\eta$ and $\rho$ from the $X^p_{i,g}$’s. Those estimates are consistent as the number of independent groups, $G$, increases to $\infty$. We then plug those estimates into the likelihood of $y$ and use the NFXP method to estimate the other parameters. If there is only one single group, instead of estimating $\mu$, $\eta$ and $\rho$, we directly use their true values when maximizing the likelihood of $y$.13 We consider two models, the linear one for continuous choices with $h_i(z) = z$ and the binary choice one with $h_i(z) = I(z > 0)$ for all $i$. Each model is estimated by the NFXP ML method. To investigate the influence of information structures on estimation results, for each model we conduct the following experiment. First, simulate the data for the case that all $X_i^p$’s are publicly known, and estimate the model under the true information structure, as well as under a false information structure in which the characteristics are self-known. The latter is intended to investigate the consequences for estimation if a researcher has hypothesized a false information structure for agents’ behaviors.
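As a small numerical check of the continuous design, the sketch below recovers the two variance components implied by $(\eta, \rho)$, evaluates the conditional mean and variance stated above, and draws one group. The helper names are hypothetical and the draw uses only the random-effect structure $X^p_{i,g} = \alpha_g + \epsilon^p_{i,g}$.

```python
import random

def variance_components(eta, rho):
    """Return (sigma_1^2, sigma_2^2) from eta^2 = s1 + s2 and rho = s1 / (s1 + s2)."""
    return rho * eta ** 2, (1.0 - rho) * eta ** 2

def conditional_moments(x, mu, eta, rho):
    """Mean and variance of X_i^p given X_j^p = x for i != j."""
    return rho * x + (1.0 - rho) * mu, (1.0 - rho ** 2) * eta ** 2

def draw_group(n, mu, eta, rho, rng):
    """Draw (X_1^p, ..., X_n^p) for one group: common alpha_g plus idiosyncratic noise."""
    s1, s2 = variance_components(eta, rho)
    alpha_g = rng.gauss(mu, s1 ** 0.5)
    return [alpha_g + rng.gauss(0.0, s2 ** 0.5) for _ in range(n)]

# Values used in the experiments: mu = 1, eta = 2, rho = 0.4.
s1, s2 = variance_components(2.0, 0.4)
m, v = conditional_moments(3.0, 1.0, 2.0, 0.4)
group = draw_group(20, 1.0, 2.0, 0.4, random.Random(0))
```

With these values the components are $\sigma_1^2 = 1.6$ and $\sigma_2^2 = 2.4$, and observing $x = 3$ gives a conditional mean of $1.8$ and a conditional variance of $3.36$.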
Second, simulate the data for the case that all $X_i^p$’s are self-known, and estimate the model under both the true information structure and a model where characteristics are mistakenly treated as publicly known. Results are tabulated below. Tables A.1 to A.4 present the estimation results for $W$ being block-diagonal. In each of those tables, parameter names and true values are listed in the first and second columns.

13 Implicitly, in practice, we assume that those parameters might be recovered from other sources and used for a study on hand.

Each table is divided into two panels. The first panel corresponds to the data generating process where all $X^p_{i,g}$’s are known to every agent in the group $g$. The second panel presents the results when each generated $X^p_{i,g}$ is known only to $(i, g)$. Estimation results under true information structures and those under misspecified information structures are in columns of corresponding panels. We calculate the empirical mean and corresponding empirical standard deviation for every estimated parameter. We also calculate sample average maximized log likelihood values. Additionally, we denote by $r_{true}$ the proportion of simulations for which the maximized sample log likelihood under a true information structure is larger than that under the corresponding misspecified information structure. From the results in Tables A.1 and A.2, the NFXP ML estimation works well for both linear and binary choice models. The bias of the estimates is small, and the standard deviation decreases as sample size increases. Comparing the two models, the performance of estimates for the linear model is better than that of the binary choice model. That is intuitive, because compared to the linear model, some information on the latent dependent variables is lost in the binary choice indicator.
It is interesting that if data is generated for the case that all $X^p_{i,g}$’s are publicly known, there is only a small difference between estimates under the true information structure and the wrongly hypothesized information structure in terms of bias and standard deviation. On the other hand, if data is generated from all $X^p_{i,g}$’s being self-known, but the estimated model mistakenly treats all $X^p_{i,g}$’s as public information, there will be a larger bias in the estimates of the constant term and the social interaction intensity. Additionally, that bias cannot be reduced even when sample size is increased. For each model, comparing estimation under a true information structure and a falsely hypothesized information structure, we see that estimation under the true information structure can achieve a larger likelihood value in general. Therefore, an information criterion in terms of maximized log likelihood can be valuable for model selection. Results have similar features when the $X^p_{i,g}$’s are continuously distributed, as shown by Tables A.3 and A.4, but the biases of estimates under a false information structure are larger when the $X^p_{i,g}$’s are continuously distributed than when they are discrete. Results for a single group when $X^p$ is discrete are tabulated in Tables A.5 to A.12. We can see that our estimation method still works well in this case. One interesting thing is that when the group size and number of links for each member are both multiplied by 5, the standard deviations of parameter estimates decrease except for those for $\beta_0$ and $\lambda$. That is because as sample size increases, interactions between the expectations about the $y_i$’s also go up, which increases the multicollinearity in the equation of $y^*$. If we hold the number of links for each individual constant, the standard deviations for the estimates of $\beta_0$ and $\lambda$ also decrease.
Analogous results are derived when the number of links is random, but then standard deviations of parameter estimates decrease faster as sample size increases. To assume that all personal characteristics are publicly known is equivalent to assuming rational expectations. From the above experiments, we can see that if the data generating process is incompatible with that assumption, imposing it can introduce estimation bias.

2.7 An Empirical Application

Public safety provided by local governments is a public good. A safer city can benefit not only its own residents but also people living in neighboring cities. As a result, a city might want to be a “free-rider” and spend less on police, fire, and emergency services. That is, there is a substitution effect between two nearby cities in terms of public safety expenditure. However, a complementary effect may also exist, because an improvement in public safety service in one city may drive criminals to other places not far away from that city. For a group of cities in a region, those effects imply spatial correlation. We empirically analyze this correlation by applying our model to municipalities in the state of North Carolina. We collect data about government finance in the 2012 fiscal year from the North Carolina Department of State Treasurer. From the same data source, we also get county public safety spending in the same fiscal year. Since residents’ earnings may also influence public safety spending, we collect data about city median household income from “FindtheData.org”14, which is based on the American Community Survey. To construct spatial relations, we first collect information about municipal latitudes and longitudes from “CityLatitudeLongitude.com”15 and then calculate distances using the Haversine formula16. After that, we choose a cutoff value. Two cities whose distance is no bigger than that cutoff value are defined as “neighbors”.
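The construction of the spatial relation can be sketched as follows. The coordinates are invented for illustration, the Earth radius of 6371 km is an assumed value, and row-normalizing the neighbor indicators is an assumption carried over from the Monte Carlo design rather than something stated for the empirical weights.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2, r=EARTH_RADIUS_KM):
    """Great-circle distance in kilometers between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def neighbor_weights(coords, cutoff_km):
    """Row-normalized weights: W[i][j] > 0 iff j != i lies within the cutoff of i."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in range(n)
                if j != i and haversine_km(*coords[i], *coords[j]) <= cutoff_km]
        for j in nbrs:
            W[i][j] = 1.0 / len(nbrs)
    return W

# Hypothetical cities (latitude, longitude); the third is far from the other two.
coords = [(35.0, -80.0), (35.2, -80.0), (37.0, -80.0)]
W30 = neighbor_weights(coords, cutoff_km=30.0)
```

A city with no neighbor inside the cutoff gets a zero row, so it neither affects nor is affected by the others at that cutoff.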
In the empirical analysis, we choose three different cutoff values and run regressions for each case. Those three cutoff values are 30 kilometers, 50 kilometers and 100 kilometers. Sample statistics are summarized in Table A.14. Because expenditure on public safety is a continuous choice variable, we use the following model for the empirical study:
\[ y_i = X_i^{c\prime}\alpha + X_i^{p\prime}\beta + \lambda\sum_{j\neq i} W_{i,j}\,E[y_j \mid X^p_{J_i}, Z] - \epsilon_i, \tag{2.7.1} \]
where $y_i$ stands for the public safety spending of city $i$, and $X_i^c$ contains municipal financial and demographic information that is known to all cities in the state, such as population and total revenue. $X_i^p$ represents city features that are known to $i$ and not to other cities.

14 http://acs-economic-city.findthedata.org.

15 The latitudes and longitudes of most municipalities in our sample are listed on the web page, http://citylatitudelongitude.com/NC/Matthews.htm. Data for the remaining 14 cities were found by searching on Google individually.

16 $d = 2r\arcsin\Big(\sqrt{\sin^2\big(\tfrac{\iota_2 - \iota_1}{2}\big) + \cos(\iota_1)\cos(\iota_2)\sin^2\big(\tfrac{\xi_2 - \xi_1}{2}\big)}\Big)$, where $d$ is the distance, $r$ is the radius, $\iota_1$ and $\iota_2$ are latitudes, and $\xi_1$ and $\xi_2$ are longitudes.

In this paper, the variable chosen to represent them is the median household income in the current period, when decisions about government spending are to be made. Specifically, we assume that city median household income is composed of the state average income level and a city idiosyncratic shock,
\[ MHHI_i = \alpha + \epsilon^p_i, \tag{2.7.2} \]
where $MHHI_i$ denotes the median household income of municipality $i$. The random variable, $\alpha$, represents the state median household income. The city-specific shock is denoted by $\epsilon^p_i$. $\alpha$ is normal with mean $\mu$ and standard deviation $\omega$. The $\epsilon^p_i$’s are i.i.d. normal with zero mean and standard deviation $\gamma$, and are independent of $\alpha$. Then the $MHHI_i$’s have an exchangeable joint normal distribution, with mean $\mu$, variance $\eta^2 = \omega^2 + \gamma^2$ and correlation coefficient $\rho = \frac{\omega^2}{\omega^2 + \gamma^2}$.17 We estimate the model under two different information structures.
In the first scenario, median household income is publicly known to related cities. In the second, it is assumed to be self-known information. Results for regressions using nested fixed point maximum likelihood under the different settings are tabulated in Table A.15. We find a significant negative spatial correlation between neighboring municipalities when the cutoff distance defining “neighbors” is 30 or 50 kilometers, although the magnitude of the effect is small. If we allow the cutoff distance for “neighbors” to be 100 kilometers, that effect is still negative but insignificant. Therefore, if two cities are close to each other, the spending on public safety by one of them has an externality effect on the other, resulting in a “free-riding” effect. Additionally, for the same cutoff distance, estimates when median household income is public information are very close to those when the true value of a city’s own median household income is private information. However, in terms of estimated sample log likelihoods, the regressions under public information outperform those under private information. That is to say, when making budget decisions, municipalities in a state know the factors influencing the decisions taken by nearby cities quite well.

17 We collect the time series of the median household income for the state of North Carolina, $\alpha_t$, from 1984 to 2012, and estimate $\mu$ and $\omega$ by the sample mean and standard deviation.

2.8 Conclusion and Remarks

In this paper, we analyze social interactions under incomplete information in a framework allowing heterogeneity in expectations about agents’ behaviors. Due to heterogeneous individual exogenous characteristics, one agent’s expectations about the behaviors of two other agents generally differ. With asymmetric private information, two agents may have different expectations about the behavior of a third agent in the group.
Incorporating these two types of heterogeneity, especially the second one, we can analyze the influence of various forms of private information on social interactions, which extends the current literature. We relate our model to a simultaneous move game with incomplete information, in which agents’ behaviors can be viewed as best responses to their expectations on other agents’ behaviors. We adopt the concept of BNE. As a result, expectations vary with the private information used to make predictions. We find that in this case conditional expectations of agents’ behaviors in a group can be viewed as a vector-valued function of private information about group members. As a BNE corresponds to a consistent conditional expectation function, it is possible to use methods in functional analysis to investigate the existence and uniqueness of an equilibrium conditional expectation function. We also show that such a model can be estimated using the nested fixed point ML estimation. The essence of the equilibrium analysis is the use of contraction mapping. An equilibrium of the model corresponds to a fixed point of a contraction mapping in a complete function space. Contraction mapping has the advantage that it not only ensures existence and uniqueness but also supplies a solution algorithm. However, it may be demanding to ensure a contraction mapping in some cases. Inspecting our sufficient condition for equilibrium uniqueness, the structure of a network and its interactions play an important role. Utilizing the row-normalizing technique in the literature of spatial econometrics, for the linear model and probit model, the sufficient conditions do not impose restrictive parameter ranges on interactions. However, the bounds on interaction intensity can be more stringent for other models or without row-normalization. It is shown by Monte Carlo experiments that estimation under a misspecified information structure may result in a loss of efficiency or additional bias.
In an empirical investigation, econometricians may not know the true information structure underlying the data sets. However, the maximized log likelihood provides a valuable model selection criterion. Comparing Monte Carlo experiments of linear and binary choice models, we can see that the performance of NFXP maximum likelihood estimation for the binary choice model is not as good as that for the linear model, due to a loss of information. Hence, it will be interesting to examine the Tobit model in this framework, as it is of a nonlinear form connecting linear and binary choice structures.

Chapter 3: A Tobit Model with Social Interactions under Incomplete Information

3.1 Introduction

Truncated and/or censored outcomes are frequently observed in practice, when agents face binding constraints in their decisions, in particular, the nonnegativity constraint. As the most popular way to model this kind of limited dependent variable, the Tobit model has gained much attention for both empirical and theoretical reasons. For example, Kumar (2012) proposes an extension of nonparametric estimation methods for nonlinear budget-set models to censored dependent variables. Abrevaya and Shen (2014) consider estimation of censored panel-data models with individual-specific slope heterogeneity. In this paper, we are interested in modeling the censored outcomes associated with social interactions where an agent’s behavior may be affected by other group members. There are two types of models about social interactions. In one stream of the literature, agents who move simultaneously have complete information about all exogenous characteristics and shocks. As a result, each agent’s behavior is influenced by the actual behaviors of others in that social group. See Lee (2007) and Boucher et al. (2014) for example. Other models are built on incomplete information, so an agent’s actions are affected by her expectations on the behaviors of other agents in a social group.
For instance, assuming that all exogenous characteristics are public information and idiosyncratic shocks are privately known, Manski (1993) studies linear models of socially interacted continuous choices, and Brock and Durlauf (2001) and Lee et al. (2014) investigate binary choices for socially linked agents. In their recent work, Yang and Lee (2014) extend previous research to a general form of incomplete information. They find that by allowing incomplete information in not only unobserved idiosyncratic shocks but also exogenous individual characteristics, conditional expectations about agents’ behaviors can be functions of private information. Their discussions are mainly about continuous choices and binary choices. In this paper, we analyze social interactions of censored outcomes under a general form of incomplete information and study tax rate competition among local governments in North Carolina. The basis of our model is a simultaneous-move Bayesian game in which players’ actions are bounded from below by no action, so that a corner solution to the expected utility maximization problem is possible. We find a correspondence between a BNE and a profile of conditional expectation functions consistent with strategies. Similar to Yang and Lee (2014), we can embed those conditional expectation functions into a normed space and transform an equilibrium conditional expectation function into a fixed point of an operator. Additionally, a sufficient condition that ensures that the operator is a contraction mapping holds for the Tobit model over a reasonable range of the parameter of interest. That range corresponds to weak or moderate social interactions, but not strong ones. Strong social interactions might generate multiple (expectation) equilibria, some stable and some unstable, as shown for binary choice models by Brock and Durlauf (2001).
Focusing on the case of a unique equilibrium, we discuss computation, identification, and estimation of the model. We explore the information which is unique to the Tobit model to help identify the variance of the idiosyncratic shocks. Different from models with continuous and binary choices, where either continuous or discrete outcomes are observed, for the Tobit model with censored outcomes two types of information are available from data: whether an outcome is censored and, when it is not, its realized value. With both types of information, in addition to coefficients of explanatory variables, it is possible to identify the variance of idiosyncratic shocks, which is not possible for binary choice models. By investigating interacted behaviors under a general form of incomplete information, in addition to models with networks and social interactions, our model is also related to the literature on estimation of games, such as the work of Bajari et al. (2010a) and Aradillas-Lopez (2010). Bajari et al. (2010a) focus on estimation in a framework where all variables are public information except that each agent only observes the realization of her own shocks. Incomplete information about both exogenous characteristics and idiosyncratic shocks is discussed by Aradillas-Lopez (2010). However, only a two-agent binary choice game is considered and only equilibria based on public information are discussed. Therefore, heterogeneity in social relations and in private information is missing there. For estimation, similar to Rust (1987), we nest the fixed point iteration in the maximum likelihood estimation to derive parameter estimates. That nested fixed point algorithm works well in Monte Carlo experiments. For a sample with a large number of groups, we also consider group common features (unobserved by econometricians) as random effects to investigate unobserved group heterogeneity.
The incorporation of group common features is important in social interactions in order to capture correlation effects as in Manski (1993) and Moffitt (2000). Since group common features are observed by group members, given their realizations, equilibrium conditional expectations can be solved in a similar way. However, econometricians need to integrate over those random effects in estimation. Utilizing a stochastic simulation estimation method, we can approximate that integral by simulation and estimate parameters using the nested fixed point maximum likelihood method. As an empirical application, we study the property tax rates for contiguous municipalities in North Carolina. Tax competition among local governments has been theoretically and empirically studied (see Brueckner (2003) for a comprehensive review). Most research considers the tax rate as a continuous variable. However, it is more appropriate to adopt the Tobit model, as tax rates are non-negative and local governments’ choices are subject to this non-negativity constraint. More recently, Porto and Revelli (2013) evaluate three empirical approaches to the analysis of spatially dependent limited tax policies. Their Tobit type models are based on interactions with latent variables and/or spatial time lags, which are different from ours. We model the property tax rate as an outcome of the Bayesian Nash Equilibrium of a static simultaneous-move game with incomplete information and estimate the corresponding parameters using the nested fixed point maximum likelihood method proposed in this paper. For the sample of municipal property tax rates, we find strong competition among near-by municipalities in North Carolina. The paper proceeds as follows. In Section 3.2, we build the model and explain it as a reduced-form equilibrium in a simultaneous-move game under incomplete information with censored outcomes.
In Section 3.3, we provide sufficient conditions for the existence of a unique equilibrium and discuss its computation under different forms of incomplete information. In the subsequent two sections, Section 3.4 and Section 3.5, we discuss identification and estimation of model parameters. Following an extension to allow group unobservables in Section 3.6, we investigate the finite sample performance of the nested fixed point maximum likelihood estimation by Monte Carlo experiments in Section 3.7 and study the tax competition among municipalities in North Carolina in Section 3.8. Section 3.9 concludes.

3.2 The Model

3.2.1 The Model Framework

Consider a group of $n$ agents, $i = 1,\cdots,n$, who are socially linked. Their social relations are represented by an $n \times n$ weighting matrix, $W_n$, such that for all $i \neq j$, its $(i,j)$ entry, $W_{n,ij} > 0$, if $i$ connects with $j$; and $W_{n,ij} = 0$ otherwise. The diagonal elements are $W_{n,ii} = 0$ for all $i = 1,\cdots,n$. $W_n$ may be either symmetric or asymmetric. For example, $W_n$ may represent a friendship network. Then $W_{n,ij} = 1$ if $i$ views $j$ as one of her friends; and $W_{n,ij} = 0$ otherwise. When friendship is mutual, the network is undirected and $W_n$ is symmetric. However, if friendship is not mutual, it is possible that $i$ regards $j$ as one of her friends, but $j$ does not think $i$ is among her good friends. Then $W_{n,ij} \neq W_{n,ji}$ and $W_n$ is asymmetric. In spatial econometrics, an example is the relative strength of spatial interactions among counties. In that case, $W_{n,ij}$ may be the reciprocal of the geographic distance between two different counties, $i \neq j$. Formulated in this way, $W_n$ is symmetric. However, once it is row-normalized, $W_n$ would in general be asymmetric. We model the social interactions of agents’ outcomes subject to the nonnegativity constraint, $y_i$’s, under incomplete information:
\[ y_i = \max\Big\{\beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda\sum_{j\neq i} W_{n,ij}\,E[y_j \mid X^p_{J_i}, Z] - \epsilon_i,\ 0\Big\}. \tag{3.2.1} \]
This model distinguishes our work from the classical Tobit model by incorporating social interactions. From (3.2.1), we can see that $j$’s behavior can affect $i$ only if $i$ links to $j$, i.e., $W_{n,ij} \neq 0$. Moreover, that impact is through $i$’s expectation on $j$ based on her information about various personal characteristics represented by $X$. That is suitable for the case when agents take actions simultaneously while some information is unknown. Therefore, similar to Yang and Lee (2014), we classify observable exogenous characteristics into three categories: the group features, $X^g$; commonly known individual characteristics, $X^c = (X_1^{c\prime},\cdots,X_n^{c\prime})'$; and some personal traits which may be privately known, $X^p = (X_1^{p\prime},\cdots,X_n^{p\prime})'$. To describe information structures in a general form, for any $i = 1,\cdots,n$, we use an $n \times 1$ vector $J_i$ to represent her private information about $X^p$. That is, $J_i(j) = 1$ if $X_j^p$ is known by $i$; and $J_i(j) = 0$ otherwise. $X^p_{J_i}$ then represents the random vector composed of those $X_j^p$’s that $i$ knows. In (3.2.1), for simplicity, $Z$ collects all public information, including the group features, $X^g$, commonly known individual traits, $X^c$, the social relations, $W_n$, as well as the information structure, $J_1,\cdots,J_n$. We assume that the idiosyncratic shocks $\epsilon_i$’s are i.i.d. with the pdf, $f(\cdot)$, and the corresponding CDF, $F(\cdot)$. These idiosyncratic shocks are also independent of all exogenous characteristics and network connections.

3.2.2 Game Theoretical Foundation

Our model is related to a simultaneous-move game under incomplete information where the values of actions are bounded below by zero to satisfy the nonnegativity constraint. For example, for investment in community services, although a resident may prefer to disinvest, a negative amount of investment is not possible. Similarly, for competitive stock brokers, portfolio choices will also be restricted if short-sales are not allowed.
We use the $n \times n$ matrix $W_n$ to represent social relations among a group of $n$ players. Denote the action taken by agent $i$ by $a_i$. Assume that $a_i \geq 0$. Her payoff is determined by the following equation:18
\[ r(a_i, a_{-i}, X^g, X_i^c, X_i^p) = \alpha - \gamma\Big(a_i - \beta_0 - X_i^{c\prime}\beta_1 - X_i^{p\prime}\beta_2 - X^{g\prime}\beta_3 - \lambda\sum_{j\neq i} W_{n,ij}\,a_j + \epsilon_i\Big)^2. \tag{3.2.2} \]
As before, $X^g$ denotes group features, $X_i^c$ refers to publicly known individual characteristics, and $X_i^p$ stands for privately known personal traits. The idiosyncratic shock, $\epsilon_i$, is revealed only to $i$. The shocks are i.i.d. and are independent of observable exogenous variables, i.e., $X$ and $W$. The private information structure about $X^p$ is also represented by vectors, $J_1,\cdots,J_n$. Following Harsanyi (1967a; 1967b), we interpret incomplete information by unknown “types”. Let $S_i$ represent the support of $X_i^p$ and $T$ denote the common support of the $\epsilon_i$’s. The set of states, $\prod_{i=1}^{n} S_i \times T^n$, is defined as the set of possible values of the $X_i^p$’s and $\epsilon_i$’s for all players. In this case, player $i$’s “type” is her private information about exogenous characteristics and shocks, $X^p_{J_i}$ and $\epsilon_i$. Hence, her type set is the corresponding support, $R_i = \prod_{k:J_i(k)=1} S_k \times T$. The signal function is a mapping from the states to her type, $\tau_i : \prod_{i=1}^{n} S_i \times T^n \to \prod_{k:J_i(k)=1} S_k \times T$. Her prior belief on the set of states is the joint distribution of the $X_i^p$’s, and the distribution for shocks. The prior belief is the same across all players. A strategy is a plan specifying an action for each possible realization of types. That is, $s_i : R_i \to A_i$, where $A_i$ is $i$’s set of actions. Because players move at the same time, they do not know which actions others will take when they are making their own decisions. They can only form expectations based on their private information. The expected payoff by taking action $a_i$ is as follows:
The expected payoff by taking action ai is as follows: E[r(ai , a−i , X g , Xic , Xip )|XJpi , Z, i ] 0 0 0 =α − γ(ai − β0 − Xic β1 − Xip β2 − X g β3 − λ X Wn,ij E[aj |XJpi , Z] + i )2 j6=i (3.2.3) X X + γλ2 ( Wn,ij E[aj |XJpi , Z])2 − E[( Wn,ij aj )2 |XJpi , Z] , j6=i j6=i 18 There are some other specifications which can imply linear equations for socially interacted outcomes, see Durlauf(2000), Ballester, Calvó-Armengol and Zenou(2010), and Tao and Lee(2014). 59 where the expectation on aj does not depends on i , because i ’s are independent of each other and also independent of X and Wn . Suppose that γ > 0, if there is no restrictions on ai , the best response is to choose the ideal value, 0 0 0 a∗i = β0 + X c β1 + Xip β2 + X g β3 + λ X Wn,ij E[aj |XJpi , Z] − i . j6=i However, with the constraint, ai ≥ 0, it is possible to have corner solutions. The optimal action is ai = max {a∗i , 0}. Thus, (s1 (XJp1 , 1 ), · · · , sn (XJpn , n )) is a Bayesian Nash Equilibrium (BNE) if X p p0 p p c0 g0 si (XJi , i ) = max β0 + X β1 + Xi β2 + X β3 + λ Wn,ij E[sj (XJi , i )|XJi , Z] − i , 0 . j6=i (3.2.4) Therefore, yi ’s in our model, (3.2.1), can be viewed as the realizations of actions in a BNE. 3.3 3.3.1 Equilibrium Analysis Equilibrium and Expectations Similar to Yang and Lee(2014), we employ the conditional expectations on agents’ behaviors as tools to analyze the model. From (3.2.1), fix Z = z, for any k 6= i, we have that 0 0 0 E[yi |XJpk , z] = E[H(β0 + X c β1 + Xip β2 + X g β3 + λ X Wn,ij E[yj |XJpi , Z])|XJpk , z], (3.3.1) j6=i where H(·) is a real-valued function such that for any x ∈ <1 , Z +∞ Z H(x) = I(x > c)(x − c)f (c)dc = xF (x) − −∞ cf (c)dc. (3.3.2) c<x 0 We can see that for any two different agents, k 6= k , their predictions on the behavior of a 0 third agent, i, with i 6= k and i 6= k , will be the same, if they have the same private information about X p , i.e., Jk = Jk0 . 
Given an information structure, $J_k$, $X^p_{J_k}$ is a random vector, composed of the $X_j^p$’s such that $J_k(j) = 1$. Denote the underlying probability space for $X^p$ by $(\Omega, \mathcal{F}, P)$; each $X_j^p$ is a mapping from the sample space to $(\mathbb{R}^{k_p}, \mathcal{B}^{k_p})$.19 So $X^p_{J_k}$ is a function from the sample space to $(\mathbb{R}^{M_k}, \mathcal{B}^{M_k})$, where $M_k = \big(\sum_{j=1}^{n} J_k(j)\big)k_p$. Therefore, as a function of $X^p_{J_k}$, given public information $Z = z$, for any realization $\omega$, $E[y_i \mid X^p_{J_k}, z]$ will return a real number, $E[y_i \mid X^p_{J_k} = X^p_{J_k}(\omega), z]$. That is to say, fixing an information structure, $J_k$, the conditional expectation is a random variable. Therefore, summarizing all possible information structures that are relevant to predict $i$’s behaviors, we can describe conditional expectations on $i$ as a mapping which returns a random variable for each relevant vector of privately known characteristics, $\psi_i : \mathcal{A}_i \to \mathcal{C}$, where $\mathcal{A}_i = \{X^p_{J_k} : W_{n,ki} \neq 0\}$ and $\mathcal{C}$ is the set of all random variables on $(\Omega, \mathcal{F}, P)$. Collecting expectations for all group members, we get a vector of those functions, $\psi : \prod_{i=1}^{n} \mathcal{A}_i \to \mathcal{C}^n$, such that $\psi(A_1,\cdots,A_n) = (\psi_1(A_1),\cdots,\psi_n(A_n))'$, for any $(A_1,\cdots,A_n)' \in \prod_{i=1}^{n} \mathcal{A}_i$. We call it the “conditional expectation function”.20 In a BNE, agents’ predictions are consistent with others’ strategies. As a result, (3.3.1) implies the following “consistency condition”:
\[ \psi_i(A_i) = E\Big[H\Big(\beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda\sum_{j\neq i} W_{n,ij}\,\psi_j(X^p_{J_i})\Big) \,\Big|\, A_i, z\Big], \tag{3.3.3} \]
for all $i = 1,\cdots,n$ and $A = (A_1,\cdots,A_n)' \in \prod_{i=1}^{n} \mathcal{A}_i$. If $\psi$ satisfies (3.3.3), we say that it is an equilibrium conditional expectation function and denote it as $\psi^e$. Now we relate the conditional expectation function to the game on which our Tobit model is based.
Proposition 3.3.1 Conditional on public information, $Z = z$, if $(s_1(X^p_{J_1}, \epsilon_1), \cdots, s_n(X^p_{J_n}, \epsilon_n))$ is a BNE of the Bayesian game in Section 3.2.2, i.e., it satisfies (3.2.4), then there is a conditional expectation function, $\psi: \prod_{i=1}^n \mathcal{A}_i \to \mathcal{C}^n$, such that (3.3.3) holds. Conversely, if $\psi: \prod_{i=1}^n \mathcal{A}_i \to \mathcal{C}^n$ satisfies (3.3.3), there is a BNE, $(s_1(X^p_{J_1}, \epsilon_1), \cdots, s_n(X^p_{J_n}, \epsilon_n))$.

Proof. See Appendix B.1.

[19] $k_p$ is the dimension of $X_i^p$.

[20] If $W_{n,ki} = 0$ for all $k$, the expected value of $i$'s behavior does not affect the behavior of other agents. Then expectations about $y_i$ are redundant in determining the equilibrium. To facilitate the general discussion, we define the conditional expectation function for every agent, keeping in mind that only agents who are connected with others are relevant in the system.

Proposition 3.3.1 indicates a correspondence between a BNE and an equilibrium conditional expectation function, which characterizes the equilibrium outcome distribution. In this paper, investigations of estimation methods are based on the analysis of existence and uniqueness of the equilibrium conditional expectation function.

3.3.2 Unique Equilibrium

Following Yang and Lee (2014), we can view each conditional expectation function as a point in a function space and transform an equilibrium conditional expectation function into a fixed point of an operator in that space. In particular, let $\Xi$ be a set of functions such that any $\xi \in \Xi$ maps a profile of random vectors $A = (A_1, \cdots, A_n)' \in \prod_{i=1}^n \mathcal{A}_i$ to a random vector in $\mathcal{C}^n$ with two properties,

\[
\xi(A) = (\xi_1(A_1), \cdots, \xi_n(A_n))', \qquad
\max_{1 \leq i \leq n} \; \max_{\{j : W_{n,ji} \neq 0\}} \int |\xi_i(x^p_{J_j})| \, dF_p(x^p) < \infty,
\tag{3.3.4}
\]

where $F_p$ is a simplified notation for the conditional distribution of $X^p$ given public information $Z = z$. In Yang and Lee (2014), it is shown that the function $\|\cdot\| : \Xi \to \mathbb{R}_+$ with

\[
\|\xi\| = \max_{1 \leq i \leq n} \; \max_{\{j : W_{n,ji} \neq 0\}} \int |\xi_i(x^p_{J_j})| \, dF_p(x^p),
\]

for any $\xi \in \Xi$, is a well-defined norm and $(\Xi, \|\cdot\|)$ is a complete metric space.
Based on our model for censored behaviors, define an operator $T$ on $\Xi$ such that for any $\xi \in \Xi$ and $A = (A_1, \cdots, A_n)' \in \prod_{i=1}^n \mathcal{A}_i$,

\[
T(\xi)(A)_i = T(\xi)_i(A_i) = E\Big[ H\Big( \beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \xi_j(X^p_{J_i}) \Big) \,\Big|\, A_i, z \Big],
\tag{3.3.5}
\]

where $H(\cdot)$ is given in (3.3.2). To discuss equilibrium conditional expectations in $\Xi$, we need the image of $T$ to always be in $\Xi$. To ensure that property, we make the following assumption on the distribution of the idiosyncratic shocks.

Assumption 3.3.1 $E[\epsilon_i] < \infty$ for any $i = 1, \cdots, n$.

For any $i = 1, \cdots, n$, denote

\[
g_i(X^p_{J_i}) = \beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \xi_j(X^p_{J_i}).
\]

By (3.3.4), $\int |\xi_j(X^p_{J_i})| \, dF_p < \infty$ uniformly for all $j \neq i$, given public information $Z = z$. $g_i(X^p_{J_i})$ is a sum of a constant and several integrable functions. Thus, $\int |g_i(X^p_{J_i})| \, dF_p < \infty$ uniformly for all $i$. For any $k \neq i$ with $W_{n,ki} \neq 0$, we have that

\[
\begin{aligned}
\int |T(\xi)_i(X^p_{J_k})| \, dF_p
&= \int \Big| g_i(X^p_{J_i}) F_\epsilon(g_i(X^p_{J_i})) - \int_{c < g_i(X^p_{J_i})} c f_\epsilon(c) \, dc \Big| \, dF_p \\
&\leq \int |g_i(X^p_{J_i})| \, dF_p + \int \Big| \int_{c < g_i(X^p_{J_i})} c f_\epsilon(c) \, dc \Big| \, dF_p \\
&\leq \int |g_i(X^p_{J_i})| \, dF_p + E[|\epsilon|],
\end{aligned}
\]

where the second inequality follows from Assumption 3.3.1.[21] Therefore, $\max_i \max_{k \neq i} \int |T(\xi)_i(X^p_{J_k})| \, dF_p < \infty$. Thus, whenever $\xi \in \Xi$, $T(\xi) \in \Xi$. We can see that if $\xi \in \Xi$ is a fixed point of $T$, it is an equilibrium conditional expectation function. Additionally, any equilibrium conditional expectation function that is integrable with respect to the conditional distribution of $X^p$ given public information $Z$ must be a fixed point of $T$. That is, focusing on integrable functions, there is a one-to-one correspondence between fixed points of $T$ and equilibrium conditional expectation functions.

[21] $E[\epsilon_i] < \infty$ if and only if $E[|\epsilon_i|] < \infty$.

Proposition 3.3.2 $T: \Xi \to \Xi$ is a contraction mapping if

\[
|\lambda| \, \|W_n\|_\infty < 1,
\tag{3.3.6}
\]

where $\|W_n\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n W_{n,ij}$. Then there is one and only one BNE.

Proof. See Appendix B.1.

If $W_n$ is row-normalized, $\|W_n\|_\infty = 1$.
Hence, a sufficient condition for the existence of a unique equilibrium is $|\lambda| < 1$. Namely, the magnitude of influence from socially associated agents is less than 1 (in absolute value). This assumption is conventional for linear spatial autoregressive models in the literature on spatial econometrics and social interactions. With (3.3.6) satisfied, we investigate solution, identification, and estimation for the model.

3.3.3 Equilibrium Computation

Owing to the properties of a contraction mapping in a complete metric space, when (3.3.6) holds, there is an algorithm to solve for the unique equilibrium. To be specific, since $T$ is a contraction mapping under (3.3.6), $\lim_{l \to \infty} T^l(\xi^0) = \psi^e$ for any $\xi^0 \in \Xi$, in the norm of $(\Xi, \|\cdot\|)$. Thus, beginning with any initial guess and iterating the operator $T$, we can approximate the equilibrium conditional expectation function, $\psi^e$. Then, by applying Proposition 3.3.1, we can derive the corresponding BNE. However, in general, since $\psi^e: \prod_{i=1}^n \mathcal{A}_i \to \mathcal{C}^n$ is a function of random vectors, solving for it means deriving its realizations for every point of the underlying sample space, which is by no means a trivial task.

Nonetheless, $\psi^e$ changes with elements of the sample space only indirectly, through the random vector $X^p$ and a predetermined information structure $J$. Therefore, if the $X_i^p$'s are discrete random vectors with a finite support, it suffices to characterize $\psi^e$ by its values on those points, which makes it possible to represent $\psi^e$ by a vector of finite dimension. That representation is not applicable when the $X_i^p$'s vary continuously on a continuum support. In that case, however, by using quadrature or stochastic integrals to approximate the integrals generated by expectations, we can still approximate every possible realization of $\psi^e$ by a finite-dimensional vector. To elucidate the idea, we begin with the simplest case and then move on to more complicated ones.
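As a minimal sketch of the iteration $T^l(\xi^0)$, consider the simplest setting, in which the conditional expectation function reduces to an $n \times 1$ vector, and assume in addition that the shocks are normal with mean zero and standard deviation $\sigma$, so that $H(x) = x\Phi(x/\sigma) + \sigma\phi(x/\sigma)$. The names `H`, `solve_expectations`, and `index` (the vector with elements $\beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + X^{g\prime}\beta_3$) are hypothetical and introduced only for illustration.

```python
import math
import numpy as np

# Vectorized error function built from the standard library (otypes allows empty inputs).
_erf = np.vectorize(math.erf, otypes=[float])

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)

def H(x, sigma=1.0):
    """H(x) = E[max(x - eps, 0)] under the ASSUMPTION eps ~ N(0, sigma^2)."""
    x = np.asarray(x, dtype=float)
    return x * norm_cdf(x / sigma) + sigma * norm_pdf(x / sigma)

def solve_expectations(index, W, lam, sigma=1.0, psi0=None,
                       tol=1e-12, max_iter=10_000):
    """Iterate psi <- H(index + lam * W psi) until the sup-norm change is below tol.

    `index` is the (hypothetical) vector of non-interaction terms; W is the
    network matrix. Convergence is expected when |lam| * ||W||_inf < 1.
    """
    psi = np.zeros(len(index)) if psi0 is None else np.asarray(psi0, dtype=float)
    for _ in range(max_iter):
        psi_new = H(index + lam * (W @ psi), sigma)
        if np.max(np.abs(psi_new - psi)) < tol:
            return psi_new
        psi = psi_new
    raise RuntimeError("fixed-point iteration did not converge")
```

Because $H'(x) = \Phi(x/\sigma) \in (0, 1)$ under the normality assumption, each iteration shrinks the sup-norm distance between two candidate vectors by at least the factor $|\lambda| \|W_n\|_\infty$, so the starting value should not affect the limit.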
Publicly Known Characteristics

When all exogenous characteristics, $X^g$, $X^c$, $X^p$, are public information in a group, there is no uncertainty other than the idiosyncratic shocks, which are i.i.d. and independent of all exogenous characteristics. Given $Z = z$, $\psi^e$ reduces to an $n \times 1$ vector satisfying

\[
\psi_i^e = H\Big( \beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j^e \Big),
\tag{3.3.7}
\]

for all $i = 1, \cdots, n$. Although we cannot get an analytical solution, due to the nonlinearity of the function $H(\cdot)$, we can solve for $\psi^e$ numerically by contraction mapping iteration at each given vector of parameters.

Self-Known Characteristics

If $X_i^p$ is revealed only to $i$ (and the econometricians), we call the information structure "self-known characteristics", in which case $J_i(k) = 0$ for all $k \neq i$. Then any two different agents do not share private information. For $i \neq k$, we have that

\[
\psi_i^e(X_k^p) = E\Big[ H\Big( \beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j^e(X_i^p) \Big) \,\Big|\, X_k^p, z \Big].
\tag{3.3.8}
\]

Inspecting (3.3.8), if all $X_i^p$'s are independent of each other conditional on $Z = z$, the realization of $X_k^p$ does not provide new information on $X_i^p$ given $Z = z$. That is to say,

\[
\psi_i^e(X_k^p) = E\Big[ H\Big( \beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j^e(X_i^p) \Big) \,\Big|\, z \Big],
\]

for any $i \neq k$. Since the right-hand side does not depend on the random vector $X_k^p$, we can view expectations of $i$'s behavior as a constant. Therefore, with independent self-known characteristics, the equilibrium conditional expectation is still a vector, such that

\[
\psi_i^e = E\Big[ H\Big( \beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j^e \Big) \,\Big|\, z \Big].
\tag{3.3.9}
\]

Comparing (3.3.7) and (3.3.9), we can see that in both cases any two agents $k \neq k'$ have the same expectation of the behavior of a third person, $i$. When all exogenous characteristics are public information, $k$ and $k'$ just integrate over unobserved idiosyncratic shocks in (3.3.7).
In contrast, when they know just their own realizations of $X^p$, they have to integrate over $X_i^p$ to predict $i$'s behavior in (3.3.9), for $X_i^p$ is not included in $Z$.

If the $X_i^p$'s are correlated, however, conditional expectations depend on the private information used to make predictions. Scrutinizing (3.3.8), if there are two agents $k$ and $k'$ with $W_{n,ki} \neq 0$ and $W_{n,k'i} \neq 0$, their private information influences predictions of $i$'s behavior through the conditional distributions $F_p(X_i^p \mid X_k^p, z)$ and $F_p(X_i^p \mid X_{k'}^p, z)$. Therefore, when the two conditional distributions differ, $k$ and $k'$ will form different conditional expectations. However, if we can provide conditions ensuring that the two conditional distributions are the same, those two agents' predictions of $i$'s behavior will be identical once they observe the same realizations. One sufficient condition is "exchangeability", cited below from Yang and Lee (2014):

Assumption 3.3.2 Conditional on public information, $Z = z$, the $X_i^p$'s have the same support, $S_p$. Their conditional joint distribution, $f^p(X_1^p, \cdots, X_n^p \mid Z = z)$,[22] is exchangeable, i.e., for any permutation, $s: \{1, \cdots, n\} \to \{1, \cdots, n\}$,

\[
f^p(X_1^p, \cdots, X_n^p \mid Z = z) = f^p(X_{s(1)}^p, \cdots, X_{s(n)}^p \mid Z = z).
\]

Under "exchangeability", if $X_k^p = X_{k'}^p = x$,

\[
\begin{aligned}
\psi_i^e(X_k^p = x)
&= E\Big[ H\Big( \beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j^e(X_i^p) \Big) \,\Big|\, X_k^p = x, z \Big] \\
&= E\Big[ H\Big( \beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j^e(X_i^p) \Big) \,\Big|\, X_{k'}^p = x, z \Big] \\
&= \psi_i^e(X_{k'}^p = x),
\end{aligned}
\tag{3.3.10}
\]

for any $k, k'$ with $W_{n,ki} \neq 0$ and $W_{n,k'i} \neq 0$. Therefore, we can directly define $\psi_i^e$ on the common support of the $X_i^p$'s, $S_p$, and characterize $\psi^e$ by

\[
\psi_i^e(x) = E\Big[ H\Big( \beta_0 + X_i^{c\prime}\beta_1 + y'\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j^e(y) \Big) \,\Big|\, x, z \Big],
\tag{3.3.11}
\]

for all $i = 1, \cdots, n$ and $x \in S_p$, where the expectation is taken over $y$, the value of $X_i^p$, conditional on a connected agent's own characteristic being $x$. Now we consider two classes of joint distributions that satisfy Assumption 3.3.2.

1. (Discrete $X^p$) Suppose that $X_i^p$ can only take one of $m$ values in $\{ x^l : 1 \leq l \leq m \}$, where each $x^l$ is a vector with specific values.
Given public information $Z = z$, the conditional probability function, $f_p(y \mid x, z)$, is fully captured by the transition matrix,

\[
P = \begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1m} \\
\vdots & \vdots & \ddots & \vdots \\
p_{m1} & p_{m2} & \cdots & p_{mm}
\end{pmatrix},
\]

where $p_{ll'} = \mathrm{prob}(X_i^p = x^{l'} \mid X_k^p = x^l, Z = z)$, for $k \neq i$. We can represent $\psi^e$ by an $(nm) \times 1$ vector,

\[
\psi^e = (\psi_1^e(x^1), \cdots, \psi_1^e(x^m), \cdots, \psi_n^e(x^1), \cdots, \psi_n^e(x^m))',
\]

and characterize it by the following system of nonlinear equations:

\[
\psi_i^e(x^l) = \sum_{\tilde{l}=1}^m p_{l\tilde{l}} \, H\Big( \beta_0 + X_i^{c\prime}\beta_1 + x^{\tilde{l}\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j^e(x^{\tilde{l}}) \Big),
\tag{3.3.12}
\]

for $i = 1, \cdots, n$ and $l = 1, \cdots, m$. Beginning with any $(nm) \times 1$ vector and iterating the contraction mapping, we can derive $\psi^e$.

[22] When the $X_i^p$'s are discrete random variables, $f^p(\cdot \mid Z = z)$ is the probability mass function. If the $X_i^p$'s are continuous, $f^p(\cdot \mid Z = z)$ represents the density.

2. (Continuous $X^p$) Consider the case when $X_1^p, \cdots, X_n^p$ are jointly normal with mean $(\mu', \cdots, \mu')'$ and variance-covariance matrix

\[
\begin{pmatrix}
\Sigma_1 & \Sigma_2 & \cdots & \Sigma_2 \\
\Sigma_2' & \Sigma_1 & \cdots & \Sigma_2 \\
\vdots & \vdots & \ddots & \vdots \\
\Sigma_2' & \Sigma_2' & \cdots & \Sigma_1
\end{pmatrix}.
\]

Then for any $i \neq j$, conditional on $X_j^p = X$, $X_i^p$ is normal with mean $\mu + \Sigma_2 \Sigma_1^{-1}(X - \mu)$ and variance $\Sigma_1 - \Sigma_2 \Sigma_1^{-1} \Sigma_2'$. In this case, each $X_i^p$ can take any value in $\mathbb{R}^{k_p}$. Thus, it is impossible to represent $\psi^e$ by a finite-dimensional vector. However, scrutinizing (3.3.11), we find that for any $i$ and $x \in \mathbb{R}^{k_p}$, $\psi_i^e(x)$ is determined by an integral, which can be approximated by the values of the function at a fixed number of points (quadrature points) using the quadrature method. To illustrate the idea, let us consider the special case in which each $X_i^p$ is a single random variable, i.e., its dimension is $k_p = 1$. In this case, denote $\Sigma_1 = \eta^2$ and $\Sigma_2 = \rho\eta^2$. We first transform integration over $\mathbb{R}^1$ into integration over a finite interval, $[-1, 1]$, and then apply the Gauss-Legendre quadrature.
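\[
\begin{aligned}
\psi_i^e(x)
&= \int_{-\infty}^{+\infty} H\Big( \beta_0 + X_i^{c\prime}\beta_1 + \tilde{x}\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j^e(\tilde{x}) \Big) \frac{1}{\sqrt{2\pi(1-\rho^2)\eta^2}} \exp\Big( -\frac{(\tilde{x} - \rho x - (1-\rho)\mu)^2}{2(1-\rho^2)\eta^2} \Big) \, d\tilde{x} \\
&= \int_0^1 H\Big( \beta_0 + X_i^{c\prime}\beta_1 + \log\big(\tfrac{z}{1-z}\big)\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j^e\big(\log\big(\tfrac{z}{1-z}\big)\big) \Big) \\
&\qquad \cdot \frac{1}{\sqrt{2\pi(1-\rho^2)\eta^2}} \exp\Big( -\frac{(\log(\frac{z}{1-z}) - \rho x - (1-\rho)\mu)^2}{2(1-\rho^2)\eta^2} \Big) \frac{1}{z(1-z)} \, dz \\
&= \sqrt{\frac{2}{\pi(1-\rho^2)\eta^2}} \int_{-1}^{1} H\Big( \beta_0 + X_i^{c\prime}\beta_1 + \log\big(\tfrac{\tilde{z}+1}{1-\tilde{z}}\big)\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j^e\big(\log\big(\tfrac{\tilde{z}+1}{1-\tilde{z}}\big)\big) \Big) \\
&\qquad \cdot \exp\Big( -\frac{(\log(\frac{\tilde{z}+1}{1-\tilde{z}}) - \rho x - (1-\rho)\mu)^2}{2(1-\rho^2)\eta^2} \Big) \frac{1}{(\tilde{z}+1)(1-\tilde{z})} \, d\tilde{z} \\
&\approx \sqrt{\frac{2}{\pi(1-\rho^2)\eta^2}} \sum_{k=1}^K \omega_k H\Big( \beta_0 + X_i^{c\prime}\beta_1 + \log\big(\tfrac{\tilde{z}_k+1}{1-\tilde{z}_k}\big)\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j^e\big(\log\big(\tfrac{\tilde{z}_k+1}{1-\tilde{z}_k}\big)\big) \Big) \\
&\qquad \cdot \exp\Big( -\frac{(\log(\frac{\tilde{z}_k+1}{1-\tilde{z}_k}) - \rho x - (1-\rho)\mu)^2}{2(1-\rho^2)\eta^2} \Big) \frac{1}{(\tilde{z}_k+1)(1-\tilde{z}_k)} .
\end{aligned}
\tag{3.3.13}
\]

In (3.3.13), the second equality is derived by a change of integration variable, $\tilde{x} = \log(\frac{z}{1-z})$, and the third equality comes from the transformation $z = \frac{\tilde{z}+1}{2}$. Finally, the approximation is based on standard Gauss-Legendre quadrature, where the $\omega_k$'s are the weights, the $\tilde{z}_k$'s are the corresponding abscissae, and $K$ is the number of abscissae. Defining accordingly $x_k^p = \log(\frac{\tilde{z}_k+1}{1-\tilde{z}_k})$ for $k = 1, \cdots, K$, we get $nK$ equalities,

\[
\psi_i^e(x_{k'}^p) = \sqrt{\frac{2}{\pi(1-\rho^2)\eta^2}} \sum_{k=1}^K \omega_k H\Big( \beta_0 + X_i^{c\prime}\beta_1 + x_k^p\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j^e(x_k^p) \Big) \exp\Big( -\frac{(x_k^p - \rho x_{k'}^p - (1-\rho)\mu)^2}{2(1-\rho^2)\eta^2} \Big) \frac{1}{(\tilde{z}_k+1)(1-\tilde{z}_k)},
\]

for all $i = 1, \cdots, n$ and $k' = 1, \cdots, K$. This is very similar to (3.3.12). Hence, we can solve for the $\psi_i^e(x_{k'}^p)$'s by contraction mapping iterations. After that, for any $x \in \mathbb{R}^1$, we can approximate $\psi_i^e(x)$ by (3.3.13). Owing to the fast convergence of the Gauss-Legendre quadrature, we only need to take a small number of abscissae. In our Monte Carlo experiments, good estimation performance is achieved with $K = 8$.

When the $X_i^p$'s are of multiple dimensions, multi-dimensional quadrature methods are available but quite computationally intensive.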
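For the one-dimensional case, the change of variables and the Gauss-Legendre step in (3.3.13) can be sketched as follows. This is only an illustration under stated assumptions: the function name `gl_conditional_expectation` and the generic integrand `g` (standing in for the $H(\cdot)$ term) are hypothetical, and the nodes and weights come from NumPy's `numpy.polynomial.legendre.leggauss`.

```python
import numpy as np

def gl_conditional_expectation(g, x, mu, eta, rho, K=8):
    """Approximate E[g(X_i) | X_k = x] when X_i | X_k = x is
    N(rho*x + (1-rho)*mu, (1-rho^2)*eta^2), using the logit-type change of
    variables of (3.3.13) and K Gauss-Legendre nodes on [-1, 1].

    `g` is a hypothetical vectorized function standing in for the H(.) term.
    """
    z, w = np.polynomial.legendre.leggauss(K)      # abscissae and weights on [-1, 1]
    x_nodes = np.log((z + 1.0) / (1.0 - z))        # x_k = log((z_k + 1) / (1 - z_k))
    var = (1.0 - rho**2) * eta**2                  # conditional variance
    mean = rho * x + (1.0 - rho) * mu              # conditional mean
    kernel = np.exp(-(x_nodes - mean) ** 2 / (2.0 * var)) / ((z + 1.0) * (1.0 - z))
    scale = np.sqrt(2.0 / (np.pi * var))           # Gaussian constant times the Jacobian factor 2
    return scale * np.sum(w * g(x_nodes) * kernel)
```

A simple sanity check is to pass `g = lambda t: np.ones_like(t)`, which should return a value close to one, and `g = lambda t: t`, which should return a value close to the conditional mean; the accuracy for a given `K` depends on the parameter values.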
Alternatively, we can use the stochastic integral approximation with importance sampling. Let $h(\cdot)$ be a density whose support contains the support of $X_i^p$ such that $f_p(x^p \mid x)/h(x^p)$ is well defined. Then we can generate $K$ random draws, $x_k^p$, from $h(\cdot)$. The stochastic approximation will be

\[
\psi_i^e(x) \approx \frac{1}{K} \sum_{k=1}^K H\Big( \beta_0 + X_i^{c\prime}\beta_1 + x_k^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j^e(x_k^p) \Big) \frac{f_p(x_k^p \mid x)}{h(x_k^p)} .
\]

Analogous to the previous discussion, we first solve for the $\psi^e(x_k^p)$'s by contraction mapping and then approximate the function $\psi_i^e(x)$ at any point $x$.

For general information structures, the unique equilibrium can be calculated in a similar way. When all $X_i^p$'s are discrete random vectors with a finite support, we can fully solve for $\psi^e$ directly via contraction mapping iteration. When the $X_i^p$'s are continuous random variables, we choose a finite number of points and approximate the integration in conditional expectations by a weighted sum. If continuous $X_i^p$'s are not exchangeable, the values simulated from the importance densities can be different for each agent. However, if the number of simulated points is the same, say $K$, for each $i$, the total number of equations to solve remains $nK$. Since the values of $\psi^e$ at that finite number of points can be solved for as a vector by contraction mapping iteration, the values of $\psi^e$ for all realizations can be approximated.

3.4 Identification

For identification, we aim at recovering the model primitives, $\beta$, $\lambda$, $F_x$ and $F_\epsilon$, from observations on $X$, $Y$, and $W_n$.

Definition 3.4.1 $(\beta, \lambda, F_x, F_\epsilon)$ is observationally equivalent to $(\tilde\beta, \tilde\lambda, \tilde{F}_x, \tilde{F}_\epsilon)$ for social connections $W_n$ at $\bar{W}_n$ and information structure $J$ at $\bar{J}$, if they generate the same distribution of the observables,

\[
F_{Y,X \mid \bar{W}_n, \bar{J}}(\cdot, \cdot \mid \beta, \lambda, F_x, F_\epsilon) = F_{Y,X \mid \bar{W}_n, \bar{J}}(\cdot, \cdot \mid \tilde\beta, \tilde\lambda, \tilde{F}_x, \tilde{F}_\epsilon).
\tag{3.4.1}
\]

If (3.4.1) holds for $X$ at $\bar{X}$, $(\beta, \lambda, F_x, F_\epsilon)$ is observationally equivalent to $(\tilde\beta, \tilde\lambda, \tilde{F}_x, \tilde{F}_\epsilon)$ at $\bar{W}_n$, $\bar{J}$ and $\bar{X}$.
The observational equivalence in terms of distribution, (3.4.1), implies, in particular, the same conditional expectations, i.e.,

\[
E[y_i \mid X^p_{J_k}, z, \beta, \lambda, F_x, F_\epsilon] = E[y_i \mid X^p_{J_k}, z, \tilde\beta, \tilde\lambda, \tilde{F}_x, \tilde{F}_\epsilon].
\tag{3.4.2}
\]

Definition 3.4.2 Suppose that $(\beta^*, \lambda^*, F_x^*, F_\epsilon^*)$ are the true parameters for $Y, X$ at $\bar{W}_n$ and $\bar{J}$. $(\beta^*, \lambda^*, F_x^*, F_\epsilon^*)$ is identified if no $(\tilde\beta, \tilde\lambda, \tilde{F}_x, \tilde{F}_\epsilon) \neq (\beta^*, \lambda^*, F_x^*, F_\epsilon^*)$ is observationally equivalent to $(\beta^*, \lambda^*, F_x^*, F_\epsilon^*)$ at $\bar{W}_n$ and $\bar{J}$.

Our discussion of identification is based on the following hypothesis:

Assumption 3.4.1 The distribution of exogenous characteristics, $F_x(\cdot)$, can be inferred from data about $X$.

So we can focus on $\beta$, $\lambda$, and $F_\epsilon$. Additionally, we parametrize the distribution of the $\epsilon_i$'s.

Assumption 3.4.2 The $\epsilon_i$'s are i.i.d. with full support, $\mathbb{R}^1$, according to a parametric pdf, $f_\epsilon(\cdot; \sigma)$, where the functional form $f_\epsilon(\cdot; \cdot)$ is known but the value of the parameter $\sigma$ is unknown. The corresponding CDF is $F_\epsilon(\cdot; \sigma)$, which is strictly increasing in its argument.

In the following Lemma 3.4.1, we show that $\sigma$ can be identified from the relationship between the mean outcomes, $E[y_i \mid X^p, z]$, and the average amount of censoring, captured by $E[I(y_i > 0) \mid X^p, z]$, where $X^p$ refers to the matrix of all privately known characteristics, $(X_1^{p\prime}, \cdots, X_n^{p\prime})'$.

Lemma 3.4.1 Given public information, $Z = z$, for all $i = 1, \cdots, n$,

\[
E[y_i \mid X^p, z] = E[I(y_i > 0) \mid X^p, z] \, F_\epsilon^{-1}(E[I(y_i > 0) \mid X^p, z]; \sigma) - \int_{c < F_\epsilon^{-1}(E[I(y_i > 0) \mid X^p, z]; \sigma)} c f_\epsilon(c; \sigma) \, dc .
\tag{3.4.3}
\]

Proof. See Appendix B.2.

Since (3.4.3) holds for an arbitrarily chosen agent, we can suppress the subscript and simply write

\[
E[y \mid X^p, z] = E[I(y > 0) \mid X^p, z] \, F_\epsilon^{-1}(E[I(y > 0) \mid X^p, z]; \sigma) - \int_{c < F_\epsilon^{-1}(E[I(y > 0) \mid X^p, z]; \sigma)} c f_\epsilon(c; \sigma) \, dc .
\tag{3.4.3$'$}
\]

In addition to the relation (3.4.3$'$), we impose the following two assumptions for the identification of $\sigma$.

Assumption 3.4.3 $f_\epsilon(c; \sigma)$ is differentiable with respect to $\sigma$. Moreover, $\lim_{c \to -\infty} c \, \frac{\partial F_\epsilon(c; \sigma)}{\partial \sigma} = 0$.
Assumption 3.4.4 The ratio $\frac{\partial F_\epsilon(c; \sigma) / \partial \sigma}{f_\epsilon(c; \sigma)}$ is strictly monotonic with respect to $c$.

Proposition 3.4.1 For any network, $W_n$, and information structure, $J$, if Assumptions 3.4.1 to 3.4.4 are satisfied, $\sigma$ can be identified from the moments $E[y \mid X^p, z]$ and $E[I(y > 0) \mid X^p, z]$.

Proof. See Appendix B.2.

The proof of Proposition 3.4.1 depends on the relationship (3.4.3$'$), which is valid for any information structure on $X$. In principle, with appropriate data, $E[y \mid X^p, z]$ and $E[I(y > 0) \mid X^p, z]$ can be identified nonparametrically from empirical observations. Then we can identify $\sigma$ for any information structure.

As an example of Assumptions 3.4.3 and 3.4.4, consider the case in which $\epsilon_i$ is normally distributed with zero mean and standard deviation $\sigma$. Then $F_\epsilon(c; \sigma) = \Phi(c/\sigma)$ and $f_\epsilon(c; \sigma) = \frac{1}{\sigma}\phi(\frac{c}{\sigma})$, where $\Phi(\cdot)$ and $\phi(\cdot)$ are respectively the CDF and pdf of the standard normal distribution. We have

\[
\lim_{c \to -\infty} c \, \frac{\partial F_\epsilon(c; \sigma)}{\partial \sigma} = \lim_{c \to -\infty} -\Big(\frac{c}{\sigma}\Big)^2 \phi\Big(\frac{c}{\sigma}\Big) = 0
\quad \text{and} \quad
\frac{\partial F_\epsilon(c; \sigma) / \partial \sigma}{f_\epsilon(c; \sigma)} = \frac{-\frac{c}{\sigma^2}\phi(\frac{c}{\sigma})}{\frac{1}{\sigma}\phi(\frac{c}{\sigma})} = -\frac{c}{\sigma},
\]

which is decreasing in $c$. Therefore, the sufficient conditions in Assumptions 3.4.3 and 3.4.4 are satisfied. In fact, identification of the standard deviation in the normal disturbance case can be seen more transparently through (3.4.3$'$). By calculation, we get

\[
E[y \mid X^p, z] = \sigma \Big[ \Phi^{-1}(E[I(y > 0) \mid X^p, z]) \, E[I(y > 0) \mid X^p, z] + \phi\big(\Phi^{-1}(E[I(y > 0) \mid X^p, z])\big) \Big],
\]

which implies that

\[
\sigma = \frac{E[y \mid X^p, z]}{\Phi^{-1}(E[I(y > 0) \mid X^p, z]) \, E[I(y > 0) \mid X^p, z] + \phi\big(\Phi^{-1}(E[I(y > 0) \mid X^p, z])\big)} .
\tag{3.4.4}
\]

Now we turn to identification of the other parameters. For a single group, any group characteristic, $X^g$, is absorbed by the constant term. So we focus on the identification of $\beta_0$, $\beta_1$, $\beta_2$ and $\lambda$. We impose two additional assumptions.

Assumption 3.4.5 Given an information structure, $J = \bar{J}$, $X^c = \bar{X}^c$, $X^p = \bar{X}^p$, and $W = \bar{W}_n$, $E[Y_i \mid J = \bar{J}, W = \bar{W}_n, X^c = \bar{X}^c, X^p_{J_k} = \bar{X}^p_{J_k}]$ can be identified (nonparametrically), for any $i, k = 1, \cdots, n$, $i \neq k$, and $W_{n,ki} \neq 0$.
Assumption 3.4.6 $(l_n, \bar{X}^c, \bar{X}^p, E^p)$ has full column rank, where $l_n$ is an $n \times 1$ vector of 1's and $E^p$ is an $n \times 1$ vector whose $i$-th component is $\sum_{j \neq i} \bar{W}_{n,ij} E[Y_j \mid J = \bar{J}, W = \bar{W}_n, X^c = \bar{X}^c, X^p_{J_i} = \bar{X}^p_{J_i}]$.

The full rank condition in Assumption 3.4.6 is essential to rule out multicollinearity of regressors, similar to conventional linear regressions.

Proposition 3.4.2 For a single group, for a social network matrix, $\bar{W}_n$, information structure, $\bar{J}$, and personal characteristics, $\bar{X}^c$ and $\bar{X}^p$, if Assumptions 3.4.1 to 3.4.6 hold, we can identify $\beta_0$, $\beta_1$, $\beta_2$, $\lambda$, and $\sigma$.

Proof. See Appendix B.2.

When there are multiple independent groups in the sample, we can identify $\beta_3$ from the variation in group features, $X^g$.

3.5 Estimation

Because all exogenous characteristics are observed in the data, from the econometricians' point of view, randomness in agents' behaviors comes from the idiosyncratic shocks, the $\epsilon_i$'s. As those shocks are independent across agents, the likelihoods for the behaviors of different agents are independent of each other. Since $y_i > 0$ implies that $\epsilon_i$ equals the ideal value net of $\epsilon_i$ minus $y_i$, and $y_i = 0$ occurs when $\epsilon_i$ is at least as large as that value, the sample log likelihood can be written as follows:

\[
\begin{aligned}
\log L(Y \mid X^c, X^p, X^g, W_n) = \sum_{i=1}^n \Big\{ & I(y_i > 0) \log f_\epsilon\Big( \beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, z] - y_i; \sigma \Big) \\
& + I(y_i = 0) \log\Big[ 1 - F_\epsilon\Big( \beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, z]; \sigma \Big) \Big] \Big\}.
\end{aligned}
\tag{3.5.1}
\]

In (3.5.1), the conditional expectations, the $E[y_j \mid X^p_{J_i}, z]$'s, are determined by the equilibrium condition, (3.3.1). Therefore, we first need to solve for those conditional expectations in order to calculate the sample likelihood. Since the conditional expectation for the whole group is the fixed point of a contraction mapping, we can solve for it by contraction mapping iteration for every parameter vector and then choose the parameter vector that maximizes the sample log likelihood.[23] That is, we nest a fixed point solution algorithm in the maximum likelihood estimation, similar to Rust (1987).
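The nesting of the fixed point inside the likelihood can be sketched as follows for the simplest information structure (all characteristics public, so the expectations form an $n \times 1$ vector), with scalar $X_i^c$ and $X_i^p$, no group feature, and shocks assumed to be $N(0, \sigma^2)$. The function name `tobit_nfxp_loglik` and the parameter ordering in `theta` are hypothetical; the sketch evaluates the log likelihood at one parameter vector and would be passed to a numerical optimizer in practice.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def norm_cdf(x):
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))

def norm_pdf(x):
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)

def H(x, sigma):
    # E[max(x - eps, 0)] under the ASSUMPTION eps ~ N(0, sigma^2)
    return x * norm_cdf(x / sigma) + sigma * norm_pdf(x / sigma)

def tobit_nfxp_loglik(theta, y, xc, xp, W, tol=1e-12, max_iter=10_000):
    """Log likelihood at theta = (beta0, beta1, beta2, lam, sigma).

    Inner loop: solve psi = H(base + lam * W psi) by contraction iteration.
    Outer step: plug psi into the censored (Tobit) log likelihood.
    """
    beta0, beta1, beta2, lam, sigma = theta
    base = beta0 + beta1 * xc + beta2 * xp
    psi = np.zeros_like(base)
    for _ in range(max_iter):
        psi_new = H(base + lam * (W @ psi), sigma)
        converged = np.max(np.abs(psi_new - psi)) < tol
        psi = psi_new
        if converged:
            break
    else:
        raise RuntimeError("fixed-point iteration did not converge")
    index = base + lam * (W @ psi)
    pos = y > 0
    # y_i > 0: eps_i = index_i - y_i, with density (1/sigma) * phi(eps_i / sigma)
    ll_pos = np.log(norm_pdf((index[pos] - y[pos]) / sigma) / sigma)
    # y_i = 0: P(eps_i >= index_i) = 1 - Phi(index_i / sigma) = Phi(-index_i / sigma)
    ll_zero = np.log(norm_cdf(-index[~pos] / sigma))
    return ll_pos.sum() + ll_zero.sum()
```

Writing the censoring probability as $\Phi(-\text{index}/\sigma)$ rather than $1 - \Phi(\text{index}/\sigma)$ relies on the symmetry of the assumed normal distribution and avoids subtracting two nearly equal numbers.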
Because the conditional likelihoods of agents' behaviors are independent of each other and the joint likelihood of a whole sample is the product of marginal likelihoods, we can derive large sample properties of the estimator in a conventional way for independent observations.

[23] In Section 3.3.3, we discuss in detail how to solve for the unique equilibrium under various circumstances.

3.6 Extensions

In practice, there may be common factors which influence all group members but are unknown to econometricians. For example, when making decisions on tax rates, municipal officers know the lobbying power of different parties. Nonetheless, there might not be a measure of that in data sets. Such factors are group unobservables. In this paper, we model group unobservables as random effects. To be specific, consider $G$ independent groups. For a group $g$, in addition to $X^g$, $X^{c,g}$, and $X^{p,g}$, there is another set of group features lumped together into a variable, $\omega^g$, which is unobserved. Assume that $\omega^g$ is independent of other variables and has a zero mean. The observed censored outcomes satisfy

\[
y_{i,g} = \max\Big\{ \beta_0 + X_{i,g}^{c\prime}\beta_1 + X_{i,g}^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \omega^g + \lambda \sum_{j \neq i} W_{n_g,ij} E[y_{j,g} \mid X^{p,g}_{J_{i,g}}, Z] - \epsilon_{i,g}, \; 0 \Big\}.
\tag{3.2.1$'$}
\]

Because $\omega^g$ is public information for agents, its presence is similar to that of $X^g$ in the analysis of equilibrium expectations and behaviors. The distribution of $\omega^g$ can be identified through variation across different groups. However, when unobservable group random effects are taken into account, our estimation method needs to be modified. Since $\omega^g$ is unobservable to econometricians, we integrate over it to construct the sample likelihood function:

\[
\begin{aligned}
\log L(Y \mid X^c, X^p, X^g, W) = \sum_{g=1}^G \log \Big[ \int \prod_{i=1}^{n_g} & f_\epsilon\Big( \beta_0 + X_{i,g}^{c\prime}\beta_1 + X_{i,g}^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \omega^g + \lambda \sum_{j \neq i} W_{n_g,ij} E[y_{j,g} \mid X^{p,g}_{J_{i,g}}, z] - y_{i,g}; \sigma \Big)^{I(y_{i,g} > 0)} \\
& \cdot \Big( 1 - F_\epsilon\Big( \beta_0 + X_{i,g}^{c\prime}\beta_1 + X_{i,g}^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \omega^g + \lambda \sum_{j \neq i} W_{n_g,ij} E[y_{j,g} \mid X^{p,g}_{J_{i,g}}, z]; \sigma \Big) \Big)^{I(y_{i,g} = 0)} f_\omega(\omega^g) \, d\omega^g \Big].
\end{aligned}
\tag{3.6.1}
\]

In estimation, we use stochastic integration to approximate the integration over the unobserved group random effects, the $\omega^g$'s. That is, we take $S$ independent draws, $\omega^{g,s}$, $s = 1, \cdots, S$, for each group $g = 1, \cdots, G$ from the density $f_\omega(\cdot; \gamma)$ to construct a simulated sample log likelihood:

\[
\begin{aligned}
\log L(Y \mid X^c, X^p, X^g, W) = \sum_{g=1}^G \log \Big[ \frac{1}{S} \sum_{s=1}^S \prod_{i=1}^{n_g} & f_\epsilon\Big( \beta_0 + X_{i,g}^{c\prime}\beta_1 + X_{i,g}^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \omega^{g,s} + \lambda \sum_{j \neq i} W_{n_g,ij} E[y_{j,g} \mid X^{p,g}_{J_{i,g}}, z] - y_{i,g}; \sigma \Big)^{I(y_{i,g} > 0)} \\
& \cdot \Big( 1 - F_\epsilon\Big( \beta_0 + X_{i,g}^{c\prime}\beta_1 + X_{i,g}^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \omega^{g,s} + \lambda \sum_{j \neq i} W_{n_g,ij} E[y_{j,g} \mid X^{p,g}_{J_{i,g}}, z]; \sigma \Big) \Big)^{I(y_{i,g} = 0)} \Big].
\end{aligned}
\tag{3.6.1$'$}
\]

In estimation, we still nest fixed point iteration in a maximum likelihood estimation algorithm, replacing the true likelihood, (3.6.1), with the simulated one according to (3.6.1$'$).

3.7 Monte Carlo Experiments

We investigate the finite sample performance of the nested fixed point maximum likelihood estimation via Monte Carlo experiments. In our experiments, the observed group feature, $X^g$, is absent, for simplicity. The idiosyncratic shocks, the $\epsilon_i$'s, are i.i.d. with pdf $(1/\sigma)\phi(\cdot/\sigma)$, where $\phi(\cdot)$ is the standard normal density. We focus on the estimation of the intercept, $\beta_0$, the coefficient on the individual commonly known characteristics, $\beta_1$, the coefficient on private personal features, $\beta_2$, the interaction effect from socially associated agents, $\lambda$, and the standard deviation of the idiosyncratic shocks, $\sigma$. Their true values are $\beta_0 = 0$, $\beta_1 = 1$, $\beta_2 = 1$, $\lambda = 0.3$, and $\sigma = 1$.

We suppose agents are linked in social networks, which are represented by the matrix $W_n$, where $n$ is the population size of the whole network. For any two agents, $i \neq j$, $W_{n,ij} = 1$ if $i$ links to $j$, and $W_{n,ij} = 0$ otherwise. $W_{n,ii} = 0$ for all $i = 1, \cdots, n$. Then we row-normalize $W_n$ such that the sum of each row is equal to 1. Two types of network designs are considered. In the first case, the whole sample is composed of a collection of independent groups.
Consequently, $W_n$ can be organized as a block-diagonal matrix, with each block representing social relations in one group. We assume that all groups have the same size, 20. Within a group, for every agent, 3 other agents are randomly selected to be linked to her, which is represented as $F = 3$ in the reported tables of estimates. The number of groups, $G$, is either 100 or 500. In the second network design, there is a single group of socially associated agents with population size either $n = 200$ or $n = 1000$.

As for social relations, we experiment with two settings. First, we consider the case in which the number of friends an agent can make, $F$, is fixed. We begin with $F = 30$ for the network of size $n = 200$. When the population size increases to $n = 1000$, we look at $F = 30$ and $F = 150$. That is, we compare the estimates when the number of social links is kept constant with those when social links increase proportionally with population size. After that, we turn to the case in which the number of social links an agent can make is random. It can take any integer value between 0 and an upper bound $UF$ with equal probabilities. That is, we use social links generated by a discrete uniform distribution. For population size $n = 200$, the upper limit is $UF = 59$ so that the mean is near 30, the same as the fixed number of social links for group size $n = 200$ in the previous case. For $n = 1000$, we investigate two cases, $UF = 59$ and $UF = 299$, corresponding to the settings for the fixed number of social links with a group of size 1000. In this way, we try to find how the randomness of social links influences the correlation among agents' behaviors and the performance of estimators.

The commonly known individual characteristics, $X^c$, are generated as independent variables. For $X_i^p$, we consider two cases, corresponding to our discussion in Section 3.3.3. That is, they are either discretely distributed with a finite support or continuously distributed with a continuum support.
In both cases, the $X_i^p$'s are correlated with an "exchangeable" joint distribution. For the first case, we adopt the simplest setting in which each $X_i^p$ is dichotomous, taking a value of 0 or 1. The realizations of the $X_i^p$'s are determined as follows: 40% of the agents in a group are picked randomly. People who are picked have $X_i^p = 1$, and those who are not selected have $X_i^p = 0$. For this design, the distribution of the $X_i^p$'s does not depend on the identities of agents, with a transition (conditional) probability matrix:

\[
P = \begin{pmatrix}
P(X_2^p = 1 \mid X_1^p = 1) & P(X_2^p = 0 \mid X_1^p = 1) \\
P(X_2^p = 1 \mid X_1^p = 0) & P(X_2^p = 0 \mid X_1^p = 0)
\end{pmatrix}
= \begin{pmatrix}
\frac{n_1 - 1}{n - 1} & \frac{n - n_1}{n - 1} \\
\frac{n_1}{n - 1} & \frac{n - n_1 - 1}{n - 1}
\end{pmatrix},
\]

where $n$ is the number of agents in the group and $n_1 = 0.4n$ is the number of agents who are picked in the group. Members know the joint distribution of the $X_i^p$'s. Because both $n$ and $n_1$ are observable, econometricians can infer the joint distribution of the $X_i^p$'s from data. So we focus on the likelihood of the $y_i$'s conditional on regressors for estimation.

When the $X_i^p$'s are continuously distributed, we adopt the framework $X_{i,g}^p = \alpha_g + \epsilon_{i,g}^p$, for $g = 1, \cdots, G$ and $i = 1, \cdots, n$. Suppose that the $\alpha_g$'s are i.i.d. normal with mean $\mu$ and variance $\sigma_1^2$. The $\epsilon_{i,g}^p$'s are i.i.d. normal with zero mean and variance $\sigma_2^2$. They are also independent of the $\alpha_g$'s. Then within a group $g$, $(X_{1,g}^p, \cdots, X_{n,g}^p)'$ is jointly normal with mean $(\mu, \cdots, \mu)'$ and variance-covariance matrix

\[
\eta^2 \begin{pmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \cdots & \rho \\
\vdots & \vdots & \ddots & \vdots \\
\rho & \rho & \cdots & 1
\end{pmatrix},
\]

where $\eta^2 = \sigma_1^2 + \sigma_2^2$ and $\rho = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}$. For any $i \neq j$, given $X_{j,g}^p = x$, $X_{i,g}^p$ is normally distributed with mean $\rho x + (1 - \rho)\mu$ and variance $(1 - \rho^2)\eta^2$. We can see that this is just the example we discussed in Section 3.3.3. Hence, we can apply the Gauss-Legendre quadrature approximation in calculating an equilibrium. For the Monte Carlo study, we choose $\mu = 1$, $\eta = 2$, and $\rho = 0.4$. While the values of those parameters are known to agents, econometricians need to estimate them from observed data.
In this situation, we adopt a two-step algorithm in estimation. We first estimate $\mu$, $\eta$, and $\rho$ from the $X_{i,g}^p$'s. Those estimates are consistent when the number of independent groups, $G$, increases to $\infty$. We then plug those estimates into the likelihood of $y$ and use the nested fixed point maximum likelihood method to estimate model parameters.[24]

[24] When the first stage estimator is consistent, estimates of the other parameters will be consistent. However, the standard errors of the first stage estimates affect the standard errors of the second stage estimates, so the latter need to be adjusted.

The experiment results are tabulated in Tables B.1 to B.6. We can see that the nested fixed point maximum likelihood algorithm works well in general. Estimates have small biases and their (empirical) standard deviations decrease as the sample size increases. Comparing the estimates when the sample is composed of many independent groups with those when there is just a single group of socially related agents, we find that the former are better than the latter in terms of smaller biases and standard deviations. That is intuitive. Although agents make decisions independently, their expectations are correlated with each other. Hence, the more intensive the social relationships, the higher the collinearity of the regressors and the less variation in the generated regressors. When there are many independent groups in the sample, as the number of groups increases, the independence among the sample regressors increases, which implies better performance of the estimators. Additionally, when we look at the Monte Carlo experiment results for a single group, we see that when the number of social links is fixed, the standard deviations of the estimates for the intercept, $\beta_0$, and the social relation intensity, $\lambda$, do not decrease when the sample size and the number of social links increase proportionally.
But that problem is alleviated if the number of social links is randomly determined. That is because the additional randomness helps reduce collinearity and increases the variation of the generated regressors.

In Table B.7, we present the results when there are unobserved group random effects. Due to the presence of group unobservables, we need to use simulated likelihood for estimation (with simulation size $S = 500$). As a result, estimation biases are bigger than those without any unobserved group random effect. The performance of the nested fixed point ML estimator improves when the number of independent groups is raised.

For all the experiments, we tried two different information structures. In the first case, all $X_i^p$'s are assumed to be public information to group members. In the second case, however, $X_i^p$ is only revealed to $i$ herself. We generate data under one information structure and estimate the model under both the correct one and the corresponding misspecified one. We can see that estimation under the true information structure yields in general a higher maximized sample log likelihood than that under the misspecified one. Therefore, the maximized sample likelihood is useful for model selection.

3.8 Empirical Application

We apply our model to property tax for municipalities in North Carolina. Municipal tax revenue depends on both tax rates and the amount of property upon which tax is levied. A higher rate of property tax can increase tax revenue per unit of property owned by residents. At the same time, it may provide incentives for residents to move out to nearby municipalities which offer lower tax rates. Therefore, there is a trade-off in setting a high tax rate due to the competition between nearby municipalities. We model this tax competition as the simultaneous-move game with incomplete information in Section 3.2.2. Consider $n$ municipalities in a state.
They are related according to geographic vicinity, represented by an $n \times n$ matrix, $W_n$. For any two different cities, $i \neq j$, $W_{n,ij} = W_{n,ji} = 1$ if the distance between them is less than some cutoff value, $e > 0$; and $W_{n,ij} = W_{n,ji} = 0$ otherwise. As usual, $W_{n,ii} = 0$ for all $i$.25 We use $a_i$ to denote the property tax rate set by city $i$. A tax rate must be non-negative, i.e., $a_i \geq 0$ for all $i = 1, \cdots, n$. City $i$'s payoff from choosing $a_i$ when the other municipalities choose $a_{-i}$ is given by (3.2.2). When a city chooses its tax rate without knowing others' decisions, the expected payoff from choosing $a_i$ is given by (3.2.3). In this equation, if $\lambda > 0$, the higher the tax rates of nearby municipalities, the higher the ideal tax rate that $i$ wants to set, showing competition between cities. Without the non-negativity constraint, the ideal tax rate is determined by three components: city-level demographics known to all local governments, such as the population; city features which may not be known by other cities, say how rich the residents of the city are in the current period; and the tax rates set by contiguous municipalities. For policy-making on property tax, the tax rates are non-negative.26 Therefore, if the ideal rate is negative, we get a corner solution. When the observed tax rates, $(a_1, \cdots, a_n)$, are the outcome of such an equilibrium, we have that

$$a_i = \max\Big\{\beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + \lambda \sum_{j \neq i} W_{n,ij} E[a_j \mid X^p_{J_i}, Z] - \epsilon_i,\ 0\Big\}. \tag{3.8.1}$$

25 In this special case, $W_n$ is a symmetric matrix with zero diagonal elements. In estimation, we row-normalize $W_n$ such that the sum of each row is equal to 1.

26 Local governments have other ways to subsidize residents. However, for property tax, the rates are non-negative.

Therefore, we may use our model to investigate the problem of tax competition. Assume that condition (3.3.6) is satisfied. Hence, the observed data come from the unique equilibrium in this model, which can be solved as a fixed point.
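The fixed-point computation can be illustrated in the simplest special case. The sketch below is not the estimation code behind the reported results. It assumes that all characteristics are public information and that the shocks are normal with standard deviation `sigma`, so that the conditional expectations reduce to a vector $\xi$ with $\xi_i = E[\max(m_i - \epsilon_i, 0)] = m_i \Phi(m_i/\sigma) + \sigma \phi(m_i/\sigma)$ and $m_i = \text{index}_i + \lambda \sum_j W_{n,ij} \xi_j$. The function names, the toy distance matrix, and the parameter values are hypothetical. In this simplified setting the derivative of the censored mean with respect to $m_i$ is $\Phi(m_i/\sigma) \in (0, 1)$, so with a row-normalized $W_n$ the iteration is a sup-norm contraction whenever $|\lambda| < 1$; this is a property of the sketch and is not a restatement of condition (3.3.6).

```python
import math


def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def cutoff_weights(dist, cutoff):
    """W[i][j] = 1 if i != j and dist[i][j] < cutoff, else 0."""
    n = len(dist)
    return [[1.0 if i != j and dist[i][j] < cutoff else 0.0
             for j in range(n)] for i in range(n)]


def row_normalize(W):
    """Scale each row to sum to 1; rows without neighbors stay zero."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s for w in row] if s > 0 else list(row))
    return out


def expected_censored(m, sigma):
    """E[max(m - eps, 0)] for eps ~ N(0, sigma^2)."""
    z = m / sigma
    return m * norm_cdf(z) + sigma * norm_pdf(z)


def solve_expectations(index, W, lam, sigma=1.0, tol=1e-12, max_iter=10000):
    """Iterate xi <- T(xi) until the sup-norm change is below tol.

    index[i] stands for the non-interaction part of the latent rate
    (an assumed, already-computed linear index for city i).
    """
    n = len(index)
    xi = [0.0] * n
    for _ in range(max_iter):
        new = [expected_censored(
                   index[i] + lam * sum(W[i][j] * xi[j] for j in range(n)),
                   sigma)
               for i in range(n)]
        gap = max(abs(a - b) for a, b in zip(new, xi))
        xi = new
        if gap < tol:
            return xi
    raise RuntimeError("fixed-point iteration did not converge")


# Toy example: three cities, pairwise distances in kilometers (made up).
dist = [[0.0, 10.0, 40.0],
        [10.0, 0.0, 20.0],
        [40.0, 20.0, 0.0]]
W = row_normalize(cutoff_weights(dist, cutoff=30.0))
xi = solve_expectations([0.5, -0.2, 1.0], W, lam=0.4)
print(xi)
```

In a nested fixed point likelihood routine, a solver of this kind would be called once for every trial parameter value. When some characteristics are private information, the unknowns become functions of the private signals, and the expectation over those signals has to be approximated, for example by quadrature.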
We look at municipalities in North Carolina, collecting data on property tax rates, government finance, and demographics in the 2012 fiscal year as well as geographic statistics (latitudes and longitudes). Since the total property tax a household pays is the sum of city tax and county tax, we also collect data on county property tax rates in 2012. Data on county and city property tax rates are from the North Carolina Department of Revenue. Information about municipal government finance comes from the North Carolina Department of State Treasurer. Data on city median household income are from "FindtheData.org", which is based on the American Community Survey. Latitudes and longitudes are from "CityLatitudeLongitude.com".27 We calculate the distance between any two cities from their latitudes and longitudes, using the Haversine formula.28

27 The latitudes and longitudes of most municipalities in our sample are listed on the webpage "CityLatitudeLongitude.com". Data on the remaining 14 cities were found by searching on Google individually.

28 $d = 2r \arcsin\Big(\sqrt{\sin^2\big(\tfrac{\iota_2 - \iota_1}{2}\big) + \cos(\iota_1)\cos(\iota_2)\sin^2\big(\tfrac{\xi_2 - \xi_1}{2}\big)}\Big)$, where $d$ is the distance, $r$ is the radius, $\iota_1$ and $\iota_2$ are latitudes, and $\xi_1$ and $\xi_2$ are longitudes.

Sample statistics are summarized in Table B.8. The table shows considerable variety among the 506 municipalities in the sample in terms of demographics and financial status. The property tax rate is strictly positive for all of those municipalities except 29, which levy no property tax. In defining geographic vicinity, we tried two different cutoff values for the distance between two municipalities, 30 kilometers and 50 kilometers, and estimated the model parameters under each of the two associated social weighting matrices. As for the public information about a municipality, we include the population and the property tax rates of related counties.29 Since it is possible that a municipal government knows more about the financial situation of people living in its own territory than other governments do, it is reasonable to include in $X^p$ some of the residents' financial data in the current period. Here we choose the median household income. Specifically, we assume that city median household income depends on two factors, the state average income level and city idiosyncratic shocks:

$$MHI_i = \alpha + \epsilon_i^p, \tag{3.8.2}$$

where $MHI_i$ is the median household income of municipality $i$ in the state (North Carolina). The random variable $\alpha$ represents the state average median household income, and $\epsilon_i^p$ is a municipality-specific shock. Suppose that $\alpha$ is normal with mean $\mu$ and standard deviation $\omega$; the $\epsilon_i^p$'s are i.i.d. with zero mean and standard deviation $\gamma$; and $\epsilon_i^p$ is independent of $\alpha$. Then the $X_i^p$'s have an exchangeable joint normal distribution, with mean $\mu$, variance $\eta^2 = \omega^2 + \gamma^2$, and correlation coefficient $\rho = \frac{\omega^2}{\omega^2 + \gamma^2}$.30 We estimate the model under two different information structures. The median household income is publicly known to related cities in the first scenario and is self-known in the second.

29 When a municipality shares its border with several counties, we use the population-weighted average property tax rate.

In Table B.9, we report regression results of the model under different vicinity cutoffs and information structures. All estimates are significant at the 5% level, with signs consistent across the different regressions. The tax rate of a municipality is positively related to the tax rate set by the county to which it is affiliated and negatively related to the median household income of its residents. The wealthier the residents of a city, the lower the property tax rate. More importantly, the intensity of interactions between nearby cities, $\lambda$, is significantly positive, which supports the competition effects among municipalities. Comparing the maximized sample likelihoods, we can see that models with spatial interactions outperform the traditional Tobit model.
Moreover, among the four regressions with spatial interactions, the magnitudes of the parameters are similar to each other. The estimated competition effects between neighboring municipalities when the median household income is self-known are slightly stronger than those under the hypothesis of public information.

30 We collect the time series of the median household income for the state of North Carolina, $\alpha_t$, from 1984 to 2012, and estimate $\mu$ and $\omega$ by the sample mean and standard deviation.

3.9 Conclusion

We consider social interactions for censored outcomes under incomplete information, which incorporates various types of nonlinearity. First, the outcomes of a social group depend nonlinearly on exogenous variables and model parameters due to expectations of peers' performances among socially linked agents. Second, the outcome of an agent changes with her own features in a nonlinear way due to censoring. Applying the theoretical results about socially interacted behaviors under incomplete information in Yang and Lee (2014), we relate the model to a simultaneous-move game under incomplete information with a binding non-negativity constraint. We transform the solution of an equilibrium conditional expectation function into the calculation of a fixed point of a function mapping and derive sufficient conditions for that mapping to be a contraction, which ensures the existence of a unique equilibrium. Under this scenario, we can solve for the unique equilibrium, and identify and estimate the model parameters.

In this paper, we solve and estimate the model under the hypothesis that the parameters are within the range that ensures a unique equilibrium. This corresponds to a scenario of weak or moderate social interactions, under which it is possible to derive clear implications from model estimation. The situation of multiple equilibria corresponds to strong social interactions. In the literature, there are also methods which allow for multiple equilibria.
Cases in point are the nonparametric two-step methods of Bisin et al. (2011) and Leung (2013). Their basic idea is to assume that agents with identical characteristics play the same strategy ex ante and that exactly the same equilibrium is played in every repetition of the game. In that way, it is possible to estimate choice probabilities consistently by nonparametric methods. Plugging those choice probabilities back into the likelihood function, we can derive consistent estimates.31 For social interactions with large group sizes, a similar approach is also possible, as in Shang and Lee (2011). However, with small or moderate group sizes, or in a general network setting without repetitions, computationally tractable estimation approaches remain to be found. All those are of interest for future research.

31 In Bisin et al. (2011), an alternative estimation method is to solve for all equilibria at each parameter value and choose the one that maximizes the log likelihood.

In the Tobit model, there are social interactions for only one type of behavior. The values of observed outcomes and the censoring result are determined by just a single equation. A natural extension is the sample selection model with social interactions incorporated. With social interactions in both the outcome equation and the selection equation, as well as connections between the two equations, it is possible to analyze social interactions for two classes of related choices, one continuous and the other discrete. That is a concise case with multiple equilibria for future research.

Chapter 4: Social Interactions under Incomplete Information with Multiple Equilibria

4.1 Introduction

Estimating models for social interactions with possibly multiple equilibria is a challenging issue both theoretically and empirically. The distribution of outcomes is influenced not only by the unknown parameters, but also by the underlying equilibrium, which is actually played but not observed.
Without further specifications, for any given parameter values, the sample likelihood or moment conditions are still indeterminate and cannot be used for estimation. Bajari et al. (2010a) propose a two-step algorithm to estimate a discrete choice game under incomplete information. They first derive nonparametric estimates of players' choice probabilities and then use them to estimate the other structural parameters. When there are a large number of repetitions of the same game, under the assumption that the same equilibrium is played in all the repetitions, the individual choice probabilities can be estimated consistently. However, empirical studies of social interactions, especially interactions among friends, frequently work with cross-section data sets. For data sets with individual information over several years, there is also the problem that social relations evolve, which makes it unrealistic to assume that the same equilibrium is played repeatedly. This method, nonetheless, can still be used for some special model structures or under some additional assumptions. For example, when individual outcomes are influenced by a global equilibrium aggregate, Bisin et al. (2011) first estimate the equilibrium aggregate and then recover the other parameters. Leung (2013) focuses on one particular type of equilibria, where individuals with the same observable characteristics play the same strategy. With a large number of independent groups and/or a large number of agents playing the same strategy, repetitions are obtained. However, the method of Bajari et al. (2010a) would be invalid in the presence of unobserved group heterogeneity which cannot be fully explained by observed characteristics.

For social interaction models with potentially multiple equilibria, without assuming repetitions of the same equilibrium, this paper proposes using a parametric stochastic rule to specify a probability distribution of equilibrium selection.
Although this method requires solving for or approximating equilibria, it can be applied to a general model framework and general data generating processes, including both discrete and continuous choices, bounded and unbounded outcomes, and incomplete information about idiosyncratic shocks and exogenous characteristics. In a related paper, Bisin et al. (2011) choose both the parameter values and the equilibria to maximize the sample likelihood function. That method is implicitly built on a particular selection distribution: exactly one equilibrium is chosen with probability one. However, that specification cannot lead to economic implications about equilibrium selection. Moreover, if the likelihood function has a complicated form, it might be difficult to maximize the likelihood over both parameter values and equilibria.

Using a stochastic rule to complete a game with multiple equilibria is not new in the literature on game estimation. Bajari et al. (2010b, 2010c) use this method to identify and estimate discrete choice games under both complete and incomplete information. However, it is not straightforward to make this approach applicable to various empirical studies of social interactions under incomplete information. Socially interacted behaviors can be either discrete or continuous, bounded or unbounded. Moreover, in Bajari et al. (2010b, 2010c), only the idiosyncratic shocks are private information. In their recent research, Yang and Lee (2014) point out that there can also be incomplete information about some exogenous characteristics in social interactions. For example, when analyzing peer effects in class performance, class and individual characteristics such as grades, locations, genders, SAT scores, and IQ scores are often used as exogenous covariates. It would be unrealistic to assume that individual SAT scores and IQ scores are public information.
Nonetheless, incorporating various types of behaviors and information structures makes it more difficult to specify the probability distribution of equilibrium selection and to compute the likelihood of the complete model.

The difficulty in specification comes from the equilibrium set. The structure of Nash equilibria for a game with complete information is well understood, and so is that for a finite-player, finite-action game under incomplete information. However, the existence of a pure strategy equilibrium in a game with private information when the sets of possible actions and types are not finite has been an open question for a long time. Following the pioneering work by Milgrom and Weber (1985) and Radner and Rosenthal (1982), Khan and Sun (1995) and Kan and Zhang (2014) provide existence conditions when the set of actions is compact. Although their conditions apply to general abstract private information, in many empirical applications it is unsatisfactory to restrict the values of outcomes to be bounded. Moreover, as our model is based on a reduced-form Bayesian Nash Equilibrium (BNE), their conditions on the payoff functions may not be directly applicable. Therefore, we characterize the equilibrium set and derive conditions for existence and equilibrium properties specific to our framework. It is shown that, in terms of the distribution of outcomes, a BNE is equivalent to an equilibrium conditional expectation of individual choices, which is a function of the private information used to make predictions. Therefore, the equilibrium conditional expectations are used to represent BNEs. In particular, if all exogenous characteristics are public information and only the idiosyncratic shocks are privately known, the conditional expectation functions reduce to vectors in a Euclidean space satisfying a system of nonlinear equations.
By the transversality theorem and the intersection theory in differential topology, it is shown that under certain regularity conditions, there are a finite number of equilibria. Another result is a sufficient condition for equilibrium uniqueness, which is weaker than the condition derived from the contraction mapping by Yang and Lee (2014). When some exogenous characteristics are private information and have a continuum support, the equilibrium conditional expectations are generally functions. They are embedded into a Banach space of functions, which is related to the classical $L^p$ spaces of integrable functions. By the Schauder fixed point theorem, sufficient conditions are derived which ensure that the set of equilibria is nonempty and compact. As a result, the set of equilibria can be approximated by a finite number of equilibria.

With a finite number of elements in the (possibly approximated) equilibrium set, a probability mass function for equilibrium selection can be specified based on a parametric selection rule. That completes the model. By the strategy of "identification at infinity" and techniques from spatial econometrics, parameters can be identified for the linear model with continuous choices, the model with binary choices, and the Tobit model.

Challenges in estimation come from computation. When all exogenous characteristics are public information, solving for the set of equilibria is equivalent to finding all solutions to a system of nonlinear equations. According to Garcia and Zangwill (1981), it is possible to get all solutions via a homotopy continuation method under the regularity and path-finiteness conditions, when the system can be extended to complex spaces in an analytic way. It is verified that those conditions hold for a couple of models with normal shocks. There are also discussions about the application of another related homotopy algorithm, used by Borkovski et al. (2010a, 2010b).
There is also a brief discussion about group unobservables, peer effects, and a deterministic selection rule. For models with peer effects, it suffices to focus on conditional expectations about group average outcomes. Hence, an equilibrium can be represented by a vector-valued function with fewer coordinate functions than in the general model framework.

In particular, this paper discusses two types of binary choice models in detail. In the Type I model, agents take one of two actions, 0 and 1. The utility from choice 0 is normalized to 0 for every agent. When an agent chooses 1, however, her utility depends on the number of agents who are associated with her and also choose 1. Two key features of this model are that, ex post, the agents who choose 0 are not affected by others and have no effect on the utilities of the agents who choose 1. The entry game is a case in point. The Type II model does not have these two properties. As in the model discussed in Brock and Durlauf (2001), in the Type II model the utility an agent can get depends on the difference between her choice and those of the agents with whom she is associated. With normal idiosyncratic shocks, multiple equilibria are more likely in the Type II model than in the Type I model. Additionally, by comparing different estimation methods in the Monte Carlo experiments, it is found that assuming equilibrium uniqueness can introduce biases when the intensity of social interactions is large and there are multiple equilibria in the data generating process.

The paper proceeds as follows. The model framework is introduced in Section 4.2. Then, in Section 4.3, we relate the model to a game under incomplete information and a BNE to an equilibrium conditional expectation function. Sections 4.4 and 4.5 contain a detailed analysis of the set of equilibria, identification, and estimation for two different types of information structures.
In Section 4.6, there is a brief discussion about unobserved group characteristics, peer effects, and the deterministic equilibrium selection rule. Section 4.7 focuses on equilibrium set characteristics and Monte Carlo experiments for the two types of binary choice models. Section 4.8 concludes. Technical proofs are put in Appendices C.1 through C.5.

4.2 Models

4.2.1 A Model Framework

The discussion of multiple equilibria is set in the framework for social interactions under a general form of incomplete information analyzed by Yang and Lee (2014). Consider a group of $n$ socially related agents. Their relations are represented by an $n \times n$ matrix $W_n$. For any $i, j = 1, \cdots, n$, $W_{n,ij} \geq 0$, and $W_{n,ii} = 0$ for all $i = 1, \cdots, n$. For any $i \neq j$, $W_{n,ij} > 0$ if $i$ connects with $j$; and $W_{n,ij} = 0$ otherwise. Take a game with $n$ players as an example. As the payoffs of any two agents are interdependent, $W_{n,ij} = 1$ for any $i \neq j$; and $W_{n,ii} = 0$. This social relation matrix can also be used in the model for peer effects in a social group, where the behavior of an agent is related to those of all the other group members. If $W_n$ is used to represent the spatial relations of geographic regions or local governments, $W_{n,ij}$ is usually negatively correlated with the distance between $i$ and $j$. In that case, $W_n$ is symmetric. If we use $W_n$ to represent friendship networks, $W_n$ will be symmetric if only mutual friendships are considered. However, if the network is directed, it is possible that $i$ considers $j$ one of her close friends while $j$ does not regard $i$ as a good friend. Then $W_n$ may be asymmetric.

Let $y_i^*$ denote the latent variable. The observed outcome, or agent's behavior, $y_i$, depends on $y_i^*$ in the following way:

$$y_i = h_i(y_i^*), \tag{4.2.1}$$

where $h_i(\cdot): \mathbb{R} \to \mathbb{R}$ is a real-valued function, which can be linear or nonlinear. In the general setting, we allow the form of this function to vary across agents. In applications, we usually have $h_i(\cdot) = h_j(\cdot) = h(\cdot)$.
The latent variable for $i$ is related to her expectations about the outcomes of other agents as

$$y_i^* = u(X_i) + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, Z] - \epsilon_i. \tag{4.2.2}$$

According to (4.2.2), the value of $y_i^*$ depends on three parts. The first part, $u(X_i)$, represents the direct effects of exogenous covariates, $X_i = (X^g, X_i^c, X_i^p)$. We consider the group features, $X^g$; some commonly known individual characteristics, $X_i^c$; and some personal features which may be privately known, $X_i^p$. The third part is the idiosyncratic shock, represented by $\epsilon_i$. Those shocks are i.i.d. and independent of all the exogenous characteristics and social relations. Their common distribution is characterized by a pdf, $f(\cdot)$, with cdf $F(\cdot)$. Assume that $\epsilon_i$ is known by individual $i$ herself, but not by other agents or econometricians. The second part represents the interaction effects from socially associated agents. There are two features in this formulation. First, $y_i^*$ is affected by agent $j$ only if $i$ connects with $j$, i.e., $W_{n,ij} \neq 0$. Second, $j$ influences $i$ through $i$'s expectation of $j$'s true outcome. In the model, $i$'s expectations are formed on the basis of public information about social relations in $W_n$, group features, $X^g$, commonly known individual characteristics, $X_j^c$'s, and her private information about exogenous characteristics, $X_j^p$'s.

The information structure can be fully described by specifying the subset of agents whose $X_j^p$'s are known to an agent. Given a finite number of agents, this is achievable using vectors. For each $i$, we define an $n \times 1$ vector, $J_i$, such that $J_i(j) = 1$ if $i$ knows $X_j^p$; and $J_i(j) = 0$ otherwise, for each $1 \leq j \leq n$. As a result, the information structure in a group of $n$ agents is represented by an $n^2 \times 1$ vector, $J = (J_1', \cdots, J_n')'$. For every $i$, we denote by $X^p_{J_i}$ the vector formed by the $X_j^p$'s which are known to $i$, $X^p_{J_i} = (X_j^{p\prime} : J_i(j) = 1)'$. Suppose that $X_j^p$ is of dimension $k_p$.
Then the dimension of $X^p_{J_i}$ is $N_i = \big(\sum_{j=1}^n J_i(j)\big) k_p$.32 To simplify notation, we summarize all the publicly known variables in one vector,

$$Z = (X^{g\prime}, X_1^{c\prime}, \cdots, X_n^{c\prime}, W_{n,11}, \cdots, W_{n,1n}, \cdots, W_{n,n1}, \cdots, W_{n,nn}, J_1', \cdots, J_n')'. \tag{4.2.3}$$

Then we sum up $i$'s information set used to make predictions by two random vectors, one about private information, $X^p_{J_i}$, and the other about public information, $Z$.33 The parameter $\lambda$ represents the intensity of social interactions. If $\lambda > 0$, the social interaction effect is positive. If $\lambda < 0$, outcomes are negatively related. The case of $\lambda = 0$ represents the absence of social interactions.

32 For example, if agent 1 only knows $X_1^p$, $X_2^p$, and $X_3^p$, then $J_1(j) = 1$ for $j = 1, 2, 3$; $J_1(j) = 0$ for $j > 3$; and $X^p_{J_1} = (X_1^{p\prime}, X_2^{p\prime}, X_3^{p\prime})'$. We assume that such an information structure $J$ is common knowledge. However, the realizations of those random variables are private information. In this example, although it is publicly known that agent 1 knows her own features and those of agents 2 and 3, the realizations of $X_1^p$, $X_2^p$, and $X_3^p$ may be unknown to agent 4.

33 As explained in Yang and Lee (2014), because the idiosyncratic shocks, $\epsilon_i$'s, are independent of each other and also independent of the exogenous covariates and social relations, adding the realization of $\epsilon_i$ to her information set does not change $i$'s predictions of others' outcomes.

4.2.2 Examples

As shown in Yang and Lee (2014) and Yang, Qu and Lee (2014), the model, (4.2.1) and (4.2.2), is general enough to include the linear model with continuous choices, the model with binary choices, and the Tobit model. We list them along with some other possible applications below.

1. (Linear Model with Continuous Choices) If $h_i(d) = d$ for all $i$ and $d \in \mathbb{R}$, we have that

$$y_i = u(X_i) + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, Z] - \epsilon_i. \tag{4.2.4}$$

2. (Binary Choice Model I) If $h_i(d) = I(d > 0)$ for all $i$ and $d \in \mathbb{R}$, where $I(\cdot)$ is the indicator function, we have that

$$y_i = I\Big(u(X_i) + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, Z] - \epsilon_i > 0\Big). \tag{4.2.5}$$

3. (Binary Choice Model II) Based on the assumption that agents derive utility from taking actions similar to those of their friends and/or neighbors, Brock and Durlauf (2001) consider another model for binary choices, $y_i = 1$ or $-1$, according to $h_i(d) = 2I(d > 0) - 1$, and

$$y_i = 2I\Big(u(X_i) + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, Z] - \epsilon_i > 0\Big) - 1. \tag{4.2.6}$$

4. (Tobit Model with Homogeneous Cutoff Points) If all negative outcomes are censored, i.e., $h_i(d) = d\,I(d \geq 0)$ for all $i$ and $d \in \mathbb{R}$, we have that

$$y_i = \max\Big\{u(X_i) + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, Z] - \epsilon_i,\ 0\Big\}. \tag{4.2.7}$$

5. (Tobit Model with Heterogeneous Cutoff Points) If $h_i(d) = d\,I(d > v(X^g, X_i^c))$ for all $d \in \mathbb{R}$ and $i = 1, \cdots, n$, we have that

$$y_i = I\Big(u(X_i) + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, Z] - \epsilon_i > v(X^g, X_i^c)\Big) \cdot \Big(u(X_i) + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, Z] - \epsilon_i\Big). \tag{4.2.8}$$

6. (Two-sided Censored Outcomes) If $h_i(d) = d\,I(c_1 < d < c_2)$ for all $i$ and $d \in \mathbb{R}$ and some parameters $c_1 < c_2$, we get

$$y_i = \Big(u(X_i) + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, Z] - \epsilon_i\Big)\, I\Big(c_1 < u(X_i) + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, Z] - \epsilon_i < c_2\Big). \tag{4.2.9}$$

7. (Ordered Multiple Choices) If $h_i(d) = \sum_{k=0}^{K} k\, I(c_k < d < c_{k+1})$ for all $i$ and $d \in \mathbb{R}$, where $K > 1$ is a fixed integer, $c_0 = -\infty$, $c_1 < \cdots < c_K$, and $c_{K+1} = \infty$, we derive the following model:

$$y_i = \sum_{k=0}^{K} k\, I\Big(c_k < u(X_i) + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, Z] - \epsilon_i < c_{k+1}\Big). \tag{4.2.10}$$

8. (Investment Decisions with Cobb-Douglas Production Functions) Finally, consider interactions in investment among competing firms or contiguous local governments. Here $y_i^*$ denotes the latent investment and $y_i$ represents the output. Assume that output is influenced by technology $A$, the labor input $L_i$, and the capital investment. Since investment cannot be negative, the actual investment is $\max\{y_i^*, 0\}$. Assume that $A$ can be estimated from other data sources. The $L_i$'s are public information and exogenously given.
The potential investment $y_i^*$ is still determined by (4.2.2). With a Cobb-Douglas production function,

$$y_i = h_i(y_i^*) = A L_i^{1-\iota} (\max\{y_i^*, 0\})^{\iota}, \tag{4.2.11}$$

where $0 < \iota < 1$.

Various information structures can be discussed in this framework.

1. (Publicly-known Characteristics) If all exogenous covariates are public information, $J_i = 1_n$ for all $i = 1, \cdots, n$, where $1_n$ is an $n \times 1$ vector of 1's.

2. (Self-known Characteristics) If $X_i^p$ is revealed just to agent $i$, then for any $i = 1, \cdots, n$, $J_i(i) = 1$ and $J_i(j) = 0$ for all $j \neq i$.

3. (Socially-known Characteristics) If for any two agents $i$ and $j$, $i$ knows $X_j^p$ if and only if $i$ connects to $j$, then for all $i$, we have that $J_i(i) = 1$ and $J_i(j) = I(W_{n,ij} > 0)$ for all $j \neq i$.

4.3 Game, Equilibrium, and Expectations

4.3.1 Game Explanations

In the model, (4.2.1) and (4.2.2), an agent's behavior interacts with those of others when she is uncertain about some of their attributes. Hence, we can view the outcomes of the model as the outcome of an equilibrium of a simultaneous-move game with incomplete information. According to Harsanyi (1967a; 1967b), assuming that agents' payoffs are related to a randomly determined "state", an agent's uncertainty comes from the fact that her signal does not completely reveal the true state. Then predicting others' unknown characteristics is equivalent to making inference about the realized state from her own signal. We follow the setup in Osborne and Rubinstein (1994). For $n$ group members, let $\mathcal{X}^p_i$ represent the support of $X_i^p$ and $\mathcal{E}$ the common support of the $\epsilon_i$'s. The set of states, $\prod_{i=1}^n \mathcal{X}^p_i \times \mathcal{E}^n$, is the set of all possible $X_i^p$'s and $\epsilon_i$'s for all players. In this case, player $i$'s "type" consists of her private information, $X^p_{J_i}$, and her idiosyncratic shock, $\epsilon_i$. Her set of types is then $T_i = \prod_{k: J_i(k) = 1} \mathcal{X}^p_k \times \mathcal{E}$. The signal function is a mapping from the states to her type, $\tau_i: \prod_{i=1}^n \mathcal{X}^p_i \times \mathcal{E}^n \to \prod_{k: J_i(k) = 1} \mathcal{X}^p_k \times \mathcal{E}$.
Her prior belief on the set of states is the joint distribution of the $X_i^p$'s and the distribution of the i.i.d. shocks, $F(\cdot)$, which is the same for all agents. The set of actions for agent $i$ is denoted by $Y_i$. Her strategy is a contingent plan specifying the action to take for each realized type, $s_i: \prod_{k: J_i(k) = 1} \mathcal{X}^p_k \times \mathcal{E} \to Y_i$. The payoff received by an agent depends on the actions taken by all group members, $y = (y_1, \cdots, y_n) \in \prod_{i=1}^n Y_i$, as well as the uncertain state. The profile $s^e = (s_1^e(\cdot), \cdots, s_n^e(\cdot))$ is a Bayesian Nash Equilibrium (BNE) in this game if

$$s_i^e(X^p_{J_i}, \epsilon_i) = h_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{n,ij} E[s_j^e(X^p_{J_j}, \epsilon_j) \mid X^p_{J_i}, Z, \epsilon_i] - \epsilon_i\Big). \tag{4.3.1}$$

With specific $h_i(\cdot)$ functions, it is possible to build a structural model. See Yang and Lee (2014) for continuous and binary choices and Yang, Qu and Lee (2014) for the Tobit model when agents' actions are subject to the non-negativity constraint.

4.3.2 Equilibrium and Expectations

Under the assumption that observed outcomes are realizations of a BNE (4.3.1), we relate equilibrium strategies, $s^e = (s_1^e(\cdot), \cdots, s_n^e(\cdot))$, to conditional expected outcomes in an equilibrium, $\{E[y_j \mid X^p_{J_i}, Z = z]\}$. Pick any $i$ and $k$ such that $i \neq k$. By consistency, we get that

$$E[y_i \mid X^p_{J_k}, Z = z] = E\Big[h_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, Z = z] - \epsilon_i\Big) \,\Big|\, X^p_{J_k}, Z = z\Big]. \tag{4.3.2}$$

Therefore, given public information $Z = z$, for any $i$, conditional expectations of $i$'s behavior depend on the private information. Two agents $k$ and $k'$, where $k \neq i$, $k' \neq i$, and $k \neq k'$, have the same expectations about $i$'s behavior if they have the same private information, i.e., $X^p_{J_k} = X^p_{J_{k'}}$.

Similar to Yang and Lee (2014), conditional expectations can be modeled as functions and embedded into a function space.
In this paper, however, conditional expectation functions are defined in a different way in order to utilize properties of classical function spaces.34 Given social relations $W_n$ and information structure $J = (J_1, \cdots, J_n)$, for each $i$, we collect all possible types of private information which are used by others to predict her actions as

$$\widehat{J}_i = \big\{\widetilde{J} \in \{0, 1\}^n : \widetilde{J} = J_j \text{ for some } j \text{ s.t. } W_{n,ji} \neq 0\big\}. \tag{4.3.3}$$

Denote the number of elements in this set by $M_i$.35 Considering that two agents may have the same type of private information, $M_i \leq \sum_{j \neq i} M_{n,ji}$. Labeling the elements of $\widehat{J}_i$ by $m = 1, \cdots, M_i$, we get

$$\widehat{J}_i = \big\{\widetilde{J}_{i,1}, \cdots, \widetilde{J}_{i,M_i}\big\}. \tag{4.3.4}$$

For any $j$ with $W_{n,ji} \neq 0$, there is exactly one vector in $\widehat{J}_i$ representing $j$'s private information, $J_j$. That is, there is a unique $m_i(j) \in \{1, \cdots, M_i\}$ such that $J_j = \widetilde{J}_{i,m_i(j)}$. The mapping $m_i(\cdot)$ defined in this way is onto. Let $\mathcal{X}^p_i$ be the support of $X_i^p$ (conditional on public information $Z = z$). It is a subset of $\mathbb{R}^{k_p}$. (Recall that $k_p$ is the dimension of the $X_i^p$'s.)

34 Treating the privately known characteristics used to make predictions as a random vector, Yang and Lee (2014) define the conditional expectation about $i$'s behavior as a function of all possible random vectors used to make predictions. That function maps a random vector in its domain to a random variable, specifying the value of the conditional expectation for each realization of that random vector. Consider a group of 3 people as an example. Agent 1 is linked to agents 2 and 3, i.e., $W_{3,21} \neq 0$ and $W_{3,31} \neq 0$. $X^p_{J_2}$ and $X^p_{J_3}$ are the private information of agents 2 and 3, respectively. The conditional expectation about $y_1$, $\psi_1$, is then defined on the set of random vectors $A_1 = \{X^p_{J_2}, X^p_{J_3}\}$. $\psi_1(X^p_{J_2})$ is a random variable such that for each $\omega$ in the sample space of the $X_i^p$'s, $\psi_1(X^p_{J_2})(\omega) = E[y_1 \mid X^p_{J_2} = X^p_{J_2}(\omega), Z = z]$. In our model, however, we define a separate function for every possible type of private information that is used to predict $i$'s behavior.
For the aforementioned example, if $J_2 \neq J_3$, we define functions $\xi_{1,J_2}$ and $\xi_{1,J_3}$ on the support of those random vectors. That is, $\xi_{1,J_2}(x^p_{J_2}) = E[y_1 \mid X^p_{J_2} = x^p_{J_2}, Z = z]$ and $\xi_{1,J_3}(x^p_{J_3}) = E[y_1 \mid X^p_{J_3} = x^p_{J_3}, Z = z]$. The conditional expectation functions defined in this way are mappings from a subset of a Euclidean space to $\mathbb{R}^1$, which makes it convenient to apply the properties of the classical $L^p$ spaces.

³⁵ If $M_i = 0$, $i$'s actions do not influence others' choices. Then expectations on her behaviors do not influence the distribution of outcomes. They are redundant in the system. In computation, we can just exclude the redundancy.

For any $(i, m)$, denote by $\mathcal{X}^p_{i,m}$ the support of privately known characteristics contained in $\widetilde{J}_{i,m}$. Then it is a subset of the Euclidean space with dimension $k_p \left( \sum_{j=1}^n \widetilde{J}_{i,m}(j) \right)$. We denote its elements simply by $x^p_{i,m}$. For example, if $\widetilde{J}_{i,m}(j) = 1$ for $j = 2, 3$ and $\widetilde{J}_{i,m}(j) = 0$ otherwise, $\mathcal{X}^p_{i,m}$ is the support for $(X_2^{p\prime}, X_3^{p\prime})'$. Its elements are realizations, $(x_2^{p\prime}, x_3^{p\prime})'$. Define $\xi^e_{i,m} : \mathcal{X}^p_{i,m} \to \mathbb{R}^1$ as
\[ \xi^e_{i,m}(x^p_{i,m}) = E\left[ y_i \mid X^p_{\widetilde{J}_{i,m}} = x^p_{\widetilde{J}_{i,m}}, Z = z \right]. \tag{4.3.5} \]
Then $\xi^e_{i,m}$ is a mapping from a subset of a Euclidean space with dimension $k_p \sum_{j=1}^n \widetilde{J}_{i,m}(j)$ to $\mathbb{R}^1$.³⁶ Collecting all those functions, we derive a vector-valued function,
\[ \xi^e = (\xi^e_{1,1}, \cdots, \xi^e_{1,M_1}, \cdots, \xi^e_{n,1}, \cdots, \xi^e_{n,M_n}), \]
whose domain is $\prod_{i=1}^n \prod_{m=1}^{M_i} \mathcal{X}^p_{i,m}$ and range is $\mathbb{R}^M$, where $M = \sum_{i=1}^n M_i$. $\xi^e$ has two properties:
\[ \xi^e(x^p_{1,1}, \cdots, x^p_{1,M_1}, \cdots, x^p_{n,1}, \cdots, x^p_{n,M_n})_{i,m} = \xi^e_{i,m}(x^p_{i,m}); \tag{4.3.6} \]
and the equilibrium condition,
\[ \xi^e_{i,m}(x^p_{i,m}) = E\Big[ h_i\Big( u(X^g, X_i^c, X_i^p) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi^e_{j,m_j(i)}(X^p_{j,m_j(i)}) - \epsilon_i \Big) \,\Big|\, X^p_{i,m} = x^p_{i,m}, Z = z \Big], \tag{4.3.7} \]
for all $i = 1, \cdots, n$, $m = 1, \cdots, M_i$, and $x^p_{i,m} \in \mathcal{X}^p_{i,m}$.³⁷ In particular, when all exogenous characteristics are public information, conditional expectations only depend on public information.
In that case, $E[y_i \mid Z = z]$ is a scalar for any $i$ and $\xi^e$ reduces to an $n \times 1$ vector, $\xi^e = (\xi_1^e, \cdots, \xi_n^e)'$.

In Appendix C.1, we construct a Banach space, $(\Xi(W_n, J), \|\cdot\|)$. $\xi \in (\Xi(W_n, J), \|\cdot\|)$ if and only if each of its coordinates, $\xi_{i,m}$, is an element of $L^1(\mathcal{X}^p_{i,m}, \mathcal{B}_{i,m}, \mu_p; \mathbb{R}^1)$, the space of all integrable functions on $\mathcal{X}^p_{i,m}$ under the probability measure $\mu_p$ implied by the distribution of the $X_i^p$'s conditional on public information $Z = z$, with the standard $L^1$ norm. Focusing on the conditional expectation functions with this property, the properties of the classical function space, $L^1(\mathcal{X}^p_{i,m}, \mathcal{B}_{i,m}, \mu_p; \mathbb{R}^1)$, can be utilized to characterize equilibria in a general setting. The relationship between a BNE and a consistent conditional expectation function is expressed by the following proposition.

³⁶ In principle, $\xi^e_{i,m}$ can be defined on the whole Euclidean space. However, considering that the support of the $X_i^p$'s may not be full, it will sometimes be convenient to work with a bounded subset of the Euclidean space. Therefore, we just use an abstract subset at this stage.

³⁷ By our definition, given $i$, for each $j \neq i$, via the mapping $m_j(\cdot)$, we find exactly the private information $\widetilde{J}_{j,m_j(i)} = J_i$. Thus, in (4.3.7), all the $X^p_{j,m_j(i)}$'s are the same. They are just $X^p_{J_i}$, the random vector of exogenous characteristics which are known by $i$.

Proposition 4.3.1 Conditional on public information $Z = z$, if $s^e = (s_1^e, \cdots, s_n^e) : \prod_{i=1}^n T_i \times E^n \to \prod_{i=1}^n Y_i$ is a BNE of this model, then there is a conditional expectation function $\xi^e$ satisfying conditions (4.3.6) and (4.3.7). Conversely, if there is a conditional expectation function satisfying conditions (4.3.6) and (4.3.7), there is a BNE for this model.

Proof. See Appendix C.1.

Due to Proposition 4.3.1, there is an onto correspondence from the set of BNEs to the set of equilibrium conditional expectation functions.
Although it is possible that two different BNEs may imply the same equilibrium conditional expectation functions, those two BNEs are equivalent in terms of the distribution of equilibrium outcomes.³⁸ Therefore, from now on, we will focus on the equilibrium conditional expectation functions. Such a function is simply referred to as an equilibrium. The set of those functions is viewed as the set of equilibria.

³⁸ Re-defining the set of BNEs by equivalence classes in terms of the distribution of equilibrium outcomes, there is a one-to-one correspondence between the set of BNEs and the set of equilibrium conditional expectation functions.

For the discussions in subsequent sections, we impose the following assumptions.

Assumption 4.3.1 $f_\epsilon(\epsilon) > 0$ for all $-\infty < \epsilon < \infty$. That is, the support for all $\epsilon_i$'s is $\mathbb{R}$.

Assumption 4.3.2 For any real number $a$, $H_i(a) = E_\epsilon[h_i(a - \epsilon)] < \infty$ and is differentiable with respect to $a$.

$H_i(\cdot)$ can take various forms in different models.

1. (Linear Model with Continuous Choices) $H_i(a) = a$ and $dH_i(a)/da = 1$.
2. (Binary Choice Model I) $H_i(a) = F_\epsilon(a)$ and $dH_i(a)/da = f_\epsilon(a)$.
3. (Binary Choice Model II) $H_i(a) = 2F_\epsilon(a) - 1$ and $dH_i(a)/da = 2f_\epsilon(a)$.
4. (Tobit Model with Homogeneous Cutoff Points) $H_i(a) = aF_\epsilon(a) - \int_{c<a} c f_\epsilon(c)\,dc$ and $dH_i(a)/da = F_\epsilon(a)$.
5. (Tobit Model with Heterogeneous Cutoff Points) With cutoffs being $v(X^g, X_i^c)$, $H_i(a) = aF_\epsilon(a - v(X^g, X_i^c)) - \int_{c < a - v(X^g, X_i^c)} c f_\epsilon(c)\,dc$ and $dH_i(a)/da = F_\epsilon(a - v(X^g, X_i^c)) + v(X^g, X_i^c) f_\epsilon(a - v(X^g, X_i^c))$.
6. (Two-sided Censored Outcomes) $H_i(a) = a\,(F_\epsilon(a - c_1) - F_\epsilon(a - c_2)) - \int_{a-c_2}^{a-c_1} c f_\epsilon(c)\,dc$ and $dH_i(a)/da = F_\epsilon(a - c_1) - F_\epsilon(a - c_2) + c_1 f_\epsilon(a - c_1) - c_2 f_\epsilon(a - c_2)$.
7. (Ordered Multiple Choices) $H_i(a) = \sum_{k=0}^K k\,(F_\epsilon(a - c_{k+1}) - F_\epsilon(a - c_k))$ and $dH_i(a)/da = \sum_{k=0}^K k\,(f_\epsilon(a - c_{k+1}) - f_\epsilon(a - c_k))$.
8. (Investment Decisions with Cobb-Douglas Production Function) $H_i(a) = A L_i^{1-\iota} \int_{c<a} (a - c)^{\iota} f_\epsilon(c)\,dc$ and $dH_i(a)/da = \iota A L_i^{1-\iota} \int_{c<a} (a - c)^{\iota - 1} f_\epsilon(c)\,dc$.

Yang and Lee (2014) and Yang, Qu, and Lee (2014) find a sufficient condition for the existence of a unique equilibrium.

Proposition 4.3.2 Under the condition that $\max_i \sup_a \left| \frac{dH_i(a)}{da} \right| < \infty$, if
\[ |\lambda| \, \|W_n\|_\infty \, \max_i \sup_a \left| \frac{dH_i(a)}{da} \right| < 1, \tag{4.3.8} \]
where $\|W_n\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^n W_{n,ij}$, there is a unique equilibrium in the model.

Proof. See the Appendix in Yang and Lee (2014).

With a unique equilibrium, the model, (4.2.1) and (4.2.2), will be complete. Parameters can be estimated using standard likelihood or moment conditions. According to (4.3.8), in order to ensure a unique equilibrium, the possible range for the intensity of social interactions, $\lambda$, depends on the number of links and the derivatives of the functions $H_i(\cdot)$. If they are large, the range for $\lambda$ will be very narrow.

$\|W_n\|_\infty$ can be big for large groups. For example, when $W_n$ represents the relations among $n$ players in a game, $\|W_n\|_\infty = n - 1$, which increases with the group population. In the spatial econometrics literature, it is common to row-normalize $W_n$ so that $\widetilde{W}_{n,ij} = W_{n,ij} / \sum_{j \neq i} W_{n,ij}$ and $\|\widetilde{W}_n\|_\infty = 1$, which helps alleviate the problem. For social interactions, when $W_{n,ij}$ is equal to either 1 or 0 depending on whether $i$ is socially associated with $j$, after this row-normalization, any $W_{n,ij}$ for $j \neq i$ is weighted by the number of social links formed by $i$. In general, the number of social links an agent forms differs from person to person. For an agent who has more social relations, her links will be discounted more heavily than those of agents who have fewer social relations. That is, links are treated differently in the same network with row-normalization. In addition, when the social links are not fully exogenous, like those in Yang (2014), the number of social links an agent has is endogenous. Then row-normalization might change the correlation between the social relations and outcomes.

$\max_i \sup_a |dH_i(a)/da|$ depends on the type of behaviors. It is no bigger than 1 for the continuous choice model and the Tobit model with zero cutoffs. However, for some models, such as the Tobit model with heterogeneous cutoff points and the model of ordered multiple choices listed in this paper, $\max_i \sup_a |dH_i(a)/da|$ can be very large. See also Yang (2014) for a similar case in the sample selection model. Moreover, by imposing (4.3.8), possible strong interactions are excluded.

This paper investigates model estimation without imposing condition (4.3.8). It is shown that the method of random equilibrium selection in Bajari et al. (2010b) and (2010c) can be extended to this general framework for social interactions. Suppose that an equilibrium $\xi^e$ is selected from the set of equilibria, $\mathcal{E}(X, W_n)$, according to the probability measure
\[ \mu^e(\gamma(\cdot; X, W_n), \alpha), \tag{4.3.9} \]
where $\gamma(\cdot; X, W_n)$ is a vector-valued criterion function whose coordinates correspond to different criteria, such as Pareto efficiency and maximal entry rate. $\alpha$ is a parameter vector, attaching weights to different criteria. Then the full likelihood for $y = (y_1, \cdots, y_n)'$ can be written as
\[ L(y; X, W_n) = \int_{\mathcal{E}(X, W_n)} \prod_{i=1}^n f(y_i \mid \xi^e, X, W_n)\, d\mu^e(\gamma(\xi^e; X, W_n), \alpha), \tag{4.3.10} \]
where we apply the independence of the $y_i$'s owing to the independence across the i.i.d. idiosyncratic shocks.

A practical specification depends on characterization of the set of equilibria. Although it is well established that there are a finite number of equilibria for the finite-player discrete choice game analyzed by Bajari et al. (2010a), (2010b), and (2010c), as far as I know, there are no conditions in the literature that can be directly and easily applied to our model framework (see Khan and Sun (2002) for a survey of related theories). Therefore, we investigate the set of equilibria in this model and derive conditions specific to our framework.
As it turns out, the equilibrium sets have different characteristics under different information structures, according to whether exogenous characteristics are private information or not, so we discuss those two scenarios separately.

4.4 Estimation with Publicly Known Characteristics

We begin our discussion with the case in which all exogenous covariates are public information and only the idiosyncratic shocks are privately known. In this case, there are just two types of exogenous characteristics, $X^g$ and the $X_i^c$'s. Since all expectations are based on public information, every agent other than $i$ will form the same expectations on $i$'s behavior. Therefore, an equilibrium conditional expectation function $\xi^e$ reduces to an $n \times 1$ vector, $\xi^e = (\xi_1^e, \cdots, \xi_n^e)'$. In this case, (4.3.7) can be rewritten as
\[ \xi_i^e = H_i\Big( u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j^e \Big), \tag{4.4.1} \]
for all $i = 1, \cdots, n$. Given exogenous covariates, $X$, and social matrix, $W_n$, define $S : \mathbb{R}^n \to \mathbb{R}^n$ such that for all $i = 1, \cdots, n$,
\[ S(\xi; X, W_n)_i = H_i\Big( u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j \Big) - \xi_i. \tag{4.4.2} \]
$\xi^e \in \mathbb{R}^n$ is an equilibrium conditional expectation vector if and only if $S(\xi^e; X, W_n) = 0$. Therefore, for a group of $n$ agents with publicly known exogenous covariates, $X$, and social relations, $W_n$, the set of equilibria, $\mathcal{E}(X, W_n)$, can be described as the set of solutions to this system of nonlinear equations. That is,
\[ \mathcal{E}(X, W_n) = \{ \xi \in \mathbb{R}^n : S(\xi; X, W_n) = 0 \}. \tag{4.4.3} \]

4.4.1 Characterization of the Equilibrium Set

In this section, we characterize the set of equilibria, $\mathcal{E}(X, W_n)$, by inspecting solutions to $S(\xi; X, W_n) = 0$. Solving equations is one of the central topics in mathematics, and there are a myriad of well-established results about it. For this particular problem, we employ oriented intersection theory to analyze solutions to the equation system (4.4.2), aiming to derive conditions specific to this model framework.
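As a purely numerical illustration of the system $S(\xi; X, W_n) = 0$ in (4.4.2) and (4.4.3), the following Python sketch searches for equilibria of Binary Choice Model I ($H_i = F_\epsilon$) by running Newton's method from a grid of starting points. Everything specific here is an assumption made for illustration and is not taken from the text: the logistic shock distribution, the two-agent network, the payoff values, and all function names. Unlike an all-solutions method, a multi-start Newton search carries no guarantee of finding every equilibrium.

```python
import math
from itertools import product

def cdf(a):
    """Logistic CDF (an assumed shock distribution), stable for large |a|."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def index(i, xi, u, W, lam):
    """u(X_i) + lambda * sum_{j != i} W_ij * xi_j."""
    return u[i] + lam * sum(W[i][j] * xi[j] for j in range(len(xi)) if j != i)

def S(xi, u, W, lam):
    """S(xi)_i = H_i(index_i) - xi_i with H_i = F (Binary Choice Model I)."""
    return [cdf(index(i, xi, u, W, lam)) - xi[i] for i in range(len(xi))]

def DS(xi, u, W, lam):
    """Jacobian of S: lambda * f(index_i) * W_ij off the diagonal, -1 on it."""
    n = len(xi)
    rows = []
    for i in range(n):
        p = cdf(index(i, xi, u, W, lam))
        dens = p * (1.0 - p)  # logistic pdf evaluated at index_i
        rows.append([-1.0 if j == i else lam * dens * W[i][j] for j in range(n)])
    return rows

def solve(A, b):
    """Gaussian elimination with partial pivoting; None if A is singular."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-14:
            return None
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def newton(start, u, W, lam, tol=1e-12, max_iter=100):
    """Newton iterations on S; returns a zero of S or None if none is reached."""
    xi = list(start)
    for _ in range(max_iter):
        s = S(xi, u, W, lam)
        if max(abs(v) for v in s) < tol:
            return xi
        step = solve(DS(xi, u, W, lam), [-v for v in s])
        if step is None:
            return None
        xi = [x + d for x, d in zip(xi, step)]
    return None

def find_equilibria(u, W, lam, grid=5):
    """Distinct zeros of S reached by Newton from a grid of starts in [0, 1]^n."""
    found = []
    points = [k / (grid - 1) for k in range(grid)]
    for start in product(points, repeat=len(u)):
        sol = newton(start, u, W, lam)
        if sol is not None and all(
            max(abs(a - b) for a, b in zip(sol, old)) > 1e-6 for old in found
        ):
            found.append(sol)
    return sorted(found)

def contraction_bound(W, lam, sup_dH=0.25):
    """Left-hand side of (4.3.8); sup_dH = 0.25 is the logistic pdf maximum."""
    return abs(lam) * max(sum(abs(w) for w in row) for row in W) * sup_dH

# Hypothetical two-agent group with symmetric links and payoffs.
W = [[0.0, 1.0], [1.0, 0.0]]
weak = find_equilibria([-1.0, -1.0], W, lam=2.0)
strong = find_equilibria([-3.0, -3.0], W, lam=6.0)
print(contraction_bound(W, 2.0), len(weak))
print(contraction_bound(W, 6.0), len(strong))
```

In this toy group, the weak-interaction case satisfies the contraction condition (4.3.8), while the strong-interaction case violates it, so the two calls contrast a setting where uniqueness is guaranteed with one where multiple equilibria can arise.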
The application of differential topology to equilibrium characterization is not new in economic studies. For example, Debreu (1970) and Dierker (1972) used these theories to analyze the set of competitive equilibria in an economy. The key idea of this approach is to deform $S(\cdot; X, W_n)$ and connect it to a function with a simpler form in a "smooth" way, which is called a homotopy. Implications about the set of zeros of $S(\cdot; X, W_n)$ are then derived from the set of zeros of that simpler function. Garcia and Zangwill (1981) provide an intuitive introduction and basic results for this method. In this paper, we utilize some more general results from the textbook by Guillemin and Pollack (1974) and construct homotopies tailored to our model framework. In this way, several properties of the set of equilibria and, especially, a new sufficient condition for the existence of a unique equilibrium can be derived. We present our main results and leave technical proofs to Appendix C.2.

Define the function $T(\cdot; X, W_n) : \mathbb{R}^n \to \mathbb{R}^n$ such that
\[ T(\xi; X, W_n)_i = H_i\Big( u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j \Big), \tag{4.4.4} \]
for all $i = 1, \cdots, n$. We can see that $\xi$ is a solution to $S(\xi; X, W_n) = 0$ if and only if $\xi$ is a fixed point of $T$. The rate at which the Euclidean norm $\|T(\xi)\|_E$ grows relative to $\|\xi\|_E$ is crucial for the existence of an equilibrium.

Assumption 4.4.1 For any group, $(X, W_n)$, there is a real number $b < 1$ such that³⁹
\[ \lim_{\|\xi\|_E \to \infty} \|T(\xi; X, W_n)\|_E / \|\xi\|_E = b. \tag{4.4.5} \]

Under Assumption 4.4.1 and an easy-to-satisfy regularity condition, the conclusions in Proposition 4.4.1 hold.⁴⁰

Proposition 4.4.1 Under Assumptions 4.3.1, 4.3.2, C.2.1, for any regular social group, $(X, W_n)$, if, in addition, Assumption 4.4.1 holds, there is $r_0(X, W_n) > 0$ such that all equilibria are within the open ball $B(0, r_0(X, W_n)) = \{\xi \in \mathbb{R}^n : \|\xi\|_E < r_0(X, W_n)\}$ and the number of equilibria is finite.

Proof. See Appendix C.2.

Finiteness comes from regularity.
Regularity implies that all equilibria are isolated. As a result, in the closed ball $B[0, r_0(X, W_n)]$, which is compact and contains $B(0, r_0(X, W_n))$, the number of equilibria is finite. Moreover, we can derive a new condition for the existence of a unique equilibrium.

Proposition 4.4.2 Under Assumptions 4.3.1, 4.3.2, C.2.1, and 4.4.1, for a regular group, the total number of equilibria is odd. In addition, if the Jacobian determinant, $\det(DS(\xi; X, W_n))$, does not change its sign in the ball $B(0, r_0(X, W_n))$ which contains all equilibria, there is a unique equilibrium.

Proof. See Appendix C.2.

Recall that in Yang and Lee (2014), the sufficient condition for a unique equilibrium is (4.3.8). The following lemma shows that condition (4.3.8) is stronger than the condition in Proposition 4.4.2.

Lemma 4.4.1 When (4.3.8) holds, $\mathrm{sgn}(\det(DS(\xi; X, W_n))) = (-1)^n$ for all $\xi \in \mathbb{R}^n$.

Proof. See Appendix C.2.

³⁹ The subscript "E" denotes the Euclidean norm.

⁴⁰ See Appendix C.2 for detailed discussions about the meaning and implications of the regularity condition. Additionally, it is shown that for models where $u(\cdot)$ is linear in $X_i^c$ with a non-zero slope, if $dH_i(a)/da \neq 0$ for any $i$ and $a \in \mathbb{R}^1$, almost all groups, $(X, W_n)$, satisfy the regularity condition.

The above characterization of the equilibrium set hinges on Assumption 4.4.1. It is easy to see that this condition is satisfied when the $H_i(\cdot)$'s are uniformly bounded. Therefore, the above results apply to models with binary choices, two-sided censored outcomes, and ordered multiple choices. The condition also holds for unbounded (piecewise) continuous choices in which the magnitude of $H_i(a)$ increases with the magnitude of $a \in \mathbb{R}^1$, as long as the rate of increase is not too large. A case in point is the example of investment choices.
Since the elasticity coefficient in the Cobb-Douglas production function, $\iota$, is between 0 and 1, using Jensen's inequality, we have that
\[
\begin{aligned}
|T(\xi)_i| &= A L_i^{1-\iota} \int \Big( \max\Big\{ u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j - \epsilon_i,\; 0 \Big\} \Big)^{\iota} f_\epsilon(\epsilon_i)\, d\epsilon_i \\
&\le A L_i^{1-\iota} \Big[ \int \max\Big\{ u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j - \epsilon_i,\; 0 \Big\} f_\epsilon(\epsilon_i)\, d\epsilon_i \Big]^{\iota} \\
&= A L_i^{1-\iota} \Big[ H_T\Big( u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j \Big) \Big]^{\iota},
\end{aligned}
\]
where $H_T(a) = aF_\epsilon(a) - \int_{c<a} c f_\epsilon(c)\, dc$ is the $H(\cdot)$ function corresponding to the Tobit model. When $E[\epsilon] < \infty$, $H_T(a)/|a| \le 1$ when $|a|$ is sufficiently large. For any $i$,
\[
\frac{|T(\xi)_i|}{\|\xi\|_E} \le A L_i^{1-\iota} \left( \frac{H_T\big( u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j \big)}{\big| u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j \big|} \right)^{\iota} \left( \frac{\big| u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j \big|}{\|\xi\|_E} \right)^{\iota} \cdot \frac{\|\xi\|_E^{\iota}}{\|\xi\|_E}.
\]
When $|\lambda| \|W_n\|_\infty < \infty$, the right-hand side goes to zero as $\|\xi\|_E$ goes to infinity, for $0 < \iota < 1$.⁴¹ Therefore, we derive a condition which can guarantee the existence of a pure strategy BNE with unbounded piecewise continuous choices and non-compact private shocks. Our condition, Assumption 4.4.1, therefore, is complementary to sufficient conditions for the existence of an equilibrium for games with private information under general game settings (see Khan and Sun (2002) for a research survey and Khan and Zhang (2014) for a recent improvement).

⁴¹ When $\|\xi\|_E \to \infty$, $|\xi_j| \to \infty$ for at least one $j$. If $|\xi_k| < \infty$ for all $k$ with $W_{n,ik} > 0$, $\left( H_T\big( u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j \big) / \|\xi\|_E \right)^{\iota}$ goes to 0 as $\|\xi\|_E$ goes to infinity. Otherwise, $\left( H_T\big( u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j \big) / \big| u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j \big| \right)^{\iota}$ is bounded by 1 when $\|\xi\|_E$ is large enough. As $\big| u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j \big| \le |u(X_i)| + |\lambda| \|W_n\|_\infty \|\xi\|_E$, when $|\lambda| \|W_n\|_\infty < \infty$, $\left( \big| u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j \big| / \|\xi\|_E \right)^{\iota}$ is also bounded.

However, in some models, to ensure that Assumption 4.4.1 holds, we need to impose restrictions on the range of $\lambda$, which may sometimes be stringent. The linear model with continuous choices and the Tobit model are such examples. We discuss how to characterize equilibria in those models. First, for continuous choices, $H_i(a) = a$ for all $i = 1, \cdots, n$ and $a \in \mathbb{R}$.
$S(\cdot; X, W_n) = 0$ is actually a linear equation system:
\[ S(\xi; X, W_n) = u + \lambda W_n \xi - \xi = 0, \tag{4.4.6} \]
where $u = (u(X_1), \cdots, u(X_n))'$ and $\xi = (\xi_1, \cdots, \xi_n)'$. For a regular group, $DS(\xi; X, W_n) = \lambda W_n - I_n$ is non-singular. Therefore, (4.4.6) has one and only one solution in that case. That is to say, for a regular group, in the linear model of socially interacted continuous choices, there is one and only one equilibrium.

For the Tobit model with homogeneous cutoffs which are normalized to be equal to 0, $H_i(a) = H(a) = aF_\epsilon(a) - \int_{c<a} c f_\epsilon(c)\, dc$ is unbounded and strictly increasing. $S(\cdot; X, W_n) = 0$ is a system of nonlinear equations:
\[ S(\xi; X, W_n)_i = H\Big( u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, \xi_j \Big) - \xi_i, \tag{4.4.7} \]
for $i = 1, \cdots, n$. Define $p_i = E[I(y_i > 0)]$, for all $i = 1, \cdots, n$. There is a one-to-one correspondence between $p$ and $\xi$, for $\xi_i = E[y_i] = H(F_\epsilon^{-1}(p_i))$ holds for all $i = 1, \cdots, n$. Then (4.4.7) can be written as
\[ F_\epsilon\Big( u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, H(F_\epsilon^{-1}(p_j)) \Big) - p_i = 0, \tag{4.4.7$'$} \]
for $i = 1, \cdots, n$. This is a system of equations about $p$. Because $p \in [0, 1]^n$ and $F_\epsilon(\cdot)$ is bounded, we can apply Propositions 4.4.1 and 4.4.2 to the Tobit model and find an odd number of equilibria within a ball $B(0, r)$ with $r > 1$. Similarly, if the cutoffs are heterogeneous and are modeled as $v(X^g, X_i^c)$, we have that $\xi_i = E[y_i] = H(F_\epsilon^{-1}(p_i) + v(X^g, X_i^c))$. Then the counterpart to (4.4.7) is
\[ F_\epsilon\Big( u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\, H\big(F_\epsilon^{-1}(p_j) + v(X^g, X_i^c)\big) - v(X^g, X_i^c) \Big) - p_i = 0. \tag{4.4.7$'$} \]

4.4.2 Selection Rule and Complete Likelihood

When there are a finite number of equilibria for a group, to attach a probability of equilibrium selection is to attach a probability mass to each point in this set. Following Bajari et al. (2010b) and (2010c), the probability masses are associated with some selection criteria and parameters. To be specific, let $\gamma(\xi^e, X, W_n) = (\gamma_1(\xi^e, X, W_n), \cdots, \gamma_L(\xi^e, X, W_n))'$ be a vector composed of 0's and 1's representing equilibrium properties.
For example, $\gamma_1(\xi^e, X, W_n) = 1$ if $\xi^e$ is Pareto dominated by another equilibrium, and $\gamma_1(\xi^e, X, W_n) = 0$ otherwise. $\gamma_2(\xi^e, X, W_n) = 1$ if the equilibrium expected utility is maximal in $\xi^e$. For the model with binary choices, we can make $\gamma_3(\xi^e, X, W_n) = 1$ if the number of agents who choose 1 is bigger in $\xi^e$ than in any other equilibrium. Similarly, for the Tobit model, we can make $\gamma_4(\xi^e, X, W_n) = 1$ if the number of agents whose behaviors are not censored is maximized at $\xi^e$. The $\gamma_l(\xi^e, X, W_n)$'s can also depend continuously on the properties of an equilibrium, say the equilibrium total expected utilities. Let $\alpha \in \mathbb{R}^L$ denote the weight. Suppose that, given a set of equilibria, $\mathcal{E}(X, W_n)$, an equilibrium is picked randomly according to the following random rule: $\xi^{e,l}$ is selected if
\[ \alpha' \gamma(\xi^{e,l}, X, W_n) + s_l \ge \alpha' \gamma(\xi^{e,l'}, X, W_n) + s_{l'}, \tag{4.4.8} \]
for any $\xi^{e,l'} \in \mathcal{E}(X, W_n)$ and $l \neq l'$, where the $s_l$'s are i.i.d. equilibrium-specific shocks with type-I extreme value distribution. Therefore, the probability that $\xi^{e,l}$ is selected is
\[ \rho(\xi^{e,l}; \mathcal{E}(X, W_n), \alpha) = \frac{\exp(\alpha' \gamma(\xi^{e,l}, X, W_n))}{\sum_{\xi^{e,l'} \in \mathcal{E}(X, W_n)} \exp(\alpha' \gamma(\xi^{e,l'}, X, W_n))}. \tag{4.4.9} \]
Then the complete likelihood function for an outcome $y = (y_1, \cdots, y_n)'$ in a social group is as follows,
\[ L(y; X, W_n) = \sum_{\xi^e \in \mathcal{E}(X, W_n)} \rho(\xi^e; \mathcal{E}(X, W_n), \alpha) \prod_{i=1}^n f(y_i \mid \xi^e), \tag{4.4.10} \]
which is the basis for identification and estimation.

4.4.3 Identification

Our analysis of identification is based on the parametric and distributional assumptions below. First, we assume that the payoff function, $u(\cdot)$, is linear in covariates. Since all exogenous characteristics are public information in this section, we only need to consider $X^g$ and $X^c$.

Assumption 4.4.2 $u(X_i) = \beta_{0,0} + X^{g\prime} \beta_{0,1} + X_i^{c\prime} \beta_1$ for all $i = 1, \cdots, n$.

Assumption 4.4.3 The pdf for the i.i.d. idiosyncratic shocks, $\epsilon_i$'s, is $f_\epsilon(\cdot; \sigma)$ with a known functional form and an unknown parameter, $\sigma > 0$.
Assumption 4.4.4 $X_i^c \in \mathbb{R}^L$ and the $\epsilon_i$'s have full support.

For interactions within one group, the group characteristics, $X^g$, are absorbed by the constant term. Therefore, we set $\beta_{0,1} = 0$ for now.

Definition 4.4.1 Given social relations, $W = \overline{W}_n$, $(\alpha, \beta, \lambda, \sigma)$ and $(\widetilde{\alpha}, \widetilde{\beta}, \widetilde{\lambda}, \widetilde{\sigma})$ are observationally equivalent at $\overline{W}_n$ if they imply the same distribution of observables, namely,
\[ F_{Y,X \mid \overline{W}_n}(\cdot, \cdot; \alpha, \beta, \lambda, \sigma) = F_{Y,X \mid \overline{W}_n}(\cdot, \cdot; \widetilde{\alpha}, \widetilde{\beta}, \widetilde{\lambda}, \widetilde{\sigma}). \tag{4.4.11} \]
$(\alpha, \beta, \lambda, \sigma)$ is identifiable at $\overline{W}_n$ if any $(\widetilde{\alpha}, \widetilde{\beta}, \widetilde{\lambda}, \widetilde{\sigma}) \neq (\alpha, \beta, \lambda, \sigma)$ is not observationally equivalent to $(\alpha, \beta, \lambda, \sigma)$ at $\overline{W}_n$.

Different functions $h_i(\cdot)$ correspond to different types of behaviors. In this paper, we concentrate on identifying model parameters for three of them: the linear model for continuous choices, binary choices, and the Tobit model for censored outcomes. Those three models are representative in terms of the relationships between the observed outcomes and the latent variables. For the linear model with continuous choices, because there is a unique equilibrium for almost all groups, we can identify $\beta$, $\sigma$ and $\lambda$ from socially interacted outcomes without worrying about equilibrium selection.⁴² For the binary choice model and the Tobit model, however, given the possibility of equilibrium multiplicity, the distribution of outcomes will depend on the probabilities of equilibrium selection and the distribution of outcomes in an equilibrium. In a nonparametric setting, Aguirregabiria and Mira (2013) identify structural parameters via exclusion restrictions for a game with discrete choices. In this parametric setting for social interactions, the technique of identification at infinity is employed to identify model parameters.

⁴² As a consequence, the parameter for the selection rule, $\alpha$, is not identifiable in this case.
Proposition 4.4.3 In the linear model of continuous choices, for a group with social relations $\overline{W}_n$, denote $\widetilde{X}_i = (1, \sum_{j \neq i} \overline{W}_{n,ij}, X_i^{c\prime}, \sum_{j \neq i} \overline{W}_{n,ij} X_j^{c\prime})'$ and $\widehat{X}_i = (1, X_i^{c\prime}, \sum_{j \neq i} \overline{W}_{n,ij} X_j^{c\prime})'$. Under Assumptions 4.4.2, 4.4.3, and 4.4.4, $\beta_{0,0}$, $\beta_1$, $\sigma$ and $\lambda$ can be identified if
\[ \min_{1 \le i \le n} \min \mathrm{eig}\, E[\widetilde{X}_i \widetilde{X}_i'] > 0; \tag{4.4.12} \]
or
\[ \min_{1 \le i \le n} \min \mathrm{eig}\, E[\widehat{X}_i \widehat{X}_i'] > 0, \tag{4.4.12$'$} \]
when $\overline{W}_n$ is row-normalized, where $\min \mathrm{eig}(\cdot)$ is the minimal eigenvalue of the corresponding matrix. Since there is only one equilibrium in the linear model for almost all groups, $\alpha$ is not identified in this case.

Proof. See Appendix C.3.

For the payoff parameters, $\beta$, $\lambda$ and $\sigma$, in the binary choice model and the Tobit model, we make the following assumptions on the model coefficients.

Assumption 4.4.5 $\beta_{1,l} \neq 0$ for some $l \in \{1, \cdots, L\}$. Without loss of generality, suppose that $l = 1$.

Assumption 4.4.6 The i.i.d. idiosyncratic shocks, $\epsilon_i$'s, are distributed according to a location-scale family with a pdf, $f_\epsilon(c) = (1/\sigma) f_s((c - \mu_\epsilon)/\sigma)$, where $f_s(\cdot)$ is some known standard distribution pdf, and $\mu_\epsilon$ and $\sigma$ are the location and scale parameters. Normalize $\mu_\epsilon = 0$ and $\sigma = 1$.

Lemma 4.4.2 Under Assumptions 4.4.2, 4.4.4, and 4.4.5, for any $i$ and $\omega_{-i} \in \{0, 1\}^{n-1}$, there is a subset $\mathcal{X}^c(\omega) \subseteq \mathbb{R}^{nL}$ such that $P(X^c \in \mathcal{X}^c(\omega)) > 0$ and
\[ \lim_{|X^c_{j,1}| \to \infty,\; j \neq i,\; X^c \in \mathcal{X}^c(\omega)} P(y_{-i} = \omega \mid X^c) = 1. \]

Proof. See Appendix C.3.

Proposition 4.4.4 In the model of binary choices, for a group with social relations $\overline{W}_n$, for any $\omega \in \{0, 1\}^{n-1}$, denote $\widetilde{X}(\omega)_i = (1, X_i^{c\prime}, \sum_{j \neq i} \overline{W}_{n,ij}\, \omega_j)'$. Under Assumptions 4.4.2, 4.4.4, 4.4.5, and 4.4.6, $\beta$ and $\lambda$ can be identified if, for some non-zero vector $\omega_0 \in \{0, 1\}^{n-1}$, there is $D_0 > 0$ such that
\[ \inf_{D \ge D_0} \min \mathrm{eig}\, E\big[ \widetilde{X}(\omega)_i \widetilde{X}'(\omega)_i \,\big|\, X^c \in \mathcal{X}^c(\omega),\; |X^c_{j,1}| \ge D,\; j \neq i \big] > 0. \tag{4.4.13} \]

Proof. See Appendix C.3.

As for the Tobit model, we still use the technique of "identification at infinity" for parameters of the payoff function.
However, since uncensored outcomes are continuous, we cannot fix any uncensored outcomes as a result of a dominant strategy. Nonetheless, when no outcomes are censored, the distribution of interacted outcomes will be similar to that of continuous choices in a linear model, where there is only one equilibrium. Therefore, we can identify parameters for payoffs and shock distributions separately from the parameters for equilibrium selection. Additionally, unlike the binary choice model, where those parameters are identified up to scale, in the Tobit model $\sigma$ can be identified based on the following relationship between the average (individual) outcomes and the censoring rate found by Yang, Qu and Lee (2014):
\[ E[y_i \mid X^c] = E[I(y_i > 0) \mid X^c]\; F_\epsilon^{-1}\big( E[I(y_i > 0) \mid X^c]; \sigma \big) - \int_{c < F_\epsilon^{-1}( E[I(y_i > 0) \mid X^c]; \sigma )} c f_\epsilon(c)\, dc. \tag{4.4.14} \]
Because (4.4.14) holds for any individual under every equilibrium, it can be used to identify $\sigma$ regardless of equilibrium multiplicity. To utilize this relationship, we impose the assumption below:

Assumption 4.4.7 $f_\epsilon(\cdot; \sigma)$ is differentiable with respect to $\sigma$, $\lim_{c \to \infty} c\,(dF_\epsilon(c; \sigma)/d\sigma) = 0$, and $\frac{\partial F_\epsilon(c; \sigma)/\partial c}{f_\epsilon(c; \sigma)}$ is strictly monotonic with respect to $c$.

Lemma 4.4.3 Under Assumptions 4.4.2, 4.4.4, and 4.4.5, there is a subset $\mathcal{X}^c_1 \subseteq \mathbb{R}^{nL}$ such that $P(X^c \in \mathcal{X}^c_1) > 0$ and
\[ \lim_{|X^c_{j,1}| \to \infty,\; 1 \le j \le n,\; X^c \in \mathcal{X}^c_1} P(y_j = 1,\; 1 \le j \le n \mid X^c) = 1. \]

Proof. See Appendix C.3.

Proposition 4.4.5 In the Tobit model, for a group with social relations $\overline{W}_n$, denote $\widetilde{X}_i = (1, \sum_{j \neq i} \overline{W}_{n,ij}, X_i^{c\prime}, \sum_{j \neq i} \overline{W}_{n,ij} X_j^{c\prime})'$ and $\widehat{X}_i = (1, X_i^{c\prime}, \sum_{j \neq i} \overline{W}_{n,ij} X_j^{c\prime})'$. Under Assumptions 4.4.2, 4.4.4, and 4.4.5, $\beta_{0,0}$ and $\lambda$ can be identified if there is $D_0 > 0$ with
\[ \inf_{D \ge D_0} \min_{1 \le i \le n} \min \mathrm{eig}\, E\big[ \widetilde{X}_i \widetilde{X}_i' \,\big|\, \mathcal{X}^c_1,\; |X^c_{j,1}| \ge D,\; 1 \le j \le n \big] > 0; \tag{4.4.15} \]
or
\[ \inf_{D \ge D_0} \min_{1 \le i \le n} \min \mathrm{eig}\, E\big[ \widehat{X}_i \widehat{X}_i' \,\big|\, \mathcal{X}^c_1,\; |X^c_{j,1}| \ge D,\; 1 \le j \le n \big] > 0, \tag{4.4.15$'$} \]
when $\overline{W}_n$ is row-normalized. If, in addition, Assumption 4.4.7 is satisfied, $\sigma$ is identified.

Proof. See Appendix C.3.
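To illustrate how relationship (4.4.14) pins down $\sigma$, the sketch below works through the special case of normally distributed shocks, which is an assumption made here for illustration only. With $\epsilon \sim N(0, \sigma^2)$ and censoring at zero, writing $p = E[I(y_i > 0)]$ and $z = \Phi^{-1}(p)$, (4.4.14) becomes $E[y_i] = \sigma\,(p z + \phi(z))$, so $\sigma$ can be solved for directly. The function names and the simulated design are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf Phi, pdf phi, inv_cdf Phi^{-1}

def sigma_from_moments(mean_y, share_uncensored):
    """Solve (4.4.14) for sigma, assuming N(0, sigma^2) shocks.

    Under that assumption F^{-1}(p; sigma) = sigma * z with z = Phi^{-1}(p),
    and the integral term equals -sigma * phi(z), so
    E[y] = sigma * (p * z + phi(z)).
    Requires 0 < share_uncensored < 1.
    """
    z = STD_NORMAL.inv_cdf(share_uncensored)
    return mean_y / (share_uncensored * z + STD_NORMAL.pdf(z))

def simulate_tobit_moments(a, sigma, draws, seed=0):
    """Simulate y = max(a - eps, 0) and return (mean of y, share with y > 0)."""
    rng = random.Random(seed)
    ys = [max(a - rng.gauss(0.0, sigma), 0.0) for _ in range(draws)]
    mean_y = sum(ys) / draws
    share = sum(1 for y in ys if y > 0) / draws
    return mean_y, share

# Hypothetical design: latent index a = 0.8 and true sigma = 2.0.
mean_y, share = simulate_tobit_moments(a=0.8, sigma=2.0, draws=200_000, seed=1)
sigma_hat = sigma_from_moments(mean_y, share)
print(round(sigma_hat, 3))
```

The point of the sketch is that only the mean outcome and the share of uncensored outcomes enter the calculation, which is why the relationship can be used without knowing which equilibrium was selected.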
After the parameters of the payoff function $u(\cdot)$ are identified, we identify $\alpha$ given $(\beta, \sigma, \lambda)$. Suppose that for $(\beta, \sigma, \lambda)$, for group $(X, W_n)$, there are $D$ equilibria. Denote their values under the criterion $\gamma(\cdot; X, W_n)$ by $\gamma(\xi^d; X, \overline{W}_n) \in \mathbb{R}^q$. Stacking them together, we get a $D \times q$ matrix:
\[ \Gamma(X^c, \overline{W}_n; \beta, \sigma, \lambda) = \big( \gamma'(\xi^1; X, \overline{W}_n), \cdots, \gamma'(\xi^D; X, \overline{W}_n) \big)'. \]

Proposition 4.4.6 In the binary choice (Tobit) model, with social relations $\overline{W}_n$, under the assumptions in Proposition 4.4.4 (Proposition 4.4.5), $\alpha$ can be identified if $E[\Gamma'(X^c, \overline{W}_n; \beta, \sigma, \lambda)\, \Gamma(X^c, \overline{W}_n; \beta, \sigma, \lambda) \mid X^c]$ has full column rank for any $X^c$.

Proof. See Appendix C.3.

When there are different groups, $\beta_{0,1}$ can be identified from variations across groups.

4.4.4 Computation and Estimation

With a parametric selection probability distribution, it is natural to derive parameter estimates by maximizing the complete likelihood function, (4.4.10). However, that requires computation of all the equilibria, which is a challenging issue both theoretically and numerically. According to Garcia and Zangwill (1981), that ambitious goal is achievable for a class of problems by the homotopy continuation method (simply, the homotopy method). To be specific, suppose that, in complex spaces, $S : \mathbb{C}^n \to \mathbb{C}^n$ is analytic. To find $\{\widetilde{c} : S(\widetilde{c}) = 0\}$, we construct the following homotopy:
\[ R_i(\widetilde{c}, t) = (1 - t)(\widetilde{c}_i^{\,q_i} - 1) + t\, S_i(\widetilde{c}), \tag{4.4.16} \]
for $i = 1, \cdots, n$ and $0 \le t \le 1$. $q_i$ is a positive integer. For $t = 0$, $R_i(\widetilde{c}, 0) = \widetilde{c}_i^{\,q_i} - 1$, which is a polynomial with $q_i$ solutions. For $t = 1$, $R_i(\widetilde{c}, 1) = S_i(\widetilde{c})$. Separate the real and imaginary parts of variables and functions. That is, $\widetilde{c} = (\widetilde{c}^R, \widetilde{c}^I)$ and $R_i(\widetilde{c}, t) = (R_i^R(\widetilde{c}, t), R_i^I(\widetilde{c}, t))$. The function $R^*$ then has $2n$ coordinates. Re-parametrize the system by $\omega$, such that $\widetilde{c} = \widetilde{c}(\omega)$ and $t = t(\omega)$.⁴³ Then we get $R_i^R(\widetilde{c}, t) = 0$ and $R_i^I(\widetilde{c}, t) = 0$ for all $i = 1, \cdots, n$. Denote this system as $R^* = (R_1^R, R_1^I, \cdots, R_n^R, R_n^I)$.
Taking derivatives, we get that
\[ DR^*(y)\, Dy = \frac{\partial R^*}{\partial \widetilde{c}}\, D\widetilde{c}(\omega) + \frac{\partial R^*}{\partial t}\, Dt(\omega) = DR^*(\widetilde{c}, t) \begin{pmatrix} D\widetilde{c}(\omega) \\ Dt(\omega) \end{pmatrix} = 0, \tag{4.4.17} \]
where $y = (\widetilde{c}', t)'$. It is shown by Garcia and Zangwill (1981) that the above system can be solved through the "basic differential equations" (BDE):
\[ y_i'(\omega) = (-1)^i \det DR^*_{-i}(y), \tag{4.4.18} \]
for $i = 1, \cdots, 2n + 1$, starting at the $(\widetilde{c}_0, t_0)$'s with $R^*(\widetilde{c}_0, t_0) = 0$. $DR^*_{-i}(y)$ is the Jacobian $DR^*(y)$ with the $i$-th column removed. The above calculation is possible if $DR^*(\widetilde{c}, t)$ is of full row rank for all $(\widetilde{c}, t)$ with $R^*(\widetilde{c}, t) = 0$, which is called the "regularity" condition. If, in addition, the homotopy is also "path finite" (that is, for any $t$, any $\widetilde{c}$ satisfying $R^*(\widetilde{c}, t) = 0$ never goes to infinity), solving the BDEs is guaranteed to yield all solutions to $S(\widetilde{c}) = 0$. Similar to the previous discussions about regular groups, the regularity condition is satisfied in general.⁴⁴ A sufficient condition for path-finiteness is that the limit
\[ \lim_{\|\widetilde{c}_i\| \to \infty} \frac{S(\widetilde{c})_i}{\widetilde{c}_i^{\,q_i - 1}} \tag{4.4.19} \]
is not a pure real negative number, for all $i$. In particular, the path-finite condition is satisfied when the above limit is equal to zero.

⁴³ This re-parametrization is important. With multiple equilibria, any fixed $t$ may correspond to different $\widetilde{c}$'s.

To apply this theory, we first need to extend $S(\xi; X, W_n)$ from the real line to complex spaces and make sure its extension is analytic. This extension is important in order to ensure that all solutions to $S(\xi; X, W_n) = 0$ can be found in this way. That is because, for analytic functions in complex spaces, we can make sure that $\det(\partial R^*(\widetilde{c}, t)/\partial \widetilde{c}) \ge 0$.⁴⁵ It then follows from (4.4.18) that $t(\omega)$ is monotonic. That is, on any path, $t$ is monotonic and never turns back. As a result, tracing backward from any point in $\{\widetilde{c} : R^*(\widetilde{c}, 1) = S(\widetilde{c}) = 0\}$, a path goes down directly to a zero of $R^*(\widetilde{c}, 0)$ and never goes up.
That ensures that every zero of $S(\widetilde{c})$ can be connected through a path to one of the zeros of the homotopic function, $R^*(\widetilde{c}, 0)$. If we restrict to the real line, however, $\det(\partial R^*(\widetilde{c}, t)/\partial \widetilde{c})$ can be positive, negative, or zero. In that case, paths can reverse and some zeros of $S(\widetilde{c})$ may not be reached. One example can be found in Garcia and Zangwill (1981).

This extension is crucial but not straightforward for some economic applications. Bajari et al. (2010c) show that when the powers of the polynomials, the $q_i$'s, are sufficiently large, the homotopy method can be used to derive all solutions for the multinomial choice model. For the framework in this paper, in contrast, it is relatively easy to extend $S(\xi; X, W_n)$ for many classes of models. Inspecting the models listed in the paper, most of the $H_i(\cdot)$'s are defined by integrals. Noticing that $G(\widetilde{a}) = \int_{a_0}^{\widetilde{a}} g(\widetilde{c})\, d\widetilde{c}$ is analytic when $g(\cdot)$ is analytic in a simply-connected region, we can derive analytic extensions for the models listed in this paper. For example, when the idiosyncratic shocks are normally distributed, the density, $f_\epsilon(\cdot)$, is analytic. For the CDF, we can write $F_\epsilon(a) = \int_{c<a} f_\epsilon(c)\, dc = \int_{a_0}^{a} f_\epsilon(c)\, dc + F_\epsilon(a_0)$, for some real number $a_0 > 0$. The extension $\widehat{F}_\epsilon(\widetilde{a}) = \int_{a_0}^{\widetilde{a}} f_\epsilon(\widetilde{c})\, d\widetilde{c} + F_\epsilon(a_0)$ is analytic.⁴⁶ Since sums, products, and composites of analytic functions are analytic, we can derive analytic extensions for other models in a similar way. For the model of investment decisions under Cobb-Douglas technology, there is a power function, which can be multivalued in the complex plane. We choose one sheet in that case.

⁴⁴ This result, as well as that for regular groups in equilibrium analysis, is a direct result of Sard's theorem. See the textbook written by Guillemin and Pollack (1974) for details.

⁴⁵ This result follows from the Cauchy-Riemann equations.

⁴⁶ The whole space, $\mathbb{C}^2$, is simply-connected. The integral here then does not depend on the path.
It is not hard to check the sufficient condition for "path-finiteness" in the specific model structure of this paper. As the diagonal entries of the social relation matrix, $W_n$, are all zeros, for any $i$, $S(\tilde c)_i$ does not depend on $\tilde c_i$. Hence, the limit in (4.4.19) will be zero when $q_i = 2$.

Solving the "basic differential equations" requires computing the Jacobians, which can increase the computational burden. As a result, Garcia and Zangwill (1979) propose to combine the homotopy continuation method with Newton's method, which can facilitate computation. In practice, there is a computational algorithm, the homotopy algorithm, with a Fortran code suite, HOMPACK90, provided by Watson et al. (1987) and (1997). That algorithm is related to, but not the same as, the homotopy continuation method discussed above. According to Watson et al. (1987), the homotopy algorithm is a globally convergent algorithm and is based on the theory that almost all starting points lead to a zero of a function or, equivalently, a fixed point of a function. Therefore, there is no guarantee that it leads to all solutions to an equation. However, as noted by Borkovsky et al. (2010a) and (2010b), with discretization in computation, the homotopy algorithm can alleviate the problem that the function we are using is non-analytic. In practice, we may compare the performance of both methods.

With $G$ independent groups, the log likelihood of the whole sample can be written as:
\[
\log L(Y_1, \cdots, Y_G \mid \beta, \lambda, \sigma, \alpha) = \sum_{g=1}^{G} \log\Bigg( \sum_{\xi^e \in E(X_g, W_g)} \frac{\exp(\alpha'\gamma(\xi^e; X_g, W_g))}{\sum_{\tilde\xi^e \in E(X_g, W_g)} \exp(\alpha'\gamma(\tilde\xi^e; X_g, W_g))} \prod_{i=1}^{n_g} f(y_{i,g} \mid \xi^e) \Bigg). \tag{4.4.20}
\]
The form of the sample likelihood function follows from two types of independence. First, the groups are independent of each other. Second, because the privately known idiosyncratic shocks are independent, within any group, given an equilibrium, the outcome of a group member is independent of those of other group members.
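The structure of (4.4.20) for a single group can be sketched in Python as follows. This is only an illustration under stated assumptions: outcomes are taken to be binary, so that $f(y_i \mid \xi^e)$ is a Bernoulli probability, the array `p_by_eq` of choice probabilities in each equilibrium is assumed to have been computed already, and the names `selection_probs` and `group_loglik` are hypothetical.

```python
import numpy as np

def selection_probs(alpha, gamma_feats):
    """Logit-type selection probabilities over D equilibria.

    gamma_feats has shape (D, r): row d is gamma(xi^{e,d}; X_g, W_g).
    """
    z = gamma_feats @ alpha
    z = z - z.max()              # stabilise the exponentials
    w = np.exp(z)
    return w / w.sum()

def group_loglik(y, p_by_eq, alpha, gamma_feats):
    """Log of the selection-weighted mixture over equilibria for one group.

    y        : (n,) array of 0/1 outcomes (binary outcomes are an assumption here)
    p_by_eq  : (D, n) array, P(y_i = 1) in equilibrium d
    """
    rho = selection_probs(alpha, gamma_feats)
    # log prod_i f(y_i | xi^{e,d}) for each equilibrium d
    log_f = (y * np.log(p_by_eq) + (1 - y) * np.log1p(-p_by_eq)).sum(axis=1)
    m = log_f.max()
    return m + np.log(np.sum(rho * np.exp(log_f - m)))

# Hypothetical numbers: three agents, two equilibria, one selection feature.
y = np.array([1, 0, 1])
p_by_eq = np.array([[0.8, 0.3, 0.6],
                    [0.2, 0.7, 0.4]])
gamma_feats = np.array([[1.0], [0.0]])
ll = group_loglik(y, p_by_eq, np.array([0.0]), gamma_feats)
```

The sample log likelihood is the sum of such terms over the $G$ independent groups. With $\alpha = 0$ both equilibria receive equal weight, so the value above is $\log(0.5 \cdot 0.336 + 0.5 \cdot 0.024) = \log 0.18$.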
With a large number of independent groups, we can apply conventional large sample theory for maximum likelihood estimation. In practice, as it is computationally intensive to compute all the equilibria, instead of maximizing the sample log likelihood, for any given parameter vector we can compute the set of equilibria and simulate the selection result and the outcome distribution. Then we can calculate simulated moments and estimate the parameters from the simulated moment conditions.

4.5 Estimation with Self-Known Characteristics

When some exogenous characteristics, the $X_i^p$'s, are not public information, the conditional expectations, $\xi^e_{i,m}(\cdot)$, vary with the private information used to make predictions. This paper focuses on the special case that $X_i^p$ is known only to $i$ (and the econometricians). That is, for any $i$, $J_i(i) = 1$ and $J_i(j) = 0$ for all $j \neq i$. Equilibria and estimation methods for the general information structure can be analyzed in a similar way, although the notation and calculations are more complicated. We make an additional simplifying assumption, as Yang and Lee (2014) do.

Assumption 4.5.1 The conditional distribution of $X^p = (X_1^{p\prime}, \cdots, X_n^{p\prime})'$ is exchangeable. That is, the $X_i^p$'s have the same support $\mathcal{X}_p \subset \mathbb{R}^{k_p}$ and, for any public information $Z = z$ and any permutation $\varpi : \{1, \cdots, n\} \to \{1, \cdots, n\}$, the conditional distribution of $X^p$ given $Z = z$, $f_p(\cdot)$, satisfies
\[
f_p(X_1^p = x_1^p, \cdots, X_n^p = x_n^p) = f_p(X_{\varpi(1)}^p = x_1^p, \cdots, X_{\varpi(n)}^p = x_n^p),
\]
for any $x^p = (x_1^{p\prime}, \cdots, x_n^{p\prime})'$ in their support.

Under Assumption 4.5.1, fixing public information, $Z = z$, the conditional distribution, $f_p(X_i^p \mid X_j^p, Z = z)$, is invariant with respect to $i$ and $j$ as long as $i \neq j$. So we just denote it by $f_p(\tilde x \mid x)$. For any $i$ and any $k, k' \neq i$, for any $x$ in the support $\mathcal{X}_p$,
\[
\begin{aligned}
\xi_i^e(X_k^p = x) &= E\Big[H_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\,\xi_j^e(X_i^p)\Big) \,\Big|\, X_k^p = x, Z = z\Big] \\
&= E\Big[H_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\,\xi_j^e(X_i^p)\Big) \,\Big|\, X_{k'}^p = x, Z = z\Big] \\
&= \xi_i^e(X_{k'}^p = x).
\end{aligned}
\]
That is to say, the conditional expectation $\xi_i^e(\cdot)$ depends only on the realization of self-known information. Any two agents other than $i$ will make the same prediction about $i$'s behavior whenever their own self-known features are the same. According to Appendix C.1, the conditional expectation about $i$'s behavior, $\xi_i^e : \mathcal{X}_p \to \mathbb{R}$, is a mapping from the support of self-known covariates to the space of possible outcomes. For conditional expectations about the behaviors of all group members, the vector-valued function $\xi^e = (\xi_1^e, \cdots, \xi_n^e)$ satisfies:
\[
\xi^e(x_1^p, \cdots, x_n^p)_i = \xi_i^e(x_i^p), \tag{4.5.1}
\]
and
\[
\xi_i^e(x) = \int_{\tilde x} H_i\Big(u(X^g, X_i^c, \tilde x) + \lambda \sum_{j \neq i} W_{n,ij}\,\xi_j^e(\tilde x)\Big) f_p(\tilde x \mid x)\,d\tilde x, \tag{4.5.2}
\]
for all $i = 1, \cdots, n$ and $x \in \mathcal{X}_p$. In particular, if $X_i^p$ is independent of $X_j^p$ for any $i \neq j$, then $f_p(\tilde x \mid x) = f_p(\tilde x)$. That is, information about $X_j^p$ does not help to predict $i$'s actions, given public information $Z = z$. In this case, the conditional expectation function $\xi^e$ reduces to an $n \times 1$ vector with
\[
\xi_i^e = \int_{\tilde x} H_i\Big(u(X^g, X_i^c, \tilde x) + \lambda \sum_{j \neq i} W_{n,ij}\,\xi_j^e\Big) f_p(\tilde x)\,d\tilde x, \tag{4.5.3}
\]
which is a system of nonlinear equations and can be analyzed in a way similar to that used when all exogenous characteristics are public information. Hence, in the subsequent sub-sections, discussions are focused on the case that $X_i^p$ and $X_j^p$ are correlated for $i \neq j$. Since the domain of each $\xi_i^e$ is the support of the $X_j^p$'s, whether the $X_j^p$'s are discrete or continuous random vectors has different implications for the conditional expectation functions. We discuss equilibrium, estimation and identification with self-known characteristics separately for those two cases.

4.5.1 Discrete Private Characteristics

Suppose that the $X_i^p$'s are discretely distributed and have a finite number of mass points. Suppose that their common support is $\mathcal{X}_p = \{x^1, \cdots, x^K\}$ for some $K < \infty$. The transition probabilities are $P_{k,k'} = \Pr(X_i^p = x^{k'} \mid X_j^p = x^k, Z = z)$.
Then $\xi^e$ can be represented by a vector,
\[
\xi^e = (\xi^e_{1,1}, \cdots, \xi^e_{1,K}, \cdots, \xi^e_{n,1}, \cdots, \xi^e_{n,K})', \tag{4.5.4}
\]
where $\xi^e_{i,k} = \xi_i^e(x^k)$. The consistency condition (4.5.2) reduces to
\[
\xi^e_{i,k} = \sum_{k'=1}^{K} P_{k,k'}\, H_i\Big(u(X^g, X_i^c, x^{k'}) + \lambda \sum_{j \neq i} W_{n,ij}\,\xi^e_{j,k'}\Big), \tag{4.5.5}
\]
for $i = 1, \cdots, n$ and $k = 1, \cdots, K$. This is a finite-dimensional nonlinear system of equations, similar to the equilibrium condition without private information. Thus, the techniques in the previous section can be used to analyze equilibria and complete the model. Identification of model parameters can be proved analogously. Although the support of the privately known characteristics, the $X_i^p$'s, is bounded, if the commonly known individual features, the $X_i^c$'s, have a full support, the method of "identification at infinity" can still be used.

4.5.2 Continuous Private Characteristics

Equilibrium Set

If the $X_i^p$'s are continuous random variables, $\xi^e$ is a function defined on a continuum of points satisfying the functional equations (4.5.1) and (4.5.2). According to Appendix C.1, we can view $\xi^e$ as a point in a Banach space, $(\Xi(W_n, J), \|\cdot\|)$. Define an operator, $T : (\Xi(W_n, J), \|\cdot\|) \to (\Xi(W_n, J), \|\cdot\|)$, as
\[
T(\xi)(x_1^p, \cdots, x_n^p)_i = \int_{\tilde x} H_i\Big(u(X^g, X_i^c, \tilde x) + \lambda \sum_{j \neq i} W_{n,ij}\,\xi_j(\tilde x)\Big) f_p(\tilde x \mid x_i^p)\,d\tilde x, \tag{4.5.6}
\]
for all $i$ and $(x_1^p, \cdots, x_n^p) \in \mathcal{X}_p^n$. Then an equilibrium corresponds to a fixed point of this operator. We can prove the existence of an equilibrium by the Schauder fixed point theorem.[47] To apply this theorem, we impose the following two assumptions.

Assumption 4.5.2 For all $i = 1, \cdots, n$, $H_i(\cdot)$ is differentiable. Additionally,
\[
\max_{1 \leq i \leq n} \sup_{c \in \mathbb{R}} \Big|\frac{dH_i(c)}{dc}\Big| \, \|W_n\|_\infty < \infty. \tag{4.5.7}
\]

Assumption 4.5.3 There is $0 \leq b < 1$ such that $\lim_{\|\xi\| \to \infty} \|T(\xi)\| / \|\xi\| = b$.

It is proved in Lemma C.4.4 that under Assumption 4.5.2, $T$ is continuous. (Actually, $T$ is a Lipschitz function when this condition holds.)
In addition, it is shown by Lemma C.4.5 that with Assumption 4.5.3, there is $r_0 > 0$ such that for all $\xi$ in the closed ball $B[0, r_0] = \{\xi : \|\xi\| \leq r_0\}$, $\|T(\xi)\| \leq r_0$. That is to say, the images of all points in $B[0, r_0]$ are still in this ball. Moreover, if there is any equilibrium, it must be contained in $B[0, r_0]$. The ball $B[0, r_0]$ is nonempty, closed, and convex. To apply the Schauder fixed point theorem, it suffices to show that $T(B[0, r_0])$ is contained in a compact subset of $B[0, r_0]$. However, that is not trivial, because the ball $B[0, r_0]$ is not compact in the function space $(\Xi(W_n, J), \|\cdot\|)$. To capture a compact set in this space, we begin with the relatively compact sets, since the closure of a relatively compact set is compact. As we have mentioned, $\xi \in (\Xi(W_n, J), \|\cdot\|)$ if and only if each of its coordinate functions, $\xi_i$, belongs to the Lebesgue space $L_1(\mathcal{X}_p, \mathcal{B}_X, \mu_p; \mathbb{R}^1)$. Utilizing the characterization of relatively compact subsets in Lebesgue spaces by Dunford and Schwartz (1958), we derive necessary and sufficient conditions for relative compactness in $(\Xi(W_n, J), \|\cdot\|)$. (See Proposition C.4.3 for the results under a general form of private information about the $X_i^p$'s.) On the basis of these discussions, a theorem about the set of equilibria is derived.

Proposition 4.5.1 Under Assumptions 4.5.2 and 4.5.3, if in addition,
\[
\max_{1 \leq i \leq n} \int_{\mathcal{X}_p} \big|T(\xi)_i(x + \tilde x) f_p(x + \tilde x) - T(\xi)_i(x) f_p(x)\big|\,dx \to 0, \tag{4.5.8}
\]
as $\tilde x \to 0$, uniformly for any $\xi \in B[0, r_0]$; and
\[
\max_{1 \leq i \leq n} \int_{\mathcal{X}_p - C_r} |T(\xi)_i(x)|\, f_p(x)\,dx \to 0, \tag{4.5.9}
\]
as $r \to \infty$, uniformly for all $\xi \in B[0, r_0]$, the set of equilibria, $E(X, W_n)$, is a nonempty and compact subset of $(\Xi(W_n, J), \|\cdot\|)$ and is contained in the closed ball $B[0, r_0]$. In particular, (4.5.8) and (4.5.9) are satisfied if

1. The $H_i(\cdot)$'s are uniformly bounded, i.e., $\max_{1 \leq i \leq n} \sup_{a \in \mathbb{R}^1} |H_i(a)| \leq B$ for some $B$;
2. $E[X_i^p \mid Z = z] < \infty$, for all $i$; and
3. For some $\delta_0 > 0$, for each $i$, there is a function $g_i(x, \hat x)$ such that $\int_{\mathcal{X}_p}\int_{\mathcal{X}_p} |g_i(x, \hat x)|\,dx\,d\hat x < \infty$ and $f_{p,i}(x + \tilde x, \hat x) \leq g_i(x, \hat x)$, a.e., for any $\tilde x$ in the cube $C_{\delta_0}$, where $f_{p,i}(\cdot, \cdot)$ is the joint density of $X_i^p$ and $X_j^p$, $i \neq j$, conditional on public information $Z = z$.[48]

[47] The theorem is cited as Proposition C.4.4 in Appendix C.4. See Bonsall (1962) for details. A brief introduction can be found at http://en.wikipedia.org/wiki/Schauder_fixed_point_theorem.
[48] A cube, $C_r$, is the set in $\mathbb{R}^{k_p}$ such that for any $\tilde x \in C_r$, all its coordinates are within $[-r, r]$.

This existence theorem requires some conditions on the behaviors, the $H_i(\cdot)$'s, and on the joint distribution of the $X_i^p$'s conditional on public information $Z = z$. Conditions (4.5.8) and (4.5.9) are about uniform convergence, which may be hard to verify. However, when the $H_i(\cdot)$'s are uniformly bounded, we just need to verify some conditions on the distribution. Such scenarios include models with bounded behaviors, such as the models for binary choices, ordered multinomial choices and two-sided censored choices. In that case, if, conditional on public information $Z = z$, the $X_i^p$'s have a continuous joint distribution on a bounded support, the sufficient conditions on the distribution are satisfied. When the support is unbounded, for some distributions, the joint density of the $X_i^p$'s can still be dominated by an integrable function. One example is the normal distribution. See Lemma C.4.6 in Appendix C.4. Therefore, when outcomes and/or behaviors are uniformly bounded and the joint distribution of $X_i^p$ and $X_j^p$ ($i \neq j$) is normal conditional on public information $Z = z$, there is at least one equilibrium and the set of equilibria is a compact set contained in a closed ball $B[0, r_0]$.

The compactness of $E(X, W_n)$ is important.
Given this result, for any small positive number, $\eta > 0$, there is a finite number of points in $E(X, W_n)$, $\xi^{e,1}, \cdots, \xi^{e,K}$, such that for any equilibrium $\xi^e$, we can pick one of those points, say $\xi^{e,k}$, with $\|\xi^e - \xi^{e,k}\| < \eta$.[49] In this way, the finite set $\{\xi^{e,1}, \cdots, \xi^{e,K}\}$ can be viewed as an approximation of all equilibria with precision $\eta$. This is the basis on which we specify the distribution of equilibrium selection in the current structure of incomplete information.

Conditions (4.5.8) and (4.5.9) are used to ensure that we can apply the Schauder fixed point theorem to the whole ball $B[0, r_0]$. If we restrict attention to equilibria with some special properties so that they are contained in a compact subset of $B[0, r_0]$, we may apply the Schauder fixed point theorem in this compact subset. Then (4.5.8) and (4.5.9) will be satisfied, although some other possible equilibria will be excluded.

For the model of investment decisions with Cobb-Douglas production function, Assumption 4.5.3 is satisfied. To see this, write $\phi_i(y) = u(X^g, X_i^c, y) + \lambda \sum_{j \neq i} W_{n,ij}\,\xi_j(y)$ for brevity. Via Jensen's inequality and Hölder's inequality,
\[
\begin{aligned}
\frac{\int_{\mathcal{X}_p} |T(\xi)_i(x)|\, f_p(x)\,dx}{\|\xi\|}
&= \frac{A L_i^{1-\iota}}{\|\xi\|} \int_{\mathcal{X}_p} \Big| \iint \max\{\phi_i(y) - \epsilon_i, 0\}^{\iota} f(\epsilon_i)\,d\epsilon_i\, f_p(y \mid x)\,dy \Big|\, f_p(x)\,dx \\
&\leq \frac{A L_i^{1-\iota}}{\|\xi\|} \int_{\mathcal{X}_p} \int \big[H_T(\phi_i(y))\big]^{\iota} f_p(y \mid x)\,dy\, f_p(x)\,dx \\
&\leq \frac{A L_i^{1-\iota}}{\|\xi\|} \Big[ \iint H_T(\phi_i(y))\, f_p(x, y)\,dx\,dy \Big]^{\iota} \\
&= A L_i^{1-\iota}\, \frac{\|\xi\|^{\iota}}{\|\xi\|} \Big[ \iint \frac{H_T(\phi_i(y))}{|\phi_i(y)|} \cdot \frac{|\phi_i(y)|}{\|\xi\|}\, f_p(x, y)\,dx\,dy \Big]^{\iota} \\
&\leq A L_i^{1-\iota}\, \frac{\|\xi\|^{\iota}}{\|\xi\|} \Big[ \iint \Big(\frac{H_T(\phi_i(y))}{|\phi_i(y)|}\Big)^{p} f_p(x, y)\,dx\,dy \Big]^{\iota/p} \Big[ \iint \Big(\frac{|\phi_i(y)|}{\|\xi\|}\Big)^{q} f_p(x, y)\,dx\,dy \Big]^{\iota/q},
\end{aligned} \tag{4.5.10}
\]
for some $p, q \geq 1$ with $1/p + 1/q = 1$. Similar to our discussion in the previous section, if $|\lambda| \|W_n\|_\infty < \infty$, $\int_{\mathcal{X}_p} |T(\xi)_i(x)|\, f_p(x)\,dx / \|\xi\|$ goes to zero as $\|\xi\|$ goes to infinity.
Then $\lim_{\|\xi\| \to \infty} \|T(\xi)\| / \|\xi\| = 0$. However, other conditions may not hold in this case. Instead, if we focus on the case that the $X_i^p$'s have a compact support and all equilibrium conditional expectations, the $\xi^e$'s, are continuous, we may derive a similar characterization of the set of equilibria.[50]

[49] See Dunford and Schwartz (1958) and Folland (1999) for a brief introduction to compactness and relative compactness.

Nonetheless, for some models with unbounded choices, such as the linear model of continuous choices and the Tobit model, in order to satisfy Assumption 4.5.3, we might have to impose strong conditions on $\lambda$ and $\|W_n\|_\infty$. To avoid that, we employ other techniques to analyze those models. First consider the linear model with continuous choices. In this case, the equilibrium condition for conditional expectations is:
\[
\xi_i^e(x) = \int_{\tilde x} u(X^g, X_i^c, \tilde x)\, f_p(\tilde x \mid x)\,d\tilde x + \lambda \int_{\tilde x} \sum_{j \neq i} W_{n,ij}\,\xi_j^e(\tilde x)\, f_p(\tilde x \mid x)\,d\tilde x. \tag{4.5.11}
\]
Reorganizing the above equation, we get that
\[
\xi_i^e(x) + (-\lambda) \int_{\tilde x} \sum_{j \neq i} W_{n,ij}\,\xi_j^e(\tilde x)\, f_p(\tilde x \mid x)\,d\tilde x = \int_{\tilde x} u(X^g, X_i^c, \tilde x)\, f_p(\tilde x \mid x)\,d\tilde x. \tag{4.5.12}
\]
This is an $n$-dimensional Fredholm integral equation, whose solutions are investigated by general Fredholm theory.[51] In particular, when $u(X^g, X_i^c, X_i^p) = v(X^g, X_i^c) + X_i^{p\prime}\beta$ and $E[X_j^p \mid X_i^p] = \mu + C X_i^p$, by Yang and Lee (2014), when $I_{nk_p} - \lambda(W_n \otimes C')$ and $I_n - \lambda W_n$ are both invertible, there is one and only one linear equilibrium conditional expectation, $\xi^e$.

For the Tobit model with heterogeneous cutoffs, define
\[
p_i(x) = F\Big(u(X^g, X_i^c, x) + \lambda \sum_{j \neq i} W_{n,ij}\,\xi_j^e(x) - v(X^g, X_i^c)\Big),
\]
for all $i$ and $x \in \mathcal{X}_p$. Then we have that $\xi_i(x) = \int H_i\big(F^{-1}(p_i(\tilde x)) + v(X^g, X_i^c)\big) f(\tilde x \mid x)\,d\tilde x$. The equilibrium condition can be rewritten as
\[
p_i(x) = F\Big(u(X^g, X_i^c, x) + \lambda \sum_{j \neq i} W_{n,ij} \int H_j\big(F^{-1}(p_j(\tilde y)) + v(X^g, X_j^c)\big) f(\tilde y \mid x)\,d\tilde y - v(X^g, X_i^c)\Big).
\]

[50] In that case, we may use the sup-norm for each coordinate function, the $\xi_i$'s, instead of the $\|\cdot\|_1$ norm.
[51] See Ruston (1986) for a systematic discussion. Lax (2002) provides a succinct discussion. Define an operator, $T_F$, such that for any $\xi$, $T_F(\xi)(x_1, \cdots, x_n)_l = T_F(\xi)_l(x_l) = \int_{\tilde x} (-\lambda) \sum_{j \neq l} W_{n,lj}\,\xi_j(\tilde x)\, f_p(\tilde x \mid x_l)\,d\tilde x$, for all $l$ and $x_l \in \mathcal{X}_p$. If $\mathcal{X}_p$ is compact in $\mathbb{R}^{k_p}$, $T_F$ is a compact operator from $(\Xi(W_n, J), \|\cdot\|)$ to a space of continuous functions on $\mathcal{X}_p$. Then this integral equation has a solution if $\tilde u(x_1, \cdots, x_n) = \big(\int_{\tilde x} u(X^g, X_1^c, \tilde x) f_p(\tilde x \mid x_1)\,d\tilde x, \cdots, \int_{\tilde x} u(X^g, X_n^c, \tilde x) f_p(\tilde x \mid x_n)\,d\tilde x\big)'$ is orthogonal to the null space of the transpose operator, $T_F'$.

As $F(\cdot)$ is bounded, we can apply Proposition 4.5.1 to analyze the equilibrium set. If $v(X^g, X_i^c) = 0$ for all $i$, that is the case of the Tobit model with homogeneous cutoffs normalized to 0.

Equilibrium Set Approximation

Although we can approximate the whole equilibrium set by a finite number of equilibria for any level of precision, these equilibria are functions defined on a continuum, so it is not possible to derive their exact values at every point in their domains. In order to apply the stochastic selection rule, we approximate such a function. Four possible approximation approaches are discussed here.

The first method uses simple functions. In Appendix C.4, we show that $\xi = (\xi_1, \cdots, \xi_n) \in (\Xi(W_n, J), \|\cdot\|)$ if and only if each of its coordinate functions, $\xi_i$, is an element of the Lebesgue space $L_1(\mathcal{X}_p, \mathcal{B}_p, \mu_p; \mathbb{R}^1)$. In such a function space, there is a special set of functions called the "simple functions". A function $\xi$ from $\mathcal{X}_p$ to $\mathbb{R}^1$ is $\mu_p$-simple if $\xi$ has only a finite set of values, $\{\nu_1, \cdots, \nu_K\}$, and for any $\nu_k$, $\xi^{-1}(\nu_k)$ is an element of the $\sigma$-algebra $\mathcal{B}_p$. By Dunford and Schwartz (1958), the set of $\mu_p$-measurable simple functions is dense in $L_1(\mathcal{X}_p, \mathcal{B}_p, \mu_p; \mathbb{R}^1)$. That is, for any given level of precision, we can always find a simple function to approximate a function in $L_1(\mathcal{X}_p, \mathcal{B}_p, \mu_p; \mathbb{R}^1)$.
For any integer $K$, choose a finite partition of $\mathcal{X}_p$, $U^{K,1}, \cdots, U^{K,K}$, where each $U^{K,k}$ is in $\mathcal{B}_p$. For any $i = 1, \cdots, n$, define a simple function as $\xi_i^K(\tilde x) = \sum_{k=1}^{K} \kappa_{i,K,k}\, I(\tilde x \in U^{K,k})$. Then the equilibrium condition (4.5.2) can be approximated as
\[
\sum_{k=1}^{K} \kappa_{i,K,k}\, I(\tilde x \in U^{K,k}) \approx \int H_i\Big(u(X^g, X_i^c, \tilde y) + \lambda \sum_{j \neq i} W_{n,ij} \sum_{k'=1}^{K} \kappa_{j,K,k'}\, I(\tilde y \in U^{K,k'})\Big) f_p(\tilde y \mid \tilde x \in U^{K,k})\,d\tilde y, \tag{4.5.13}
\]
for all $i = 1, \cdots, n$ and $k = 1, \cdots, K$. Therefore,
\[
\begin{aligned}
\kappa_{i,K,k} &= \int H_i\Big(u(X^g, X_i^c, \tilde y) + \lambda \sum_{j \neq i} W_{n,ij} \sum_{k'=1}^{K} \kappa_{j,K,k'}\, I(\tilde y \in U^{K,k'})\Big) f_p(\tilde y \mid \tilde x \in U^{K,k})\,d\tilde y \\
&= \sum_{k'=1}^{K} P(U^{K,k'} \mid U^{K,k})\, E\Big[H_i\Big(u(X^g, X_i^c, \tilde y) + \lambda \sum_{j \neq i} W_{n,ij}\,\kappa_{j,K,k'}\Big) \,\Big|\, \tilde x \in U^{K,k}, \tilde y \in U^{K,k'}\Big],
\end{aligned} \tag{4.5.14}
\]
where $P(U^{K,k'} \mid U^{K,k}) = P(X_j^p \in U^{K,k'} \mid X_i^p \in U^{K,k})$. As $i$ runs over $1, \cdots, n$ and $k$ runs over $1, \cdots, K$, there are $nK$ equations for $nK$ unknowns. This is similar to the previous analysis for the case of discretely distributed $X_i^p$'s. What is different is that the conditional distribution, $f_p(\tilde y \mid x^p_{K,k})$, is not discrete. When all $H_i(\cdot)$'s and $f_p(\cdot)$ can be extended to the complex space in an analytic way, as the right-hand side of (4.5.14) is a weighted sum of integrals of $H_i(\cdot)$, the above system can be extended to complex spaces. Multiple solutions for the $\kappa_{j,K,k}$'s can then be computed by the homotopy method. The corresponding simple functions are then used as an approximation of the equilibrium set.

To approximate equilibrium conditional expectation functions by simple functions is, in essence, to discretize the domain, $\mathcal{X}_p$. Rust (1987) uses this method to solve for the optimal engine replacement scheme in an optimization problem. In contrast to the unique optimal scheme in Rust (1987), it is possible to get multiple solutions to (4.5.14), which makes the model in this paper more computationally intensive. Precision depends on the choice of partitions.
Generally speaking, the finer the partition (increasing $K$), the more precise the approximation. However, how to choose the cutoffs for a partition remains a problem.

Inspecting (4.5.2), the value of an equilibrium conditional expectation is determined by an integral, which can be approximated by the quadrature method. (See Judd (1998) and Lee (2001) for details.) Employing this approximation of the integral, we get the second method to approximate equilibria. Take the Gauss-Legendre quadrature as an example. Consider the simple case that the dimension of the $X_i^p$'s is $k_p = 1$ and $\mathcal{X}_p$ is an interval, $[a, b]$. We have that
\[
\begin{aligned}
\xi_i^e(x) &= \int_a^b H_i\Big(u(x^g, x_i^c, \tilde x) + \lambda \sum_{j \neq i} W_{n,ij}\,\xi_j^e(\tilde x)\Big) f_p(\tilde x \mid x)\,d\tilde x \\
&\approx \sum_{k=1}^{K} \omega_k\, H_i\Big(u\Big(x^g, x_i^c, \frac{(\upsilon_k + 1)(b - a)}{2} + a\Big) + \lambda \sum_{j \neq i} W_{n,ij}\,\xi_j^e\Big(\frac{(\upsilon_k + 1)(b - a)}{2} + a\Big)\Big) \\
&\qquad \cdot f_p\Big(\frac{(\upsilon_k + 1)(b - a)}{2} + a \,\Big|\, x\Big) \frac{b - a}{2},
\end{aligned} \tag{4.5.15}
\]
where $\upsilon_k$, for $k = 1, \cdots, K$, are the abscissae and $\omega_k$ are the weights. They are fixed. If we can get the values of the expectation function at a finite number of points, $x_k^p = \frac{(\upsilon_k + 1)(b - a)}{2} + a$, for $k = 1, \cdots, K$, we can approximate the value of the expectation function at any point in the support of $X_i^p$. To be specific, when we take $x = x_k^p$ for $k = 1, \cdots, K$, we get $nK$ equations for $nK$ unknowns, $\{\xi_i^e(x_k^p)\}_{i,k}$. Using the homotopy method, we get a finite number of solutions, $\{\xi_i^{e,d}(x_k^p)\}_{i,k}$, for $d = 1, \cdots, D$. For each $d$, plugging $\{\xi_i^{e,d}(x_k^p)\}_{i,k}$ back into (A.2.1), we derive an approximation of an equilibrium conditional expectation function. If $X_i^p$ has a full support, we need to change variables so that the integral is over a bounded interval. This quadrature method is used by Yang and Lee (2014) and Yang, Qu and Lee (2014) to derive approximations to equilibrium conditional expectation functions when there is private information in exogenous characteristics. The difference lies in the way the system of equations (A.2.1) is solved. They search for the unique solution under condition (4.3.8).
Instead, in this paper, the homotopy method is used to derive all the solutions, $\{\xi_i^{e,d}(x_k^p)\}_{i,k}$. When the $X_i^p$'s are of multiple dimensions, the tensor product may be used. Details are introduced by Judd (1998). However, when the dimension of the $X_i^p$'s is high, computation can be intensive.

Therefore, for high-dimensional privately known characteristics, stochastic integration may be used for approximation instead. To be specific, let $g(\cdot)$ be a density with its support containing the support of $X_i^p$ such that $f_p(x^p \mid x)/g(x^p)$ is well defined. Then we can generate $K$ random draws, say, $x_k^p$, from the density $g(\cdot)$. The stochastic approximation will be
\[
\xi_i^e(x) \approx \frac{1}{K} \sum_{k=1}^{K} H_i\Big(u(x^g, x_i^c, x_k^p) + \lambda \sum_{j \neq i} W_{n,ij}\,\xi_j^e(x_k^p)\Big) \frac{f_p(x_k^p \mid x)}{g(x_k^p)}. \tag{4.5.16}
\]

The fourth method is a combination of the quadrature method and basis function approximation. That is, an equilibrium function is approximated by a linear combination of a finite number of basis functions. In our model, $\xi = (\xi_1, \cdots, \xi_n)$ is an element of $(\Xi(W_n, J), \|\cdot\|)$ if and only if each of its coordinate functions, $\xi_i$, is an element of the Lebesgue space $L_1(\mathcal{X}_p, \mathcal{B}_p, \mu_p; \mathbb{R}^1)$. We can use a basis of this Banach space to approximate an equilibrium. For example, suppose the dimension of the $X_i^p$'s is $k_p = 1$ and $\mathcal{X}_p = [a, b]$. We first transform $\xi$ into a function on $[0, 1]$ by changing variables.

1. When $-\infty < a < b < +\infty$, by setting $x = a + (b - a)\tilde x$, $\int_a^b \xi_i(x) f_p(x)\,dx = \int_0^1 b\,\xi_i(a + (b - a)\tilde x)\, f_p(a + (b - a)\tilde x)\,d\tilde x$. Define $\tilde\xi_i(\tilde x) = b\,\xi_i(a + (b - a)\tilde x)\, f_p(a + (b - a)\tilde x)$.
2. When $a = -\infty$ and $b < +\infty$, by setting $x = \log(b\tilde x)$, $\int_a^b \xi_i(x) f_p(x)\,dx = \int_0^1 \xi_i(\log(b\tilde x))\, f_p(\log(b\tilde x))\, \frac{1}{\tilde x}\,d\tilde x$. Define $\tilde\xi_i(\tilde x) = \xi_i(\log(b\tilde x))\, f_p(\log(b\tilde x))\, \frac{1}{\tilde x}$.
3. When $a > -\infty$ and $b = +\infty$, by setting $x = \log(a/(1 - \tilde x))$, $\int_a^b \xi_i(x) f_p(x)\,dx = \int_0^1 \xi_i(\log(a/(1 - \tilde x)))\, f_p(\log(a/(1 - \tilde x)))\, \frac{1}{1 - \tilde x}\,d\tilde x$. Define $\tilde\xi_i(\tilde x) = \xi_i(\log(a/(1 - \tilde x)))\, f_p(\log(a/(1 - \tilde x)))\, \frac{1}{1 - \tilde x}$.
4. When $a = -\infty$ and $b = +\infty$, by setting $x = \log(\tilde x/(1 - \tilde x))$, $\int_a^b \xi_i(x) f_p(x)\,dx = \int_0^1 \xi_i(\log(\tilde x/(1 - \tilde x)))\, f_p(\log(\tilde x/(1 - \tilde x)))\, \frac{1}{\tilde x(1 - \tilde x)}\,d\tilde x$. Define $\tilde\xi_i(\tilde x) = \xi_i(\log(\tilde x/(1 - \tilde x)))\, f_p(\log(\tilde x/(1 - \tilde x)))\, \frac{1}{\tilde x(1 - \tilde x)}$.

We can see that $\xi_i$ is an element of $L_1(\mathcal{X}_p, \mathcal{B}_p, \mu_p; \mathbb{R}^1)$ if and only if $\tilde\xi_i$ is in $L_1([0, 1], \mathcal{B}_{[0,1]}, m; \mathbb{R}^1)$, the space of Lebesgue integrable functions defined on the unit interval $[0, 1]$. Let
\[
\tau_0(\tilde x) = 1, \tag{4.5.17}
\]
for all $\tilde x \in [0, 1]$; and
\[
\tau_{k,j}(\tilde x) = 2^{k-1}\Big(I\big((2j - 2)2^{-k} \leq \tilde x < (2j - 1)2^{-k}\big) - I\big((2j - 1)2^{-k} \leq \tilde x < 2j \cdot 2^{-k}\big)\Big), \tag{4.5.18}
\]
for $k \geq 1$, $1 \leq j \leq 2^{k-1}$, and $\tilde x \in [0, 1]$. Sort them in the order $\tau_0, \tau_{1,1}, \tau_{2,1}, \tau_{2,2}, \cdots$, and re-label those functions as $\tilde\tau_0, \tilde\tau_1, \tilde\tau_2, \cdots$. Then $\{\tilde\tau_k\}$ is called the Haar basis for $L_1([0, 1], \mathcal{B}_{[0,1]}, m; \mathbb{R}^1)$.[52] See Figure C.1 for a graphic illustration. Choosing an integer $L$, we approximate $\tilde\xi_i$ by a linear combination of a finite number of those basis functions, i.e., $\tilde\xi_i \approx \sum_{l=0}^{L} \kappa_{i,L,l}\,\tilde\tau_l$. From it, we get an approximation of $\xi$. Take the first case listed above as an example: $\xi_i(x) \approx \sum_{l=0}^{L} \kappa_{i,L,l}\,\tilde\tau_l((x - a)/(b - a)) \,/\, \big(b f_p(x)\big)$. Plugging this approximation back into the consistency condition, we get that
\[
\frac{\sum_{l=0}^{L} \kappa_{i,L,l}\,\tilde\tau_l((x - a)/(b - a))}{b f_p(x)} = \int_a^b H_i\Big(u(X^g, X_i^c, \tilde y) + \lambda \sum_{j \neq i} W_{n,ij} \sum_{l'=0}^{L} \kappa_{j,L,l'}\, \frac{\tilde\tau_{l'}((\tilde y - a)/(b - a))}{b f_p(\tilde y)}\Big) f_p(\tilde y \mid x)\,d\tilde y. \tag{4.5.19}
\]
As for the integral, we can choose the quadrature points as we did before, i.e.,
\[
\begin{aligned}
\frac{\sum_{l=0}^{L} \kappa_{i,L,l}\,\tilde\tau_l((x - a)/(b - a))}{b f_p(x)} = \sum_{k=1}^{K} \omega_k\, H_i\Big(& u\Big(X^g, X_i^c, \frac{(\upsilon_k + 1)(b - a)}{2} + a\Big) + \lambda \sum_{j \neq i} W_{n,ij} \sum_{l'=0}^{L} \kappa_{j,L,l'}\, \frac{\tilde\tau_{l'}((\upsilon_k + 1)/2)}{b f_p\big(\frac{(\upsilon_k + 1)(b - a)}{2} + a\big)}\Big) \\
& \cdot f_p\Big(\frac{(\upsilon_k + 1)(b - a)}{2} + a \,\Big|\, x\Big) \frac{b - a}{2},
\end{aligned} \tag{4.5.20}
\]
for $i = 1, \cdots, n$, $l = 1, \cdots, L$, and $k = 1, \cdots, K$. The Gauss-Legendre quadrature abscissae, $\upsilon_k$, and weights, $\omega_k$, are fixed. The Haar basis functions, the $\tilde\tau_l$'s, are known functions.

[52] Since $\|\tilde\tau_k\|_1 = 1$, the Haar basis is a basis with unit norm.
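The Haar functions above are easy to evaluate numerically. The following Python sketch is only an illustration: the function names `haar` and `haar_order` are hypothetical, and the first element is taken to be the constant function equal to one on $[0, 1]$, which is an assumption of the sketch chosen so that every element has unit $L_1$ norm.

```python
import numpy as np

def haar(k, j, x):
    """Evaluate tau_{k,j} at x in [0, 1]; k = 0 gives the constant function.

    Taking the constant function as the first element is an assumption
    of this sketch.
    """
    x = np.asarray(x, dtype=float)
    if k == 0:
        return np.ones_like(x)
    lo = (2 * j - 2) * 2.0 ** (-k)
    mid = (2 * j - 1) * 2.0 ** (-k)
    hi = (2 * j) * 2.0 ** (-k)
    up = ((lo <= x) & (x < mid)).astype(float)     # +1 on the left half
    down = ((mid <= x) & (x < hi)).astype(float)   # -1 on the right half
    return 2.0 ** (k - 1) * (up - down)

def haar_order(max_level):
    """Index pairs in the order tau_0, tau_{1,1}, tau_{2,1}, tau_{2,2}, ..."""
    order = [(0, 0)]
    for k in range(1, max_level + 1):
        for j in range(1, 2 ** (k - 1) + 1):
            order.append((k, j))
    return order

# Midpoint grid on [0, 1]; averages over it approximate integrals on [0, 1].
grid = (np.arange(4096) + 0.5) / 4096
l1_norms = [np.mean(np.abs(haar(k, j, grid))) for k, j in haar_order(3)]
```

Evaluating $\tilde\tau_l$ at the transformed quadrature points $(\upsilon_k + 1)/2$ then only requires calling `haar` on those points.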
From the data, it is possible to get the density of the $X_i^p$'s conditional on public information. Consequently, once we plug in $x = \frac{(\upsilon_k + 1)(b - a)}{2} + a$, the value of $\tilde\tau_l((x - a)/(b - a)) \,/\, \big(b f_p(x)\big)$ can be calculated. In this way, we derive a system of $nK$ nonlinear equations for $nL$ coefficients, the $\kappa_{i,L,l}$'s. Choosing $K = L$ and applying the homotopy method when analytic extension is possible, we may get multiple solutions, $\kappa^d_{i,L,l}$, for $d = 1, \cdots, D$. They correspond to multiple equilibria. Approximations when $[a, b]$ is unbounded can be computed in a similar way by changing variables.

Comparing the above approaches, we can see that with the first method, simple function approximation, we first fix the class of functions used for approximation and then pin down the unknown coefficients of the equilibrium functions by the equilibrium condition. However, there is no general guidance on the choice of partitioning cutoffs. For the second and third methods, we do not fix the form of the equilibrium expectation functions. Instead, we first solve for the values of an equilibrium expectation function at a fixed finite set of points and then use those values to approximate the equilibrium expectation function at any point in its domain. These two methods also specify how those points are chosen. With the last method, we construct a flexible form of functions using basis functions and then employ the quadrature method to approximate the integral. With the second method, approximation precision depends on the number of quadrature abscissae. For the fourth method, instead, the approximation performance hinges on the number of basis functions chosen.

Equilibrium Selection and Parameter Identification

Given a precision $\eta > 0$, consider a finite approximation to the set of equilibria $E(X, W_n)$, namely $E_0(X, W_n) = \{\xi^{e,1}(X, W_n), \cdots, \xi^{e,D}(X, W_n)\}$. We can derive an approximation to the likelihood function (4.3.10):
\[
\tilde L(Y; X, W_n) = \sum_{d=1}^{D} \rho(\xi^{e,d}) \prod_{i=1}^{n} f(y_i \mid \xi^{e,d}). \tag{4.5.21}
\]
For example, we can set
\[
\tilde\rho(\xi^{e,d}; E_0(X, W_n), \alpha) = \frac{\exp(\alpha'\gamma(\xi^{e,d}; X, W_n))}{\sum_{d'=1}^{D} \exp(\alpha'\gamma(\xi^{e,d'}; X, W_n))}. \tag{4.5.22}
\]
Then we derive the (approximated) sample likelihood function:
\[
\tilde L(y; X, W_n) = \sum_{d=1}^{D} \tilde\rho(\xi^{e,d}; E_0(X, W_n), \alpha) \prod_{i=1}^{n} f(y_i \mid \xi^{e,d}). \tag{4.5.23}
\]
When there are $G$ independent groups, the corresponding approximation to the log likelihood of the whole sample is
\[
\log \tilde L(Y_1, \cdots, Y_G \mid \beta, \lambda, \sigma, \alpha) = \sum_{g=1}^{G} \log\Bigg( \sum_{d=1}^{D_g} \frac{\exp(\alpha'\gamma(\xi^{e,g,d}; X_g, W_g))}{\sum_{d'=1}^{D_g} \exp(\alpha'\gamma(\xi^{e,g,d'}; X_g, W_g))} \prod_{i=1}^{n_g} f(y_{i,g} \mid \xi^{e,g,d}) \Bigg). \tag{4.5.24}
\]

As for identification, $\beta$, $\lambda$, and $\sigma$ can still be identified using the strategy of "identification at infinity". That is because those parameters can be identified separately from $\alpha$ using this technique. However, it is more difficult to identify $\alpha$. The reason is that (4.5.23) is not the exact likelihood but an approximation. However, as the set of equilibria is determined by $\beta$, $\lambda$, and $\sigma$, it does not depend on $\alpha$. If we assume that all independent groups choose the same approximation, $D_g = D$, and use the same probability mass, (4.5.22), we can identify $\alpha$ from variation across groups.

Parameters can be estimated either by directly maximizing the approximated likelihood function or by using simulated moment conditions, similar to the approach used when all exogenous characteristics are public information. The performance of the estimates relies on the approximation. In this paper, there are actually two types of approximation. First, we approximate the set of equilibria by a finite subset. Second, we approximate each of the functions in that subset.

Regarding the equilibrium conditional expectation of $i$'s behavior based on a structure of private information, $J_{i,m}$, as a function of the realization of $X^p_{i,m}$, the model under a general form of incomplete information about the $X_i^p$'s can be estimated in a similar way.
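The quadrature-based approximation (the second method in Section 4.5.2) can be sketched in Python as follows. This is a minimal sketch under several assumptions that are not part of the model above: $H_i$ is taken to be the logistic CDF (a binary choice model), $u$ is linear in the private characteristic, the conditional density on $[0, 1]$ is $f_p(\tilde x \mid x) = 1 + \theta(1 - 2\tilde x)(1 - 2x)$, and the $nK$ equations are solved by simple fixed-point iteration, which finds only one solution and is appropriate only when the mapping is a contraction. Finding all solutions would require the homotopy method instead. The function names are hypothetical.

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def solve_on_nodes(W, lam, beta0, beta1, theta, K=8, tol=1e-12, max_iter=500):
    """Solve the K-node Gauss-Legendre discretisation of the consistency
    condition on [a, b] = [0, 1] by fixed-point iteration (illustrative only)."""
    v, w = np.polynomial.legendre.leggauss(K)   # abscissae, weights on [-1, 1]
    a, b = 0.0, 1.0
    x = (v + 1) * (b - a) / 2 + a               # nodes mapped to [a, b]
    wt = w * (b - a) / 2                        # rescaled weights
    # Assumed conditional density: dens[k, l] = f_p(x_l | x_k).
    dens = 1 + theta * np.outer(1 - 2 * x, 1 - 2 * x)
    Q = dens * wt                               # Q[k, l] = weight_l * f_p(x_l | x_k)
    n = W.shape[0]
    xi = np.full((n, K), 0.5)                   # xi[i, k] approximates xi_i(x_k)
    for _ in range(max_iter):
        index = beta0 + beta1 * x[None, :] + lam * (W @ xi)
        new = logistic(index) @ Q.T             # sum over nodes l for each (i, k)
        if np.max(np.abs(new - xi)) < tol:
            return new, x, Q
        xi = new
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical example: three agents with equal weights on the other two.
W = (np.ones((3, 3)) - np.eye(3)) / 2
xi, nodes, Q = solve_on_nodes(W, lam=1.0, beta0=-0.5, beta1=1.0, theta=0.5)
```

With the values at the nodes in hand, the expectation function at any other point $x$ can be approximated by evaluating the quadrature sum once more with $f_p(\cdot \mid x)$.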
4.6 Discussions and Extensions

4.6.1 Group Unobservables

In the previous discussions, all exogenous characteristics, $X^g$, the $X_i^c$'s, and the $X_i^p$'s, are observable to econometricians. In reality, however, some variables are known to agents but unavailable from the data. For example, researchers studying students' class performances may not know how good the teachers are, which is known to the students. Similarly, in market sales data, it is possible that no information about the wealth of customers is available, although the firms may have some relevant information. In this section, we take into account unobservable variables that are public information in a group and can affect the payoffs of all group members. Modeling them as random effects, we derive the following framework:
\[
y_{i,g} = h_{i,g}(y^*_{i,g}), \tag{4.2.1$'$}
\]
and
\[
y^*_{i,g} = u(X^g, X_i^c, X_i^p, \zeta^g) + \lambda \sum_{j \neq i} W_{g,ij}\, E[y_{j,g} \mid X^p_{J_{i,g}}, Z] - \epsilon_{i,g}, \tag{4.2.2$'$}
\]
where $g$ is the group index. By (4.2.2$'$), we implicitly assume that all group unobservables are additive and summarize them as a single variable, $\zeta^g$. Assume that the $\zeta^g$'s are i.i.d. and independent of all the other exogenous variables, the social relations, and the idiosyncratic shocks. Their distribution is denoted by the pdf $f_\zeta(\cdot; \vartheta)$. Because each $\zeta^g$ is publicly known to agents in $g$, for interactions among group members it acts the same as $X^g$. For any group $g$, we can characterize the equilibrium set and use a parametric stochastic selection rule to complete the model. Suppose that the distribution of equilibrium selection is
\[
\rho(\xi^e; E(X^g, X^c, X^p, \zeta^g, W_g), \alpha) = \rho(\alpha'\gamma(\xi^e; X^g, X^c, X^p, W_g); E(X^g, X^c, X^p, \zeta^g, W_g)), \tag{4.6.1}
\]
with some known selection rule $\gamma(\xi^e; X^g, X^c, X^p, W_g)$ and unknown parameter $\alpha$. The above form shows that the group unobservables only affect the set of equilibria, but not the selection rule.
Therefore, given a set of equilibria (finite, or approximated by a finite number of equilibrium expectation functions), the realizations of unobserved group features do not influence the distribution of equilibrium outcomes. This assumption is reasonable if the selection rule is consistent with social welfare maximization or Pareto optimization. The selection rules in previous sections satisfy (4.6.1). Then we can complete the model and write down the sample log likelihood function as follows:
\[
\log L(Y; X, W) = \sum_{g=1}^{G} \log\Big[ \int \int_{\xi^e \in E(X^g, X^c, X^p, \tilde\zeta^g, W_g)} \prod_{i=1}^{n_g} f(y_{i,g} \mid \xi^e_g) \cdot \rho(\alpha'\gamma(\xi^e; X^g, X^c, X^p, W_g); E(X^g, X^c, X^p, \tilde\zeta^g, W_g))\, f_\zeta(\tilde\zeta^g; \vartheta)\,d\tilde\zeta^g \Big]. \tag{4.6.2}
\]

Because the $X^g$'s are observed from the data set and the $\zeta^g$'s are not, identification and estimation methods will be different from those in previous sections. Assume that $u(\cdot)$ is a linear function of exogenous characteristics, i.e.,
\[
u(X^g, X^c_{i,g}, X^p_{i,g}, \zeta^g) = \beta_{0,0} + X^{g\prime}\beta_{0,1} + X^{c\prime}_{i,g}\beta_1 + X^{p\prime}_{i,g}\beta_2 + \zeta^g,
\]
for all $i$ and $g$. We normalize the mean of $\zeta^g$ to be equal to 0, i.e., $E[\zeta^g] = 0$. For data about a single group, applying the technique of identification at infinity, we can identify $\beta_1$, $\beta_2$, and $\tilde\beta_0 = \beta_{0,0} + X^{g\prime}\beta_{0,1} + \zeta^g$ (as a whole) under certain conditions, based on our previous discussions. If $H_i(\cdot)$ is strictly increasing, the group average behaviors will be increasing in $\tilde\beta_0 = \beta_{0,0} + X^{g\prime}\beta_{0,1} + \zeta^g$, which helps us identify $\beta_{0,0}$, $\beta_{0,1}$, and the distribution of the $\zeta^g$'s from variations across groups.

As for estimation, we can calculate the integral over the unobserved $\zeta^g$'s by stochastic simulation. That is to say, for each $g$ we randomly take $S$ draws from the distribution $f_\zeta(\cdot; \vartheta)$, $\zeta^{g,1}, \cdots, \zeta^{g,S}$, and calculate the simulated log likelihood:
\[
\log \hat L(Y; X, W) = \sum_{g=1}^{G} \log\Big[ \frac{1}{S} \sum_{s=1}^{S} \int_{\xi^e \in E(X^g, X^c, X^p, \zeta^{g,s}, W_g)} \prod_{i=1}^{n_g} f(y_{i,g} \mid \xi^e_g) \cdot \rho(\alpha'\gamma(\xi^e; X^g, X^c, X^p, W_g); E(X^g, X^c, X^p, \zeta^{g,s}, W_g)) \Big]. \tag{4.6.2$'$}
\]
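The simulation step can be sketched in Python for a deliberately simple special case. The sketch below is only an illustration under assumptions that are not part of the general framework: outcomes are binary with a logistic $H_i$, all characteristics other than $\zeta^g$ are public and summarized by a constant `beta0`, $\zeta^g$ is normal with mean zero, and the parameters are such that the equilibrium is unique, so that no selection rule is needed and the equilibrium can be found by fixed-point iteration. The function names are hypothetical.

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def equilibrium(W, lam, index, tol=1e-12, max_iter=1000):
    """Fixed-point iteration for xi = logistic(index + lam * W @ xi).

    Assumes a contraction (unique equilibrium); illustrative only.
    """
    xi = np.full(W.shape[0], 0.5)
    for _ in range(max_iter):
        new = logistic(index + lam * (W @ xi))
        if np.max(np.abs(new - xi)) < tol:
            return new
        xi = new
    raise RuntimeError("fixed-point iteration did not converge")

def simulated_loglik(y_groups, W_groups, beta0, lam, sd_zeta, S=200, seed=0):
    """Average the group likelihood over S draws of the group effect zeta."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for y, W in zip(y_groups, W_groups):
        zeta = sd_zeta * rng.standard_normal(S)      # draws of zeta^{g,s}
        lik = np.empty(S)
        for s in range(S):
            p = equilibrium(W, lam, beta0 + zeta[s])  # P(y_i = 1) given the draw
            lik[s] = np.prod(np.where(y == 1, p, 1.0 - p))
        total += np.log(lik.mean())
    return total

# Hypothetical example: one group of three agents.
W = (np.ones((3, 3)) - np.eye(3)) / 2
y = np.array([1, 0, 1])
ll = simulated_loglik([y], [W], beta0=0.2, lam=1.0, sd_zeta=1.0, S=50)
```

With multiple equilibria, the inner step would instead compute the whole equilibrium set for each draw and weight the equilibria by the selection probabilities.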
4.6.2 Peer Effects

Consider the case in which an agent's behavior is affected by the performance of all her peers. That is, $W_{n,ij} = 1$ for all $i \neq j$. Since every agent makes predictions about everyone else, we denote the set of all possible private information about the $X_i^p$'s in the group as $\hat{J} = \{\tilde{J} : \tilde{J} = J_i \text{ for some } i = 1, \cdots, n\}$. Denote the number of elements in this set as $M_0$, so that $\hat{J} = \{\tilde{J}_1, \cdots, \tilde{J}_{M_0}\}$. For each $i$, there is a unique $m(i)$ with $1 \leq m(i) \leq M_0$ such that $J_i = \tilde{J}_{m(i)}$. We use $x^p_{J,m}$ to represent one realization of the random vector $X^p_{J,m}$ corresponding to the private information $\tilde{J}_m$. The corresponding support is denoted as $\mathcal{X}^p_{J,m}$. Note that $X^p_{J,m(i)} = X^p_{J_i}$, representing the private information known to $i$. The conditional expectation function, $\xi^e = (\xi^e_{i,m})$, takes the following special form:
\[ \xi^e(x^p_{J,1}, \cdots, x^p_{J,M_0})_{i,m} = \xi^e_{i,m}(x^p_{J,m}) = E\Big[H_i\Big(u(X_i) + \lambda \sum_{j \neq i} \xi^e_{j,m(i)}(X^p_{J,m(i)})\Big) \,\Big|\, X^p_{J,m} = x^p_{J,m}, Z\Big], \tag{4.6.3} \]
for all $i = 1, \cdots, n$, $m = 1, \cdots, M_0$, and $x^p_{J,m} \in \mathcal{X}^p_{J,m}$ (see footnote 53). When all exogenous covariates are public information, $\xi^e$ reduces to an $n \times 1$ vector satisfying $\xi^e_i = E[H_i(u(X_i) + \lambda \sum_{j \neq i} \xi^e_j) \mid Z = z]$. Since only the total expected behavior of peers is taken into account when an agent makes decisions, we ask whether it is possible to represent an equilibrium by a vector-valued function whose range has dimension less than $n$. That means that the number of coordinate functions, the $\xi^e_{i,m}$'s, is reduced. Lee, Li and Lin (2014) prove that this is possible in the binary choice model when all exogenous characteristics are public information and only the idiosyncratic shocks are privately known. Their results can be extended to the general setting in this paper.

Footnote 53. In the general case, we define the expectation about $i$ conditional on private information for agents other than $i$. For convenience, when discussing peer effects, we also define the conditional expectation about $i$ based on private information the same as $i$'s. Since $W_{n,ii} = 0$ for all $i$, model equilibria and implications do not change with this small alteration.

Let $\bar{\xi} = (\bar{\xi}^1, \cdots, \bar{\xi}^{M_0})$ be a vector-valued function such that
\[ \bar{\xi}(x^p_{J,1}, \cdots, x^p_{J,M_0})_m = \bar{\xi}^m(x^p_{J,m}), \tag{4.6.4} \]
for any $m = 1, \cdots, M_0$. For any $i$, there is a unique $m(i) \in \{1, \cdots, M_0\}$ such that $J_i = \tilde{J}_{m(i)}$. Consider a functional equation system, $G_i(\bar{\xi}^{m(i)}, \xi_i; X, W_n) = 0$, where $\xi_i = (\xi_{i,1}, \cdots, \xi_{i,M_0})$ is a vector-valued function which satisfies (4.6.4) and, for any $m$,
\[ \begin{aligned} \big[G_i(\bar{\xi}^{m(i)}, \xi_i; X, W_n)(x^p_{J,1}, \cdots, x^p_{J,M_0})\big]_m &= G_i(\bar{\xi}^{m(i)}, \xi_i; X, W_n)_m(x^p_{J,m}) \\ &= E\big[H_i\big(u(X_i) + \lambda \bar{\xi}^{m(i)}(X^p_{J,m(i)}) - \lambda \xi_{i,m(i)}(X^p_{J,m(i)})\big) \,\big|\, X^p_{J,m} = x^p_{J,m}, Z = z\big] - \xi_{i,m}(x^p_{J,m}) \\ &= 0, \end{aligned} \tag{4.6.5} \]
for all $x^p_{J,m} \in \mathcal{X}^p_{J,m}$. Applying the Brouwer fixed point theorem, for any $\bar{\xi}^{m(i)}$, there is a function $\xi_i$ satisfying the above system of equations. For any $\chi = (\chi_1, \cdots, \chi_{M_0})$ with $\chi_m \in L^1(\mathcal{X}^p_{J,m}, \mathcal{B}_{J,m}, \mu^p; \mathbb{R}^1)$, consider the linear operator $\Delta_i(\bar{\xi}^{m(i)}, \xi_i; X, W_n)$ such that
\[ \big[\Delta_i(\bar{\xi}^{m(i)}, \xi_i; X, W_n)(\chi)\big]_m(x^p_{J,m}) = -\lambda E\big[DH_i\big(u(X_i) + \lambda \bar{\xi}^{m(i)}(X^p_{J,m(i)}) - \lambda \xi_{i,m(i)}(X^p_{J,m(i)})\big) \cdot \chi_m(X^p_{J,m(i)}) \,\big|\, X^p_{J,m} = x^p_{J,m}, Z = z\big] - \chi_m(x^p_{J,m}), \]
for $m = m(i)$; and
\[ \big[\Delta_i(\bar{\xi}^{m(i)}, \xi_i; X, W_n)(\chi)\big]_m(x^p_{J,m}) = -\chi_m(x^p_{J,m}), \]
for $m = 1, \cdots, M_0$ and $m \neq m(i)$, where $DH_i(a)$ denotes the derivative of $H_i(\cdot)$ at the point $a$. $\Delta_i(\bar{\xi}^{m(i)}, \xi_i; X, W_n)$ is the Fréchet derivative of $G_i$ with respect to $\xi_i$ at $(\bar{\xi}^{m(i)}, \xi_i; X, W_n)$. If $\Delta_i(\bar{\xi}^{m(i)}, \xi_i; X, W_n)$ is an isomorphism, then by the Implicit Function Theorem in Banach spaces, in a neighborhood of $\bar{\xi}^{m(i)}$ there is only one $\xi_i^e$ such that the functional equation $G_i(\bar{\xi}^{m(i)}, \xi_i; X, W_n) = 0$ is satisfied. In this way, we can derive an operator $\Lambda_i$, which defines each function $\xi_i$ as the image of $\bar{\xi}^{m(i)}$ such that $G_i(\bar{\xi}^{m(i)}, \Lambda_i(\bar{\xi}^{m(i)}); X, W_n) = 0$.
If $\Delta_i(\bar{\xi}^{m(i)}, \xi_i; X, W_n)$ is an isomorphism for all $i$, we call the group regular.

Proposition 4.6.1 Suppose that the group is regular. If there is a function $\bar{\xi}^e = (\bar{\xi}^{e,1}, \cdots, \bar{\xi}^{e,M_0})$ such that it satisfies (4.6.4), and
\[ \bar{\xi}^{e,m}(x^p_{J,m}) = \sum_{i=1}^{n} E\big[H_i\big(u(X^g, X_i^c, X_i^p) + \lambda \bar{\xi}^{e,m(i)}(X^p_{J,m(i)}) - \lambda (\Lambda_i(\bar{\xi}^{e,m(i)}))_{m(i)}(X^p_{J,m(i)})\big) \,\big|\, X^p_{J,m} = x^p_{J,m}, z\big], \tag{4.6.6} \]
where $\Lambda_i(\cdot)$ is defined above, then there is a vector of functions, $\xi^e = (\xi^e_{1,1}, \cdots, \xi^e_{1,M_0}, \cdots, \xi^e_{n,1}, \cdots, \xi^e_{n,M_0})$, such that (4.6.3) holds for all $i$, $m$, and $x^p_{J,m} \in \mathcal{X}^p_{J,m}$. Conversely, if there is a vector of functions, $\xi^e = (\xi^e_{1,1}, \cdots, \xi^e_{1,M_0}, \cdots, \xi^e_{n,1}, \cdots, \xi^e_{n,M_0})$, satisfying (4.6.3), then there is a function $\bar{\xi}^e = (\bar{\xi}^{e,1}, \cdots, \bar{\xi}^{e,M_0})$ such that (4.6.4) and (4.6.6) hold.

Proof. See Appendix C.5.

In particular, when all exogenous characteristics are public information, the functional equation system (4.6.5) reduces to
\[ G_i(\bar{\xi}^e, \xi_i^e; X, W_n) = H_i(u(X_i) + \lambda \bar{\xi}^e - \lambda \xi_i^e) - \xi_i^e = 0, \tag{4.6.7} \]
which is just a nonlinear equation. The regularity condition then reduces to
\[ -\lambda DH_i(u(X_i) + \lambda \bar{\xi}^e - \lambda \xi_i^e) - 1 \neq 0. \]
We can see that if $DH_i(a) > 0$ for all $i$ and $a$, and $\lambda \geq 0$, the regularity condition is satisfied. Then a BNE is equivalent to the expected group total outcome, $\bar{\xi}^e$, which is a scalar. The system analyzed by Lee, Li and Lin (2014) corresponds to the special case $H_i(a) = F(a)$. When each $X_i^p$ is known to $i$ only and the joint distribution of the $X_i^p$'s conditional on the public information $Z = z$ is exchangeable with a pdf $f_p(\cdot)$, (4.6.5) takes the following form:
\[ G_i(\bar{\xi}^e, \xi_i^e; X, W_n)(x) = \int_{\mathcal{X}^p} H_i\big(u(X^g, X_i^c, \tilde{x}) + \lambda \bar{\xi}^e(\tilde{x}) - \lambda \xi_i^e(\tilde{x})\big) f_p(\tilde{x} \mid x) \, d\tilde{x} - \xi_i^e(x) = 0, \tag{4.6.8} \]
for all $i$ and $x \in \mathcal{X}^p$. In this case, a BNE is equivalent to the expectation of group total outcomes conditional on the realization of an individual's self-known characteristics.
$\bar{\xi}^e(\cdot)$ is a function mapping the realization of that privately known characteristic to a real number. Therefore, focusing on peer effects, the dimension of an equilibrium can be reduced, which simplifies estimation.

4.6.3 Deterministic Rule

In previous sections, we assume that an equilibrium is selected from the set of equilibria according to a stochastic rule. It is also possible to use a deterministic rule. To be specific, let $\mathcal{E}(X, W_n)$ denote the set of equilibria for a group $(X, W_n)$. It is equivalent to the set of solutions to a system of (generally nonlinear, functional) equations, $S(\xi; X, W_n) = 0$. Let $\Pi(\xi; X, W_n)$ denote a real-valued function of equilibria and group features $(X, W_n)$. For instance, $\Pi(\xi; X, W_n)$ can be the expected total utility of the group, or the expected number of market entrants. We select an equilibrium to maximize the objective function:
\[ \max_{\xi^e} \Pi(\xi^e; X, W_n) \quad \text{s.t.} \quad S(\xi^e; X, W_n) = 0. \tag{4.6.9} \]
When all exogenous covariates are known to the public, (4.6.9) is just an ordinary optimization problem. According to our discussion in Section 4.4, the set of equilibria, or equivalently the set of zeros of $S(\cdot; X, W_n)$, is finite. To solve (4.6.9) is to pick the one among those finitely many points that maximizes the criterion function. In general, however, the conditional expectation function, $\xi^e$, is a vector-valued function defined on subsets of Euclidean spaces, so (4.6.9) is a problem of functional optimization. In that case, we may solve the optimization problem using optimal control. For peer effects in particular, assuming that an equilibrium is selected under a fixed equilibrium selection rule, it is possible to derive a simpler estimation method.
When both $\Pi(\xi^e; X, W_n)$ and $S(\xi^e; X, W_n)$ are continuous in the exogenous characteristics, $X$, and the set of equilibria is compact, we can apply the Maximum Theorem, which states that the set of solutions to (4.6.9) is an upper-hemicontinuous correspondence of $X$ (see footnotes 54 and 55). If we can ensure that there is a unique maximizer (by imposing some convexity conditions, for example), the unique optimal equilibrium will be a continuous function of the group characteristics $X$. So will the mean, $\bar{\xi}^*$. Therefore, when there is a large number of independent groups, we can non-parametrically estimate $\bar{\xi}^*$ from group means. By Proposition 4.6.1, with $\bar{\xi}^*$, we can recover the conditional expectations of individual behaviors, the $\xi_i^*$'s. Then we can estimate model parameters either from the sample likelihood function or from moment conditions, conditioning on the equilibrium represented by $\bar{\xi}^*$. The equilibrium conditional expectation acts like the group aggregate in the model built by Bisin et al. (2011). This shows that, without assuming that a single equilibrium is played repeatedly over time periods or across markets, we can still use two-step estimation as long as the same criterion rule is used, there is a unique optimizing equilibrium, and the actions of an individual are influenced by the average of her peers.

Footnote 54. In our discussion of equilibria under the general information structure in Appendix C.4, we show that under some conditions, all equilibria lie in a compact set. Since the set of zeros of the continuous function $S(\cdot; X, W_n)$ is closed, the set of equilibria is itself compact.

Footnote 55. As for the Maximum Theorem, see Stokey et al. (1989) for a proof when the criterion, $\Pi(\cdot; X, W_n)$, and the constraint, $S(\cdot; X, W_n)$, are functions defined on Euclidean spaces. A proof for general metric spaces can be found in Wikipedia. See http://en.wikipedia.org/wiki/Maximum_theorem.

4.7 Binary Choice Models: Analysis and Experiments

This section discusses in detail the equilibrium sets of two forms of binary choice models, (4.2.5) and (4.2.6). Additionally, Monte Carlo experiments compare the small sample performance of the maximum likelihood estimation with the complete likelihood for multiple equilibria proposed in this paper with that of the nested fixed point maximum likelihood estimation assuming a unique equilibrium used by Yang and Lee (2014).

4.7.1 Binary Choice Model I

Consider the model (4.2.1) and (4.2.2), where $h_i(z) = I(z > 0)$ for all $i$, such that (4.2.5) holds. According to Yang and Lee (2014), this corresponds to a game where agents in a group simultaneously choose between 0 and 1. In this case, the expected utility from choice "0" is normalized to zero. If instead $i$ chooses "1", her expected utility is affected by her expectations about others' actions, $u(X_i) + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, Z] - \epsilon_i$. Therefore, $y_i = I(u(X_i) + \lambda \sum_{j \neq i} W_{n,ij} E[y_j \mid X^p_{J_i}, Z] - \epsilon_i > 0)$. The entry game for a group of firms in the same industry is a case in point. The discussion in this section focuses on the case in which all $X_i$'s are public information and the $\epsilon_i$'s are i.i.d. normal with mean 0 and variance 1. It follows from the previous analysis that an equilibrium is an $n \times 1$ vector $\xi \in [0, 1]^n$ such that
\[ \xi_i = \Phi\Big(u_i + \lambda \sum_{j \neq i} W_{n,ij} \xi_j\Big), \tag{4.7.1} \]
for $i = 1, \cdots, n$, where $\Phi(\cdot)$ is the cdf of the standard normal distribution and $u_i$ is a simplified notation for $u(X_i)$. The investigation of the equilibrium set begins with the special case in which every two group members are associated with each other, that is, $W_{n,ij} = 1$ for all $i \neq j$. As shown in Section 4.6, in this case an equilibrium conditional expectation is a scalar and can be represented as a zero of a nonlinear function.
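As a rough numerical illustration of this scalar representation, the sketch below searches for zeros of the function $\bar{\xi} \mapsto \sum_i \Lambda_i(\bar{\xi}) - \bar{\xi}$ implied by (4.6.7) with $H_i = \Phi$, where $\Lambda_i(\bar{\xi})$ solves $\xi_i = \Phi(u_i + \lambda\bar{\xi} - \lambda\xi_i)$. It is not the homotopy method used in the experiments: a plain grid scan with bisection is swapped in, it assumes $\lambda \geq 0$ (so the inner equation has a unique root), and it can miss zeros that are closer together than the grid spacing. All function names are illustrative.

```python
import math


def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bisect(f, lo, hi, iters=80):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def individual_expectation(u_i, lam, total):
    """Solve x = Phi(u_i + lam * (total - x)) on [0, 1].

    For lam >= 0 the left-hand minus right-hand side is strictly monotone,
    so the root is unique (this plays the role of Lambda_i in the text).
    """
    return bisect(lambda x: norm_cdf(u_i + lam * (total - x)) - x, 0.0, 1.0)


def excess(u, lam, total):
    """Sum of individual expectations implied by `total`, minus `total`."""
    return sum(individual_expectation(u_i, lam, total) for u_i in u) - total


def scalar_equilibria(u, lam, grid=400):
    """Zeros of `excess` on [0, n], located by a grid scan plus bisection."""
    n = len(u)
    pts = [n * k / grid for k in range(grid + 1)]
    vals = [excess(u, lam, t) for t in pts]
    roots = []
    for a, b, fa, fb in zip(pts, pts[1:], vals, vals[1:]):
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:
            roots.append(bisect(lambda t: excess(u, lam, t), a, b))
    return roots


if __name__ == "__main__":
    # Two symmetric agents with strong positive interactions: several zeros.
    print(scalar_equilibria([-2.0, -2.0], 4.0))
    # Large u relative to lambda: a single zero.
    print(scalar_equilibria([2.0, 2.0], 1.0))
```

Each zero $\bar{\xi}$ can be mapped back to individual expectations with `individual_expectation(u_i, lam, total)`.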
The characteristics of the equilibrium set are summarized by the following proposition.

Proposition 4.7.1 In Binary Choice Model I (4.2.5), consider a group of $n$ agents such that any two of them are associated with each other, i.e., $W_{n,ij} = 1$ for all $i \neq j$. Suppose that $\lambda \neq 0$.
• If $-\sqrt{2\pi} < \lambda < 0$, there is a unique equilibrium;
• When $\lambda > 0$, there is a unique equilibrium if
\[ \min_{1 \leq i \leq n} u_i > \frac{\lambda}{2}, \tag{4.7.2} \]
or
\[ \max_{1 \leq i \leq n} u_i < -\frac{\lambda}{2}, \tag{4.7.3} \]
where $\Phi(\cdot)$ and $\phi(\cdot)$ are respectively the cdf and pdf of the standard normal distribution.

Proof. See Appendix C.6.

From Proposition 4.7.1, multiple equilibria are more likely when $\lambda > 0$ than when $\lambda < 0$. Thus, further investigation of the equilibrium set focuses on the case $\lambda > 0$. Figures C.2 to C.5 illustrate the characteristics of the equilibrium set as the group population and interaction intensity vary. Figures C.2 and C.3 show that there is a unique equilibrium when (4.7.2) is satisfied. Figure C.2 corresponds to the case in which agents are symmetric with $u_i = u = 2$ for all $i$. Figure C.3 is for the case in which the $u_i$'s are heterogeneous and vary from 1 to 2. When neither (4.7.2) nor (4.7.3) is satisfied, as shown by Figures C.4 and C.5, there can be multiple equilibria. In Figure C.4, symmetric agents have $u_i = u = -2$ for all $i$, whereas the $u_i$'s range from $-3$ to $-2$ in Figure C.5. In the above graphs, there are at most three equilibria. In fact, the number of equilibria is no more than three under a reasonable condition.

Lemma 4.7.1 Suppose that $0 < \lambda < \frac{2\sqrt{2\pi}}{3}$. For the function $c(a; \lambda) = \lambda\phi(a) + 1 + a^2(2\lambda\phi(a) - 1)$, there is $a^+(\lambda) > 0$ such that $c(a; \lambda) > 0$ for $-a^+(\lambda) < a < a^+(\lambda)$; $c(a^+(\lambda); \lambda) = c(-a^+(\lambda); \lambda) = 0$; and $c(a; \lambda) < 0$ for $a < -a^+(\lambda)$ or $a > a^+(\lambda)$. In addition, $a^+(\lambda)$ increases with $\lambda$.

Proof. See Appendix C.6.

Proposition 4.7.2 In Binary Choice Model I (4.2.5), consider a group of $n$ agents such that any two of them are associated with each other, i.e., $W_{n,ij} = 1$ for all $i \neq j$.
Suppose that $0 < \lambda < \frac{2\sqrt{2\pi}}{3}$. There are at most three equilibria if
\[ \min_{1 \leq i \leq n} u_i > \lambda\Phi(a^+(\lambda)) + a^+(\lambda), \tag{4.7.4} \]
or
\[ \max_{1 \leq i \leq n} u_i < \lambda\Phi(-a^+(\lambda)) - a^+(\lambda) - \lambda n. \tag{4.7.5} \]

Proof. See Appendix C.6.

For general social relations, an equilibrium is represented by an $n \times 1$ vector, $\xi = (E[y_1], \cdots, E[y_n])'$, and is a zero of a system of nonlinear equations. Applying Proposition 4.4.2, there is a unique equilibrium if the sign of the determinant of the following matrix does not change as $\xi$ varies in $[0, 1]^n$:
\[ \lambda \begin{pmatrix} \phi(u_1 + \lambda \sum_{j \neq 1} W_{n,1j}\xi_j) & 0 & \cdots & 0 \\ 0 & \phi(u_2 + \lambda \sum_{j \neq 2} W_{n,2j}\xi_j) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \phi(u_n + \lambda \sum_{j \neq n} W_{n,nj}\xi_j) \end{pmatrix} W_n - I_n. \tag{4.7.6} \]
For a graphical illustration, suppose that $u(X_i)$ is a linear function:
\[ u(X_i) = \beta_0 + X^c_{i,1}\beta_1 + X^c_{i,2}\beta_2, \tag{4.7.7} \]
where $X^c_{i,1}$ and $X^c_{i,2}$ are two commonly known exogenous characteristics. Take $\beta_0^* = 0$ and $\beta_1^* = \beta_2^* = 1$. Simulate one sample with $G = 100$ independent groups, each of which has $n = 5$ members. In each group, the number of social relations an agent can build is randomly determined, ranging from 0 to $n - 1$. Based on the randomly generated total number of links for agent $i$, $F_i = \sum_{j \neq i} W_{n,ij}$, generate a random number, $rn_{n,ij}$, for each $j \neq i$. Then $W_{n,ij} = 1$ if $rn_{n,ij}$ is among the $F_i$ largest ones. The social relation matrix, $W_n$, is not row-normalized. Characteristics of the equilibrium set in this case are illustrated by Figures C.6 and C.7. For Binary Choice Model I, although the sufficient condition for equilibrium uniqueness in Yang and Lee (2014) is violated for a large proportion of groups once $\lambda$ is slightly larger than 0.6, every group in this sample has a unique equilibrium when $\lambda$ is between 0 and 1. As $\lambda$ increases, the average equilibrium outcomes and the proportion of agents who choose action 1 increase.

In the Monte Carlo experiments, the social relation matrix is constructed in the same way as for the illustration in Figures C.6 and C.7.
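The network construction just described, together with a simple fixed-point iteration for (4.7.1), can be sketched as follows. This is a minimal illustration rather than the code behind the experiments. The helper `contraction_bound` encodes an assumed sup-norm contraction bound, $\sqrt{2\pi}$ divided by the largest row sum of $W_n$; condition (4.3.8) itself is not reproduced in this section, so the link to it is an assumption, although the bound evaluates to about 0.6267 when some agent has four links, which matches the average upper bound reported for these samples.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def random_network(n, rng):
    """Non-row-normalized 0/1 matrix built as in the text.

    Agent i gets F_i links, F_i drawn uniformly from {0, ..., n-1}; a random
    number is drawn for every j != i and the F_i largest ones are linked.
    """
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        n_links = rng.randint(0, n - 1)          # both endpoints included
        others = [j for j in range(n) if j != i]
        score = {j: rng.random() for j in others}
        ranked = sorted(others, key=lambda j: score[j], reverse=True)
        for j in ranked[:n_links]:
            W[i][j] = 1
    return W


def contraction_bound(W):
    """Assumed bound: lambda below sqrt(2*pi) / (largest row sum of W)."""
    max_links = max(sum(row) for row in W)
    return math.inf if max_links == 0 else math.sqrt(2.0 * math.pi) / max_links


def fixed_point(u, lam, W, tol=1e-12, max_iter=10000):
    """Iterate xi <- Phi(u + lam * W xi) until the update is below tol."""
    n = len(u)
    xi = [0.5] * n
    for _ in range(max_iter):
        new = [norm_cdf(u[i] + lam * sum(W[i][j] * xi[j] for j in range(n)))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, xi)) < tol:
            return new
        xi = new
    raise RuntimeError("iteration did not converge; lambda may be too large")


if __name__ == "__main__":
    rng = random.Random(1)
    W = random_network(5, rng)
    print(W, contraction_bound(W))
    print(fixed_point([0.0] * 5, 0.5, W))
```

The iteration finds at most one equilibrium and is only guaranteed to converge when the mapping is a contraction, which is why an all-solution method is needed once $\lambda$ exceeds such a bound.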
There are $L = 400$ simulations. In each simulated sample, there are $G$ independent groups of equal size $n$, where $n$ is fixed at 5 in the experiments. Two numbers of independent groups are considered, $G = 100$ and $G = 200$. The true value of the interaction intensity is $\lambda^* = 0.8$. Table C.1 summarizes the estimation results for three regression methods. Regression I is the conventional Probit estimation without social interactions. Regressions II and III take into account the interactions among socially related agents. Regression II assumes that condition (4.3.8) is satisfied and uses the contraction mapping iteration method to solve for the (assumed) unique equilibrium. In contrast, Regression III makes no restrictive assumptions on the intensity of social interactions, $\lambda$, or on the number of equilibria in estimation. It allows for multiple equilibria, uses the homotopy continuation method to compute the equilibrium set, and chooses the equilibrium with maximal expected probabilities for choice "1" to complete the model. From the table, it can be seen that Regression III outperforms Regressions I and II in terms of parameter estimation biases and the value of the estimated average log likelihood. The shortfall of Regression II stems from the distortions imposed by (4.3.8). From Table C.1, condition (4.3.8) is violated by 67.44% (67.35%) of the groups in the sample on average when the number of groups is $G = 100$ ($G = 200$) under the true parameter values. Additionally, the average upper bound on the interaction intensity imposed by (4.3.8) for a sample, 0.6267, is very close to the estimates of $\lambda$ in Regression II (the estimate of $\lambda$ is 0.6263 when $G = 100$ and 0.6266 when $G = 200$). Therefore, imposing (4.3.8) can be restrictive when it is violated by a large proportion of the groups in a sample.

4.7.2 Binary Choice Model II

In the basic framework, (4.2.1) and (4.2.2), take $h_i(z) = 2I(z > 0) - 1$ for all $i$. Then (4.2.6) holds.
Similar to Brock and Durlauf (2001), this model describes the equilibrium outcomes of a simultaneous move game with discrete choices where the utility an agent gets depends on the difference between her own action and those of her friends. To be specific, suppose that in a group of $n$ agents, an agent $i$ can choose between two actions, $-1$ and $1$. Her utility depends on her own choice and on those of the agents with whom she is associated. If she chooses 1, with others choosing $y_{-i}$, her utility is $\tilde{u}(X_i, 1) + \tilde{\lambda}\sum_{j \neq i} W_{n,ij} y_j + \tilde{\epsilon}_i^{1}$. If she chooses $-1$, her utility is $\tilde{u}(X_i, -1) - \tilde{\lambda}\sum_{j \neq i} W_{n,ij} y_j + \tilde{\epsilon}_i^{-1}$. When $\tilde{\lambda} > 0$, an agent benefits from taking the same action as her friends and/or peers. In contrast, when $\tilde{\lambda} < 0$, an agent is rewarded for distinguishing herself from her friends and/or peers. Suppose that all exogenous characteristics are public information but the idiosyncratic shocks are private information. As $i$ does not know her friends' actions when she makes her decision, she maximizes her expected utility, which is $\tilde{u}(X_i, 1) + \tilde{\lambda}\sum_{j \neq i} W_{n,ij} E[y_j] + \tilde{\epsilon}_i^{1}$ for action 1 and $\tilde{u}(X_i, -1) - \tilde{\lambda}\sum_{j \neq i} W_{n,ij} E[y_j] + \tilde{\epsilon}_i^{-1}$ for action $-1$. Thus, she will choose 1 if
\[ \tilde{u}(X_i, 1) - \tilde{u}(X_i, -1) + 2\tilde{\lambda}\sum_{j \neq i} W_{n,ij} E[y_j] - (\tilde{\epsilon}_i^{-1} - \tilde{\epsilon}_i^{1}) > 0. \]
Define $u(X_i) = \tilde{u}(X_i, 1) - \tilde{u}(X_i, -1)$, $\lambda = 2\tilde{\lambda}$, and $\epsilon_i = \tilde{\epsilon}_i^{-1} - \tilde{\epsilon}_i^{1}$. Plug them into (4.2.1) and (4.2.2) and choose $h_i(z) = 2I(z > 0) - 1$. Then the Type II model for binary choices is derived. Suppose that the $(\tilde{\epsilon}_i^{1}, \tilde{\epsilon}_i^{-1})$'s are i.i.d. normal with zero mean and variance $\frac{1}{2}$. An equilibrium conditional expectation, $\xi = (\xi_1, \cdots, \xi_n)' = (E[y_1], \cdots, E[y_n])'$, satisfies
\[ \xi_i = 2\Phi\Big(u(X_i) + \lambda\sum_{j \neq i} W_{n,ij}\xi_j\Big) - 1 = 2\Phi\Big(\tilde{u}(X_i, 1) - \tilde{u}(X_i, -1) + 2\tilde{\lambda}\sum_{j \neq i} W_{n,ij}\xi_j\Big) - 1. \tag{4.7.8} \]
If any two agents are associated with each other, i.e., $W_{n,ij} = 1$ for any $i \neq j$, $\lambda$ represents the intensity of influence from any other group member.
Based on the previous discussions, in this case the equilibrium can be described by the group total expected outcome, $\bar{\xi} = \sum_{i=1}^{n}\xi_i = \sum_{i=1}^{n} E[y_i]$. As in Binary Choice Model I, $\bar{\xi}$ can be described as a zero of a nonlinear function. It is possible to characterize the equilibrium set by analyzing this function.

Proposition 4.7.3 In Binary Choice Model II (4.2.6), consider a group of $n$ agents such that any two of them are associated with each other, i.e., $W_{n,ij} = 1$ for all $i \neq j$. Suppose that $\lambda \neq 0$.
• If $-\frac{\sqrt{2\pi}}{2} < \lambda < 0$, there is a unique equilibrium;
• When $\lambda > 0$, there is a unique equilibrium if
\[ \min_{1 \leq i \leq n} u_i > \lambda n, \tag{4.7.9} \]
or
\[ \max_{1 \leq i \leq n} u_i < -\lambda n, \tag{4.7.10} \]
where $\Phi(\cdot)$ and $\phi(\cdot)$ are respectively the cdf and pdf of the standard normal distribution.

Proof. See Appendix C.6.

Comparing Proposition 4.7.1 and Proposition 4.7.3, it is easy to see that for both types of binary choice models, it is easier to ensure uniqueness when $\lambda < 0$ than when $\lambda > 0$. Additionally, the sufficient conditions for a unique equilibrium are more stringent for the Type II model of binary choices than for the Type I model. Equilibrium multiplicity for the Type II binary choice model is illustrated by Figures C.8 to C.11. Figures C.8 and C.9 show that there is a unique equilibrium when (4.7.9) is satisfied. Figure C.8 is for the case in which, in a group of population $n$, $u_i = n + 1$ for all $i$. Figure C.9 depicts the case of heterogeneous agents such that $\max_{1 \leq i \leq n} u_i = n + 3$ and $\min_{1 \leq i \leq n} u_i = n + 1$. Figures C.10 and C.11 show that there can be multiple equilibria when both (4.7.9) and (4.7.10) are violated. Figure C.10 corresponds to the case of homogeneous agents with $u_i = u = 1$ for all $i$. The case of heterogeneous agents with $u_i$'s randomly varying from 1 to 2 is shown in Figure C.11.
It is interesting to see that there are multiple equilibria in this case for the Type II model of binary choices (as shown in Figure C.11), whereas there is a unique equilibrium for the Type I model (as shown in Figure C.3). These graphical illustrations confirm that multiple equilibria are more likely in the Type II model of binary choices. As in the discussion of the Type I binary choice model, under a certain condition there are at most three equilibria in the Type II model.

Lemma 4.7.2 Suppose that $0 < \lambda < \frac{\sqrt{2\pi}}{3}$. Fix $\lambda$ and define a function $\tilde{c}(a; \lambda) = 2\lambda\phi(a) + 1 + a^2(4\lambda\phi(a) - 1)$. There is $\tilde{a}^+(\lambda) > 0$ such that $\tilde{c}(a; \lambda) > 0$ for $-\tilde{a}^+(\lambda) < a < \tilde{a}^+(\lambda)$; $\tilde{c}(\tilde{a}^+(\lambda); \lambda) = \tilde{c}(-\tilde{a}^+(\lambda); \lambda) = 0$; and $\tilde{c}(a; \lambda) < 0$ for $a < -\tilde{a}^+(\lambda)$ or $a > \tilde{a}^+(\lambda)$. In addition, $\tilde{a}^+(\lambda)$ increases with $\lambda$.

Proof. See Appendix C.6.

Proposition 4.7.4 In Binary Choice Model II (4.2.6), consider a group of $n$ agents such that any two of them are associated with each other, i.e., $W_{n,ij} = 1$ for all $i \neq j$. Suppose that $0 < \lambda < \frac{\sqrt{2\pi}}{3}$. There are at most three equilibria if
\[ \min_{1 \leq i \leq n} u_i > \lambda(2\Phi(\tilde{a}^+(\lambda)) - 1 + n) + \tilde{a}^+(\lambda), \tag{4.7.11} \]
or
\[ \max_{1 \leq i \leq n} u_i < \lambda(2\Phi(-\tilde{a}^+(\lambda)) - 1 - n) - \tilde{a}^+(\lambda). \tag{4.7.12} \]

Proof. See Appendix C.6.

For a general social relation matrix, $W_n$, an equilibrium is an $n \times 1$ vector satisfying a system of nonlinear equations. Applying Proposition 4.4.2, there is a unique equilibrium if the sign of the determinant of the following matrix does not change as $\xi$ varies in $[-1, 1]^n$:
\[ 2\lambda \begin{pmatrix} \phi(u_1 + \lambda \sum_{j \neq 1} W_{n,1j}\xi_j) & 0 & \cdots & 0 \\ 0 & \phi(u_2 + \lambda \sum_{j \neq 2} W_{n,2j}\xi_j) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \phi(u_n + \lambda \sum_{j \neq n} W_{n,nj}\xi_j) \end{pmatrix} W_n - I_n. \tag{4.7.13} \]
When there are multiple equilibria, the selection rule is applied to complete the model. Based on previous discussions, there is a finite number of equilibria. Denote them by $\xi^{e,1}, \cdots, \xi^{e,L}$. Suppose that equilibria are selected according to the total (ex ante) expected utilities.
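That is, $\gamma(\xi^{e,l}, X, W) = \sum_{i=1}^{n} U_i(\xi^{e,l})$, where
\[ \begin{aligned} U_i(\xi^{e,l}) ={}& \int I\Big(\tilde{u}(X_i, 1) + \tilde{\lambda}\sum_{j \neq i} W_{n,ij}\xi_j^{e,l} + \tilde{\epsilon}_i^{1} > \tilde{u}(X_i, -1) - \tilde{\lambda}\sum_{j \neq i} W_{n,ij}\xi_j^{e,l} + \tilde{\epsilon}_i^{-1}\Big) \\ & \quad \cdot \Big(\tilde{u}(X_i, 1) + \tilde{\lambda}\sum_{j \neq i} W_{n,ij}\xi_j^{e,l} + \tilde{\epsilon}_i^{1}\Big) \frac{1}{\sigma^2}\phi\Big(\frac{\tilde{\epsilon}_i^{1}}{\sigma}\Big)\phi\Big(\frac{\tilde{\epsilon}_i^{-1}}{\sigma}\Big) d\tilde{\epsilon}_i^{1} d\tilde{\epsilon}_i^{-1} \\ & + \int I\Big(\tilde{u}(X_i, 1) + \tilde{\lambda}\sum_{j \neq i} W_{n,ij}\xi_j^{e,l} + \tilde{\epsilon}_i^{1} \leq \tilde{u}(X_i, -1) - \tilde{\lambda}\sum_{j \neq i} W_{n,ij}\xi_j^{e,l} + \tilde{\epsilon}_i^{-1}\Big) \\ & \quad \cdot \Big(\tilde{u}(X_i, -1) - \tilde{\lambda}\sum_{j \neq i} W_{n,ij}\xi_j^{e,l} + \tilde{\epsilon}_i^{-1}\Big) \frac{1}{\sigma^2}\phi\Big(\frac{\tilde{\epsilon}_i^{1}}{\sigma}\Big)\phi\Big(\frac{\tilde{\epsilon}_i^{-1}}{\sigma}\Big) d\tilde{\epsilon}_i^{1} d\tilde{\epsilon}_i^{-1} \\ ={}& (\tilde{u}(X_i, 1) - \tilde{u}(X_i, -1))\frac{\xi_i^{e,l} + 1}{2} + \tilde{\lambda}\sum_{j \neq i} W_{n,ij}\xi_j^{e,l}\xi_i^{e,l} + \tilde{u}(X_i, -1) \\ & + \int \tilde{\epsilon}_i^{1}\,\Phi\Big(\frac{\tilde{u}(X_i, 1) - \tilde{u}(X_i, -1) + 2\tilde{\lambda}\sum_{j \neq i} W_{n,ij}\xi_j^{e,l} + \tilde{\epsilon}_i^{1}}{\sigma}\Big)\frac{1}{\sigma}\phi\Big(\frac{\tilde{\epsilon}_i^{1}}{\sigma}\Big) d\tilde{\epsilon}_i^{1} \\ & + \int \tilde{\epsilon}_i^{-1}\,\Phi\Big(\frac{\tilde{u}(X_i, -1) - \tilde{u}(X_i, 1) - 2\tilde{\lambda}\sum_{j \neq i} W_{n,ij}\xi_j^{e,l} + \tilde{\epsilon}_i^{-1}}{\sigma}\Big)\frac{1}{\sigma}\phi\Big(\frac{\tilde{\epsilon}_i^{-1}}{\sigma}\Big) d\tilde{\epsilon}_i^{-1} \\ ={}& \Big(u(X_i) + \lambda\sum_{j \neq i} W_{n,ij}\xi_j^{e,l}\Big)\frac{\xi_i^{e,l}}{2} + \frac{1}{2}u(X_i) + \tilde{u}(X_i, -1) \\ & + \int \tilde{\epsilon}_i^{1}\,\Phi\Big(\frac{u(X_i) + \lambda\sum_{j \neq i} W_{n,ij}\xi_j^{e,l} + \tilde{\epsilon}_i^{1}}{\sigma}\Big)\frac{1}{\sigma}\phi\Big(\frac{\tilde{\epsilon}_i^{1}}{\sigma}\Big) d\tilde{\epsilon}_i^{1} \\ & + \int \tilde{\epsilon}_i^{-1}\,\Phi\Big(\frac{-u(X_i) - \lambda\sum_{j \neq i} W_{n,ij}\xi_j^{e,l} + \tilde{\epsilon}_i^{-1}}{\sigma}\Big)\frac{1}{\sigma}\phi\Big(\frac{\tilde{\epsilon}_i^{-1}}{\sigma}\Big) d\tilde{\epsilon}_i^{-1}. \end{aligned} \tag{4.7.14} \]
In this model, $\sigma = \sqrt{1/2}$, and $\tilde{u}(X_i, -1)$ is normalized to be equal to 0. Thus, individual expected utilities can be computed given the model primitives, the $u_i$'s, and an equilibrium set. Although there are no explicit analytical forms for the last two integrals in (4.7.14), they can be computed numerically by Gaussian quadrature.

For further investigation of the equilibrium set for general social relations, suppose that $u(X_i) = \tilde{u}(X_i, 1) - \tilde{u}(X_i, -1)$ takes the linear form (4.7.7). As in the discussion of the Type I binary choice model, take $\beta_0^* = 0$ and $\beta_1^* = \beta_2^* = 1$. Consider a sample of $G = 100$ independent groups, each with $n = 5$ members. In a group, the number of social links an agent has is randomly determined. For each $i$, generate a random number, $f_i$. If $f_i \geq 0.5$, the number of social links for $i$, $F_i = \sum_{j \neq i} W_{n,ij}$, is $n - 1$; otherwise, $F_i = n - 2$. Given $F_i$, generate random numbers, $rn_{ij}$, for each $j \neq i$. Then $W_{n,ij} = 1$ if $rn_{ij}$ is among the $F_i$ largest.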
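To make the selection criterion (4.7.14) concrete, the following sketch evaluates the last line of (4.7.14) for one agent and picks, from a given finite list of equilibria, the one with the largest total expected utility. It is only an illustration: a composite Simpson rule on a truncated interval is used here as a simple stand-in for Gaussian quadrature, $\tilde{u}(X_i, -1)$ is set to 0 as in the text, and the function names and the list-of-vectors representation of equilibria are assumptions made for the example.

```python
import math

SIGMA = math.sqrt(0.5)  # standard deviation of each shock, as in the text


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return s * h / 3.0


def shock_term(a, sigma=SIGMA):
    """Integral of e * Phi((a + e)/sigma) * (1/sigma) * phi(e/sigma) de."""
    def integrand(e):
        return e * norm_cdf((a + e) / sigma) * norm_pdf(e / sigma) / sigma
    # The normal density is negligible beyond ten standard deviations.
    return simpson(integrand, -10.0 * sigma, 10.0 * sigma)


def expected_utility(u_i, lam, peer_sum, xi_i, u_minus=0.0):
    """Last line of (4.7.14); peer_sum is sum_{j != i} W_ij * xi_j."""
    a = u_i + lam * peer_sum
    return (a * xi_i / 2.0 + u_i / 2.0 + u_minus
            + shock_term(a) + shock_term(-a))


def total_expected_utility(xi, u, lam, W):
    n = len(u)
    total = 0.0
    for i in range(n):
        peer_sum = sum(W[i][j] * xi[j] for j in range(n) if j != i)
        total += expected_utility(u[i], lam, peer_sum, xi[i])
    return total


def select_equilibrium(equilibria, u, lam, W):
    """Equilibrium (a list of xi values) with the largest total utility."""
    return max(equilibria, key=lambda xi: total_expected_utility(xi, u, lam, W))


if __name__ == "__main__":
    W = [[0, 1], [1, 0]]
    candidates = [[-0.9, -0.9], [0.0, 0.0], [0.9, 0.9]]  # made-up vectors
    print(select_equilibrium(candidates, [0.0, 0.0], 1.0, W))
```

The candidate vectors in the usage example are placeholders, not computed equilibria; in practice they would come from an all-solution method applied to (4.7.8).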
Use the homotopy continuation method to compute the equilibrium set for this sample. Figures C.12 and C.13 illustrate the characteristics of the equilibrium set and the (selected) equilibrium outcomes. According to Yang and Lee (2014), when $|\lambda| < 0.3133$, there is a unique equilibrium. From Figure C.12, there is still a unique equilibrium when $\lambda$ is slightly larger than 0.3133. However, unlike in the Type I model for binary choices, as $\lambda$ continues to increase, some groups have more than one equilibrium. Additionally, both the sample average number of equilibria and the proportion of groups with multiple equilibria tend to increase. Figure C.13 shows the characteristics of the selected outcomes corresponding to the selection criterion (4.7.14). As the interaction intensity, $\lambda$, increases, there is no obvious trend in the sample average expected equilibrium outcomes, the actual outcomes, the proportion of agents who choose 1, or the proportion of agents who choose $-1$. This differs from the equilibrium outcomes in the Type I model of binary choices, where, as $\lambda$ increases, more agents choose action 1 and the average equilibrium outcome also increases (see Figure C.7).

Samples are generated in the same way for the Monte Carlo experiments. There are $L = 400$ simulations. Consider two values of $\lambda$, 0.2 and 0.8. When $\lambda = 0.2$, the sufficient condition for equilibrium uniqueness in Yang and Lee (2014) is satisfied. However, when $\lambda = 0.8$, as shown by Figure C.12, it is possible to have multiple equilibria.
In the experiments, four estimation methods are compared: (1) conventional maximum likelihood estimation without social interactions, i.e., assuming $\lambda = 0$; (2) nested fixed point maximum likelihood estimation which assumes a unique equilibrium, restricts $\lambda$, and uses contraction mapping iterations to compute the equilibrium; (3) nested fixed point maximum likelihood estimation which assumes a unique equilibrium and computes the equilibrium by solving nonlinear equations without restricting $\lambda$; and (4) maximum likelihood estimation for the complete likelihood with equilibrium selection, which uses the homotopy continuation method to compute the set of equilibria. Results are summarized in Tables C.2 and C.3. The conventional regression, which ignores the interactive effects among socially associated agents, produces larger biases than regressions which take social interactions into account, even when the intensity of social interactions is moderate. When $\lambda^* = 0.2$, there is a unique equilibrium according to Yang and Lee (2014). From Table C.2, the last three regression methods perform similarly. However, with a large interaction intensity, $\lambda^* = 0.8$, the results in Table C.3 show that Regression IV outperforms the other three regressions. The upper bound on the interaction intensity which ensures equilibrium uniqueness is 0.3133 on average, far smaller than the true parameter value, $\lambda^* = 0.8$. When $|\lambda|$ is restricted within this bound, estimates are biased, as shown by the performance of Regression II. Since the average estimate of $\lambda$ by Regression II is 0.3109, which is very close to 0.3133, this upper bound is nearly binding. A potential improvement may be achieved by relaxing the restriction on $|\lambda|$ and computing the equilibrium by solving a system of nonlinear equations, using numerical algorithms such as Newton's method. This method is used in Regression III.
Although this method performs well for moderate interactions when there is a unique equilibrium (as Table C.2 shows), it yields biased estimates when $\lambda^* = 0.8$. That is because this method still assumes that there is a unique equilibrium. However, the average number of equilibria in the simulated sample is 2.2918, and 78.11% of groups have more than one equilibrium. In Regression IV, equilibrium multiplicity is taken into account and equilibria are selected according to their expected total utilities. Regression IV has much smaller biases and much larger estimated sample log likelihoods than the other regression methods. That is, the computational intensity of Regression IV is rewarded with good estimation performance when there are multiple equilibria.

4.8 Conclusion

In a general framework of social interactions under incomplete information with multiple equilibria, this paper investigates how to complete the model, along with identification, computation, and estimation issues. The proposed solution to multiple equilibria extends the random equilibrium selection method used by Bajari et al. (2010b, 2010c) to a setting that is general in the types of behaviors and information structures. Although this all-solution method can be computationally intensive, it does not impose strong assumptions on the data generating process and can be applied to a broad range of empirical studies. Although the model has a specific structure for the interdependence among socially associated agents, it incorporates discrete and continuous choices, bounded and unbounded outcomes, unbounded idiosyncratic shocks, as well as different information structures. Therefore, the characterization of equilibria in this paper complements the existence theorems for the Bayesian Nash Equilibrium in the recent theory literature. Group unobservables are important in empirical studies.
This paper briefly discusses the case in which those unobservables affect only individual choices but not the equilibrium selection rule. Relaxing that assumption would be an interesting extension. With the stochastic selection rule, we do not need to worry about the case in which two equilibria score the same according to the criterion, for only the distribution of equilibrium selection matters. If the deterministic rule is used, however, the model will still be incomplete when there is a tie between two equilibria according to the deterministic objective function. On the other hand, if we can apply mathematical tools such as optimal control, optimization over the equilibrium set may be less computationally intensive than computing all the equilibria. Consequently, further investigation of the deterministic rule is worthwhile.

Chapter 5: Conclusion

This thesis investigates the modeling and estimation of social interactions under a general form of incomplete information. The framework is built on a simultaneous move game under incomplete information. It is assumed that the observed outcomes come from a Bayesian Nash Equilibrium (BNE), which is equivalent to the equilibrium conditional expectations of the individual outcomes. Utilizing functional fixed point theorems and intersection theory in differential topology, it characterizes the set of equilibria and proposes effective methods for identification, computation, and estimation. This thesis contributes to the current research on social interactions both theoretically and empirically, and also complements the theoretical analysis of the existence of a pure strategy Bayesian Nash Equilibrium and the estimation of incomplete information simultaneous move games. My dissertation research focuses on analyzing the effects of social networks on behaviors. Another fundamental problem about networks is how a network depends on people's actions.
When the social relations are not fully predetermined but rather depend on the choices of individuals in a social group, the individual behaviors and social relations may be correlated through some unobservables. In that case, ignoring this correlation can introduce endogeneity bias. As a result, it is of theoretical and empirical significance to investigate the formation of social networks and its implications for the social interaction effects. This investigation may be more interesting when there is also asymmetric private information about the social relations. Hence, it is of interest to introduce endogenous network formation into the current framework.

Another future research project is to study the dynamics of the interactions among socially associated agents and the evolution of endogenous networks. As more and more large data sets tracing the behaviors of large groups of agents become available, it is necessary to devise effective estimation methods that incorporate time dynamics. Equilibrium multiplicity may still be a challenging issue. However, with dynamics, it might be possible to refine the equilibrium set and reduce the number of equilibria, which can potentially alleviate the computational burden.

Bibliography

ABREVAYA, J. and SHEN, S. 2014. “Estimation of Censored Panel-Data Models with Slope Heterogeneity”. Journal of Applied Econometrics, 29:523–548.
AGUIRREGABIRIA, V. and MIRA, P. 2002. “Swapping the Nested Fixed Point Algorithm: A Class of Estimators for Discrete Markov Decision Models”. Econometrica, 70(4):1519–1543.
AGUIRREGABIRIA, V. and MIRA, P. 2013. “Identification of Games of Incomplete Information with Multiple Equilibria and Common Observed Heterogeneity”. working paper.
ARADILLAS-LOPEZ, A. 2010. “Semiparametric Estimation of A Simultaneous Game with Incomplete Information”. Journal of Econometrics, 157:409–431.
BAJARI, P., HONG, H., KRAINER, J., and NEKIPELOV, D. 2010a. “Estimating Static Models of Strategic Interactions”.
Journal of Business and Economic Statistics, 28(4):469–482.
BAJARI, P., HONG, H., KRAINER, J., and NEKIPELOV, D. 2010b. “Computing Equilibria in Static Games of Incomplete Information Using the All-Solution Homotopy”. working paper.
BAJARI, P., HONG, H., and RYAN, S. 2010c. “Identification and Estimation of A Discrete Game of Complete Information”. Econometrica, 78(5):1529–1568.
BALLESTER, C., CALVÓ-ARMENGOL, H., and ZENOU, Y. 2010. “Who is Who in Networks. Wanted: the Key Player”. Econometrica, 74(5):1403–1417.
BISIN, A., MORO, A., and TOPA, G. 2011. “The Empirical Content of Models with Multiple Equilibria in Economies with Social Interactions”. working paper.
BLEVINS, J. 2014. “Nonparametric Identification of Dynamic Decision Processes with Discrete and Continuous Choices”. Quantitative Economics, forthcoming.
BONSALL, F. F. 1962. “Lectures on Some Fixed Point Theorems of Functional Analysis”. Tata Institute of Fundamental Research, Bombay.
BORKOVSKY, R., DORASZELSKI, U., and KRYUKOV, Y. 2010a. “A User’s Guide to Solving Dynamic Stochastic Games Using the Homotopy Method”. Operations Research, 58(4):1116–1132.
BORKOVSKY, R., DORASZELSKI, U., KRYUKOV, Y., and SATTERTHWAITE, M. 2010b. “Learning-by-doing, Organizational Forgetting, and Industry Dynamics”. Econometrica, 78(2):453–508.
BOUCHER, V., BRAMOULLÉ, Y., DJEBBARI, H., and FORTIN, B. 2014. “Do Peers Affect Student Achievement? Evidence from Canada Using Group Size Variation”. Journal of Applied Econometrics, 29:91–109.
BROCK, W. and DURLAUF, S. 2000. “Estimating Static Models of Strategic Interactions”. In HECKMAN, J. and LEAMER, E., editors, Handbook of Econometrics: Edition 1, pages 3297–3380. Elsevier.
BROCK, W. and DURLAUF, S. 2001. “Discrete Choice with Social Interactions”. The Review of Economic Studies, 42:430–445.
BROCK, W. and DURLAUF, S. 2003. “Multinomial Choice with Social Interactions”. working paper.
BROCK, W. and DURLAUF, S. 2007.
“Identification of Binary Choice Models with Social Interactions”. Journal of Econometrics, 140:52–75.
BROOKS, J. and DINCULEANU, N. 1979. “Conditional Expectations and Weak and Strong Compactness in Spaces of Bochner Integrable Functions”. Journal of Multivariate Analysis, 9:420–427.
BRUECKNER, J. 2003. “Strategic Interaction among Governments: an Overview of Empirical Studies”. International Regional Science Review, 26:175–188.
DE FINETTI, B. 1975. “Theory of Probability”, volume 2. John Wiley & Sons, New York.
DEBREU, G. 1970. “Economies with a Finite Set of Equilibria”. Econometrica, 38:387–392.
DIERKER, E. 1972. “Two Remarks on the Number of Equilibria of an Economy”. Econometrica, 40:951–953.
DUNFORD, N. and SCHWARTZ, J. 1958. “Linear Operators, Part I”. Interscience, New York.
DURLAUF, S. 2000. “A Framework for the Study of Individual Behavior and Social Interactions”. working paper 16, Wisconsin Madison, Social Systems.
FOLLAND, G. B. 1999. “Real Analysis: Modern Techniques and Their Applications”. John Wiley & Sons, Inc., New York, 2nd edition.
GARCIA, C. B. and ZANGWILL, W. I. 1979. “Finding All Solutions to Polynomial Systems and Other Systems of Equations”. Mathematical Programming, 16:159–176.
GARCIA, C. B. and ZANGWILL, W. I. 1981. “Pathways to Solutions, Fixed Points, and Equilibria”. Prentice-Hall Series in Computational Mathematics.
GASPAR, J. and JUDD, K. 1997. “Solving Large-Scale Rational Expectations Models”. Macroeconomic Dynamics, 68(2):235–260.
GELMAN, A., CARLIN, J. B., STERN, H., DUNSON, D. B., VEHTARI, A., and RUBIN, D. 2014. “Bayesian Data Analysis”. CRC Press, Taylor & Francis Group, 3rd edition.
GUILLEMIN, V. and POLLACK, A. 1974. “Differential Topology”. Prentice-Hall, Inc., New Jersey.
HARSANYI, J. C. 1967a. “Games with Incomplete Information Played by ‘Bayesian’ Players, Part I”. Management Science, 14(3):159–182.
HARSANYI, J. C. 1967b. “Games with Incomplete Information Played by ‘Bayesian’ Players, Part II”.
Management Science, 14(3):320–334.
HOTZ, V. J. and MILLER, R. A. 1993. “Conditional Choice Probabilities and the Estimation of Dynamic Models”. The Review of Economic Studies, 60:497–529.
HSIEH, C. S. and LEE, L. 2011. “A Social Interactions Model with Endogenous Friendship Formation and Selectivity”. working paper.
JUDD, K. 1998. “Numerical Methods in Economics”. MIT Press, Cambridge, MA.
KHAN, M. A. and SUN, Y. N. 1995. “Pure Strategies in Games with Private Information”. Journal of Mathematical Economics, 24:633–653.
KHAN, M. A. and SUN, Y. N. 2002. “Non-cooperative Games with Many Players”. In AUMANN, R. J. and HART, S., editors, Handbook of Game Theory with Economic Applications: Edition 1, volume 3, pages 1761–1808. Elsevier.
KHAN, M. A. and ZHANG, Y. C. 2014. “On the Existence of Pure-strategy Equilibria in Games with Private Information: A Complete Characterization”. Journal of Mathematical Economics, 50:197–202.
KUMAR, A. 2012. “Nonparametric Estimation of the Impact of Taxes on Female Labor Supply”. Journal of Applied Econometrics, 27:415–439.
LAX, P. D. 2002. “Functional Analysis”. John Wiley & Sons, Inc., New York.
LEE, L. 2000. “A Numerically Stable Quadrature Procedure for the One-Factor Random Component Discrete Choice Model”. Journal of Econometrics, 95(1):117–129.
LEE, L. 2001. “Interpolation, Quadrature and Stochastic Integration”. Econometric Theory, 17:933–961.
LEE, L. 2007. “Identification and Estimation of Econometric Models with Group Interactions, Contextual Factors and Fixed Effects”. Journal of Econometrics, 140:333–374.
LEE, L. and LI, J. 2009. “Binary Choice under Social Interactions: An Empirical Study With and Without Subjective Data on Expectations”. Journal of Applied Econometrics, 24:257–281.
LEE, L., LI, J., and LIN, X. 2014. “Binary Choice Models with Social Network under Heterogeneous Rational Expectations”. Review of Economics and Statistics, 96:402–417.
LEUNG, M. 2013.
“Two-Step Estimation of Network Formation Models with Incomplete Information”. working paper.
LI, S., LIU, Y., and DEININGER, K. 2013. “How Important are Endogenous Peer Effects in Group Lending? Estimating a Static Game of Incomplete Information”. Journal of Applied Econometrics, 28:864–882.
MANSKI, C. 1985. “Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator”. Journal of Econometrics, 27:313–333.
MANSKI, C. 1987. “Semiparametric Analysis of Random Effects: Linear Models from Binary Panel Data”. Econometrica, 55:357–362.
MANSKI, C. 1993. “Identification of Endogenous Social Effects: The Reflection Problem”. The Review of Economic Studies, 60:531–542.
MANSKI, C. 2000. “Economic Analysis of Social Interactions”. Journal of Economic Perspectives, 14:115–136.
MANSKI, C. 2004. “Measuring Expectations”. Econometrica, 72:1329–1376.
MAS-COLELL, A., WHINSTON, M. D., and GREEN, J. 1995. “Microeconomic Theory”. Oxford University Press.
MELE, A. 2010. “A Structural Model of Segregation in Social Networks”. CeMMAP working paper.
MILGROM, P. R. and WEBER, R. J. 1985. “Distributional Strategies for Games with Incomplete Information”. Mathematics of Operations Research, 10:619–632.
MOFFITT, R. A. 2000. “Policy Interventions, Low-level Equilibria, and Social Interactions”. Social Dynamics, 97:45–82.
OSBORNE, M. J. and RUBINSTEIN, A. 1994. “A Course in Game Theory”. MIT Press.
PORTO, E. D. and REVELLI, F. 2013. “Tax-Limited Reaction Functions”. Journal of Applied Econometrics, 28:823–839.
QU, X. and LEE, L. 2013. “LM Tests for Spatial Correlation in Spatial Models with Limited Dependent Variables”. Regional Science and Urban Economics, 42:430–445.
RADNER, R. and ROSENTHAL, R. W. 1982. “Private Information and Pure-strategy Equilibria”. Mathematics of Operations Research, 7:401–409.
ROTHENBERG, T. J. 1971. “Identification in Parametric Models”. Econometrica, 39(3):577–591.
RUST, J. 1987.
“Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher”. Econometrica, 55(5):999–1033.
RUSTON, A. F. 1986. “Fredholm Theory in Banach Spaces”. Cambridge University Press.
SHANG, Q. and LEE, L. 2011. “Two-Step Estimation of Endogenous and Exogenous Group Effects”. Econometric Reviews, 30(2):173–207.
SIMON, J. 1987. “Compact Sets in the Space $L^p(0, T; B)$”. Annali di Matematica Pura ed Applicata, CXLVI(IV):65–96.
STOKEY, N. L., LUCAS, R. E., Jr., and PRESCOTT, E. C. 1989. “Recursive Methods in Economic Dynamics”. Harvard University Press.
TAMER, E. 2003. “Incomplete Simultaneous Discrete Response Model with Multiple Equilibria”. The Review of Economic Studies, 70(1):147–165.
TAO, J. and LEE, L. 2014. “A Social Interaction Model with an Extreme Order Statistics”. The Econometrics Journal, 17(3):197–240.
van NEERVEN, J. 2014. “Compactness in the Lebesgue-Bochner spaces $L^p(\mu; X)$”. Indagationes Mathematicae, 25:389–394.
WATSON, L., BILLUPS, S., and MORGAN, A. 1987. “HOMPACK: A Suite of Codes for Globally Convergent Homotopy Algorithms”. ACM Transactions on Mathematical Software, 13:281–310.
WATSON, L., SOSONKINA, M., MELVILLE, R., MORGAN, A., and WALKER, H. 1997. “Algorithm 777: HOMPACK90: A Suite of Fortran 90 Codes for Globally Convergent Homotopy Algorithms”. ACM Transactions on Mathematical Software, 23:514–549.
YANG, C. 2014. “Sample Selection with Social Interactions”. working paper.
YANG, C. and LEE, L. 2014. “Social Interactions under Incomplete Information with Heterogeneous Expectations”. working paper.
YANG, C., QU, X., and LEE, L. 2014. “A Tobit Model with Social Interactions under Incomplete Information”. working paper.

Appendix A: Appendix to Chapter 2

A.1 Proofs

Proposition 2.3.1. The first part follows straightforwardly from the definition of $\psi^e$ and the consistency condition of expectations in equilibrium, (2.3.1).
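For the second part, by the way that a strategy is defined, we have that
$$E[s_j(X^p_{J_j}) \mid X^p_{J_i}, z] = E\Big[h_j\Big(u(X_j) + \lambda \sum_{k \neq j} W_{j,k}\,\psi^e_k(X^p_{J_j}) - \epsilon_j\Big) \,\Big|\, X^p_{J_i}, z\Big] = \psi^e_j(X^p_{J_i}),$$
when $W_{i,j} \neq 0$. Therefore, we have that
$$s_i(X^p_{J_i}) = h_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\, E[s_j(X^p_{J_j}) \mid X^p_{J_i}, z] - \epsilon_i\Big).$$
Since $y = (y_1, \cdots, y_n)$ is the realization of actions in this BNE, (2.3.2) holds. The second part is proved.

Lemma 2.3.1. From (2.3.6), $\|\psi\|$ is non-negative. Moreover, $\|\psi\| = 0$ if and only if $\psi_i(A_i) = 0$ a.e. for any $i$ and $A_i \in \mathcal{A}_i$, i.e., $\psi = 0$. For any real scalar $\alpha$,
$$\|\alpha\psi\| = \max_{1 \le i \le n} \max_{\{j: W_{j,i} \neq 0\}} \int |(\alpha\psi)_i(x^p_{J_j})|\, dF_p(x^p) = |\alpha| \max_{1 \le i \le n} \max_{\{j: W_{j,i} \neq 0\}} \int |\psi_i(x^p_{J_j})|\, dF_p(x^p) = |\alpha|\, \|\psi\|.$$
Additionally, for any $\psi, \psi' \in \Psi$,
$$\|\psi + \psi'\| = \max_{1 \le i \le n} \max_{\{j: W_{j,i} \neq 0\}} \int |(\psi + \psi')_i(x^p_{J_j})|\, dF_p(x^p) \le \max_{1 \le i \le n} \max_{\{j: W_{j,i} \neq 0\}} \int |\psi_i(x^p_{J_j})|\, dF_p(x^p) + \max_{1 \le i \le n} \max_{\{j: W_{j,i} \neq 0\}} \int |\psi'_i(x^p_{J_j})|\, dF_p(x^p) = \|\psi\| + \|\psi'\|.$$

Lemma 2.3.2. Suppose $\{\psi_m\}$ is a Cauchy sequence in $\Psi$. There exists a subsequence $\{\psi_{m_k}\}$ such that
$$\int |(\psi_{m_{k+1}})_i(A_i) - (\psi_{m_k})_i(A_i)|\, dF_p(X^p) \le \|\psi_{m_{k+1}} - \psi_{m_k}\| < \frac{1}{2^k},$$
for any $i$ and $A_i \in \mathcal{A}_i$. For every $k$, define $\phi_k$ such that
$$(\phi_k)_i(A_i) = |(\psi_{m_1})_i(A_i)| + \sum_{l=1}^{k} |(\psi_{m_{l+1}})_i(A_i) - (\psi_{m_l})_i(A_i)|,$$
and the function $\phi$ such that $(\phi)_i(A_i) = |(\psi_{m_1})_i(A_i)| + \sum_{l=1}^{\infty} |(\psi_{m_{l+1}})_i(A_i) - (\psi_{m_l})_i(A_i)|$. It is easy to see that $\|\phi_k\| \le \|\psi_{m_1}\| + 1$. Because $\phi_k \uparrow \phi$, by monotone convergence,
$$\max_i \max_{\{j: W_{j,i} \neq 0\}} \int |(\phi)_i(x^p_{J_j})|\, dF_p(x^p) = \max_i \max_{\{j: W_{j,i} \neq 0\}} \lim_{k \to \infty} \int |(\phi_k)_i(x^p_{J_j})|\, dF_p(x^p) \le \|\psi_{m_1}\| + 1.$$
Therefore, $\phi < \infty$ a.e., so the series $\psi_{m_1} + \sum_{k=1}^{\infty} (\psi_{m_{k+1}} - \psi_{m_k})$ converges a.e. Hence $\psi = \lim_{k \to \infty} \psi_{m_k}$ is well-defined and $\|\psi\| \le \|\phi\|$. Therefore, $\psi \in \Psi$. By Fatou’s Lemma,
$$\max_i \max_{\{j: W_{j,i} \neq 0\}} \int |(\psi_m)_i(X^p_{J_j}) - (\psi)_i(X^p_{J_j})|\, dF_p(X^p) \le \max_i \max_{\{j: W_{j,i} \neq 0\}} \int \liminf_{k \to \infty} |(\psi_m)_i(X^p_{J_j}) - (\psi_{m_k})_i(X^p_{J_j})|\, dF_p(X^p) \le \liminf_{k \to \infty} \max_i \max_{\{j: W_{j,i} \neq 0\}} \int |(\psi_m)_i(X^p_{J_j}) - (\psi_{m_k})_i(X^p_{J_j})|\, dF_p(X^p).$$
By the definition of a Cauchy sequence, $\lim_{m \to \infty} \|\psi_m - \psi\| = 0$.

Proposition 2.3.2.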
For any two functions $\psi, \psi' \in \Psi$,
$$\|T(\psi) - T(\psi')\| = \max_{1 \le i \le n} \max_{\{k: W_{k,i} \neq 0\}} \int \Big| E\Big[ H_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\,\psi_j(X^p_{J_i})\Big) - H_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{i,j}\,\psi'_j(X^p_{J_i})\Big) \,\Big|\, x^p_{J_k}, z \Big] \Big|\, dF_p(x^p)$$
$$\le \max_{1 \le i \le n} \max_{\{k: W_{k,i} \neq 0\}} |\lambda| D \sum_{j \neq i} W_{i,j} \int E\big[|\psi_j(X^p_{J_i}) - \psi'_j(X^p_{J_i})| \,\big|\, x^p_{J_k}, z\big]\, dF_p(x^p)$$
$$= \max_{1 \le i \le n} \max_{\{k: W_{k,i} \neq 0\}} |\lambda| D \sum_{j \neq i} W_{i,j} \int |\psi_j(X^p_{J_i}) - \psi'_j(X^p_{J_i})|\, dF_p(X^p) \le |\lambda|\, \|W\|_\infty D\, \|\psi - \psi'\|.$$

Proposition 2.4.1. We prove these results by guess-and-verify. Under “exchangeability”, we have that
$$\psi_i^e(X) = v(X^g, X_i^c) + E[X_i^p \mid X, z]'\beta + \lambda \sum_{j \neq i} W_{i,j} \int \psi_j^e(Y) f_p(Y \mid X, z)\, dY.$$
Try $\psi_i^e(X) = a_i + b_i' X$, where $a_i \in \mathbb{R}$ and $b_i \in \mathbb{R}^L$. As $X$ varies, in order that
$$a_i + b_i' X = v(X^g, X_i^c) + \beta'(\mu + CX) + \lambda \sum_{j \neq i} W_{i,j}\, a_j + \lambda \sum_{j \neq i} W_{i,j}\, b_j'(\mu + CX),$$
for $i = 1, \cdots, n$, it is necessary that $b_i' = \beta' C + \lambda \sum_{j \neq i} W_{i,j}\, b_j' C$ and
$$a_i = v(X^g, X_i^c) + \beta'\mu + \lambda \sum_{j \neq i} W_{i,j}\, a_j + \lambda \sum_{j \neq i} W_{i,j}\, b_j'\mu.$$
Thus, $b_i = C'\beta + \lambda C' B W_{i,\cdot}'$, where $W_{i,\cdot}$ is the $i$-th row of $W$ and $B = (b_1, \cdots, b_n)$. That in turn gives $B = C'\beta\, l_n' + \lambda C' B W'$, where $l_n$ denotes the $n \times 1$ vector with all coordinates equal to 1. By vectorization, with $b = \mathrm{vec}(B)$, we have $b = (l_n \otimes C'\beta) + \lambda (W \otimes C') b$. Hence, under the assumption that $I_{n k_p} - \lambda (W \otimes C')$ is invertible, we have (2.4.11). The $n$-dimensional vector $a$ is characterized by
$$a = v + (\mu'\beta)\, l_n + \lambda W (a + B'\mu) = \lambda W a + v + (\mu'\beta)\, l_n + \lambda\, \mathrm{vec}(\mu' B W') = \lambda W a + v + (\mu'\beta)\, l_n + \lambda (W \otimes \mu')\, \mathrm{vec}(B).$$
From this, (2.4.12) follows.

A.2 Numerical Methods

We consider the case when $X_i^p$ is known only to $i$ and the joint distribution of the $X_i^p$’s conditional on $Z = z$ is “exchangeable”. So conditional on $Z = z$, the $X_i^p$’s have the same distribution. Denote their common support conditional on $Z = z$ as $S_p = \{x : f_p(x) > 0\}$, where $f_p(\cdot)$ is the conditional density of $X_i^p$ given $Z = z$. When self-known characteristics are continuously distributed, the equilibrium expectation is a function defined on the continuum set $S_p$.
A simple approximation device is to discretize the support of self-known characteristics. That transforms the problem into the case where self-known characteristics are discrete random variables, which we have already discussed. However, how fine the discretization needs to be remains an issue. A general approach to solving for a function from functional equations is the “projection method”, introduced by Judd (1998). Its key idea is to approximate the solution by a finite linear combination of a set of basis functions. The coefficients of the linear combination are pinned down by the functional equation.

However, inspecting the specifics of our model, we suggest an alternative, easier approach. By (2.4.4), the value of the conditional expectation function at a realization of self-known characteristics is determined by an integral, which can be approximated by Gauss-Legendre quadrature.[56] For simplicity, let us consider the case where the $X_i^p$’s are one-dimensional. Standard Gauss-Legendre quadrature considers an integral over $[-1, 1]$. But one can extend the range to a finite interval $[a, b]$, which is assumed to be the support of $X^p$, via a linear transformation. Then we have that
$$\psi_i^e(x) = \int_a^b H_i\Big(u(x^g, x_i^c, x^p) + \lambda \sum_{j \neq i} W_{i,j}\,\psi_j^e(x^p)\Big) f_p(x^p \mid x)\, dx^p$$
$$\approx \sum_{k=1}^{K} \omega_k\, H_i\Big(u\Big(x^g, x_i^c, \frac{(z_k + 1)(b - a)}{2} + a\Big) + \lambda \sum_{j \neq i} W_{i,j}\,\psi_j^e\Big(\frac{(z_k + 1)(b - a)}{2} + a\Big)\Big) \cdot f_p\Big(\frac{(z_k + 1)(b - a)}{2} + a \,\Big|\, x\Big) \frac{b - a}{2}, \quad \text{(A.2.1)}$$
where $z_k$, for $k = 1, \cdots, K$, are abscissae and $\omega_k$ are weights. They are fixed. We can see that as long as we get the values of the expectation function on those abscissae, $x_k^p = \frac{(z_k + 1)(b - a)}{2} + a$, for $k = 1, \cdots, K$, we can approximate the value of the expectation function at any point in the range of $X^p$. In (A.2.1), when $x$ runs over $x_1^p, \cdots, x_K^p$, using the quadrature approximation, we derive a system of $n \times K$ equations in $\psi_i^e(x_k^p)$, for $i = 1, \cdots, n$ and $k = 1, \cdots, K$, just like what we have in the case where the $X_i^p$’s are discrete.
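The node-based system just described can be sketched in code. The following is a minimal illustration, not the implementation used in this dissertation. Purely for illustration, it assumes a linear model with $H_i(z) = z$, a utility $u(x^p) = \beta_0 + \beta_2 x^p$ without group or public covariates, a row-normalized $W$ with zero diagonal, and a hypothetical conditional density $f_p(x^p \mid x) = 1 + \theta(1 - 2x^p)(1 - 2x)$ on $[0, 1]$. The function names `solve_expectations` and `fgm_density` are made up for this sketch.

```python
import numpy as np


def fgm_density(xp, x, theta=0.6):
    """Hypothetical conditional density f_p(xp | x) on [0, 1] (illustration only)."""
    return 1.0 + theta * (1.0 - 2.0 * xp) * (1.0 - 2.0 * x)


def solve_expectations(W, lam, u, H, cond_pdf, a, b, K=8, tol=1e-12, max_iter=10_000):
    """Solve the n x K system implied by (A.2.1) by fixed-point iteration.

    Returns the quadrature nodes x_k^p on [a, b] and an (n, K) array whose
    (i, l) entry approximates psi_i^e(x_l^p).
    """
    z, w = np.polynomial.legendre.leggauss(K)      # abscissae and weights on [-1, 1]
    nodes = (z + 1.0) * (b - a) / 2.0 + a          # linear map to [a, b]
    # Q[k, l] = omega_k * f_p(x_k | x_l) * (b - a) / 2
    Q = w[:, None] * cond_pdf(nodes[:, None], nodes[None, :]) * (b - a) / 2.0
    psi = np.zeros((W.shape[0], K))
    for _ in range(max_iter):
        # inner[i, k] = H(u(x_k) + lam * sum_j W_ij * psi_j(x_k))
        inner = H(u(nodes)[None, :] + lam * (W @ psi))
        new_psi = inner @ Q                        # quadrature sum over k
        if np.max(np.abs(new_psi - psi)) < tol:
            return nodes, new_psi
        psi = new_psi
    raise RuntimeError("fixed-point iteration did not converge")


n = 4
W = (np.ones((n, n)) - np.eye(n)) / (n - 1)        # row-normalized, zero diagonal
nodes, psi = solve_expectations(
    W, lam=0.3,
    u=lambda x: 0.5 + 1.0 * x,                     # beta_0 = 0.5, beta_2 = 1 (assumed)
    H=lambda v: v,                                 # linear model
    cond_pdf=fgm_density, a=0.0, b=1.0,
)
```

In this assumed setup the contraction modulus is $|\lambda| \|W\|_\infty D = 0.3$, so the iteration settles within a few dozen passes. Values of $\psi^e$ away from the nodes can be obtained by applying the same quadrature sum with the desired $x$ in place of a node.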
Hence, approximated values of the $\psi_i^e(x_k^p)$’s can be solved by contraction mapping. Then we can derive an approximation of the conditional expectation function $\psi^e$ at any value $x$ via (A.2.1). In particular, when we plug in the observed $X_i^p$’s of the data, we can derive the values of the conditional expectations used to calculate the likelihood function. In practice, a small number of abscissae of the Gauss-Legendre quadrature is enough to get a good approximation. In our Monte Carlo experiments, we choose $K = 8$.

[56] Other quadrature methods can also be used. But we need to make sure that two conditions are satisfied. First, the range of the integral should be big enough to include the possible values that the $X_i^p$’s can take. Second, the abscissae used do not change with the realization of $X^p$ on which the expectation is conditioned.

When the support of $X^p$ is infinite, we may use a nonlinear transformation. We illustrate that by an example. Consider the case when the $X_i^p$’s are jointly normal. We have shown that an analytical solution can be derived in some special cases. Generally, we have to solve for the equilibrium conditional expectation functions numerically. Specifically, suppose that $X_1^p, \cdots, X_n^p$ are jointly normal with mean $(\mu, \cdots, \mu)'$ and variance-covariance matrix
$$\eta^2 \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}.$$
Then for any $i \neq j$, conditional on $X_j^p = x$, $X_i^p$ is normal with mean $\rho x + (1 - \rho)\mu$ and variance $(1 - \rho^2)\eta^2$.
Then (A.2.1) takes a special form:
$$\psi_i^e(x) = \int_{-\infty}^{+\infty} H_i\Big(u(X^g, X_i^c, x^p) + \lambda \sum_{j \neq i} W_{i,j}\,\psi_j^e(x^p)\Big) \cdot \frac{1}{\sqrt{2\pi(1 - \rho^2)\eta^2}} \exp\Big(-\frac{(x^p - \rho x - (1 - \rho)\mu)^2}{2(1 - \rho^2)\eta^2}\Big)\, dx^p$$
$$= \int_0^1 H_i\Big(u\Big(X^g, X_i^c, \log\frac{z}{1 - z}\Big) + \lambda \sum_{j \neq i} W_{i,j}\,\psi_j^e\Big(\log\frac{z}{1 - z}\Big)\Big) \cdot \frac{1}{\sqrt{2\pi(1 - \rho^2)\eta^2}} \exp\Big(-\frac{(\log\frac{z}{1 - z} - \rho x - (1 - \rho)\mu)^2}{2(1 - \rho^2)\eta^2}\Big) \frac{1}{z(1 - z)}\, dz$$
$$= \sqrt{\frac{2}{\pi(1 - \rho^2)\eta^2}} \int_{-1}^{1} H_i\Big(u\Big(X^g, X_i^c, \log\frac{\tilde z + 1}{1 - \tilde z}\Big) + \lambda \sum_{j \neq i} W_{i,j}\,\psi_j^e\Big(\log\frac{\tilde z + 1}{1 - \tilde z}\Big)\Big) \cdot \exp\Big(-\frac{(\log\frac{\tilde z + 1}{1 - \tilde z} - \rho x - (1 - \rho)\mu)^2}{2(1 - \rho^2)\eta^2}\Big) \frac{1}{(\tilde z + 1)(1 - \tilde z)}\, d\tilde z, \quad \text{(A.2.2)}$$
where the second equality is derived by a change of integration variable, $x^p = \log\frac{z}{1 - z}$, and the third equality comes from the transformation $z = \frac{\tilde z + 1}{2}$. Then we can use the weights and abscissae of standard Gauss-Legendre quadrature.[57]

[57] When calculating the expectation of a function of a normal random variable, it is conventional to use the Gauss-Hermite quadrature. However, to use the standard weights and abscissae of that method, we need to change the integration variable as $x^p = \sqrt{2(1 - \rho^2)\eta^2}\, z + \rho x + (1 - \rho)\mu$. Then we cannot get a system of equations for values of the expectation function at a fixed set of known points.

When the $X_i^p$’s are of multiple dimensions, we can use multiple-dimension quadrature methods. When the dimension is not very high, we can just use the tensor product. For high-dimensional $X^p$, computation can be intensive. In that case, we can use some monomial formulas, some of which can be found in Judd (1998). Alternatively, we may use stochastic integral approximation via importance sampling. Let $h(\cdot)$ be a density with its support containing the support of $X_i^p$ such that $f_p(x^p \mid x)/h(x^p)$ is well defined. Then we can generate $K$ random draws, say, $x_k^p$, from $h(\cdot)$. The stochastic approximation will be
$$\psi_i^e(x) \approx \frac{1}{K} \sum_{k=1}^{K} H_i\Big(u(x^g, x_i^c, x_k^p) + \lambda \sum_{j \neq i} W_{i,j}\,\psi_j^e(x_k^p)\Big) \frac{f_p(x_k^p \mid x)}{h(x_k^p)}.$$
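From these, we can solve for the $\psi^e(x_k^p)$’s by contraction mapping, and approximate the function $\psi_i^e(x)$ at any point $x$.

A.3 Identification

The classical results on global identification are proved as follows.

Proposition 2.5.1. In this case, we get a closed form of the equilibrium expectation,
$$E[Y \mid W = \overline{W}, X^c = \overline{X}^c] = (I - \lambda^* \overline{W})^{-1}(\beta_0^* l_n + \overline{X}^c \beta_1^*).$$
Therefore, if $(\beta^*, \lambda^*, F_\epsilon^*)$ and $(\tilde\beta, \tilde\lambda, \tilde F_\epsilon)$ are observationally equivalent for $(\overline{W}, \overline{X}^c)$, then
$$(I - \lambda^* \overline{W})^{-1}(\beta_0^* l_n + \overline{X}^c \beta_1^*) = (I - \tilde\lambda \overline{W})^{-1}(\tilde\beta_0 l_n + \overline{X}^c \tilde\beta_1).$$
Multiplying both sides by $(I - \tilde\lambda \overline{W})(I - \lambda^* \overline{W})$ and using the property that $\overline{W}(I - \tilde\lambda \overline{W})^{-1} = (I - \tilde\lambda \overline{W})^{-1}\overline{W}$, we get $(I - \tilde\lambda \overline{W})(\beta_0^* l_n + \overline{X}^c \beta_1^*) = (I - \lambda^* \overline{W})(\tilde\beta_0 l_n + \overline{X}^c \tilde\beta_1)$. That is,
$$\big[\, l_n \;\; \overline{W} l_n \;\; \overline{X}^c \;\; \overline{W}\,\overline{X}^c \,\big] \big(\beta_0^* - \tilde\beta_0,\; \lambda^*\tilde\beta_0 - \tilde\lambda\beta_0^*,\; (\beta_1^* - \tilde\beta_1)',\; (\lambda^*\tilde\beta_1 - \tilde\lambda\beta_1^*)'\big)' = 0.$$
Hence, if $[\, l_n \;\; \overline{W} l_n \;\; \overline{X}^c \;\; \overline{W}\,\overline{X}^c \,]$ has full column rank, $\beta_0^* = \tilde\beta_0$, $\beta_1^* = \tilde\beta_1$, and $\lambda^* = \tilde\lambda$. When $\overline{W}$ is row-normalized, $\overline{W} l_n = l_n$. Then we have that
$$\big[\, l_n \;\; \overline{X}^c \;\; \overline{W}\,\overline{X}^c \,\big] \big(\beta_0^* - \tilde\beta_0 + \lambda^*\tilde\beta_0 - \tilde\lambda\beta_0^*,\; (\beta_1^* - \tilde\beta_1)',\; (\lambda^*\tilde\beta_1 - \tilde\lambda\beta_1^*)'\big)' = 0.$$
In that case, if $[\, l_n \;\; \overline{X}^c \;\; \overline{W}\,\overline{X}^c \,]$ has full column rank, we still have that $\beta_0^* = \tilde\beta_0$, $\beta_1^* = \tilde\beta_1$, and $\lambda^* = \tilde\lambda$. Because
$$F_{Y_i \mid X^c, W}(y) = 1 - F_\epsilon^*\Big(\beta_0^* + \overline{X}_i^c \beta_1^* + \lambda^* \sum_{j \neq i} \overline{W}_{i,j}\, E[Y_j \mid W = \overline{W}, X^c = \overline{X}^c] - y\Big),$$
we can then identify $F_\epsilon^*$.

Proposition 2.5.2. Since
$$Y_i = \beta_0^* + \overline{X}_i^{c\prime} \beta_1^* + \overline{X}_i^{p\prime} \beta_2^* + \lambda^* \sum_{j \neq i} \overline{W}_{i,j}\, E[Y_j \mid J = \overline{J}, W = \overline{W}, X^c = \overline{X}^c, X^p_{J_i} = \overline{X}^p_{J_i}] - \epsilon_i,$$
we have that
$$E[Y_i \mid J = \overline{J}, W = \overline{W}, X^c = \overline{X}^c, X^p_{J_i} = \overline{X}^p_{J_i}] = \beta_0^* + \overline{X}_i^{c\prime} \beta_1^* + \overline{X}_i^{p\prime} \beta_2^* + \lambda^* \sum_{j \neq i} \overline{W}_{i,j}\, E[Y_j \mid J = \overline{J}, W = \overline{W}, X^c = \overline{X}^c, X^p_{J_i} = \overline{X}^p_{J_i}],$$
for $J_i(i) = 1$, i.e., $X_i^p$ is known by $i$ herself. Therefore, under Assumption 3.4.6, we can identify $\beta_0^*$, $\beta_1^*$, $\beta_2^*$ and $\lambda^*$. Then we can recover $F_\epsilon^*$ from the distribution of the $Y_i$’s.

Proposition 2.5.3. In this case, if $(\beta^*, \lambda^*, F_\epsilon^*)$ and $(\tilde\beta, \tilde\lambda, \tilde F_\epsilon)$ are observationally equivalent,
$$\psi^e = (I_{nm} - \lambda^*(P \otimes \overline{W}))^{-1}(P \otimes I_n)\hat{X}\beta^* = (I_{nm} - \tilde\lambda(P \otimes \overline{W}))^{-1}(P \otimes I_n)\hat{X}\tilde\beta,$$
where $m$ is the number of possible values that $X_i^p$ may take.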
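Multiplying both sides by $(I_{nm} - \tilde\lambda(P \otimes \overline{W}))(I_{nm} - \lambda^*(P \otimes \overline{W}))$ and using the property that $(P \otimes \overline{W})(I_{nm} - \tilde\lambda(P \otimes \overline{W}))^{-1} = (I_{nm} - \tilde\lambda(P \otimes \overline{W}))^{-1}(P \otimes \overline{W})$, we get $(I_{nm} - \tilde\lambda(P \otimes \overline{W}))(P \otimes I_n)\hat{X}\beta^* = (I_{nm} - \lambda^*(P \otimes \overline{W}))(P \otimes I_n)\hat{X}\tilde\beta$. Therefore,
$$\big[\, (P \otimes I_n)\hat{X} \;\; (P^2 \otimes \overline{W})\hat{X} \,\big] \big((\beta^* - \tilde\beta)',\; (\lambda^*\tilde\beta - \tilde\lambda\beta^*)'\big)' = 0.$$
So if $[\, (P \otimes I_n)\hat{X} \;\; (P^2 \otimes \overline{W})\hat{X} \,]$ has full column rank, $\beta^* = \tilde\beta$ and $\lambda^* = \tilde\lambda$. If $\overline{W}$ is row-normalized, $(P \otimes I_n) l_{nm} = (P^2 \otimes \overline{W}) l_{nm} = l_{nm}$. Note that $P l_m = P^2 l_m = l_m$. We have
$$\big[\, l_{nm} \;\; l_m \otimes \overline{X}^c \;\; (P X^{p,s}) \otimes l_n \;\; l_m \otimes (\overline{W}\,\overline{X}^c) \;\; (P^2 X^{p,s}) \otimes l_n \,\big] \big(\beta_0^* - \tilde\beta_0 + \tilde\beta_0\lambda^* - \beta_0^*\tilde\lambda,\; (\beta_1^* - \tilde\beta_1)',\; (\beta_2^* - \tilde\beta_2)',\; (\lambda^*\tilde\beta_1 - \tilde\lambda\beta_1^*)',\; (\lambda^*\tilde\beta_2 - \tilde\lambda\beta_2^*)'\big)' = 0.$$
Therefore, if $[\, l_{nm} \;\; l_m \otimes \overline{X}^c \;\; (P X^{p,s}) \otimes l_n \;\; l_m \otimes (\overline{W}\,\overline{X}^c) \;\; (P^2 X^{p,s}) \otimes l_n \,]$ has full column rank, $\beta^* = \tilde\beta$ and $\lambda^* = \tilde\lambda$. We can then identify $F_\epsilon^*$ from the distribution of $Y$.

Proposition 2.5.4. Suppose $(\beta, \lambda)$ is observationally equivalent to the true $(\beta^*, \lambda^*)$. Because the equilibrium functions are then identical, the corresponding $b$’s are equal, and so are the $a$’s. That is, for the equivalence of the $b$’s,
$$(I_n \otimes I_{k_p} - \lambda(W \otimes C'))^{-1}(l_n \otimes C'\beta_2) = (I_n \otimes I_{k_p} - \lambda^*(W \otimes C'))^{-1}(l_n \otimes C'\beta_2^*),$$
which, by pre-multiplication to eliminate the inverses, can be written as
$$(l_n \otimes C'\beta_2) - \lambda^*(W \otimes C')(l_n \otimes C'\beta_2) = (l_n \otimes C'\beta_2^*) - \lambda(W \otimes C')(l_n \otimes C'\beta_2^*).$$
By rearrangement, it follows that $(I_n \otimes I_{k_p} - \lambda^* W \otimes C')(l_n \otimes C')(\beta_2 - \beta_2^*) + (W \otimes C')(l_n \otimes C'\beta_2^*)(\lambda - \lambda^*) = 0$, and hence,
$$(l_n \otimes C')(\beta_2 - \beta_2^*) + G_{c,n}(l_n \otimes C')\beta_2^*(\lambda - \lambda^*) = 0, \quad \text{(A.3.1)}$$
where $G_{c,n} = (I_n \otimes I_{k_p} - \lambda^* W \otimes C')^{-1}(W \otimes C')$. For the equivalence of the $a$’s, by denoting $G_n = (I_n - \lambda^* W)^{-1} W$, one has
$$l_n(\beta_0 - \beta_0^*) + \overline{X}^c(\beta_1 - \beta_1^*) + l_n \mu'(\beta_2 - \beta_2^*) + (W \otimes \mu') b\,(\lambda - \lambda^*) + G_n\big[l_n \beta_0^* + \overline{X}^c \beta_1^* + l_n \mu'\beta_2^* + \lambda^*(W \otimes \mu') b\big](\lambda - \lambda^*) = 0.$$
Note that $(I_n + \lambda^* G_n) W = G_n$ and $(I_n \otimes \mu') b = B'\mu$, where $B = (b_1 \cdots b_n)$.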
Therefore, it follows that
$$l_n\big[(\beta_0 - \beta_0^*) + \mu'(\beta_2 - \beta_2^*)\big] + \overline{X}^c(\beta_1 - \beta_1^*) + G_n\big[l_n \beta_0^* + \overline{X}^c \beta_1^* + (l_n \beta_2^{*\prime} + B')\mu\big](\lambda - \lambda^*) = 0. \quad \text{(A.3.2)}$$
Either of the two assumptions, 2.5.8 and 2.5.9, implies that $\beta = \beta^*$ and $\lambda = \lambda^*$.

Proposition 2.5.5. Notice that
$$F_s^{-1}\big(E[Y \mid W = \overline{W}, X^c = \overline{X}^c]\big) = \beta_0^* l_n + \overline{X}^c \beta_1^* + \lambda^* \overline{W}\, E[Y \mid W = \overline{W}, X^c = \overline{X}^c].$$
If $(\beta^*, \lambda^*)$ and $(\tilde\beta, \tilde\lambda)$ are observationally equivalent at $\overline{W}$ and $\overline{X}^c$,
$$\big[\, l_n,\; \overline{X}^c,\; \overline{W}\, E[Y \mid W = \overline{W}, X^c = \overline{X}^c] \,\big] \big(\beta_0^* - \tilde\beta_0,\; (\beta_1^* - \tilde\beta_1)',\; \lambda^* - \tilde\lambda\big)' = 0.$$
Hence, if $[\, l_n,\; \overline{X}^c,\; \overline{W}\, E[Y \mid W = \overline{W}, X^c = \overline{X}^c] \,]$ has full column rank, $\beta_0^* = \tilde\beta_0$, $\beta_1^* = \tilde\beta_1$ and $\lambda^* = \tilde\lambda$.

Proposition 2.5.6. Similarly, we have that
$$F_s^{-1}\big(E[y_i \mid W = \overline{W}, X^c = \overline{X}^c, X_i^p = \overline{X}_i^p]\big) = \beta_0^* + \overline{X}^c \beta_1^* + \lambda^* \sum_{j \neq i} \overline{W}_{i,j}\, E[Y_j \mid W = \overline{W}, X^c = \overline{X}^c, X_i^p = \overline{X}_i^p].$$
By the definition of observational equivalence, we get the results.

A.4 Unobserved Group Random Effects

In our discussion in the main text, the exogenous characteristics used by agents to make predictions are all observed by econometricians. In reality, however, there might be exogenous features that are observed by group members but are unavailable in a data set. For example, students in a class know how patient their math teacher is, but no data on that are available. To investigate such cases, we incorporate unobserved group characteristics into our model, which is possible for samples from many independent groups. Suppose there are $G$ independent groups. For group $g$, $X^g$, $X_g^c$, $X_g^p$ and $W_g$ represent the observed group features, the commonly known individual traits, the privately known personal characteristics, and the social network relations in this group, for $g = 1, \cdots, G$. Suppose that in addition to the group features $X^g$, which are observable to econometricians, there are some other group characteristics, $\omega_g$, which are observed by all agents in group $g$ but not by econometricians.
Such unobserved group features might be incorporated into the model such that
$$y_{i,g}^* = u(X^g, X_{i,g}^c, X_{i,g}^p, \omega_g) + \lambda \sum_{j \neq i} W_{ij,g}\, E[y_{j,g} \mid X^p_{J_{i,g}}, Z_g] - \epsilon_{i,g}. \quad \text{(A.4.1)}$$
$\omega_g$ is now contained in the public information for group $g$:
$$Z_g = (X^{g\prime}, X_{1,g}^{c\prime}, \cdots, X_{n,g}^{c\prime}, W_{11,g}, \cdots, W_{1n,g}, \cdots, W_{n1,g}, \cdots, W_{nn,g}, J_{1,g}', \cdots, J_{n,g}', \omega_g)'. \quad \text{(A.4.2)}$$
By assuming that all relevant unobserved group features are additive, they can be represented by a single variable $\omega_g$ with a pdf $f_\omega(\cdot; \gamma)$. Therefore, unobserved group features are treated as random effects in our model. Since $\omega_g$ is publicly known to every group member, for socially interacting agents who are making decisions simultaneously, $\omega_g$ plays the same role as $X^g$ does. However, because $X^g$ is observed from data and $\omega_g$ is not, the presence of $\omega_g$ affects the distribution of $Y$. So identification and estimation will be different from the case without unobserved group features.

As for identification, we continue to assume that $\epsilon$ is independent of $X$ and is distributed with full support, and that the distribution of the observable exogenous variables, $X = (X^g, X^c, X^p)$, can be identified from the data on $X$. We also assume that $u(\cdot)$ is a linear function, $u(X_{i,g}) = \beta_0 + X_{i,g}^{c\prime}\beta_1 + X_{i,g}^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \omega_g$. We normalize the mean of $\omega_g$ to be zero, $E[\omega_g] = 0$, and assume that the $\omega_g$’s are i.i.d. across different groups. According to our discussion in the main text, under certain conditions we can identify $\beta_1$, $\beta_2$ and $\lambda$ from data on a single group. Then variation across independent groups helps us identify $\beta_0$, $\beta_3$ and the distribution of $\omega_g$.

The presence of unobserved group random effects complicates the sample likelihood. Conditional on $\omega_g$, the probability distributions of the $y_{i,g}$’s in group $g$ are independent of each other. However, since $\omega_g$ is unobserved and is treated as random, econometricians have to integrate over it.
Therefore, the sample likelihood can be written as
$$\log L(\theta; Y, X) = \log \prod_{g=1}^{G} \int \prod_{i=1}^{n_g} f(y_{i,g} \mid X^g, X_{i,g}^c, X_{i,g}^p, \omega_g; \beta, \sigma, \eta)\, f_\omega(\omega_g; \gamma)\, d\omega_g = \sum_{g=1}^{G} \log\Big[\int \prod_{i=1}^{n_g} f(y_{i,g} \mid X^g, X_{i,g}^c, X_{i,g}^p, \omega_g; \beta, \sigma, \eta)\, f_\omega(\omega_g; \gamma)\, d\omega_g\Big]. \quad \text{(A.4.3)}$$
For estimation, the likelihood of $y_{i,g}$ is a nonlinear function of $\omega_g$. In the event that an analytical expression of the integral over $\omega_g$ is not feasible, one can utilize Monte Carlo integration and construct a simulated sample likelihood function. The simulated likelihood function can be evaluated with the equilibrium expectation solution algorithm nested inside:
$$\hat{L} = \sum_{g=1}^{G} \log\Big[\frac{1}{S} \sum_{s=1}^{S} \prod_{i=1}^{n_g} f(y_{i,g} \mid X^g, X_{i,g}^c, X_{i,g}^p, \omega_g^s; \beta, \sigma, \eta)\Big], \quad \text{(A.4.4)}$$
where for each $g$, the $\omega_g^s$’s are $S$ independent draws from the distribution $f_\omega(\cdot; \gamma)$.

Additionally, for samples with large networks or large groups, there is a practical issue of calculation. In the simulated likelihood (A.4.4), for each $g$, the likelihood for all members in a group is calculated for each simulation draw $\omega_g^s$ and then summed over the repeated simulations. If the size of group $g$, $n_g$, is large and the individual likelihoods take small values, their product can be extremely small. As a result, the average simulated group likelihood will be so small that it is treated as zero by a computer, i.e., underflow. That causes computational problems when the logarithm is taken, or serious rounding errors may occur. To overcome that potential problem, we recommend using the method in Lee (2000) to change the order of summation and multiplication. To be specific, (A.4.4) can be transformed as
$$\sum_{g=1}^{G} \log\Big(\frac{1}{S} \prod_{i=1}^{n_g} \sum_{s=1}^{S} f(y_{i,g} \mid X^g, X_{i,g}^c, X_{i,g}^p, \omega_g^s; \beta, \sigma, \eta)\, \xi_{i-1,g,s}\Big), \quad \text{(A.4.5)}$$
where the weights $\xi$ are determined recursively: $\xi_{0,g,s} = 1$;
$$\xi_{i,g,s} = \frac{f(y_{i,g} \mid X^g, X_{i,g}^c, X_{i,g}^p, \omega_g^s; \beta, \sigma, \eta)\, \xi_{i-1,g,s}}{\sum_{s'=1}^{S} f(y_{i,g} \mid X^g, X_{i,g}^c, X_{i,g}^p, \omega_g^{s'}; \beta, \sigma, \eta)\, \xi_{i-1,g,s'}}.$$
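The reordering in (A.4.5) can be sketched as follows. This is an illustrative sketch only, not the code used in this dissertation. The array `f` is assumed to hold precomputed individual likelihood values for one group, with `f[i, s]` standing for the likelihood of $y_{i,g}$ under the simulation draw $\omega_g^s$, and the function names are hypothetical.

```python
import numpy as np


def group_loglik_naive(f):
    """Log of (1/S) * sum_s prod_i f[i, s], as in (A.4.4); can underflow."""
    return np.log(np.mean(np.prod(f, axis=0)))


def group_loglik_recursive(f):
    """Same quantity via the weights xi of (A.4.5), accumulated in logs."""
    n_g, S = f.shape
    xi = np.ones(S)                 # xi_{0,g,s} = 1
    loglik = -np.log(S)             # the 1/S factor
    for i in range(n_g):
        weighted = f[i] * xi        # f(y_i | ..., omega^s) * xi_{i-1,g,s}
        total = weighted.sum()
        loglik += np.log(total)     # log of the i-th factor in the product
        xi = weighted / total       # xi_{i,g,s}
    return loglik


rng = np.random.default_rng(0)
f_small = rng.uniform(0.1, 1.0, size=(5, 50))       # small group: both forms agree
f_large = rng.uniform(1e-4, 1e-3, size=(2000, 50))  # large group: naive product underflows
```

The recursion works because the product of the first $i$ factors in (A.4.5) telescopes to $\sum_s \prod_{i' \le i} f_{i',s}$, so each factor stays on a moderate scale and only its logarithm is accumulated. With the simulated `f_large`, the direct product of 2000 small likelihoods underflows to zero for every draw, whereas the recursive form remains finite.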
To investigate the finite sample performance of the estimation algorithm where we nest the solution of a fixed point in a simulated likelihood, we do a Monte Carlo experiment for the linear model of continuous choices, in which $h_{i,g}(z) = z$ for all $i$, $g$ and $z$, $X_{i,g}^p$ is discretely distributed with a finite support, and $\omega_g$ is i.i.d. normal with mean zero and standard deviation $\gamma = 1$. The results are tabulated in Table ??. Compared with the corresponding results without unobservable group random effects, the estimation bias is larger when the number of independent groups is small. But the performance becomes much better when the number of independent groups is increased from 100 to 500. We can also see that in the presence of unobserved group random effects, the maximized sample log likelihood is in general bigger for estimation under the true information structure than under a misspecified information structure.

A.5 Identification: Bayesian Analysis

By Bayesian analysis, we check parameter identification by comparing the prior distribution and the posterior distribution. Consider a linear form for $u(\cdot)$ with $u(X_{i,g}) = \beta_0 + X_{i,g}^{c\prime}\beta_1 + X_{i,g}^{p\prime}\beta_2$. Denote $\theta = (\beta_0, \beta_1', \beta_2', \lambda, \sigma)'$. The prior distribution is denoted as $\pi(\theta \mid W)$. We assume that the prior distributions of $\beta_0$, $\beta_1$, $\beta_2$, $\lambda$, and $\sigma$ are independent. For each $i = 0, 1, 2$, $\beta_i$ is normal with mean $\mu_i$ and standard deviation $\sigma_i$. $\lambda$ is truncated normal with mean $\mu_3$ and standard deviation $\sigma_3$, restricted to $[-1/\tau_G, 1/\tau_G] \cap [-1, 1]$, where $\tau_G = \max_g \tau_g$ and
$$\tau_g = \min\Big\{ \max_{1 \le i \le n} \sum_{j=1}^{n} |W_{ij,g}|,\; \max_{1 \le j \le n} \sum_{i=1}^{n} |W_{ij,g}| \Big\}.$$
$\sigma$ is truncated normal with mean $\mu_4$ and standard deviation $\sigma_4$, taking only positive values. Thus, the prior distribution relates to the network structure through the range of $\lambda$. The hyper-parameters are chosen as $\mu_0 = 0$, $\mu_3 = 0.1$, $\mu_1 = \mu_2 = \mu_4 = 1$, and $\sigma_i = 1$, for $i = 0, 1, \cdots, 4$.

The sample is generated in the way described in the Monte Carlo experiments.
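We consider two models, the linear model for continuous choices with $h_{i,g}(z) = z$ and the binary choice (probit) model with $h_{i,g}(z) = I(z > 0)$, for all $i$ and $g$; two information structures, such that $X^{p}_{i,g}$ is either public information in group $g$ or known only to $(i, g)$; and two types of network structures, such that the sample is composed either of a number of independent groups or of a single group. The posterior distribution is derived by the Metropolis-Hastings algorithm. Beginning with a randomly chosen $\theta_0$, we propose an update from $\theta$ to $\tilde{\theta}$ according to $q(\cdot \mid \cdot)$, such that, for $k = 0, 1, 2$,

$$
\tilde{\beta}_k - \beta_k \sim \delta_k N(0, 1); \qquad
\tilde{\lambda} \sim TN(\lambda, \delta_3) \text{ on } [-1/\tau_G, 1/\tau_G] \cap [-1, 1]; \qquad
\tilde{\sigma} \sim TN(\sigma, \delta_4) \text{ on } (0, \infty).
$$

$\tilde{\theta}$ is accepted with probability $\alpha$:

$$
\alpha(\theta, \tilde{\theta}) = \min\Big\{ 1,\ \frac{P(Y \mid \tilde{\theta}, X, W)\, \pi(\tilde{\theta} \mid W)\, q(\theta \mid \tilde{\theta})}{P(Y \mid \theta, X, W)\, \pi(\theta \mid W)\, q(\tilde{\theta} \mid \theta)} \Big\}.
$$

For any two parameter vectors, $\theta$ and $\tilde{\theta}$, we can verify that the detailed balance condition holds as follows:

$$
\begin{aligned}
Pr(\tilde{\theta} \mid \theta) P(\theta \mid Y, X, W)
&= \alpha(\theta, \tilde{\theta})\, q(\tilde{\theta} \mid \theta)\, \frac{P(Y \mid \theta, X, W) \pi(\theta \mid W)}{\int P(Y \mid \hat{\theta}, X, W) \pi(\hat{\theta} \mid W)\, d\hat{\theta}} \\
&= \min\Big\{ 1,\ \frac{P(Y \mid \tilde{\theta}, X, W)\, \pi(\tilde{\theta} \mid W)\, q(\theta \mid \tilde{\theta})}{P(Y \mid \theta, X, W)\, \pi(\theta \mid W)\, q(\tilde{\theta} \mid \theta)} \Big\}\, q(\tilde{\theta} \mid \theta)\, \frac{P(Y \mid \theta, X, W) \pi(\theta \mid W)}{\int P(Y \mid \hat{\theta}, X, W) \pi(\hat{\theta} \mid W)\, d\hat{\theta}} \\
&= \min\Big\{ 1,\ \frac{P(Y \mid \theta, X, W)\, \pi(\theta \mid W)\, q(\tilde{\theta} \mid \theta)}{P(Y \mid \tilde{\theta}, X, W)\, \pi(\tilde{\theta} \mid W)\, q(\theta \mid \tilde{\theta})} \Big\}\, q(\theta \mid \tilde{\theta})\, \frac{P(Y \mid \tilde{\theta}, X, W) \pi(\tilde{\theta} \mid W)}{\int P(Y \mid \hat{\theta}, X, W) \pi(\hat{\theta} \mid W)\, d\hat{\theta}} \\
&= \alpha(\tilde{\theta}, \theta)\, q(\theta \mid \tilde{\theta})\, \frac{P(Y \mid \tilde{\theta}, X, W) \pi(\tilde{\theta} \mid W)}{\int P(Y \mid \hat{\theta}, X, W) \pi(\hat{\theta} \mid W)\, d\hat{\theta}} \\
&= Pr(\theta \mid \tilde{\theta}) P(\tilde{\theta} \mid Y, X, W).
\end{aligned}
$$

As for the step sizes of the proposal distribution, $\delta$, we begin with some initial guesses and then adjust them according to the standard deviation of the draws that have been generated, as is conventional in the literature (Gelman et al. (2014)). Since our main focus is the intensity of social interactions, $\lambda$, we show the prior and posterior distributions of this parameter in Figures A.1 and A.2. When there are a number of independent groups, it is easy to tell the posterior distribution apart from the prior distribution, which is evidence of identification.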
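The accept/reject step with a truncated normal proposal can be sketched as follows for the scalar parameter $\lambda$ alone. This is a simplified illustration and not the sampler used for the reported results: `log_post` is a hypothetical user-supplied function returning the log of likelihood times prior, the step size is held fixed rather than adapted, and the starting value is assumed to lie inside the support. Because the truncated proposal is not symmetric, the proposal densities are kept in the acceptance ratio.

```python
import math
import numpy as np


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def log_tn_pdf(x, mean, sd, lo, hi):
    """Log density of N(mean, sd^2) truncated to [lo, hi]."""
    if x < lo or x > hi:
        return -math.inf
    mass = norm_cdf((hi - mean) / sd) - norm_cdf((lo - mean) / sd)
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi)) - math.log(mass)


def draw_tn(rng, mean, sd, lo, hi):
    """Draw from the truncated normal by simple rejection sampling."""
    while True:
        x = rng.normal(mean, sd)
        if lo <= x <= hi:
            return x


def mh_lambda(log_post, lam0, step, lo, hi, n_draws, seed=0):
    """Metropolis-Hastings draws for a scalar parameter on [lo, hi].

    log_post is a hypothetical function returning log(likelihood * prior).
    """
    rng = np.random.default_rng(seed)
    lam, lp = lam0, log_post(lam0)
    draws = np.empty(n_draws)
    for t in range(n_draws):
        prop = draw_tn(rng, lam, step, lo, hi)
        lp_prop = log_post(prop)
        # log of [post(prop) q(lam | prop)] / [post(lam) q(prop | lam)]
        log_alpha = (lp_prop - lp
                     + log_tn_pdf(lam, prop, step, lo, hi)
                     - log_tn_pdf(prop, lam, step, lo, hi))
        if rng.uniform() < math.exp(min(0.0, log_alpha)):
            lam, lp = prop, lp_prop
        draws[t] = lam
    return draws
```

In an application, `log_post` would evaluate the nested fixed point likelihood plus the log prior at each proposed value, and the bounds `lo` and `hi` would be the network-dependent range of $\lambda$.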
If the sample contains only a single group, interpersonal interactions make it much harder to distinguish the posterior distribution from the prior distribution, especially for the binary choice model, because much of the information about the latent variable is lost when only binary choices are observed. However, an increase in the sample size can help with identification.

Table A.1: Linear Model with Discrete Characteristics for Independent Groups

Publicly Known Characteristics

| | True parameters | Publicly Known, n = 20, G = 100 | Publicly Known, n = 20, G = 500 | Self-Known, n = 20, G = 100 | Self-Known, n = 20, G = 500 |
|---|---|---|---|---|---|
| β0 | 0 | 0.0007 (0.0299) | -0.0001 (0.0130) | 0.0007 (0.0310) | -0.0001 (0.0138) |
| β1 | 1 | 1.0011 (0.0224) | 0.9999 (0.0103) | 1.0012 (0.0227) | 0.9999 (0.0104) |
| β2 | 1 | 0.9989 (0.0465) | 1.0007 (0.0204) | 0.9988 (0.0468) | 1.0007 (0.0206) |
| λ | 0.3 | 0.2993 (0.0293) | 0.3000 (0.0133) | 0.2993 (0.0314) | 0.2999 (0.0144) |
| σ | 1 | 0.9999 (0.0156) | 0.9999 (0.0071) | 1.0031 (0.0156) | 1.0031 (0.0071) |
| m log L | | -1.4187 (0.0156) | -1.4188 (0.0071) | -1.4219 (0.0155) | -1.4220 (0.0071) |
| rtrue | | 0.9700 | 1.0000 | - | - |

Self-Known Characteristics

| | True parameters | Self-Known, n = 20, G = 100 | Self-Known, n = 20, G = 500 | Publicly Known, n = 20, G = 100 | Publicly Known, n = 20, G = 500 |
|---|---|---|---|---|---|
| β0 | 0 | 0.0007 (0.0308) | -0.0001 (0.0137) | 0.0322 (0.0302) | 0.0315 (0.0132) |
| β1 | 1 | 1.0011 (0.0226) | 0.9999 (0.0104) | 1.0070 (0.0223) | 1.0060 (0.0102) |
| β2 | 1 | 0.9989 (0.0467) | 1.0007 (0.0206) | 0.9768 (0.0468) | 0.9784 (0.0206) |
| λ | 0.3 | 0.2993 (0.0313) | 0.2999 (0.0143) | 0.2597 (0.0318) | 0.2604 (0.0144) |
| σ | 1 | 0.9999 (0.0156) | 0.9999 (0.0071) | 1.0026 (0.0156) | 1.0026 (0.0071) |
| m log L | | -1.4187 (0.0156) | -1.4188 (0.0071) | -1.4214 (0.0156) | -1.4215 (0.0071) |
| rtrue | | 0.9590 | 0.9990 | - | - |

Note: G is the number of groups. n is the population of each group. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which the estimated log likelihood under the true information structure is larger than that under the misspecified one. The numbers in parentheses are standard deviations.
Table A.2: Binary Choice with Discrete Characteristics for Independent Groups

Publicly Known Characteristics

| | True parameters | Publicly Known, n = 20, G = 100 | Publicly Known, n = 20, G = 500 | Self-Known, n = 20, G = 100 | Self-Known, n = 20, G = 500 |
|---|---|---|---|---|---|
| β0 | 0 | 0.0008 (0.1436) | -0.0008 (0.0619) | -0.0011 (0.1553) | -0.0007 (0.0699) |
| β1 | 1 | 1.0030 (0.0475) | 1.0010 (0.0202) | 1.0027 (0.0475) | 1.0007 (0.0201) |
| β2 | 1 | 1.0022 (0.0747) | 1.0019 (0.0336) | 1.0022 (0.0748) | 1.0017 (0.0336) |
| λ | 0.3 | 0.2995 (0.2080) | 0.3013 (0.0899) | 0.3021 (0.2269) | 0.3010 (0.1021) |
| m log L | | -0.4381 (0.0119) | -4.3850 (0.0051) | -0.4382 (0.0119) | -0.4386 (0.0051) |
| rtrue | | 0.6440 | 0.7750 | - | - |

Self-Known Characteristics

| | True parameters | Self-Known, n = 20, G = 100 | Self-Known, n = 20, G = 500 | Publicly Known, n = 20, G = 100 | Publicly Known, n = 20, G = 500 |
|---|---|---|---|---|---|
| β0 | 0 | -0.0026 (0.1535) | -0.0011 (0.0701) | 0.0381 (0.1437) | 0.0378 (0.0625) |
| β1 | 1 | 1.0030 (0.0479) | 1.0011 (0.0200) | 1.0033 (0.0479) | 1.0012 (0.0200) |
| β2 | 1 | 1.0025 (0.0751) | 1.0019 (0.0338) | 1.0000 (0.0750) | 0.9995 (0.0338) |
| λ | 0.3 | 0.3036 (0.2244) | 0.3015 (0.1027) | 0.2428 (0.2094) | 0.2432 (0.0913) |
| m log L | | -0.4382 (0.0120) | -0.4385 (0.0051) | -0.4383 (0.0120) | -0.4386 (0.0051) |
| rtrue | | 0.5910 | 0.7640 | - | - |

Note: G is the number of groups. n is the population of each group. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which the estimated log likelihood under the true information structure is larger than that under the misspecified one. The numbers in parentheses are standard deviations.
Table A.3: Linear Model with Continuous Characteristics for Independent Groups

Publicly Known Characteristics

| | True parameters | Publicly Known, n = 20, G = 100 | Publicly Known, n = 20, G = 500 | Self-Known, n = 20, G = 100 | Self-Known, n = 20, G = 500 |
|---|---|---|---|---|---|
| β0 | 0 | -0.0005 (0.0192) | 0.0001 (0.0088) | -0.0050 (0.0487) | -0.0331 (0.0215) |
| β1 | 1 | 1.0007 (0.0212) | 0.9999 (0.0099) | 1.0012 (0.0244) | 1.0007 (0.0111) |
| β2 | 1 | 0.9994 (0.0139) | 0.9998 (0.0063) | 1.0412 (0.0246) | 1.0424 (0.0109) |
| λ | 0.3 | 0.3004 (0.0119) | 0.3001 (0.0052) | 0.2945 (0.0464) | 0.2927 (0.0202) |
| σ | 1 | 0.9985 (0.0160) | 0.9997 (0.0072) | 1.1183 (0.0195) | 1.1201 (0.0087) |
| m log L | | -1.4173 (0.0161) | -1.4186 (0.0072) | -1.5306 (0.0174) | -1.5324 (0.0078) |
| rtrue | | 1.0000 | 1.0000 | - | - |

Self-Known Characteristics

| | True parameters | Self-Known, n = 20, G = 100 | Self-Known, n = 20, G = 500 | Publicly Known, n = 20, G = 100 | Publicly Known, n = 20, G = 500 |
|---|---|---|---|---|---|
| β0 | 0 | -0.0018 (0.0471) | 0.0005 (0.0202) | 0.2586 (0.0288) | 0.2607 (0.0133) |
| β1 | 1 | 1.0007 (0.0223) | 0.9999 (0.0102) | 1.0109 (0.0223) | 1.0105 (0.0103) |
| β2 | 1 | 1.0015 (0.0199) | 0.9999 (0.0085) | 1.1102 (0.0138) | 0.1103 (0.0062) |
| λ | 0.3 | 0.2993 (0.0323) | 0.2999 (0.0136) | 0.0425 (0.0149) | 0.0417 (0.0066) |
| σ | 1 | 0.9985 (0.0160) | 0.9997 (0.0072) | 1.0142 (0.0162) | 1.0155 (0.0073) |
| m log L | | -1.4173 (0.0161) | -1.4186 (0.0073) | -1.4329 (0.0160) | -1.4343 (0.0072) |
| rtrue | | 1.0000 | 1.0000 | - | - |

Distribution of Self-Known Characteristics

| | True parameters | n = 20, G = 100 | n = 20, G = 500 |
|---|---|---|---|
| µ | 1 | 1.0054 (0.0037) | 0.9993 (0.0570) |
| η | 2 | 1.9964 (0.0637) | 2.0000 (0.0280) |
| ρ | 0.4 | 0.3969 (0.0384) | 0.4000 (0.0164) |

Note: G is the number of groups. n is the population of each group. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which the estimated log likelihood under the true information structure is larger than that under the misspecified one. The numbers in parentheses are standard deviations.
Table A.4: Binary Choice with Continuous Characteristics for Independent Groups

Publicly Known Characteristics

| | True parameters | Publicly Known, n = 20, G = 100 | Publicly Known, n = 20, G = 500 | Self-Known, n = 20, G = 100 | Self-Known, n = 20, G = 500 |
|---|---|---|---|---|---|
| β0 | 0 | -0.0074 (0.1167) | 0.0013 (0.0516) | -0.0129 (0.2965) | 0.0094 (0.1395) |
| β1 | 1 | 1.0079 (0.0612) | 1.0000 (0.0264) | 1.0050 (0.0611) | 0.9776 (0.0264) |
| β2 | 1 | 1.0055 (0.0488) | 1.0003 (0.0219) | 1.0030 (0.0564) | 0.9991 (0.0257) |
| λ | 0.3 | 0.3095 (0.1759) | 0.2984 (0.0769) | 0.3129 (0.4627) | 0.2809 (0.2162) |
| m log L | | -0.2550 (0.0138) | -0.2565 (0.0062) | -0.2557 (0.0138) | -0.2571 (0.0062) |
| rtrue | | 0.7830 | 0.9820 | - | - |

Self-Known Characteristics

| | True parameters | Self-Known, n = 20, G = 100 | Self-Known, n = 20, G = 500 | Publicly Known, n = 20, G = 100 | Publicly Known, n = 20, G = 500 |
|---|---|---|---|---|---|
| β0 | 0 | -0.0064 (0.2898) | 0.0129 (0.1410) | 0.1509 (0.1195) | 0.1681 (0.0524) |
| β1 | 1 | 1.0080 (0.0612) | 1.0000 (0.0264) | 1.0081 (0.0613) | 0.9999 (0.0264) |
| β2 | 1 | 1.0064 (0.0554) | 1.0014 (0.0258) | 1.0227 (0.0486) | 1.0171 (0.0218) |
| λ | 0.3 | 0.3081 (0.4519) | 0.2805 (0.2183) | 0.0536 (0.1777) | 0.0384 (0.0773) |
| m log L | | -0.2549 (0.0136) | -0.2565 (0.0061) | -0.2550 (0.0136) | -0.2565 (0.0061) |
| rtrue | | 0.5640 | 0.6980 | - | - |

Distribution of Self-Known Characteristics

| | True parameters | n = 20, G = 100 | n = 20, G = 500 |
|---|---|---|---|
| µ | 1 | 1.0054 (0.1337) | 0.9993 (0.0571) |
| η | 2 | 1.9964 (0.0637) | 2.0000 (0.0280) |
| ρ | 0.4 | 0.3969 (0.0384) | 0.4000 (0.0164) |

Note: G is the number of groups. n is the population of each group. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which the estimated log likelihood under the true information structure is larger than that under the misspecified one. The numbers in parentheses are standard deviations.
169 Table A.5: Linear Model with Discrete Characteristics and Constant Friend Number for a Single Group True Parameters β0 0 β1 1 β2 1 λ 0.3 σ 1 m log L rtrue Publicly Known Characteristics Publicly Known Self-Known n = 200 n = 1000 n = 1000 n = 200 n = 1000 n = 1000 F = 30 F = 150 F = 30 F = 30 F = 150 F = 30 0.0117 (0.2239) 0.9952 (0.0702) 1.0026 (0.1477) 0.2741 (0.3724) 0.9886 (0.0506) 0.0015 (0.2141) 1.0008 (0.0314) 1.0013 (0.0651) 0.2961 (0.3696) 0.9978 (0.0216) -0.0054 (0.0978) 1.0008 (0.0314) 1.0009 (0.0653) 0.3080 (0.1625) 0.9978 (0.0216) 0.0170 (0.2369) 0.9953 (0.0700) 1.0019 (0.1477) 0.2642 (0.3979) 0.9891 (0.0507) 0.0026 (0.2259) 1.0009 (0.0314) 1.0010 (0.0652) 0.2942 (0.3906) 0.9979 (0.0216) -0.0055 (0.1093) 1.0008 (0.0314) 1.0010 (0.0654) 0.3079 (0.1827) 0.9982 (0.0216) -1.4062 (0.0510) 0.5490 -1.4165 (0.0216) 0.5700 -1.4165 (0.0216) 0.6600 -1.4066 (0.0510) - -1.4166 (0.0217) - -1.4169 (0.0217) - Self-Known Characteristics Self-Known Publicly Known n = 1000 n = 1000 n = 200 n = 1000 n = 1000 F = 150 F = 30 F = 30 F = 150 F = 30 True Parameters n = 200 F = 30 β0 0 β1 1 β2 1 λ 0.3 σ 1 0.0167 (0.2368) 0.9954 (0.0699) 1.0018 (0.1474) 0.2654 (0.3976) 0.9888 (0.0507) 0.0029 (0.2258) 1.0008 (0.0314) 1.0010 (0.0652) 0.2938 (0.3901) 0.9978 (0.0216) -0.0056 (0.1093) 1.0008 (0.0314) 1.0010 (0.0653) 0.3083 (0.1830) 0.9978 (0.0216) 0.0439 (0.2261) 0.9953 (0.0702) 1.0008 (0.1478) 0.2189 (0.3772) 0.9889 (0.0505) 0.0329 (0.2177) 1.0008 (0.0314) 1.0008 (0.0651) 0.2415 (0.3759) 0.9979 (0.0216) 0.0283 (0.0977) 1.0012 (0.0314) 0.9989 (0.0653) 0.2506 (0.1630) 0.9981 (0.0216) -1.4063 (0.0510) 0.5280 -1.4166 (0.0217) 0.5130 -1.4165 (0.0216) 0.6380 -1.4064 (0.0510) - -1.4166 (0.0217) - -1.4168 (0.0216) - m log L rtrue Note: n is the number of agents in the group. F is the constant number of friends a person can make. m log L is the estimated sample average log likelihood. 
rtrue is the proportion of simulations for which estimated log likelihood under the true information structure is larger than that under the misspecified one. The numbers in parentheses are standard deviation. 170 Table A.6: Linear Model with Discrete Characteristics and Random Friend Number for a Single Group Publicly Known Characteristics Publicly Known Self-Known n = 200 n = 1000 n = 1000 n = 200 n = 1000 U F = 59 U F = 299 U F = 59 U F = 59 U F = 299 n = 1000 U F = 59 0.0066 (0.1596) 1.0042 (0.0714) 1.0002 (0.1488) 0.2910 (0.2459) 0.9894 (0.0501) 0.0028 (0.1138) 1.0015 (0.0314) 1.0000 (0.0657) 0.2948 (0.1935) 0.9995 (0.0228) 0.0008 (0.0645) 1.0015 (0.0314) 1.0003 (0.0656) 0.2977 (0.1025) 0.9994 (0.0228) 0.0077 (0.1729) 1.0045 (0.0714) 1.0002 (0.1481) 0.2888 (0.2692) 0.9903 (0.0503) 0.0018 (0.1263) 1.0015 (0.0314) 1.0000 (0.0658) 0.2961 (0.2142) 0.9996 (0.0228) 0.0029 (0.0710) 1.0015 (0.0314) 0.9999 (0.0656) 0.2942 (0.1139) 1.0003 (0.0220) -1.4070 (0.0508) 0.5760 29.4344 -1.4181 (0.0228) 0.6020 149.6204 -1.4181 (0.0228) 0.7640 29.5245 -1.4079 (0.0509) 29.4344 -1.4183 (0.0228) 149.6204 -1.4190 (0.0229) 29.5245 True Parameters n = 200 U F = 59 Self-Known n = 1000 U F = 299 β0 0 β1 1 β2 1 λ 0.3 σ 1 0.0083 (0.1722) 1.0044 (0.0714) 1.0002 (0.1480) 0.2875 (0.2677) 0.9895 (0.0501) 0.0016 (0.1263) 1.0015 (0.0314) 1.0000 (0.0658) 0.2965 (0.2142) 0.9994 (0.0228) 0.0027 (0.0711) 1.0016 (0.0314) 1.0000 (0.0658) 0.2944 (0.1140) 0.9994 (0.0228) 0.0387 (0.1599) 1.0048 (0.0714) 0.9958 (0.1491) 0.2367 (0.2486) 0.9900 (0.0500) 0.0347 (0.1143) 1.0018 (0.0314) 0.9987 (0.0658) 0.2398 (0.1948) 0.9996 (0.0228) 0.0329 (0.0645) 1.0025 (0.0314) 0.9958 (0.0660) 0.2436 (0.1035) 1.0000 (0.0228) -1.4071 (0.0508) 0.5800 29.4344 -1.4181 (0.0228) 0.6040 149.6204 -1.4181 (0.0228) 0.7020 29.5245 -1.4076 (0.0507) 29.4344 -1.4183 (0.0228) 149.6204 -1.4187 (0.0228) 29.5245 True Parameters β0 0 β1 1 β2 1 λ 0.3 σ 1 m log L rtrue mF m log L rtrue mF Self-Known Characteristics Publicly Known n = 1000 n 
= 200 n = 1000 n = 1000 U F = 59 U F = 59 U F = 299 U F = 59 Note: n is the number of agents in the group. U F is the maximum number of friends a person can make. mF is the average number of friends. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which estimated log likelihood under the true information structure is larger than that under the misspecified one. The numbers in parentheses are standard deviation. 171 Table A.7: Binary Choice with Discrete Characteristics and Constant Friend Number for a Single Group True Parameters β0 0 β1 1 β2 1 λ 0.3 m log L rtrue Publicly Known Characteristics Publicly Known Self-Known n = 200 n = 1000 n = 1000 n = 200 n = 1000 n = 1000 F = 30 F = 150 F = 30 F = 30 F = 150 F = 30 0.1266 (0.5925) 1.0290 (0.1554) 1.0398 (0.2591) 0.1039 (0.8816) 0.1239 (0.5687) 1.0073 (0.0678) 1.0051 (0.1110) 0.1101 (0.8695) 0.0363 (0.4429) 1.0077 (0.0672) 1.0054 (0.1106) 0.2447 (0.6713) 0.1475 (0.5921) 1.0290 (0.1557) 1.0359 (0.2586) 0.0742 (0.8804) 0.1350 (0.5848) 1.0073 (0.0678) 1.0045 (0.1111) 0.0933 (0.8941) 0.0562 (0.4725) 1.0077 (0.0672) 1.0048 (0.1105) 0.2144 (0.7191) -0.4307 (0.0381) 0.5950 -0.4369 (0.0171) 0.5630 -0.4367 (0.0170) 0.5420 -0.4309 (0.0381) - -0.4369 (0.0171) - -0.4368 (0.0170) - Self-Known Characteristics Self-Known Publicly Known n = 1000 n = 1000 n = 200 n = 1000 n = 1000 F = 150 F = 30 F = 30 F = 150 F = 30 True Parameters n = 200 F = 30 β0 0 β1 1 β2 1 λ 0.3 0.1505 (0.5902) 1.0286 (0.1554) 1.0362 (0.2584) 0.0684 (0.8795) 0.1351 (0.5838) 1.0073 (0.0677) 1.0043 (0.1110) 0.0934 (0.8930) 0.0583 (0.4716) 1.0073 (0.0675) 1.0043 (0.1101) 0.2110 (0.7173) 0.1400 (0.5941) 1.0286 (0.1551) 1.0399 (0.2588) 0.0823 (0.8851) 0.1382 (0.5704) 1.0073 (0.0677) 1.0049 (0.1109) 0.0883 (0.8728) 0.0660 (0.4466) 1.0073 (0.0675) 1.0047 (0.1102) 0.1993 (0.6774) -0.4310 (0.0380) 0.4020 -0.4369 (0.0171) 0.4480 -0.4369 (0.0170) 0.4670 -0.4308 (0.0380) - -0.4369 (0.0171) - -0.4369 (0.0170) - m log L 
rtrue Note: n is the number of agents in the group. F is the constant number of friends a person can make. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which estimated log likelihood under the true information structure is larger than that under the misspecified one. The numbers in parentheses are standard deviation. 172 Table A.8: Binary Choice with Discrete Characteristics and Random Friend Number for a Single Group Publicly Known Characteristics Publicly Known Self-Known n = 200 n = 1000 n = 1000 n = 200 n = 1000 U F = 59 U F = 299 U F = 59 U F = 59 U F = 299 n = 1000 U F = 59 0.0952 (0.4874) 1.0413 (0.1625) 1.0420 (0.2625) 0.1576 (0.7243) 0.0738 (0.4485) 1.0046 (0.0663) 1.0030 (0.1075) 0.1900 (0.6807) 0.0120 (0.2650) 1.0045 (0.0669) 1.0037 (0.1078) 0.2838 (0.4025) 0.0966 (0.4907) 1.0411 (0.1622) 1.0405 (0.2611) 0.1554 (0.7304) 0.0929 (0.4662) 1.0046 (0.0663) 1.0027 (0.1074) 0.1609 (0.7074) 0.0191 (0.2760) 1.0044 (0.0668) 1.0034 (0.1075) 0.2728 (0.4310) -0.4280 (0.0381) 0.5100 29.4344 -0.4378 (0.0167) 0.5000 149.6204 -0.4381 (0.0169) 0.5300 29.5245 -0.4280 (0.0380) 29.4344 -0.4379 (0.0167) 149.6204 -0.4381 (0.0169) 29.5245 True Parameters n = 200 U F = 59 Self-Known n = 1000 U F = 299 β0 0 β1 1 β2 1 λ 0.3 0.0934 (0.4910) 1.0415 (0.1611) 1.0404 (0.2611) 0.1599 (0.7302) 0.0947 (0.4655) 1.0047 (0.0664) 1.0049 (0.1073) 0.1582 (0.7065) 0.0192 (0.2756) 1.0045 (0.0665) 1.0033 (0.1071) 0.2727 (0.4212) 0.1062 (0.4858) 1.0415 (0.1613) 1.0415 (0.2624) 0.1401 (0.7218) 0.0918 (0.4502) 1.0048 (0.0664) 1.0031 (0.1074) 0.1623 (0.6837) 0.0311 (0.2662) 1.0045 (0.0666) 1.0031 (0.1074) 0.2543 (0.4060) -0.4281 (0.0378) 0.5040 29.4344 -0.4378 (0.0166) 0.5070 149.6204 -0.4380 (0.0168) 0.5270 29.5245 -0.4280 (0.0507) 29.4344 -0.4378 (0.0166) 149.6204 -0.4381 (0.0168) 29.5245 True Parameters β0 0 β1 1 β2 1 λ 0.3 m log L rtrue mF m log L rtrue mF Self-Known Characteristics Publicly Known n = 1000 n = 200 n = 1000 n = 1000 U F = 59 U F 
= 59 U F = 299 U F = 59 Note: n is the number of agents in the group. U F is the maximum number of friends a person can make. mF is the average number of friends. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which estimated log likelihood under the true information structure is larger than that under the misspecified one. The numbers in parentheses are standard deviation. 173 Table A.9: Linear Model with Continuous Characteristics and Constant Friend Number for a Single Group True Parameters β0 0 β1 1 β2 1 λ 0.3 σ 1 m log L rtrue Publicly Known Characteristics Publicly Known Self-Known n = 200 n = 1000 n = 1000 n = 200 n = 1000 n = 1000 F = 30 F = 150 F = 30 F = 30 F = 150 F = 30 0.0040 (0.5329) 1.0008 (0.0698) 0.9995 (0.0460) 0.3086 (0.2362) 0.9909 (0.0496) 0.0179 (0.5118) 1.0003 (0.0316) 1.0007 (0.0215) 0.2864 (0.2302) 0.9973 (0.0225) -0.0004 (0.2137) 1.0003 (0.0316) 1.0007 (0.0215) 0.3028 (0.0926) 0.9974 (0.0225) 0.1510 (0.6050) 1.0011 (0.0697) 0.8927 (0.1564) 0.2710 (0.3875) 0.9941 (0.0499) 0.2124 (0.6147) 1.0003 (0.0306) 0.9089 (0.1350) 0.2332 (0.3393) 0.9980 (0.0224) 0.1337 (0.4495) 1.0004 (0.0318) 0.8910 (0.0692) 0.2827 (0.1673) 1.0010 (0.0225) -1.4085 (0.0501) 0.6610 -1.4160 (0.0225) 0.6680 -1.4160 (0.0225) 0.9350 -1.4118 (0.0501) - -1.4166 (0.0225) - -1.4197 (0.0225) - Self-Known Characteristics Self-Known Publicly Known n = 1000 n = 1000 n = 200 n = 1000 n = 1000 F = 150 F = 30 F = 30 F = 150 F = 30 True Parameters n = 200 F = 30 β0 0 β1 1 β2 1 λ 0.3 σ 1 0.0138 (0.4033) 1.0011 (0.0700) 1.0034 (0.1742) 0.2908 (0.3818) 0.9910 (0.0498) 0.0029 (0.3356) 1.0004 (0.0316) 1.0004 (0.1473) 0.3000 (0.3277) 0.9974 (0.0224) -0.0035 (0.1722) 1.0003 (0.0316) 0.9988 (0.0754) 0.3042 (0.1647) 0.9974 (0.0225) 0.2034 (0.4569) 1.0002 (0.0701) 1.1310 (0.0474) 0.0821 (0.2177) 0.9920 (0.0449) 0.2180 (0.4382) 1.0002 (0.0316) 1.1316 (0.0237) 0.0635 (0.2146) 0.9977 (0.0225) 0.2034 (0.2028) 1.0007 (0.0316) 1.1314 (0.0237) 0.0775 
(0.0854) 0.9986 (0.0225) -1.4086 (0.0503) 0.5680 -1.4161 (0.0225) 0.5820 -1.4160 (0.0225) 0.7850 -1.4097 (0.0500) - -1.4163 (0.0225) - -1.4173 (0.0226) - m log L rtrue Note: n is the number of agents in the group. F is the constant number of friends a person can make. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which estimated log likelihood under the true information structure is larger than that under the misspecified one. The numbers in parentheses are standard deviation. 174 Table A.10: Linear Model with Continuous Characteristics and Random Friend Number for a Single Group Publicly Known Characteristics Publicly Known Self-Known n = 200 n = 1000 n = 1000 n = 200 n = 1000 U F = 59 U F = 299 U F = 59 U F = 59 U F = 299 n = 1000 U F = 59 0.0043 (0.2582) 0.9985 (0.0738) 0.9993 (0.0465) 0.3025 (0.1329) 0.9871 (0.0504) -0.0085 (0.2434) 0.9994 (0.0334) 1.0007 (0.0207) 0.2995 (0.1126) 0.9978 (0.0227) -0.0038 (0.1197) 0.9994 (0.0335) 1.0006 (0.0207) 0.3016 (0.0555) 0.9978 (0.0227) 0.0668 (0.4144) 0.9988 (0.0742) 0.8803 (0.0949) 0.3092 (0.2243) 0.9954 (0.0507) 0.1383 (0.4191) 0.9995 (0.0335) 0.8910 (0.0790) 0.2866 (0.1958) 1.0002 (0.0228) 0.1088 (0.3379) 0.9991 (0.0336) 0.8862 (0.0435) 0.3033 (0.1040) 1.0068 (0.0230) -1.4047 (0.0512) 0.8070 29.4867 -1.4165 (0.0228) 0.8710 149.5960 -1.4165 (0.0227) 0.9860 29.5193 -1.4030 (0.0510) 29.4867 -1.4189 (0.0228) 149.5960 -1.4254 (0.0228) 29.5193 True Parameters n = 200 U F = 59 Self-Known n = 1000 U F = 299 β0 0 β1 1 β2 1 λ 0.3 σ 1 -0.0094 (0.2248) 0.9986 (0.0471) 0.9934 (0.0999) 0.3129 (0.2122) 0.9871 (0.0503) -0.0004 (0.1952) 0.9994 (0.0334) 1.0003 (0.0851) 0.3007 (0.1882) 0.9978 (0.0227) -0.0040 (0.1005) 0.9993 (0.0334) 0.9989 (0.0451) 0.3041 (0.0947) 0.9978 (0.0227) 0.1554 (0.2698) 0.9983 (0.0743) 1.1282 (0.1481) 0.0968 (0.1296) 0.9909 (0.0509) 0.1432 (0.2648) 0.9995 (0.0335) 1.1305 (0.0235) 0.0910 (0.1103) 0.9988 (0.0228) 0.1332 (0.1852) 1.0007 (0.0337) 1.1279 (0.0236) 
0.0983 (0.0573) 1.0017 (0.0228) -1.4047 (0.0510) 0.7000 29.4867 -1.4165 (0.0228) 0.7590 149.5960 -1.4165 (0.0228) 0.9400 29.5193 -1.4085 (0.0515) 29.4867 -1.4175 (0.0228) 149.5960 -1.4204 (0.0227) 29.5193 True Parameters β0 0 β1 1 β2 1 λ 0.3 σ 1 m log L rtrue mF m log L rtrue mF Self-Known Characteristics Publicly Known n = 1000 n = 200 n = 1000 n = 1000 U F = 59 U F = 59 U F = 299 U F = 59 Note: n is the number of agents in the group. U F is the maximum number of friends a person can make. mF is the average number of friends. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which estimated log likelihood under the true information structure is larger than that under the misspecified one. The numbers in parentheses are standard deviation. 175 Table A.11: Binary Choice with Continuous Characteristics and Constant Friend Number for a Single Group True Parameters β0 0 β1 1 β2 1 λ 0.3 m log L rtrue Publicly Known Characteristics Publicly Known Self-Known n = 200 n = 1000 n = 1000 n = 200 n = 1000 n = 1000 F = 30 F = 150 F = 30 F = 30 F = 150 F = 30 0.2318 (2.4761) 1.3816 (7.4183) 1.3554 (6.1150) 0.0654 (0.8985) 0.1781 (0.6683) 1.0146 (0.1008) 1.0138 (0.0887) 0.0436 (0.8995) 0.0740 (0.5812) 1.0047 (0.1008) 1.0142 (0.0884) 0.2020 (0.7420) 0.0041 (6.7784) 1.9455 (25.0543) 0.9883 (25.9679) 0.0126 (0.9487) 0.1963 (0.6316) 1.0046 (0.1007) 1.0124 (0.1085) 0.0127 (0.9523) 0.1354 (0.5812) 1.0146 (0.1008) 1.0060 (0.0884) 0.1078 (0.7420) -0.2497 (0.0856) 0.6260 -0.2559 (0.0812) 0.6180 -0.2557 (0.0812) 0.5660 -0.2502 (0.0858) - -0.2506 (0.0812) - -0.2559 (0.0812) - Self-Known Characteristics Self-Known Publicly Known n = 1000 n = 1000 n = 200 n = 1000 n = 1000 F = 150 F = 30 F = 30 F = 150 F = 30 True Parameters n = 200 F = 30 β0 0 β1 1 β2 1 λ 0.3 0.0740 (0.5812) 1.0147 (0.1008) 1.0142 (0.0884) 0.2020 (0.7420) 0.1756 (0.6206) 1.0138 (0.1007) 1.0322 (0.1120) 0.0205 (0.9525) 0.0194 (0.5665) 1.0140 (0.0988) 1.0250 (0.1056) 0.1245 
(0.8755) 0.1354 (0.5745) 1.0146 (0.1010) 1.0060 (0.1040) 0.1078 (0.8663) 0.2013 (0.6546) 1.0138 (0.1007) 1.0341 (0.0918) -0.0183 (0.8977) 0.1435 (0.5750) 1.0138 (0.0987) 1.0341 (0.0898) 0.0698 (0.7499) -0.2557 (0.0812) 0.5660 -0.2559 (0.0788) 0.3900 -0.2558 (0.0787) 0.4750 -0.2559 (0.0812) - -0.2558 (0.0787) - -0.2557 (0.0787) - m log L rtrue Note: n is the number of agents in the group. F is the constant number of friends a person can make. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which estimated log likelihood under the true information structure is larger than that under the misspecified one. The numbers in parentheses are standard deviation. 176 Table A.12: Binary Choice with Continuous Characteristics and Random Friend Number for a Single Group Publicly Known Characteristics Publicly Known Self-Known n = 200 n = 1000 n = 1000 n = 200 n = 1000 U F = 59 U F = 299 U F = 59 U F = 59 U F = 299 n = 1000 U F = 59 0.1675 (0.7507) 1.0839 (0.3773) 1.0867 (0.2811) 0.0980 (0.7758) 0.1569 (0.8577) 1.0652 (1.7943) 1.0495 (1.2264) 0.1173 (0.7447) 0.0719 (0.7611) 1.0659 (1.7943) 1.0508 (1.2264) 0.2386 (0.5099) 0.2003 (0.7347) 1.0828 (0.3796) 1.0815 (0.2880) 0.0381 (0.8444) 0.1930 (0.8496) 1.0661 (1.8275) 1.0457 (1.2506) 0.0595 (0.8241) 0.1181 (0.7713) 1.0664 (1.8276) 1.0395 (1.2502) 0.1760 (0.6042) -0.2500 (0.0838) 0.5270 29.4867 -0.2514 (0.0822) 0.5240 149.5960 -0.2514 (0.0821) 0.5430 29.5193 -0.2503 (0.0838) 29.4867 -0.2514 (0.0822) 149.5960 -0.2515 (0.0822) 29.5193 True Parameters n = 200 U F = 59 Self-Known n = 1000 U F = 299 β0 0 β1 1 β2 1 λ 0.3 0.1823 (0.7438) 1.0870 (0.3982) 1.1083 (0.3402) 0.0452 (0.8468) 0.1771 (0.8437) 1.0650 (1.8271) 1.0644 (1.2500) 0.0586 (0.8225) 0.1032 (0.7699) 1.0660 (1.8272) 1.0580 (1.2496) 0.1736 (0.6094) 0.2034 (0.7555) 1.0879 (0.3967) 1.1140 (0.3364) 0.0183 (0.7761) 0.1934 (0.8576) 1.0641 (1.7939) 1.0682 (1.2257) 0.0303 (0.7535) 0.1274 (0.7589) 1.0650 (1.7939) 1.0688 (1.2258) 0.1267 
(0.5160) -0.2500 (0.0816) 0.4830 29.4867 -0.2512 (0.0794) 0.4640 149.5960 -0.2513 (0.0793) 0.5290 29.5193 -0.2498 (0.0816) 29.4867 -0.2512 (0.0794) 149.5960 -0.2513 (0.0793) 29.5193 True Parameters β0 0 β1 1 β2 1 λ 0.3 m log L rtrue mF m log L rtrue mF Self-Known Characteristics Publicly Known n = 1000 n = 200 n = 1000 n = 1000 U F = 59 U F = 59 U F = 299 U F = 59 Note: n is the number of agents in the group. U F is the maximum number of friends a person can make. mF is the average number of friends. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which estimated log likelihood under the true information structure is larger than that under the misspecified one. The numbers in parentheses are standard deviation. 177 Table A.13: Linear Model with Discrete Characteristics and Unobserved Group Random Effects True parameters Publicly Known Characteristics Publicly Known Self-Known n = 20, G = 100 n = 20, G = 500 n = 20, G = 100 n = 20, G = 500 β0 0 -0.0026 (0.1906) 0.0004 (0.0636) -0.0042 (0.1890) 0.0011 (0.0636) β1 1 1.0013 (0.0233) 0.9999 (0.0106) 1.0013 (0.0232) 1.0000 (0.0106) β2 1 0.9989 (0.0463) 1.0007 (0.0203) 0.9991 (0.0469) 1.0007 (0.0207) λ 0.3 0.3002 (0.0404) 0.2993 (0.0177) 0.2998 (0.0457) 0.2989 (0.0202) σ 1 1.0012 (0.0161) 1.0005 (0.0073) 1.0043 (0.0161) 1.0036 (0.0073) γ 1 1.1136 (0.1657) 1.0412 (0.0623) 1.1110 (0.1686) 1.0413 (0.0636) -1.5145 (0.0156) -1.5129 (0.0072) -1.5175 (0.0156) -1.5158 (0.0072) 0.9500 1.0000 - - m log L rtrue True parameters Self-Known Characteristics Self-Known Publicly Known n = 20, G = 100 n = 20, G = 500 n = 20, G = 100 n = 20, G = 500 β0 0 -0.0071 (0.1924) 0.0008 (0.0637) 0.0377 (0.2010) 0.0410 (0.0681) β1 1 1.0013 (0.0233) 1.0000 (0.0106) 1.0024 (0.0235) 1.0010 (0.0106) β2 1 0.9989 (0.0466) 1.0006 (0.0206) 0.9797 (0.0467) 0.9816 (0.0205) λ 0.3 0.2998 (0.0452) 0.2990 (0.0201) 0.2429 (0.0415) 0.2417 (0.0181) σ 1 1.0012 (0.0161) 1.0005 (0.0073) 1.0037 (0.0162) 1.0029 (0.0073) γ 
1 1.1116 (0.1681) 1.0411 (0.0634) 1.1980 (0.1777) 1.1268 (0.0673) -1.5145 (0.0157) -1.5129 (0.0072) -1.5168 (0.0157) -1.5151 (0.0072) 0.9370 1.0000 - - m log L rtrue Note: G is the number of groups. n is the population for each group. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which estimated log likelihood under the true information structure is larger than that under the misspecified one. The numbers in parentheses are standard deviation. 178 Table A.14: Sample Statistics: Municipalities in North Carolina Variables Mean Standard Deviation Min Max Total Revenue (×106 ) 19.8781 94.5765 0.0043 1.6486×103 Expenditure ($ ×106 ) 7.0020 28.6849 0 280.1064 (0.2752) (0.2261) (0) (0.8821) 2.3462 17.4553 -0.2218 350.2160 (0.0603) (0.0782) (-0.0179) (0.8760) 2.2384 20.6795 0 448.8580 (Proportion) (0.1036) (0.0930) (0) (0.5746) Government 1.5539 6.5083 0.0027 110.4780 (Proportion) (0.2277) (0.2053) (0.0154) (1.5187) Public Safety 4.0027 19.8703 0 357.6710 (Proportion) (0.1829) (0.1315) (0) (0.7449) 2.9412 13.9665 -1.4503 178.4934 (0.1503) (0.1375) (-0.5187) 0.8045 Population ×103 10.2842 45.0819 0.0250 751.9990 Median Household Income $ ×104 4.2092 1.8504 1.1750 15.7297 Related County Public Safety Expenditure ×106 No. of Related Counties 33.5566 41.7575 0 217.6625 1.1008 0.3326 1 4 Distance (kilometer) 216.7976 125.4035 1.2133 767.1423 No. 
of Observations 506 - - - Utility (Proportion) Debt & Services (Proportion) Transportation Other (Proportion) 179 Table A.15: Empirical Analysis Regression Constant Regression for Municipal Spending on Public Safety (1) (2) (3) (4) (5) (6) (7) 0.1268 (0.4486) -0.0049 (0.5252) 0.0534 (0.5204) 0.2068 (0.4733) 0.2526 (0.4845) 0.5939 (0.5344) 0.6346 (0.5766) Population 0.3079*** (0.0040) 0.3149*** (0.0042) 0.3145*** (0.0042) 0.3135*** (0.0045) 0.3129*** (0.0045) 0.3109*** (0.0042) 0.3106*** (0.0042) Total Revenue 0.0631*** (0.0019) 0.0599*** (0.0020) 0.0601*** (0.0020) 0.0607*** (0.0021) 0.0610*** (0.0021) 0.0617*** (0.0020) 0.0619*** (0.0020) -0.1294* (0.0771) -0.0420 (0.1132) -0.0574 (0.1084) -0.0661 (0.0898) -0.0805 (0.0866) -0.1100 (0.0825) -0.1257 (0.0805) -0.0569*** (0.0169) -0.0649*** (0.0210) -0.0835*** (0.0313) -0.0913** (0.0367) -0.1356 (0.0860) -0.1437 (0.0990) 2.1768*** (0.0166) 2.1495*** (0.0161) 2.1522*** (0.0162) 2.1510*** (0.0165) 2.1540*** (0.0166) 2.1572*** (0.0163) 2.1596*** (0.0164) -1111.6 -1105.2 -1105.8 -1105.6 -1106.3 -1107.0 -1107.6 506 506 506 506 506 506 506 No. of “Neighbors” 10.8656 (4.3634) 10.8656 (4.3634) 28.3636 (9.0854) 28.3636 (9.0854) 94.0474 (29.1823) 94.0474 (29.1823) Cutoff Distance(km) 30 30 50 50 100 100 Median HHI λ σ log L No. of Obs Variables µ Distribution of Median Household Income ω η ρ Estimates 3.3971 0.7742 2.0059 0.1490 Note: Regression (1) is ordinary linear regression without social interactions. Regression (2),(4) and (6) correspond to the linear model when all characteristics are public information. Municipal median household income is assumed to be self-known for municipalities in Regression (3),(5) and (7). Two municipalities are viewed as close “neighbors” if the distance between them is less than 30 kilometers for Regression (2) and (3), or less than 50 kilometers for Regression (4) and (5), or less than 100 kilometers for Regression (6) and (7). Numbers in parentheses are standard deviations. 
Estimates that are significant at the 10%, 5%, and 1% levels are marked by "*", "**", and "***", respectively.

[Figure A.1: Identification for Continuous Choices. Prior and posterior distributions of λ in four panels: Publicly Known Discrete Characteristics for Multiple Groups and Self-Known Discrete Characteristics for Multiple Groups (legend: prior; n=20, G=100, F=3; n=20, G=500, F=3); Publicly Known Discrete Characteristics for a Single Group and Self-Known Discrete Characteristics for a Single Group (legend: prior; n=200, G=1, F=30; n=1000, G=1, F=30). Horizontal axis: λ.]

[Figure A.2: Identification for Binary Choices. Prior and posterior distributions of λ in four panels: Publicly Known Discrete Characteristics for Multiple Groups and Self-Known Discrete Characteristics for Multiple Groups (legend: prior; n=20, G=100, F=3; n=20, G=500, F=3); Publicly Known Discrete Characteristics for a Single Group and Self-Known Discrete Characteristics for a Single Group (legend: prior; n=200, G=1, F=30; n=1000, G=1, F=30). Horizontal axis: λ.]

Appendix B: Appendix to Chapter 3

B.1 Equilibrium Analysis

Proof of Proposition 3.3.1. Suppose that $(s_1(X^{p}_{J_1}, \epsilon_1), \cdots, s_n(X^{p}_{J_n}, \epsilon_n))$ satisfies (3.2.4). Define $\psi : \prod_{i=1}^{n} A_i \to C^{n}$ as follows. For any $A = (A_1, \cdots, A_n)' \in \prod_{i=1}^{n} A_i$, there exist $k_1, \cdots, k_n \in \{1, \cdots, n\}$ such that $W_{n,k_i i} \neq 0$ and $A_i = X^{p}_{J_{k_i}}$. Set $\psi(A) = (\psi_1(A_1), \cdots, \psi_n(A_n))'$, where

$$
\psi_i(A_i) = \psi_i(X^{p}_{J_{k_i}}) = E[s_i(X^{p}_{J_i}, \epsilon_i) \mid X^{p}_{J_{k_i}}, z],
$$

for $i = 1, \cdots, n$.
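Then, we have that

$$
\begin{aligned}
\psi_i(A_i)
&= E\Big[ \max\Big\{ \beta_0 + X^{c\prime}_i \beta_1 + X^{p\prime}_i \beta_2 + X^{g\prime} \beta_3 + \lambda \sum_{j \neq i} W_{n,ij} E[s_j(X^{p}_{J_i}, \epsilon_j) \mid X^{p}_{J_i}, Z] - \epsilon_i,\ 0 \Big\} \,\Big|\, A_i, z \Big] \\
&= E\Big[ H\Big( \beta_0 + X^{c\prime}_i \beta_1 + X^{p\prime}_i \beta_2 + X^{g\prime} \beta_3 + \lambda \sum_{j \neq i} W_{n,ij} E[s_j(X^{p}_{J_i}, \epsilon_j) \mid X^{p}_{J_i}, Z] \Big) \,\Big|\, A_i, z \Big] \\
&= E\Big[ H\Big( \beta_0 + X^{c\prime}_i \beta_1 + X^{p\prime}_i \beta_2 + X^{g\prime} \beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j(X^{p}_{J_i}) \Big) \,\Big|\, A_i, z \Big].
\end{aligned}
$$

In the above equation, the first equality holds because the $\epsilon_i$'s are independent of each other and also independent of $X$ and $W_n$. The second equality follows from the definition of $\psi$. Thus, (3.3.3) holds.

Conversely, suppose that a function $\psi : \prod_{i=1}^{n} A_i \to C^{n}$ satisfies (3.3.3). Define a profile of strategies, $(s_1(X^{p}_{J_1}, \epsilon_1), \cdots, s_n(X^{p}_{J_n}, \epsilon_n))$, by

$$
s_i(X^{p}_{J_i}, \epsilon_i) = \max\Big\{ \beta_0 + X^{c\prime}_i \beta_1 + X^{p\prime}_i \beta_2 + X^{g\prime} \beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j(X^{p}_{J_i}) - \epsilon_i,\ 0 \Big\},
$$

for $i, j = 1, \cdots, n$ and $W_{j,i} \neq 0$. Taking expectations over $s_j(X^{p}_{J_i}, \epsilon_j)$ conditional on $X^{p}_{J_i}$ and $Z = z$, we derive that

$$
\begin{aligned}
E[s_j(X^{p}_{J_i}, \epsilon_j) \mid X^{p}_{J_i}, z]
&= E\Big[ \max\Big\{ \beta_0 + X^{c\prime}_i \beta_1 + X^{p\prime}_i \beta_2 + X^{g\prime} \beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j(X^{p}_{J_i}) - \epsilon_i,\ 0 \Big\} \,\Big|\, X^{p}_{J_i}, z \Big] \\
&= E\Big[ H\Big( \beta_0 + X^{c\prime}_i \beta_1 + X^{p\prime}_i \beta_2 + X^{g\prime} \beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \psi_j(X^{p}_{J_i}) \Big) \,\Big|\, X^{p}_{J_i}, z \Big] \\
&= \psi_j(X^{p}_{J_i}),
\end{aligned}
$$

where the second equality follows from (3.3.3). Hence, we have that

$$
s_i(X^{p}_{J_i}, \epsilon_i) = \max\Big\{ \beta_0 + X^{c\prime}_i \beta_1 + X^{p\prime}_i \beta_2 + X^{g\prime} \beta_3 + \lambda \sum_{j \neq i} W_{n,ij} E[s_j(X^{p}_{J_i}, \epsilon_j) \mid X^{p}_{J_i}, z] - \epsilon_i,\ 0 \Big\}.
$$

Therefore, (3.2.4) is satisfied.

Proof of Proposition 3.3.2. For any two functions, $\xi, \xi' \in \Xi$,

$$
\begin{aligned}
\| T(\xi) - T(\xi') \|
&= \max_{1 \le i \le n} \max_{\{k : W_{n,ki} \neq 0\}} \int \Big| E\Big[ H\Big( \beta_0 + X^{c\prime}_i \beta_1 + X^{p\prime}_i \beta_2 + X^{g\prime} \beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \xi_j(X^{p}_{J_i}) \Big) \\
&\qquad\qquad - H\Big( \beta_0 + X^{c\prime}_i \beta_1 + X^{p\prime}_i \beta_2 + X^{g\prime} \beta_3 + \lambda \sum_{j \neq i} W_{n,ij} \xi'_j(X^{p}_{J_i}) \Big) \,\Big|\, x^{p}_{J_k}, z \Big] \Big|\, dF_p(x^{p}) \\
&\le \max_{1 \le i \le n} \max_{\{k : W_{n,ki} \neq 0\}} |\lambda| \sup_x \Big| \frac{dH(x)}{dx} \Big| \int \sum_{j \neq i} |W_{n,ij}|\, E\big[ |\xi_j(X^{p}_{J_i}) - \xi'_j(X^{p}_{J_i})| \,\big|\, x^{p}_{J_k}, z \big]\, dF_p(x^{p}) \\
&= \max_{1 \le i \le n} \max_{\{k : W_{n,ki} \neq 0\}} |\lambda| \sup_x \Big| \frac{dH(x)}{dx} \Big| \sum_{j \neq i} |W_{n,ij}| \int |\xi_j(X^{p}_{J_i}) - \xi'_j(X^{p}_{J_i})|\, dF_p(X^{p}) \\
&\le |\lambda|\, \| W_n \|_\infty \sup_x \Big| \frac{dH(x)}{dx} \Big|\, \| \xi - \xi' \|.
\end{aligned}
$$

Recall that $F_p(\cdot)$ is a simplified notation for the distribution of the $X^{p}_i$'s conditional on the public information $Z = z$. By calculation, $dH(x)/dx = F(x)$.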
Therefore, since $0 \le F(x) \le 1$, if $|\lambda|\,\|W_n\|_\infty < 1$, $T : \Xi \to \Xi$ is a contraction mapping on the complete metric space $(\Xi, \|\cdot\|)$. Then, it admits a unique fixed point. Because there is a one-to-one correspondence between a BNE and a fixed point of $T$, there is a unique BNE.

B.2 Identification Proofs

Proof of Lemma 3.4.1. Define $c_i = \beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + X^{g\prime}\beta_3 + \lambda\sum_{j\neq i} W_{n,ij}\,E[y_j \mid X^p_{J_i}, z]$. According to (3.2.1), we have that
\[
E[I(y_i > 0) \mid X^p, z] = \Pr(c_i - \epsilon_i > 0 \mid X^p, z) = F(c_i; \sigma),
\]
where the second equality follows from the independence between the idiosyncratic shocks and the exogenous characteristics. Since $F(c; \sigma)$ is strictly increasing with respect to $c$, we have the inversion $c_i = F^{-1}(E[I(y_i > 0) \mid X^p, z]; \sigma)$. Because
\[
E[y_i \mid X^p, z] = \int_{c<c_i} (c_i - c) f(c; \sigma)\,dc = c_i F(c_i; \sigma) - \int_{c<c_i} c f(c; \sigma)\,dc,
\]
we derive (3.4.3).

Proof of Proposition 3.4.1. For $X^p$ and $z$, we define a function $S$ such that
\[
S(E[y \mid X^p, z], E[I(y > 0) \mid X^p, z], \sigma) = E[y \mid X^p, z] - E[I(y > 0) \mid X^p, z]\,F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma) + \int_{c<F^{-1}(E[I(y>0) \mid X^p, z];\,\sigma)} c f(c; \sigma)\,dc.
\]
By (3.4.3'), when data are generated from the true parameter $\sigma_0$, $S(E[y \mid X^p, z], E[I(y > 0) \mid X^p, z], \sigma_0) = 0$. We would like to show that given the observed data $(E[y \mid X^p, z], E[I(y > 0) \mid X^p, z])$, there is a unique $\sigma$ that satisfies the above equation. That is, $\sigma$ can be identified from the observed data. To achieve this, we first take the partial derivative,
\[
\begin{aligned}
\frac{\partial S(E[y \mid X^p, z], E[I(y > 0) \mid X^p, z], \sigma)}{\partial\sigma}
&= \Big( F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma)\, f\big(F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma); \sigma\big) - E[I(y > 0) \mid X^p, z] \Big) \cdot \frac{\partial F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma)}{\partial\sigma} \\
&\quad + \int_{c<F^{-1}(E[I(y>0) \mid X^p, z];\,\sigma)} c\,\frac{\partial f(c; \sigma)}{\partial\sigma}\,dc.
\end{aligned}
\]
Since $F(F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma); \sigma) = E[I(y^* > 0) \mid X^p, z]$,
\[
f\big(F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma); \sigma\big)\,\frac{\partial F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma)}{\partial\sigma} + \frac{\partial F\big(F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma); \sigma\big)}{\partial\sigma} = 0,
\]
where the derivative with respect to $\sigma$ in the second term is the derivative of $F(c; \sigma)$ with respect to $\sigma$. Thus,
\[
\frac{\partial F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma)}{\partial\sigma} = - \frac{\partial F\big(F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma); \sigma\big)/\partial\sigma}{f\big(F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma); \sigma\big)}.
\]
Additionally, by using integration by parts,
\[
\begin{aligned}
\int_{c<F^{-1}(E[I(y>0) \mid X^p, z];\,\sigma)} c\,\frac{\partial f(c; \sigma)}{\partial\sigma}\,dc
&= F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma)\,\frac{\partial F\big(F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma); \sigma\big)}{\partial\sigma} - \lim_{c\to-\infty} c\,\frac{\partial F(c; \sigma)}{\partial\sigma} \\
&\quad - \int_{c<F^{-1}(E[I(y>0) \mid X^p, z];\,\sigma)} \frac{\partial F(c; \sigma)}{\partial\sigma}\,dc.
\end{aligned}
\]
Hence, provided that the limit term vanishes,
\[
\begin{aligned}
\frac{\partial S(E[y \mid X^p, z], E[I(y > 0) \mid X^p, z], \sigma)}{\partial\sigma}
&= E[I(y > 0) \mid X^p, z]\,\frac{\partial F\big(F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma); \sigma\big)/\partial\sigma}{f\big(F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma); \sigma\big)} - \int_{c<F^{-1}(E[I(y>0) \mid X^p, z];\,\sigma)} \frac{\partial F(c; \sigma)}{\partial\sigma}\,dc \\
&= \int_{c<F^{-1}(E[I(y>0) \mid X^p, z];\,\sigma)} \Bigg( \frac{\partial F\big(F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma); \sigma\big)/\partial\sigma}{f\big(F^{-1}(E[I(y > 0) \mid X^p, z]; \sigma); \sigma\big)} - \frac{\partial F(c; \sigma)/\partial\sigma}{f(c; \sigma)} \Bigg) f(c; \sigma)\,dc.
\end{aligned}
\]
Therefore, when Assumption 3.4.4 is satisfied, $\partial S(E[y \mid X^p, z], E[I(y > 0) \mid X^p, z], \sigma)/\partial\sigma$ is either positive or negative. As a result, by the implicit function theorem, for $(E[y \mid X^p, z], E[I(y^* > 0) \mid X^p, z])$, there is a unique $\sigma$ such that (3.4.3') holds. That is, from the moments $(E[y \mid X^p, z], E[I(y^* > 0) \mid X^p, z])$, we can identify $\sigma$.

Proof of Proposition 3.4.2. $\sigma$ is identified from Proposition 3.4.1. As $H(x; \sigma) = xF(x; \sigma) - \int_{c<x} c f(c; \sigma)\,dc$, $\partial H(x; \sigma)/\partial x = F(x; \sigma) > 0$ for any $x \in \mathbb{R}^1$. Thus, $H(x; \sigma)$ is strictly increasing in $x$ and $H^{-1}(\cdot; \sigma)$ is well-defined. Therefore, we have
\[
H^{-1}(E[y_i \mid X^p, z]) = \beta_0 + X_i^{c\prime}\beta_1 + X_i^{p\prime}\beta_2 + \lambda\sum_{j\neq i} W_{n,ij}\,E[y_j \mid X^p_{J_i}, z].
\]
Therefore, if $(\beta_0, \beta_1', \beta_2', \lambda)$ and $(\tilde\beta_0, \tilde\beta_1', \tilde\beta_2', \tilde\lambda)$ are observationally equivalent at $W$, $J$ and $X$,
\[
[l_n, X^c, X^p, E^p]\,(\beta_0 - \tilde\beta_0,\ \beta_1' - \tilde\beta_1',\ \beta_2' - \tilde\beta_2',\ \lambda - \tilde\lambda)' = 0.
\]
As $[l_n, X^c, X^p, E^p]$ has full column rank, $(\beta_0, \beta_1', \beta_2', \lambda) = (\tilde\beta_0, \tilde\beta_1', \tilde\beta_2', \tilde\lambda)$.

Table B.1: Tobit Model with Discrete Characteristics for Independent Groups

Characteristics are all Publicly Known

| Parameter | True value | Model Specification: Publicly Known; n = 20, G = 100 | Publicly Known; n = 20, G = 500 | Self-Known; n = 20, G = 100 | Self-Known; n = 20, G = 500 |
|---|---|---|---|---|---|
| β0 | 0 | 0.0017 (0.0599) | 0.0004 (0.0261) | 0.0014 (0.0647) | -0.0000 (0.0285) |
| β1 | 1 | 1.0020 (0.0274) | 0.9999 (0.0122) | 1.0021 (0.0276) | 0.9999 (0.0122) |
| β2 | 1 | 0.9987 (0.0483) | 1.0003 (0.0217) | 0.9988 (0.0487) | 1.0005 (0.0219) |
| λ | 0.3 | 0.2983 (0.0486) | 0.2997 (0.0214) | 0.2985 (0.0536) | 0.3000 (0.0238) |
| σ | 1 | 0.9997 (0.0194) | 0.9998 (0.0090) | 1.0016 (0.0195) | 1.0017 (0.0090) |
| m log L | | -1.1611 (0.0163) | -1.1618 (0.0072) | -1.1627 (0.0163) | -1.1633 (0.0072) |
| rtrue | | 0.8920 | 0.9990 | - | - |
| rcensor | | 0.3209 | 0.3206 | 0.3209 | 0.3206 |

Characteristics are Self-Known

| Parameter | True value | Model Specification: Self-Known; n = 20, G = 100 | Self-Known; n = 20, G = 500 | Publicly Known; n = 20, G = 100 | Publicly Known; n = 20, G = 500 |
|---|---|---|---|---|---|
| β0 | 0 | 0.0016 (0.0643) | 0.0003 (0.0283) | 0.0588 (0.0605) | 0.0578 (0.0265) |
| β1 | 1 | 1.0020 (0.0278) | 0.9999 (0.0122) | 1.0051 (0.0275) | 1.0032 (0.0122) |
| β2 | 1 | 0.9988 (0.0484) | 1.0005 (0.0219) | 0.9864 (0.0483) | 0.9880 (0.0219) |
| λ | 0.3 | 0.2983 (0.0535) | 0.2997 (0.0235) | 0.2469 (0.0510) | 0.2481 (0.0225) |
| σ | 1 | 0.9998 (0.0194) | 0.9999 (0.0090) | 1.0014 (0.0194) | 1.0014 (0.0090) |
| m log L | | -1.1614 (0.0163) | -1.1620 (0.0072) | -1.1627 (0.0163) | -1.1633 (0.0072) |
| rtrue | | 0.8750 | 0.9970 | - | - |
| rcensor | | 0.3208 | 0.3205 | 0.3208 | 0.3205 |

Note: G is the number of groups. n is the population for each group. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which the estimated log likelihood is bigger than that under the wrong information structure. rcensor is the censoring rate. The numbers in parentheses are standard deviations.
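As a numerical illustration of Propositions 3.3.2 and 3.4.1, the following is a minimal sketch for the special case in which all characteristics are public and the shocks are normal, so that $H(x; \sigma) = x\Phi(x/\sigma) + \sigma\phi(x/\sigma)$. The normality of the shocks, the four-agent network, and all parameter values (`LAM`, `SIGMA`, `INDEX`, `W`) are assumptions made only for this example. The sketch iterates the map $T$ from two starting points, records the ratio of successive sup-norm gaps, and then recovers $\sigma$ from the two moments $E[y_i]$ and $\Pr(y_i > 0)$.

```python
from statistics import NormalDist

STD = NormalDist()                      # standard normal: cdf, pdf, inv_cdf
LAM, SIGMA = 0.3, 1.5                   # hypothetical parameter values
INDEX = [0.5, -0.2, 1.0, 0.3]           # hypothetical values of the linear index
N = len(INDEX)
# Row-normalised weights with zero diagonal, so ||W||_inf = 1 and |LAM| * ||W||_inf = 0.3.
W = [[0.0 if i == j else 1.0 / (N - 1) for j in range(N)] for i in range(N)]


def H(x, sigma):
    """E[max(x - eps, 0)] for eps ~ N(0, sigma^2), i.e. x*F(x) - int_{c<x} c f(c) dc."""
    z = x / sigma
    return x * STD.cdf(z) + sigma * STD.pdf(z)


def latent_index(psi):
    """c_i = index_i + LAM * sum_j W_ij * psi_j for every agent i."""
    return [INDEX[i] + LAM * sum(W[i][j] * psi[j] for j in range(N)) for i in range(N)]


def solve(psi, tol=1e-12, max_iter=500):
    """Iterate psi -> H(c(psi)) and record ratios of successive sup-norm gaps."""
    ratios, prev_gap = [], None
    for _ in range(max_iter):
        new = [H(c, SIGMA) for c in latent_index(psi)]
        gap = max(abs(a - b) for a, b in zip(new, psi))
        if prev_gap is not None and prev_gap > 1e-9:
            ratios.append(gap / prev_gap)
        psi, prev_gap = new, gap
        if gap < tol:
            break
    return psi, ratios


psi_a, ratios_a = solve([0.0] * N)      # start from zero expectations
psi_b, ratios_b = solve([10.0] * N)     # start far away from the fixed point

# Identification of sigma from (E[y_i], P(y_i > 0)): with normal shocks,
# E[y_i] = sigma * (P * z + pdf(z)) where z = inv_cdf(P), so sigma is a ratio of moments.
sigma_hat = []
for c, ey in zip(latent_index(psi_a), psi_a):
    p = STD.cdf(c / SIGMA)              # P(y_i > 0) implied by the model
    z = STD.inv_cdf(p)
    sigma_hat.append(ey / (p * z + STD.pdf(z)))

print("fixed point:", [round(v, 6) for v in psi_a])
print("largest contraction ratio:", max(ratios_a + ratios_b))
print("recovered sigma:", [round(s, 6) for s in sigma_hat])
```

Both starting points should lead to the same vector of expectations, and the recorded ratios should not exceed $|\lambda|\,\|W_n\|_\infty$, which is the contraction modulus used in the proof. The closed form for $\sigma$ is specific to the normal case; for a general $F(\cdot; \sigma)$ one would instead solve $S(\cdot, \cdot, \sigma) = 0$ numerically.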
Table B.2: Tobit Model with Continuous Characteristics for Independent Groups

Characteristics are All Publicly Known

| Parameter | True value | Model Specification: Publicly Known; n = 20, G = 100 | Publicly Known; n = 20, G = 500 | Self-Known; n = 20, G = 100 | Self-Known; n = 20, G = 500 |
|---|---|---|---|---|---|
| β0 | 0 | -0.0030 (0.0385) | 0.0002 (0.0172) | -0.0636 (0.1122) | -0.0592 (0.0494) |
| β1 | 1 | 1.0008 (0.0248) | 0.9998 (0.0114) | 1.0067 (0.0272) | 1.0061 (0.0124) |
| β2 | 1 | 0.9995 (0.0166) | 0.9997 (0.0077) | 1.0365 (0.0270) | 1.0369 (0.0125) |
| λ | 0.3 | 0.3012 (0.0167) | 0.3001 (0.0074) | 0.3022 (0.0612) | 0.3002 (0.0271) |
| σ | 1 | 0.9988 (0.0191) | 0.9998 (0.0085) | 1.0759 (0.0222) | 1.0769 (0.0097) |
| m log L | | -1.1361 (0.0306) | -1.1368 (0.0133) | -1.1937 (0.0330) | -1.1942 (0.0142) |
| rtrue | | 1.0000 | 1.0000 | - | - |
| rcensor | | 0.2746 | 0.2749 | 0.2746 | 0.2749 |

Characteristics are Self-Known

| Parameter | True value | Model Specification: Self-Known; n = 20, G = 100 | Self-Known; n = 20, G = 500 | Publicly Known; n = 20, G = 100 | Publicly Known; n = 20, G = 500 |
|---|---|---|---|---|---|
| β0 | 0 | -0.0037 (0.0901) | 0.0003 (0.0390) | 0.4473 (0.0474) | 0.4528 (0.0218) |
| β1 | 1 | 1.0010 (0.0255) | 0.9999 (0.0115) | 1.0068 (0.0256) | 1.0057 (0.0117) |
| β2 | 1 | 1.0014 (0.0210) | 0.9998 (0.0094) | 1.0777 (0.0167) | 1.0774 (0.0077) |
| λ | 0.3 | 0.3003 (0.0487) | 0.3000 (0.0209) | 0.0431 (0.0209) | 0.0414 (0.0094) |
| σ | 1 | 0.9989 (0.0191) | 0.9998 (0.0084) | 1.0077 (0.0193) | 1.0087 (0.0085) |
| m log L | | -1.1452 (0.0280) | -1.1459 (0.0121) | -1.1525 (0.0283) | -1.1533 (0.0121) |
| rtrue | | 0.9990 | 1.0000 | - | - |
| rcensor | | 0.2682 | 0.2683 | 0.2682 | 0.2683 |

Distribution of Self-Known Characteristics

| Parameter | True value | n = 20, G = 100 | n = 20, G = 500 |
|---|---|---|---|
| µ | 1 | 1.0054 (0.0037) | 0.9993 (0.0570) |
| η | 2 | 1.9964 (0.0637) | 2.0000 (0.0280) |
| ρ | 0.4 | 0.3969 (0.0384) | 0.4000 (0.0164) |

Note: G is the number of groups. n is the population for each group. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which the estimated log likelihood is bigger than that under the wrong information structure. rcensor is the censoring rate. The numbers in parentheses are standard deviations.
Table B.3: Tobit Model with Discrete Characteristics and Constant Friend Number for A Single Group

Characteristics are Publicly Known

| Parameter | True value | Model Specification: Publicly Known; n = 200, F = 30 | Publicly Known; n = 1000, F = 150 | Publicly Known; n = 1000, F = 30 | Self-Known; n = 200, F = 30 | Self-Known; n = 1000, F = 150 | Self-Known; n = 1000, F = 30 |
|---|---|---|---|---|---|---|---|
| β0 | 0 | 0.0561 (0.5469) | 0.0217 (0.5333) | -0.0100 (0.2483) | 0.0651 (0.5794) | 0.0039 (0.5602) | -0.0091 (0.2818) |
| β1 | 1 | 0.9963 (0.0835) | 1.0013 (0.0380) | 1.0012 (0.0381) | 0.9964 (0.0832) | 1.0014 (0.0379) | 1.0012 (0.0381) |
| β2 | 1 | 1.0023 (0.1600) | 0.9997 (0.0700) | 0.9994 (0.0699) | 1.0009 (0.1600) | 0.9995 (0.0701) | 0.9994 (0.0699) |
| λ | 0.3 | 0.2416 (0.5368) | 0.2777 (0.5242) | 0.3094 (0.2410) | 0.2330 (0.5713) | 0.2758 (0.5516) | 0.3085 (0.2740) |
| σ | 1 | 0.9881 (0.0630) | 0.9980 (0.0270) | 0.9981 (0.0271) | 0.9884 (0.0632) | 0.9981 (0.0270) | 0.9983 (0.0270) |
| m log L | | -1.1512 (0.0532) | -1.1607 (0.0228) | -1.1606 (0.0228) | -1.1515 (0.0533) | -1.1607 (0.0228) | -1.1608 (0.0228) |
| rtrue | | 0.5400 | 0.5730 | 0.5930 | - | - | - |
| rcensor | | 0.3202 | 0.3203 | 0.3204 | 0.3202 | 0.3203 | 0.3204 |

Characteristics are Self-Known

| Parameter | True value | Model Specification: Self-Known; n = 200, F = 30 | Self-Known; n = 1000, F = 150 | Self-Known; n = 1000, F = 30 | Publicly Known; n = 200, F = 30 | Publicly Known; n = 1000, F = 150 | Publicly Known; n = 1000, F = 30 |
|---|---|---|---|---|---|---|---|
| β0 | 0 | 0.0648 (0.5785) | 0.0254 (0.5605) | -0.0094 (0.2825) | 0.1136 (0.5529) | 0.0794 (0.5410) | 0.0560 (0.2490) |
| β1 | 1 | 0.9967 (0.0836) | 1.0012 (0.0379) | 1.0012 (0.0380) | 0.9964 (0.0840) | 1.0012 (0.0380) | 1.0013 (0.0381) |
| β2 | 1 | 1.0012 (0.1603) | 0.9993 (0.0700) | 0.9994 (0.0701) | 1.0015 (0.1604) | 0.9993 (0.0699) | 0.9982 (0.0701) |
| λ | 0.3 | 0.2331 (0.5706) | 0.2746 (0.5518) | 0.3088 (0.2745) | 0.1850 (0.5441) | 0.2211 (0.5323) | 0.2448 (0.2423) |
| σ | 1 | 0.9884 (0.0633) | 0.9980 (0.0270) | 0.9980 (0.0270) | 0.9884 (0.0632) | 0.9980 (0.0270) | 0.9982 (0.0270) |
| m log L | | -1.1513 (0.0533) | -1.1607 (0.0229) | -1.1606 (0.0228) | -1.1513 (0.0532) | -1.1607 (0.0229) | -1.1608 (0.0228) |
| rtrue | | 0.4890 | 0.4690 | 0.5780 | - | - | - |
| rcensor | | 0.3202 | 0.3202 | 0.3203 | 0.3202 | 0.3202 | 0.3203 |

Note: n is the number of agents in the group. F is the constant number of friends a person can make. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which the estimated log likelihood is bigger than that under the wrong information structure. rcensor is the censoring rate. The numbers in parentheses are standard deviations.

Table B.4: Tobit Model with Discrete Characteristics and Random Friend Number for A Single Group

Characteristics are Publicly Known

| Parameter | True value | Model Specification: Publicly Known; n = 200, UF = 59 | Publicly Known; n = 1000, UF = 299 | Publicly Known; n = 1000, UF = 59 | Self-Known; n = 200, UF = 59 | Self-Known; n = 1000, UF = 299 | Self-Known; n = 1000, UF = 59 |
|---|---|---|---|---|---|---|---|
| β0 | 0 | 0.0066 (0.1596) | 0.0070 (0.2746) | 0.0015 (0.1402) | 0.0077 (0.1729) | 0.0045 (0.3041) | 0.0051 (0.1532) |
| β1 | 1 | 1.0042 (0.0714) | 1.0014 (0.0385) | 1.0016 (0.0384) | 1.0045 (0.0714) | 1.0014 (0.0385) | 1.0016 (0.0384) |
| β2 | 1 | 1.0002 (0.1488) | 0.9996 (0.0700) | 1.0001 (0.0702) | 1.0002 (0.1481) | 0.9997 (0.0700) | 0.9997 (0.0701) |
| λ | 0.3 | 0.2910 (0.2459) | 0.2927 (0.2698) | 0.2978 (0.1354) | 0.2888 (0.2692) | 0.2949 (0.2983) | 0.2943 (0.1487) |
| σ | 1 | 0.9894 (0.0501) | 0.9993 (0.0286) | 0.9993 (0.0287) | 0.9903 (0.0503) | 0.9994 (0.0286) | 0.9998 (0.0287) |
| m log L | | -1.4070 (0.0508) | -1.1616 (0.0236) | -1.1599 (0.0236) | -1.4079 (0.0509) | -1.1617 (0.0236) | -1.1604 (0.0236) |
| rtrue | | 0.5760 | 0.5440 | 0.6740 | - | - | - |
| rcensor | | 0.3214 | 0.3206 | 0.3219 | 0.3214 | 0.3206 | 0.3219 |
| mF | | 29.4344 | 149.6204 | 29.5245 | 29.4344 | 149.6204 | 29.5245 |

Characteristics are Self-Known

| Parameter | True value | Model Specification: Self-Known; n = 200, UF = 59 | Self-Known; n = 1000, UF = 299 | Self-Known; n = 1000, UF = 59 | Publicly Known; n = 200, UF = 59 | Publicly Known; n = 1000, UF = 299 | Publicly Known; n = 1000, UF = 59 |
|---|---|---|---|---|---|---|---|
| β0 | 0 | 0.0162 (0.3633) | 0.0043 (0.3044) | 0.0056 (0.1532) | 0.0633 (0.3233) | 0.0590 (0.2742) | 0.0516 (0.1394) |
| β1 | 1 | 1.0058 (0.0847) | 1.0014 (0.0384) | 1.0015 (0.0383) | 1.0060 (0.0849) | 1.0015 (0.0385) | 1.0019 (0.0384) |
| β2 | 1 | 1.0010 (0.1575) | 0.9997 (0.0701) | 0.9998 (0.0704) | 0.9983 (0.1584) | 0.9989 (0.0702) | 0.9974 (0.0707) |
| λ | 0.3 | 0.2830 (0.3491) | 0.2951 (0.2984) | 0.2939 (0.1485) | 0.2370 (0.3211) | 0.2416 (0.2699) | 0.2487 (0.1354) |
| σ | 1 | 0.9894 (0.0631) | 0.9992 (0.0286) | 0.9992 (0.0286) | 0.9897 (0.0631) | 0.9993 (0.0287) | 0.9995 (0.0286) |
| m log L | | -1.1498 (0.0534) | -1.1616 (0.0236) | -1.1600 (0.0236) | -1.1501 (0.0535) | -1.1617 (0.0236) | -1.1603 (0.0236) |
| rtrue | | 0.5630 | 0.5900 | 0.6250 | - | - | - |
| rcensor | | 0.3215 | 0.3205 | 0.3218 | 0.3215 | 0.3205 | 0.3218 |
| mF | | 29.4344 | 149.6204 | 29.5245 | 29.4344 | 149.6204 | 29.5245 |

Note: n is the number of agents in the group. UF and mF are respectively the maximum and average number of friends. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which the estimated log likelihood is bigger than that under the wrong information structure. rcensor is the censoring rate. The numbers in parentheses are standard deviations.

Table B.5: Tobit Model with Continuous Characteristics and Constant Friend Number for A Single Group

Characteristics are Publicly Known

| Parameter | True value | Model Specification: Publicly Known; n = 200, F = 30 | Publicly Known; n = 1000, F = 150 | Publicly Known; n = 1000, F = 30 | Self-Known; n = 200, F = 30 | Self-Known; n = 1000, F = 150 | Self-Known; n = 1000, F = 30 |
|---|---|---|---|---|---|---|---|
| β0 | 0 | 0.0106 (0.6553) | 0.0319 (0.6319) | -0.0101 (0.2668) | 0.1998 (0.9211) | 0.2659 (0.8726) | 0.1371 (0.5054) |
| β1 | 1 | 0.9980 (0.0848) | 1.0011 (0.0385) | 1.0012 (0.0386) | 0.9980 (0.0849) | 1.0010 (0.0385) | 1.0009 (0.0387) |
| β2 | 1 | 0.9987 (0.0632) | 1.0009 (0.0278) | 1.0009 (0.0279) | 0.9298 (0.1620) | 0.9425 (0.1414) | 0.9228 (0.0786) |
| λ | 0.3 | 0.2863 (0.3790) | 0.2713 (0.3876) | 0.3026 (0.1673) | 0.2342 (0.5352) | 0.1996 (0.4943) | 0.2708 (0.2672) |
| σ | 1 | 0.9859 (0.0657) | 0.9972 (0.0284) | 0.9973 (0.0285) | 0.9878 (0.0661) | 0.9976 (0.0285) | 0.9993 (0.0286) |
| m log L | | -1.1261 (0.2576) | -1.1346 (0.2501) | -1.1345 (0.2499) | -1.1280 (0.2587) | -1.1350 (0.2502) | -1.1365 (0.2499) |
| rtrue | | 0.5990 | 0.6220 | 0.8120 | - | - | - |
| rcensor | | 0.2751 | 0.2755 | 0.2756 | 0.2751 | 0.2755 | 0.2756 |

Characteristics are Self-Known

| Parameter | True value | Model Specification: Self-Known; n = 200, F = 30 | Self-Known; n = 1000, F = 150 | Self-Known; n = 1000, F = 30 | Publicly Known; n = 200, F = 30 | Publicly Known; n = 1000, F = 150 | Publicly Known; n = 1000, F = 30 |
|---|---|---|---|---|---|---|---|
| β0 | 0 | 0.0643 (0.9197) | 0.0442 (0.8261) | -0.0171 (0.4353) | 0.3756 (0.6291) | 0.3959 (0.6078) | 0.3649 (0.2574) |
| β1 | 1 | 0.9996 (0.0821) | 1.0011 (0.0370) | 1.0010 (0.0370) | 0.9990 (0.0818) | 1.0011 (0.0370) | 1.0016 (0.0371) |
| β2 | 1 | 1.0092 (0.1738) | 1.0085 (0.1506) | 0.9979 (0.0809) | 1.0919 (0.0615) | 1.0936 (0.0282) | 1.0936 (0.0281) |
| λ | 0.3 | 0.2652 (0.5229) | 0.2733 (0.4756) | 0.3089 (0.2487) | 0.0811 (0.3527) | 0.0595 (0.3577) | 0.0824 (0.1464) |
| σ | 1 | 0.9864 (0.0628) | 0.9975 (0.0275) | 0.9974 (0.0275) | 0.9871 (0.0627) | 0.9977 (0.0275) | 0.9984 (0.0276) |
| m log L | | -1.1335 (0.2258) | -1.1430 (0.2163) | -1.1429 (0.2163) | -1.1338 (0.2257) | -1.1431 (0.2164) | -1.1436 (0.2165) |
| rtrue | | 0.5100 | 0.5270 | 0.6810 | - | - | - |
| rcensor | | 0.2696 | 0.2694 | 0.2694 | 0.2696 | 0.2694 | 0.2694 |

Note: n is the number of agents in the group. F is the constant number of friends a person can make. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which the estimated log likelihood is bigger than that under the wrong information structure. rcensor is the censoring rate. The numbers in parentheses are standard deviations.

Table B.6: Tobit Model with Continuous Characteristics and Random Friend Number for A Single Group

Characteristics are Publicly Known

| Parameter | True value | Model Specification: Publicly Known; n = 200, UF = 59 | Publicly Known; n = 1000, UF = 299 | Publicly Known; n = 1000, UF = 59 | Self-Known; n = 200, UF = 59 | Self-Known; n = 1000, UF = 299 | Self-Known; n = 1000, UF = 59 |
|---|---|---|---|---|---|---|---|
| β0 | 0 | 0.0009 (0.3353) | -0.0056 (0.2956) | -0.0063 (0.1515) | 0.0976 (0.4870) | 0.1504 (0.4572) | 0.1288 (0.2588) |
| β1 | 1 | 0.9978 (0.0905) | 0.9981 (0.0405) | 0.9984 (0.0406) | 0.9976 (0.0916) | 0.9980 (0.0405) | 0.9981 (0.0407) |
| β2 | 1 | 1.0007 (0.0661) | 0.9991 (0.0276) | 0.9992 (0.0276) | 0.9200 (0.1025) | 0.9216 (0.0770) | 0.9203 (0.0460) |
| λ | 0.3 | 0.2919 (0.2771) | 0.2978 (0.2090) | 0.3068 (0.1120) | 0.2793 (0.2976) | 0.2703 (0.2574) | 0.2791 (0.1382) |
| σ | 1 | 0.9836 (0.0655) | 0.9960 (0.0285) | 0.9962 (0.0285) | 0.9880 (0.0662) | 0.9974 (0.0287) | 1.0015 (0.0290) |
| m log L | | -1.1039 (0.2721) | -1.1495 (0.2506) | -1.1480 (0.2503) | -1.1081 (0.2744) | -1.1508 (0.2513) | -1.1529 (0.2526) |
| rtrue | | 0.7090 | 0.7710 | 0.9170 | - | - | - |
| rcensor | | 0.2895 | 0.2620 | 0.2634 | 0.2895 | 0.2620 | 0.2634 |
| mF | | 29.4867 | 149.5960 | 29.5193 | 29.4867 | 149.5960 | 29.5193 |

Characteristics are Self-Known

| Parameter | True value | Model Specification: Self-Known; n = 200, UF = 59 | Self-Known; n = 1000, UF = 299 | Self-Known; n = 1000, UF = 59 | Publicly Known; n = 200, UF = 59 | Publicly Known; n = 1000, UF = 299 | Publicly Known; n = 1000, UF = 59 |
|---|---|---|---|---|---|---|---|
| β0 | 0 | -0.0258 (0.4592) | -0.0055 (0.4146) | -0.0102 (0.1962) | 0.2464 (0.3522) | 0.2614 (0.3208) | 0.2296 (0.1945) |
| β1 | 1 | 0.9969 (0.0887) | 0.9980 (0.0400) | 0.9981 (0.0401) | 0.9973 (0.0884) | 0.9985 (0.0401) | 0.9993 (0.0404) |
| β2 | 1 | 0.9954 (0.0977) | 0.9975 (0.0788) | 0.9969 (0.0420) | 1.0921 (0.0648) | 1.0911 (0.0277) | 1.0890 (0.0279) |
| λ | 0.3 | 0.3160 (0.2631) | 0.3041 (0.2395) | 0.3070 (0.1120) | 0.1415 (0.2384) | 0.1255 (0.1900) | 0.1444 (0.1028) |
| σ | 1 | 0.9835 (0.0621) | 0.9962 (0.0275) | 0.9963 (0.0277) | 0.9873 (0.0625) | 0.9973 (0.0275) | 0.9999 (0.0279) |
| m log L | | -1.1134 (0.2394) | -1.1551 (0.2189) | -1.1537 (0.2193) | -1.1161 (0.2398) | -1.1558 (0.2191) | -1.1565 (0.2199) |
| rtrue | | 0.6490 | 0.7180 | 0.8810 | - | - | - |
| rcensor | | 0.2828 | 0.2582 | 0.2595 | 0.2828 | 0.2582 | 0.2595 |
| mF | | 29.4867 | 149.5960 | 29.5193 | 29.4867 | 149.5960 | 29.5193 |

Note: n is the number of agents in the group. UF and mF are respectively the maximum and average number of friends. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which the estimated log likelihood is bigger than that under the wrong information structure. rcensor is the censoring rate. The numbers in parentheses are standard deviations.
Table B.7: Tobit Model with Discrete Characteristics and Unobserved Group Random Effects

Characteristics are Publicly Known

| Parameter | True value | Model Specification: Publicly Known; n = 20, G = 100 | Publicly Known; n = 20, G = 500 | Self-Known; n = 20, G = 100 | Self-Known; n = 20, G = 500 |
|---|---|---|---|---|---|
| β0 | 0 | 0.0323 (0.1733) | 0.0127 (0.0612) | 0.0287 (0.1776) | 0.0121 (0.0630) |
| β1 | 1 | 1.0023 (0.0287) | 1.0001 (0.0128) | 1.0021 (0.0286) | 0.9999 (0.0129) |
| β2 | 1 | 0.9982 (0.0517) | 1.0007 (0.0226) | 0.9980 (0.0523) | 1.0002 (0.0228) |
| λ | 0.3 | 0.3051 (0.0613) | 0.3023 (0.0259) | 0.3057 (0.0702) | 0.3025 (0.0298) |
| σ | 1 | 1.0008 (0.0194) | 1.0005 (0.0094) | 1.0029 (0.0194) | 1.0025 (0.0095) |
| γ | 1 | 1.0514 (0.1364) | 1.0194 (0.0513) | 1.0477 (0.1371) | 1.0190 (0.0516) |
| m log L | | -1.1667 (0.0344) | -1.1655 (0.0149) | -1.1682 (0.0344) | -1.1669 (0.0149) |
| rtrue | | 0.8880 | 0.9960 | - | - |
| rcensor | | 0.3509 | 0.3516 | 0.3509 | 0.3516 |

Characteristics are Self-Known

| Parameter | True value | Model Specification: Self-Known; n = 20, G = 100 | Self-Known; n = 20, G = 500 | Publicly Known; n = 20, G = 100 | Publicly Known; n = 20, G = 500 |
|---|---|---|---|---|---|
| β0 | 0 | 0.0289 (0.1781) | 0.0124 (0.0630) | 0.1106 (0.1830) | 0.0920 (0.0637) |
| β1 | 1 | 1.0024 (0.0289) | 1.0001 (0.0129) | 1.0030 (0.0291) | 1.0006 (0.0129) |
| β2 | 1 | 0.9985 (0.0520) | 1.0006 (0.0229) | 0.9870 (0.0519) | 0.9895 (0.0228) |
| λ | 0.3 | 0.3051 (0.0707) | 0.3026 (0.0297) | 0.2458 (0.0629) | 0.2426 (0.0265) |
| σ | 1 | 1.0009 (0.0194) | 1.0005 (0.0093) | 1.0026 (0.0195) | 1.0022 (0.0093) |
| γ | 1 | 1.0481 (0.1362) | 1.0192 (0.0520) | 1.1095 (0.1454) | 1.0756 (0.0553) |
| m log L | | -1.1669 (0.0345) | -1.1657 (0.0149) | -1.1680 (0.0346) | -1.1668 (0.0149) |
| rtrue | | 0.8560 | 0.9950 | - | - |
| rcensor | | 0.3509 | 0.3515 | 0.3509 | 0.3515 |

Note: G is the number of groups. n is the population for each group. m log L is the estimated sample average log likelihood. rtrue is the proportion of simulations for which the estimated log likelihood is bigger than that under the wrong information structure. rcensor is the censoring rate. The numbers in parentheses are standard deviations.
Table B.8: Sample Statistics

| Variables | Units | Mean (Standard Deviation) | Min | Max |
|---|---|---|---|---|
| Property Tax Rate | Per $100 Valuation | 0.3676 (0.1972) | 0 | 0.8200 |
| Population | ×10^3 | 10.2842 (45.0819) | 0.0250 | 751.9990 |
| Utility Revenue | $ ×10^6 | 7.3150 (29.6843) | 0 | 344.9110 |
| Median Household Income | $ ×10^4 | 4.2092 (1.8504) | 1.1750 | 15.7297 |
| Related County Tax Rate | Per $100 Valuation | 0.6380 (0.1518) | 0.1162 | 0.9900 |
| No. of Related County | | 1.1008 (0.3326) | 1 | 4 |
| Distance | Kilometers | 216.7976 (125.4035) | 1.2133 | 767.1423 |

No. of Observations: 506

Table B.9: Tobit Model for Property Tax Competition

Regression for City Property Tax Rates

| Regression | | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|---|
| Constant | β0 | 0.3436*** (0.0511) | 0.2232*** (0.0488) | 0.1534*** (0.0535) | 0.1610*** (0.0442) | 0.0852** (0.0364) |
| Population | β1 | 0.0005*** (0.0001) | 0.0006*** (0.0001) | 0.0004*** (0.0001) | 0.0007*** (0.0001) | 0.0004*** (0.0002) |
| Related County Tax Rate | β3 | 0.2822*** (0.0648) | 0.1574** (0.0647) | 0.1454** (0.0658) | 0.1652*** (0.0641) | 0.1387** (0.0585) |
| Median Household Income | β4 | -0.0393*** (0.0043) | -0.0334*** (0.0046) | -0.0327*** (0.0044) | -0.0311*** (0.0044) | -0.0301*** (0.0041) |
| Interaction Intensity | λ | | 0.4754*** (0.1401) | 0.6843*** (0.2023) | 0.6020*** (0.1311) | 0.8517*** (0.1493) |
| Shock Variance | σ | 0.1848*** (0.0065) | 0.1833*** (0.0064) | 0.1835*** (0.0064) | 0.1849*** (0.0066) | 0.1839*** (0.0065) |
| Estimated log Likelihood | | 89.6458 | 93.5224 | 92.7961 | 93.7981 | 96.5846 |
| No. of Observations | | 506 | 506 | 506 | 506 | 506 |
| Number of “Neighbors” | | | 10.8656 (4.3634) | 10.8656 (4.3634) | 28.3636 (9.0854) | 28.3636 (9.0854) |
| Cutoff Distance (kilometers) | | | 30 | 30 | 50 | 50 |

Distribution of Median Household Income (Variables: Estimates): µ: 3.3971; ω: 0.7742; η: 2.0059; ρ: 0.1490.

Note: Regression (1) is ordinary Tobit regression without social interactions. Regressions (2) and (4) correspond to the Tobit model when all characteristics are public information. Municipal median household income is assumed to be self-known for municipalities in Regressions (3) and (5).
Two municipalities are viewed as close “neighbors” if the distance between them is less than 30 kilometers for Regressions (2) and (3), or less than 50 kilometers for Regressions (4) and (5). Numbers in parentheses are standard deviations. Estimates that are significant at the 10%, 5%, and 1% levels are marked by “*”, “**”, and “***”, respectively.

Appendix C: Appendix to Chapter 4

C.1 Expectations, Equilibria, and Functions

In this appendix, we embed conditional expectation functions into a function space. For a group with size $n$, social relations $W_n$ and information structure $J$, we define a function space, $\Xi(W_n, J)$, such that each $\xi \in \Xi(W_n, J)$ is a mapping from $\prod_{i=1}^n \prod_{m=1}^{M_i} \mathcal{X}^p_{i,m}$ to $\mathbb{R}^M$, satisfying

1. For all $i = 1, \cdots, n$, $m = 1, \cdots, M_i$, and $x^p_{i,m} \in \mathcal{X}^p_{i,m}$,
\[
\xi(x^p_{1,1}, \cdots, x^p_{1,M_1}, \cdots, x^p_{n,1}, \cdots, x^p_{n,M_n})_{i,m} = \xi_{i,m}(x^p_{i,m}). \tag{C.1.1}
\]

2.
\[
\max_{1\le i\le n}\ \max_{1\le m\le M_i} \int |\xi_{i,m}(x^p_{i,m})|\,d\mu_p < \infty, \tag{C.1.2}
\]
where $\mu_p$ represents the Lebesgue-Stieltjes (L-S) measure implied by the conditional distribution of the $X_i^p$'s given public information $Z = z$.

Define summation and scalar product in $\Xi(W_n, J)$ in the conventional way. According to (C.1.2), define the norm on $\Xi(W_n, J)$ as
\[
\|\xi\| = \max_{1\le i\le n}\ \max_{1\le m\le M_i} \int |\xi_{i,m}(x^p_{i,m})|\,d\mu_p. \tag{C.1.3}
\]

Lemma C.1.1 The norm, $\|\cdot\|$, defined in (C.1.3) is well-defined.

Proof. It is obvious that $\|\xi\| \ge 0$ for any $\xi \in (\Xi(W_n, J), \|\cdot\|)$ and $\|\xi\| = 0$ if and only if $\xi = 0$ a.e. according to $\mu_p$. For any real scalar $\alpha$ and $\xi \in (\Xi(W_n, J), \|\cdot\|)$,
\[
\|\alpha\xi\| = \max_{1\le i\le n}\ \max_{1\le m\le M_i} \int |\alpha\,\xi_{i,m}(x^p_{i,m})|\,d\mu_p = |\alpha| \max_{1\le i\le n}\ \max_{1\le m\le M_i} \int |\xi_{i,m}(x^p_{i,m})|\,d\mu_p = |\alpha|\,\|\xi\|.
\]
For any two elements, $\xi, \xi' \in (\Xi(W_n, J), \|\cdot\|)$,
\[
\begin{aligned}
\|\xi + \xi'\| &= \max_{1\le i\le n}\ \max_{1\le m\le M_i} \int |\xi_{i,m}(x^p_{i,m}) + \xi'_{i,m}(x^p_{i,m})|\,d\mu_p \\
&\le \max_{1\le i\le n}\ \max_{1\le m\le M_i} \int |\xi_{i,m}(x^p_{i,m})|\,d\mu_p + \max_{1\le i\le n}\ \max_{1\le m\le M_i} \int |\xi'_{i,m}(x^p_{i,m})|\,d\mu_p \\
&= \|\xi\| + \|\xi'\|.
\end{aligned}
\]
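To make the norm in (C.1.3) concrete, here is a small sketch for the special case in which each private characteristic has a finite support, so that every integral against $\mu_p$ reduces to a probability-weighted sum. The supports, the weights, and the two functions `xi_a` and `xi_b` are hypothetical and serve only to check the homogeneity and triangle-inequality steps of Lemma C.1.1 on an example.

```python
def norm(xi, weights):
    """Norm (C.1.3) on a finite support: max over (i, m) of sum_x |xi_{i,m}(x)| * mu_p(x)."""
    return max(
        sum(abs(value) * w for value, w in zip(xi[key], weights[key]))
        for key in xi
    )


def scale(alpha, xi):
    """Scalar product alpha * xi, taken coordinate by coordinate."""
    return {key: [alpha * v for v in values] for key, values in xi.items()}


def add(xi, other):
    """Sum xi + other, taken coordinate by coordinate."""
    return {key: [a + b for a, b in zip(xi[key], other[key])] for key in xi}


# Hypothetical probability weights of mu_p on the support of each coordinate (i, m).
weights = {(1, 1): [0.5, 0.5], (2, 1): [0.2, 0.3, 0.5]}
# Two hypothetical elements of the function space, stored by their values on the support.
xi_a = {(1, 1): [1.0, -3.0], (2, 1): [2.0, 0.0, -1.0]}
xi_b = {(1, 1): [-1.0, 1.0], (2, 1): [4.0, 4.0, 0.0]}

norm_a = norm(xi_a, weights)
norm_b = norm(xi_b, weights)
norm_sum = norm(add(xi_a, xi_b), weights)
norm_scaled = norm(scale(-2.5, xi_a), weights)

print(norm_a, norm_b, norm_sum, norm_scaled)
```

In this example the triangle inequality is strict, because the coordinate that attains the maximum differs between the two functions.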
$L^1(\mathcal{X}^p_{i,m}, \mathcal{B}_{i,m}, \mu_p; \mathbb{R}^1)$ is the space of all real-valued functions on $\mathcal{X}^p_{i,m}$ which are measurable under $\mu_p$ such that $\|\chi\|_1 := \int_{\mathcal{X}^p_{i,m}} |\chi|\,d\mu_p < \infty$ for all $\chi \in L^1(\mathcal{X}^p_{i,m}, \mathcal{B}_{i,m}, \mu_p; \mathbb{R}^1)$. This space belongs to the class of Lebesgue spaces. According to Dunford and Schwartz (1958), it is a Banach space. Since $\xi$ is a vector-valued function composed of a finite number of coordinate functions, with the norm defined as the maximum of the integrals of the absolute values of its coordinate functions, $\xi \in (\Xi(W_n, J), \|\cdot\|)$ if and only if each of its coordinates satisfies $\xi_{i,m} \in L^1(\mathcal{X}^p_{i,m}, \mathcal{B}_{i,m}, \mu_p; \mathbb{R}^1)$. Based on this finding, we derive the following result.

Proposition C.1.1 $(\Xi(W_n, J), \|\cdot\|)$ is complete. So it is a Banach space.

Proof. Take any Cauchy sequence, $\xi^k$, in $(\Xi(W_n, J), \|\cdot\|)$. That is,
\[
\|\xi^k - \xi^l\| = \max_{1\le i\le n}\ \max_{1\le m\le M_i} \int |\xi^k_{i,m}(x^p_{i,m}) - \xi^l_{i,m}(x^p_{i,m})|\,d\mu_p \to 0, \quad \text{as } k, l \to \infty.
\]
Then $\int |\xi^k_{i,m}(x^p_{i,m}) - \xi^l_{i,m}(x^p_{i,m})|\,d\mu_p \to 0$ as $k, l \to \infty$ uniformly for all $i = 1, \cdots, n$ and $m = 1, \cdots, M_i$ in the $L^1$ norm. For any $\eta > 0$ and each $(i, m)$, the completeness of $L^1(\mathcal{X}^p_{i,m}, \mathcal{B}_{i,m}, \mu_p; \mathbb{R}^1)$ implies that there is $\xi_{i,m} \in L^1(\mathcal{X}^p_{i,m}, \mathcal{B}_{i,m}, \mu_p; \mathbb{R}^1)$ and $K_{i,m}(\eta) > 0$ such that, whenever $k > K_{i,m}(\eta)$, $\int_{\mathcal{X}^p_{i,m}} |\xi^k_{i,m}(x^p_{i,m}) - \xi_{i,m}(x^p_{i,m})|\,d\mu_p < \eta$. Since the total number of coordinate functions is finite, define $\xi = (\xi_{i,m})_{1\le i\le n,\,1\le m\le M_i}$. For $K = \max_{i,m} K_{i,m}(\eta)$, whenever $k > K$, we have that
\[
\|\xi^k - \xi\| = \max_{1\le i\le n}\ \max_{1\le m\le M_i} \int_{\mathcal{X}^p_{i,m}} |\xi^k_{i,m}(x^p_{i,m}) - \xi_{i,m}(x^p_{i,m})|\,d\mu_p < \eta.
\]
Moreover, since $\xi_{i,m} \in L^1(\mathcal{X}^p_{i,m}, \mathcal{B}_{i,m}, \mu_p; \mathbb{R}^1)$ for all $(i, m)$, $\max_{1\le i\le n} \max_{1\le m\le M_i} \int_{\mathcal{X}^p_{i,m}} |\xi_{i,m}|\,d\mu_p < \infty$. That is, $\xi \in (\Xi(W_n, J), \|\cdot\|)$.

Proposition 4.3.1 claims the equivalence between the BNEs and the equilibrium conditional expectations. This proposition is proved below.

Proof of Proposition 4.3.1.
If $s^e$ is a BNE, define $\xi^e$ by
\[
\xi^e(x^p_{1,1}, \cdots, x^p_{1,M_1}, \cdots, x^p_{n,1}, \cdots, x^p_{n,M_n})_{i,m} = \xi^e_{i,m}(x^p_{i,m}) = E[s^e_i(X^p_{J_i}, \epsilon_i) \mid X^p_{\tilde J_{i,m}} = x^p_{i,m}, z],
\]
for any $i$, $m$, and $x^p_{i,m} \in \mathcal{X}^p_{i,m}$. It follows from (4.3.1) that
\[
\begin{aligned}
\xi^e_{i,m}(x^p_{i,m}) &= E\Big[h_i\Big(u(X_i) + \lambda\sum_{j\neq i} W_{n,ij}\,E[s^e_j(X^p_{J_j}, \epsilon_j) \mid X^p_{J_i}, z, \epsilon_i] - \epsilon_i\Big) \,\Big|\, X^p_{\tilde J_{i,m}} = x^p_{i,m}, z\Big] \\
&= E\Big[h_i\Big(u(X_i) + \lambda\sum_{j\neq i} W_{n,ij}\,E[s^e_j(X^p_{J_j}, \epsilon_j) \mid X^p_{J_i}, z] - \epsilon_i\Big) \,\Big|\, X^p_{\tilde J_{i,m}} = x^p_{i,m}, z\Big] \\
&= E\Big[h_i\Big(u(X_i) + \lambda\sum_{j\neq i} W_{n,ij}\,\xi^e_{j,m_j(i)}(X^p_{j,m_j(i)}) - \epsilon_i\Big) \,\Big|\, X^p_{\tilde J_{i,m}} = x^p_{i,m}, z\Big].
\end{aligned}
\]
In the above equation, the second equality comes from the independence among the $\epsilon_i$'s, and the independence between the idiosyncratic shocks and exogenous covariates and social relations. The third equality follows directly from the way in which $\xi^e$ is defined. Note that $X^p_{j,m_j(i)} = X^p_{J_i}$ for any $j$ with $W_{n,ij} \neq 0$.

Conversely, assume that $\xi^e$ is an equilibrium conditional expectation function, i.e., (4.3.6) and (4.3.7) are satisfied. Define $s^e$ by $s^e_i(X^p_{J_i}, \epsilon_i) = h_i\big(u(X_i) + \lambda\sum_{j\neq i} W_{n,ij}\,\xi^e_{j,m_j(i)}(X^p_{j,m_j(i)}) - \epsilon_i\big)$. Then we have that
\[
\begin{aligned}
E[s^e_j(X^p_{J_j}, \epsilon_j) \mid X^p_{J_i}, z, \epsilon_i] &= E\Big[h_j\Big(u(X_j) + \lambda\sum_{k\neq j} W_{n,jk}\,\xi^e_{k,m_k(j)}(X^p_{k,m_k(j)}) - \epsilon_j\Big) \,\Big|\, X^p_{J_i}, z, \epsilon_i\Big] \\
&= E\Big[h_j\Big(u(X_j) + \lambda\sum_{k\neq j} W_{n,jk}\,\xi^e_{k,m_k(j)}(X^p_{k,m_k(j)}) - \epsilon_j\Big) \,\Big|\, X^p_{J_i}, z\Big] \\
&= \xi^e_{j,m_j(i)}(X^p_{j,m_j(i)}),
\end{aligned}
\]
where $X^p_{j,m_j(i)}$ corresponds to $X^p_{J_i}$. The second equality follows from the independence among all $\epsilon_i$'s and the independence between those shocks and exogenous characteristics. The third equality is derived by applying (4.3.7). Therefore, by the above definition of $s^e$,
\[
s^e_i(X^p_{J_i}, \epsilon_i) = h_i\Big(u(X_i) + \lambda\sum_{j\neq i} W_{n,ij}\,E[s^e_j(X^p_{J_j}, \epsilon_j) \mid X^p_{J_i}, z, \epsilon_i] - \epsilon_i\Big).
\]

C.2 Proofs for Equilibrium Characterizations with Public Characteristics

In this section, we discuss in detail the structure of the set of equilibria when all exogenous covariates are public information. We first prove that there is no loss of generality by focusing on regular groups.
We impose Assumption C.2.1 in order to apply theorems in differential topology.

Assumption C.2.1 The functions, $u(\cdot)$ and $H_i(\cdot)$, for $i = 1, \cdots, n$, are smooth. That is, they have continuous partial derivatives of all orders.

In most models used in empirical and theoretical studies, $u(\cdot)$ is a linear function of exogenous covariates. Although $h_i(\cdot)$ can be discrete, $H_i(\cdot)$ defined in Assumption 4.3.2 is usually smooth, as our examples show. In the subsequent discussions, we focus on the following case.

Definition C.2.1 For a group $(X, W_n)$, an equilibrium $\xi^e$ is regular if the derivative of $S(\cdot; X, W_n)$ at $\xi^e$, $DS(\xi^e; X, W_n)$, is non-singular. A group $(X, W_n)$ is regular if each of its equilibria is regular. That is, $DS(\xi^e; X, W_n)$ is non-singular for any $\xi^e$ such that $S(\xi^e; X, W_n) = 0$.

When a group is regular, applying the Inverse Function Theorem, in a neighborhood of an equilibrium, $\xi^e$, there are no other equilibria. That is to say, all equilibria are isolated from each other. That property is important for the following discussion. We show in the proposition below that there is no loss of generality by focusing on regular groups. Denote the support of $X$ by $\mathcal{X}$. For any social matrix $W_n$, define the function $\tilde S : \mathbb{R}^n \times \mathcal{X} \to \mathbb{R}^n$ by $\tilde S(\xi, X; W_n) = S(\xi; X, W_n)$. That is, given $W_n$, $S(\cdot; X, W_n)$ can be viewed as a family of smooth maps, indexed by the exogenous covariates, $X$. By Assumption C.2.1, $\tilde S$ is smooth. Proposition C.2.1 follows from the transversality theorem in differential topology.

Proposition C.2.1 Given social matrix $W_n$, if $D\tilde S(\xi, X; W_n)$ has full row rank for all $(\xi, X)$ with $\tilde S(\xi^e; X, W_n) = 0$, then for almost every $X \in \mathcal{X}$, $DS(\xi^e; X, W_n)$ is non-singular. That is, almost all groups are regular.

Proof. The result follows from the transversality theorem in the textbook by Guillemin and Pollack (1974). If $D\tilde S(\xi^e, X; W_n)$ has full row rank, 0 is a regular value for $\tilde S(\cdot, X; W_n)$. Since $\{0\}$ is a singleton in $\mathbb{R}^n$, $\tilde S(\cdot, X; W_n)$ is transversal to $\{0\}$. From the transversality theorem, $S(\cdot; X, W_n)$ is transversal to $\{0\}$ for almost every $X$.

The following corollary shows that we can apply Proposition C.2.1 for a large class of models.

Corollary C.2.1 Suppose that $u(\cdot)$ is linear in exogenous covariates, i.e., $u(X_i) = \beta_{0,0} + X^{g\prime}\beta_{0,1} + X_i^{c\prime}\beta_1$. If $\beta_1 \neq 0$ and $dH_i(a)/da \neq 0$ for any $i$ and $a \in \mathbb{R}$, $D\tilde S(\xi, X; W_n)$ has full row rank for all $W_n$. Then almost all groups are regular.

Proof of Corollary C.2.1. Without loss of generality, suppose that $\beta_{1,1} \neq 0$. We have that
\[
D\tilde S(\xi, X; W_n) = \big(\lambda\,DH\,W_n - I_n,\ \ \beta_{1,1}\,DH,\ \ *\big),
\]
where $DH = \mathrm{diag}(dH_1(\tilde x_1)/dx, \cdots, dH_n(\tilde x_n)/dx)$ is a diagonal matrix whose diagonal is composed of the derivatives of the $H_i$'s evaluated at $\tilde x_i = u(X_i) + \lambda\sum_{j\neq i} W_{n,ij}\,\xi_j$ for all $i$. By the assumption, none of $dH_i(\tilde x_i)/dx$ is zero. Therefore, the rows of $D\tilde S(\xi, X; W_n)$ are linearly independent. So it has full row rank.

Although the function $S(\cdot; X, W_n)$ is defined on the whole space, $\mathbb{R}^n$, we usually begin searching for an equilibrium in a region. According to Guillemin and Pollack (1974), some properties of the set of solutions for $S(\xi; X, W_n) = 0$ inside a region can be derived by analyzing the properties of a function on the boundary of that region. Denote the closed ball in $\mathbb{R}^n$ with radius $r > 0$ centered at the origin by $B[0, r] = \{x \in \mathbb{R}^n : \|x\|_E \le r\}$, where $\|\cdot\|_E$ is the Euclidean norm. Its interior is the open ball, $B(0, r) = \{x \in \mathbb{R}^n : \|x\|_E < r\}$. Its boundary, $\partial B[0, r] = \{x \in \mathbb{R}^n : \|x\|_E = r\}$, is a sphere centered at the origin with radius $r > 0$. In particular, we call this sphere a unit sphere if its radius is 1. It is standard to denote a unit sphere in $\mathbb{R}^n$ as $S^{n-1}$.
If $S(\xi; X, W_n) \neq 0$ for any $\xi \in \partial B[0, r]$, we can define a function, $\hat S(\cdot; X, W_n)$, on $\partial B[0, r]$, as
\[
\hat S(\xi; X, W_n)_i = \frac{S(\xi; X, W_n)_i}{\|S(\xi; X, W_n)\|_E} = \frac{H_i\big(u(X_i) + \lambda\sum_{j\neq i} W_{n,ij}\,\xi_j\big) - \xi_i}{\sqrt{\sum_{i=1}^n \big(H_i(u(X_i) + \lambda\sum_{j\neq i} W_{n,ij}\,\xi_j) - \xi_i\big)^2}}, \tag{C.2.1}
\]
for $i = 1, \cdots, n$. We can see that $\hat S(\cdot; X, W_n)$ maps points on $\partial B[0, r]$ to a point in $S^{n-1} \subseteq \mathbb{R}^n$. We now associate $\hat S(\cdot; X, W_n)$ to a class of functions via homotopy. We analyze properties of this function through deformation.

Definition C.2.2 $C$ and $D$ are smooth manifolds in Euclidean spaces (see footnote 58). Smooth maps, $R_0 : C \to D$ and $R_1 : C \to D$, are homotopic if there exists a smooth map, $\tilde R : C \times [0, 1] \to D$, such that $\tilde R(c, 0) = R_0(c)$ and $\tilde R(c, 1) = R_1(c)$, for all $c \in C$. $\tilde R$ is called a homotopy between $R_0$ and $R_1$.

For any given $t \in [0, 1]$, if $tH_i(u(X_i) + \lambda\sum_{j\neq i} W_{n,ij}\,\xi_j) - \xi_i \neq 0$ for all $i = 1, \cdots, n$ and $\xi \in \partial B[0, r]$, by setting
\[
R_t(\xi; X, W_n)_i = \frac{tH_i\big(u(X_i) + \lambda\sum_{j\neq i} W_{n,ij}\,\xi_j\big) - \xi_i}{\sqrt{\sum_{i=1}^n \big(tH_i(u(X_i) + \lambda\sum_{j\neq i} W_{n,ij}\,\xi_j) - \xi_i\big)^2}}, \tag{C.2.2}
\]
we can derive a mapping from $\partial B[0, r]$ to $S^{n-1}$. If that is possible for all $t \in [0, 1]$, we get a homotopy, $\tilde R(\cdot, \cdot; X, W_n) : \partial B[0, r] \times [0, 1] \to S^{n-1}$, such that $\tilde R(\xi, t; X, W_n) = R_t(\xi; X, W_n)$. In that case, $\hat S(\cdot; X, W_n)$ is homotopic to the function $R_0(\cdot; X, W_n) : \partial B[0, r] \to S^{n-1}$:
\[
R_0(\xi; X, W_n)_i = \frac{-\xi_i}{\sqrt{\sum_{i=1}^n (-\xi_i)^2}} = \frac{-\xi_i}{r}, \tag{C.2.3}
\]
for all $i = 1, \cdots, n$. The simplicity of that function makes it convenient to derive properties which are invariant to smooth changes in a homotopy and can be applied to $\hat S(\cdot; X, W_n)$. To make sure that this homotopy is well-defined, we impose Assumption 4.4.1 about the function, $T$, as in (4.4.4).
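The normalised homotopy in (C.2.2) can be sketched numerically for a two-agent example. The code below assumes the response function $H(\xi)_i = \tanh(\beta(h + J\xi_j))$, which is the form in (C.2.4); the parameter values `BETA`, `H_FIELD`, `J` and the radius `R` are hypothetical. Because $\|tT(\xi)\|_E \le \sqrt{2} < 3$ for this bounded $T$, the vector $tT(\xi) - \xi$ never vanishes on the circle of radius 3, so every $R_t$ is well-defined there.

```python
import math

# Hypothetical illustrative values; R is the radius of the boundary circle.
BETA, H_FIELD, J, R = 1.0, 0.1, 2.0, 3.0


def T(xi):
    """Two-agent response map: T(xi)_i = tanh(BETA * (H_FIELD + J * xi_j)), j != i."""
    return (
        math.tanh(BETA * (H_FIELD + J * xi[1])),
        math.tanh(BETA * (H_FIELD + J * xi[0])),
    )


def R_t(xi, t):
    """Normalised homotopy as in (C.2.2): (t*T(xi) - xi) divided by its Euclidean length."""
    tx = T(xi)
    v = (t * tx[0] - xi[0], t * tx[1] - xi[1])
    length = math.hypot(v[0], v[1])
    return (v[0] / length, v[1] / length)


# Twelve equally spaced points on the boundary circle of radius R.
boundary = [
    (R * math.cos(2 * math.pi * k / 12), R * math.sin(2 * math.pi * k / 12))
    for k in range(12)
]
t_values = [0.0, 0.25, 0.5, 0.75, 1.0]

# Image of each boundary point along the homotopy, from R_0 (t = 0) to S-hat (t = 1).
paths = {xi: [R_t(xi, t) for t in t_values] for xi in boundary}

for xi, path in paths.items():
    print(tuple(round(c, 3) for c in xi), "->", [tuple(round(c, 3) for c in p) for p in path])
```

At $t = 0$ every boundary point is sent to its antipode on the unit circle, as in (C.2.3), and at $t = 1$ the map coincides with $\hat S$ in (C.2.1).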
Figure C.14 is a graphic illustration of the homotopy constructed for the $H(\cdot)$ function:
\[
H(\xi)_i = \frac{\exp(\beta(h + J\xi_j)) - \exp(-\beta(h + J\xi_j))}{\exp(\beta(h + J\xi_j)) + \exp(-\beta(h + J\xi_j))}, \tag{C.2.4}
\]
for $i, j = 1, 2$ and $i \neq j$, which corresponds to the binary choice model analyzed by Brock and Durlauf (2007) without imposing rational expectations, $\xi_1 = \xi_2$. We trace out the images of $R_t(\xi; X, W_n)$ for a point in the circle centered at the origin with a radius of 3, when $t$ takes various values, and depict them with the same color and marks. We do that for four points, $(r, 0)$, $(0, r)$, $(-r, 0)$, and $(-r, -r)$. For each of those points, their images in such a homotopy form an interval in the unit circle in a smooth way. We also depict the image of the original function, $S(\cdot; X, W_n)$, at $a(1, 0)$, $a(0, 1)$, $a(-1, 0)$, and $a(-1, -1)$, when $a$ runs from 0 to 1.

Footnote 58: Intuitively speaking, a manifold is a subset of a Euclidean space which looks locally like an open subset of a Euclidean space. Any open subset of a Euclidean space is, of course, a manifold. See Guillemin and Pollack (1974) for a rigorous definition.

Utilizing the homotopy (C.2.2), we derive some properties of the set of equilibria, which are summarized in Proposition 4.4.1. It is proved in detail below.

Proof of Proposition 4.4.1. The proof is composed of two parts. We first show the existence of an equilibrium in the interior, $B(0, r)$, and then prove finiteness.

(1) Under Assumption 4.4.1, when the positive number $r$ is sufficiently large, for any $\xi$ with $\|\xi\|_E = r$, we have that
\[
\|S(\xi; X, W_n)\|_E = \|T(\xi; X, W_n) - \xi\|_E \ge \big|\,\|T(\xi; X, W_n)\|_E - \|\xi\|_E\,\big| > 0.
\]
Thus, $S(\xi; X, W_n) \neq 0$ on $\partial B[0, r]$ and $\hat S(\cdot; X, W_n)$ in (C.2.1) is well-defined on $\partial B[0, r]$. Similarly, for any $t \in [0, 1]$, $\|tT(\xi; X, W_n) - \xi\|_E \ge \big|\,\|tT(\xi; X, W_n)\|_E - \|\xi\|_E\,\big| > 0$. Thus, $tH_i(u(X_i) + \lambda\sum_{j\neq i} W_{n,ij}\,\xi_j) - \xi_i \neq 0$, for all $t \in [0, 1]$, $i = 1, \cdots, n$, and all $\xi \in \partial B[0, r]$.
Therefore, we can define a homotopy $\tilde{R}(\cdot, \cdot; X, W_n) : \partial B[0, r] \times [0, 1] \to S^{n-1}$ such that $\tilde{R}(\xi, t; X, W_n) = R_t(\xi; X, W_n)$, which is defined in (C.2.2). $R_0(\cdot; X, W_n)$ is a linear transformation from $\partial B[0, r]$ to $S^{n-1}$ with degree $(-1)^n \neq 0$. Therefore, the degree of $\hat{S}(\cdot; X, W_n) : \partial B[0, r] \to S^{n-1}$ is equal to $(-1)^n \neq 0$.

(2) If there is no solution to $S(\xi; X, W_n) = 0$ in the interior, $B(0, r)$, then $\hat{S}(\cdot; X, W_n)$ can be extended to the whole closed ball, $B[0, r]$. According to Guillemin and Pollack (1974), the degree of $\hat{S}(\cdot; X, W_n)$ on $\partial B[0, r]$ is then equal to zero. That is a contradiction. Therefore, there must be at least one point $\xi^e \in \mathrm{Int}\,B[0, r]$ such that $S(\xi^e; X, W_n) = 0$.

(3) For any group $(X, W_n)$, the set of equilibria, $E(X, W_n)$, is the zero set of the continuous function $S(\cdot; X, W_n)$. Therefore, it is closed. As a closed subset of the closed ball $B[0, r]$, $E(X, W_n)$ is compact. Given regularity, for any equilibrium $\xi^e \in B[0, r]$, $\det(DS(\xi^e; X, W_n)) \neq 0$. By the Inverse Function Theorem, $S(\cdot; X, W_n)$ is a local diffeomorphism around $\xi^e$. Thus, there is an open neighborhood $O(\xi^e) \subseteq B[0, r]$ of $\xi^e$ such that $O(\xi^e) \cap E(X, W_n) = \{\xi^e\}$. Then the relatively open sets $\{O(\xi^e) \cap E(X, W_n)\}$ form an open cover of $E(X, W_n)$. By compactness, there is a finite subcover. That is, there is an integer $K$ such that $E(X, W_n) \subseteq \cup_{k=1}^{K} O(\xi_k^e) \cap E(X, W_n)$. Because each $O(\xi_k^e) \cap E(X, W_n) = \{\xi_k^e\}$ is a singleton, $E(X, W_n)$ contains just a finite number of points, $\xi_1^e, \cdots, \xi_K^e$.

Proof of Proposition 4.4.2. Pick $B[0, r]$ according to Proposition 4.4.1 and define a function $\hat{R}(\cdot, \cdot; X, W_n) : B(0, r) \times [0, 1] \to \mathbb{R}^n$ such that for all $t \in [0, 1]$, $i = 1, \cdots, n$, and $\xi \in B(0, r)$,
$$\hat{R}(\xi, t; X, W_n)_i = tH_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\xi_j\Big) - \xi_i.$$
We do not lose any zeros of $S(\cdot; X, W_n)$, for there are no zeros of this homotopy on the boundary $\partial B[0, r]$. We can see that $\hat{R}_1(\xi; X, W_n) = \hat{R}(\xi, 1; X, W_n)$ is the restriction of $S(\cdot; X, W_n)$ to $B(0, r)$, while $\hat{R}_0(\xi; X, W_n) = \hat{R}(\xi, 0; X, W_n)$ is just a linear transformation with $\hat{R}_0(\xi; X, W_n)_i = -\xi_i$ for all $i = 1, \cdots, n$. The restriction $S(\cdot; X, W_n) : B(0, r) \to \mathbb{R}^n$ is therefore homotopic to $\hat{R}_0(\cdot; X, W_n)$.

By Sard's theorem, pick a point $b \in \mathbb{R}^n$ such that $b$ is a regular value of $\hat{R}$. Since $\hat{R}^{-1}(b)$ is a closed set in $B(0, r) \times [0, 1] \subseteq B[0, r] \times [0, 1]$, it is compact. Due to transversality, $\hat{R}^{-1}(b)$ is a one-dimensional compact submanifold. Therefore, the sum of the orientation numbers at points in $\partial \hat{R}^{-1}(b)$ is zero. Since the boundary is $\partial(B(0, r) \times [0, 1]) = (B(0, r) \times \{0\}) \cup (B(0, r) \times \{1\})$, $\partial\hat{R}$ is equal to $\hat{R}_0$ on $B(0, r) \times \{0\}$ and to $\hat{R}_1$ on $B(0, r) \times \{1\}$. Therefore, the intersection numbers of those three functions at $\{b\}$ satisfy the following relation:
$$I(\partial\hat{R}, \{b\}) = I(\hat{R}_1, \{b\}) - I(\hat{R}_0, \{b\}).$$
Since the left-hand side is zero, the two homotopic maps, $\hat{R}_0(\cdot; X, W_n)$ and $\hat{R}_1(\cdot; X, W_n)$, have the same intersection numbers at $\{b\}$. According to Guillemin and Pollack (1974), since $\mathbb{R}^n$ is connected and has the same dimension as $B(0, r) \subseteq \mathbb{R}^n$, the intersection number does not depend on the point picked and is defined as the degree of a function. Therefore, choosing the point $\{0\}$, we get that
$$\deg(\hat{R}_1) = I(\hat{R}_1, \{0\}) = I(\hat{R}_0, \{0\}) = \deg(\hat{R}_0).$$
Since $\hat{R}_0^{-1}(\{0\}) = \{0\}$ and $\det(D\hat{R}_0) = (-1)^n$, the degree of $\hat{R}_0$ is also $(-1)^n$, which is equal to $+1$ when $n$ is even and to $-1$ when $n$ is odd. Since 0 is a regular value for $\hat{R}_1$, which is the restriction of $S(\cdot; X, W_n)$ to $B(0, r)$, $\deg(\hat{R}_1) = I(\hat{R}_1, \{0\})$ is actually the sum of the orientation numbers of the points in the zero set of $S(\cdot; X, W_n)$, which are the model equilibrium expectations. Because the orientation numbers of those points are by definition either $+1$ or $-1$, if their sum is either $+1$ or $-1$, there must be an odd number of such points. In that case, the equilibria with orientation number $+1$ ($-1$) outnumber the equilibria with orientation number $-1$ ($+1$) by exactly 1.
In addition, if the sign of $\det(DS(\cdot; X, W_n))$ does not change in $B(0, r)$, all the equilibria have the same orientation number, either $+1$ or $-1$. If there were more than one equilibrium, the absolute value of their sum would be bigger than 1, which contradicts $\deg(\hat{R}_1(\cdot; X, W_n)) = (-1)^n$. Therefore, in that case, there is a unique equilibrium.

Proof of Lemma 4.4.1. By calculation,
$$DS(\xi; X, W_n) = \lambda\Delta_H W - I_n = \lambda \begin{pmatrix} \frac{dH_1(a_1)}{da} & 0 & \cdots & 0 \\ 0 & \frac{dH_2(a_2)}{da} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{dH_n(a_n)}{da} \end{pmatrix} W - I_n,$$
where $a_i = u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\xi_j$ for $i = 1, \cdots, n$, and $I_n$ is the $n$-dimensional identity matrix. $DS(\xi; X, W_n)$ is an $n \times n$ matrix with all diagonal elements equal to $-1$, i.e., $DS(\xi; X, W_n)_{i,i} = -1$. All of its off-diagonal elements are equal to their counterparts in $\lambda\Delta_H W$, i.e., $DS(\xi; X, W_n)_{i,j} = (\lambda\Delta_H W)_{i,j}$. By the Gershgorin circle theorem, every eigenvalue of $DS(\xi; X, W_n)$ lies within one of the discs $B[-1, |\lambda||dH_i(a_i)/da| \sum_{j \neq i} W_{i,j}]$, for $i = 1, \cdots, n$.[59] Those discs are all centered at $-1$. By (4.3.8), all of their radii are strictly less than 1. Therefore, every real eigenvalue of $DS(\xi; X, W_n)$ is strictly negative. Since $DS(\xi; X, W_n)$ is a real matrix, if a complex number $\tau$ is one of its eigenvalues, so is its conjugate $\bar{\tau}$, and their product is $\tau\bar{\tau} > 0$. Let $2k$ denote the number of complex eigenvalues. The sign of the product of all eigenvalues, which is equal to the sign of the determinant, is equal to $(-1)^{n-2k} = (-1)^n$. Therefore, $\mathrm{sgn}(\det(DS(\xi; X, W_n))) = (-1)^n$ for all $\xi \in \mathbb{R}^n$.

[59] Let $A = (a_{ij})$ be an $n \times n$ matrix. Denote by $R_i = \sum_{j \neq i} |a_{i,j}|$ the sum of absolute values of the off-diagonal elements in the $i$-th row. Denote the closed disc centered at $a_{ii}$ with radius $R_i$ by $B[a_{ii}, R_i]$. By the Gershgorin circle theorem, every eigenvalue of $A$ lies within one of those closed discs, $B[a_{ii}, R_i]$, for $i = 1, \cdots, n$. A brief explanation and proof of this theorem can be found at http://en.wikipedia.org.
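The sign argument in the proof of Lemma 4.4.1 can be checked on a small example. The sketch below is illustrative only: the weight matrix, the value of $\lambda$, and the derivative values `dH` are hypothetical numbers chosen so that every Gershgorin radius is below 1 (the condition the text attributes to (4.3.8)), and the helper names are introduced here, not taken from the dissertation.

```python
def jacobian(lam, dH, W):
    """DS = lam * diag(dH) * W - I, with diagonal entries set to -1 as in the proof."""
    n = len(W)
    return [[-1.0 if i == j else lam * dH[i] * W[i][j] for j in range(n)]
            for i in range(n)]

def gershgorin_radii(A):
    """Row sums of absolute off-diagonal entries (the disc radii R_i)."""
    n = len(A)
    return [sum(abs(A[i][j]) for j in range(n) if j != i) for i in range(n)]

def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in A]
    n = len(M)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda row: abs(M[row][c]))
        if M[p][c] == 0.0:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= M[c][c]
        for row in range(c + 1, n):
            f = M[row][c] / M[c][c]
            for k in range(c, n):
                M[row][k] -= f * M[c][k]
    return d

# Hypothetical three-agent example with a row-normalized weight matrix.
W = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
lam = 0.8
dH = [0.9, 0.5, 0.7]  # assumed values of dH_i(a_i)/da at some xi

DS = jacobian(lam, dH, W)
radii = gershgorin_radii(DS)
det_DS = det(DS)
n = len(W)
sign_matches = (det_DS > 0) == ((-1) ** n > 0)
```

With $n = 3$ and all radii below 1, the determinant is negative, so `sign_matches` is `True`, in line with $\mathrm{sgn}(\det(DS)) = (-1)^n$.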
204 C.3 Proofs for Identification with Public Characteristics Proof of Proposition 4.4.3. With public information on all exogenous covariates, simply denote (X g , X c ) = X. By calculation, E[Y |X] = E[(In − λW n )−1 Xβ|X]. Therefore, if e λ, e σ (β, λ, σ) and (β, e) are observationally equivalent, e n )−1 X β|X] e E[(In − λW n )−1 Xβ − (In − λW = 0. for any X in its support. Multiply both sides by the non-random matrix, (In − λW n )(In − e n ). Notice that (In − λW n )(In − λW e n ) = (In − λW e n )(In − λW n ), we have that λW e n )Xβ − (In − λW n )X β|X] e E[(In − λW = 0. Denote by ln the n × 1 vector of 1’s, 0 i h 0 0 0 0 e 0,0 β − βe λβe − λβ e |X = 0, E ln W n ln X c W n X c β0,0 − βe0,0 λβe0,0 − λβ 1 1 1 1 which is equivalent to 0 i h c c c c E ln W n ln X W n X ln W n ln X W n X |X 0 0 0 0 0 e e e e e e β0,0 − β0,0 λβ0,0 − λβ0,0 β1 − β1 λβ1 − λβ1 = 0. Taking expectations over X, we get that 0 i h E ln W n ln X c W n X c ln W n ln X c W n X c 0 0 0 0 0 e e e e e e β0,0 − β0,0 λβ0,0 − λβ0,0 β1 − β1 λβ1 − λβ1 = 0. h 0 Xc Xc Xc Xc i is Under assumption (4.4.12), E ln W n ln Wn ln W n ln Wn 0 e 0,0 β 0 − βe0 λβe0 − λβ e 0 implies that β = positive definite. Then β0,0 − βe0,0 λβe0,0 − λβ 1 1 1 1 e If W n is row-normalized, W n ln = ln . Then observationally equivalence implies βe and λ = λ. that 0 i h c c c c E ln X W n X ln X W n X 0 0 0 0 0 e 0,0 β − βe λβe − λβ e = 0. β0,0 − βe0,0 + λβe0,0 − λβ 1 1 1 1 205 e If W n is row-normalized, W n ln = ln . If (4.4.120 ) holds, we will also have β = βe and λ = λ. 0 Because E[(Y − E[Y |X])(Y − E[Y |X]) |X] = σ 2 In . We can identify σ through conditional variance of yi ’s given X. Proof of Lemma 4.4.2. Without loss of generality, suppose that β1,1 > 0. For ω−i ∈ {0, 1}n−1 , choose X (ω)c as c X (ω)c = X c ∈ <nL : Xj,1 (2ωj − 1) ≥ 0, j 6= i . c | → ∞ in X c (ω) ⊆ <nL , with the restriction, We can see that for any j 6= i, as |Xj,1 c (2ω − 1) ≥ 0, u(X ) goes to +∞ when ω is 1; and u(X ) goes to −∞ when ω is 0. 
Xj,1 j j j j j c c c |→∞,j6=i,X c ∈X c (ω) P (y−i = ω|X ) = 1. Since all X ’s have full support, Therefore, lim|Xj,1 i P (X c ∈ X c (ω)) > 0. Proof of Proposition 4.4.4. By Lemma 4.4.2, lim c |→∞,j6=i,X c ∈X c (ω ) |Xj,1 0 −i = lim c |→∞,j6=i,X c ∈X c (ω ) |Xj,1 0 −i 0 X c0 X P (β0,0 + Xic β1 + λ W n,ij ψj − i > 0|X c ) j6=i P (β0,0 + Xi β1 + λ W n,ij ω0,j − i > 0|X c ). j6=i Therefore, lim log P (yi , ω0 |X c ) lim yi log F (β0,0 + Xic β1 + λ c |→∞,j6=i,X c ∈X c (ω ) |Xj,1 0 = c |→∞,j6=i,X c ∈X c (ω ) |Xj,1 0 0 X W n,ij ω0,j ) j6=i c0 + (1 − yi ) log(1 − F (β0,0 + Xi β1 + λ X W n,ij ω0,j )). j6=i c | ≥ D, for all i 6= j, Under condition (4.4.13), when X c ∈ X c (ω0 ) and |Xj,1 0 c | ≥ D, i 6= j] E[(∂ log P (yi , ω0 |X c )/∂θ)(∂ log P (yi , ω0 |X c )/∂θ) |X c ∈ X c (ω0 ), |Xj,1 0 0 is positive definite. From Rothenberg(1971), θ = (β , λ) can be identified. Proof of Lemma 4.4.3. Similar to the proof of Lemma4.4.2, choose c X1c = X c ∈ <nL : Xj,1 ≥ 0, 1 ≤ j ≤ n . 0 c k → ∞, X c β c In this set, as kXi,1 E i,1 1,1 → +∞. As u(Xi ) = β0,0 + Xi β1 → +∞, in the limit, none outcomes are censored. 206 Proof of Proposition 4.4.5. c | → ∞ in It follows from Lemma4.4.3 that in X1c , as |Xi,1 this set, no choices are censored. Therefore, for the distribution of observed outcomes we have that lim c |→∞,1≤j≤n,X c ∈X c |Xj,1 1 f (Y |X c ) = lim c |→∞,1≤j≤n,X c ∈X c |Xj,1 1 f (Y ∗ |X c ). That is, in the limit, the observed outcomes are the latent variables which are associated with each other just as continuous choices in linear models. Since E[|yi ||X c ] = Hi (u(Xi ) + P λ j6=i W n,ij ξje ) < ∞ and E[|yi∗ ||X c ] = E[yi |X c ], by the Lebesgue Control convergence theorem, lim c |→∞,1≤j≤n,X c ∈X c |Xj,1 1 E[Y |X c ] = = lim E[Y ∗ |X c ] lim (In − λW n )−1 (β0,0 ln + X c β1 ). 
c |→∞,1≤j≤n,X c ∈X c |Xj,1 1 c |→∞,1≤j≤n,X c ∈X c |Xj,1 1 e λ, e σ Thus, if (β, λ, σ) and (β, e) are observationally equivalent, h lim E ln W n ln X c W n X c c c c |Xj,1 |→∞,1≤j≤n,X ∈X1 0 0 0 0 0 e 0,0 β − βe λβe − λβ e β0,0 − βe0,0 λβe0,0 − λβ 1 1 1 1 i |X c = 0, for all X c ∈ X1c . Similar to the proof of Proposition 4.4.3, we derive that 0 i h c c c c E ln W n ln X W n X lim ln W n ln X W n X c c c |Xj,1 |→∞,1≤j≤n,X ∈X1 e 0,0 β 0 − βe0 λβe0 − λβ e 0 β0,0 − βe0,0 λβe0,0 − λβ 1 1 1 1 0 = 0. Under (4.4.15), there is a positive-measure subset of covariates such that 0 i h c c c c E ln W n ln X W n X ln W n ln X W n X e λ). e Identification of σ follows from Yang, Qu, is positive definite. Therefore, (β, λ) = (β, and Lee (2014). Proof of Proposition 4.4.6. For a group (X, W n ), the sample log likelihood can be written as P log ξe ρ(α, ξ e )f (Y |ξ e ). 207 By calculation, we get that ∂ log L(Y |X c , W n ) ∂ log L(Y |X c , W n ) c |X ] ∂α ∂α0 1 1 =E[( P )4 ( P )2 0 e ; E(X, W ), α)f (Y |ξ e ) e ρ(ξ n ξe ξ e exp(α γ(ξ ; X, W n )) E[ 1 (P exp(α ξe 0 0 γ(ξ e ; X, W n )f (Y |ξ e )) )2 Γ (X c , W n ; β, σ, λ)(D(X, W n ))2 Γ(X c , W n ; β, σ, λ)|X c ]. When there are M equilibria, D(X, W n ) is a M × M diagonal matrix. Its (m, m) element is 0 P 0 e ;X,W )) exp(α γ(ξ e ;X,W n ))f (y|ξ e ) n − Pexp(α γ(ξ . 0 0 e ;X,W ))f (y|ξ e) e e exp(α γ( ξ exp(α γ( ξee ;X,W n )) n ξee ξee We can see that as long as there are multi0 ple equilibria, D(X, W n ) is positive definite. If E[Γ (X c , W n ; β, σ, λ)Γ(X c , W n ; β, σ, λ)|X c ] has full column rank, so does E[ ∂ log L(Y∂α|X c ,W n) ∂ log L(Y |X c ,W n ) |X c ]. ∂α0 From Rothenberg(1971), α can be identified. C.4 Equilibrium with Privately Known Characteristics In this section, we discuss the existence and property of equilibria when some exogenous characteristics are private information. The case in Section 4.5 is one special case. In the Banach space, (Ξ(Wn , J ), k · k). 
define an operator, $T : \Xi(W_n, J) \to \Xi(W_n, J)$, such that for all $i = 1, \cdots, n$ and $m = 1, \cdots, M_i$,
$$T(\xi)\big(x^p_{\tilde{J}_{1,1}}, \cdots, x^p_{\tilde{J}_{1,M_1}}, \cdots, x^p_{\tilde{J}_{n,1}}, \cdots, x^p_{\tilde{J}_{n,M_n}}\big)_{i,m} = E\Big[H_i\Big(u(X_i) + \lambda \sum_{j \neq i} W_{n,ij}\,\xi_{j,J_i}(X^p_{J_i})\Big) \,\Big|\, x^p_{\tilde{J}_{i,m}}, z\Big]. \tag{C.4.1}$$
An equilibrium conditional expectation function, $\xi^e$, corresponds to one of $T$'s fixed points. To apply the Schauder fixed point theorem (Proposition C.4.4), we need to capture a compact set in the function space $(\Xi(W_n, J), \|\cdot\|)$. Analogous to conventional discussions on Banach spaces, we focus on a weaker condition, relative compactness.

Definition C.4.1 A set in a metric space is relatively compact if its closure is compact.

Simon (1987) introduces a property of relatively compact sets, which is used in our proof. It is cited as the lemma below.

Lemma C.4.1 A set $V$ in a normed space $U$ is relatively compact if and only if for any $\eta > 0$, there is a finite subset $\{v_1, \cdots, v_L\} \subseteq V$ such that for any $v \in V$, there is $v_l$ for some $l = 1, \cdots, L$ with $\|v - v_l\|_U < \eta$.

Thus, as long as we find a set which is relatively compact, all of its own points and its limit points form a new set which is compact. Because $\xi \in (\Xi(W_n, J), \|\cdot\|)$ if and only if each of its coordinate functions, $\xi_{i,m}$, is a function in the Lebesgue space $L^1(\mathbb{X}^p_{i,m}, \mathcal{B}_{i,m}, \mu_p; \mathbb{R}^1)$, we may utilize the properties of a relatively compact set in $L^1(\mathbb{X}^p_{i,m}, \mathcal{B}_{i,m}, \mu_p; \mathbb{R}^1)$. That is possible due to the following lemma.

Lemma C.4.2 $\Gamma_0$ is a relatively compact subset of $(\Xi(W_n, J), \|\cdot\|)$ if and only if
$$\Gamma_{0,i,m} = \Big\{\xi_{i,m} : \mathbb{X}^p_{i,m} \to \mathbb{R}^1 : (\xi_{i,m}, \xi_{-im}) \in \Gamma_0 \text{ for some } \xi_{-im} : \prod_{(i',m') \neq (i,m)} \mathbb{X}^p_{i',m'} \to \mathbb{R}^{M-1}\Big\}$$
is relatively compact in $L^1(\mathbb{X}^p_{i,m}, \mathcal{B}_{i,m}, \mu_p; \mathbb{R}^1)$ for all $i = 1, \cdots, n$, $m = 1, \cdots, M_i$.

Proof. Suppose that $\Gamma_{0,i,m}$ is relatively compact in $L^1(\mathbb{X}^p_{i,m}, \mathcal{B}_{i,m}, \mu_p; \mathbb{R}^1)$ for each $(i, m)$.
By Lemma C.4.1, for an arbitrarily chosen $\eta > 0$ and any $(i, m)$, there is a finite collection $(\xi_{i,m,1}, \cdots, \xi_{i,m,L_{i,m}})$ in $\Gamma_{0,i,m}$ such that for any $\xi_{i,m} \in \Gamma_{0,i,m}$, there is $\xi_{i,m,l}$ for some $1 \leq l \leq L_{i,m}$ with $\|\xi_{i,m} - \xi_{i,m,l}\|_1 < \eta$. Construct a finite set as
$$\Gamma_0^b = \big\{(\xi_{1,1,l_{1,1}}, \cdots, \xi_{n,M_n,l_{n,M_n}}) : 1 \leq l_{i,m} \leq L_{i,m} \text{ for all } i = 1, \cdots, n,\ m = 1, \cdots, M_i\big\}.$$
Then for any $\xi = (\xi_{1,1}, \cdots, \xi_{n,M_n}) \in \Gamma_0$ and each $(i, m)$, pick $l_{i,m}$ such that $\|\xi_{i,m} - \xi_{i,m,l_{i,m}}\|_1 = \int |\xi_{i,m} - \xi_{i,m,l_{i,m}}| d\mu_p < \eta$. Denoting $\xi^b = (\xi_{1,1,l_{1,1}}, \cdots, \xi_{n,M_n,l_{n,M_n}}) \in \Gamma_0^b$, we have that $\|\xi - \xi^b\| = \max_{1 \leq i \leq n} \max_{1 \leq m \leq M_i} \|\xi_{i,m} - \xi_{i,m,l_{i,m}}\|_1 < \eta$. Therefore, $\Gamma_0$ is relatively compact in $(\Xi(W_n, J), \|\cdot\|)$.

Conversely, suppose that $\Gamma_0$ is relatively compact in $(\Xi(W_n, J), \|\cdot\|)$. If for some $(i_0, m_0)$, $\Gamma_{0,i_0,m_0}$ is not relatively compact in $L^1(\mathbb{X}^p_{i_0,m_0}, \mathcal{B}_{i_0,m_0}, \mu_p; \mathbb{R}^1)$, there is $\eta_0 > 0$ such that for any finite subset of $\Gamma_{0,i_0,m_0}$, $\{\xi_{i_0,m_0,1}, \cdots, \xi_{i_0,m_0,L_{i_0,m_0}}\}$, there is $\xi^*_{i_0,m_0} \in \Gamma_{0,i_0,m_0}$ with $\|\xi^*_{i_0,m_0} - \xi_{i_0,m_0,l}\|_1 > \eta_0$ for all $1 \leq l \leq L_{i_0,m_0}$. For any finite subset of $\Gamma_0$, $\{\xi^1, \cdots, \xi^L\}$, the set $\{\xi^1_{i_0,m_0}, \cdots, \xi^L_{i_0,m_0}\}$ is a finite subset of $\Gamma_{0,i_0,m_0}$. Take $\xi^* = (\xi^*_{i_0,m_0}, \xi_{-i_0m_0}) \in \Gamma_0$. We have that $\|\xi^* - \xi^l\| \geq \|\xi^*_{i_0,m_0} - \xi^l_{i_0,m_0}\|_1 > \eta_0$ for every $l$, contradicting $\Gamma_0$ being relatively compact. Therefore, each $\Gamma_{0,i,m}$ is relatively compact in $L^1(\mathbb{X}^p_{i,m}, \mathcal{B}_{i,m}, \mu_p; \mathbb{R}^1)$ for all $(i, m)$.

Owing to Lemma C.4.2, to characterize relatively compact sets in $(\Xi(W_n, J), \|\cdot\|)$, we need to capture the relatively compact sets in the Lebesgue space $L^1(\mathbb{X}^p_{i,m}, \mathcal{B}_{i,m}, \mu_p; \mathbb{R}^1)$. There are some characterizations for Lebesgue spaces of functions whose domains are general measurable spaces and whose ranges are Banach spaces, such as the results by Brooks and Dinculeanu (1979) and, more recently, the Diaz-Mayoral Theorem (see van Neerven (2014) for an elementary proof). Here, we apply the classical results by Dunford and Schwartz (1958).
Let $\Omega = \mathbb{R}^1$, let $\mathcal{B}_R$ be the Borel $\sigma$-algebra, let $\mu$ be the Lebesgue measure $m$, and let $Y$ be a Banach space with norm $\|\cdot\|_Y$. Dunford and Schwartz (1958) give a characterization of relatively compact sets in the Lebesgue space $L^q(\mathbb{R}^1, \mathcal{B}_R, m; Y)$, the space consisting of mappings from $\mathbb{R}^1$ to $Y$, integrable under $m$, with the $L^q$ norm.

Proposition C.4.1 For $1 \leq q < \infty$, $\Upsilon_0 \subseteq L^q(\mathbb{R}^1, \mathcal{B}_R, m; Y)$ is relatively compact if and only if:
1. It is bounded, i.e., $\sup_{\chi \in \Upsilon_0} \big(\int_{-\infty}^{+\infty} \|\chi(t)\|_Y^q dt\big)^{1/q} < U$ for some $U > 0$;
2. $\int_{-\infty}^{+\infty} \|\chi(t + s) - \chi(t)\|_Y^q dt \to 0$ as $s \to 0$ uniformly for all $\chi \in \Upsilon_0$; and
3. $\big(\int_{r}^{+\infty} + \int_{-\infty}^{-r}\big) \|\chi(t)\|_Y^q dt \to 0$ as $r \to \infty$ uniformly for all $\chi \in \Upsilon_0$.
For $\Omega = [a, b]$, (1) and (2) are necessary and sufficient conditions for relative compactness.

Proof. See Dunford and Schwartz (1958), Theorem IV.8.20 (p. 298).

The results above can be extended to the $n$-dimensional Euclidean space.

Proposition C.4.2 For $1 \leq q < \infty$, $\Upsilon_0 \subseteq L^q(\mathbb{R}^n, \mathcal{B}, m; Y)$ is relatively compact if and only if:
1. It is bounded, i.e., $\sup_{\chi \in \Upsilon_0} \big(\int_{\mathbb{R}^n} \|\chi(t)\|_Y^q dt\big)^{1/q} < U$ for some $U > 0$;
2. $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \|\chi(t_1 + s_1, \cdots, t_n + s_n) - \chi(t_1, \cdots, t_n)\|_Y^q dt_1 \cdots dt_n \to 0$ as $s = (s_1, \cdots, s_n) \to 0$ uniformly for all $\chi \in \Upsilon_0$; and
3. $\int_{\mathbb{R}^n - C_r} \|\chi(t)\|_Y^q dt \to 0$ as $r \to \infty$ uniformly for all $\chi \in \Upsilon_0$, where the cube $C_r = \{t = (t_1, \cdots, t_n) \in \mathbb{R}^n : -r \leq t_i \leq r\ \forall i = 1, \cdots, n\}$.
If $\Omega = \prod_{i=1}^{n} [a_i, b_i]$, (1) and (2) are necessary and sufficient for relative compactness.

Proof. See Dunford and Schwartz (1958), Theorem IV.8.21 (p. 301).

Although those results are very general, they concern the Lebesgue measure. In our model, each coordinate function $\xi_i$ of $\xi \in \Xi$ is defined on a subset of the Euclidean space with the probability measure $\mu_p$, which is induced by the distribution of the $X_i^p$'s conditional on the public information $Z = z$. We show that when there is a pdf for this conditional distribution, we can apply Proposition C.4.1 and Proposition C.4.2.

Lemma C.4.3 Let $\Omega$ be a subset of $\mathbb{R}^n$.
When $\mu \ll m$ and $d\mu/dm = f$ is strictly positive on $\Omega$, $\Upsilon_0 \subseteq L^1(\Omega, \mathcal{B}_\Omega, \mu; Y)$ is relatively compact if and only if $f\Upsilon_0 = \{f\chi : \chi \in \Upsilon_0\}$ is relatively compact in $L^1(\Omega, \mathcal{B}_\Omega, m; Y)$.

Proof. On one hand, if $f\Upsilon_0$ is relatively compact in $L^1(\Omega, \mathcal{B}_\Omega, m; Y)$, by Lemma C.4.1, for any $\eta > 0$ there is a finite subset $\{\chi_1, \cdots, \chi_K\}$ of $f\Upsilon_0$ such that for any $\chi \in f\Upsilon_0$, there is $\chi_k$ in that finite set with $\int_\Omega \|\chi - \chi_k\|_Y dm < \eta$. Then $\{\chi_1/f, \cdots, \chi_K/f\}$ is a finite subset of $\Upsilon_0$. For any $\tilde{\chi} \in \Upsilon_0$, $f\tilde{\chi} \in f\Upsilon_0$. Therefore, for some $k$,
$$\int_\Omega \|f\tilde{\chi} - \chi_k\|_Y dm = \int_\Omega \|\tilde{\chi} - \chi_k/f\|_Y f\,dm = \int_\Omega \|\tilde{\chi} - \chi_k/f\|_Y d\mu < \eta.$$
Therefore, $\Upsilon_0$ is relatively compact in $L^1(\Omega, \mathcal{B}_\Omega, \mu; Y)$. On the other hand, if $\Upsilon_0$ is relatively compact, for any $\eta > 0$ there is a finite subset $\{\tilde{\chi}^1, \cdots, \tilde{\chi}^K\}$ of $\Upsilon_0$ such that for any $\tilde{\chi} \in \Upsilon_0$, there is a function $\tilde{\chi}^k$ in that finite subset with $\int_\Omega \|\tilde{\chi} - \tilde{\chi}^k\|_Y d\mu < \eta$. Note that $\{f\tilde{\chi}^1, \cdots, f\tilde{\chi}^K\}$ is a finite subset of $f\Upsilon_0$. Take $\chi \in f\Upsilon_0$, so that $\chi/f \in \Upsilon_0$. Hence, for some $k$,
$$\int_\Omega \|\chi - f\tilde{\chi}^k\|_Y dm = \int_\Omega \|\chi/f - \tilde{\chi}^k\|_Y f\,dm = \int_\Omega \|\chi/f - \tilde{\chi}^k\|_Y d\mu < \eta.$$
Therefore, $f\Upsilon_0$ is relatively compact in $L^1(\Omega, \mathcal{B}_\Omega, m; Y)$.

Corollary C.4.1 Suppose that $\Omega = \prod_{i=1}^{n} [a_i, b_i]$ for $-\infty \leq a_i < b_i \leq +\infty$, that $\mu \ll m$, and that $d\mu/dm = f$ is strictly positive on $\Omega$. When $|a_i| < \infty$ and $|b_i| < \infty$ for all $i = 1, \cdots, n$, $\Upsilon_0 \subseteq L^1(\Omega, \mathcal{B}_\Omega, \mu; Y)$ is relatively compact if and only if:
1. $\sup_{\chi \in \Upsilon_0} \int_\Omega \|\chi(t)\|_Y f(t)\,dt < U$ for some $U > 0$;
2. $\int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} \|\chi(t_1 + s_1, \cdots, t_n + s_n) f(t_1 + s_1, \cdots, t_n + s_n) - \chi(t_1, \cdots, t_n) f(t_1, \cdots, t_n)\|_Y dt_1 \cdots dt_n \to 0$ as $s = (s_1, \cdots, s_n) \to 0$, uniformly for all $\chi \in \Upsilon_0$.
When |ai | = ∞ for some i or |bj | = ∞ for some j, the necessary and sufficient conditions for Υ0 ∈ L1 (Ω, BΩ , µ; Y) to be relatively compact include both (1) and (2), as well as (3): R ω−Cr kχ(t1 , · · · , tn )kY f (t1 , · · · , tn )dt1 · · · dtn → 0, as r → ∞ uniformly for all χ ∈ Υ0 , where the cube Cr = {t = (t1 , · · · , tn ) ∈ Ω : −r ≤ ti ≤ r ∀i = 1, · · · , n}. Combining Lemma C.4.2 and Corollary C.4.1, we derive a characterization of a relatively compact subset of (Ξ(Wn , J ), k · k) using properties of functions. Proposition C.4.3 Suppose that conditional on public information Z = z, the support of Xip , Xpi , is a cube in <kp (It can be bounded or unbounded), and the joint distribution of Xip ’s has a pdf fp (·). 1. When all the Xpi ’s are bounded, Γ0 is relatively compact in (Ξ(Wn , J ), k · k), if and only if, (a) (uniformly bounded) there is a real number B > 0 such that R supξ∈Γ0 kξk = supξ∈Γ0 max1≤i≤n max1≤m≤Mi Xp |ξi,m (x)|fp (x)dx ≤ B; i,m (b) max1≤i≤n max1≤m≤Mi R Xpi,m |ξi,m (x + x e)fp (x + x e) − ξi,m (x)fp (x)|dx → 0 as x e→0 uniformly for any ξ ∈ Γ0 . 2. When some of the support is unbounded, the necessary and sufficient conditions for Γ0 to be relatively compact in (Ξ(Wn , J ), k · k) include (1) and (2) as well as (3): R max1≤i≤n max1≤m≤Mi Xp −Cr,i,m |ξi,m (x)|dx → 0 as r → ∞ uniformly for all ξ ∈ Υ0 , ni,m o where the cube Cr,i,m = x ∈ Xpi,m : |xj | ≤ r ∀j with Ji,m (j) = 1 . Proof. (a) Suppose that Γ0 is relatively compact in (Ξ(Wn , J ), k · k). Then for each i and m, the set, Γ0,i,m = {ξi,m : (ξi,m , ξ−im ) ∈ Γ0 for some ξ−im } 212 is relatively compact in L1 (Xpi,m , Bi,m , µp ; <1 ) according to LemmaC.4.2. It follows from R Corollary C.4.1 that for any i and m, there is U i,m , such that supξi,m Γ0,i,m Xp |ξi (x)|fp (x)dx < i,m U i,m . Therefore, Z sup kξk = sup max max ξ∈Γ0 1≤i≤n 1≤m≤Mi ξ∈Γ0 ≤ max Xpı,m |ξi,m (x)|fp (x)dx max U i,m 1≤i≤n 1≤m≤Mi =U , where U = max1≤i≤n max1≤m≤Mi U i,m < ∞. 
Additionally, for any η > 0, there is δi,m > 0, R such that Xp |ξ(x + x ei,m )f (x + x ei,m − ξ(x)f (x)|dx < η when ke xi,m kE < δi,m for all ξ ∈ Γ0 . i,m Take δ = min 1 ≤ i ≤ n max1≤m≤Mi δi,m > 0. For a vector 0 0 0 0 0 x e = (e x1,1 , · · · , x e1,M1 , · · · , x en,1 , · · · , x en,Mn ) , when ke xkE < δ, ke xi,m kE < δi,m for all (i, m). R ei,m )f (x + x ei,m ) − ξi,m (x)f (x)|dx < η for all Then max1≤i≤n max1≤m≤Mi Xp |ξi,m (x + x i,m ξ ∈ Γ0 . When some of the Xip ’s have an unbounded support, for any η > 0, there is R Ri,m > 0, such that for all (i, m), Xp −Cr,i,m |χ(x)|dx < η for all r > R and all ξi,m . Take i,m R = max1≤i≤n max1≤m≤Mi Ri,m < +∞. Then when r > R, R max1≤i≤n max1≤m≤Mi Xp −Cr,i,m |ξi,m (x)|dx < η for all ξ ∈ Υ0 . i,m (b) One the contrary, if (1) and (2) hold, suppose that there is (i0 , m0 ), such that for any R U > 0, there is ξi0 ,m0 ∈ Γ0,i0 ,m0 , such that kξi0 ,m0 k1 = Xp |ξi0 ,m0 (x)|fp (x)dx > U . Then i0 ,m0 pick any ξ−i0 m0 ∈ Γ0,−i0 m0 such that ξ = (ξi0 ,m0 , ξ−i0 m0 ) ∈ Γ0 . Then R R kξk = max1≤i≤n max1≤m≤Mi Xp |ξi,m (x)|fp (x)dx ≥ Xp |ξi0 ,m0 (x)|fp (x)dx > U , i,m i0 ,m0 which is a contradiction. Therefore, every Γ0,i,m is uniformly bounded under the k · k1 norm. δi0 ,m0 Similarly, suppose that there is some (i0 , m0 ), with some ηi0 ,m0 > 0, for any Rb > 0, there is x ei0 ,m0 with ke xi0 kE < δi0 ,m0 and ξi0 ,m0 ∈ Γ0,i0 ,m0 , such that a |ξi0 ,m0 (x + x ei0 ,m0 )fp (x + x ei0 ,m0 ) − ξi0 ,m0 (x)f (x)|dx > ηi0 ,m0 . Pick any ξ−i0 m0 ∈ Γ0,−i0 m0 such that R ξ = (ξi0 ,m0 , ξ−i0 m0 ) ∈ Γ0 . Then max1≤i≤n max1≤m≤Mi Xp |ξi,m (x + x ei,m )fp (x + x ei,m ) − i,m R ξi,m (x)f (x)|dx ≥ Xp |ξi0 ,m0 (x + x ei0 ,m0 )fp (x + x ei0 ,m0 ) − ξi0 ,m0 (x)f (x)|dx > η, which i0 ,m0 contradicts that Γ0 satisfies (2). some ηi0 ,m0 Similarly, suppose that there is some (i0 , m0 ), with R > 0, for any R > 0, there is r > R, such that Xp |ξi0 ,m0 (x)|fp (x)dx > i0 ,m0 213 ηi0 ,m0 . 
Then max1≤i≤n max1≤m≤Mi R Xpi,m −Cr,i,m |ξi,m (x)|dx ≥ R Xpi 0 ,m0 |ξi0 ,m0 (x)|fp (x)dx > η, which is also a contradiction. By Corollary C.4.1, each Γ0,i,m is relatively compact in L1 (Xpe , Bi,m , µp ; <1 ). It then follows from LemmaC.4.2 that Γ0 is relatively compact in Ji,m (Ξ(Wn , J), k · k). The existence of an equilibrium is established by Schauder fixed point theorem, which is cited below. Proposition C.4.4 [Schauder fixed point theorem] Let K be a nonempty, closed, and convex subset of a normed space. Let T be a continuous mapping from K into a compact subset of K. Then T has a fixed point in K. In order to apply this theorem to operator T defined in (C.4.1), we impose two assumptions, Assumptions 4.5.2 and 4.5.3. Lemma C.4.4 Under Assumption 4.5.2, T is continuous. Actually, it is a Lipschitz function. 0 Proof. For any ξ, ξ ∈ (Ξ(Wn , J ), k · k), 0 kT (ξ) − T (ξ )k Z h i X X 0 = max max E Hi (u(Xi ) + λ Wn,i ξj,Ji (xpJi )) − Hi (u(Xi ) + λ Wn,i ξj,Ji (xpJi ))|xpi,m , z dFp 1≤i≤n 1≤m≤Mi ≤ max max sup |dHi (c)/dc||λ| 1≤i≤n 1≤m≤Mi = max j6=i c max sup |dHi (c)/dc||λ| 1≤i≤n 1≤m≤Mi c X j6=i Z Wn,ij h 0 E |ξi,Ji (xpJi ) − ξi,Ji (xpJi )|xpi,Je i,m i , z dFp 6=i X Z Wn,ij 0 |ξi,Ji (xpJi ) − ξi,Ji (xpJi )|dFp 6=i 0 ≤ max sup |dHi (c)/dc||λ|kWn k∞ kξ − ξ k. 1≤i≤n c That is to say, T is a Lipschitz function. Thus, it is continuous in (Ξ(Wn , J ), k · k). Lemma C.4.5 Under Assumption 4.5.3, there is r0 > 0, such that there is no equilibria out of the closed ball, B[0, r0 ] = {ξ ∈ (Ξ(Wn , J ), k · k) : kξk ≤ r0 }. In addition, for any ξ ∈ B[0, r0 ], T (ξ) ∈ B[0, r0 ]. 214 Proof. Because kT (ξ) − ξk ≥ kT (ξ)k − kξk, under Assumption 4.5.3, there is r1 > 0, such that kT (ξ) − ξk > 0 for all ξ with kξk > r1 . Now we show that there is r2 > 0, such that for any r ≥ r2 , T (B[0, r]) ⊆ B[0, r]. If this statement does not hold, for any positive r, there is r∗ ≥ r and ξ r∗ with kξ r∗ k ≤ r∗ , kT (ξ r∗ )k > r∗ ≥ kξ r∗ k. 
Then kT (ξ r∗ )k/kξ r∗ k > 1, which contradicts Assumption 4.5.3. Choose r0 = max {r1 , r2 }, we get the results. Proposition C.4.5 Under Assumptions 4.5.2 and 4.5.3, if in addition, Z max max |T (ξ)i,m (x + x e)fp (x + x e) − T (ξ)i,m (x)fp (x)|dx → 0, (C.4.2) as x e → 0, uniformly for any ξ ∈ B[0, r0 ]; and Z max max |T (ξ)i,m (x)|fp (x)dx → 0, (C.4.3) 1≤i≤n 1≤m≤Mi Xp 1≤i≤n 1≤m≤Mi Xp −Cr as r → ∞, uniformly for all ξ ∈ B[0, r0 ], the set of equilibria, E(X, Wn ) is a nonempty and compact subset of (Ξ(Wn , J ), k·k) and is contiained in the closed ball B[0, r0 ]. In particular, (C.4.2) and (C.4.3) are satisfied, if 0 0 1. Hi (·)’s are uniformly bounded, i.e., max1≤i≤n supa∈<1 |Hi (a)| ≤ B , for some B ; 2. E[Xip |Z = z] < ∞, for all i; and 3. For some δ0 > 0, for each (i, m), there is an function gi,m (x, x b) such that R R b)|dxdb x < ∞, |fp,i,m (x + x e, x b)| ≤ gi,m (x, x b), a.e., for any x e in the Xp Xp |gi,m (x, x i,m Ji p cube Cδ0 , where fp,i,m (·, ·) is the joint density of Xi,m = XJpi,m and XJpi conditional on public information Z = z.60 Proof. Choose the closed ball B[0, r0 ] satisfying the properties stated in Lemma C.4.5. 0 It is nonempty, closed, and convex in space (Ξ(Wn , J ), k · k). For any ξ ∈ T (B[0, r0 ]), 0 ξ = T (ξ), for some ξ ∈ B[0, r0 ]. Therefore, Z 0 0 |ξi,m (x + x e)fp (x + x e) − ξi,m (x)fp (x)|dx max max 1≤i≤n 1≤m≤Mi Xp Z |T (ξ)i,m (x + x e)fp (x + x e) − T (ξ)i,m (x)fp (x)|dx; = max max 1≤i≤n 1≤m≤Mi Xp p It is possible that they have overlaps between Xi,m and XJpi . For example, both agent 1 and agent 2 p know X3 . We write fp,i,m (x, y) in order to simplify notations. 60 215 Z max 1≤i≤n 1≤m≤Mi Z 0 max Xp −Cr |ξi,m (x)|fp (x)dx = max |T (ξ)i,m (x)|fp (x)dx. max 1≤i≤n 1≤m≤Mi Xp −Cr Therefore, if (C.4.2) and (C.4.3) are satisfied, according to Proposition C.4.3, T (B[0, r0 ]) is relatively compact in the normed space (Ξ(Wn , J ), k·k). Its closure, T (B[0, r0 ]), is compact. 
Moreover, T (B[0, r0 ]) ⊆ B[0, r0 ], for B[0, r0 ] is closed. By the Schauder fixed point, T has a fixed point in B[0, r0 ]. Thus, the set of equilibria, E(X, Wn ), is nonempty. Since it is the set of fixed points for the continuous operator T , E(X, Wn ) is closed. As a closed subset of the compact set T (B[0, r0 ]), E(X, Wn ) is compact. In particular, if Hi (·)’s are uniformly bounded, Z max max 1≤i≤n 1≤m≤Mi Xp i,m Z = max max 1≤i≤n 1≤m≤Mi Xp i,m T (ξ)i,m (x + x e)fp (x + x e) − T (ξ)i,m (x)fp (x)dx X E[Hi (u(X g , Xic , x b) + λ Wn,ij ξj (b x))|x + x e, z]fp (x + x e) j6=i − E[Hi (u(X g , Xic , x b) + λ X Wn,ij ξj (b x))|x, z]fp (x)dx j6=i Z = max max 1≤i≤n 1≤m≤Mi Xp i,m Z Xp J Hi (u(X g , Xic , x b) + λ max i sup |Hi (a)| 1≤i≤n 1≤m≤Mi a∈<1 Wn,ij ξj (b x))(fp (x + x e, x b) − fp (x, x b))db x|dx j6=i Z Z ≤ max X Xp i,m Xp J |fp (x + x e, x b) − fp (x, x b))|db xdx, i which, following from the Lebesgue dominated convergence theorem, goes to 0 uniformly for all ξ’s as x e → 0, under the above distribution assumption. Similarly, for any r > 0, Z max max |T (ξ)i,m (x)|fp (x)dx 1≤i≤n 1≤m≤Mi Xp −Cr Z Z X = max max | Hi (u(X g , Xic , x b) + λ Wn,ij ξj (y))fp (b x|x)db x|fp (x)dx 1≤i≤n 1≤m≤Mi Xp −Cr XpJ j6=i i Z ≤ max max sup |Hi (a)| 1≤i≤n 1≤m≤Mi a∈<1 ≤ max max sup |Hi (a)| 1≤i≤n 1≤m≤Mi a∈<1 ≤ max max sup |Hi (a)| 1≤i≤n 1≤m≤Mi a∈<1 fp (x)dx Xp −Cr X P (|Xjp | > r) j:Ji,m (j)=1 X E[|Xjp |z]/r, j:Ji,m (j)=1 (C.4.4) 216 where the last inequality follows from the Chebyshev’s inequality. As r → ∞, the above formula goes to zero uniformly for all ξ. It is obvious that if Xip ’s have a bounded support and their joint density conditional on public information is continuous, |fp,i,m (x + x e, y)| can be dominated by the constant function on the support, which is Lebesgue integrable. Now we show that we can dominate the density for jointly normal random vectors. 
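Before the general statement, a one-dimensional illustration may help fix ideas. The sketch below is a deliberate simplification and not the construction used in the lemma that follows: it takes a single standard normal density $\phi$, shifts $\tilde{x} \in [-\delta_0, \delta_0]$, and uses the exact envelope $g(x) = \sup_{|\tilde{x}| \le \delta_0} \phi(x + \tilde{x}) = \phi(\max(|x| - \delta_0, 0))$. The value of `delta0`, the grids, and the helper names are assumptions made for illustration.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def g(x, delta0):
    """Envelope: sup over |s| <= delta0 of phi(x + s) = phi(max(|x| - delta0, 0))."""
    return phi(max(abs(x) - delta0, 0.0))

def trapezoid(f, a, b, steps):
    """Composite trapezoid rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h

delta0 = 0.5  # hypothetical half-width of the shift cube

# Check the domination phi(x + s) <= g(x) on a grid of x and admissible shifts s.
xs = [-6.0 + 0.05 * k for k in range(241)]
shifts = [delta0 * (k / 10.0 - 1.0) for k in range(21)]
dominated = all(phi(x + s) <= g(x, delta0) + 1e-15 for x in xs for s in shifts)

# The envelope is integrable: its integral is 1 + 2 * delta0 * phi(0).
integral_g = trapezoid(lambda x: g(x, delta0), -12.0, 12.0, 48000)
closed_form = 1.0 + 2.0 * delta0 * phi(0.0)
```

The envelope is flat at $\phi(0)$ on $[-\delta_0, \delta_0]$ and has Gaussian tails outside, which is why it is Lebesgue integrable and can play the role of the dominating function in the dominated convergence argument.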
0 0 0 0 0 0 Lemma C.4.6 Suppose that (X1 , X2 ) are jointly normal with mean (µ1 , µ2 ) and variancecovariance matrix, Σ11 Σ12 Σ= . Σ21 Σ22 Take δ0 > 0 arbitrarily. Then there is a function g : <k1 × <k2 → <1 , such that the joint density f (x + x e, x b) ≤ g(x, x b) a.e. for any x e ∈ Cδ0 . Proof. Denote the inverse of Σ by e e Σ11 Σ12 Σ−1 = . e 21 Σ e 22 Σ The joint density takes the following form: f (x + x e, x b) n o 0 0 0 0 0 =(2π)−(k1 +k2 )/2 (det(Σ))−1/2 exp −(1/2)(x + x e − µ1 , x b − µ2 )Σ−1 (x + x b − µ1 , x b − µ2 ) o n 0 e e 22 − Σ e 21 Σ e −1 Σ x − µ2 ) =(2π)−(k1 +k2 )/2 (det(Σ))−1/2 exp −(1/2)(b x − µ2 ) ( Σ 11 12 )(b n o 0 −1/2 e e −1/2 Σ e 12 (b e e · exp −(1/2)(e x + (x − µ1 ) + Σ x − µ )) Σ (e x + (x − µ ) + Σ Σ (b x − µ )) . 2 11 1 12 2 11 11 For any (x, x b), since the cube Cδ0 is compact in an Euclidean space, we can define −1/2 e e ge(x, x b) = minxe∈Cδ0 (e x + (x − µ1 ) + Σ 11 0 −1/2 e e 11 (e e Σ12 (b x − µ2 )) Σ x + (x − µ1 ) + Σ 11 Σ12 (b x − µ2 )). This function is defined on the basis of an optimization problem about a quadratic form with respect to linear constraints. For any c ≥ 0, define the lower contour set, n o 0 e −1/2 Σ e 12 (b e 11 (e e −1/2 Σ e 12 (b L(c, x, x b) = x e : (e x + (x − µ1 ) + Σ x − µ2 )) Σ x + (x − µ1 ) + Σ x − µ2 )) ≤ c . 11 11 217 Each of those sets is an aera composed of an ellipse and its interior. It is convex. e −1/2 Σ e 12 (b Fixing (x, x b), we either have a solution inside Cδ0 , −(x − µ1 ) − Σ x − µ2 ), or a 11 corner solution at a boundary point of the cube Cδ0 . In the space for x e, fixing c, when (x, x b) varies, the center of the ellipse moves; while the axes do not change. Therefore, we can divide the space for (x, x b) into several regions, such that two points in the same region either both have interior solutions or corner solutions on the same edge of Cδ0 .61 e −1/2 Σ e 12 (b If −(x − µ1 ) − Σ x − µ2 ) ∈ Cδ0 , ge(x, x b) = 0. Otherwise, its value depends on the 11 minimal boundary point. 
In this case, as the edges of Cδ0 are bounded, for any x e on the boundary of Cδ0 , 0 e −1/2 Σ e 12 (b e 11 (e e −1/2 Σ e 12 (b x + (x − µ1 ) + Σ x − µ2 )) Σ x + (x − µ1 ) + Σ x − µ2 )) (e 11 11 0 e −1/2 Σ e 12 (b e 11 ((x − µ1 ) + Σ e −1/2 Σ e 12 (b − ((x − µ1 ) + Σ x − µ )) Σ x − µ )) b)k2E → 0 /k(x, x 2 2 11 11 as k(x, x b)kE → ∞. Therefore, there is R > 0, such that when k(x, x b)kE > R, 0 e −1/2 Σ e 12 (b e 11 ((x − µ1 ) + Σ e −1/2 Σ e 12 (b ge(x, x b) ≥ (1/4)((x − µ1 ) + Σ x − µ2 )) Σ x − µ2 )). 11 11 Define g(x, x b) = (2π)−(k1 +k2 )/2 (det(Σ))−1/2 exp {−(1/2)e g (x, x b)}. Then f (x+e x, x b) ≤ g(x, x b), for any (x, x b). By the Maximum Theorem, ge(x, x b) is continuous, so does g(x, x b). Moreover, −1/2 e e if −(x − µ1 ) − Σ 11 Σ12 (b x − µ2 ) ∈ Cδ0 , g(x, x b) = (2π)−(k1 +k2 )/2 (det(Σ))−1/2 ; otherwise, g(x, x b) is continuous when k(x, x b)kE ≤ R, and when k(x, x b)kE > R, g(x, x b) ≤(2π)−(k1 +k2 )/2 (det(Σ))−1/2 n o 0 e −1/2 Σ e 12 (b e 11 ((x − µ1 ) + Σ e −1/2 Σ e 12 (b · exp −(1/8)((x − µ1 ) + Σ x − µ2 )) Σ x − µ2 )) . 11 11 Hence, g(x, x b) is integrable. C.5 Equilibrium for Peer Effects e Proof of Proposition 4.6.1. On one hand, suppose that ξ satisfying (4.6.6). With a e e regular group, we can define ξj,m = Λj (ξ m(j) ) m for all j = 1, · · · , n and m = 1, · · · , M0 . 61 This can be seen clearly for the special case when k1 = k2 = 1. In that case, for any (x, x b), if −(x − e −1/2 Σ e 12 (b e −1/2 Σ e 12 (b µ1 ) − Σ x − µ2 ) < −δ0 , the minimizing solution is −δ0 ; if −δ0 ≤ −(x − µ1 ) − Σ x − µ2 ) ≤ δ0 , 11 11 e −1/2 Σ e 12 (b the minimizing solution is interior; if −(x − µ1 ) − Σ x − µ2 ) > δ0 , the minimizing solution is δ0 . 11 218 (4.6.6) implies that e ξ m(i) (xpJ,m(i) ) = = n X j=1 n X e e p p p ) − λ(Λj (ξ m(j) ))m(j) (XJ,m(j) ))|XJ,m(i) = xpJ,m(i) , z] E[Hj (u(Xj ) + λξ m(j) (XJ,m(j) e Λj (ξ m(j) ) m(i) (xpJ,m(i) ) j=1 = n X e (xpJ,m(i) ). ξj,m(i) j=1 for all i and xpJ,m(i) ∈ XpJ,m(i) . 
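Therefore,
$$\begin{aligned}
\xi^e_{i,m}(x^p_{J,m}) &= \big(\Lambda_i(\bar\xi^e_{m(i)})\big)_m(x^p_{J,m}) \\
&= E\Big[ H_i\Big( u(X_i) + \lambda \bar\xi^e_{m(i)}(X^p_{J,m(i)}) - \lambda \xi^e_{i,m(i)}(X^p_{J,m(i)}) \Big) \,\Big|\, X^p_{J,m} = x^p_{J,m}, z \Big] \\
&= E\Big[ H_i\Big( u(X_i) + \lambda \sum_{j \ne i} \xi^e_{j,m(i)}(X^p_{J,m(i)}) \Big) \,\Big|\, X^p_{J,m} = x^p_{J,m}, z \Big].
\end{aligned}$$
On the other hand, given that $\xi^e = (\xi^e_{1,1}, \cdots, \xi^e_{1,M_0}, \cdots, \xi^e_{n,1}, \cdots, \xi^e_{n,M_0})$ satisfies (4.6.3), define $\bar\xi^e = (\bar\xi^e_1, \cdots, \bar\xi^e_{M_0})$ such that
$$\bar\xi^e(x^p_{J,1}, \cdots, x^p_{J,M_0})_m = \bar\xi^e_m(x^p_{J,m}) = \sum_{i=1}^n \xi^e_{i,m}(x^p_{J,m}),$$
for all $m$ and $x^p_{J,m} \in \mathcal{X}^p_{J,m}$. Then we get that
$$\begin{aligned}
\xi^e_{i,m}(x^p_{J,m}) &= E\Big[ H_i\Big( u(X_i) + \lambda \sum_{j \ne i} \xi^e_{j,m(i)}(X^p_{J,m(i)}) \Big) \,\Big|\, X^p_{J,m} = x^p_{J,m}, z \Big] \\
&= E\Big[ H_i\Big( u(X_i) + \lambda \bar\xi^e_{m(i)}(X^p_{J,m(i)}) - \lambda \xi^e_{i,m(i)}(X^p_{J,m(i)}) \Big) \,\Big|\, X^p_{J,m} = x^p_{J,m}, z \Big].
\end{aligned}$$
Applying the Implicit Function Theorem in Banach spaces, $\xi^e_{i,m} = \big(\Lambda_i(\bar\xi^e_{m(i)})\big)_m$ for all $i = 1, \cdots, n$ and $m = 1, \cdots, M_0$. Therefore,
$$\begin{aligned}
\bar\xi^e_m(x^p_{J,m}) &= \sum_{i=1}^n \xi^e_{i,m}(x^p_{J,m}) \\
&= \sum_{i=1}^n \big(\Lambda_i(\bar\xi^e_{m(i)})\big)_m(x^p_{J,m}) \\
&= \sum_{i=1}^n E\Big[ H_i\Big( u(X_i) + \lambda \bar\xi^e_{m(i)}(X^p_{J,m(i)}) - \lambda \xi^e_{i,m(i)}(X^p_{J,m(i)}) \Big) \,\Big|\, X^p_{J,m} = x^p_{J,m}, z \Big] \\
&= \sum_{i=1}^n E\Big[ H_i\Big( u(X_i) + \lambda \bar\xi^e_{m(i)}(X^p_{J_i}) - \lambda \big(\Lambda_i(\bar\xi^e_{m(i)})\big)(X^p_{J_i}) \Big) \,\Big|\, X^p_{J,m} = x^p_{J,m}, z \Big].
\end{aligned}$$

C.6 Proofs for Equilibrium Set Characterization in Binary Choice Models

Proof of Proposition 4.7.1. Let $\xi$ denote the total expected outcome in the group. That is, $\xi = \sum_{i=1}^n \xi_i = \sum_{i=1}^n E[y_i]$. Given $\xi$, for agent $i$,
$$K(\xi_i, \xi; u_i, \lambda) = \Phi(u_i + \lambda\xi - \lambda\xi_i) - \xi_i = 0.$$
We have $K(0, \xi; u_i, \lambda) > 0$, $K(1, \xi; u_i, \lambda) < 0$, and
$$\frac{\partial K}{\partial \xi_i} = -\big(\lambda\phi(u_i + \lambda\xi - \lambda\xi_i) + 1\big).$$
If $\lambda\phi(u_i + \lambda\xi - \lambda\xi_i) + 1 > 0$, for each $\xi$ there is a unique $\xi_i \in (0, 1)$ such that $K(\xi_i, \xi; u_i, \lambda) = 0$. Thus, individual expected outcomes are determined by a function, $\xi_i = G(u_i, \xi)$. Differentiating $K(G(u_i, \xi), \xi; u_i, \lambda) = 0$ implicitly with respect to $\xi$ gives
$$\frac{\partial G(u_i, \xi)}{\partial \xi} = \frac{\lambda\phi(u_i + \lambda\xi - \lambda G(u_i, \xi))}{\lambda\phi(u_i + \lambda\xi - \lambda G(u_i, \xi)) + 1},$$
$$\frac{\partial^2 G(u_i, \xi)}{\partial \xi^2} = -\frac{\lambda^2 \phi(u_i + \lambda\xi - \lambda G(u_i, \xi))\,(u_i + \lambda\xi - \lambda G(u_i, \xi))}{[\lambda\phi(u_i + \lambda\xi - \lambda G(u_i, \xi)) + 1]^3}.$$
The total $\xi$ is determined as a zero of
$$S(u, \xi) = \sum_{i=1}^n G(u_i, \xi) - \xi.$$
As a result, $\frac{\partial S}{\partial \xi} = \sum_{i=1}^n \frac{\partial G(u_i, \xi)}{\partial \xi} - 1$ and $\frac{\partial^2 S}{\partial \xi^2} = \sum_{i=1}^n \frac{\partial^2 G(u_i, \xi)}{\partial \xi^2}$.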
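As $\xi_i \in [0, 1]$ for $i = 1, \cdots, n$, any valid equilibrium satisfies $\xi \in [0, n]$. Since $G(u_i, \xi) \in (0, 1)$, $S(u, 0) > 0$ and $S(u, n) < 0$. Therefore, there must be an equilibrium between $0$ and $n$.

• When $-\sqrt{2\pi} < \lambda < 0$, $\lambda\phi(u_i + \lambda\xi - \lambda G(u_i, \xi)) + 1 > 0$, because $\phi \le 1/\sqrt{2\pi}$. Thus, $\frac{\partial G(u_i, \xi)}{\partial \xi} < 0$ for all $i$ and $\frac{\partial S}{\partial \xi} < 0$. As a result, there is a unique equilibrium.

• When $\lambda > 0$, $\frac{\partial G(u_i, \xi)}{\partial \xi} > 0$ for all $i$. If $\min_{1 \le i \le n} u_i > \frac{\lambda}{2}$, then $\Phi(u_i - \frac{\lambda}{2}) - \frac{1}{2} > 0$. That is, $K(\frac{1}{2}, 0; u_i, \lambda) > 0$. Then for all $i$, $G(u_i, \xi) \ge G(u_i, 0) > \frac{1}{2}$. Therefore, $u_i + \lambda\xi - \lambda G(u_i, \xi) = \Phi^{-1}(G(u_i, \xi)) > 0$. It follows that $\frac{\partial^2 G(u_i, \xi)}{\partial \xi^2} < 0$ for any $\xi \in [0, n]$. Similarly, if $\max_{1 \le i \le n} u_i < \frac{\lambda}{2} - \lambda n$, for any $i$, $\Phi(u_i + \lambda n - \frac{\lambda}{2}) - \frac{1}{2} < 0$. That is, $K(\frac{1}{2}, n; u_i, \lambda) < 0$. Then $G(u_i, \xi) \le G(u_i, n) < \frac{1}{2}$, so $u_i + \lambda\xi - \lambda G(u_i, \xi) = \Phi^{-1}(G(u_i, \xi)) < 0$. In this case, $\frac{\partial^2 G(u_i, \xi)}{\partial \xi^2} > 0$ for all $\xi \in [0, n]$. In both cases, $\frac{\partial^2 G(u_i, \xi)}{\partial \xi^2}$ does not change sign as $\xi$ runs in $[0, n]$, so $S(u, \cdot)$ is either strictly concave or strictly convex on $[0, n]$. Together with $S(u, 0) > 0 > S(u, n)$, this leaves exactly one zero. Hence, there is a unique equilibrium.

Proof of Lemma 4.7.1. It is easy to see that $c(a; \lambda) = c(-a; \lambda)$, $c(0; \lambda) = \lambda\phi(0) + 1 > 0$, and $\lim_{a \to +\infty} c(a; \lambda) = \lim_{a \to -\infty} c(a; \lambda) = -\infty$. Moreover, $c'(a; \lambda) = a\,(3\lambda\phi(a) - 2 - 2\lambda a^2 \phi(a))$. When $0 < \lambda < \frac{2\sqrt{2\pi}}{3}$, $3\lambda\phi(a) - 2 - 2\lambda a^2\phi(a) < 0$. Thus, $c'(a; \lambda) < 0$ for $a > 0$ and $c'(a; \lambda) > 0$ for $a < 0$. Therefore, there is $a_+ > 0$ with the claimed properties. Additionally, differentiating $c(a_+; \lambda) = 0$ with respect to $\lambda$,
$$\frac{d a_+}{d \lambda} = -\frac{(2a_+^2 + 1)\,\phi(a_+)}{a_+\,(3\lambda\phi(a_+) - 2 - 2\lambda a_+^2 \phi(a_+))} > 0.$$

Proof of Proposition 4.7.2. For any $i$,
$$\frac{\partial^3 G(u_i, \xi)}{\partial \xi^3} = -\frac{\lambda^3 \phi(u_i + \lambda\xi - \lambda G(u_i, \xi))\, c(u_i + \lambda\xi - \lambda G(u_i, \xi); \lambda)}{[\lambda\phi(u_i + \lambda\xi - \lambda G(u_i, \xi)) + 1]^5},$$
and $\frac{\partial^3 S}{\partial \xi^3} = \sum_{i=1}^n \frac{\partial^3 G(u_i, \xi)}{\partial \xi^3}$. As $\lambda > 0$, $\frac{\partial G(u_i, \xi)}{\partial \xi} > 0$ for all $i$. If (4.7.4) holds, for any $i$, $\Phi(u_i - \lambda\Phi(a_+(\lambda))) - \Phi(a_+(\lambda)) > 0$. That is, $K(\Phi(a_+(\lambda)), 0; u_i, \lambda) > 0$. Therefore, $G(u_i, \xi) \ge G(u_i, 0) > \Phi(a_+(\lambda))$ for all $\xi \in [0, n]$. It follows that $u_i + \lambda\xi - \lambda G(u_i, \xi) = \Phi^{-1}(G(u_i, \xi)) > a_+(\lambda)$. From Lemma 4.7.1, $c(u_i + \lambda\xi - \lambda G(u_i, \xi); \lambda) < 0$ for all $i$ and $\xi \in [0, n]$.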
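Analogously, under condition (4.7.5), for any $i$, $\Phi(u_i + \lambda n - \lambda\Phi(-a_+(\lambda))) - \Phi(-a_+(\lambda)) < 0$. Then $K(\Phi(-a_+(\lambda)), n; u_i, \lambda) < 0$. Therefore, $G(u_i, \xi) \le G(u_i, n) < \Phi(-a_+(\lambda))$. That is, for any $i$, $u_i + \lambda\xi - \lambda G(u_i, \xi) = \Phi^{-1}(G(u_i, \xi)) < -a_+(\lambda)$. By Lemma 4.7.1, $c(u_i + \lambda\xi - \lambda G(u_i, \xi); \lambda) < 0$ for all $i$ and $\xi \in [0, n]$. Thus, in both cases, $\frac{\partial^3 S}{\partial \xi^3} > 0$ when $\xi$ runs from $0$ to $n$. If there are more than three equilibria, then by Rolle's theorem $\frac{\partial^2 S}{\partial \xi^2}$ must vanish at least twice, so $\frac{\partial^3 S}{\partial \xi^3}$ must vanish somewhere in between, which is impossible if $\frac{\partial^3 S}{\partial \xi^3}$ stays positive. Consequently, under condition (4.7.4) or (4.7.5), there are at most three equilibria.

Proof of Proposition 4.7.3. Fix $\lambda$. For any $i$,
$$\tilde{K}(\xi_i, \xi; u_i, \lambda) = 2\Phi(u_i + \lambda\xi - \lambda\xi_i) - 1 - \xi_i = 0.$$
As $\tilde{K}(-1, \xi; u_i, \lambda) > 0$, $\tilde{K}(1, \xi; u_i, \lambda) < 0$, and $\frac{\partial \tilde{K}}{\partial \xi_i} = -\big(2\lambda\phi(u_i + \lambda\xi - \lambda\xi_i) + 1\big)$, when $2\lambda\phi(u_i + \lambda\xi - \lambda\xi_i) + 1 > 0$, for each $\xi$ there is a unique $\xi_i \in (-1, 1)$ such that $\tilde{K}(\xi_i, \xi; u_i, \lambda) = 0$. Thus, $\xi_i$ is implicitly a function of $\xi$ and $u_i$. Denote this function as $\tilde{G}(u_i, \xi; \lambda)$, abbreviated $\tilde{G}(u_i, \xi)$. By computation,
$$\frac{\partial \tilde{G}(u_i, \xi)}{\partial \xi} = \frac{2\lambda\phi(u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi))}{2\lambda\phi(u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi)) + 1},$$
$$\frac{\partial^2 \tilde{G}(u_i, \xi)}{\partial \xi^2} = -\frac{2\lambda^2 \phi(u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi))\,(u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi))}{[2\lambda\phi(u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi)) + 1]^3}.$$
The total $\xi$ is determined as a zero of
$$\tilde{S}(u, \xi) = \sum_{i=1}^n \tilde{G}(u_i, \xi) - \xi.$$
As a result, $\frac{\partial \tilde{S}}{\partial \xi} = \sum_{i=1}^n \frac{\partial \tilde{G}(u_i, \xi)}{\partial \xi} - 1$ and $\frac{\partial^2 \tilde{S}}{\partial \xi^2} = \sum_{i=1}^n \frac{\partial^2 \tilde{G}(u_i, \xi)}{\partial \xi^2}$. As $\xi_i \in [-1, 1]$ for $i = 1, \cdots, n$, any valid equilibrium satisfies $\xi \in [-n, n]$. Since $\tilde{G}(u_i, \xi) \in (-1, 1)$, $\tilde{S}(u, -n) > 0$ and $\tilde{S}(u, n) < 0$. Therefore, there must be an equilibrium between $-n$ and $n$.

• When $-\frac{\sqrt{2\pi}}{2} < \lambda < 0$, $2\lambda\phi(u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi)) + 1 > 0$. Then $\frac{\partial \tilde{G}(u_i, \xi)}{\partial \xi} < 0$ for all $i$ and $\frac{\partial \tilde{S}}{\partial \xi} < 0$. Thus, in this case, there is a unique equilibrium.

• When $\lambda > 0$, $\frac{\partial \tilde{G}(u_i, \xi)}{\partial \xi} > 0$ for all $i$. If $\min_{1 \le i \le n} u_i > \lambda n$, then $2\Phi(u_i - \lambda n) - 1 > 0$. That is, $\tilde{K}(0, -n; u_i, \lambda) > 0$. Then for all $i$, $\tilde{G}(u_i, \xi) \ge \tilde{G}(u_i, -n) > 0$. Therefore, $u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi) = \Phi^{-1}\big(\frac{\tilde{G}(u_i, \xi) + 1}{2}\big) > 0$.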
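It follows that $\frac{\partial^2 \tilde{G}(u_i, \xi)}{\partial \xi^2} < 0$ for any $\xi \in [-n, n]$. Similarly, if $\max_{1 \le i \le n} u_i < -\lambda n$, for any $i$, $2\Phi(u_i + \lambda n) - 1 < 0$. That is, $\tilde{K}(0, n; u_i, \lambda) < 0$. Hence, $\tilde{G}(u_i, \xi) \le \tilde{G}(u_i, n) < 0$. Then $u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi) = \Phi^{-1}\big(\frac{\tilde{G}(u_i, \xi) + 1}{2}\big) < 0$. In this case, $\frac{\partial^2 \tilde{G}(u_i, \xi)}{\partial \xi^2} > 0$ for all $\xi \in [-n, n]$. That is to say, in both cases, $\frac{\partial^2 \tilde{G}(u_i, \xi)}{\partial \xi^2}$ does not change its sign as $\xi$ runs in $[-n, n]$. Hence, there is a unique equilibrium in these two cases.

Proof of Lemma 4.7.2. First, it is easy to see that $\tilde{c}(a; \lambda) = \tilde{c}(-a; \lambda)$, $\tilde{c}(0; \lambda) = 2\lambda\phi(0) + 1 > 0$, and $\lim_{a \to +\infty} \tilde{c}(a; \lambda) = \lim_{a \to -\infty} \tilde{c}(a; \lambda) = -\infty$. Moreover,
$$\frac{d\tilde{c}(a; \lambda)}{da} = 2a\,(3\lambda\phi(a) - 1 - 2\lambda\phi(a)a^2).$$
If $0 < \lambda < \frac{\sqrt{2\pi}}{3}$, $3\lambda\phi(a) - 1 - 2\lambda\phi(a)a^2 < 0$. Then $\frac{d\tilde{c}(a; \lambda)}{da} > 0$ if $a < 0$, and $\frac{d\tilde{c}(a; \lambda)}{da} < 0$ if $a > 0$. Thus, $a = 0$ is the unique peak for $\tilde{c}(\cdot; \lambda)$. By symmetry, there is $\tilde{a}_+ > 0$ such that $\tilde{c}(a; \lambda) > 0$ if $-\tilde{a}_+ < a < \tilde{a}_+$; $\tilde{c}(\tilde{a}_+; \lambda) = \tilde{c}(-\tilde{a}_+; \lambda) = 0$; and $\tilde{c}(a; \lambda) < 0$ for $a > \tilde{a}_+$ or $a < -\tilde{a}_+$. In addition, from $\tilde{c}(\tilde{a}_+; \lambda) = 0$,
$$\frac{d\tilde{a}_+}{d\lambda} = \frac{\phi(\tilde{a}_+)(1 + 2\tilde{a}_+^2)}{\tilde{a}_+\,(2\lambda\phi(\tilde{a}_+)\tilde{a}_+^2 + 1 - 3\lambda\phi(\tilde{a}_+))} > 0.$$

Proof of Proposition 4.7.4. $\frac{\partial^3 \tilde{S}}{\partial \xi^3} = \sum_{i=1}^n \frac{\partial^3 \tilde{G}(u_i, \xi)}{\partial \xi^3}$, where
$$\frac{\partial^3 \tilde{G}(u_i, \xi)}{\partial \xi^3} = -\frac{2\lambda^3 \phi(u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi))\, \tilde{c}(u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi); \lambda)}{[2\lambda\phi(u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi)) + 1]^5}.$$
When $\lambda > 0$, the sign of $\frac{\partial^3 \tilde{G}(u_i, \xi)}{\partial \xi^3}$ is determined by the sign of $\tilde{c}(u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi); \lambda)$. If $\min_{1 \le i \le n} u_i > \lambda(2\Phi(\tilde{a}_+(\lambda)) - 1 + n) + \tilde{a}_+(\lambda)$, for any $i$, $2\Phi\big(u_i - \lambda n - \lambda(2\Phi(\tilde{a}_+(\lambda)) - 1)\big) - 1 > 2\Phi(\tilde{a}_+(\lambda)) - 1$. That is, $\tilde{K}(2\Phi(\tilde{a}_+(\lambda)) - 1, -n; u_i, \lambda) > 0$. Thus, for any $\xi \in [-n, n]$, $\tilde{G}(u_i, \xi) \ge \tilde{G}(u_i, -n) > 2\Phi(\tilde{a}_+(\lambda)) - 1$. Therefore, $u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi) = \Phi^{-1}\big(\frac{\tilde{G}(u_i, \xi) + 1}{2}\big) > \tilde{a}_+$. It then follows from Lemma 4.7.2 that $\tilde{c}(u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi); \lambda) < 0$ for all $i$ and $\xi \in [-n, n]$. Similarly, if $\max_{1 \le i \le n} u_i < \lambda(2\Phi(-\tilde{a}_+(\lambda)) - 1 - n) - \tilde{a}_+(\lambda)$, for all $i$, $2\Phi\big(u_i + \lambda n - \lambda(2\Phi(-\tilde{a}_+(\lambda)) - 1)\big) - 1 < 2\Phi(-\tilde{a}_+(\lambda)) - 1$. That is, $\tilde{K}(2\Phi(-\tilde{a}_+(\lambda)) - 1, n; u_i, \lambda) < 0$.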
Then $\tilde{G}(u_i, \xi) \le \tilde{G}(u_i, n) < 2\Phi(-\tilde{a}_+(\lambda)) - 1$. Therefore, $u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi) = \Phi^{-1}\big(\frac{\tilde{G}(u_i, \xi) + 1}{2}\big) < -\tilde{a}_+$. Applying Lemma 4.7.2, $\tilde{c}(u_i + \lambda\xi - \lambda\tilde{G}(u_i, \xi); \lambda) < 0$ for all $i$ and $\xi \in [-n, n]$. That is to say, when (4.7.11) or (4.7.12) holds, $\frac{\partial^3 \tilde{G}(u_i, \xi)}{\partial \xi^3} > 0$ for all $i$ and $\frac{\partial^3 \tilde{S}}{\partial \xi^3} > 0$ for all $\xi \in [-n, n]$. In this case, $\frac{\partial^2 \tilde{S}}{\partial \xi^2}$ is strictly increasing, so it changes its sign at most once in $[-n, n]$, and there are at most three equilibria.

Table C.1: Binary Choice I: Estimation Comparison

            True    I                                   II                                  III
            value   G = 100           G = 200           G = 100           G = 200           G = 100           G = 200
β0          0       0.9376 (0.2174)   0.9409 (0.1566)   0.1851 (0.2127)   0.1792 (0.1470)   0.0030 (0.2537)   0.0056 (0.1773)
β1          1       0.7397 (0.0957)   0.7470 (0.0662)   0.9237 (0.1050)   0.9281 (0.0674)   1.0210 (0.1299)   1.0165 (0.0848)
β2          1       0.7957 (0.4130)   0.7859 (0.3037)   0.9103 (0.4138)   0.9280 (0.2889)   0.9935 (0.4648)   1.0116 (0.3134)
λ           0.8     -                 -                 0.6263 (0.0037)   0.6266 (0.0004)   0.8259 (0.1047)   0.8113 (0.0833)
m log L             -0.3288 (0.0258)  -0.3288 (0.0180)  -0.2401 (0.0218)  -0.2407 (0.0150)  -0.2356 (0.0231)  -0.2371 (0.0165)

Sample statistics   G = 100                     G = 200
|λ|                 0.6267 (4.4465 × 10^-16)    0.6267 (4.4465 × 10^-16)
r_unctr             0.6744 (0.0496)             0.6735 (0.0343)

Note: Regression I corresponds to the conventional regression without social interactions. Regressions II and III allow for interactions through social relations. Regression II imposes a sufficient condition on λ, assumes equilibrium uniqueness, and uses contraction mapping iteration for equilibrium computation. Regression III does not impose restrictions on the interaction intensity, λ, and uses the homotopy continuation method for equilibrium computation. |λ| is the upper bound on the intensity of social interactions, corresponding to the sufficient condition for contraction mapping in Yang and Lee (2014). r_unctr denotes the proportion of groups in which that condition is violated under the true parameter values.
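The contraction mapping iteration mentioned in the note for Regression II can be illustrated with a minimal sketch. It is not the code used for the table. It assumes the peer-influence special case from the proof of Proposition 4.7.1, where each agent responds to the sum of the other members' expected outcomes, so that an equilibrium solves $\xi_i = \Phi(u_i + \lambda \sum_{j \ne i} \xi_j)$. The function names and the bound $\sqrt{2\pi}/(n-1)$ (the elementary sup-norm contraction condition for this special case, not necessarily the condition of Yang and Lee (2014)) are assumptions made for illustration.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, written with math.erf to stay dependency-free."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def contraction_bound(n):
    """Assumed sup-norm contraction bound sqrt(2*pi)/(n-1) for the peer case."""
    return math.inf if n <= 1 else math.sqrt(2.0 * math.pi) / (n - 1)


def contraction_equilibrium(u, lam, tol=1e-12, max_iter=10_000):
    """Iterate xi_i <- Phi(u_i + lam * sum_{j != i} xi_j) until it settles."""
    n = len(u)
    if abs(lam) >= contraction_bound(n):
        raise ValueError("lam is outside the assumed contraction region")
    xi = [0.5] * n  # any starting point in [0, 1]^n works for a contraction
    for _ in range(max_iter):
        total = sum(xi)
        new_xi = [norm_cdf(u_i + lam * (total - x_i)) for u_i, x_i in zip(u, xi)]
        # Stop when the sup-norm change between two iterates is negligible.
        if max(abs(a - b) for a, b in zip(new_xi, xi)) < tol:
            return new_xi
        xi = new_xi
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    # Hypothetical utilities for a group of five agents.
    print(contraction_equilibrium([0.3, -0.2, 0.5, 0.0, 1.0], lam=0.4))
```

The sketch refuses values of λ outside the assumed bound, because convergence of plain iteration is only guaranteed inside it; outside that region a different solver is needed.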
Table C.2: Binary Choice II: Estimation Comparison for Moderate Interactions

            True value   I                  II                 III                IV
β0          0            0.0298 (0.0741)    -0.0010 (0.0483)   -0.0011 (0.0484)   0.0014 (0.0493)
β1          1            1.0322 (0.0976)    1.0219 (0.1004)    1.0219 (0.1005)    1.0241 (0.0979)
β2          1            1.2152 (0.3356)    0.9906 (0.2666)    0.9894 (0.2678)    1.0175 (0.2848)
λ           0.2          -                  0.1962 (0.0294)    0.1962 (0.0294)    0.1707 (0.0695)
m log L                  -0.4804 (0.0247)   -0.4594 (0.0246)   -0.4594 (0.0246)   -0.4620 (0.0264)

Sample and estimated statistics
|λ|         0.3133 (1.6758 × 10^-16)
r_unctr     0 (0)
n_e         1 (0)
r_m         0 (0)
n̂_e         1 (0)
r̂_m         0 (0)

Note: In each simulation, the number of independent groups is G = 100. The population of every group is n = 5. Regression I corresponds to the conventional regression without social interactions. Regressions II, III, and IV take social interactions into account. Regressions II and III assume equilibrium uniqueness. Regression II restricts the interaction intensity to satisfy the sufficient condition in Yang and Lee (2014) and uses contraction mapping iterations to solve for the equilibrium. Regression III does not restrict the interaction intensity and computes the equilibrium by solving nonlinear equations with Newton's method. Regression IV allows for equilibrium multiplicity, uses the homotopy continuation method to solve for the equilibrium set, and selects an equilibrium according to the expected total utilities to complete the model. |λ| stands for the upper bound on interaction intensity which ensures equilibrium uniqueness in a sample according to Yang and Lee (2014). r_unctr represents the proportion of groups which violate that sufficient condition. n_e and n̂_e refer respectively to the average sample and estimated number of equilibria in a group. r_m and r̂_m report the sample and estimated proportion of groups with multiple equilibria, respectively. Numbers in parentheses are standard deviations.
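As an illustration of the Newton step described for Regression III, the following sketch solves the equilibrium system of Binary Choice II in the peer-influence special case of Proposition 4.7.3, $\xi_i = 2\Phi(u_i + \lambda \sum_{j \ne i} \xi_j) - 1$. It is a minimal sketch under that assumption, not the estimation code behind the table; the function names, the starting point, and the small Gaussian elimination helper are hypothetical choices made to keep the example free of external libraries.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT2PI = math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / SQRT2))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / SQRT2PI


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def newton_equilibrium(u, lam, tol=1e-12, max_iter=50):
    """Newton's method on F_i(xi) = xi_i - (2*Phi(u_i + lam*sum_{j!=i} xi_j) - 1)."""
    n = len(u)
    xi = [0.0] * n  # hypothetical starting point
    for _ in range(max_iter):
        total = sum(xi)
        a = [u_i + lam * (total - x_i) for u_i, x_i in zip(u, xi)]
        resid = [x_i - (2.0 * norm_cdf(a_i) - 1.0) for x_i, a_i in zip(xi, a)]
        if max(abs(r) for r in resid) < tol:
            return xi
        # Jacobian: dF_i/dxi_i = 1 and dF_i/dxi_j = -2*lam*phi(a_i) for j != i.
        jac = [[1.0 if i == j else -2.0 * lam * norm_pdf(a[i]) for j in range(n)]
               for i in range(n)]
        step = solve_linear(jac, [-r for r in resid])
        xi = [x_i + s_i for x_i, s_i in zip(xi, step)]
    raise RuntimeError("Newton iteration did not converge")


if __name__ == "__main__":
    print(newton_equilibrium([0.5, -0.3, 0.1, 0.8, -0.6], lam=0.2))
```

Newton's method returns a single solution that depends on the starting point, which is why it fits the regressions that assume equilibrium uniqueness and not the one that allows for multiple equilibria.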
Table C.3: Binary Choice II: Estimation Comparison for Large Interactions

            True value   I                  II                 III                IV
β0          0            0.0899 (0.1339)    0.0021 (0.0431)    -0.0042 (0.0525)   0.0439 (0.0694)
β1          1            0.4767 (0.0525)    0.5191 (0.0636)    0.5748 (0.2164)    1.0544 (0.2492)
β2          1            1.1328 (0.4606)    0.6596 (0.2650)    0.6197 (0.3941)    0.9346 (0.3442)
λ           0.8          -                  0.3109 (0.0039)    0.3900 (0.0277)    0.8325 (0.0968)
α           1            -                  -                  -                  1.0776 (0.6221)
m log L                  -0.6031 (0.0200)   -0.4312 (0.0349)   -0.3989 (0.0477)   -0.1218 (0.0235)

Sample and estimated statistics
|λ|         0.3133 (2.2232 × 10^-16)
r_unctr     1 (0)
n_e         2.2918 (0.0784)
r_m         0.7811 (0.0407)
n̂_e         2.2487 (0.1716)
r̂_m         0.7642 (0.1015)

Note: In each simulation, the number of independent groups is G = 100. The population of every group is n = 5. Regression I corresponds to the conventional regression without social interactions. Regressions II, III, and IV take social interactions into account. Regressions II and III assume equilibrium uniqueness. Regression II restricts the interaction intensity to satisfy the sufficient condition in Yang and Lee (2014) and uses contraction mapping iterations to solve for the equilibrium. Regression III does not restrict the interaction intensity and computes the equilibrium by solving nonlinear equations with Newton's method. Regression IV allows for equilibrium multiplicity, uses the homotopy continuation method to solve for the equilibrium set, and selects an equilibrium according to the expected total utilities to complete the model. |λ| stands for the upper bound on interaction intensity which ensures equilibrium uniqueness in a sample according to Yang and Lee (2014). r_unctr represents the proportion of groups which violate that sufficient condition. n_e and n̂_e refer respectively to the average sample and estimated number of equilibria in a group. r_m and r̂_m report the sample and estimated proportion of groups with multiple equilibria, respectively. Numbers in parentheses are standard deviations.
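To make the idea of counting equilibria concrete, the sketch below enumerates the zeros of $\tilde{S}(u, \xi) = \sum_{i} \tilde{G}(u_i, \xi) - \xi$ on $[-n, n]$ for one group, in the peer-influence special case of Proposition 4.7.3 and for $\lambda > 0$. It deliberately swaps in a simpler technique than the one used for the table: a grid scan for sign changes followed by bisection, not homotopy continuation. The function names, grid size, and example utilities are hypothetical, and roots that fall exactly on a grid point or that come in pairs inside one grid cell would be missed.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def g_tilde(u_i, xi, lam, iters=80):
    """Solve 2*Phi(u_i + lam*xi - lam*g) - 1 - g = 0 for g in (-1, 1) by bisection.

    Assumes lam > 0, so the left-hand side is strictly decreasing in g.
    """
    lo, hi = -1.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 2.0 * norm_cdf(u_i + lam * (xi - mid)) - 1.0 - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def s_tilde(u, xi, lam):
    """S~(u, xi): sum of individual best responses minus the conjectured total."""
    return sum(g_tilde(u_i, xi, lam) for u_i in u) - xi


def find_equilibria(u, lam, grid_size=1001, iters=60):
    """Return the zeros of S~(u, .) on [-n, n] found by grid scan plus bisection."""
    n = len(u)
    grid = [-n + 2.0 * n * k / (grid_size - 1) for k in range(grid_size)]
    values = [s_tilde(u, x, lam) for x in grid]
    roots = []
    for k in range(grid_size - 1):
        if values[k] * values[k + 1] < 0.0:  # strict sign change brackets a zero
            lo, hi, f_lo = grid[k], grid[k + 1], values[k]
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                f_mid = s_tilde(u, mid, lam)
                if f_lo * f_mid > 0.0:
                    lo, f_lo = mid, f_mid
                else:
                    hi = mid
            roots.append(0.5 * (lo + hi))
    return roots


if __name__ == "__main__":
    u = [0.1, -0.05, 0.02, 0.0, 0.03]  # hypothetical utilities, n = 5
    for lam in (0.1, 0.8):
        print(lam, find_equilibria(u, lam))
```

Each zero $\xi$ maps back to individual expected outcomes through $\xi_i = \tilde{G}(u_i, \xi)$, so the length of the returned list is the number of equilibria for that group in this special case.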
[Figure: plot titled "The Haar Basis Functions"; horizontal axis: x from 0 to 1; curves labeled $\tau_0$ through $\tau_7$.]
Figure C.1: The Haar Basis Functions

[Figure: three panels titled "n=2, u=1", "n=5, u=1", and "n=10, u=1"; horizontal axis: tpsi; one curve for each of λ = 0, 0.2, 0.4, 0.6, 0.8, 1.]
Figure C.2: Equilibrium Illustration for Binary Choice I with Influences from Peers A
Note: In this figure, “tpsi” refers to the expected total outcomes in a group of symmetric members for the Binary Choice Model I with influences from peers. For each value of λ, an equilibrium expected total outcome is a zero of a nonlinear function, whose graph is depicted as a curve. The characteristics of the equilibrium set may differ as the group population, n, and homogeneous individual utility, u, vary. The left, middle, and right diagrams respectively correspond to three cases: n = 2, u = 2; n = 5, u = 2; and n = 10, u = 2.

[Figure: three panels titled "n=2, max u=2, min u=1", "n=5, max u=2, min u=1", and "n=10, max u=2, min u=1"; horizontal axis: tpsi; one curve for each of λ = 0, 0.2, 0.4, 0.6, 0.8, 1.]
Figure C.3: Equilibrium Illustration for Binary Choice I with Influences from Peers B
Note: In this figure, “tpsi” refers to the expected total outcomes in a group of asymmetric members for the Binary Choice Model I with Influences from peers. For each value of λ, an equilibrium expected total outcome is a zero of a nonlinear function, whose graph is depicted as a curve. The characteristics of the equilibrium set may differ as the group population, n, and heterogeneous individual utilities, $u_i$'s, vary.
The left, middle, and right diagrams respectively correspond to three cases: n = 2, $\max_i u_i = 2$, $\min_i u_i = 1$; n = 5, $\max_i u_i = 2$, $\min_i u_i = 1$; and n = 10, $\max_i u_i = 2$, $\min_i u_i = 1$.

[Figure: three panels titled "n=2, u= −2", "n=5, u= −2", and "n=10, u= −2"; horizontal axis: tpsi; one curve for each of λ = 0, 0.2, 0.4, 0.6, 0.8, 1.]
Figure C.4: Equilibrium Illustration for Binary Choice I with Influences from Peers C
Note: In this figure, “tpsi” refers to the expected total outcomes in a group of symmetric members for the Binary Choice Model I with influences from peers. For each value of λ, an equilibrium expected total outcome is a zero of a nonlinear function, whose graph is depicted as a curve. The characteristics of the equilibrium set may differ as the group population, n, and homogeneous individual utility, u, vary. The left, middle, and right diagrams respectively correspond to three cases: n = 2, u = −2; n = 5, u = −2; and n = 10, u = −2.

[Figure: three panels titled "n=2, max u= −2, min u= −3", "n=5, max u= −2, min u= −3", and "n=10, max u= −2, min u= −3"; horizontal axis: tpsi; one curve for each of λ = 0, 0.2, 0.4, 0.6, 0.8, 1.]
Figure C.5: Equilibrium Illustration for Binary Choice I with Influences from Peers D
Note: In this figure, “tpsi” refers to the expected total outcomes in a group of asymmetric members for the Binary Choice Model I with Influences from peers. For each value of λ, an equilibrium expected total outcome is a zero of a nonlinear function, whose graph is depicted as a curve. The characteristics of the equilibrium set may differ as the group population, n, and heterogeneous individual utilities, $u_i$'s, vary.
The left, middle, and right diagrams respectively correspond to three cases: n = 2, $\max_i u_i = -2$, $\min_i u_i = -3$; n = 5, $\max_i u_i = -2$, $\min_i u_i = -3$; and n = 10, $\max_i u_i = -2$, $\min_i u_i = -3$.

[Figure: plot titled "Equilibria as the Interaction Intensity Increases"; horizontal axis: λ from 0 to 1; series $n_e$, $r_m$, $r_u$.]
Figure C.6: Equilibrium Illustration for Binary Choice I with General Social Relations
Note: This figure shows the features of the equilibrium set in Binary Choice Model I as the interaction intensity, λ, increases, for a sample of G = 100 groups with homogeneous group population n = 5. $n_e$ represents the average number of equilibria of the sample. $r_m$ is the ratio of groups with more than one equilibrium. $r_u$ stands for the proportion of groups whose social relation matrix does not satisfy the sufficient condition for contraction mapping in Yang and Lee (2014).

[Figure: plot titled "Equilibrium Outcomes as Interaction Intensity Increases"; horizontal axis: λ from 0 to 1; series $m_e$, $r_1$, $r_0$.]
Figure C.7: Equilibrium Outcomes for Binary Choice I with General Social Relations
Note: This figure shows the features of the equilibrium outcomes in Binary Choice Model I as the interaction intensity, λ, increases, for a sample of G = 100 groups with homogeneous group population n = 5. $m_e$, $r_1$, and $r_0$ respectively represent the average expected (individual) outcomes, the ratio of agents who choose “1”, and the ratio of agents who choose “0”, of the unique equilibrium.

[Figure: three panels titled "n=2, u=n+1", "n=5, u=n+1", and "n=10, u=n+1"; horizontal axis: tpsi; one curve for each of λ = 0, 0.2, 0.4, 0.6, 0.8, 1.]
Figure C.8: Equilibrium Illustration for Binary Choice II with Influences from Peers A
Note: In this figure, “tpsi” refers to the expected total outcomes in a group of symmetric members for the Binary Choice Model II with influences from peers.
For each value of λ, an equilibrium expected total outcome is a zero of a nonlinear function, whose graph is depicted as a curve. The characteristics of the equilibrium set may differ as the group population, n, and homogeneous individual utility, u, vary. The left, middle, and right diagrams respectively correspond to three cases: n = 2, u = n + 1; n = 5, u = n + 1; and n = 10, u = n + 1.

[Figure: three panels titled "n=2, max u=n+3, min u=n+1", "n=5, max u=n+3, min u=n+1", and "n=10, max u=n+3, min u=n+1"; horizontal axis: tpsi; one curve for each of λ = 0, 0.2, 0.4, 0.6, 0.8, 1.]
Figure C.9: Equilibrium Illustration for Binary Choice II with Influences from Peers B
Note: In this figure, “tpsi” refers to the expected total outcomes in a group of asymmetric members for the Binary Choice Model II with Influences from peers. For each value of λ, an equilibrium expected total outcome is a zero of a nonlinear function, whose graph is depicted as a curve. The characteristics of the equilibrium set may differ as the group population, n, and heterogeneous individual utilities, $u_i$'s, vary. The left, middle, and right diagrams respectively correspond to three cases: n = 2, $\max_i u_i = n + 3$, $\min_i u_i = n + 1$; n = 5, $\max_i u_i = n + 3$, $\min_i u_i = n + 1$; and n = 10, $\max_i u_i = n + 3$, $\min_i u_i = n + 1$.

[Figure: three panels titled "n=2, u=1", "n=5, u=1", and "n=10, u=1"; horizontal axis: tpsi; one curve for each of λ = 0, 0.2, 0.4, 0.6, 0.8, 1.]
Figure C.10: Equilibrium Illustration for Binary Choice II with Influences from Peers C
Note: In this figure, “tpsi” refers to the expected total outcomes in a group of symmetric members for the Binary Choice Model II with influences from peers.
For each value of λ, an equilibrium expected total outcome is a zero of a nonlinear function, whose graph is depicted as a curve. The characteristics of the equilibrium set may differ as the group population, n, and homogeneous individual utility, u, vary. The left, middle, and right diagrams respectively correspond to three cases: n = 2, u = 1; n = 5, u = 1; and n = 10, u = 1.

[Figure: three panels titled "n=2, max u=2, min u=1", "n=5, max u=2, min u=1", and "n=10, max u=2, min u=1"; horizontal axis: tpsi; one curve for each of λ = 0, 0.2, 0.4, 0.6, 0.8, 1.]
Figure C.11: Equilibrium Illustration for Binary Choice II with Influences from Peers D
Note: In this figure, “tpsi” refers to the expected total outcomes in a group of asymmetric members for the Binary Choice Model II with Influences from peers. For each value of λ, an equilibrium expected total outcome is a zero of a nonlinear function, whose graph is depicted as a curve. The characteristics of the equilibrium set may differ as the group population, n, and heterogeneous individual utilities, $u_i$'s, vary. The left, middle, and right diagrams respectively correspond to three cases: n = 2, $\max_i u_i = 2$, $\min_i u_i = 1$; n = 5, $\max_i u_i = 2$, $\min_i u_i = 1$; and n = 10, $\max_i u_i = 2$, $\min_i u_i = 1$.

[Figure: plot with horizontal axis λ from 0 to 1; series $n_e$, $r_m$, $r_u$.]
Figure C.12: Equilibrium Illustration for Binary Choice II with General Social Relations
Note: This figure shows the features of the equilibrium set in Binary Choice Model II as the interaction intensity, λ, increases, for a sample of G = 100 groups with homogeneous group population n = 5. $n_e$ represents the average number of equilibria of the sample. $r_m$ is the ratio of groups with more than one equilibrium.
$r_u$ stands for the proportion of groups whose social relation matrix does not satisfy the sufficient condition for contraction mapping in Yang and Lee (2014).

[Figure: plot with horizontal axis λ from 0 to 1 and vertical axis from −1 to 1; series $m_e$, $m_y$, $r_p$, $r_m$.]
Figure C.13: Equilibrium Outcomes for Binary Choice II with General Social Relations
Note: This figure shows the features of the equilibrium outcomes in Binary Choice Model II as the interaction intensity, λ, increases, for a sample of G = 100 groups with homogeneous group population n = 5. $m_e$, $m_y$, $r_p$, and $r_m$ respectively represent the average expected (individual) outcomes, the average individual choices, the ratio of agents who choose “1”, and the ratio of agents who choose “-1”, of the unique/selected equilibrium.

[Figure: plot with horizontal axis $m^*_1$ and vertical axis $m^*_2$, both from −4 to 4; legend entries (r,0), (0,r), (−r,0), (0,−r), S(a(1,0)), S(a(0,1)), S(a(−1,0)), S(a(0,−1)).]
Figure C.14: Homotopic Mappings on Sphere for Binary Choices