*Francesco Vidoli, **Claudio Mazziotta

Spatial Composite and Disaggregate Indicators: Chow-Lin Methods and Applications

Key words: Spatial Data Analysis, Composite Indicators Disaggregation, Chow-Lin approach.

Abstract

The present paper aims to verify a statistical procedure that allows for the transition from aggregate indicators to disaggregate indicators, where the terms aggregate/disaggregate refer to territorial areas and not to categories. More specifically, we attempt to reconstruct the indicators of infrastructural endowment at the provincial level (in Italy) on the basis of analogous indicators that are available at the regional level. For this purpose, following Chow-Lin’s approach, we test some covariates representing the territorial demand for infrastructure, related to productive, demographic and tourist aspects. Comparing the results obtained with this approach against the “real” data (available at the provincial level) makes it possible to verify the coherence between the factors of demand and the real distribution of infrastructure in Italy.

Introduction

The present paper aims to verify whether, and to what extent, the methods of spatial disaggregation derived from Chow-Lin’s approach allow the infrastructural index at the disaggregate level to be reconstructed with appreciable precision on the basis of the corresponding indicator at the higher territorial level. This essentially means verifying whether the infrastructural levels of the territorial units (the Italian provinces) may be “explained” by what should be their natural generating factors, namely “demand” factors, in particular those of a demographic and economic nature.
More specifically, having obtained an estimate of the provincial indicators of infrastructure with the aforementioned methods, these indicators are then compared with the “real” ones, available from the statistics provided by ISTAT (2006). If this comparison shows a good approximation between the “real” data and the estimated ones, it may be deduced that the provincial infrastructural endowment conforms to the corresponding demographic and economic factors of demand; the opposite conclusion is drawn if the two data series do not match.

* Francesco Vidoli, “Roma Tre” University, Department of Public Institutions, Economy and Society (fvidoli@sose.it)
** Claudio Mazziotta, “Roma Tre” University, Department of Public Institutions, Economy and Society (c.mazziotta@uniroma3.it)

THE METHODOLOGICAL APPROACH

In order to derive indicators at the disaggregate level from those at the aggregate level¹, we propose a model based on the approach presented by Chow and Lin (1971), which is founded on three fundamental hypotheses:

• Structural similarity: the aggregate model and the disaggregate model are structurally similar. This implies that the relationships between the variables observed at the aggregate level are the same as those at the disaggregate level, that is, the regression parameters are the same in both models.
• Error similarity: the spatially correlated errors have the same structure at the aggregate level and at the disaggregate level; that is to say, the spatial correlations are not significantly different.
• Reliable indicators: the interpolating variables have sufficiently large predictive power at both the aggregate and the disaggregate level, that is, the R² (or the F test) of the regression model differs significantly from zero.
Note that violating hypothesis 1 leads to systematically biased estimates; violating hypothesis 2 means that spillover effects contribute largely to the estimates; and violating hypothesis 3 implies that the disaggregate estimates simply mirror a proportional share of the aggregate ones.

Such models have mostly been used to construct monthly or quarterly series from annual series but, in recent years, they have also been used in some spatial applications. As for Italy, it is worth mentioning the work of Bollino and Polinori (2007), who reconstruct Value Added at the municipal level in Umbria from the point of view of the convergence between suburban townships and urban municipalities that benefit from higher growth, explained by contiguity factors and by agglomeration mechanisms.

The model is characterized both by an econometric relationship between the composite indicator at the provincial level and a series of explanatory variables observable at the disaggregate level (and obviously also at the aggregate one), and by a methodology for inferring the unknown parameters. Mazziotta and Vidoli (2009b) tested a first application of the model to infrastructural data. The model is based on the assumption that a linear econometric relationship holds at the disaggregate level:

$$y_d = X_d \beta_d + \varepsilon_d \qquad (1)$$

where $y_d$ is an (n × 1) vector of observations of the composite indicator at the disaggregate level, $X_d$ is an (n × k) matrix of observations of k explanatory variables observable at the disaggregate level, and n is the number of provinces.

¹ For the sake of clarity, the term “aggregate indicators” refers to an aggregate territorial measure (for example, the average value at the regional level) of a disaggregate indicator (at the provincial level).
“Aggregate” should not be confused with “composite”: the latter describes a summary measure of simple indicators that always refer to the same territorial level. Formally, the relationship between the aggregate indicator and the disaggregate ones (for example, through the mean operator) can be described as:

$$I_a = \frac{\sum_{i=1}^{n} I_{d_i}}{n}, \quad \forall i \in I(a)$$

where $I(a)$ represents the territorial area (region) in which the individual units (provinces) are included.

It is assumed that C is an aggregation matrix with one row per region and one column per province (n being the number of Italian provinces), capable of transforming the disaggregate observations into aggregate ones; this transformation may obviously be obtained through any operator. In particular, if the sum operator is chosen, the regional values are obtained as the sum of the corresponding provincial values ($y_a = \sum y_d$) and the generic element of C, for region j and province i, is constructed as:

$$C_{j,i} = \begin{cases} 1, & \text{if province } i \in \text{region } j \\ 0, & \text{elsewhere} \end{cases}$$

If the arithmetic mean operator is chosen instead, C is built as:

$$C_{j,i} = \begin{cases} \dfrac{1}{k}, & \text{if province } i \in \text{region } j \\ 0, & \text{elsewhere} \end{cases}$$

where k is the number of provinces belonging to region j, and the regional estimates are reconstructed as the average of the provincial estimates² ($y_a = E(y_d)$).

Therefore, assuming the hypothesis of structural similarity ($\beta_d = \hat{\beta}_a$), it is possible to write:

$$y_a = X_a \beta_d + \varepsilon_a \qquad (2)$$

under the aggregation constraints $y_a = C y_d$, $X_a = C X_d$ and $\varepsilon_a = C \varepsilon_d$.

Recently, Polasek and Sellner (2008) presented an advancement, or rather a very interesting generalization of the model, introducing a spatial autocorrelation term³ into a classical multivariate regression (equation (2)). From an application point of view, this means that the value of the dependent variable in a specific area depends not only on its own independent variables, but also on the level of the variable in the neighbouring areas.
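As a minimal sketch of the aggregation constraint $y_a = C y_d$ introduced above, the following Python fragment builds C for the sum and the mean operators. The four provinces, their assignment to two regions, the indicator values and the function names are invented for illustration; rows of C are taken to index regions and columns to index provinces.

```python
# Illustrative only: province/region names and values are made up.
province_region = {"P1": "RegA", "P2": "RegA", "P3": "RegB", "P4": "RegB"}
provinces = list(province_region)                  # column order of C
regions = sorted(set(province_region.values()))    # row order of C


def aggregation_matrix(operator="sum"):
    """Return C (regions x provinces) for the 'sum' or 'mean' operator."""
    if operator not in ("sum", "mean"):
        raise ValueError("operator must be 'sum' or 'mean'")
    C = []
    for region in regions:
        members = [p for p in provinces if province_region[p] == region]
        # 1 for the sum operator, 1/k for the mean operator (k = provinces in region)
        weight = 1.0 if operator == "sum" else 1.0 / len(members)
        C.append([weight if province_region[p] == region else 0.0
                  for p in provinces])
    return C


def aggregate(C, y_d):
    """Compute y_a = C y_d for a plain list of provincial values."""
    return [sum(c * y for c, y in zip(row, y_d)) for row in C]


y_d = [2.0, 4.0, 1.0, 5.0]               # hypothetical provincial indicator
C_sum = aggregation_matrix("sum")
C_mean = aggregation_matrix("mean")
y_a_sum = aggregate(C_sum, y_d)          # regional totals
y_a_mean = aggregate(C_mean, y_d)        # regional averages
```

With these made-up values both regions total 6 and average 3; the same C also aggregates each column of $X_d$ into $X_a$.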
Indeed, assuming that spatial correlation effects exist, as competition among the provinces but also, and especially, among very similar provinces, then (see, for example, Anselin, 1988), given a matrix of spatial weights $W_N$ and a spatial lag parameter $\rho \in [0,1]$, it is possible to hypothesize at the disaggregate level a “mixed regressive spatial autoregressive” relationship:

$$y_d = \rho_d W_N y_d + X_d \beta_d + \varepsilon_d, \qquad \varepsilon_d \sim N[0, \sigma_d^2 I_N] \qquad (3)$$

The reduced form of equation (3) lets us better appreciate the spatial component through which the contribution of $X_d$ is filtered:

$$y_d = (I - \rho_d W_N)^{-1} X_d \beta_d + (I - \rho_d W_N)^{-1} \varepsilon_d \qquad (4)$$

² In this specific case, the arithmetic average is used.
³ For a preliminary introduction to statistical applications in the field of urban planning, see Bickman et al. (1998) and, for more recent approaches, Ayuga-Téllez et al. (2011).

More specifically, this spatial filter is applied in proportion to distance: if the expression $(I - \rho_d W_N)^{-1}$ is expanded as a series (similarly to a Leontief inverse matrix), the following is obtained:

$$E(y_d \mid X_d) = (I + \rho_d W_N + \rho_d^2 W_N^2 + \dots) X_d \beta_d \qquad (5)$$

From equation (5) it is easier to see that all the contiguous areas are involved in the estimate of $y_d$, through coefficients that shrink with distance (distance decay). It is therefore possible to rewrite the reduced form of equation (4) with $R_N = (I - \rho_d W_N)$:

$$y_d = R_N^{-1} X_d \beta_d + R_N^{-1} \varepsilon_d, \qquad \varepsilon_d \sim N[0, \Sigma_d] \qquad (6)$$

with the covariance matrix $\Sigma_d$ equal to:

$$\Sigma_d = \sigma_d^2 (R_N' R_N)^{-1} \qquad (7)$$

The unknown terms of the model at the disaggregate level are therefore $\rho_d$, $\beta_d$ and the variance $\sigma_d^2$.
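The series expansion in equation (5) can be checked numerically on a toy example. The sketch below assumes a hypothetical row-standardised contiguity matrix for four units placed on a line and ρ = 0.37 (the value is borrowed from Table 2 purely for illustration). It accumulates the truncated series I + ρW + ρ²W² + … and verifies that multiplying it by (I − ρW) gives approximately the identity, that is, that the series approximates the inverse.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical contiguity for 4 units on a line (1-2-3-4), rows sum to 1.
W = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
rho = 0.37  # illustrative spatial lag parameter


def spatial_multiplier(W, rho, order):
    """Truncated series I + rho*W + ... + rho^order * W^order."""
    n = len(W)
    total = identity(n)
    term = identity(n)
    for _ in range(order):
        # next term of the series: previous term times rho*W
        term = [[rho * v for v in row] for row in matmul(term, W)]
        total = [[t + s for t, s in zip(t_row, s_row)]
                 for t_row, s_row in zip(total, term)]
    return total


S = spatial_multiplier(W, rho, order=60)
R = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(4)]
     for i in range(4)]
check = matmul(R, S)  # should be close to the identity matrix
max_error = max(abs(check[i][j] - (1.0 if i == j else 0.0))
                for i in range(4) for j in range(4))
```

In the first row of `S`, the weight attached to the adjacent unit is larger than the weight of the unit two steps away, which in turn exceeds that of the unit three steps away: this is the distance decay described in the text.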
To estimate these unknowns, it is possible, under the basic hypotheses, to exploit the relationship between y and X at the aggregate level and to estimate the mixed autoregressive model (see, for example, LeSage, 1998) at the aggregate level, in the form:

$$y_a = \rho_a W_N y_a + C X_d \beta_a + \varepsilon_a, \qquad \varepsilon_a \sim N[0, \sigma_a^2 I_N] \qquad (8)$$

obtaining $\hat{\rho}_a$ and $\hat{\sigma}_a^2$. Under the structural similarity ($\rho_d = \hat{\rho}_a$, $\beta_d = \hat{\beta}_a$) and error similarity ($\sigma_d^2 = \hat{\sigma}_a^2$) hypotheses, it is possible to substitute the estimated parameters in equations (3) and (6). As for the estimate of $\beta_a$, following Chow-Lin’s classic method, the following is obtained⁴:

$$\hat{\beta}_{a,GLS} = \left( X_a' (C \hat{\Sigma}_d C')^{-1} X_a \right)^{-1} X_a' (C \hat{\Sigma}_d C')^{-1} y_a \qquad (9)$$

and the estimate of $y_d$ at the disaggregate level can be constructed as:

$$\hat{y}_d = \underbrace{\hat{R}_N^{-1} X_d \hat{\beta}_a}_{\text{1st term}} + \underbrace{\hat{\Sigma}_d C' (C \hat{\Sigma}_d C')^{-1} \left( y_a - C \hat{R}_N^{-1} C' X_a \hat{\beta}_a \right)}_{\text{2nd term}} \qquad (10)$$

The first term of equation (10) therefore represents the naïve estimate of the unknown vector $y_d$, while the second term distributes the estimation error observed at the aggregate level; the matrix that performs this distribution is called the “gain projection matrix” G (Goldberger, 1962):

$$G = \hat{\Sigma}_d C' (C \hat{\Sigma}_d C')^{-1} \qquad (11)$$

⁴ Please note that $\hat{\beta}_{a,GLS}$ does not depend on $\hat{\sigma}_a^2$, but rather on $\hat{\rho}_a$.

This quantity crucially depends on the spatial lag parameter $\hat{\rho}_a$ at the aggregate level; note that if $\hat{\rho}_a = 0$, the matrix $\hat{\Sigma}_d$ is equal (up to the factor $\hat{\sigma}^2$) to the identity matrix and G reduces to the transposed projection matrix $G = C'(CC')^{-1}$, as in the base model provided in equation (1). The parameter $\rho_d$ and the matrix $W_N$ therefore ensure that the residual at the aggregate level is not assigned in equal 1/N parts to all the disaggregate units; instead, it is filtered through the spatial weights matrix.
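The role of the gain projection matrix in equation (11) can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: four provinces on a line, two regions aggregated with the sum operator, a row-standardised contiguity matrix, and hypothetical regional residuals. Since $\sigma^2$ cancels in G, the sketch uses $(R'R)^{-1}$ directly in place of $\hat{\Sigma}_d$.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def gain_matrix(C, W, rho):
    """G = Sigma C' (C Sigma C')^-1 with Sigma proportional to (R'R)^-1."""
    n = len(W)
    R = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)]
         for i in range(n)]
    Sigma = inverse(matmul(transpose(R), R))
    SCt = matmul(Sigma, transpose(C))
    return matmul(SCt, inverse(matmul(C, SCt)))


# Hypothetical setup: provinces 1-2 form region A, provinces 3-4 region B.
C = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
W = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
e_a = [2.0, -1.0]  # hypothetical regional residuals

G0 = gain_matrix(C, W, 0.0)    # no spatial lag: G = C'(CC')^-1
G1 = gain_matrix(C, W, 0.37)   # spatial lag filters the allocation
shares0 = [sum(g * e for g, e in zip(row, e_a)) for row in G0]
shares1 = [sum(g * e for g, e in zip(row, e_a)) for row in G1]
```

With ρ = 0 each regional residual is split equally among the provinces of the region; with ρ ≠ 0 the split is generally unequal, but in both cases the provincial shares add back up to the regional residual, because C G is the identity.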
An application to the infrastructure indicators

The application⁵ of the model illustrated above assumes the availability of the following information: i) synthetic infrastructural indicators at the regional level (and at the provincial level, for the subsequent verification of the model’s accuracy); ii) demographic and economic variables correlated with infrastructural needs, at the provincial and regional level. The former are derived from a particular method of synthesizing elementary indicators (source: ISTAT) applied by the authors in a previous work (Mazziotta and Vidoli, 2009a)⁶. The latter are reported in Table 1.

Table 1 Generation factors of infrastructure endowment (sources used)

Variable                                                                                 Source            Year
Gross domestic product                                                                   Ist. Tagliacarne  2007
Share of population residing in municipalities with more than 50 thousand inhabitants    ISTAT             2007
Accommodation capacity of low to mid range hotels per inhabitant                         ISTAT             2006

On the basis of these data, we estimate a spatial simultaneous autoregressive lag and mixed model at the aggregate (regional) level,

$$y_a = \rho_a W_N y_a + C X_d \beta_a + \varepsilon_a$$

in order to obtain a good level of efficiency, according to hypothesis 3 (reliable indicators). The variables in the model were selected along two lines: first an economic criterion, choosing dimensions logically related to the dependent variable, and second a statistical criterion, choosing a model that is statistically significant and has good predictive capability.

⁵ The analysis was developed in R (package spdep). The R code, written by the authors, is available from them on request.
⁶ ISTAT indicators refer to the publication “Atlante statistico territoriale delle infrastrutture”, ISTAT 2008, available at http://www3.istat.it/dati/catalogo/20080805_01/.
The logic that links territorial development and infrastructure indicators lies in the idea that, from the standpoint of regional analysis (see, in particular, the regional development potential approach of Biehl, 1994), differences in starting points weigh heavily on the growth opportunities of a specific area. Among these differences, the infrastructural one is of great importance, because infrastructure is a (relatively) immovable factor that affects the efficiency of production processes. A better provision of public capital increases productivity and lowers the acquisition costs of private production factors (i.e., private capital and skilled labour), making them more profitable and thus increasing the probability of attracting them to, or keeping them in, a given region.

Table 2 Estimated model results⁷ at the regional level* (ρ = 0.37, R² = 0.43, AIC = -26.497)

Variable                                                                                    Estimate    Std. Error⁸  z value  Pr(>|z|)
Gross domestic product                                                                      7.893E-06   0.000        2.1024   0.0355
Share of the population residing in municipalities with more than 50 thousand inhabitants   3.916E-01   0.181        2.1587   0.0309
Accommodation capacity of low to mid range hotels per inhabitant                            -1.182E-02  0.005        -2.2568  0.0240

* See equation (8)

The estimated regression shows that the model is satisfactory (Table 2 shows that the coefficients of the independent variables are all significant, that ρ equals 0.37 with a p-value of 0.043, and that the goodness of fit estimated by R² equals 0.43), both for the meaning of the variables included as regressors (proxies of economic development, of demographic density and of the supply of qualified tourism) and for the verified statistical properties.
Having obtained $\hat{\beta}_a$ and the parameter $\hat{\rho}_a$, and relying on the hypotheses of structural similarity and error similarity, these parameters were substituted into equation (10) in order to obtain the estimated infrastructural indicator at the disaggregate level. The results indicate a considerable gap with respect to the “real” data available at the provincial level; this emerges both from the graphic examination of the distributions (Figure 1) and from the application of a specific indicator of spatial robustness (called IRS, see Table 3). The latter is a spatial robustness indicator applied to ranks (in this case, the positions held by the individual provinces on the basis of the infrastructural indicators, the “real” ones and those resulting from the model); it varies from 0 to 1 and is built to highlight not only the average rank differences but also between which territorially identified units these differences have arisen. For greater detail, see a previous work (Mazziotta and Vidoli, 2009b); here it is sufficient to note that the indicator results from the product of two matrices: the first (contiguity matrix, W) identifies the territorial contiguity among the units (the provinces, in our case); the second (transition matrix, T) highlights the rank differences and between which units each difference has arisen. Multiplying the matrix I-W by T yields an index that, comparing the ranks held by the provinces in the two situations considered (the “real” infrastructural indicator and the one calculated with the model), highlights the rank changes that involved only units that are not spatially contiguous.
The indicator of spatial robustness (IRS) between two ranking distributions (see the Appendix) may be expressed in algebraic form as follows:

$$IRS_{R_0,R_1} = \frac{\sum_{i,j} T_{i,j} (1 - w_{i,j})}{MaxI}$$

It should be noted that the maximum of the proposed index (MaxI) corresponds to the worst situation in terms of conformity of the two rankings, that is, the one in which the unit i that was in first position in the ranking R0 is in last position in the ranking R1, and so on, for as many non-contiguous units, (n-1) * (n-2) * …, or as many times as there is a value greater than zero in the matrix T(I-W).

⁷ Please note that the R² value for a spatial lag regression (or a spatial error regression) is not defined in the same way as for ordinary least squares. We also report the AIC value, which we use to compare different models (please note that AIC can be negative and that the lowest value indicates the best model).
⁸ Standard errors approximated through the numerical Hessian.

Table 3 and Figure 1 show many marked differences between the two rankings: the IRS is equal to 0.38, with an average change in the ranks among different areas (in this specific case, regions) that is particularly high (25).

Table 3 Index of spatial robustness (IRS) of “real” indicators vs the results from the model

IRS     N° of extra-area ranking shifts   Extra-area mean ranking difference
0.384   75                                25

Figure 1 Ranking of Italian provinces, based on “real” indicators (on the left) and “calculated” indicators (on the right).
CONCLUSIONS

The objective of the work was to recreate the infrastructural indicators (of land transportation, in this specific case) at a high level of territorial detail (provinces), relying on the availability of the same indicators at a higher territorial level (regions) and disaggregating them through a model derived from the Chow-Lin approach, in the version extended by Polasek and Sellner. Given the variables used in applying the model at the provincial level, it may be argued that the results tend to identify the levels of infrastructure that the demand factors in each province require. Indeed, the use of socio-economic variables as regressors (infrastructure demand factors) gives the comparison of the two rankings (the one built on the “real” provincial infrastructural data and the one “recreated” with the model) the meaning of a comparison between the supply of and the demand for infrastructure at the territorially disaggregate level. Obviously, this interpretation is the better founded the higher the statistical fit of the model. In our application, the goodness-of-fit measures used indicate that the model results can be considered statistically satisfactory, given the cross-section application and the territorial units. A better (or larger) selection of variables would be an important improvement in the application of the model. At present, in any case, a considerable gap between the estimated levels and those assumed to be “real” may be interpreted, even conservatively, as confirmation of a mismatch between the demand for and the supply of infrastructure expressed by the territory.
And this seems, in the final analysis, to be the result that is statistically most evident and economically most interesting: the territorial distribution of the transport infrastructure endowment is not in line with the “theoretical” generating factors that are present in the Italian provinces.

References

Anselin L. (1988) Spatial Econometrics: Methods and Models, Kluwer Academic Publishers, Dordrecht.
Ayuga-Téllez E., Contato-Carol M., González C., Grande-Ortiz M., Velázquez J. (2011) Applying Multivariate Data Analysis as Objective Method for Calculating the Location Index for Use in Urban Tree Appraisal, J. Urban Plann. Dev., 137(3): 230-237.
Bickman L., Rog D. J. (eds.) (1998) Handbook of applied social research methods, Sage Publications.
Biehl D. (1994) The role of infrastructure in regional policy, OECD, Working Party No. 6, Regional Development Policies, Paris.
Bollino C. A., Polinori P. (2007) Ricostruzione del valore aggiunto su scala comunale e percorsi di crescita a livello micro-territoriale: il caso dell’Umbria, Rivista di Scienze regionali, fascicolo 2.
Chow G. C., Lin A. (1971) Best linear unbiased interpolation, distribution, and extrapolation of time series by related series, The Rev. of Economics and Statistics, 53(4): 372-375.
Goldberger A. S. (1962) Best linear unbiased prediction in the generalized linear regression model, American Statistical Association J., 57: 369-375.
ISTAT (2006) Le infrastrutture in Italia. Un’analisi provinciale della dotazione e della funzionalità, Roma.
LeSage J. P. (1998) Spatial econometrics, Technical report, University of Toledo.
Mazziotta C., Vidoli F. (2009a) La costruzione di un indicatore sintetico ponderato. Un’applicazione della procedura Benefit of Doubt al caso della dotazione infrastrutturale in Italia, Italian J. of Regional Science, vol 8, n°1, Franco Angeli.
Mazziotta C., Vidoli F. (2009b) Robustezza e stabilità spaziale di indicatori di dotazione infrastrutturale: una verifica per le province italiane, XXX Conferenza Italiana di Scienze Regionali, Firenze.
Polasek W., Sellner R. (2008) Spatial Chow-Lin methods: Bayesian and ML forecast comparisons, Rimini Centre for Economic Analysis (RCEA), working paper 38-08.

Appendix

Spatial Robustness Indicator (IRS)

The main purpose of the Spatial Robustness Indicator (IRS) is to analyze the stability of the results obtained from the comparison of alternative rankings of territorial units, that is, to measure how far the rankings persist within wider geographical areas. To clarify the meaning of the proposed approach, consider an example of four units belonging to two different geographical areas, ranked on a composite indicator according to the order R0:

R0:
Unit  Rank  Area
A     1     Area1
B     2     Area1
C     3     Area2
D     4     Area2

After some changes in the key assumptions used to construct the indicator, or when comparing methods, two cases, R1 and R2, may arise:

R1:                       R2:
Unit  Rank  Area          Unit  Rank  Area
A     2     Area1         A     4     Area1
B     1     Area1         B     2     Area1
C     4     Area2         C     3     Area2
D     3     Area2         D     1     Area2

Both situations have equal mean rank differences, but ordering R1 is preferable to R2 because it is more spatially stable (in R1 the permutations occur only within the areas, while in R2 they take place between different areas). A robustness indicator applied to spatial ranks should, therefore, highlight not only the mean rank differences but also where, or rather between which territorially identified units, these differences have arisen. To this end, and keeping the proposed example in mind, two matrices may be introduced into the analysis. The first (W) identifies the units belonging to the same wider geographical area (for example, the provinces of the same region), or identifies the contiguity of the territorial units (contiguity matrix).
W =
     A  B  C  D
A    -  1  0  0
B    1  -  0  0
C    0  0  -  1
D    0  0  1  -

The second matrix, defined as the “transition” matrix, highlights both the rank differences and the unit with respect to which each difference arose. The transition matrices between R0 and R1 and between R0 and R2 are therefore, respectively:

T(R0,R1) =                T(R0,R2) =
     A  B  C  D                A  B  C  D
A    0  1  0  0           A    0  0  0  3
B    1  0  0  0           B    0  1  0  0
C    0  0  0  1           C    0  0  1  0
D    0  0  1  0           D    3  0  0  0

These matrices are easy to read. The numbers that appear in each cell identify the intensity of the displacements between one territorial unit and another as a result of changes to the indicators: the number 3 in the second matrix, for example, shows that the territorial unit A lost 3 places in the transition from R0 to R2, moving from rank 1 to rank 4. Furthermore, the position of the number 3 at the intersection between row A and column D means that the rank displacement involved these two territorial units: A lost 3 positions in favour of D, which in turn gained the same 3 positions at the expense of A. Multiplying the matrix (I-W) by T yields an indicator that shows the rank changes for units not belonging to the same geographical area (or that involved spatially non-contiguous units, depending on the type of matrix W used). The spatial robustness indicator (IRS) can therefore be written in algebraic form as:

$$IRS_{R_0,R_1} = \frac{\sum_{i,j} T_{i,j} (1 - w_{i,j})}{MaxI}$$

As regards the maximum level of the indicator (MaxI), it corresponds to the worst condition, namely the one in which the unit i that was in first place moves to last place and so on, as many times as there are non-contiguous units, (n-1)*(n-2)*..., or as many times as there is a value greater than zero in the matrix T(I-W).
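The appendix example can be reproduced with a short Python sketch that rebuilds the transition matrices from the rankings and computes the numerator of the IRS, Σ T(i,j)(1 − w(i,j)). Three choices are assumptions made for this illustration and are not taken from the paper: the rule used to build T is simply one reading that reproduces the two matrices shown above; the diagonal of W (printed as “-”) is treated as 1, so that a unit keeping its rank contributes nothing; and MaxI is left as a user-supplied argument, because its construction is only described verbally.

```python
units = ["A", "B", "C", "D"]
area = {"A": "Area1", "B": "Area1", "C": "Area2", "D": "Area2"}
R0 = {"A": 1, "B": 2, "C": 3, "D": 4}
R1 = {"A": 2, "B": 1, "C": 4, "D": 3}
R2 = {"A": 4, "B": 2, "C": 3, "D": 1}

# W[i][j] = 1 when units i and j belong to the same area.
# Assumption: the diagonal is set to 1 as well.
W = [[1 if area[u] == area[v] else 0 for v in units] for u in units]


def transition_matrix(before, after):
    """T[u][v] = size of u's rank shift, where v held u's new rank before.

    A unit that keeps its rank gets a 1 on the diagonal, as in the
    matrices printed in the appendix.
    """
    holder = {rank: u for u, rank in before.items()}
    T = [[0] * len(units) for _ in units]
    for i, u in enumerate(units):
        v = holder[after[u]]
        j = units.index(v)
        T[i][j] = abs(after[u] - before[u]) if u != v else 1
    return T


def irs_numerator(T, W):
    """Sum of rank shifts that cross area boundaries."""
    n = len(T)
    return sum(T[i][j] * (1 - W[i][j]) for i in range(n) for j in range(n))


def irs(T, W, max_i):
    """Normalised indicator; max_i must be supplied by the user."""
    return irs_numerator(T, W) / max_i


T01 = transition_matrix(R0, R1)
T02 = transition_matrix(R0, R2)
num01 = irs_numerator(T01, W)   # shifts stay inside the areas
num02 = irs_numerator(T02, W)   # A and D swap across areas
```

Under these assumptions the numerator is zero for R1, where every swap stays inside an area, and positive for R2, where A and D exchange positions across areas.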
In the formulation of the matrix W proposed in this paper, the distances between different territorial units are unweighted, regardless of how far apart the units are. This limitation can be removed by calculating a symmetric matrix P of distances between territorial units (e.g. in kilometres between centroids), so as to give a greater weight to units that are not adjacent. P, for example, may be calculated as:

P =
     A   B   C   D
A    0   40  60  80
B    40  0   30  45
C    60  30  0   70
D    80  45  70  0

With this further specification the spatial robustness indicator could be written as:

$$IRS^{P}_{R_0,R_1} = \frac{\sum_{i,j} T_{i,j} (1 - w_{i,j})\, p_{i,j}}{maxI^{P}}$$

where $maxI^{P}$ is equal to (n-1) * (n-2) * ... as many times as there is a value greater than zero in the matrix T(I-W)P.
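A minimal sketch of the distance-weighted numerator, Σ T(i,j)(1 − w(i,j)) p(i,j), using the matrices of the appendix example, is given below. The diagonal of W is set to 0 here as an assumption; since the diagonal of P is zero, that choice does not affect the result. The normalising constant is again left out, because it is only described verbally.

```python
# Matrices copied from the appendix example (rows and columns: A, B, C, D).
W = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]      # same-area indicator; diagonal set to 0 (assumption)
T_R0_R1 = [[0, 1, 0, 0],
           [1, 0, 0, 0],
           [0, 0, 0, 1],
           [0, 0, 1, 0]]
T_R0_R2 = [[0, 0, 0, 3],
           [0, 1, 0, 0],
           [0, 0, 1, 0],
           [3, 0, 0, 0]]
P = [[0, 40, 60, 80],
     [40, 0, 30, 45],
     [60, 30, 0, 70],
     [80, 45, 70, 0]]   # distances between units, e.g. km between centroids


def weighted_irs_numerator(T, W, P):
    """Cross-area rank shifts, each weighted by the distance covered."""
    n = len(T)
    return sum(T[i][j] * (1 - W[i][j]) * P[i][j]
               for i in range(n) for j in range(n))


weighted_01 = weighted_irs_numerator(T_R0_R1, W, P)
weighted_02 = weighted_irs_numerator(T_R0_R2, W, P)
```

The swap between A and D in R2 is weighted by the 80 units of distance between them, so a rank exchange between far-apart units counts more than one between nearby units in different areas.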