Econometric Theory, 28, 2012, 42–86.
doi:10.1017/S0266466611000120
ASYMPTOTIC DISTRIBUTION OF
JIVE IN A HETEROSKEDASTIC IV
REGRESSION WITH MANY
INSTRUMENTS
JOHN C. CHAO
University of Maryland
NORMAN R. SWANSON
Rutgers University
JERRY A. HAUSMAN AND WHITNEY K. NEWEY
MIT
TIEMEN WOUTERSEN
Johns Hopkins University
This paper derives the limiting distributions of alternative jackknife instrumental variables (JIV) estimators and gives formulas for accompanying consistent
standard errors in the presence of heteroskedasticity and many instruments. The
asymptotic framework includes the many instrument sequence of Bekker (1994,
Econometrica 62, 657–681) and the many weak instrument sequence of Chao
and Swanson (2005, Econometrica 73, 1673–1691). We show that JIV estimators
are asymptotically normal and that standard errors are consistent provided that
$\sqrt{K_n}/r_n \to 0$ as $n \to \infty$, where $K_n$ and $r_n$ denote, respectively, the number of
instruments and the concentration parameter. This is in contrast to the asymptotic
behavior of such classical instrumental variables estimators as limited information maximum likelihood, bias-corrected two-stage least squares, and two-stage
least squares, all of which are inconsistent in the presence of heteroskedasticity,
unless $K_n/r_n \to 0$. We also show that the rate of convergence and the form of the
asymptotic covariance matrix of the JIV estimators will in general depend on the
strength of the instruments as measured by the relative orders of magnitude of $r_n$
and $K_n$.
Earlier versions of this paper were presented at the NSF/NBER conference on weak and/or many instruments at
MIT in 2003 and at the 2004 winter meetings of the Econometric Society in San Diego, where conference participants provided many useful comments and suggestions. Particular thanks are owed to D. Ackerberg, D. Andrews,
J. Angrist, M. Caner, M. Carrasco, P. Guggenberger, J. Hahn, G. Imbens, R. Klein, N. Lott, M. Moriera, G.D.A.
Phillips, P.C.B. Phillips, J. Stock, J. Wright, two anonymous referees, and a co-editor for helpful comments and
suggestions. Address correspondence to: Whitney K. Newey, Department of Economics, MIT, E52-262D, Cambridge, MA 02142-1347, USA; e-mail: wnewey@mit.edu.
© Cambridge University Press 2011
1. INTRODUCTION
It has long been known that the two-stage least squares (2SLS) estimator is biased
with many instruments (see, e.g., Sawa, 1968; Phillips, 1983; and the references
cited therein). In large part because of this problem, various approaches have
been proposed in the literature to reduce the bias of the 2SLS estimator. In recent
years, there has been interest in developing procedures that use “delete-one” fitted
values in lieu of the usual first-stage ordinary least squares fitted values as the instruments employed in the second stage of the estimation. A number of different
versions of these estimators, referred to as jackknife instrumental variables (JIV)
estimators, have been proposed and analyzed by Phillips and Hale (1977), Angrist, Imbens, and Krueger (1999), Blomquist and Dahlberg (1999), Ackerberg
and Devereux (2009), Davidson and MacKinnon (2006), and Hausman, Newey,
Woutersen, Chao, and Swanson (2007).
The JIV estimators are consistent with many instruments and heteroskedasticity
of unknown form, whereas other estimators, including limited information maximum likelihood (LIML) and bias-corrected 2SLS (B2SLS) estimators are not
(see, e.g., Bekker and van der Ploeg, 2005; Ackerberg and Devereux, 2009; Chao
and Swanson, 2006; Hausman et al., 2007). The main objective of this paper is
to develop asymptotic theory for the JIV estimators in a setting that includes the
many instrument sequence of Kunitomo (1980), Morimune (1983), and Bekker
(1994) and the many weak instrument sequence of Chao and Swanson (2005). To
be precise, we show that JIV estimators are consistent and asymptotically normal
when $\sqrt{K_n}/r_n \to 0$ as $n \to \infty$, where $K_n$ and $r_n$ denote the number of instruments
and the so-called concentration parameter, respectively. In contrast, consistency
of LIML and B2SLS generally requires that $K_n/r_n \to 0$ as $n \to \infty$, meaning that the
number of instruments is small relative to the identification strength. We show that
both the rate of convergence of the JIV estimator and the form of its asymptotic
covariance matrix depend on how weak the available instruments are, as measured
by the relative order of magnitude of $r_n$ vis-à-vis $K_n$. We also show consistency
of the standard errors under heteroskedasticity and many instruments.
Hausman et al. (2007) also consider a jackknife form of LIML that is slightly
more difficult to compute but is asymptotically efficient relative to JIV under
many weak instruments and homoskedasticity. With heteroskedasticity, any of
the estimators may outperform the others, as shown by Monte Carlo examples
in Hausman et al. Hausman et al. also propose a jackknife version of the Fuller
(1977) estimator that has fewer outliers.
This paper is a substantially altered and revised version of Chao and Swanson
(2004), in which we now allow for the many instrument sequence of Kunitomo
(1980), Morimune (1983), and Bekker (1994). In the process of showing the
asymptotic normality of JIV, this paper gives a central limit theorem for quadratic
(and, more generally, bilinear) forms associated with an idempotent matrix. This
theorem can be used to study estimators other than JIV. For example, it has already
been used in Hausman et al. (2007) to derive the asymptotic properties of the
jackknife versions of the LIML and Fuller (1977) estimators and in Chao,
Hausman, Newey, Swanson, and Woutersen (2010) to derive a moment-based
test.
The rest of the paper is organized as follows. Section 2 sets up the model and
describes the estimators and standard errors. Section 3 lays out the framework for
the asymptotic theory and presents the main results of our paper. Section 4 comments on the implications of these results and concludes. All proofs are gathered
in the Appendixes.
2. THE MODEL AND ESTIMATORS
The model we consider is given by
$$\underset{n\times 1}{y} = \underset{n\times G}{X}\,\underset{G\times 1}{\delta_0} + \underset{n\times 1}{\varepsilon}, \qquad X = \Upsilon + U,$$
where $n$ is the number of observations, $G$ is the number of right-hand-side variables, $\Upsilon$ is the reduced form matrix, and $U$ is the disturbance matrix. For the
asymptotic approximations, the elements of $\Upsilon$ will implicitly be allowed to
depend on $n$, although we suppress the dependence of $\Upsilon$ on $n$ for notational
convenience. Estimation of $\delta_0$ will be based on an $n \times K$ matrix, $Z$, of instrumental variable observations with $\operatorname{rank}(Z) = K$. Let $\mathcal{Z} = (\Upsilon, Z)$ and assume that
$E[\varepsilon \mid \mathcal{Z}] = 0$ and $E[U \mid \mathcal{Z}] = 0$.
This model allows for ϒ to be a linear combination of Z (i.e., ϒ = Z π, for
some K × G matrix π). Furthermore, some columns of X may be exogenous,
with the corresponding column of U being zero. The model also allows for Z
to approximate the reduced form. For example, let $X_i'$, $\Upsilon_i'$, and $Z_i'$ denote the
ith row (observation) for $X$, $\Upsilon$, and $Z$, respectively. We could let $\Upsilon_i = f_0(w_i)$
be a vector of unknown functions of a vector $w_i$ of underlying instruments and
let $Z_i = (p_{1K}(w_i), \ldots, p_{KK}(w_i))'$ for approximating functions $p_{kK}(w)$, such as
power series or splines. In this case, linear combinations of $Z_i$ may approximate
the unknown reduced form (e.g., Newey, 1990).
To describe the estimators, let $P = Z(Z'Z)^{-1}Z'$ and $P_{ij}$ denote the $(i, j)$th element of $P$. Additionally, let $\bar\Pi_{-i} = (Z'Z - Z_iZ_i')^{-1}(Z'X - Z_iX_i')$ be the reduced
form coefficients obtained by regressing $X$ on $Z$ using all observations except the
ith. The JIV1 estimator of Phillips and Hale (1977) is obtained as
$$\tilde\delta = \left(\sum_{i=1}^n \bar\Pi_{-i}'Z_iX_i'\right)^{-1}\sum_{i=1}^n \bar\Pi_{-i}'Z_iy_i.$$
Using standard results on recursive residuals, it follows that
$$\bar\Pi_{-i}'Z_i = \left(X'Z(Z'Z)^{-1}Z_i - P_{ii}X_i\right)/(1 - P_{ii}) = \sum_{j\neq i} P_{ij}X_j/(1 - P_{ii}).$$
Then, we have that
$$\tilde\delta = \tilde H^{-1}\sum_{i\neq j} X_iP_{ij}(1 - P_{jj})^{-1}y_j, \qquad \tilde H = \sum_{i\neq j} X_iP_{ij}(1 - P_{jj})^{-1}X_j',$$
where $\sum_{i\neq j}$ denotes the double sum $\sum_i\sum_{j\neq i}$. The JIV2 estimator proposed by
Angrist et al. (1999), JIVE2, has a similar form, except that $\Pi_{-i} = (Z'Z)^{-1}(Z'X - Z_iX_i')$ is used in place of $\bar\Pi_{-i}$. It is given by
$$\hat\delta = \hat H^{-1}\sum_{i\neq j} X_iP_{ij}y_j, \qquad \hat H = \sum_{i\neq j} X_iP_{ij}X_j'.$$
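The closed forms above can be checked against the leave-one-out definition numerically. The following is a minimal sketch, assuming NumPy and dense arrays `y` (length n), `X` (n x G), and `Z` (n x K, full column rank); the function names are illustrative and are not taken from the paper.

```python
import numpy as np

def projection_matrix(Z):
    """P = Z (Z'Z)^{-1} Z' for an n x K instrument matrix of full column rank."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

def jiv1(y, X, Z):
    """JIV1: H^{-1} sum_{i != j} X_i P_ij (1 - P_jj)^{-1} y_j."""
    P = projection_matrix(Z)
    W = P - np.diag(np.diag(P))        # drop the i = j terms
    w = 1.0 / (1.0 - np.diag(P))       # (1 - P_jj)^{-1}
    H = X.T @ W @ (X * w[:, None])
    return np.linalg.solve(H, X.T @ W @ (y * w))

def jiv2(y, X, Z):
    """JIV2: H^{-1} sum_{i != j} X_i P_ij y_j with H = sum_{i != j} X_i P_ij X_j'."""
    P = projection_matrix(Z)
    W = P - np.diag(np.diag(P))
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

def jiv1_leave_one_out(y, X, Z):
    """JIV1 from its definition: refit the first stage without observation i."""
    n, G = X.shape
    A = np.zeros((G, G))
    b = np.zeros(G)
    for i in range(n):
        keep = np.arange(n) != i
        Pi = np.linalg.lstsq(Z[keep], X[keep], rcond=None)[0]  # K x G coefficients
        fitted = Pi.T @ Z[i]                                   # delete-one fitted value
        A += np.outer(fitted, X[i])
        b += fitted * y[i]
    return np.linalg.solve(A, b)
```

The loop version costs n first-stage regressions, whereas the closed form needs only one projection matrix; the two should agree up to floating-point error.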
To explain why JIV2 is a consistent estimator, it is helpful to consider JIV2 as
a minimizer of an objective function. As usual, the limit of the minimizer will be
the minimizer of the limit under appropriate regularity conditions. We focus on δ̂
to simplify the discussion. The estimator $\hat\delta$ satisfies $\hat\delta = \arg\min_\delta \hat Q(\delta)$, where
$$\hat Q(\delta) = \sum_{i\neq j} (y_i - X_i'\delta)P_{ij}(y_j - X_j'\delta).$$
Note that the difference between the 2SLS objective function $(y - X\delta)'P(y - X\delta)$ and $\hat Q(\delta)$ is $\sum_{i=1}^n P_{ii}(y_i - X_i'\delta)^2$. This is a weighted least squares object
that is a source of bias in 2SLS because its expectation is not minimized at $\delta_0$
when $X_i$ and $\varepsilon_i$ are correlated. This object does not vanish asymptotically relative
to $E[\hat Q(\delta)]$ under many (or many weak) instruments, leading to inconsistency of
2SLS. When observations are mutually independent, the inconsistency is caused
by this term, so removing it to form $\hat Q(\delta)$ makes $\hat\delta$ consistent.
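The relation between the two objective functions can be illustrated numerically. The sketch below assumes NumPy and a precomputed projection matrix `P`; the helper names are hypothetical. It evaluates the JIV2 objective by an explicit double sum over i ≠ j and solves the first-order condition of that quadratic form.

```python
import numpy as np

def q_2sls(delta, y, X, P):
    """2SLS objective (y - X delta)' P (y - X delta)."""
    e = y - X @ delta
    return e @ P @ e

def q_jiv2(delta, y, X, P):
    """JIV2 objective: the same quadratic form with the i = j terms left out."""
    e = y - X @ delta
    n = len(y)
    return sum(e[i] * P[i, j] * e[j]
               for i in range(n) for j in range(n) if i != j)

def jiv2_from_foc(y, X, P):
    """Solve the first-order condition X'W(y - X delta) = 0, W = P - diag(P)."""
    W = P - np.diag(np.diag(P))
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

For any `delta`, `q_2sls(...) - q_jiv2(...)` should equal `np.sum(np.diag(P) * (y - X @ delta)**2)`, which is the term removed to obtain the jackknife objective.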
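To explain further, consider the JIV2 objective function $\hat Q(\delta)$. Note that for
$\tilde U_i(\delta) = \varepsilon_i - U_i'(\delta - \delta_0)$,
$$\hat Q(\delta) = \hat Q_1(\delta) + \hat Q_2(\delta) + \hat Q_3(\delta),$$
$$\hat Q_1(\delta) = \sum_{i\neq j} (\delta - \delta_0)'\Upsilon_iP_{ij}\Upsilon_j'(\delta - \delta_0),$$
$$\hat Q_2(\delta) = -2\sum_{i\neq j} \tilde U_i(\delta)P_{ij}\Upsilon_j'(\delta - \delta_0),$$
$$\hat Q_3(\delta) = \sum_{i\neq j} \tilde U_i(\delta)P_{ij}\tilde U_j(\delta).$$
Then by the assumptions $E[\tilde U_i(\delta)] = 0$ and independence of observations,
we have $E[\hat Q(\delta) \mid \mathcal{Z}] = Q_1(\delta)$, where $Q_1(\delta) = \hat Q_1(\delta)$ because $\hat Q_1(\delta)$ depends only on $\mathcal{Z}$. Under the regularity conditions in Section 3, $\sum_{i\neq j}\Upsilon_iP_{ij}\Upsilon_j'$ is positive definite asymptotically, so $Q_1(\delta)$ is minimized at $\delta_0$. Thus,
the expectation $Q_1(\delta)$ of $\hat Q(\delta)$ is minimized at the true parameter $\delta_0$; in the
terminology of Han and Phillips (2006), the many instrument “noise” term in
the expected objective function is identically zero.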
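The decomposition can be spelled out in a few lines. The following LaTeX sketch uses only the model equations and the conditional independence of observations stated in Assumption 3; it is an illustration of the argument rather than a reproduction of the proofs in the Appendixes.

```latex
\begin{align*}
y_i - X_i'\delta
  &= X_i'\delta_0 + \varepsilon_i - X_i'\delta
   = \varepsilon_i - (\Upsilon_i + U_i)'(\delta - \delta_0)
   = \tilde U_i(\delta) - \Upsilon_i'(\delta - \delta_0), \\
(y_i - X_i'\delta)P_{ij}(y_j - X_j'\delta)
  &= (\delta - \delta_0)'\Upsilon_i P_{ij}\Upsilon_j'(\delta - \delta_0)
   - \tilde U_i(\delta)P_{ij}\Upsilon_j'(\delta - \delta_0) \\
  &\quad - (\delta - \delta_0)'\Upsilon_i P_{ij}\tilde U_j(\delta)
   + \tilde U_i(\delta)P_{ij}\tilde U_j(\delta), \\
E[\tilde U_i(\delta)\tilde U_j(\delta) \mid \mathcal{Z}]
  &= E[\tilde U_i(\delta) \mid \mathcal{Z}]\,E[\tilde U_j(\delta) \mid \mathcal{Z}] = 0
  \qquad (i \neq j).
\end{align*}
```

Summing the second line over $i \neq j$ and using the symmetry of $P$ merges the two cross terms into $\hat Q_2(\delta)$. The third line is what fails for the excluded $i = j$ terms, where $E[\tilde U_i(\delta)^2 \mid \mathcal{Z}] > 0$ in general.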
For consistency of $\hat\delta$, it is also necessary that the stochastic components of $\hat Q(\delta)$
do not dominate asymptotically. The size of $\hat Q_1(\delta)$ (for $\delta \neq \delta_0$) is proportional to
the concentration parameter that we denote by $r_n$. It turns out that $\hat Q_2(\delta)$ has size
smaller than $\hat Q_1(\delta)$ asymptotically but $\hat Q_3(\delta)$ is $O_p(\sqrt{K_n})$ (Lemma A1 shows that
the variance of $\hat Q_3(\delta)$ is proportional to $K_n$). Thus, to ensure that the expectation
of $\hat Q(\delta)$ dominates the stochastic part of $\hat Q(\delta)$, it suffices to impose the restriction
$\sqrt{K_n}/r_n \to 0$, which we do throughout the asymptotic theory. This condition was
formulated in Chao and Swanson (2005).
The estimators $\tilde\delta$ and $\hat\delta$ are consistent and asymptotically normal with heteroskedasticity under the regularity conditions we impose, including $\sqrt{K_n}/r_n \to 0$. In contrast, consistency of LIML and Fuller (1977) requires $K_n/r_n \to 0$ when
$P_{ii}$ is asymptotically correlated with $E[X_i\varepsilon_i \mid \mathcal{Z}]/E[\varepsilon_i^2 \mid \mathcal{Z}]$, as discussed in Chao
and Swanson (2004) and Hausman et al. (2007). This condition is also required
for consistency of the bias-corrected 2SLS estimator of Donald and Newey (2001)
when $P_{ii}$ is asymptotically correlated with $E[X_i\varepsilon_i \mid \mathcal{Z}]$, as discussed in Ackerberg
and Devereux (2009). Thus, JIV estimators are robust to heteroskedasticity and
many instruments (when $K_n$ grows as fast as $r_n$), whereas LIML, Fuller (1977),
or B2SLS estimators are not.
Hausman et al. (2007) also consider a JIV form of LIML, which is obtained by
minimizing $\hat Q(\delta)/[(y - X\delta)'(y - X\delta)]$. The sum of squared residuals in the denominator makes computation somewhat more complicated; however, like LIML,
it has an explicit form in terms of the smallest eigenvalue of a matrix. This JIV
form of LIML is asymptotically efficient relative to δ̂ and δ̃ under many weak
instruments and homoskedasticity. With heteroskedasticity, δ̂ and δ̃ may perform
better than this estimator, as shown by Monte Carlo examples in Hausman et al.;
they also propose a jackknife version of the Fuller (1977) estimator that has fewer
outliers than the JIV form of LIML.
To motivate the form of the variance estimator for $\hat\delta$ and $\tilde\delta$, note that for $\xi_i = (1 - P_{ii})^{-1}\varepsilon_i$, substituting $y_i = X_i'\delta_0 + \varepsilon_i$ in the equation for $\tilde\delta$ gives
$$\tilde\delta = \delta_0 + \tilde H^{-1}\sum_{i\neq j} X_iP_{ij}\xi_j. \tag{1}$$
After appropriate normalization, the matrix $\tilde H^{-1}$ will converge and a central limit
theorem will apply to $\sum_{i\neq j} X_iP_{ij}\xi_j$, which leads to a sandwich form for the asymptotic variance. Here $\tilde H^{-1}$ can be used to estimate the outside terms in the sandwich. The inside term, which is the variance of $\sum_{i\neq j} X_iP_{ij}\xi_j$, can be estimated
by dropping terms that are zero from the variance, removing the expectation, and
replacing $\xi_i$ with an estimate, $\tilde\xi_i = (1 - P_{ii})^{-1}(y_i - X_i'\tilde\delta)$. Using the independence of the observations, $E[\varepsilon_i \mid \mathcal{Z}] = 0$, and the exclusion of the $i = j$ terms in
the double sums, it follows that
$$E\left[\Big(\sum_{i\neq j} X_iP_{ij}\xi_j\Big)\Big(\sum_{i\neq j} X_iP_{ij}\xi_j\Big)' \,\Big|\, \mathcal{Z}\right] = E\left[\sum_{i,j}\sum_{k\notin\{i,j\}} P_{ik}P_{jk}X_iX_j'\xi_k^2 + \sum_{i\neq j} P_{ij}^2X_i\xi_iX_j'\xi_j \,\Big|\, \mathcal{Z}\right].$$
Removing the expectation and replacing $\xi_i$ with $\tilde\xi_i$ gives
$$\tilde\Sigma = \sum_{i,j}\sum_{k\notin\{i,j\}} P_{ik}P_{jk}X_iX_j'\tilde\xi_k^2 + \sum_{i\neq j} P_{ij}^2X_i\tilde\xi_iX_j'\tilde\xi_j.$$
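The estimator of the asymptotic variance of $\tilde\delta$ is then given by
$$\tilde V = \tilde H^{-1}\tilde\Sigma\tilde H^{-1\prime}.$$
This estimator is robust to heteroskedasticity, as it allows $\operatorname{Var}(\xi_i \mid \mathcal{Z})$ and $E[X_i\xi_i \mid \mathcal{Z}]$
to vary over $i$.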
A vectorized form of $\tilde V$ is easier to compute. Note that for $\tilde X_i = X_i/(1 - P_{ii})$,
we have $\tilde H = X'P\tilde X - \sum_i X_iP_{ii}\tilde X_i'$. Also, let $\bar X = PX$, $\tilde Z = Z(Z'Z)^{-1}$, and $Z_i'$
and $\tilde Z_i'$ equal the ith row of $Z$ and $\tilde Z$, respectively. Then, as shown in the proof of
Theorem 4, we have
$$\tilde\Sigma = \sum_{i=1}^n (\bar X_i\bar X_i' - X_iP_{ii}\bar X_i' - \bar X_iP_{ii}X_i')\tilde\xi_i^2 + \sum_{k=1}^K\sum_{\ell=1}^K \Big(\sum_{i=1}^n \tilde Z_{ik}\tilde Z_{i\ell}X_i\tilde\xi_i\Big)\Big(\sum_{j=1}^n Z_{jk}Z_{j\ell}X_j\tilde\xi_j\Big)'.$$
This formula can be computed quickly by software with fast vector operations,
even when $n$ is large.
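To make the equivalence of the two expressions for the middle matrix concrete, the sketch below computes it once by direct loops over the defining sums and once by the vectorized formula. It assumes NumPy; `xi` stands for any length-n vector of scaled residuals, and both function names are illustrative rather than taken from the paper.

```python
import numpy as np

def sigma_bruteforce(X, P, xi):
    """Direct O(n^3) evaluation of
    sum_{i,j} sum_{k not in {i,j}} P_ik P_jk X_i X_j' xi_k^2
      + sum_{i != j} P_ij^2 X_i xi_i X_j' xi_j."""
    n, G = X.shape
    S = np.zeros((G, G))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if k != i and k != j:
                    S += P[i, k] * P[j, k] * xi[k] ** 2 * np.outer(X[i], X[j])
            if i != j:
                S += P[i, j] ** 2 * xi[i] * xi[j] * np.outer(X[i], X[j])
    return S

def sigma_vectorized(X, Z, xi):
    """Vectorized form; never builds the n x n projection matrix."""
    Zt = Z @ np.linalg.inv(Z.T @ Z)          # Z-tilde = Z (Z'Z)^{-1}
    p = np.einsum("ik,ik->i", Zt, Z)         # P_ii
    Xbar = Zt @ (Z.T @ X)                    # rows of P X
    Xp = X * p[:, None]                      # rows P_ii X_i'
    w = xi ** 2
    first = (Xbar.T * w) @ Xbar - (Xp.T * w) @ Xbar - (Xbar.T * w) @ Xp
    a = X * xi[:, None]                      # rows X_i' xi_i
    A = np.einsum("ik,il,ig->klg", Zt, Zt, a)  # sum_i Zt_ik Zt_il X_i xi_i
    B = np.einsum("jk,jl,jg->klg", Z, Z, a)    # sum_j Z_jk Z_jl X_j xi_j
    second = np.einsum("klg,klh->gh", A, B)
    return first + second
```

The vectorized version stores K x K x G arrays instead of an n x n matrix, which is the reason it scales to large n when K is moderate.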
An asymptotic variance estimator for $\hat\delta$ can be formed in an analogous way.
Note that $\hat H = X'PX - \sum_i X_iP_{ii}X_i'$. Also for $\hat\varepsilon_i = y_i - X_i'\hat\delta$, we can estimate the
middle matrix of the sandwich by
$$\hat\Sigma = \sum_{i=1}^n (\bar X_i\bar X_i' - X_iP_{ii}\bar X_i' - \bar X_iP_{ii}X_i')\hat\varepsilon_i^2 + \sum_{k=1}^K\sum_{\ell=1}^K \Big(\sum_{i=1}^n \tilde Z_{ik}\tilde Z_{i\ell}X_i\hat\varepsilon_i\Big)\Big(\sum_{j=1}^n Z_{jk}Z_{j\ell}X_j\hat\varepsilon_j\Big)'.$$
The variance estimator for $\hat\delta$ is then given by
$$\hat V = \hat H^{-1}\hat\Sigma\hat H^{-1}.$$
Here $\hat H$ is symmetric because $P$ is symmetric, so a transpose is not needed for
the third matrix in $\hat V$.
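A compact end-to-end sketch for JIV2 with its sandwich variance follows. It assumes NumPy and, for brevity, forms the n x n matrix $P$ explicitly, so the double sum over $k, \ell$ is evaluated as a quadratic form in the elementwise square of $P$; this is an equivalent rewriting chosen for readability, and the function name is hypothetical.

```python
import numpy as np

def jiv2_with_se(y, X, Z):
    """JIV2 estimate, sandwich variance V = H^{-1} Sigma H^{-1}, standard errors."""
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    p = np.diag(P)
    Xp = X * p[:, None]                       # rows P_ii X_i'
    H = X.T @ P @ X - Xp.T @ X                # X'PX - sum_i X_i P_ii X_i'
    delta = np.linalg.solve(H, X.T @ P @ y - X.T @ (p * y))
    e = y - X @ delta                         # residuals
    Xbar = P @ X
    w = e ** 2
    first = (Xbar.T * w) @ Xbar - (Xp.T * w) @ Xbar - (Xbar.T * w) @ Xp
    a = X * e[:, None]                        # rows X_i' e_i
    second = a.T @ (P ** 2) @ a               # sum over all (i, j) of P_ij^2 a_i a_j'
    Sigma = first + second
    H_inv = np.linalg.inv(H)
    V = H_inv @ Sigma @ H_inv
    return delta, V, np.sqrt(np.diag(V))
```

When n is large relative to K, the $K \times K$ formulation in the text avoids storing $P$ and is likely the better choice.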
3. MANY INSTRUMENT ASYMPTOTICS
Our asymptotic theory combines the many instrument asymptotics of Kunitomo
(1980), Morimune (1983), and Bekker (1994) with the many weak instrument
asymptotics of Chao and Swanson (2005). All of our regularity conditions are
conditional on $\mathcal{Z} = (\Upsilon, Z)$. To state the regularity conditions, let $Z_i'$, $\varepsilon_i$, $U_i'$, and
$\Upsilon_i'$ denote the ith row of $Z$, $\varepsilon$, $U$, and $\Upsilon$, respectively. Also let a.s. denote almost
surely (i.e., with probability one) and a.s.n denote a.s. for n large enough (i.e.,
with probability one for all n large enough).
Assumption 1. $K = K_n \to \infty$, $Z$ includes among its columns a vector of ones,
$\operatorname{rank}(Z) = K$, and for some $C < 1$, $P_{ii} \le C$ $(i = 1, \ldots, n)$ a.s.n.
In this paper, C is a generic notation for a positive constant that may be bigger
or less than 1. Hence, although in Assumption 1 C is taken to be less than 1,
in other parts of the paper it might not be. The restriction that rank(Z ) = K is a
normalization that requires excluding redundant columns from $Z$. It can be verified in particular cases. For instance, when $w_i$ is a continuously distributed scalar,
$Z_i = p^K(w_i)$, and $p_{kK}(w) = w^{k-1}$, it can be shown that $Z'Z$ is nonsingular with
probability one for $K < n$ (see Note 1). The condition $P_{ii} \le C < 1$ implies that $K/n \le C$
because $K/n = \sum_{i=1}^n P_{ii}/n \le C$.
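The leverage bound in Assumption 1 is straightforward to inspect in a given sample. The sketch below assumes NumPy and uses a power-series instrument matrix as in the example above; the helper name `leverage` and the simulated design are illustrative only.

```python
import numpy as np

def leverage(Z):
    """Diagonal of P = Z (Z'Z)^{-1} Z', computed from a reduced QR factorization."""
    Q, _ = np.linalg.qr(Z)          # Q is n x K with orthonormal columns
    return np.sum(Q ** 2, axis=1)   # P_ii = squared norm of the ith row of Q

rng = np.random.default_rng(0)
n, K = 200, 10
w = rng.uniform(-1.0, 1.0, size=n)
Z = np.vander(w, K, increasing=True)   # columns 1, w, ..., w^(K-1)
p = leverage(Z)
print("rank(Z) == K:", np.linalg.matrix_rank(Z) == K)
print("sum of P_ii:", p.sum(), " K/n:", K / n, " max P_ii:", p.max())
```

Because the $P_{ii}$ sum to $K$, their average is $K/n$, so the largest leverage is never below $K/n$; the assumption asks that it also stay bounded away from one.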
Now, let $\lambda_{\min}(A)$ denote the smallest eigenvalue of a symmetric matrix $A$ and
for any matrix $B$, let $\|B\| = \sqrt{\operatorname{tr}(B'B)}$.
Assumption 2. $\Upsilon_i = S_nz_i/\sqrt{n}$ where $S_n = \tilde S_n\operatorname{diag}(\mu_{1n}, \ldots, \mu_{Gn})$, $\tilde S_n$ is $G \times G$
and bounded, and the smallest eigenvalue of $\tilde S_n\tilde S_n'$ is bounded away from zero.
Also, for each $j$, either $\mu_{jn} = \sqrt{n}$ or $\mu_{jn}/\sqrt{n} \to 0$, $r_n = \min_{1\le j\le G}\mu_{jn}^2 \to \infty$, and $\sqrt{K}/r_n \to 0$. Also, there is $C > 0$ such that $\|\sum_{i=1}^n z_iz_i'/n\| \le C$ and
$\lambda_{\min}(\sum_{i=1}^n z_iz_i'/n) \ge 1/C$ a.s.n.
This condition is similar to Assumption 2 of Hansen, Hausman, and Newey
(2008). It accommodates linear models where included instruments (e.g., a
constant) have fixed reduced form coefficients and excluded instruments have coefficients that can shrink as the sample size grows. A leading example of such a
model is a linear structural equation with one endogenous variable of the form
$$y_i = Z_{i1}'\delta_{01} + \delta_{0G}X_{iG} + \varepsilon_i, \tag{2}$$
where $Z_{i1}$ is a $G_1 \times 1$ vector of included instruments (e.g., including a constant)
and $X_{iG}$ is an endogenous variable. Here the number of right-hand-side variables
is $G_1 + 1 = G$. Let the reduced form be partitioned conformably with $\delta$, as $\Upsilon_i = (Z_{i1}', \Upsilon_{iG})'$ and $U_i = (0, U_{iG})'$. Here the disturbances for the reduced form for
$Z_{i1}$ are zero because $Z_{i1}$ is taken to be exogenous. Suppose that the reduced form
for $X_{iG}$ depends linearly on the included instrumental variables $Z_{i1}$ and on an
excluded instrument $z_{iG}$ as in
$$X_{iG} = \Upsilon_{iG} + U_{iG}, \qquad \Upsilon_{iG} = \pi_1'Z_{i1} + \sqrt{r_n/n}\,z_{iG}.$$
Here we normalize $z_{iG}$ so that $r_n$ determines how strongly $\delta_G$ is identified, and
we absorb into $z_{iG}$ any other terms, such as unknown coefficients. For Assumption 2, we let $z_i = (Z_{i1}', z_{iG})'$ and require that the second moment matrix of $z_i$ is
bounded and bounded away from zero. This normalization allows $r_n$ to determine
the strength of identification of $\delta_G$. For example, if $r_n = n$, then the coefficient on
$z_{iG}$ does not shrink, which corresponds to strong identification of $\delta_G$. If $r_n$ grows
more slowly than $n$, then $\delta_G$ will be more weakly identified. Indeed, $1/\sqrt{r_n}$ will
be the convergence rate for estimators of $\delta_G$. We require $r_n \to \infty$ to avoid the
weak instrument setting of Staiger and Stock (1997), where $\delta_G$ is not asymptotically identified.
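For this model, the reduced form is
$$\Upsilon_i = \begin{pmatrix} Z_{i1} \\ \pi_1'Z_{i1} + \sqrt{r_n/n}\,z_{iG} \end{pmatrix} = \begin{pmatrix} I & 0 \\ \pi_1' & 1 \end{pmatrix}\begin{pmatrix} I & 0 \\ 0 & \sqrt{r_n/n} \end{pmatrix}\begin{pmatrix} Z_{i1} \\ z_{iG} \end{pmatrix}.$$
This reduced form is as specified in Assumption 2 with
$$\tilde S_n = \begin{pmatrix} I & 0 \\ \pi_1' & 1 \end{pmatrix}, \qquad \mu_{jn} = \sqrt{n},\ 1 \le j \le G_1, \qquad \mu_{Gn} = \sqrt{r_n}.$$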
Note how this somewhat complicated specification is needed to accommodate
fixed reduced form coefficients for included instrumental variables and excluded
instruments whose identifying power depends on $n$. We have been unable to
simplify Assumption 2 while maintaining the generality needed for such important
cases.
We will not require that $z_{iG}$ be known, only that it be approximated by a linear combination of the instrumental variables $Z_i = (Z_{i1}', Z_{i2}')'$. Implicitly, $Z_{i1}$
and $z_{iG}$ are allowed to depend on $n$. One important case is where the excluded
instrument $z_{iG}$ is an unknown linear combination of the instrumental variables
$Z_i = (Z_{i1}', Z_{i2}')'$. For example, the many weak instrument setting of Chao and
Swanson (2005) is one where the reduced form is given by
$$\Upsilon_{iG} = \pi_1'Z_{i1} + (\pi_2/\sqrt{n})'Z_{i2}$$
for a $K - G_1$ dimensional vector $Z_{i2}$ of excluded instrumental variables. This
model can be folded into our framework by specifying that
$$z_{iG} = \pi_2'Z_{i2}/\sqrt{K - G_1}, \qquad r_n = K - G_1.$$
Assumption 2 will then require that
$$\sum_i z_{iG}^2/n = (K - G_1)^{-1}\sum_i (\pi_2'Z_{i2})^2/n$$
is bounded and bounded away from zero. Thus, the second moment $\sum_i (\pi_2'Z_{i2})^2/n$
of the term in the reduced form that identifies $\delta_{0G}$ must grow linearly in $K$, just
as in Chao and Swanson (2005), leading to a convergence rate of $1/\sqrt{K - G_1} = 1/\sqrt{r_n}$.
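As a quick check that this specification reproduces the Chao and Swanson (2005) reduced form, the following LaTeX line substitutes the stated choices of $z_{iG}$ and $r_n$ into the excluded-instrument term of the general reduced form; nothing beyond the definitions above is assumed.

```latex
\[
\sqrt{\frac{r_n}{n}}\; z_{iG}
  = \sqrt{\frac{K - G_1}{n}}\cdot\frac{\pi_2' Z_{i2}}{\sqrt{K - G_1}}
  = \frac{\pi_2' Z_{i2}}{\sqrt{n}}
  = \left(\pi_2/\sqrt{n}\right)' Z_{i2}.
\]
```

The factor $\sqrt{K - G_1}$ cancels, so the normalization only moves the scale of the excluded term from the coefficient into $r_n$.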
In another important case, the excluded instrument $z_{iG}$ could be an unknown
function that can be approximated by a linear combination of $Z_i$. For instance,
suppose that $z_{iG} = f_0(w_i)$ for an unknown function $f_0(w_i)$ of variables $w_i$. In this
case, the instrumental variables could include a vector $p^K(w_i) \stackrel{\mathrm{def}}{=} (p_{1K}(w_i), \ldots, p_{K-G_1,K}(w_i))'$ of approximating functions, such as polynomials or splines. Here
the vector of instrumental variables would be $Z_i = (Z_{i1}', p^K(w_i)')'$. For $r_n = n$,
this example is like Newey (1990) where $Z_i$ includes approximating functions for
the reduced form but the number of instruments can grow as fast as the sample
size. Alternatively, if $r_n/n \to 0$, it is a modified version where $\delta_G$ is more weakly
identified.
Assumption 2 also allows for multiple endogenous variables with a different
strength of identification for each one, i.e., for different convergence rates. In the
preceding example, we maintained the scalar endogenous variable for simplicity.
The $r_n$ can be thought of as a version of the concentration parameter; it
determines the convergence rate of estimators of $\delta_{0G}$ just as the concentration
parameter does in other settings. For $r_n = n$, the convergence rate will be $\sqrt{n}$
where Assumptions 1 and 2 permit $K$ to grow as fast as the sample size. This corresponds to a many instrument asymptotic approximation like Kunitomo (1980),
Morimune (1983), and Bekker (1994). For $r_n$ growing more slowly than $n$, the
convergence rate will be slower than $1/\sqrt{n}$, which leads to an asymptotic approximation like that of Chao and Swanson (2005).
Assumption 3. There is a constant, $C$, such that conditional on $\mathcal{Z} = (\Upsilon, Z)$,
the observations $(\varepsilon_1, U_1), \ldots, (\varepsilon_n, U_n)$ are independent, with $E[\varepsilon_i \mid \mathcal{Z}] = 0$ for all
$i$, $E[U_i \mid \mathcal{Z}] = 0$ for all $i$, $\sup_i E[\varepsilon_i^2 \mid \mathcal{Z}] < C$, and $\sup_i E[\|U_i\|^2 \mid \mathcal{Z}] \le C$, a.s.
In other words, Assumption 3 requires the second conditional moments of the
disturbances to be bounded.
Assumption 4. There is a $\pi_K$ such that $\sum_{i=1}^n \|z_i - \pi_KZ_i\|^2/n \to 0$ a.s.
This condition allows an unknown reduced form that is approximated by a
linear combination of the instrumental variables. These four assumptions give the
consistency result presented in Theorem 1.
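THEOREM 1. Suppose that Assumptions 1–4 are satisfied. Then, $r_n^{-1/2}S_n'(\tilde\delta - \delta_0) \xrightarrow{p} 0$, $\tilde\delta \xrightarrow{p} \delta_0$, $r_n^{-1/2}S_n'(\hat\delta - \delta_0) \xrightarrow{p} 0$, and $\hat\delta \xrightarrow{p} \delta_0$.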
The following additional condition is useful for establishing asymptotic normality and the consistency of the asymptotic variance.
Assumption 5. There is a constant, $C > 0$, such that $\sum_{i=1}^n \|z_i\|^4/n^2 \to 0$,
$\sup_i E[\varepsilon_i^4 \mid \mathcal{Z}] < C$, and $\sup_i E[\|U_i\|^4 \mid \mathcal{Z}] \le C$ a.s.
To give asymptotic normality results, we need to describe the asymptotic variances. We will outline results that do not depend on the convergence of various
moment matrices, so we write the asymptotic variances as a function of $n$ (rather
than as a limit). Let $\sigma_i^2 = E[\varepsilon_i^2 \mid \mathcal{Z}]$ where, for notational simplicity, we have
suppressed the possible dependence of $\sigma_i^2$ on $\mathcal{Z}$. Moreover, let
$$\bar H_n = \sum_{i=1}^n z_iz_i'/n, \qquad \bar\Omega_n = \sum_{i=1}^n z_iz_i'\sigma_i^2/n,$$
$$\bar\Psi_n = S_n^{-1}\sum_{i\neq j} P_{ij}^2\Big(E[U_iU_i' \mid \mathcal{Z}]\sigma_j^2(1 - P_{jj})^{-2} + E[U_i\varepsilon_i \mid \mathcal{Z}](1 - P_{ii})^{-1}E[\varepsilon_jU_j' \mid \mathcal{Z}](1 - P_{jj})^{-1}\Big)S_n^{-1\prime},$$
$$H_n = \sum_{i=1}^n (1 - P_{ii})z_iz_i'/n, \qquad \Omega_n = \sum_{i=1}^n (1 - P_{ii})^2z_iz_i'\sigma_i^2/n,$$
$$\Psi_n = S_n^{-1}\sum_{i\neq j} P_{ij}^2\Big(E[U_iU_i' \mid \mathcal{Z}]\sigma_j^2 + E[U_i\varepsilon_i \mid \mathcal{Z}]E[\varepsilon_jU_j' \mid \mathcal{Z}]\Big)S_n^{-1\prime}.$$
When $K/r_n$ is bounded, the conditional asymptotic variance given $\mathcal{Z}$ of $S_n'(\tilde\delta - \delta_0)$
is
$$\bar V_n = \bar H_n^{-1}(\bar\Omega_n + \bar\Psi_n)\bar H_n^{-1},$$
and the conditional asymptotic variance of $S_n'(\hat\delta - \delta_0)$ is
$$V_n = H_n^{-1}(\Omega_n + \Psi_n)H_n^{-1}.$$
To state our asymptotic normality results, let $A^{1/2}$ denote a square root matrix
for a positive semidefinite matrix $A$, satisfying $A^{1/2}A^{1/2} = A$. Also, for nonsingular $A$, let $A^{-1/2} = (A^{1/2})^{-1}$.
THEOREM 2. Suppose that Assumptions 1–5 are satisfied, $\sigma_i^2 \ge C > 0$ a.s.,
and $K/r_n$ is bounded. Then $\bar V_n$ and $V_n$ are nonsingular a.s.n, and
$$\bar V_n^{-1/2}S_n'(\tilde\delta - \delta_0) \xrightarrow{d} N(0, I_G), \qquad V_n^{-1/2}S_n'(\hat\delta - \delta_0) \xrightarrow{d} N(0, I_G).$$
The entire $S_n$ matrix in Assumption 2 determines the convergence rate of the
estimators, where
$$S_n'(\hat\delta - \delta_0) = \operatorname{diag}(\mu_{1n}, \ldots, \mu_{Gn})\tilde S_n'(\hat\delta - \delta_0)$$
is asymptotically normal. The convergence rate of the linear combination $e_j'\tilde S_n'(\hat\delta - \delta_0)$ will be $1/\mu_{jn}$, where $e_j$ is the jth unit vector. Note that
$$y_i = X_i'\delta_0 + \varepsilon_i = z_i'\operatorname{diag}(\mu_{1n}, \ldots, \mu_{Gn})\tilde S_n'\delta_0/\sqrt{n} + U_i'\delta_0 + \varepsilon_i.$$
The expression following the second equality is the reduced form for $y_i$. Thus,
the linear combination of structural parameters $e_j'\tilde S_n'\delta_0$ is the jth reduced form
coefficient for $y_i$ that corresponds to the variable $(\mu_{jn}/\sqrt{n})z_{ij}$. This reduced form
coefficient is estimated at the rate $1/\mu_{jn}$ by the linear combination $e_j'\tilde S_n'\hat\delta$ of the
instrumental variables (IV) estimator $\hat\delta$. The minimum rate is $1/\sqrt{r_n}$, which is the
inverse square root of the rate of growth of the concentration parameter. These
rates will change when $K$ grows faster than $r_n$.
The rate of convergence in Theorem 2 corresponds to the rate found by Stock
and Yogo (2005) for LIML, Fuller’s modified LIML, and B2SLS when rn grows
at the same rate as K and more slowly than n under homoskedasticity.
The term $\bar\Psi_n$ in the asymptotic variance of $\tilde\delta$ and the term $\Psi_n$ in the asymptotic
variance of $\hat\delta$ account for the presence of many instruments. The order of these
terms is $K/r_n$, so if $K/r_n \to 0$, dropping these terms does not affect the asymptotic variance. When $K/r_n$ is bounded but does not go to zero, these terms have
the same order as the other terms, and it is important to account for their presence
in the standard errors. If $K/r_n \to \infty$, then these terms dominate and slow down
the convergence rate of the estimators. In this case, the conditional asymptotic
variance given $\mathcal{Z}$ of $\sqrt{r_n/K}\,S_n'(\tilde\delta - \delta_0)$ is
$$\bar V_n^* = \bar H_n^{-1}(r_n/K)\bar\Psi_n\bar H_n^{-1},$$
and the conditional asymptotic variance of $\sqrt{r_n/K}\,S_n'(\hat\delta - \delta_0)$ is
$$V_n^* = H_n^{-1}(r_n/K)\Psi_nH_n^{-1}.$$
When $K/r_n \to \infty$, the (conditional) asymptotic variance matrices, $\bar V_n^*$ and $V_n^*$,
may be singular, especially when some components of $X_i$ are exogenous or when
different identification strengths are present. To allow for this singularity, our
asymptotic normality results are stated in terms of a linear combination of the
estimator. Let $L_n$ be a sequence of $\ell \times G$ matrices.
THEOREM 3. Suppose that Assumptions 1–5 are satisfied and $K/r_n \to \infty$.
If $L_n$ is bounded and there is a $C > 0$ such that $\lambda_{\min}(L_n\bar V_n^*L_n') \ge C$ a.s.n,
then
$$(L_n\bar V_n^*L_n')^{-1/2}L_n\sqrt{r_n/K}\,S_n'(\tilde\delta - \delta_0) \xrightarrow{d} N(0, I_\ell).$$
Also, if there is a $C > 0$ such that $\lambda_{\min}(L_nV_n^*L_n') \ge C$ a.s.n, then
$$(L_nV_n^*L_n')^{-1/2}L_n\sqrt{r_n/K}\,S_n'(\hat\delta - \delta_0) \xrightarrow{d} N(0, I_\ell).$$
Here the convergence rate is related to the size of $\sqrt{r_n/K}\,S_n$. In the simple
case where $\delta$ is a scalar, we can take $S_n = \sqrt{r_n}$, which gives a convergence rate of
$\sqrt{K}/r_n$. Then the theorem states that $(r_n/\sqrt{K})(\tilde\delta - \delta_0)$ is asymptotically normal.
It is interesting that $\sqrt{K}/r_n \to 0$ is a condition for consistency in this setting and
also in the context of Theorem 1.
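A worked example with polynomial growth rates may help fix ideas. The exponents below are hypothetical and chosen only for illustration, with scalar $\delta$ and $S_n = \sqrt{r_n}$.

```latex
\begin{align*}
\text{Case A: } & K = n^{1/2},\ r_n = n^{1/2}: &
  K/r_n &= 1 \text{ (bounded)}, &
  \text{rate } 1/\sqrt{r_n} &= n^{-1/4} \quad \text{(Theorem 2)}, \\
\text{Case B: } & K = n^{3/4},\ r_n = n^{1/2}: &
  K/r_n &= n^{1/4} \to \infty, &
  \text{rate } \sqrt{K}/r_n &= n^{3/8 - 1/2} = n^{-1/8} \quad \text{(Theorem 3)}.
\end{align*}
```

In both cases $\sqrt{K}/r_n \to 0$, so the estimators remain consistent, but in Case B the many instrument term dominates and the normalization is $r_n/\sqrt{K} = n^{1/8}$ rather than $\sqrt{r_n} = n^{1/4}$.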
From Theorems 2 and 3, it is clear that the rates of convergence of both JIV
estimators depend in general on the strength of the available instruments relative
to their number, as reflected in the relative orders of magnitude of rn vis-à-vis K .
Note also that, whenever $r_n$ grows at a slower rate than $n$, the rate of convergence
is slower than the conventional $\sqrt{n}$ rate of convergence. In this case, the available
instruments are weaker than assumed in the conventional strongly identified case,
where the concentration parameter is taken to grow at the rate $n$.
When $P_{ii} = Z_i'(Z'Z)^{-1}Z_i$ goes to zero uniformly in $i$, the asymptotic variances
of the two JIV estimators will get close in large samples. Because $\sum_{i=1}^n P_{ii} = \operatorname{tr}(P) = K$, $P_{ii}$ goes to zero when $K$ grows more slowly than $n$, though precise
conditions for this convergence depend on the nature of Z i . As a practical matter,
Pii will generally be very close to zero in applications where K is very small
relative to n, making the jackknife estimators very close to each other.
Under homoskedasticity, we can compare the asymptotic variances of the two
JIV estimators. In this case, the asymptotic variance of $\tilde\delta$ is
$$\bar V_n = \bar V_{n1} + \bar V_{n2}, \qquad \bar V_{n1} = \sigma^2\bar H_n^{-1},$$
$$\bar V_{n2} = S_n^{-1}\sigma^2E[U_iU_i']\sum_{i\neq j} \frac{P_{ij}^2}{(1 - P_{jj})^2}S_n^{-1\prime} + S_n^{-1}E[U_i\varepsilon_i]E[\varepsilon_iU_i']S_n^{-1\prime}\sum_{i\neq j} P_{ij}^2(1 - P_{ii})^{-1}(1 - P_{jj})^{-1}.$$
Also, the asymptotic variance of $\hat\delta$ is
$$V_n = V_{n1} + V_{n2}, \qquad V_{n1} = \sigma^2H_n^{-1}\Big[\sum_{i=1}^n (1 - P_{ii})^2z_iz_i'/n\Big]H_n^{-1},$$
$$V_{n2} = S_n^{-1}\big(\sigma^2E[U_iU_i'] + E[U_i\varepsilon_i]E[\varepsilon_iU_i']\big)S_n^{-1\prime}\sum_{i\neq j} P_{ij}^2.$$
By the fact that $(1 - P_{ii})^{-1} > 1$, we have that $\bar V_{n2} \ge V_{n2}$ in the positive semidefinite sense. Also, note that $V_{n1}$ is the variance of an IV estimator with instruments
$z_i(1 - P_{ii})$ whereas $\bar V_{n1}$ is the variance of the corresponding least squares estimator, so $\bar V_{n1} \le V_{n1}$. Thus, it appears that in general we cannot rank the asymptotic
variances of the two estimators.
Next, we turn to results pertaining to the consistency of the asymptotic variance
estimators and to the use of these estimators in hypothesis testing. We impose the
following additional conditions.
Assumption 6. There exist $\pi_n$ and $C > 0$ such that a.s. $\max_{i\le n}\|z_i - \pi_nZ_i\| \to 0$
and $\sup_i \|z_i\| \le C$.
The next result shows that our estimators of the asymptotic variance are
consistent after normalization.
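THEOREM 4. Suppose that Assumptions 1–6 are satisfied. If $K/r_n$ is bounded,
then $S_n'\tilde VS_n - \bar V_n \xrightarrow{p} 0$ and $S_n'\hat VS_n - V_n \xrightarrow{p} 0$. Also, if $K/r_n \to \infty$, then
$r_nS_n'\tilde VS_n/K - \bar V_n^* \xrightarrow{p} 0$ and $r_nS_n'\hat VS_n/K - V_n^* \xrightarrow{p} 0$.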
A primary use of asymptotic variance estimators is conducting approximate
inference concerning coefficients. To that end, we introduce Theorem 5.
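THEOREM 5. Suppose that Assumptions 1–6 are satisfied and that $a(\delta)$ is an
$\ell \times 1$ vector of functions such that
(i) $a(\delta)$ is continuously differentiable in a neighborhood of $\delta_0$;
(ii) there is a square matrix, $B_n$, such that for $A = \partial a(\delta_0)/\partial\delta'$, $B_nAS_n^{-1\prime}$ is
bounded; and
(iii) for any $\bar\delta_k \xrightarrow{p} \delta_0$ $(k = 1, \ldots, \ell)$ and $\bar A = [\partial a_1(\bar\delta_1)/\partial\delta, \ldots, \partial a_\ell(\bar\delta_\ell)/\partial\delta]'$, we
have $B_n(\bar A - A)S_n^{-1\prime} \xrightarrow{p} 0$.
Also suppose that there is $C > 0$ such that $\lambda_{\min}(B_nAS_n^{-1\prime}\bar V_nS_n^{-1}A'B_n') \ge C$
if $K/r_n$ is bounded or $\lambda_{\min}(B_nAS_n^{-1\prime}\bar V_n^*S_n^{-1}A'B_n') \ge C$ if $K/r_n \to \infty$ a.s.n. Then
for $\tilde A = \partial a(\tilde\delta)/\partial\delta'$,
$$(\tilde A\tilde V\tilde A')^{-1/2}\big[a(\tilde\delta) - a(\delta_0)\big] \xrightarrow{d} N(0, I_\ell).$$
If there is $C > 0$ such that $\lambda_{\min}(B_nAS_n^{-1\prime}V_nS_n^{-1}A'B_n') \ge C$ if $K/r_n$ is bounded or
$\lambda_{\min}(B_nAS_n^{-1\prime}V_n^*S_n^{-1}A'B_n') \ge C$ if $K/r_n \to \infty$ a.s.n, then for $\hat A = \partial a(\hat\delta)/\partial\delta'$,
$$(\hat A\hat V\hat A')^{-1/2}\big[a(\hat\delta) - a(\delta_0)\big] \xrightarrow{d} N(0, I_\ell).$$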
Perhaps the most important special case of this result is a single linear combination. This case will lead to t-statistics based on the consistent variance estimator
having the usual standard normal limiting distribution. The following result considers such a case.
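COROLLARY 1. Suppose that Assumptions 1–6 are satisfied and $c$ and $b_n$ are
such that $b_nc'S_n^{-1\prime}$ is bounded. If there is a $C > 0$ such that $b_n^2c'S_n^{-1\prime}\bar V_nS_n^{-1}c \ge C$
if $K/r_n$ is bounded or $b_n^2c'S_n^{-1\prime}\bar V_n^*S_n^{-1}c \ge C$ if $K/r_n \to \infty$ a.s.n, then
$$\frac{c'(\tilde\delta - \delta_0)}{\sqrt{c'\tilde Vc}} \xrightarrow{d} N(0, 1).$$
Also if there is a $C > 0$ such that $b_n^2c'S_n^{-1\prime}V_nS_n^{-1}c \ge C$ if $K/r_n$ is bounded or
$b_n^2c'S_n^{-1\prime}V_n^*S_n^{-1}c \ge C$ if $K/r_n \to \infty$ a.s.n, then
$$\frac{c'(\hat\delta - \delta_0)}{\sqrt{c'\hat Vc}} \xrightarrow{d} N(0, 1).$$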
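In practice the corollary is used through a t-ratio for a single linear combination. A minimal sketch follows, assuming NumPy, an estimate `delta_hat` with variance estimate `V_hat` (for example $\hat\delta$ and $\hat V$), and a hypothesized value `c_delta0` for $c'\delta_0$; the function names are illustrative.

```python
import math
import numpy as np

def t_ratio(delta_hat, V_hat, c, c_delta0=0.0):
    """t = (c'delta_hat - c'delta_0) / sqrt(c' V_hat c)."""
    c = np.asarray(c, dtype=float)
    return float((c @ delta_hat - c_delta0) / math.sqrt(c @ V_hat @ c))

def two_sided_pvalue(t):
    """P(|N(0,1)| > |t|), using the standard normal limit in the corollary."""
    return math.erfc(abs(t) / math.sqrt(2.0))
```

For instance, with `delta_hat = [1.0, 2.0]`, `V_hat = diag(0.04, 0.09)`, `c = [0, 1]`, and `c_delta0 = 1.4`, the ratio is (2.0 - 1.4)/0.3 = 2.0.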
To show how the conditions of this result can be checked, we return to the
previous example with one right-hand-side endogenous variable. The following
result gives primitive conditions in that example for the conclusion of Corollary
1, i.e., for the asymptotic normality of a t-ratio.
COROLLARY 2. If equation (2) holds, Assumptions 1–6 are satisfied for $z_i = (Z_{i1}', z_{iG})'$, $c \neq 0$ is a constant vector, either
(i) $r_n = n$ or
(ii) $K/r_n$ is bounded and $(-\pi_1', 1)c \neq 0$ or
(iii) $K/r_n \to \infty$, $(-\pi_1', 1)c \neq 0$, $E[U_{iG}^2 \mid \mathcal{Z}]$ is bounded away from zero, and the
sign of $E[\varepsilon_iU_{iG} \mid \mathcal{Z}]$ is constant a.s., then
$$\frac{c'(\tilde\delta - \delta_0)}{\sqrt{c'\tilde Vc}} \xrightarrow{d} N(0, 1), \qquad \frac{c'(\hat\delta - \delta_0)}{\sqrt{c'\hat Vc}} \xrightarrow{d} N(0, 1).$$
The proof of this result shows how the hypotheses concerning bn in Corollary 1
can be checked. The conditions of Corollary 2 are quite primitive. We have previously described how Assumption 2 is satisfied in the model of equation (2).
Assumptions 1 and 3–6 are also quite primitive.
This result can be applied to show that t-ratios are asymptotically correct when
the many instrument robust variance estimators are used. For the coefficient δG
of the endogenous variable, note that $c = e_G$, so $(-\pi_1', 1)c = 1 \neq 0$. Therefore,
if $E[U_{iG}^2 \mid \mathcal{Z}]$ is bounded away from zero and the sign of $E[\varepsilon_iU_{iG} \mid \mathcal{Z}]$ is constant, it
follows from Corollary 2 that
$$\frac{\hat\delta_G - \delta_{0G}}{\sqrt{\hat V_{GG}}} \xrightarrow{d} N(0, 1).$$
Thus, the t-ratio for the coefficient of the endogenous variable is asymptotically
correct across a wide range of different growth rates for $r_n$ and $K$. The analogous
result holds for each coefficient $\delta_j$, $j \le G_1$, of an included instrument as long
as $\pi_{1j} \neq 0$. If $\pi_{1j} = 0$, then the asymptotics are more complicated.
For brevity, we will not discuss this unusual case here. The analogous results also
hold for δ̃G .
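As a rough numerical illustration of the t-ratio discussed above, the following Python sketch computes a JIV2 estimate, a heteroskedasticity-robust variance, and the resulting t-ratio. It is a minimal sketch under stated assumptions, not the authors' code: the function names (`jiv2`, `robust_variance`, `t_ratio`) are hypothetical, the JIV2 formula is taken in the leave-one-out form $\hat H^{-1}\sum_{i\neq j} X_i P_{ij} y_j$ with $\hat H = \sum_{i\neq j} X_i P_{ij} X_j'$, and the variance is assumed to have the sandwich form $\hat H^{-1}\hat\Sigma\hat H^{-1}$ with $\hat\Sigma$ as written out in the proof of Theorem 4.

```python
import numpy as np

def jiv2(y, X, Z):
    """JIV2 estimate: solve (sum_{i!=j} X_i P_ij X_j') d = sum_{i!=j} X_i P_ij y_j."""
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # projection matrix on the instruments
    P_off = P - np.diag(np.diag(P))         # remove the i == j terms
    H = X.T @ P_off @ X
    delta = np.linalg.solve(H, X.T @ P_off @ y)
    return delta, H, P

def robust_variance(y, X, delta, H, P):
    """Assumed sandwich form H^{-1} Sigma_hat H^{-1} with residuals xi_i = y_i - X_i'delta."""
    xi = y - X @ delta
    p = np.diag(P)
    w = xi ** 2
    Xbar = P @ X                            # rows are Xbar_i' = sum_j P_ij X_j'
    S = (Xbar * w[:, None]).T @ Xbar        # sum_i Xbar_i Xbar_i' xi_i^2
    cross = (X * (p * w)[:, None]).T @ Xbar # sum_i P_ii X_i Xbar_i' xi_i^2
    S -= cross + cross.T
    M = X * xi[:, None]                     # rows are (X_i xi_i)'
    S += M.T @ (P ** 2) @ M                 # sum_{i,j} P_ij^2 X_i xi_i xi_j X_j'
    H_inv = np.linalg.inv(H)
    return H_inv @ S @ H_inv

def t_ratio(c, delta, delta_null, V):
    """t-ratio c'(delta - delta_null) / sqrt(c' V c) for a single linear combination."""
    return float(c @ (delta - delta_null) / np.sqrt(c @ V @ c))
```

With one endogenous regressor, `c = np.array([1.0])` picks out its coefficient; under the conditions of Corollary 2 the returned value would be compared with standard normal critical values.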
4. CONCLUDING REMARKS
In this paper, we derived limiting distribution results for two alternative JIV estimators. These estimators are both consistent and asymptotically normal in the
presence of many instruments under heteroskedasticity of unknown form. In the
same setup, LIML, 2SLS, and B2SLS are inconsistent. In the process of showing the asymptotic normality of JIV, this paper gives a central limit theorem
for quadratic (and, more generally, bilinear) forms associated with an idempotent matrix. This central limit theorem has already been used in Hausman et al.
(2007) to derive the asymptotic properties of the jackknife versions of the LIML
and Fuller (1977) estimators and in Chao et al. (2010) to derive a moment-based
test that allows for heteroskedasticity and many instruments. Moreover, this new
central limit theorem is potentially useful for other analyses involving many
instruments.
NOTE
1. The observations $w_1, \ldots, w_n$ are distinct with probability one and therefore, by $K < n$, cannot all be roots of a $K$th degree polynomial. It follows that for any nonzero $a$ there must be some $i$ with $a'Z_i = a'p^K(w_i) \neq 0$, implying $a'Z'Za > 0$.
REFERENCES
Abadir, K.M. & J.R. Magnus (2005) Matrix Algebra. Cambridge University Press.
Ackerberg, D.A. & P. Devereux (2009) Improved JIVE estimators for overidentified models with and
without heteroskedasticity. Review of Economics and Statistics 91, 351–362.
Angrist, J.D., G.W. Imbens, & A. Krueger (1999) Jackknife instrumental variables estimation. Journal
of Applied Econometrics 14, 57–67.
Bekker, P.A. (1994) Alternative approximations to the distributions of instrumental variable estimators.
Econometrica 62, 657–681.
Bekker, P.A. & J. van der Ploeg (2005) Instrumental variable estimation based on grouped data.
Statistica Neerlandica 59, 506–508.
Billingsley, P. (1986) Probability and Measure, 2nd ed. Wiley.
Blomquist, S. & M. Dahlberg (1999) Small sample properties of LIML and jackknife IV estimators:
Experiments with weak instruments. Journal of Applied Econometrics 14, 69–88.
Chao, J.C., J.A. Hausman, W.K. Newey, N.R. Swanson, & T. Woutersen (2010) Testing Overidentifying Restrictions with Many Instruments and Heteroskedasticity. Working paper, MIT.
Chao, J.C. & N.R. Swanson (2004) Estimation and Testing Using Jackknife IV in Heteroskedastic
Regressions with Many Weak Instruments. Working paper, Rutgers University.
Chao, J.C. & N.R. Swanson (2005) Consistent estimation with a large number of weak instruments.
Econometrica 73, 1673–1692.
Chao, J.C. & N.R. Swanson (2006) Asymptotic normality of single-equation estimators for the case
with a large number of weak instruments. In D. Corbae, S.N. Durlauf, & B.E. Hansen (eds.),
Frontiers of Analysis and Applied Research: Essays in Honor of Peter C. B. Phillips, pp. 82–124.
Cambridge University Press.
Davidson, R. & J.G. MacKinnon (2006) The case against JIVE. Journal of Applied Econometrics 21,
827–833.
Donald, S.G. & W.K. Newey (2001) Choosing the number of instruments. Econometrica 69,
1161–1191.
Fuller, W.A. (1977) Some properties of a modification of the limited information estimator. Econometrica 45, 939–954.
Han, C. & P.C.B. Phillips (2006) GMM with many moment conditions. Econometrica 74, 147–192.
Hansen, C., J.A. Hausman, & W.K. Newey (2008) Estimation with many instrumental variables.
Journal of Business & Economic Statistics 26, 398–422.
Hausman, J.A., W.K. Newey, T. Woutersen, J. Chao, & N.R. Swanson (2007) IV Estimation with
Heteroskedasticity and Many Instruments. Working paper, MIT.
Kunitomo, N. (1980) Asymptotic expansions of distributions of estimators in a linear functional relationship and simultaneous equations. Journal of the American Statistical Association 75, 693–700.
Magnus, J.R. & H. Neudecker (1988) Matrix Differential Calculus with Applications in Statistics and
Econometrics. Wiley.
Morimune, K. (1983) Approximate distributions of k-class estimators when the degree of overidentifiability is large compared with the sample size. Econometrica 51, 821–841.
Newey, W.K. (1990) Efficient instrumental variable estimation of nonlinear models. Econometrica 58,
809–837.
Phillips, G.D.A. & C. Hale (1977) The bias of instrumental variable estimators of simultaneous
equation systems. International Economic Review 18, 219–228.
Phillips, P.C.B. (1983) Small sample distribution theory in econometric models of simultaneous equations. In Z. Griliches & M.D. Intriligator (eds.), Handbook of Econometrics, vol. 1, pp. 449–516.
North-Holland.
Sawa, T. (1968) The exact sampling distribution of ordinary least squares and two-stage least squares
estimators. Journal of the American Statistical Association 64, 923–937.
Staiger, D. & J.H. Stock (1997) Instrumental variables regression with weak instruments. Econometrica 65, 557–586.
Stock, J.H. & M. Yogo (2005) Asymptotic distributions of instrumental variables statistics with
many weak instruments. In D.W.K. Andrews & J.H. Stock (eds.), Identification and Inference
for Econometric Models: Essays in Honor of Thomas J. Rothenberg, pp. 109–120. Cambridge
University Press.
APPENDIX A: Proofs of Theorems
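We define a number of notations and abbreviations that will be used in Appendixes A and B. Let $C$ denote a generic positive constant and let M, CS, and T denote the Markov inequality, the Cauchy–Schwarz inequality, and the triangle inequality, respectively. Also, for random variables $W_i$, $Y_i$, and $\eta_i$ and for $\mathcal{Z} = (\Upsilon, Z)$, let $\bar w_i = E[W_i|\mathcal{Z}]$, $\tilde W_i = W_i - \bar w_i$, $\bar y_i = E[Y_i|\mathcal{Z}]$, $\tilde Y_i = Y_i - \bar y_i$, $\bar\eta_i = E[\eta_i|\mathcal{Z}]$, $\tilde\eta_i = \eta_i - \bar\eta_i$, $\bar y = (\bar y_1, \ldots, \bar y_n)'$, $\bar w = (\bar w_1, \ldots, \bar w_n)'$,
\[ \bar\mu_W = \max_{1\le i\le n} |\bar w_i|, \qquad \bar\mu_Y = \max_{1\le i\le n} |\bar y_i|, \qquad \bar\mu_\eta = \max_{1\le i\le n} |\bar\eta_i|, \]
\[ \bar\sigma_W^2 = \max_{i\le n} \mathrm{Var}(W_i|\mathcal{Z}), \qquad \bar\sigma_Y^2 = \max_{i\le n} \mathrm{Var}(Y_i|\mathcal{Z}), \qquad \text{and} \qquad \bar\sigma_\eta^2 = \max_{i\le n} \mathrm{Var}(\eta_i|\mathcal{Z}), \]
where, to simplify notation, we have suppressed dependence on $\mathcal{Z}$ for the various quantities ($\bar w_i$, $\tilde W_i$, $\bar y_i$, $\tilde Y_i$, $\bar\eta_i$, $\tilde\eta_i$, $\bar\mu_W$, $\bar\mu_Y$, $\bar\mu_\eta$, $\bar\sigma_W^2$, $\bar\sigma_Y^2$, and $\bar\sigma_\eta^2$) defined previously. Furthermore, for random variable $X$, define $\|X\|_{L_2,\mathcal{Z}} = \sqrt{E[X^2|\mathcal{Z}]}$.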
We first give four lemmas that are useful in the proofs of consistency, asymptotic normality, and consistency of the asymptotic variance estimator. We group them together here
for ease of reference because they are also used in Hausman et al. (2007).
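LEMMA A1. If, conditional on $\mathcal{Z} = (\Upsilon, Z)$, $(W_i, Y_i)$ $(i = 1, \ldots, n)$ are independent a.s., $W_i$ and $Y_i$ are scalars, and $P$ is a symmetric, idempotent matrix of rank $K$, then for $\bar w = E[(W_1, \ldots, W_n)'|\mathcal{Z}]$, $\bar y = E[(Y_1, \ldots, Y_n)'|\mathcal{Z}]$, $\bar\sigma_{Wn} = \max_{i\le n} \mathrm{Var}(W_i|\mathcal{Z})^{1/2}$, $\bar\sigma_{Yn} = \max_{i\le n} \mathrm{Var}(Y_i|\mathcal{Z})^{1/2}$, and $D_n = K\bar\sigma_{Wn}^2\bar\sigma_{Yn}^2 + \bar\sigma_{Wn}^2\bar y'\bar y + \bar\sigma_{Yn}^2\bar w'\bar w$, there exists a positive constant $C$ such that
\[ \Big\| \sum_{i\neq j} P_{ij} W_i Y_j - \sum_{i\neq j} P_{ij} \bar w_i \bar y_j \Big\|_{L_2,\mathcal{Z}}^2 \le C D_n \quad \text{a.s.} \]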
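Proof. Let $\tilde W_i = W_i - \bar w_i$ and $\tilde Y_i = Y_i - \bar y_i$. Note that
\[ \sum_{i\neq j} P_{ij} W_i Y_j - \sum_{i\neq j} P_{ij} \bar w_i \bar y_j = \sum_{i\neq j} P_{ij} \tilde W_i \tilde Y_j + \sum_{i\neq j} P_{ij} \tilde W_i \bar y_j + \sum_{i\neq j} P_{ij} \bar w_i \tilde Y_j. \]
Let $D_{1n} = \bar\sigma_{Wn}^2\bar\sigma_{Yn}^2$. Note that for $i \neq j$ and $k \neq \ell$, $E[\tilde W_i \tilde Y_j \tilde W_k \tilde Y_\ell|\mathcal{Z}]$ is zero unless $i = k$ and $j = \ell$ or $i = \ell$ and $j = k$. Then by CS and $\sum_j P_{ij}^2 = P_{ii}$,
\[ \begin{aligned} E\Big[\Big(\sum_{i\neq j} P_{ij} \tilde W_i \tilde Y_j\Big)^2 \Big| \mathcal{Z}\Big] &= \sum_{i\neq j}\sum_{k\neq \ell} P_{ij} P_{k\ell} E[\tilde W_i \tilde Y_j \tilde W_k \tilde Y_\ell|\mathcal{Z}] \\ &= \sum_{i\neq j} P_{ij}^2 \big( E[\tilde W_i^2|\mathcal{Z}] E[\tilde Y_j^2|\mathcal{Z}] + E[\tilde W_i \tilde Y_i|\mathcal{Z}] E[\tilde W_j \tilde Y_j|\mathcal{Z}] \big) \\ &\le 2 D_{1n} \sum_{i\neq j} P_{ij}^2 \le 2 D_{1n} \sum_i P_{ii} = 2 D_{1n} K. \end{aligned} \]
Also, for $\tilde W = (\tilde W_1, \ldots, \tilde W_n)'$, we have $\sum_{i\neq j} P_{ij} \tilde W_i \bar y_j = \tilde W' P \bar y - \sum_i P_{ii} \bar y_i \tilde W_i$. By independence across $i$ conditional on $\mathcal{Z}$, we have $E[\tilde W \tilde W'|\mathcal{Z}] \le \bar\sigma_{Wn}^2 I_n$ a.s., so
\[ E[(\bar y' P \tilde W)^2|\mathcal{Z}] = \bar y' P E[\tilde W \tilde W'|\mathcal{Z}] P \bar y \le \bar\sigma_{Wn}^2 \bar y' P \bar y \le \bar\sigma_{Wn}^2 \bar y'\bar y, \]
\[ E\Big[\Big(\sum_i P_{ii} \bar y_i \tilde W_i\Big)^2 \Big| \mathcal{Z}\Big] = \sum_i P_{ii}^2 E[\tilde W_i^2|\mathcal{Z}] \bar y_i^2 \le \bar\sigma_{Wn}^2 \bar y'\bar y. \]
Then by T, together with $(a + b)^2 \le 2a^2 + 2b^2$, we have
\[ \Big\| \sum_{i\neq j} P_{ij} \tilde W_i \bar y_j \Big\|_{L_2,\mathcal{Z}}^2 \le 2 \big\| \bar y' P \tilde W \big\|_{L_2,\mathcal{Z}}^2 + 2 \Big\| \sum_i P_{ii} \bar y_i \tilde W_i \Big\|_{L_2,\mathcal{Z}}^2 \le C \bar\sigma_{Wn}^2 \bar y'\bar y \quad \text{a.s. } P_Z. \]
Interchanging the roles of $Y_i$ and $W_i$ gives $\| \sum_{i\neq j} P_{ij} \bar w_i \tilde Y_j \|_{L_2,\mathcal{Z}}^2 \le C \bar\sigma_{Yn}^2 \bar w'\bar w$ a.s. The conclusion then follows by T. ■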
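The proof of Lemma A1 leans on a few purely algebraic facts about a projection matrix: $\sum_j P_{ij}^2 = P_{ii}$, $\sum_{i\neq j} P_{ij}^2 \le \sum_i P_{ii} = K$, $\bar y' P \bar y \le \bar y'\bar y$, and the split of the off-diagonal bilinear form into a full quadratic form minus its diagonal. The Python sketch below checks them numerically on a randomly drawn instrument matrix. The function name and the random design are illustrative assumptions, not part of the paper.

```python
import numpy as np

def lemma_a1_identities(n=30, K=5, seed=0):
    """Return the algebraic quantities used in the proof of Lemma A1 for a random P."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, K))
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)     # symmetric, idempotent, rank K
    p = np.diag(P)
    P_off = P - np.diag(p)

    W = rng.normal(size=n)                    # stands in for the centered W-tilde
    ybar = rng.normal(size=n)                 # stands in for the conditional means

    return {
        "row_sq": (P ** 2).sum(axis=1),       # sum_j P_ij^2, should equal P_ii
        "diag": p,
        "off_sq": (P_off ** 2).sum(),         # sum_{i != j} P_ij^2, at most K
        "trace": np.trace(P),                 # equals K
        "quad_P": ybar @ P @ ybar,            # ybar' P ybar, at most ybar'ybar
        "quad_I": ybar @ ybar,
        "lhs": W @ P_off @ ybar,              # sum_{i != j} P_ij W_i ybar_j
        "rhs": W @ P @ ybar - np.sum(p * ybar * W),
    }
```

These identities hold for any symmetric idempotent $P$, which is why the bound in Lemma A1 depends on the instruments only through $K$.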
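LEMMA A2. Suppose that, conditional on $\mathcal{Z}$, the following conditions hold a.s.:
(i) $P = P(Z)$ is a symmetric, idempotent matrix with $\mathrm{rank}(P) = K$ and $P_{ii} \le C < 1$;
(ii) $(W_{1n}, U_1, \varepsilon_1), \ldots, (W_{nn}, U_n, \varepsilon_n)$ are independent, and $D_n = \sum_{i=1}^n E[W_{in} W_{in}'|\mathcal{Z}]$ satisfies $\|D_n\| \le C$ a.s.n;
(iii) $E[W_{in}|\mathcal{Z}] = 0$, $E[U_i|\mathcal{Z}] = 0$, $E[\varepsilon_i|\mathcal{Z}] = 0$, and there exists a constant $C$ such that $E[\|U_i\|^4|\mathcal{Z}] \le C$ and $E[\varepsilon_i^4|\mathcal{Z}] \le C$;
(iv) $\sum_{i=1}^n E[\|W_{in}\|^4|\mathcal{Z}] \to 0$ a.s.; and
(v) $K \to \infty$ as $n \to \infty$.
Then for
\[ \bar\Sigma_n \overset{\mathrm{def}}{=} \sum_{i\neq j} P_{ij}^2 \big( E[U_i U_i'|\mathcal{Z}] E[\varepsilon_j^2|\mathcal{Z}] + E[U_i \varepsilon_i|\mathcal{Z}] E[\varepsilon_j U_j'|\mathcal{Z}] \big) / K \]
and any sequences $c_{1n}$ and $c_{2n}$ depending on $\mathcal{Z}$ of conformable vectors with $\|c_{1n}\| \le C$, $\|c_{2n}\| \le C$, and $\Xi_n = c_{1n}' D_n c_{1n} + c_{2n}' \bar\Sigma_n c_{2n} > 1/C$ a.s.n, it follows that
\[ Y_n = \Xi_n^{-1/2} \Big( c_{1n}' \sum_{i=1}^n W_{in} + c_{2n}' \sum_{i\neq j} U_i P_{ij} \varepsilon_j / \sqrt{K} \Big) \xrightarrow{d} N(0, 1), \quad \text{a.s.}; \]
i.e., $\Pr(Y_n \le y|\mathcal{Z}) \to \Phi(y)$ a.s. for all $y$.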
Proof. The proof of Lemma A2 is long and is deferred to Appendix B.
The next two results are helpful in proving consistency of the variance estimator. They
use the same notation as Lemma A1.
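LEMMA A3. If, conditional on $\mathcal{Z}$, $(W_i, Y_i)$ $(i = 1, \ldots, n)$ are independent and $W_i$ and $Y_i$ are scalars, then there exists a positive constant $C$ such that
\[ \Big\| \sum_{i\neq j} P_{ij}^2 W_i Y_j - E\Big[ \sum_{i\neq j} P_{ij}^2 W_i Y_j \Big| \mathcal{Z} \Big] \Big\|_{L_2,\mathcal{Z}}^2 \le C B_n \quad \text{a.s.}, \]
where $B_n = K(\bar\sigma_W^2\bar\sigma_Y^2 + \bar\sigma_W^2\bar\mu_Y^2 + \bar\mu_W^2\bar\sigma_Y^2)$.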
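Proof. Using the notation of the proof of Lemma A1, we have
\[ \sum_{i\neq j} P_{ij}^2 W_i Y_j - \sum_{i\neq j} P_{ij}^2 \bar w_i \bar y_j = \sum_{i\neq j} P_{ij}^2 \tilde W_i \tilde Y_j + \sum_{i\neq j} P_{ij}^2 \tilde W_i \bar y_j + \sum_{i\neq j} P_{ij}^2 \bar w_i \tilde Y_j. \]
As before, for $i \neq j$ and $k \neq \ell$, $E[\tilde W_i \tilde Y_j \tilde W_k \tilde Y_\ell|\mathcal{Z}]$ is zero unless $i = k$ and $j = \ell$ or $i = \ell$ and $j = k$. Also, $|P_{ij}| \le (P_{ii} P_{jj})^{1/2} < 1$ by CS and Assumption 1, so $P_{ij}^4 \le P_{ij}^2$. Also, $\sum_j P_{ij}^2 = P_{ii}$, so
\[ \begin{aligned} E\Big[\Big(\sum_{i\neq j} P_{ij}^2 \tilde W_i \tilde Y_j\Big)^2 \Big| \mathcal{Z}\Big] &= \sum_{i\neq j}\sum_{k\neq \ell} P_{ij}^2 P_{k\ell}^2 E[\tilde W_i \tilde Y_j \tilde W_k \tilde Y_\ell|\mathcal{Z}] \\ &= \sum_{i\neq j} P_{ij}^4 \big( E[\tilde W_i^2|\mathcal{Z}] E[\tilde Y_j^2|\mathcal{Z}] + E[\tilde W_i \tilde Y_i|\mathcal{Z}] E[\tilde W_j \tilde Y_j|\mathcal{Z}] \big) \\ &\le 2\bar\sigma_W^2\bar\sigma_Y^2 \sum_{i\neq j} P_{ij}^4 \le 2K\bar\sigma_W^2\bar\sigma_Y^2 \quad \text{a.s.} \end{aligned} \]
Also, $\sum_{i\neq j} P_{ij}^2 \tilde W_i \bar y_j = \tilde W' \tilde P \bar y - \sum_i P_{ii}^2 \bar y_i \tilde W_i$ where $\tilde P_{ij} = P_{ij}^2$. By independence across $i$ conditional on $\mathcal{Z}$, we have $E[\tilde W \tilde W'|\mathcal{Z}] \le \bar\sigma_W^2 I_n$, so
\[ \begin{aligned} E[(\bar y' \tilde P \tilde W)^2|\mathcal{Z}] &= \bar y' \tilde P E[\tilde W \tilde W'|\mathcal{Z}] \tilde P \bar y \le \bar\sigma_W^2 \bar y' \tilde P^2 \bar y = \bar\sigma_W^2 \sum_{i,j,k} \bar y_i P_{ik}^2 P_{kj}^2 \bar y_j \\ &\le \bar\sigma_W^2\bar\mu_Y^2 \sum_{i,j,k} P_{ik}^2 P_{kj}^2 = \bar\sigma_W^2\bar\mu_Y^2 \sum_k \Big(\sum_i P_{ik}^2\Big)\Big(\sum_j P_{kj}^2\Big) \\ &= \bar\sigma_W^2\bar\mu_Y^2 \sum_k P_{kk}^2 \le K\bar\sigma_W^2\bar\mu_Y^2 \quad \text{a.s.}, \end{aligned} \]
\[ E\Big[\Big(\sum_i P_{ii}^2 \bar y_i \tilde W_i\Big)^2 \Big| \mathcal{Z}\Big] = \sum_i P_{ii}^4 E[\tilde W_i^2|\mathcal{Z}] \bar y_i^2 \le K\bar\sigma_W^2\bar\mu_Y^2 \quad \text{a.s.} \]
Then by T, together with $(a + b)^2 \le 2a^2 + 2b^2$, we have
\[ \Big\| \sum_{i\neq j} P_{ij}^2 \tilde W_i \bar y_j \Big\|_{L_2,\mathcal{Z}}^2 \le 2 \big\| \tilde W' \tilde P \bar y \big\|_{L_2,\mathcal{Z}}^2 + 2 \Big\| \sum_i P_{ii}^2 \bar y_i \tilde W_i \Big\|_{L_2,\mathcal{Z}}^2 \le C K \bar\sigma_W^2\bar\mu_Y^2 \quad \text{a.s.} \]
Interchanging the roles of $Y_i$ and $W_i$ gives $\| \sum_{i\neq j} P_{ij}^2 \bar w_i \tilde Y_j \|_{L_2,\mathcal{Z}}^2 \le C K \bar\mu_W^2\bar\sigma_Y^2$ a.s. The conclusion then follows by T. ■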
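The key deterministic step in the proof of Lemma A3 is the bound on the quadratic form in the elementwise-squared projection matrix, $\bar y' \tilde P^2 \bar y \le \bar\mu_Y^2 \sum_k P_{kk}^2 \le K\bar\mu_Y^2$, where $\tilde P_{ij} = P_{ij}^2$. The Python sketch below evaluates the three quantities on a random design so the chain can be inspected numerically. The function name and the simulated inputs are assumptions made only for illustration.

```python
import numpy as np

def lemma_a3_chain(n=40, K=6, seed=0):
    """Return (ybar' Ptilde^2 ybar, mu^2 * sum_k P_kk^2, K * mu^2) for a random P and ybar."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, K))
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # projection matrix of rank K
    P_tilde = P ** 2                        # elementwise squares, Ptilde_ij = P_ij^2

    ybar = rng.uniform(-2.0, 2.0, size=n)   # stands in for the conditional means
    mu = np.max(np.abs(ybar))               # mu-bar_Y = max_i |ybar_i|

    quad = ybar @ P_tilde @ P_tilde @ ybar  # sum_{i,j,k} ybar_i P_ik^2 P_kj^2 ybar_j
    middle = mu ** 2 * np.sum(np.diag(P) ** 2)
    upper = K * mu ** 2
    return quad, middle, upper
```

The middle quantity uses $\sum_i P_{ik}^2 = P_{kk}$ twice, and the last step uses $P_{kk}^2 \le P_{kk}$ with $\sum_k P_{kk} = K$.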
As a notational convention, let $\sum_{i\neq j\neq k}$ denote $\sum_i \sum_{j\neq i} \sum_{k\notin\{i,j\}}$.
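LEMMA A4. Suppose that there is $C > 0$ such that, conditional on $\mathcal{Z}$, $(W_1, Y_1, \eta_1), \ldots, (W_n, Y_n, \eta_n)$ are independent with $E[W_i|\mathcal{Z}] = a_i/\sqrt{n}$, $E[Y_i|\mathcal{Z}] = b_i/\sqrt{n}$, $|a_i| \le C$, $|b_i| \le C$, $E[\eta_i^2|\mathcal{Z}] \le C$, $\mathrm{Var}(W_i|\mathcal{Z}) \le C/r_n$, and $\mathrm{Var}(Y_i|\mathcal{Z}) \le C/r_n$ and there exists $\pi_n$ such that $\max_{i\le n} |a_i - Z_i'\pi_n| \to 0$ a.s. and $\sqrt{K}/r_n \to 0$. Then
\[ A_n = E\Big[ \sum_{i\neq j\neq k} W_i P_{ik} \eta_k P_{kj} Y_j \Big| \mathcal{Z} \Big] = O_p(1), \qquad \sum_{i\neq j\neq k} W_i P_{ik} \eta_k P_{kj} Y_j - A_n \xrightarrow{p} 0. \]
Proof. Given in Appendix B. ■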
LEMMA A5. If Assumptions 1–3 are satisfied, then
(i) $S_n^{-1} \tilde H S_n^{-1\prime} = \sum_{i\neq j} z_i P_{ij} (1 - P_{jj})^{-1} z_j'/n + o_p(1)$,
(ii) $S_n^{-1} \sum_{i\neq j} X_i P_{ij} (1 - P_{jj})^{-1} \varepsilon_j = O_p(1 + \sqrt{K/r_n})$,
(iii) $S_n^{-1} \hat H S_n^{-1\prime} = \sum_{i\neq j} z_i P_{ij} z_j'/n + o_p(1)$,
(iv) $S_n^{-1} \sum_{i\neq j} X_i P_{ij} \varepsilon_j = O_p(1 + \sqrt{K/r_n})$.
Proof. Let ek denote the kth unit vector and apply Lemma A1 with Yi = ek Sn−1 X i =
√
−1 for some k and . By Assumption 2,
z ik / n + ek Sn−1 Ui and Wi =e Sn−1
X i (1 − Pii )
√
√
λmin (Sn ) ≥ C rn , implying Sn−1 ≤ C/ rn . Therefore a.s.
√
E[Yi |Z] = z ik / n,
Var(Yi |Z) ≤ C/rn ,
√
Var(Wi |Z) ≤ C/rn .
E[Wi |Z] = z i / n(1 − Pii ),
Note that a.s.
√
√
K σ̄Wn σ̄Yn ≤ C K /rn → 0,
σ̄Yn
√
−1/2
w̄ w̄ ≤ Crn
∑
i
σ̄Wn
−1/2
ȳ ȳ ≤ Crn
∑ zik2 /n → 0,
i
−1/2
2
−2
z i (1− Pii ) /n ≤ Crn
(1−max Pii )−2
i
∑ zi2 /n →0.
i
Because ek Sn−1 H̃ Sn−1 e = ek Sn−1 ∑i= j X i Pij X j Sn−1 e /(1 − Pj j ) = ∑i= j Yi Pij W j and
Pij w̄i ȳ j = Pij z ik z j /n(1 − Pj j ), applying Lemma A1 and the conditional version of M,
we deduce that for any υ > 0 and An = ek Sn−1 H̃ Sn−1 e − ∑i= j ek z i Pij (1 − Pj j )−1
a.s.
z j e /n ≥ υ , P (An |Z) → 0. By the dominated convergence theorem, P (An ) =
E [P (An |Z)] → 0. The preceding argument establishes the first conclusion for the (k, )th
element. Doing this for every element completes the proof of the first conclusion.
For the second conclusion, apply Lemma A1 with Yi = ek Sn−1 X i as before and Wi =
εi /(1 − Pii ). Note that w̄i = 0 and σ̄Wn ≤ C. Then by Lemma A1,
E[{ek Sn−1
∑ X i Pij (1 − Pj j )−1 ε j }2 |Z] ≤ C K /rn + C.
i= j
The conclusion then follows from the fact that E[An |Z] ≤ C implies An = O p (1).
For the third conclusion, apply Lemma A1 with Yi = ek Sn−1 X i as before and Wi =
e Sn−1 X i , so a.s.
√
√
K σ̄Wn σ̄Yn ≤ C K /rn → 0,
σ̄Wn
ȳ ȳ ≤ Crn−1/2
∑ zik2 /n → 0,
√
σ̄Yn w̄ w̄ → 0.
n
The fourth conclusion follows similarly to the second conclusion.
Let $\bar H_n = \sum_i z_i z_i'/n$ and $H_n = \sum_i (1 - P_{ii}) z_i z_i'/n$.
LEMMA A6. If Assumptions 1–4 are satisfied, then
\[ S_n^{-1} \tilde H S_n^{-1\prime} = \bar H_n + o_p(1), \qquad S_n^{-1} \hat H S_n^{-1\prime} = H_n + o_p(1). \]
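Proof. We use Lemma A5 and approximate the right-hand-side terms in Lemma A5 by $\bar H_n$ and $H_n$. Let $\bar z_i = \sum_{j=1}^n P_{ij} z_j$ be the $i$th element of $Pz$ and note that
\[ \begin{aligned} \sum_{i=1}^n \|z_i - \bar z_i\|^2/n &= \|(I - P)z\|^2/n = \mathrm{tr}(z'(I - P)z/n) = \mathrm{tr}[(z - Z\pi_{Kn}')'(I - P)(z - Z\pi_{Kn}')/n] \\ &\le \mathrm{tr}[(z - Z\pi_{Kn}')'(z - Z\pi_{Kn}')/n] = \sum_{i=1}^n \|z_i - \pi_{Kn} Z_i\|^2/n \to 0 \quad \text{a.s. } P_Z. \end{aligned} \]
It follows that a.s.
\[ \begin{aligned} \Big\| \sum_i (\bar z_i - z_i)(1 - P_{ii})^{-1} z_i'/n \Big\| &\le \sum_i \|\bar z_i - z_i\| \, \|(1 - P_{ii})^{-1} z_i\|/n \\ &\le \sqrt{\sum_i \|\bar z_i - z_i\|^2/n} \, \sqrt{\sum_i \|(1 - P_{ii})^{-1} z_i\|^2/n} \to 0. \end{aligned} \]
Then
\[ \begin{aligned} \sum_{i\neq j} z_i P_{ij} (1 - P_{jj})^{-1} z_j'/n &= \sum_{i,j} z_i P_{ij} (1 - P_{jj})^{-1} z_j'/n - \sum_i z_i P_{ii} (1 - P_{ii})^{-1} z_i'/n \\ &= \sum_i \bar z_i (1 - P_{ii})^{-1} z_i'/n - \sum_i z_i P_{ii} (1 - P_{ii})^{-1} z_i'/n \\ &= \bar H_n + \sum_i (\bar z_i - z_i)(1 - P_{ii})^{-1} z_i'/n = \bar H_n + o_{a.s.}(1). \end{aligned} \]
The first conclusion then follows from Lemma A5 and T. Also, as in the last equation, we have
\[ \begin{aligned} \sum_{i\neq j} z_i P_{ij} z_j'/n &= \sum_{i,j} z_i P_{ij} z_j'/n - \sum_i P_{ii} z_i z_i'/n = \sum_i \bar z_i z_i'/n - \sum_i P_{ii} z_i z_i'/n \\ &= H_n + \sum_i (\bar z_i - z_i) z_i'/n = H_n + o_{a.s.}(1), \end{aligned} \]
so the second conclusion follows similarly to the first. ■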
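The algebra in the proof of Lemma A6 becomes exact when $z$ lies in the column space of $Z$, because then $\bar z_i = z_i$ and the remainder terms vanish: the JIV1-weighted off-diagonal sum equals $\bar H_n$ and the unweighted one equals $H_n$. The Python sketch below checks this special case. It is an illustration only; the function name and the choice $z = Z\pi'$ with a random coefficient matrix are assumptions.

```python
import numpy as np

def leave_one_out_moments(z, Z):
    """Off-diagonal sums from Lemma A5 alongside H_bar_n and H_n."""
    n = z.shape[0]
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    p = np.diag(P)
    P_off = P - np.diag(p)

    # sum_{i != j} z_i P_ij (1 - P_jj)^{-1} z_j' / n  (JIV1 weighting on column j)
    jiv1_sum = z.T @ (P_off / (1.0 - p)[None, :]) @ z / n
    # sum_{i != j} z_i P_ij z_j' / n  (JIV2, no weighting)
    jiv2_sum = z.T @ P_off @ z / n

    H_bar = z.T @ z / n                           # sum_i z_i z_i' / n
    H = (z * (1.0 - p)[:, None]).T @ z / n        # sum_i (1 - P_ii) z_i z_i' / n
    return jiv1_sum, jiv2_sum, H_bar, H
```

When $z$ is only approximated by a linear combination of the instruments, as in Assumption 4, the two pairs differ by the $o_{a.s.}(1)$ remainders bounded in the proof.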
Proof of Theorem 1. First, note that by $\lambda_{\min}(S_n S_n'/r_n) \ge \lambda_{\min}(\tilde S_n \tilde S_n') \ge C$, we have
\[ \big\| S_n'(\tilde\delta - \delta_0)/\sqrt{r_n} \big\| \ge \lambda_{\min}(S_n S_n'/r_n)^{1/2} \|\tilde\delta - \delta_0\| \ge C \|\tilde\delta - \delta_0\|. \]
Therefore, $S_n'(\tilde\delta - \delta_0)/\sqrt{r_n} \xrightarrow{p} 0$ implies $\tilde\delta \xrightarrow{p} \delta_0$. Note that by Assumption 2, $\bar H_n$ is bounded and $\lambda_{\min}(\bar H_n) \ge C$ a.s.n. For $\tilde H$ from Section 2, it follows from Lemma A6 and Assumption 2 that with probability approaching one $\lambda_{\min}(S_n^{-1} \tilde H S_n^{-1\prime}) \ge C$ as the sample size grows. Hence $\|(S_n^{-1} \tilde H S_n^{-1\prime})^{-1}\| = O_p(1)$. By equation (1) and Lemma A5,
\[ r_n^{-1/2} S_n'(\tilde\delta - \delta_0) = (S_n^{-1} \tilde H S_n^{-1\prime})^{-1} S_n^{-1} \sum_{i\neq j} X_i P_{ij} \xi_j / \sqrt{r_n} = O_p(1) o_p(1) \xrightarrow{p} 0. \]
All of the previous statements are conditional on $\mathcal{Z} = (\Upsilon, Z)$ for a given sample size $n$, so for the random variable $R_n = r_n^{-1/2} S_n'(\tilde\delta - \delta_0)$, we have shown that for any constant $v > 0$, a.s. $\Pr(\|R_n\| \ge v|\mathcal{Z}) \to 0$. Then by the dominated convergence theorem, $\Pr(\|R_n\| \ge v) = E[\Pr(\|R_n\| \ge v|\mathcal{Z})] \to 0$. Therefore, because $v$ is arbitrary, it follows that $R_n = r_n^{-1/2} S_n'(\tilde\delta - \delta_0) \xrightarrow{p} 0$.
Next note that $P_{ii} \le C < 1$, so in the positive semidefinite sense in large enough samples a.s.,
\[ H_n = \sum_i (1 - P_{ii}) z_i z_i'/n \ge (1 - C) \bar H_n. \]
Thus, by Assumption 2, $H_n$ is bounded and bounded away from singularity a.s.n. Then the rest of the conclusion follows analogously with $\hat\delta$ replacing $\tilde\delta$ and $H_n$ replacing $\bar H_n$. ■
We now turn to the asymptotic normality results. In what follows, let ξi = εi when
considering the JIV2 estimator and let ξi = εi /(1 − Pii ) when considering JIV1.
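The unified notation $\xi_i$ reflects an exact finite-sample identity: for either estimator, the sampling error equals the inverse of the leave-one-out cross-product matrix times $\sum_{i\neq j} X_i P_{ij} \xi_j$, with $\xi_j = \varepsilon_j/(1 - P_{jj})$ for JIV1 and $\xi_j = \varepsilon_j$ for JIV2. The Python sketch below verifies this on simulated data. The function name, the `kind` argument, and the data-generating choices are hypothetical and serve only to illustrate the notation.

```python
import numpy as np

def jiv_sampling_error(X, Z, eps, delta0, kind="JIV1"):
    """Return (delta_hat - delta0, H^{-1} sum_{i!=j} X_i P_ij xi_j) for JIV1 or JIV2."""
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    p = np.diag(P)
    P_off = P - np.diag(p)

    # Weight attached to observation j: 1/(1 - P_jj) for JIV1, 1 for JIV2.
    w = 1.0 / (1.0 - p) if kind == "JIV1" else np.ones_like(p)

    y = X @ delta0 + eps
    A = X.T @ (P_off * w[None, :])      # column j holds sum_{i != j} X_i P_ij w_j
    H = A @ X                           # H-tilde (JIV1) or H-hat (JIV2)
    delta_hat = np.linalg.solve(H, A @ y)

    xi = eps * w                        # xi_j as defined in the text
    rhs = np.linalg.solve(H, X.T @ P_off @ xi)
    return delta_hat - delta0, rhs
```

Both returned vectors agree up to floating-point error, which is the identity the proofs of Theorems 2 and 3 start from.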
Proof of Theorem 2. Define
√
Yn = ∑ z i (1 − Pii )ξi
n + Sn−1
i
∑ Ui Pij ξ j .
i= j
By Assumptions 2–4,
√ 2
n
E ∑i=1 (z i − z̄ i ) ξi / n |Z
a.s.
n
n
= ∑i=1 z i − z̄ i 2 E ξi2 |Z n ≤ C ∑i=1 z i − z̄ i 2 /n → 0.
Therefore, by M,
Sn−1
n
√
∑ X i Pij ξ j − Yn = ∑ (zi − z̄i ) ξi /
i= j
p
n → 0.
i=1
We now apply Lemma A2 to establish asymptotic normality of Yn conditional on Z. Let
n = Var (Yn |Z), so
n =
n
∑ zi zi (1 − Pii )2 E[ξi2 |Z]/n + Sn−1 ∑ Pij2
i=1
i= j
× E[Ui Ui |Z]E[ξ j2 |Z] + E[Ui ξi |Z]E[U j ξ j |Z] Sn−1 .
Note that
√
rn Sn−1 is bounded by Assumption 2 and that ∑i= j Pij2 /K ≤ 1, so by bounded-
ness of K /rn and Assumption 3, it follows that n ≤ C a.s.n. Also, E[ξi2 |Z] ≥ C > 0,
so
n ≥
n
n
i=1
i=1
∑ zi zi (1 − Pii )2 E[ξi2 |Z]/n ≥ C ∑ zi zi /n.
Therefore, by Assumption 2,λmin (
n ) ≥ C > 0 a.s.n (for generic C that may be different
from before). It follows that n−1 ≤ C a.s.n.
Let α be a G × 1 nonzero vector. Let Ui be defined as in Lemma A2 and ξi be
√
−1/2
α,
defined as εi in Lemma A2. In addition, let Win = z i (1 − Pii )ξi / n, c1n = n
√ −1 −1/2
α. Note that condition (i) of Lemma A2 is satisfied. Also, by
and c2n = K Sn n
the boundedness of ∑i z i z i /n and E[ξi2 |Z] a.s.n, condition (ii) of Lemma A2 is satisfied; condition (iii) is satisfied
by Assumptions
3 and 5. Also, by (1 − Pii )−1 ≤ C
n E W 4 |Z ≤ C n z 4 /n 2 a.s.
→ 0, so condition (iv) is
and Assumption 5, ∑i=1
∑i=1 i
in
−1/2
satisfied. Finally, condition (v) is satisfied by hypothesis. Note also that c1n = n
α and
√
√
−1/2
K /rn rn Sn−1 n
α satisfy c1n ≤ C and c2n ≤ C a.s.n. This follows
c2n =
√
√
from the boundedness of K /rn , rn Sn−1 , and n−1 . Moreover, the n of Lemma A2 is
n = Var(c1n
n
∑ Win + c2n
∑ Ui Pij ξ j /
i= j
i=1
√
−1/2
K |Z) = Var(α n
Yn |Z) = α α
by construction. Then, applying Lemma A2, we have
n
√
−1/2 −1/2
d
−1/2
αα
α n
Y n = n
∑ c1n Win + c2n ∑ Ui Pij ξ j / K → N (0, 1)
−1/2
It follows that α n
a.s.
i= j
i=1
d
−1/2
Yn → N 0, α α a.s., so by the Cramér–Wold device, n
d
Yn → N (0, IG ) a.s.
Consider now the JIV1 estimator where ξi = εi /(1 − Pii ). Plugging this into the ex¯ n +
¯ n for ¯ n and ¯ n defined according to Assumption
pression for n , we find n = −1/2 −1 1/2
H̄n n
5. Let V̄n also be as defined following Assumption 5 and note that Bn = V̄n
−1/2
−1/2
is an orthogonal
matrix because Bn Bn = V̄n
= I. Also, Bn is a function of
V̄n V̄n
−1/2 1/2 only Z, V̄n
≤ C a.s.n because λmin (V̄n ) ≥ C > 0 a.s.n, and n ≤ C a.s.n.
By Lemma A6, (Sn−1 H̃ Sn−1 )−1 = H̄n−1 + o p (1). Note that if a random variable Wn sata.s.
isfies Wn ≤ C a.s.n, then Wn = O p (1) (note that 1(
Wn > C) → 0 implies that
E[1(
Wn > C)] = Pr(
Wn > C) → 0). Therefore, we have
−1/2
V̄n
(Sn−1 H̃ Sn−1 )−1 n
1/2
−1/2
Note that because n
−1/2
= V̄n
( H̄n−1 + o p (1))n
1/2
= Bn + o p (1).
d
Yn → N (0, IG ) a.s. and Bn is orthogonal to and a function
−1/2
only of Z, we have Bn n
d
Yn → N (0, IG ). Then by the Slutsky lemma and δ̃ = δ0 +
H̃ −1 ∑i= j X i Pij ξ j , for ξ j = (1 − Pj j )−1 ε j , we have
−1/2 −1/2 −1 −1 −1 −1 −1
Sn (δ̃ − δ0 ) = V̄n
(Sn H̃ Sn ) Sn
V̄n
−1/2
= V̄n
∑ X i Pij ξ j
i= j
(Sn−1 H̃ Sn−1 )−1 [Yn + o p (1)]
−1/2
= [Bn + o p (1)][n
−1/2
Yn +o p (1)] = Bn n
d
Yn +o p (1) → N (0, IG ),
which gives the first conclusion. The conclusion for JIV2 follows by a similar argument
n
for ξi = εi .
Proof of Theorem 3. Under the hypotheses of Theorem 3, rn /K → 0, so following
√
p
n z (1 − P )ξ /√n →
the proof of Theorem 2, we have rn /K ∑i=1
0. Then similar to
i
ii i
√
√
√
the proof of Theorem 2, for Yn = rn Sn−1 ∑i= j Ui Pij ξ j / K , we have rn /K Sn−1 ∑i= j
X i Pij ξ j = Yn + o p (1). Here let
n = Var (Yn |Z) = rn Sn−1 ∑ Pij2 E[Ui Ui |Z]E[ξ j2 |Z] + E[Ui ξi |Z]E[U j ξ j |Z] Sn−1/K .
i= j
Note that by Assumptions 2 and 3, n ≤ C a.s.n. Let L̄ n be any sequence of bounded
−1/2
matrices with λmin ( L̄ n n L̄ n ) ≥ C > 0 a.s.n and let Ȳn = L̄ n n L̄ n
L̄ n Yn . Now
let α be a nonzero vector and apply Lemma A2with Win = 0, εi = ξi , c
1n = 0, and
√
−1/2 √
L̄ n rn Sn−1 . We have Var c2n
c2n = α L̄ n n L̄ n
∑i= j Ui Pij ξ j / K |Z = α α > 0
by construction, and the other hypotheses of Lemma A2 can be verified as in the proof of
d
Theorem 2. Then by the conclusion of Lemma A2, it follows that α Ȳn → N (0, α α) a.s.
d
By the Cramér–Wold device, a.s. Ȳn → N (0, I ).
Consider now
the JIV1 estimator and let L n be specified as in the statement of the result
such that λmin L n V̄n∗ L n ≥ C > 0 a.s.n.Let L̄ n = L n H̄n−1 , so L n V̄n∗ L n = L̄ n n L̄ n .
−1/2 1/2 Note that L̄ n n L̄ n
≤ C and n ≤ C a.s.n. By Lemma A6, (Sn−1 H̃ Sn−1 )−1 =
H̄n−1 + o p (1). Therefore, we have
L̄ n n L̄ n
−1/2
Note also that
−1/2
L n (Sn−1 H̃ Sn−1 )−1 = L̄ n n L̄ n
L n ( H̄n−1 + o p (1))
−1/2
= L̄ n n L̄ n
L̄ n + o p (1).
√
rn /K Sn−1 ∑i= j X i Pij (1 − Pj j )−1 ε j = O p (1). Then we have
−1/2 L n rn /K Sn (δ̃ − δ0 )
L n V̄n∗ L n
−1/2
= L̄ n n L̄ n
L n (Sn−1 H̃ Sn−1 )−1 rn /K Sn−1
=
L̄ n n L̄ n
−1/2
∑ X i Pij (1 − Pj j )−1 ε j
i= j
d
L̄ n + o p (1) [Yn + o p (1)] = Ȳn + o p (1) → N (0, I ) .
The conclusion for JIV2 follows by a similar argument for ξi = εi .
n
Next, we turn to the proof of Theorem 4. Let ξ̃i = ( yi − X i δ̃)/(1 − Pii )and ξi = εi /
(1 − Pii ) for JIV1 and ξ̂i = yi − X i δ̂ and ξi = εi for JIV2. Also, let
ˆ 1 = ∑ Ẋ i Pik ξ̂ 2 Pkj Ẋ , ˆ 2 = ∑ P 2 Ẋ i Ẋ ξ̂ 2 + Ẋ i ξ̂i ξ̂ j Ẋ ,
Ẋ i = Sn−1 X i , ij
k
j
i j
j
i= j=k
˙1 =
∑
i= j=k
Ẋ i Pik ξk2 Pkj Ẋ j ,
˙2 =
∑ Pij2
i= j
i= j
Ẋ i Ẋ i ξ j2 + Ẋ i ξi ξ j Ẋ j .
ˆ1 −
˙ 1 = o p (1) and ˆ2 −
˙2 =
LEMMA A7. If Assumptions 1–6 are satisfied, then o p (K /rn ).
Proof. To show the first conclusion, we use Lemma A4. Note that for δ̇ = δ̂ and X iP =
p
X i /(1 − Pii ) for JIV1 and δ̇ = δ̃ and X iP = X i for JIV2, we have δ̇ → δ0 and ξ̂i2 − ξi2 =
2
−2ξi X iP (δ̇ − δ0 ) + X iP (δ̇ − δ0 ) . Let ηi be any element of −2ξi X iP or X iP X iP . Note
√
√ that Sn / n is bounded, so by CS, ϒi = Sn z i / n ≤ C. Then
E[ηi2 |Z] ≤ C E[ξi2 |Z] + C E[
X i 2 |Z] ≤ C + C ϒi 2 + C E[
Ui 2 |Z] ≤ C.
ˆ n denote a sequence of random variables converging to zero in probability. By
Let Lemma A4,
ˆ
p
∑
i= j=k
Ẋ i Pik ηk Pkj Ẋ j = o p (1)O p (1) → 0.
ˆ1 −
˙ 1 is a sum of terms of the
From the preceding expression for ξ̂i2 − ξi2 , we see that form
p
ˆ 1 −
˙1 →
ˆ ∑i= j=k Ẋ i Pik ηk Pkj Ẋ , so T, 0.
j
Let di = C +|εi |+ Ui ,  = (1 + δ̂ ) for JIV1,  = (1 + δ̃ ) for JIV2, B̂ = δ̂ − δ0 for JIV1, and B̂ = δ̃ − δ0 for JIV2. By the conclusion of Theorem 1, we have  = O p (1)
p
and B̂ → 0. Also, because Pii is bounded away from 1, (1 − Pii )−1 ≤ C a.s. Hence, for
both JIV1 and JIV2,
X i ≤ C + Ui ≤ di , Ẋ i ≤ Crn−1/2 di , ξ̂i − ξi ≤ C X i (δ̂ − δ0 ) ≤ Cdi B̂,
ξ̂i ≤ C X i (δ0 − δ̂) + |ξi | ≤ Cdi Â,
2
ξ̂i − ξi2 ≤ |ξi | + ξ̂i ξ̂i − ξi ≤ Cdi (1 + Â)di B̂ ≤ Cdi2 Â B̂,
2
−1/2 2
di Â, Ẋ i ξi ≤ Crn−1/2 di2 .
Ẋ i ξ̂i − ξi ≤ Cμ−1
n di B̂, Ẋ i ξ̂i ≤ Crn
Also note that because E[di2 |Z] ≤ C,
E
∑
i= j
Pij2 di2 d j2 rn−1 | Z
≤ Crn−1 ∑ Pij2 = Crn−1 ∑ Pii = C K /rn ,
i, j
i
so ∑i= j Pij2 di2 d j2 rn−1 = O p (K /rn ) by M. Then it follows that
2 2
2
2
∑ Pij Ẋ i Ẋ i ξ̂ j − ξ j
≤ ∑ Pij2 Ẋ i ξ̂ j2 − ξ j2 i= j
i= j
≤ Crn−1
∑ Pij2 di2 d j2 Â B̂ = o p (K /rn ) .
i= j
We also have
2
∑ Pij Ẋ i ξ̂i ξ̂ j Ẋ j − Ẋ i ξi ξ j Ẋ j ≤ ∑ Pij2 Ẋ i ξ̂i Ẋ j ξ̂ j − ξ j i= j
i= j
+ Ẋ j ξ j X̆ i ξ̂i − ξi K
≤ Crn−1 ∑ Pij2 di2 d j2 Â B̂ = o p
.
r
n
i= j
n
The second conclusion then follows from T.
LEMMA A8. If Assumptions 1–6 are satisfied, then
˙1 =
˙2 =
∑
i= j=k
z i Pik E[ξk2 |Z]Pkj z j /n + o p (1),
∑ Pij2 zi zi E[ξ j2 |Z]/n + Sn−1 ∑ Pij2
i= j
i= j
E[Ui Ui |Z]E[ξ j2 |Z]
+ E[Ui ξi |Z]E[ξ j U j |Z] Sn−1 + o p (K /rn ).
Proof. To prove the first conclusion, apply Lemma A4 with Wi equal to an element of
Ẋ i , Y j equal to an element of Ẋ j , and ηk = ξk2 .
Next, we use Lemma A3. Note that Var(ξi2 |Z) ≤ C and rn ≤ Cn, so for u ki = ek Sn−1 Ui ,
4 + Ẋ 4 |Z]
E[( Ẋ ik Ẋ i )2 |Z] ≤ C E[ Ẋ ik
i
4 /n 2 + Eu 4 |Z + z 4 /n 2 + Eu 4 |Z ≤ C/r 2 ,
≤ C z ik
n
ki
i
i
2 2
2
2
2
E[( Ẋ ik ξi ) |Z] ≤ C E z ik ξi /n + u ki ξi |Z ≤ C/n + C/rn ≤ C/rn .
Also, if i = E[Ui Ui |Z], then E[ Ẋ i Ẋ i |Z] = z i z i /n + Sn−1 i Sn−1 and E[ Ẋ i ξi |Z] =
Sn−1 E[Ui ξi |Z]. Next let Wi be Ẋ ik Ẋ i for some k and , so
E[Wi |Z] = ek Sn−1 i Sn−1 e + z ik z i /n,
|E[Wi |Z]| ≤ C/rn ,
Var(Wi |Z) ≤ E[( Ẋ ik Ẋ i )2 |Z] ≤ C/rn2 .
Also let Yi = ξi2 and note that |E[Yi |Z]| ≤ C and Var(Wi |Z) ≤ C. Then in the notation of
Lemma A3,
√
√
√
K (σ̄Wn σ̄Yn + σ̄Wn μ̄Yn + μ̄Wn σ̄Yn ) ≤ K (C/rn + C/rn + C/rn ) ≤ C K /rn .
By the conclusion of Lemma A3, for this Wi and Yi we have
√
∑ Pij2 Ẋ ik Ẋ i ξ j2 = ek ∑ Pij2 zi zi /n + Sn−1 i Sn−1 e E[ξ j2 |Z] + O p ( K /rn ).
i= j
i= j
Consider also Lemma A3 with Wi and Yi equal to Ẋ ik ξi and Ẋ i ξi , respectively, so
σ̄Wn σ̄Yn + σ̄Wn μ̄Yn + μ̄Wn σ̄Yn ≤ C/r n . Then, applying Lemma A3, we have
√
∑ Pij2 Ẋ ik ξi ξ j Ẋ j = ek Sn−1 ∑ Pij2 E[Ui ξi |Z]E[ξ j U j |Z]Sn−1 e + O p ( K /rn ).
i= j
i= j
√
Also, because K → ∞, we have O p ( K /rn ) = o p (K /rn ). The second conclusion then
n
follows by T.
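Proof of Theorem 4. Note that $\bar X_i = \sum_{j=1}^n P_{ij} X_j$, so
\[ \begin{aligned} \sum_{i=1}^n (\bar X_i \bar X_i' - X_i P_{ii} \bar X_i' - \bar X_i P_{ii} X_i') \hat\xi_i^2 &= \sum_{i,j,k=1}^n P_{ik} P_{kj} X_i X_j' \hat\xi_k^2 - \sum_{i,j=1}^n P_{ii} P_{ij} X_i X_j' \hat\xi_i^2 - \sum_{i,j=1}^n P_{ij} P_{jj} X_i X_j' \hat\xi_j^2 \\ &= \sum_{i,j,k=1}^n P_{ik} P_{kj} X_i X_j' \hat\xi_k^2 - \sum_{i\neq j} P_{ii} P_{ij} X_i X_j' \hat\xi_i^2 - \sum_{i\neq j} P_{ij} P_{jj} X_i X_j' \hat\xi_j^2 - 2\sum_{i=1}^n P_{ii}^2 X_i X_i' \hat\xi_i^2 \\ &= \sum_{i,j,\,k\notin\{i,j\}} P_{ik} P_{kj} X_i X_j' \hat\xi_k^2 - \sum_{i=1}^n P_{ii}^2 X_i X_i' \hat\xi_i^2 \\ &= \sum_{i\neq j\neq k} P_{ik} P_{kj} X_i X_j' \hat\xi_k^2 + \sum_{i\neq j} P_{ij}^2 X_i X_i' \hat\xi_j^2 - \sum_{i=1}^n P_{ii}^2 X_i X_i' \hat\xi_i^2. \end{aligned} \]
Also, for $Z_i'$ and $\tilde Z_i'$ equal to the $i$th row of $Z$ and $\tilde Z = Z(Z'Z)^{-1}$, we have
\[ \begin{aligned} \sum_{k=1}^K \sum_{\ell=1}^K \Big( \sum_{i=1}^n \tilde Z_{ik} \tilde Z_{i\ell} X_i \hat\xi_i \Big) \Big( \sum_{j=1}^n Z_{jk} Z_{j\ell} X_j \hat\xi_j \Big)' &= \sum_{i,j=1}^n \Big( \sum_{k=1}^K \sum_{\ell=1}^K \tilde Z_{ik} Z_{jk} \tilde Z_{i\ell} Z_{j\ell} \Big) X_i \hat\xi_i \hat\xi_j X_j' \\ &= \sum_{i,j=1}^n \Big( \sum_{k=1}^K \tilde Z_{ik} Z_{jk} \Big)^2 X_i \hat\xi_i \hat\xi_j X_j' = \sum_{i,j=1}^n (\tilde Z_i' Z_j)^2 X_i \hat\xi_i \hat\xi_j X_j' \\ &= \sum_{i,j=1}^n P_{ij}^2 X_i \hat\xi_i \hat\xi_j X_j'. \end{aligned} \]
Adding this equation to the previous one gives
\[ \begin{aligned} \hat\Sigma &= \sum_{i\neq j\neq k} P_{ik} P_{kj} X_i X_j' \hat\xi_k^2 + \sum_{i\neq j} P_{ij}^2 X_i X_i' \hat\xi_j^2 - \sum_{i=1}^n P_{ii}^2 X_i X_i' \hat\xi_i^2 + \sum_{i,j=1}^n P_{ij}^2 X_i \hat\xi_i \hat\xi_j X_j' \\ &= \sum_{i\neq j\neq k} P_{ik} P_{kj} X_i X_j' \hat\xi_k^2 + \sum_{i\neq j} P_{ij}^2 (X_i X_i' \hat\xi_j^2 + X_i \hat\xi_i \hat\xi_j X_j'), \end{aligned} \]
which yields the equality in Section 2.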
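The equality just derived says that the triple-sum expression can be computed from matrix products involving $\bar X_i = \sum_j P_{ij} X_j$, without ever looping over triples. The Python sketch below compares a brute-force evaluation of the triple-sum form with the matrix form on a small simulated example. Function names are hypothetical, and the residual vector is drawn at random because the identity is purely algebraic.

```python
import numpy as np

def sigma_hat_bruteforce(X, P, xi):
    """sum_{i!=j!=k} P_ik P_kj X_i X_j' xi_k^2 + sum_{i!=j} P_ij^2 (X_i X_i' xi_j^2 + X_i xi_i xi_j X_j')."""
    n, G = X.shape
    S = np.zeros((G, G))
    for i in range(n):
        for j in range(n):
            if j == i:
                continue
            S += P[i, j] ** 2 * (np.outer(X[i], X[i]) * xi[j] ** 2
                                 + np.outer(X[i], X[j]) * xi[i] * xi[j])
            for k in range(n):
                if k == i or k == j:
                    continue
                S += P[i, k] * P[k, j] * xi[k] ** 2 * np.outer(X[i], X[j])
    return S

def sigma_hat_fast(X, P, xi):
    """Matrix form: sum_i (Xbar Xbar' - X P_ii Xbar' - Xbar P_ii X') xi_i^2 + sum_{i,j} P_ij^2 X_i xi_i xi_j X_j'."""
    p = np.diag(P)
    w = xi ** 2
    Xbar = P @ X
    S = (Xbar * w[:, None]).T @ Xbar
    cross = (X * (p * w)[:, None]).T @ Xbar   # sum_i P_ii X_i Xbar_i' xi_i^2
    S -= cross + cross.T
    M = X * xi[:, None]
    return S + M.T @ (P ** 2) @ M
```

The brute-force version costs on the order of $n^3$ operations, while the matrix form needs only a few $n \times n$ products, which is presumably why the estimator is stated in that form.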
Let σ̇i2 = E ξi2 |Z and z̄ i = ∑ j Pij z j = ei Pz. Then following the same line of argument
as at the beginning of this proof, with z i replacing X i and σ̇k2 replacing ξ̂k2 ,
∑
i= j=k
z i Pik σ̇k2 Pkj z j /n = ∑ σ̇i2 z̄ i z̄ i − Pii z i z̄ i − Pii z̄ i z i + Pii2 z i z i
n − ∑ Pij2 z i z i σ̇ j2/n.
i
i= j
Also, as shown previously, Assumption 4 implies that ∑i z i − z̄ i 2 /n ≤ z (I − P)z/n →
0 a.s. Then by σ̇i2 and Pii bounded a.s. PZ , we have a.s.
2
∑ σ̇i (z̄ i z̄ i − z i z i )/n ≤ ∑ σ̇i2 (2 z i z i − z̄ i + z i − z̄ i 2 )/n
i
i
1/2 1/2
≤C
∑ zi 2 /n
∑ zi − z̄i 2 /n
i
+ C ∑ z i − z̄ i 2 /n → 0,
i
i
1/2 1/2
2
4
2
2
2
→ 0.
∑ σ̇i Pii (z i z̄ i − z i z i )/n ≤ ∑ σ̇i Pii z i /n
∑ zi − z̄i /n
i
i
i
It follows that
∑
i= j=k
z i Pik σ̇k2 Pkj z j /n = ∑ σ̇i2 (1 − Pii )2 z i z i /n −
i
∑ Pij2 zi zi σ̇ j2 /n + oa.s. (1).
i= j
It then follows from Lemmas A7 and A8 and T that
ˆ2 =
ˆ 1 +
∑
i= j=k
+ Sn−1
z i Pik σ̇k2 Pkj z j /n +
∑ Pij2
∑ Pij2 zi zi σ̇ j2 /n
i= j
E[Ui Ui |Z]σ̇ j2 + E[Ui ξi |Z]E[ξ j U j |Z] Sn−1
i= j
+ o p (1) + o p (K /rn )
= ∑ σ̇i2 (1 − Pii )2 z i z i /n
i
+ Sn−1
∑ Pij2
i= j
E[Ui Ui |Z]σ̇ j2 + E[Ui ξi |Z]E[ξ j U j |Z] Sn−1
+ o p (1) + o p (K /rn )
because n → 0. Then for JIV1, where ξi = εi /(1 − Pii ) and σ̇i2 = σi2 /(1 − Pii )2 , we have
ˆ2 =
¯ n +
¯ n + o p (1) + o p (K /rn ).
ˆ 1 +
For JIV2, where ξi = εi and σ̇i2 = σi2 , we have
ˆ 2 = n + n + o p (1) + o p (K /rn ).
ˆ 1 +
Consider the case where K /rn is bounded, implying o p (K /rn ) = o p (1). Then, because
¯ n +
¯ n , Hn−1 , and n + n are all bounded a.s.n, Lemma A6 implies
H̄n−1 , −1 −1
ˆ 2 Sn−1 H̃ Sn−1
ˆ 1 +
Sn Ṽ Sn = Sn−1 H̃ Sn−1
¯ n +
¯ n + o p (1) H̄n−1 + o p (1) = V̄n + o p (1),
= H̄n−1 + o p (1) Sn V̂ Sn = Vn + o p (1),
which gives the first conclusion.
For the second result, consider the case where K /rn → ∞. Then for JIV1, where ξi =
¯ n for n sufficiently
εi /(1 − Pii ) and σ̇i2 = σi2 /(1 − Pii )2 , the almost sure boundedness of large implies that we have
ˆ 2 =(rn /K )
¯ n +(rn /K )
¯ n +(rn/K )o p (1)+ o p (1)=(rn/K )
¯ n +o p (1).
ˆ 1 +
(rn /K ) For JIV2, where ξi = εi and σ̇i2 = σi2 , we have
ˆ 2 =(rn/K )n +(rn /K )n +(rn /K )o p (1)+ o p (1)=(rn/K )n +o p (1).
ˆ 1 +
(rn /K ) ¯ n , Hn−1 , and (r/K n )n are all bounded a.s.n and by
Then by the fact that H̄n−1 , (r/K n )
Lemma A6,
−1 −1
ˆ 1 +
ˆ 2 Sn−1 H̃ Sn−1
Sn Ṽ Sn = Sn−1 H̃ Sn−1
¯ n /K n + o p (1) H̄n−1 + o p (1) = V̄n∗ + o p (1).
= H̄n−1 + o p (1) rn Similarly, Sn V̂ Sn = Vn∗ + o p (1), which gives the second conclusion.
n
Proof of Theorem 5. An expansion gives
a(δ̂) − a(δ0 ) = Ā(δ̂ − δ0 )
for Ā = ∂a(δ̄)/∂δ where δ̄ lies on the line joining δ̂ and δ0 and actually differs element by
p
p
element from a(δ). It follows from δ̂ → δ0 that δ̄ → δ0 , so by condition (iii), Bn ÂSn−1 =
Bn ASn−1 + o p (1). Then multiplying by Bn and using Theorem 4, we have
−1/2 Â V̂ Â
a(δ̂) − a(δ0 )
−1/2
Bn ĀSn−1 Sn δ̂ − δ0
= Bn ÂSn−1 Sn V̂ Sn Sn−1 Â Bn
−1/2
= Bn ASn−1 + o p (1) V̄n + o p (1) Sn−1 ABn + o p (1)
× Bn ASn−1 + o p (1) Sn δ̂ − δ0
−1/2
Bn ASn−1 Sn δ̂ − δ0 + o p (1)
= Bn ASn−1 V̄n Sn−1 A Bn
−1/2
1/2 −1/2 = Bn ASn−1 V̄n Sn−1 A Bn
Bn ASn−1 V̄n V̄n
Sn δ̂ − δ0
−1/2
+ o p (1) = Fn Fn
Fn Ȳn + o p (1)
−1/2
Sn (δ − δ0 ), where the third equality in the prefor Fn = Bn ASn−1 V̄n and Ȳn = V̄n
ceding display follows from the Slutsky theorem given the continuity of the square root
1/2
d
matrix. By Theorem 2, Ȳn → N (0, IG ). Also, from the proof of Theorem 2, it follows
that this convergence is a.s. conditional on Z. Then because L n = (Fn Fn )−1/2 Fn satisfies
L n L n = I , it follows from the Slutsky theorem and standard convergence in distribution
results that
−1/2 d
a(δ̂) − a(δ0 ) = L n Ȳn + o p (1) → N (0, I ),
 V̂ Â
giving the conclusion.
n
Proof of Corollary 1. Let $a(\delta) = c'\delta$, so $\bar A = A = c'$. Note that condition (i) of Theorem 5 is satisfied. Let $B_n = b_n$. Then $B_n A S_n^{-1\prime} = b_n c' S_n^{-1\prime}$ is bounded by hypothesis so condition (ii) of Theorem 5 is satisfied. Also, $B_n(\bar A - A)S_n^{-1\prime} = 0$, so condition (iii) of Theorem 5 is satisfied. If $K/r_n$ is bounded, then by hypothesis, $\lambda_{\min}(B_n A S_n^{-1\prime} \bar V_n S_n^{-1} A' B_n') = b_n^2 c' S_n^{-1\prime} \bar V_n S_n^{-1} c \ge C$; or if $K/r_n \to \infty$, then $\lambda_{\min}(B_n A S_n^{-1\prime} \bar V_n^{*} S_n^{-1} A' B_n') = b_n^2 c' S_n^{-1\prime} \bar V_n^{*} S_n^{-1} c \ge C$, which gives the first conclusion. The second conclusion follows similarly. ■
Proof of Corollary 2. We will show the result for δ̂; the result for δ̃ follows analogously.
Let γ = limn→∞ (rn /n), so γ exists and γ ∈ {0, 1} by Assumption 2. Also,
√
√
√
√
√
√ γ I −π1
rn Sn−1 = rn S̃n−1 diag 1/ n, . . . , 1/ n, 1/ rn → R =
.
0
1
√
Consider first the case where rn = n so that γ = 1. Take bn = rn and note that bn c Sn−1 =
√
c ( rn Sn−1 ) is bounded. Also, c R = 0 because R is nonsingular and Vn ≤ C a.s.n
implying that bn2 c Sn−1 Vn Sn−1 c = c RVn R c + oa.s. (1). Also n = Sn−1 E[ ∑i= j Pij Ui ε j
∑i= j Pij Ui ε j |Z]Sn−1 is positive semidefinite, so Vn ≥ Hn−1 n Hn−1 . Also, by
Assumptions 2 and 4, there is C > 0 with λmin (Hn−1 n Hn−1 ) ≥ C a.s.n. Therefore, a.s.n,
bn2 c Sn−1 Vn Sn−1 c ≥ c R Hn−1 n Hn−1 R c + o(1) ≥ C + o(1) ≥ C.
(A.1)
The conclusion then follows from Corollary 1.
For γ = 0, let a = (−π1 , 1)c and note that c R = (0, a) = 0. If K /rn is bounded, let bn =
√
rn . Then, as before, bn c Sn−1 is bounded and equation (A.1) is satisfied, and the conclu√
√
√
sion follows. If K /rn → ∞, let bn = rn / K . Note that bn c Sn−1 = rn /K c ( rn Sn−1 )
→ 0, so bn c Sn−1 is bounded. Also, note that
√
rn Sn−1 eG = diag( rn /n, . . . , rn /n, 1)
I 0
e = eG .
−π1 1 G
Furthermore, a constant sign of E[εi UiG |Z ] implies E[εi UiG |Z ]E[ε j UjG |Z ] ≥ 0, so by
Pii ≤ C < 1,
2 |Z]σ 2 + E[ε U |Z]E[ε U |Z]
2 |Z]σ 2 /K
K ≥ ∑ Pij2 E[UiG
i iG
j jG
∑ Pij2 E[UiG
j
j
i= j
≥C
∑ Pij2 /K = C ∑ Pij2 −∑ Pii2
i= j
i, j
i
i= j
K = C 1−∑ Pii2 /K ≥ C.
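Therefore, we have, a.s.,
\[
(r_n/K)\Psi_n = \sqrt{r_n}\,S_n^{-1} e_G\Big[\sum_{i \ne j} P_{ij}^2\big(E[U_{iG}^2|Z]\sigma_j^2 + E[\varepsilon_i U_{iG}|Z]E[\varepsilon_j U_{jG}|Z]\big)\big/K\Big] e_G'\,\sqrt{r_n}\,S_n^{-1\prime} \ge C e_G e_G'.
\]
Also, $H_n$ is a.s. bounded so that $\lambda_{\min}(H_n^{-1}) = 1/\lambda_{\max}(H_n) \ge C + o_{a.s.}(1)$. It then follows from $c'R = a e_G'$ that
\[
\begin{aligned}
b_n^2 c' S_n^{-1\prime}\bar{V}_n^{*} S_n^{-1} c
&= r_n c' S_n^{-1\prime} H_n^{-1}(r_n/K)\Psi_n H_n^{-1} S_n^{-1} c
\ge C r_n c' S_n^{-1\prime} H_n^{-1} e_G e_G' H_n^{-1} S_n^{-1} c \\
&= a^2 C (e_G' H_n^{-1} e_G)^2 + o_{a.s.}(1) \ge C + o_{a.s.}(1).
\end{aligned}
\]
The conclusion then follows from Corollary 1. ∎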
APPENDIX B: Proofs of Lemmas A2 and A4
We first give a series of lemmas that will be useful for the proofs of Lemmas A2 and A4.
LEMMA B1. Under Assumption 1 and for any subset $I_2$ of the set $\{(i, j)\}_{i,j=1}^{n}$ and any subset $I_3$ of $\{(i, j, k)\}_{i,j,k=1}^{n}$, (i) $\sum_{I_2} P_{ij}^4 \le K$; (ii) $\sum_{I_3} P_{ij}^2 P_{jk}^2 \le K$; and (iii) $\big|\sum_{I_3} P_{ij}^2 P_{ik} P_{jk}\big| \le K$, a.s.n.
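Proof. By Assumption 1, $Z'Z$ is nonsingular a.s.n. Also, because $P$ is idempotent, $\operatorname{rank}(P) = \operatorname{tr}(P) = K$, $0 \le P_{ii} \le 1$, and $\sum_{j=1}^{n} P_{ij}^2 = P_{ii}$. Therefore, a.s.n,
\[
\sum_{I_2} P_{ij}^4 \le \sum_{i,j=1}^{n} P_{ij}^2 = \sum_{i=1}^{n} P_{ii} = K,
\]
\[
\sum_{I_3} P_{ij}^2 P_{jk}^2 \le \sum_{j=1}^{n}\Big(\sum_{i=1}^{n} P_{ij}^2\Big)\Big(\sum_{k=1}^{n} P_{jk}^2\Big) = \sum_{j=1}^{n} P_{jj}^2 \le \sum_{j=1}^{n} P_{jj} = K,
\]
\[
\Big|\sum_{I_3} P_{ij}^2 P_{ik} P_{jk}\Big| \le \sum_{i,j} P_{ij}^2 \sum_{k} |P_{ik} P_{jk}| \le \sum_{i,j} P_{ij}^2 \sqrt{\sum_{k} P_{ik}^2 \sum_{k} P_{jk}^2} \le \sum_{i,j} P_{ij}^2 \sqrt{P_{ii} P_{jj}} \le \sum_{i,j} P_{ij}^2 = K.
\]
For the next result, let
\[
S_n = \sum_{i<j<k<\ell}\big(P_{ik} P_{jk} P_{i\ell} P_{j\ell} + P_{ij} P_{jk} P_{i\ell} P_{k\ell} + P_{ij} P_{ik} P_{j\ell} P_{k\ell}\big).
\]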
LEMMA B2. If Assumption 2 is satisfied, then a.s.n (i) $\operatorname{tr}\big((P - D)^4\big) \le CK$; (ii) $\big|\sum_{i<j<k<\ell} P_{ik} P_{jk} P_{i\ell} P_{j\ell}\big| \le CK$; and (iii) $|S_n| \le CK$, where $D = \operatorname{diag}(P_{11}, \ldots, P_{nn})$.
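Proof. To show part (i), note that
\[
\begin{aligned}
(P - D)^4 = (P - PD - DP + D^2)^2 ={}& P - PD - PDP + PD^2 - PDP + PDPD + PD^2P - PD^3 \\
& - DP + DPD + DPDP - DPD^2 + D^2P - D^2PD - D^3P + D^4.
\end{aligned}
\]
Note that $\operatorname{tr}(A') = \operatorname{tr}(A)$ and $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ for any square matrices $A$ and $B$. Then,
\[
\operatorname{tr}\big((P - D)^4\big) = \operatorname{tr}(P) - 4\operatorname{tr}(PD) + 4\operatorname{tr}(PD^2) + 2\operatorname{tr}(PDPD) - 4\operatorname{tr}(PD^3) + \operatorname{tr}(D^4).
\]
By $0 \le P_{ii} \le 1$ we have $D^j \le I$ for any positive integer $j$ and $\operatorname{tr}(PD^j) = \operatorname{tr}(PD^jP) \le \operatorname{tr}(P) = K$ a.s.n. Also, a.s.n, $\operatorname{tr}(PDPD) = \operatorname{tr}(PDPDP) \le \operatorname{tr}(PD^2P) \le \operatorname{tr}(P) = K$ and $\operatorname{tr}(D^4) = \sum_i P_{ii}^4 \le K$. Therefore, by T we have $\operatorname{tr}\big((P - D)^4\big) \le 16K$, giving conclusion (i).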
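Next, let $L$ be the lower triangular matrix with $L_{ij} = P_{ij}1(i > j)$. Then $P = L + L' + D$, so
\[
\begin{aligned}
(P - D)^4 = (L + L')^4 ={}& (L^2 + LL' + L'L + L'^2)^2 \\
={}& L^4 + L^2LL' + L^2L'L + L^2L'^2 + LL'L^2 + LL'LL' + LL'L'L + LL'^3 \\
& + L'LL^2 + L'LLL' + L'LL'L + L'LL'^2 + L'^2L^2 + L'^2LL' + L'^2L'L + L'^4.
\end{aligned}
\]
Note that for positive integer $j$, $[(L')^j]' = L^j$. Then using $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ and $\operatorname{tr}(A') = \operatorname{tr}(A)$,
\[
\operatorname{tr}\big((P - D)^4\big) = 2\operatorname{tr}(L^4) + 8\operatorname{tr}(L^3L') + 4\operatorname{tr}(L^2L'^2) + 2\operatorname{tr}(L'LL'L).
\]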
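Next, compute each of the terms. Note that
\[
\operatorname{tr}(L^4) = \sum_{i,j,k,\ell} P_{ij}1(i>j)\,P_{jk}1(j>k)\,P_{k\ell}1(k>\ell)\,P_{\ell i}1(\ell>i) = 0,
\]
\[
\begin{aligned}
\operatorname{tr}(L^3L') &= \sum_{i,j,k,\ell} P_{ij}1(i>j)\,P_{jk}1(j>k)\,P_{k\ell}1(k>\ell)\,P_{\ell i}1(i>\ell) \\
&= \sum_{i>j>k>\ell} P_{ij}P_{jk}P_{k\ell}P_{\ell i} = \sum_{\ell<k<j<i} P_{ij}P_{jk}P_{k\ell}P_{\ell i} \\
&= \sum_{i<j<k<\ell} P_{\ell k}P_{kj}P_{ji}P_{i\ell} = \sum_{i<j<k<\ell} P_{ij}P_{jk}P_{k\ell}P_{\ell i},
\end{aligned}
\]
\[
\begin{aligned}
\operatorname{tr}(L^2L'^2) &= \sum_{i,j,k,\ell} P_{ij}1(i>j)\,P_{jk}1(j>k)\,P_{k\ell}1(\ell>k)\,P_{\ell i}1(i>\ell) = \sum_{i>j>k,\; i>\ell>k} P_{ij}P_{jk}P_{k\ell}P_{\ell i} \\
&= \sum_{i>j=\ell>k} P_{ij}P_{jk}P_{kj}P_{ji} + \sum_{i>j>\ell>k} P_{ij}P_{jk}P_{k\ell}P_{\ell i} + \sum_{i>\ell>j>k} P_{ij}P_{jk}P_{k\ell}P_{\ell i} \\
&= \sum_{i<j<k} P_{ij}^2P_{jk}^2 + \sum_{i<j<k<\ell}\big(P_{\ell k}P_{ki}P_{ij}P_{j\ell} + P_{\ell j}P_{ji}P_{ik}P_{k\ell}\big) \\
&= \sum_{i<j<k} P_{ij}^2P_{jk}^2 + 2\sum_{i<j<k<\ell} P_{ik}P_{k\ell}P_{\ell j}P_{ji},
\end{aligned}
\]
and
\[
\begin{aligned}
\operatorname{tr}(L'LL'L) &= \sum_{i,j,k,\ell} P_{ij}1(i>j)\,P_{jk}1(k>j)\,P_{k\ell}1(k>\ell)\,P_{\ell i}1(i>\ell) \\
&= \sum_{j<i} P_{ij}P_{ji}P_{ij}P_{ji} + \sum_{j<k<i} P_{ij}P_{jk}P_{kj}P_{ji} + \sum_{j<i<k} P_{ij}P_{jk}P_{kj}P_{ji} \\
&\quad + \sum_{j<\ell<i} P_{ij}P_{ji}P_{i\ell}P_{\ell i} + \sum_{\ell<j<i} P_{ij}P_{ji}P_{i\ell}P_{\ell i} \\
&\quad + \Big(\sum_{\ell<j<k<i} + \sum_{j<\ell<k<i} + \sum_{\ell<j<i<k} + \sum_{j<\ell<i<k}\Big) P_{ij}P_{jk}P_{k\ell}P_{\ell i} \\
&= \sum_{i<j} P_{ij}^4 + 2\sum_{i<j<k}\big(P_{ij}^2P_{ik}^2 + P_{ik}^2P_{jk}^2\big) + 4\sum_{i<j<k<\ell} P_{ik}P_{kj}P_{j\ell}P_{\ell i}.
\end{aligned}
\]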
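Summing up gives the result
\[
\operatorname{tr}\big((P - D)^4\big) = 2\sum_{i<j} P_{ij}^4 + 4\sum_{i<j<k}\big(P_{ij}^2P_{jk}^2 + P_{ik}^2P_{jk}^2 + P_{ij}^2P_{ik}^2\big) + 8S_n.
\]
Then by T and Lemma B1, we have
\[
|S_n| \le \frac{1}{4}\sum_{i<j} P_{ij}^4 + \frac{1}{2}\sum_{i<j<k}\big(P_{ij}^2P_{jk}^2 + P_{ik}^2P_{jk}^2 + P_{ij}^2P_{ik}^2\big) + \frac{1}{8}\operatorname{tr}\big((P - D)^4\big) \le CK,
\]
a.s.n, thus giving part (iii). That is, $S_n = O_{a.s.}(K)$.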
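To show part (ii), take $\{\varepsilon_i\}$ to be a sequence of independent and identically distributed random variables with mean 0 and variance 1 and where $\varepsilon_i$ and $Z$ are independent for all $i$ and $n$. Define the random quantities
\[
\Delta_1 = \sum_{i<j<k}\big(P_{ij}P_{ik}\varepsilon_j\varepsilon_k + P_{ij}P_{jk}\varepsilon_i\varepsilon_k + P_{ik}P_{jk}\varepsilon_i\varepsilon_j\big),
\]
\[
\Delta_2 = \sum_{i<j<k}\big(P_{ij}P_{ik}\varepsilon_j\varepsilon_k + P_{ij}P_{jk}\varepsilon_i\varepsilon_k\big), \qquad \Delta_3 = \sum_{i<j<k} P_{ik}P_{jk}\varepsilon_i\varepsilon_j.
\]
Note that by Lemma A1,
\[
\begin{aligned}
E[\Delta_3^2|Z] &= E\Big[\Big(\sum_{i<j<k} P_{ik}P_{jk}\varepsilon_i\varepsilon_j\Big)\Big(\sum_{\ell<m<q} P_{\ell q}P_{mq}\varepsilon_\ell\varepsilon_m\Big)\Big|Z\Big] = \sum_{i<j<\{k,\ell\}} P_{ik}P_{jk}P_{i\ell}P_{j\ell} \\
&= \sum_{i<j<k} P_{ik}^2P_{jk}^2 + 2\sum_{i<j<k<\ell} P_{ik}P_{jk}P_{i\ell}P_{j\ell} = O_{a.s.}(K) + 2\sum_{i<j<k<\ell} P_{ik}P_{jk}P_{i\ell}P_{j\ell}.
\end{aligned}
\]
Also, note that
\[
\begin{aligned}
E[\Delta_2\Delta_3|Z] &= E\Big[\Big(\sum_{i<j<k}\big(P_{ij}P_{ik}\varepsilon_j\varepsilon_k + P_{ij}P_{jk}\varepsilon_i\varepsilon_k\big)\Big)\Big(\sum_{\ell<m<q} P_{\ell q}P_{mq}\varepsilon_\ell\varepsilon_m\Big)\Big|Z\Big] \\
&= \sum_{i<j<k<\ell} P_{ij}P_{ik}P_{j\ell}P_{k\ell} + \sum_{i<j<k<\ell} P_{ij}P_{jk}P_{i\ell}P_{k\ell}
\end{aligned}
\]
and
\[
\begin{aligned}
E[\Delta_2^2|Z] &= E\Big[\Big(\sum_{i<j<k}\big(P_{ij}P_{ik}\varepsilon_j\varepsilon_k + P_{ij}P_{jk}\varepsilon_i\varepsilon_k\big)\Big)\Big(\sum_{\ell<m<q}\big(P_{\ell m}P_{\ell q}\varepsilon_m\varepsilon_q + P_{\ell m}P_{mq}\varepsilon_\ell\varepsilon_q\big)\Big)\Big|Z\Big] \\
&= \sum_{\{i,\ell\}<j<k} P_{ij}P_{ik}P_{\ell j}P_{\ell k} + \sum_{i<j<m<k} P_{ij}P_{ik}P_{jm}P_{mk} + \sum_{\ell<i<j<k} P_{ij}P_{jk}P_{\ell i}P_{\ell k} + \sum_{i<\{j,m\}<k} P_{ij}P_{jk}P_{im}P_{mk} \\
&= \sum_{i<j<k} P_{ij}^2P_{ik}^2 + \sum_{i<j<k} P_{ij}^2P_{jk}^2 + 2\sum_{i<\ell<j<k} P_{ij}P_{ik}P_{\ell j}P_{\ell k} + 2\sum_{i<j<m<k} P_{ij}P_{jk}P_{im}P_{mk} \\
&\quad + \sum_{i<j<k<\ell} P_{ij}P_{i\ell}P_{jk}P_{k\ell} + \sum_{i<j<k<\ell} P_{jk}P_{k\ell}P_{ij}P_{i\ell} \\
&= \sum_{i<j<k} P_{ij}^2P_{ik}^2 + \sum_{i<j<k} P_{ij}^2P_{jk}^2 + 2S_n = O_{a.s.}(K).
\end{aligned}
\]
Because $\Delta_1 = \Delta_2 + \Delta_3$, it follows that $E[\Delta_1^2|Z] = E[\Delta_2^2|Z] + E[\Delta_3^2|Z] + 2E[\Delta_2\Delta_3|Z] = O_{a.s.}(K) + 2S_n = O_{a.s.}(K)$. Therefore, by T, the expression for $E[\Delta_3^2|Z]$ given previously, and $\Delta_3 = \Delta_1 - \Delta_2$,
\[
\begin{aligned}
\Big|\sum_{i<j<k<\ell} P_{ik}P_{jk}P_{i\ell}P_{j\ell}\Big| &\le E[\Delta_3^2|Z] + O_{a.s.}(K) \le E[(\Delta_1 - \Delta_2)^2|Z] + O_{a.s.}(K) \\
&\le 2E[\Delta_1^2|Z] + 2E[\Delta_2^2|Z] + O_{a.s.}(K) \le O_{a.s.}(K).
\end{aligned}
\]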
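The bounds of Lemma B1 and the trace decomposition used for Lemma B2(iii) can be checked numerically on a small example. The sketch below is illustrative only: the helper names (`projection_matrix`, `lemma_b1_sums`, `trace_decomposition`) and the Gaussian instrument matrix are assumptions made for the example, not part of the paper. It builds a random projection matrix, evaluates the three sums of Lemma B1 over the full index sets (with absolute values in part (iii)), and compares $\operatorname{tr}((P-D)^4)$ with $2\sum_{i<j}P_{ij}^4 + 4\sum_{i<j<k}(\cdot) + 8S_n$.

```python
import itertools
import numpy as np


def projection_matrix(n, K, seed=0):
    """Hypothetical helper: P = Z (Z'Z)^{-1} Z' for a random n x K matrix Z."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, K))
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)


def lemma_b1_sums(P):
    """Sums of Lemma B1 over all indices (the largest index sets I_2, I_3)."""
    P2 = P ** 2
    s1 = np.sum(P ** 4)                                      # sum_{i,j} P_ij^4
    s2 = np.einsum("ij,jk->", P2, P2)                        # sum_{i,j,k} P_ij^2 P_jk^2
    s3 = np.einsum("ij,ik,jk->", P2, np.abs(P), np.abs(P))   # sum P_ij^2 |P_ik P_jk|
    return s1, s2, s3


def trace_decomposition(P):
    """Return tr((P-D)^4), its index-sum expansion, and S_n."""
    idx = range(P.shape[0])
    pairs = sum(P[i, j] ** 4 for i, j in itertools.combinations(idx, 2))
    triples = sum(
        P[i, j] ** 2 * P[j, k] ** 2
        + P[i, k] ** 2 * P[j, k] ** 2
        + P[i, j] ** 2 * P[i, k] ** 2
        for i, j, k in itertools.combinations(idx, 3)
    )
    S_n = sum(
        P[i, k] * P[j, k] * P[i, l] * P[j, l]
        + P[i, j] * P[j, k] * P[i, l] * P[k, l]
        + P[i, j] * P[i, k] * P[j, l] * P[k, l]
        for i, j, k, l in itertools.combinations(idx, 4)
    )
    M = P - np.diag(np.diag(P))                              # P - D
    lhs = np.trace(np.linalg.matrix_power(M, 4))
    rhs = 2 * pairs + 4 * triples + 8 * S_n
    return lhs, rhs, S_n
```

For instance, `trace_decomposition(projection_matrix(10, 3))` returns two numerically equal values for the left- and right-hand sides, and each entry of `lemma_b1_sums` stays below $K = 3$. The check only illustrates the algebra on one draw; it says nothing about the constants $C$ in the lemmas.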
LEMMA B3. Let $L$ be the lower triangular matrix with $L_{ij} = P_{ij}1(i > j)$. Then, under Assumption 2, $\|LL'\| \le C\sqrt{K}$ a.s.n, where $\|A\| = [\operatorname{Tr}(A'A)]^{1/2}$.
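Proof. From the proof of Lemma B2 and by Lemma B1 and Lemma B2(ii), we have a.s.n
\[
\begin{aligned}
\|LL'\|^2 = \operatorname{tr}(LL'LL') &= \sum_{i<j} P_{ij}^4 + 2\sum_{i<j<k}\big(P_{ij}^2P_{ik}^2 + P_{ik}^2P_{jk}^2\big) + 4\sum_{i<j<k<\ell} P_{ik}P_{kj}P_{j\ell}P_{\ell i} \\
&\le C\Big(K + \Big|\sum_{i<j<k<\ell} P_{ik}P_{kj}P_{j\ell}P_{\ell i}\Big|\Big) \le CK.
\end{aligned}
\]
Taking square roots gives the answer. ∎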
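The index-sum expression for $\|LL'\|^2$ in the proof of Lemma B3 can be verified directly on a small projection matrix. The following is a minimal sketch under assumed names (`frobenius_sq_LLt`, `index_sum_formula`, and a Gaussian instrument matrix are hypothetical choices for illustration); it compares the squared Frobenius norm of $LL'$ with the three sums over ordered index tuples.

```python
import itertools
import numpy as np


def random_projection(n, K, seed=0):
    """Hypothetical helper: projection onto the columns of a random n x K matrix."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, K))
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)


def frobenius_sq_LLt(P):
    """||LL'||^2 = tr(LL'LL'), with L_ij = P_ij 1(i > j)."""
    L = np.tril(P, k=-1)          # strictly lower triangular part of P
    A = L @ L.T
    return np.sum(A ** 2)


def index_sum_formula(P):
    """Right-hand side of the display in the proof of Lemma B3."""
    idx = range(P.shape[0])
    pairs = sum(P[i, j] ** 4 for i, j in itertools.combinations(idx, 2))
    triples = sum(
        P[i, j] ** 2 * P[i, k] ** 2 + P[i, k] ** 2 * P[j, k] ** 2
        for i, j, k in itertools.combinations(idx, 3)
    )
    quads = sum(
        P[i, k] * P[k, j] * P[j, l] * P[l, i]
        for i, j, k, l in itertools.combinations(idx, 4)
    )
    return pairs + 2 * triples + 4 * quads
```

Calling both functions on `random_projection(9, 3)` should give the same number up to floating-point error. The lemma's bound $\|LL'\| \le C\sqrt{K}$ involves an unspecified constant, so the sketch checks only the identity.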
For Lemma B4, which follows, let φi = φi (Z) (i = 1, . . . , n) denote some sequence
of measurable functions. In applications of this lemma, we will take φi (Z) to be either
conditional variances or conditional covariances given Z. Also, to set some notation, let
$\sigma_i^2 = \sigma_i^2(Z) = E[\varepsilon_i^2|Z]$, $\omega_i^2 = \omega_{in}^2(Z) = E[u_i^2|Z]$, and $\gamma_i = \gamma_{in}(Z) = E[u_i\varepsilon_i|Z]$, where to simplify notation we suppress the dependence of $\sigma_i^2$ on $Z$ and of $\omega_i^2$ and $\gamma_i$ on $Z$ and $n$. The following results then apply.
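LEMMA B4. Suppose that (a) $P$ is a symmetric, idempotent matrix with $\operatorname{rank}(P) = K$ and $P_{ii} \le C < 1$; (b) $(u_1, \varepsilon_1), \ldots, (u_n, \varepsilon_n)$ are independent conditional on $Z$; (c) there exists a constant $C$ such that, a.s., $\sup_i E[u_i^4|Z] \le C$, $\sup_i E[\varepsilon_i^4|Z] \le C$, and $\sup_i |\phi_i| = \sup_i |\phi_i(Z)| \le C$. Then, a.s.,

(i) $E\big[\big(\frac{1}{K}\sum_{i<k} P_{ki}^2\phi_k(u_i\varepsilon_i - \gamma_i)\big)^2 \,\big|\, Z\big] \to 0$;

(ii) $E\big[\big(\frac{1}{K}\sum_{i<k} P_{ki}^2\phi_k(\varepsilon_i^2 - \sigma_i^2)\big)^2 \,\big|\, Z\big] \to 0$;

(iii) $E\big[\big(\frac{1}{K}\sum_{i<k} P_{ki}^2\phi_k(u_i^2 - \omega_i^2)\big)^2 \,\big|\, Z\big] \to 0$;

(iv) $E\big[\big(\frac{1}{K}\sum_{i<j<k} P_{ki}P_{kj}\phi_k(u_i\varepsilon_j + u_j\varepsilon_i)\big)^2 \,\big|\, Z\big] \to 0$;

(v) $E\big[\big(\frac{1}{K}\sum_{i<j<k} P_{ki}P_{kj}\phi_k\varepsilon_i\varepsilon_j\big)^2 \,\big|\, Z\big] \to 0$;

(vi) $E\big[\big(\frac{1}{K}\sum_{i<j<k} P_{ki}P_{kj}\phi_k u_iu_j\big)^2 \,\big|\, Z\big] \to 0$.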
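Proof. To show part (i), note that
\[
\begin{aligned}
E\Big[\Big(\frac{1}{K}\sum_{1\le i<k\le n} P_{ki}^2\phi_k(u_i\varepsilon_i - \gamma_i)\Big)^2\Big|Z\Big]
&= \frac{1}{K^2}\sum_{1\le i<k\le n} P_{ki}^4\phi_k^2\big(E[u_i^2\varepsilon_i^2|Z] - \gamma_i^2\big) \\
&\quad + \frac{2}{K^2}\sum_{1\le i<k<l\le n} P_{ki}^2P_{li}^2\phi_k\phi_l\big(E[u_i^2\varepsilon_i^2|Z] - \gamma_i^2\big) \\
&\le \frac{1}{K^2}\sum_{1\le i<k\le n} P_{ki}^4\phi_k^2\Big(\sqrt{E[u_i^4|Z]\,E[\varepsilon_i^4|Z]} + E[u_i^2|Z]\,E[\varepsilon_i^2|Z]\Big) \\
&\quad + \frac{2}{K^2}\sum_{1\le i<k<l\le n} P_{ki}^2P_{li}^2|\phi_k||\phi_l|\Big(\sqrt{E[u_i^4|Z]\,E[\varepsilon_i^4|Z]} + E[u_i^2|Z]\,E[\varepsilon_i^2|Z]\Big) \\
&\le C\Big(\frac{1}{K^2}\sum_{1\le i<k\le n} P_{ki}^4 + \frac{2}{K^2}\sum_{1\le i<k<l\le n} P_{ki}^2P_{li}^2\Big) \to 0,
\end{aligned}
\]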
where the first inequality is the result of applying T and a conditional version of CS, the
second inequality follows by hypothesis, and the convergence to zero a.s. follows from
applying Lemma B1(i) and (ii). Parts (ii) and (iii) can be proved in essentially the same
way as part (i); hence, to avoid redundancy, we do not give detailed arguments for these
parts.
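To show part (iv), first let $L$ be a lower triangular matrix with $(i, j)$th element $L_{ij} = P_{ij}1(i > j)$ as in Lemma B3 and define $D_\gamma = \operatorname{diag}(\gamma_1, \ldots, \gamma_n)$, $D_\phi = \operatorname{diag}(\phi_1, \ldots, \phi_n)$, $u = (u_1, \ldots, u_n)'$, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$. It then follows by direct multiplication that
\[
\varepsilon'L'D_\phi Lu - \operatorname{tr}(L'D_\phi LD_\gamma) = \sum_{1\le i<k\le n} P_{ki}^2\phi_k(u_i\varepsilon_i - \gamma_i) + \sum_{1\le i<j<k\le n} P_{ki}P_{kj}\phi_k(u_i\varepsilon_j + u_j\varepsilon_i)
\]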
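so that, by making use of Loève's $c_r$ inequality, we have that
\[
\begin{aligned}
\frac{1}{K^2}E\Big[\Big(\sum_{1\le i<j<k\le n} P_{ki}P_{kj}\phi_k(u_i\varepsilon_j + u_j\varepsilon_i)\Big)^2\Big|Z\Big]
&\le \frac{2}{K^2}E\Big[\big(u'L'D_\phi L\varepsilon - \operatorname{tr}(L'D_\phi LD_\gamma)\big)^2\Big|Z\Big] \\
&\quad + \frac{2}{K^2}E\Big[\Big(\sum_{1\le i<k\le n} P_{ki}^2\phi_k(u_i\varepsilon_i - \gamma_i)\Big)^2\Big|Z\Big].
\end{aligned} \tag{B.1}
\]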
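It has already been shown in the proof of part (i) that $(1/K^2)E\big[\big(\sum_{1\le i<k\le n} P_{ki}^2\phi_k(u_i\varepsilon_i - \gamma_i)\big)^2 \,\big|\, Z\big] \to 0$ a.s. $P_Z$, so what remains to be shown is that $(1/K^2)E\big[\big(u'L'D_\phi L\varepsilon - \operatorname{tr}(L'D_\phi LD_\gamma)\big)^2 \,\big|\, Z\big] \to 0$ a.s. $P_Z$. To show the latter, note first that, by straightforward calculations, we have
\[
\frac{1}{K^2}E\Big[\big(u'L'D_\phi L\varepsilon - \operatorname{tr}(L'D_\phi LD_\gamma)\big)^2\Big|Z\Big]
= \frac{1}{K^2}\operatorname{tr}\big\{(L'D_\phi L \otimes L'D_\phi L)\,E[\varepsilon u' \otimes \varepsilon u'|Z]\big\} - \frac{1}{K^2}\big[\operatorname{tr}(L'D_\phi LD_\gamma)\big]^2. \tag{B.2}
\]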
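Next, note that, by straightforward calculation, we have
\[
\begin{aligned}
E[\varepsilon u' \otimes \varepsilon u'|Z]
&= \begin{pmatrix}
\sigma_1^2\omega_1^2 e_1e_1' & \sigma_1^2\omega_2^2 e_1e_2' & \cdots & \sigma_1^2\omega_n^2 e_1e_n' \\
\sigma_2^2\omega_1^2 e_2e_1' & \sigma_2^2\omega_2^2 e_2e_2' & \cdots & \sigma_2^2\omega_n^2 e_2e_n' \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_n^2\omega_1^2 e_ne_1' & \sigma_n^2\omega_2^2 e_ne_2' & \cdots & \sigma_n^2\omega_n^2 e_ne_n'
\end{pmatrix}
+ \begin{pmatrix}
\gamma_1^2 e_1e_1' & \gamma_1\gamma_2 e_2e_1' & \cdots & \gamma_1\gamma_n e_ne_1' \\
\gamma_2\gamma_1 e_1e_2' & \gamma_2^2 e_2e_2' & \cdots & \gamma_2\gamma_n e_ne_2' \\
\vdots & \vdots & \ddots & \vdots \\
\gamma_n\gamma_1 e_1e_n' & \gamma_n\gamma_2 e_2e_n' & \cdots & \gamma_n^2 e_ne_n'
\end{pmatrix} \\
&\quad + \begin{pmatrix}
\vartheta_1 e_1e_1' & 0_{n\times n} & \cdots & 0_{n\times n} \\
0_{n\times n} & \vartheta_2 e_2e_2' & \cdots & 0_{n\times n} \\
\vdots & \vdots & \ddots & \vdots \\
0_{n\times n} & 0_{n\times n} & \cdots & \vartheta_n e_ne_n'
\end{pmatrix}
+ \begin{pmatrix}
\gamma_1 \otimes D_\gamma & 0_{n\times n} & \cdots & 0_{n\times n} \\
0_{n\times n} & \gamma_2 \otimes D_\gamma & \cdots & 0_{n\times n} \\
\vdots & \vdots & \ddots & \vdots \\
0_{n\times n} & 0_{n\times n} & \cdots & \gamma_n \otimes D_\gamma
\end{pmatrix} \\
&= (D_\sigma \otimes I_n)\operatorname{vec}(I_n)\operatorname{vec}(I_n)'(D_\omega \otimes I_n) + (D_\gamma \otimes I_n)K_{nn}(D_\gamma \otimes I_n) + ED_\vartheta E' + D_\gamma \otimes D_\gamma,
\end{aligned} \tag{B.3}
\]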
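where $K_{nn}$ is an $n^2 \times n^2$ commutation matrix such that, for any $n \times n$ matrix $A$, $K_{nn}\operatorname{vec}(A) = \operatorname{vec}(A')$. (See Magnus and Neudecker, 1988, pp. 46–48, for more on commutation matrices.) Also, here, $D_\gamma = \operatorname{diag}(\gamma_1, \ldots, \gamma_n)$, $D_\sigma = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2)$, $D_\omega = \operatorname{diag}(\omega_1^2, \ldots, \omega_n^2)$, $D_\vartheta = \operatorname{diag}(\vartheta_1, \ldots, \vartheta_n)$ with $\vartheta_i = E[\varepsilon_i^2u_i^2|Z] - \sigma_i^2\omega_i^2 - 2\gamma_i^2$ for $i = 1, \ldots, n$, $E = (e_1 \otimes e_1 \,\vdots\, e_2 \otimes e_2 \,\vdots\, \cdots \,\vdots\, e_n \otimes e_n)$, and $e_i$ is the $i$th column of an $n \times n$ identity matrix. It follows from (B.2) and (B.3) and by straightforward calculations that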
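\[
\begin{aligned}
\frac{1}{K^2}E\Big[\big(u'L'D_\phi L\varepsilon - \operatorname{tr}(L'D_\phi LD_\gamma)\big)^2\Big|Z\Big]
&= \frac{1}{K^2}\operatorname{tr}\big\{(L'D_\phi L \otimes L'D_\phi L)\,E[\varepsilon u' \otimes \varepsilon u'|Z]\big\} - \frac{1}{K^2}\big[\operatorname{tr}(L'D_\phi LD_\gamma)\big]^2 \\
&= \frac{1}{K^2}\operatorname{vec}(I_n)'\big(D_\omega L'D_\phi LD_\sigma \otimes L'D_\phi L\big)\operatorname{vec}(I_n) \\
&\quad + \frac{1}{K^2}\operatorname{tr}\big\{(D_\gamma L'D_\phi LD_\gamma \otimes L'D_\phi L)K_{nn}\big\} \\
&\quad + \frac{1}{K^2}\operatorname{tr}\big\{(L'D_\phi L \otimes L'D_\phi L)ED_\vartheta E'\big\} + \frac{1}{K^2}\operatorname{tr}\big\{L'D_\phi LD_\gamma \otimes L'D_\phi LD_\gamma\big\} \\
&\quad - \frac{1}{K^2}\big[\operatorname{tr}(L'D_\phi LD_\gamma)\big]^2 \\
&= \frac{1}{K^2}\operatorname{tr}\big\{L'D_\phi LD_\omega L'D_\phi LD_\sigma\big\} + \frac{1}{K^2}\operatorname{tr}\big\{(D_\gamma L'D_\phi LD_\gamma \otimes L'D_\phi L)K_{nn}\big\} \\
&\quad + \frac{1}{K^2}\operatorname{tr}\big\{(L'D_\phi L \otimes L'D_\phi L)ED_\vartheta E'\big\}.
\end{aligned} \tag{B.4}
\]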
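Focusing first on the first term of (B.4), and letting $\bar{\omega}^2 = \max_{1\le i\le n}\omega_i^2$, $\bar{\sigma}^2 = \max_{1\le i\le n}\sigma_i^2$, and $\bar{\phi}^2 = \max_{1\le i\le n}\phi_i^2$, we get
\[
\frac{1}{K^2}\operatorname{tr}\big\{L'D_\phi LD_\omega L'D_\phi LD_\sigma\big\} \le \bar{\omega}^2\bar{\sigma}^2\bar{\phi}^2\frac{1}{K^2}\operatorname{tr}\{L'LL'L\} \le C\frac{1}{K^2}\operatorname{tr}\{L'LL'L\} = \frac{C}{K^2}\|L'L\|^2 \quad \text{a.s. } P_Z, \tag{B.5}
\]
where the first inequality follows by repeated application of CS and of the simple inequality
\[
\operatorname{tr}\{A'\Lambda A\} \le \max_{1\le i\le n}\lambda_i\,\operatorname{tr}\{A'A\}, \tag{B.6}
\]
which holds for $n \times n$ matrices $A$ and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ such that $\lambda_i \ge 0$ for all $i$, and where the second inequality follows in light of the assumptions of the lemma.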
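Turning our attention now to the second term of (B.4), we make use of the fact that, for $n \times n$ matrices $A$ and $B$, $\operatorname{tr}\{(A \otimes B)K_{nn}\} = \operatorname{tr}\{AB\}$ (a specialization of the result given by Abadir and Magnus, 2005, p. 304) to obtain $K^{-2}\operatorname{tr}\{(D_\gamma L'D_\phi LD_\gamma \otimes L'D_\phi L)K_{nn}\} = K^{-2}\operatorname{tr}\{L'D_\phi LD_\gamma L'D_\phi LD_\gamma\}$. As in (B.5), by repeated use of CS and the inequality (B.6), we obtain
\[
\frac{1}{K^2}\operatorname{tr}\big\{(D_\gamma L'D_\phi LD_\gamma \otimes L'D_\phi L)K_{nn}\big\} \le \frac{C}{K^2}\|L'L\|^2 \quad \text{a.s. } P_Z. \tag{B.7}
\]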
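The two commutation-matrix facts used above, $K_{nn}\operatorname{vec}(A) = \operatorname{vec}(A')$ and $\operatorname{tr}\{(A \otimes B)K_{nn}\} = \operatorname{tr}\{AB\}$, are easy to confirm numerically. The sketch below is an illustration with assumed helper names (`commutation_matrix`, `vec`); it uses the column-stacking convention for $\operatorname{vec}$, which is the convention under which the first identity defines $K_{nn}$.

```python
import numpy as np


def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")


def commutation_matrix(n):
    """Build the n^2 x n^2 matrix K_nn satisfying K_nn vec(A) = vec(A')."""
    K = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            # Entry A[i, j] sits at position j*n + i of vec(A)
            # and at position i*n + j of vec(A').
            K[i * n + j, j * n + i] = 1.0
    return K


def trace_identity_gap(A, B):
    """Return tr{(A kron B) K_nn} - tr{AB}, which should be zero."""
    K = commutation_matrix(A.shape[0])
    return np.trace(np.kron(A, B) @ K) - np.trace(A @ B)
```

With arbitrary square matrices `A` and `B` of the same size, `trace_identity_gap(A, B)` should be zero up to rounding, which is the step that turns the second term of (B.4) into an ordinary trace.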
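Finally, to analyze the third term of (B.4), we note that
\[
\begin{aligned}
\frac{1}{K^2}\Big|\operatorname{tr}\big\{(L'D_\phi L \otimes L'D_\phi L)ED_\vartheta E'\big\}\Big|
&\le \frac{1}{K^2}\sum_{i=1}^{n}|\vartheta_i|\big(e_i'L'D_\phi Le_i\big)^2
\le \frac{1}{K^2}\sum_{i=1}^{n}|\vartheta_i|\big(e_i'L'D_\phi^2Le_i\big)\big(e_i'L'Le_i\big) \\
&\le \bar{\phi}^2\frac{1}{K^2}\sum_{i=1}^{n}|\vartheta_i|\big(e_i'L'Le_i\big)^2
\le C\frac{1}{K^2}\sum_{i=1}^{n}\big(e_i'L'Le_i\big)^2 \\
&\le C\frac{1}{K^2}\sum_{i=1}^{n}\big(e_i'Pe_i\big)^2 = C\frac{1}{K^2}\sum_{i=1}^{n}P_{ii}^2
\le C\frac{1}{K^2}\sum_{i=1}^{n}P_{ii} = \frac{C}{K} \quad \text{a.s. } P_Z,
\end{aligned} \tag{B.8}
\]
where the first inequality follows from T, the second inequality follows from CS, the third inequality makes use of (B.6), the fourth inequality uses CS and T and follows in light of the assumptions of the lemma, and the last inequality holds because $P_{ii} < 1$.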
In light of (B.4), it follows from (B.5), (B.7), and (B.8) and Lemma B3 that
\[
\frac{1}{K^2}E\Big[\big(u'L'D_\phi L\varepsilon - \operatorname{tr}(L'D_\phi LD_\gamma)\big)^2\Big|Z\Big] \le 2C\frac{1}{K^2}\|LL'\|^2 + C\frac{1}{K} \le \frac{C}{K} \quad \text{a.s. } P_Z,
\]
which shows part (iv).
It is easily seen that parts (v) and (vi) can be proved in essentially the same way as part (iv) (by taking $u_i = \varepsilon_i$); hence, to avoid redundancy, we do not give detailed arguments for these parts. ∎
Proof of Lemma A2. Let $b_{1n} = c_{1n}\Xi_n^{-1/2}$ and $b_{2n} = c_{2n}\Xi_n^{-1/2}$ and note that these are bounded in $n$ because $\Xi_n$ is bounded away from zero by hypothesis. Let $w_{in} = b_{1n}'W_{in}$ and $u_i = b_{2n}'U_i$, where we suppress the $n$ subscript on $u_i$ for notational convenience. Then,
\[
Y_n = w_{1n} + \sum_{i=2}^{n} y_{in}, \qquad y_{in} = w_{in} + \bar{y}_{in}, \qquad \bar{y}_{in} = \sum_{j<i}\big(u_jP_{ij}\varepsilon_i + u_iP_{ij}\varepsilon_j\big)\big/\sqrt{K}.
\]
Also, $E[|w_{1n}|^4|Z] \le \sum_i E[|w_{in}|^4|Z] \le C\sum_i E[\|W_{in}\|^4|Z] \to 0$ a.s., so by a conditional version of M, we deduce that for any $\upsilon > 0$, $P(|w_{1n}| \ge \upsilon \,|\, Z) \to 0$. Moreover, note that $\sup_n E[|P(|w_{1n}| \ge \upsilon \,|\, Z)|^2] < \infty$. It follows that, by Theorem 25.12 of Billingsley (1986), $P(|w_{1n}| \ge \upsilon) = E[P(|w_{1n}| \ge \upsilon \,|\, Z)] \to 0$ as $n \to \infty$; i.e., $w_{1n} \xrightarrow{p} 0$ unconditionally. Hence, $Y_n = \sum_{i=2}^{n} y_{in} + o_p(1)$.
Now, we will show that $Y_n \xrightarrow{d} N(0, 1)$ by first showing that, conditional on $Z$, $\sum_{i=2}^{n} y_{in} \xrightarrow{d} N(0, 1)$, a.s. To proceed, let $X_i = (W_{in}', U_i', \varepsilon_i)'$ for $i = 1, \ldots, n$. Define the $\sigma$-fields $\mathcal{F}_{i,n} = \sigma(X_1, \ldots, X_i)$ for $i = 1, \ldots, n$. Note that, by construction, $\mathcal{F}_{i-1,n} \subseteq \mathcal{F}_{i,n}$. Moreover, it is straightforward to verify that, conditional on $Z$, $\{y_{in}, \mathcal{F}_{i,n}, 1 \le i \le n, n \ge 2\}$ is a martingale difference array, and we can apply the martingale central limit theorem. As before, let $\sigma_i^2 = E[\varepsilon_i^2|Z]$, $\omega_i^2 = \omega_{in}^2(Z) = E[u_i^2|Z]$, and $\gamma_i = \gamma_{in}(Z) = E[u_i\varepsilon_i|Z]$, where to simplify notation we suppress the dependence of $\sigma_i^2$ on $Z$ and of $\omega_i^2$ and $\gamma_i$ on $Z$ and $n$. Now, note that $E[w_{in}\bar{y}_{jn}|Z] = 0$ for all $i$ and $j$ and that
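\[
\begin{aligned}
E[(\bar{y}_{in})^2|Z] &= \sum_{j<i}\sum_{k<i} E\big[(u_jP_{ij}\varepsilon_i + u_iP_{ij}\varepsilon_j)(u_kP_{ik}\varepsilon_i + u_iP_{ik}\varepsilon_k)\big|Z\big]\big/K \\
&= \sum_{j<i} P_{ij}^2\big(\omega_j^2\sigma_i^2 + \omega_i^2\sigma_j^2 + 2\gamma_i\gamma_j\big)\big/K.
\end{aligned}
\]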
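The closed form for $E[(\bar{y}_{in})^2|Z]$ can be confirmed exactly on a toy design in which the expectation is a finite average. The sketch below is purely illustrative and rests on assumptions that are not in the paper: each pair is generated as $u_i = a_is_i + b_it_i$, $\varepsilon_i = c_it_i$ with independent fair signs $s_i, t_i \in \{-1, 1\}$, so that $\omega_i^2 = a_i^2 + b_i^2$, $\sigma_i^2 = c_i^2$, and $\gamma_i = b_ic_i$, and the helper name `ybar_second_moments` is hypothetical. Enumerating all sign patterns gives the exact second moment without simulation error.

```python
import itertools
import numpy as np


def ybar_second_moments(n=4, K=2, seed=1):
    """Exact E[ybar_in^2] by enumeration versus the closed-form expression."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, K))
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)

    # Illustrative heteroskedastic loadings (an assumption of this sketch).
    a, b, c = rng.uniform(0.5, 1.5, size=(3, n))
    omega2, sigma2, gamma = a ** 2 + b ** 2, c ** 2, b * c

    patterns = list(itertools.product([-1.0, 1.0], repeat=2 * n))
    exact = np.zeros(n)
    for pattern in patterns:
        s, t = np.array(pattern[:n]), np.array(pattern[n:])
        u, eps = a * s + b * t, c * t
        for i in range(1, n):
            ybar = sum(
                u[j] * P[i, j] * eps[i] + u[i] * P[i, j] * eps[j]
                for j in range(i)
            ) / np.sqrt(K)
            exact[i] += ybar ** 2 / len(patterns)

    formula = np.array([
        sum(
            P[i, j] ** 2
            * (omega2[j] * sigma2[i] + omega2[i] * sigma2[j] + 2 * gamma[i] * gamma[j])
            for j in range(i)
        ) / K
        for i in range(n)
    ])
    return exact, formula
```

The two returned arrays should agree entry by entry. The design is deliberately tiny (256 sign patterns for $n = 4$) so that the enumeration is exact.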
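Thus,
\[
\begin{aligned}
s_n^2(Z) = E\Big[\Big(\sum_{i=2}^{n} y_{in}\Big)^2\Big|Z\Big] &= \sum_{i=2}^{n}\big(E[w_{in}^2|Z] + E[\bar{y}_{in}^2|Z]\big) \\
&= b_{1n}'D_nb_{1n} - E[w_{1n}^2|Z] + \sum_{i\ne j} P_{ij}^2\big(\omega_j^2\sigma_i^2 + \omega_i^2\sigma_j^2 + 2\gamma_i\gamma_j\big)\big/(2K) \\
&= b_{1n}'D_nb_{1n} + b_{2n}'\bar{\Sigma}_nb_{2n} + o_{a.s.}(1) \\
&= \Xi_n^{-1/2}\big(c_{1n}'D_nc_{1n} + c_{2n}'\bar{\Sigma}_nc_{2n}\big)\Xi_n^{-1/2} + o_{a.s.}(1) \\
&= \Xi_n^{-1/2}\,\Xi_n\,\Xi_n^{-1/2} + o_{a.s.}(1) = 1 + o_{a.s.}(1) \to 1 \quad \text{a.s.},
\end{aligned}
\]
where $D_n = D_n(Z) = \sum_{i=1}^{n} E[W_{in}W_{in}'|Z]$ and
\[
\bar{\Sigma}_n = \bar{\Sigma}_n(Z) = \sum_{i\ne j} P_{ij}^2\big(E[U_iU_i'|Z]\,E[\varepsilon_j^2|Z] + E[U_i\varepsilon_i|Z]\,E[\varepsilon_jU_j'|Z]\big)\big/K.
\]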
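Thus, $s_n^2(Z)$ is bounded and bounded away from zero a.s. Also, $\sum_{i=2}^{n} E[y_{in}^4|Z] \le C\sum_{i=2}^{n} E[\|W_{in}\|^4|Z] + C\sum_{i=2}^{n} E[\bar{y}_{in}^4|Z]$. By condition (iv), $\sum_{i=2}^{n} E[\|W_{in}\|^4|Z] \to 0$. Let $\bar{y}_{in}^{\varepsilon} = \sum_{j<i} u_jP_{ij}\varepsilon_i/\sqrt{K}$ and $\bar{y}_{in}^{u} = \sum_{j<i} u_iP_{ij}\varepsilon_j/\sqrt{K}$. By $|P_{ij}| < 1$ and $\sum_j P_{ij}^2 = P_{ii}$, we have that a.s.
\[
\begin{aligned}
\sum_{i=2}^{n} E\big[(\bar{y}_{in}^{\varepsilon})^4\big|Z\big] &\le \frac{C}{K^2}\sum_{i=2}^{n}\sum_{j,k,\ell,m<i} P_{ij}P_{ik}P_{i\ell}P_{im}\,E[\varepsilon_i^4|Z]\,E[u_ju_ku_\ell u_m|Z] \\
&\le \frac{C}{K^2}\sum_{i=2}^{n}\Big(\sum_{j<i} P_{ij}^4 + \sum_{j,k<i} P_{ij}^2P_{ik}^2\Big) \le CK/K^2 \to 0.
\end{aligned}
\]
Similarly, $\sum_{i=2}^{n} E\big[(\bar{y}_{in}^{u})^4\big|Z\big] \to 0$ a.s., so that
\[
\sum_{i=2}^{n} E[\bar{y}_{in}^4|Z] \le C\sum_{i=2}^{n}\Big(E\big[(\bar{y}_{in}^{\varepsilon})^4\big|Z\big] + E\big[(\bar{y}_{in}^{u})^4\big|Z\big]\Big) \to 0.
\]
Then by T we have $\sum_{i=2}^{n} E[y_{in}^4|Z] \to 0$ a.s.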
Conditional on $Z$, to apply the martingale central limit theorem, it suffices to show that for any $\epsilon > 0$
\[
P\Big(\Big|\sum_{i=2}^{n} E[y_{in}^2|X_1, \ldots, X_{i-1}, Z] - s_n^2(Z)\Big| \ge \epsilon \,\Big|\, Z\Big) \to 0. \tag{B.9}
\]
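Now note that $E[w_{in}\bar{y}_{in}|Z] = 0$ a.s. and thus we can write
\[
\begin{aligned}
\sum_{i=2}^{n} E[y_{in}^2|X_1, \ldots, X_{i-1}, Z] - s_n^2(Z) &= \sum_{i=2}^{n}\big(E[w_{in}^2|X_1, \ldots, X_{i-1}, Z] - E[w_{in}^2|Z]\big) \\
&\quad + 2\sum_{i=2}^{n} E[w_{in}\bar{y}_{in}|X_1, \ldots, X_{i-1}, Z] \\
&\quad + \sum_{i=2}^{n}\big(E[\bar{y}_{in}^2|X_1, \ldots, X_{i-1}, Z] - E[\bar{y}_{in}^2|Z]\big).
\end{aligned} \tag{B.10}
\]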
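We will show that each term on the right-hand side of (B.10) converges to zero a.s. To proceed, note first that by independence of $W_{1n}, \ldots, W_{nn}$ conditional on $Z$, $E[w_{in}^2|X_1, \ldots, X_{i-1}, Z] = E[w_{in}^2|Z]$ a.s. Next, note that $E[w_{in}\bar{y}_{in}|X_1, \ldots, X_{i-1}, Z] = E[w_{in}u_i|Z]\sum_{j<i} P_{ij}\varepsilon_j/\sqrt{K} + E[w_{in}\varepsilon_i|Z]\sum_{j<i} P_{ij}u_j/\sqrt{K}$. Let $\delta_i = \delta_i(Z) = E[w_{in}u_i|Z]$ and consider the first term, $\delta_i\sum_{j<i} P_{ij}\varepsilon_j/\sqrt{K}$. Let $\bar{P}$ be the upper triangular matrix with $\bar{P}_{ij} = P_{ij}$ for $j > i$ and $\bar{P}_{ij} = 0$, $j \le i$, and let $\delta = (\delta_1, \ldots, \delta_n)'$. Then, $\sum_{i=2}^{n}\sum_{j<i}\delta_iP_{ij}\varepsilon_j/\sqrt{K} = \delta'\bar{P}'\varepsilon/\sqrt{K}$. By CS, $\delta'\delta = \sum_{i=1}^{n}\big(E[w_{in}u_i|Z]\big)^2 \le \sum_{i=1}^{n} E[w_{in}^2|Z]\,E[u_i^2|Z] \le C$ a.s. By Lemma B3, $\|\bar{P}'\bar{P}\| \le C\sqrt{K}$ a.s., which in turn implies that $\lambda_{\max}(\bar{P}'\bar{P}) \le C\sqrt{K}$ a.s. It then follows given $E[u_j^2|Z] \le C$ a.s. that $E[(\delta'\bar{P}'\varepsilon/\sqrt{K})^2|Z] \le C\delta'\bar{P}'\bar{P}\delta/K \le C\|\delta\|^2/\sqrt{K} \le C/\sqrt{K} \to 0$ a.s., so that by M we have for any $\epsilon > 0$, $P\big(|\delta(Z)'\bar{P}'\varepsilon/\sqrt{K}| \ge \epsilon \,\big|\, Z\big) \to 0$ a.s. Similarly, we have $\sum_{i=2}^{n} E[w_{in}\varepsilon_i|Z]\sum_{j<i} P_{ij}u_j/\sqrt{K} \to 0$ a.s. Therefore, it follows by T that, for any $\epsilon > 0$, $P\big(\big|\sum_{i=2}^{n} E[w_{in}\bar{y}_{in}|X_1, \ldots, X_{i-1}, Z]\big| \ge \epsilon \,\big|\, Z\big) \to 0$ a.s.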
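To finish showing that equation (B.9) is satisfied, it only remains to show that, for any $\epsilon > 0$,
\[
P\Big(\Big|\sum_{i=2}^{n}\big(E[\bar{y}_{in}^2|X_1, \ldots, X_{i-1}, Z] - E[\bar{y}_{in}^2|Z]\big)\Big| \ge \epsilon \,\Big|\, Z\Big) \to 0 \quad \text{a.s.} \tag{B.11}
\]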
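Now, write
\[
\begin{aligned}
\sum_{i=2}^{n}\big(E[\bar{y}_{in}^2|X_1, \ldots, X_{i-1}, Z] - E[\bar{y}_{in}^2|Z]\big)
&= \sum_{j<i}\omega_i^2P_{ij}^2(\varepsilon_j^2 - \sigma_j^2)/K + 2\sum_{j<k<i}\omega_i^2P_{ij}P_{ik}\varepsilon_j\varepsilon_k/K \\
&\quad + \sum_{j<i}\sigma_i^2P_{ij}^2(u_j^2 - \omega_j^2)/K + 2\sum_{j<k<i}\sigma_i^2P_{ij}P_{ik}u_ju_k/K \\
&\quad + 2\sum_{j<i}\gamma_iP_{ij}^2(u_j\varepsilon_j - \gamma_j)/K + 2\sum_{j<k<i}\gamma_iP_{ij}P_{ik}(u_j\varepsilon_k + u_k\varepsilon_j)/K.
\end{aligned} \tag{B.12}
\]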
By applying parts (i)–(iii) of Lemma B4 with $\phi_i = \gamma_i$, $\omega_i^2$, and $\sigma_i^2$, respectively, we obtain, a.s., $E\big[\big(\sum_{j<i}\gamma_iP_{ij}^2(u_j\varepsilon_j - \gamma_j)/K\big)^2 \,\big|\, Z\big] \to 0$, $E\big[\big(\sum_{j<i}\omega_i^2P_{ij}^2(\varepsilon_j^2 - \sigma_j^2)/K\big)^2 \,\big|\, Z\big] \to 0$, and $E\big[\big(\sum_{j<i}\sigma_i^2P_{ij}^2(u_j^2 - \omega_j^2)/K\big)^2 \,\big|\, Z\big] \to 0$. Moreover, applying part (iv) of Lemma B4 with $\phi_i = \gamma_i$, we obtain $E\big[\big(\sum_{j<k<i}\gamma_iP_{ij}P_{ik}(u_j\varepsilon_k + u_k\varepsilon_j)/K\big)^2 \,\big|\, Z\big] \to 0$ a.s. $P_Z$. Similarly, conditional on $Z$, all of the remaining terms in equation (B.12) converge in mean square to zero a.s. by parts (v) and (vi) of Lemma B4.
The preceding argument shows that as $n \to \infty$, $P(Y_n \le y|Z) \to \Phi(y)$ a.s. $P_Z$, for every real number $y$, where $\Phi(y)$ denotes the cumulative distribution function of a standard normal distribution. Moreover, it is clear that, for some $\epsilon > 0$, $\sup_n E\big[|P(Y_n \le y|Z)|^{1+\epsilon}\big] < \infty$ (take, e.g., $\epsilon = 1$). Hence, by a version of the dominated convergence theorem, as given by Theorem 25.12 of Billingsley (1986), we deduce that $P(Y_n \le y) = E[P(Y_n \le y|Z)] \to E[\Phi(y)] = \Phi(y)$, which gives the desired conclusion. ∎
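Proof of Lemma A4. Let $\bar{w}_i = E[W_i|Z]$, $\tilde{W}_i = W_i - \bar{w}_i$, $\bar{y}_i = E[Y_i|Z]$, $\tilde{Y}_i = Y_i - \bar{y}_i$, $\bar{\eta}_i = E[\eta_i|Z]$, $\tilde{\eta}_i = \eta_i - \bar{\eta}_i$,
\[
\bar{\mu}_W^2 = \max_{i\le n}\bar{w}_i^2 \le C/n, \qquad \bar{\mu}_Y^2 = \max_{i\le n}\bar{y}_i^2 \le C/n, \qquad \bar{\mu}_\eta^2 = \max_{i\le n}\bar{\eta}_i^2 \le C,
\]
\[
\bar{\sigma}_W^2 = \max_{i\le n}\operatorname{Var}(W_i|Z) \le C/r_n, \qquad \bar{\sigma}_Y^2 = \max_{i\le n}\operatorname{Var}(Y_i|Z) \le C/r_n, \qquad \bar{\sigma}_\eta^2 = \max_{i\le n}\operatorname{Var}(\eta_i|Z) \le C.
\]
Also, let $\breve{y}_i = \sum_j P_{ij}\bar{y}_j$, $\breve{w}_i = \sum_j P_{ij}\bar{w}_j$, be predicted values from projecting $\bar{y}$ and $\bar{w}$ on $P$ and note that
\[
\sum_i \breve{y}_i^2 \le \sum_i \bar{y}_i^2 \le C, \qquad \sum_i \breve{w}_i^2 \le \sum_i \bar{w}_i^2 \le C.
\]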
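By adding and subtracting terms similar to the beginning of the proof of Theorem 4,
\[
\begin{aligned}
A_n &= \sum_{i\ne j}\sum_{k\notin\{i,j\}} \bar{w}_iP_{ik}\bar{\eta}_kP_{kj}\bar{y}_j \\
&= \sum_i \bar{\eta}_i\big(\breve{w}_i\breve{y}_i - P_{ii}\bar{w}_i\breve{y}_i - P_{ii}\breve{w}_i\bar{y}_i + 2P_{ii}^2\bar{w}_i\bar{y}_i\big) - \sum_{i,j} \bar{w}_i\bar{y}_iP_{ij}^2\bar{\eta}_j.
\end{aligned}
\]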
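The closed form for $A_n$ is an inclusion-exclusion identity for a triple sum over distinct indices, and it can be checked by brute force. The sketch below uses hypothetical names (`a_n_direct`, `a_n_closed_form`) and arbitrary vectors standing in for $\bar{w}$, $\bar{\eta}$, and $\bar{y}$; it assumes only that $P$ is symmetric, since the identity itself does not need idempotency.

```python
import numpy as np


def a_n_direct(P, w, eta, y):
    """Triple sum over i != j and k not in {i, j} of w_i P_ik eta_k P_kj y_j."""
    n = len(w)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for k in range(n):
                if k == i or k == j:
                    continue
                total += w[i] * P[i, k] * eta[k] * P[k, j] * y[j]
    return total


def a_n_closed_form(P, w, eta, y):
    """Closed form in terms of the fitted values w_fit = P w and y_fit = P y."""
    w_fit, y_fit = P @ w, P @ y
    p = np.diag(P)
    first = np.sum(
        eta * (w_fit * y_fit - p * w * y_fit - p * w_fit * y + 2 * p ** 2 * w * y)
    )
    # sum_{i,j} w_i y_i P_ij^2 eta_j
    second = np.sum((w * y)[:, None] * P ** 2 * eta[None, :])
    return first - second
```

Evaluating both functions on the same inputs should give matching values, which confirms the sign and the factor 2 on the $P_{ii}^2$ term.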
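By T, CS, and $|\bar{\eta}_k| \le C$,
\[
\Big|\sum_k \breve{w}_k\bar{\eta}_k\breve{y}_k\Big| \le C\sqrt{\sum_k \breve{w}_k^2}\sqrt{\sum_k \breve{y}_k^2} \le C, \qquad
\Big|\sum_i \bar{w}_iP_{ii}\bar{\eta}_i\breve{y}_i\Big| \le \sqrt{\sum_i \bar{w}_i^2P_{ii}^2\bar{\eta}_i^2}\sqrt{\sum_i \breve{y}_i^2} \le C,
\]
and it follows similarly that $\sum_i \breve{w}_iP_{ii}\bar{\eta}_i\bar{y}_i$ is bounded. By Lemma B1, $\big|\sum_{i,k}\bar{w}_i\bar{y}_iP_{ik}^2\bar{\eta}_k\big| \le Cn^{-1}\sum_{i,k}P_{ik}^2 \le CK/n \le C$. Also, $\big|\sum_i \bar{w}_i\bar{y}_iP_{ii}^2\bar{\eta}_i\big| \le Cn/n = C$. Thus, $|A_n| \le C$ holds by T.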
For the remainder of this proof we let E[•] denote the conditional expectation given Z.
Note that
Wi Pik ηk Pkj Y j = W̃i Pik ηk Pkj Y j + w̄i Pik ηk Pkj Y j
= W̃i Pik η̃k Pkj Y j + W̃i Pik η̄k Pkj Y j + w̄i Pik η̃k Pkj Y j + w̄i Pik η̄k Pkj Y j
= W̃i Pik η̃k Pkj Ỹ j + W̃i Pik η̃k Pkj ȳ j + W̃i Pik η̄k Pkj Ỹ j + W̃i Pik η̄k Pkj ȳ j
+ w̄i Pik η̃k Pkj Ỹ j + w̄i Pik η̃k Pkj ȳ j + w̄i Pik η̄k Pkj Ỹ j + w̄i Pik η̄k Pkj ȳ j .
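Summing and subtracting the last term gives
\[
\sum_{i\ne j\ne k} W_iP_{ik}\eta_kP_{kj}Y_j - A_n = \sum_{r=1}^{7}\hat{\psi}_r,
\]
where
\[
\hat{\psi}_1 = \sum_{i\ne j\ne k}\tilde{W}_iP_{ik}\tilde{\eta}_kP_{kj}\tilde{Y}_j, \qquad
\hat{\psi}_2 = \sum_{i\ne j\ne k}\tilde{W}_iP_{ik}\tilde{\eta}_kP_{kj}\bar{y}_j, \qquad
\hat{\psi}_3 = \sum_{i\ne j\ne k}\tilde{W}_iP_{ik}\bar{\eta}_kP_{kj}\tilde{Y}_j,
\]
\[
\hat{\psi}_4 = \sum_{i\ne j\ne k}\tilde{W}_iP_{ik}\bar{\eta}_kP_{kj}\bar{y}_j, \qquad
\hat{\psi}_5 = \sum_{i\ne j\ne k}\bar{w}_iP_{ik}\tilde{\eta}_kP_{kj}\tilde{Y}_j, \qquad
\hat{\psi}_6 = \sum_{i\ne j\ne k}\bar{w}_iP_{ik}\tilde{\eta}_kP_{kj}\bar{y}_j,
\]
and $\hat{\psi}_7 = \sum_{i\ne j\ne k}\bar{w}_iP_{ik}\bar{\eta}_kP_{kj}\tilde{Y}_j$. By T, the second conclusion will follow from $\hat{\psi}_r \xrightarrow{p} 0$ for $r = 1, \ldots, 7$. Also, note that $\hat{\psi}_7$ is the same as $\hat{\psi}_4$, and $\hat{\psi}_5$ is the same as $\hat{\psi}_2$, with the random variables $W$ and $Y$ interchanged. Because the conditions on $W$ and $Y$ are symmetric, it suffices to show that $\hat{\psi}_r \xrightarrow{p} 0$ for $r \in \{1, 2, 3, 4, 6\}$.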
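Consider now $\hat{\psi}_1$. Note that for $i \ne j \ne k$ and $r \ne s \ne t$, we have $E[\tilde{W}_iP_{ik}\tilde{\eta}_kP_{kj}\tilde{Y}_j\tilde{W}_rP_{rs}\tilde{\eta}_sP_{st}\tilde{Y}_t] = 0$, except for when each of the three indexes $i, j, k$ is equal to one of the three indexes $r, s, t$. There are six ways this can happen, leading to six terms in
\[
E[\hat{\psi}_1^2] = \sum_{i\ne j\ne k}\sum_{r\ne s\ne t} E[\tilde{W}_iP_{ik}\tilde{\eta}_kP_{kj}\tilde{Y}_j\tilde{W}_rP_{rs}\tilde{\eta}_sP_{st}\tilde{Y}_t] = \sum_{q=1}^{6}\hat{\tau}_q.
\]
Note that by hypothesis, $\bar{\sigma}_W^2\bar{\sigma}_\eta^2\bar{\sigma}_Y^2K \le Cr_n^{-2}K \to 0$. By Lemma B1, we have
\[
\hat{\tau}_1 = \sum_{i\ne j\ne k} E[(\tilde{W}_iP_{ik}\tilde{\eta}_kP_{kj}\tilde{Y}_j)^2] = \sum_{i\ne j\ne k} E[\tilde{W}_i^2]P_{ik}^2E[\tilde{\eta}_k^2]P_{kj}^2E[\tilde{Y}_j^2] \le \bar{\sigma}_W^2\bar{\sigma}_\eta^2\bar{\sigma}_Y^2K \to 0.
\]
Similarly, by CS,
\[
|\hat{\tau}_3| = \Big|\sum_{i\ne j\ne k} E[(\tilde{W}_iP_{ik}\tilde{\eta}_kP_{kj}\tilde{Y}_j)(\tilde{W}_jP_{jk}\tilde{\eta}_kP_{ki}\tilde{Y}_i)]\Big| = \Big|\sum_{i\ne j\ne k} E[\tilde{W}_i\tilde{Y}_i]E[\tilde{W}_j\tilde{Y}_j]E[\tilde{\eta}_k^2]P_{ik}^2P_{kj}^2\Big| \le \bar{\sigma}_W^2\bar{\sigma}_\eta^2\bar{\sigma}_Y^2K \to 0.
\]
Next, by Lemma B1 and CS
\[
|\hat{\tau}_2| = \Big|\sum_{i\ne j\ne k} E[(\tilde{W}_iP_{ik}\tilde{\eta}_kP_{kj}\tilde{Y}_j)(\tilde{W}_iP_{ij}\tilde{\eta}_jP_{jk}\tilde{Y}_k)]\Big| = \Big|\sum_{i\ne j\ne k} E[\tilde{W}_i^2]E[\tilde{\eta}_k\tilde{Y}_k]E[\tilde{\eta}_j\tilde{Y}_j]P_{ik}P_{ij}P_{jk}^2\Big| \le \bar{\sigma}_W^2\bar{\sigma}_\eta^2\bar{\sigma}_Y^2K \to 0.
\]
Similarly,
\[
|\hat{\tau}_4| = \Big|\sum_{i\ne j\ne k} E[(\tilde{W}_iP_{ik}\tilde{\eta}_kP_{kj}\tilde{Y}_j)(\tilde{W}_jP_{ji}\tilde{\eta}_iP_{ik}\tilde{Y}_k)]\Big| = \Big|\sum_{i\ne j\ne k} E[\tilde{W}_i\tilde{\eta}_i]E[\tilde{W}_j\tilde{Y}_j]E[\tilde{\eta}_k\tilde{Y}_k]P_{ik}^2P_{kj}P_{ji}\Big| \le \bar{\sigma}_W^2\bar{\sigma}_\eta^2\bar{\sigma}_Y^2K \to 0,
\]
\[
|\hat{\tau}_5| = \Big|\sum_{i\ne j\ne k} E[(\tilde{W}_iP_{ik}\tilde{\eta}_kP_{kj}\tilde{Y}_j)(\tilde{W}_kP_{ki}\tilde{\eta}_iP_{ij}\tilde{Y}_j)]\Big| = \Big|\sum_{i\ne j\ne k} E[\tilde{W}_i\tilde{\eta}_i]E[\tilde{Y}_j^2]E[\tilde{W}_k\tilde{\eta}_k]P_{ik}^2P_{kj}P_{ji}\Big| \le \bar{\sigma}_W^2\bar{\sigma}_\eta^2\bar{\sigma}_Y^2K \to 0,
\]
\[
|\hat{\tau}_6| = \Big|\sum_{i\ne j\ne k} E[(\tilde{W}_iP_{ik}\tilde{\eta}_kP_{kj}\tilde{Y}_j)(\tilde{W}_kP_{kj}\tilde{\eta}_jP_{ji}\tilde{Y}_i)]\Big| = \Big|\sum_{i\ne j\ne k} E[\tilde{W}_i\tilde{Y}_i]E[\tilde{\eta}_j\tilde{Y}_j]E[\tilde{W}_k\tilde{\eta}_k]P_{jk}^2P_{ij}P_{ik}\Big| \le \bar{\sigma}_W^2\bar{\sigma}_\eta^2\bar{\sigma}_Y^2K \to 0.
\]
T then gives $E[\hat{\psi}_1^2] \to 0$, so $\hat{\psi}_1 \xrightarrow{p} 0$ holds by M.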
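Consider now $\hat{\psi}_2$. Note that for $i \ne j \ne k$ and $r \ne s \ne t$, we have $E[\tilde{W}_iP_{ik}\tilde{\eta}_kP_{kj}\bar{y}_j\tilde{W}_rP_{rs}\tilde{\eta}_sP_{st}\bar{y}_t] = 0$, except when $i = r$ and $k = s$ or $i = s$ and $k = r$. Then by $(A + B + C)^2 \le 3(A^2 + B^2 + C^2)$ and for fixed $k$, $\sum_{i\ne k} P_{ik}^2 \le P_{kk}$, $\sum_{i\ne k} P_{ik}^4 \le P_{kk}$, it follows that
\[
\begin{aligned}
\sum_{i\ne k} P_{ik}^2\Big(\sum_{j\notin\{i,k\}} P_{kj}\bar{y}_j\Big)^2 &\le 3\sum_{i\ne k} P_{ik}^2\big(\breve{y}_k^2 + P_{ki}^2\bar{y}_i^2 + P_{kk}^2\bar{y}_k^2\big) \\
&\le 3\sum_k P_{kk}\big(\breve{y}_k^2 + 2\bar{y}_k^2\big) \le 3\Big(\sum_k \breve{y}_k^2 + 2\sum_k \bar{y}_k^2\Big) \le 9n\bar{\mu}_Y^2 \le C.
\end{aligned}
\]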
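It follows by $|AB| \le (A^2 + B^2)/2$, CS, and $P_{ik} = P_{ki}$ that
\[
\begin{aligned}
E[\hat{\psi}_2^2] &= \sum_{i\ne k} E[\tilde{W}_i^2]P_{ik}^2E[\tilde{\eta}_k^2]\Big(\sum_{j\notin\{i,k\}} P_{kj}\bar{y}_j\Big)^2 + \sum_{i\ne k} E[\tilde{W}_i\tilde{\eta}_i]P_{ik}^2E[\tilde{W}_k\tilde{\eta}_k]\Big(\sum_{j\notin\{i,k\}} P_{kj}\bar{y}_j\Big)\Big(\sum_{j\notin\{i,k\}} P_{ij}\bar{y}_j\Big) \\
&\le 2\bar{\sigma}_W^2\bar{\sigma}_\eta^2\sum_{i\ne k} P_{ik}^2\Big(\sum_{j\notin\{i,k\}} P_{kj}\bar{y}_j\Big)^2 \le C/r_n \to 0.
\end{aligned}
\]
Then $\hat{\psi}_2 \xrightarrow{p} 0$ holds by M.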
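Consider $\hat{\psi}_3$. Note that for $i \ne j \ne k$ and $r \ne s \ne t$, we have $E[\tilde{W}_iP_{ik}\bar{\eta}_kP_{kj}\tilde{Y}_j\tilde{W}_rP_{rs}\bar{\eta}_sP_{st}\tilde{Y}_t] = 0$, except when $i = r$ and $j = t$ or $i = t$ and $j = r$. Thus,
\[
\begin{aligned}
E[\hat{\psi}_3^2] &= \sum_{i\ne j}\big(E[\tilde{W}_i^2]E[\tilde{Y}_j^2] + E[\tilde{W}_i\tilde{Y}_i]E[\tilde{W}_j\tilde{Y}_j]\big)\Big(\sum_{k\notin\{i,j\}} P_{ik}\bar{\eta}_kP_{kj}\Big)^2 \\
&\le 2\bar{\sigma}_W^2\bar{\sigma}_Y^2\sum_{i\ne j}\Big(\sum_{k\notin\{i,j\}} P_{ik}\bar{\eta}_kP_{kj}\Big)^2.
\end{aligned}
\]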
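Note that for $i \ne j$, $\sum_{k\notin\{i,j\}} P_{ik}P_{kj}\bar{\eta}_k = \sum_k P_{ik}P_{kj}\bar{\eta}_k - P_{ij}P_{ii}\bar{\eta}_i - P_{ij}P_{jj}\bar{\eta}_j$. Note also that
\[
\sum_i\Big(\sum_k P_{ik}^2\bar{\eta}_k\Big)^2 = \sum_{i,k,\ell} P_{ik}^2P_{i\ell}^2\bar{\eta}_k\bar{\eta}_\ell \le \bar{\mu}_\eta^2\sum_{i,k,\ell} P_{ik}^2P_{i\ell}^2 = \bar{\mu}_\eta^2\sum_i P_{ii}^2 \le \bar{\mu}_\eta^2K,
\]
\[
\begin{aligned}
\sum_{i,j}\Big(\sum_k P_{ik}\bar{\eta}_kP_{kj}\Big)^2 &= \sum_{i,j,k,\ell} P_{ik}\bar{\eta}_kP_{jk}P_{i\ell}\bar{\eta}_\ell P_{j\ell} = \sum_{k,\ell}\bar{\eta}_k\bar{\eta}_\ell\Big(\sum_i P_{ik}P_{i\ell}\Big)\Big(\sum_j P_{jk}P_{j\ell}\Big) \\
&= \sum_{k,\ell}\bar{\eta}_k\bar{\eta}_\ell P_{k\ell}^2 \le \bar{\mu}_\eta^2\sum_{k,\ell} P_{k\ell}^2 = \bar{\mu}_\eta^2K.
\end{aligned}
\]
It therefore follows that
\[
\sum_{i\ne j}\Big(\sum_k P_{ik}\bar{\eta}_kP_{kj}\Big)^2 = \sum_{i,j}\Big(\sum_k P_{ik}\bar{\eta}_kP_{kj}\Big)^2 - \sum_i\Big(\sum_k P_{ik}\bar{\eta}_kP_{ki}\Big)^2 \le 2\bar{\mu}_\eta^2K.
\]
Also, by Lemma B1, $\sum_{i\ne j} P_{ij}^2P_{jj}^2\bar{\eta}_j^2 \le \bar{\mu}_\eta^2\sum_{i\ne j} P_{ij}^2 \le \bar{\mu}_\eta^2K$, so that
\[
\sum_{i\ne j}\Big(\sum_{k\notin\{i,j\}} P_{ik}\bar{\eta}_kP_{kj}\Big)^2 \le 3\sum_{i\ne j}\Big\{\Big(\sum_k P_{ik}\bar{\eta}_kP_{kj}\Big)^2 + P_{ij}^2P_{ii}^2\bar{\eta}_i^2 + P_{ij}^2P_{jj}^2\bar{\eta}_j^2\Big\} \le 6\bar{\mu}_\eta^2K.
\]
From the previous expression for $E[\hat{\psi}_3^2]$, we then have $E[\hat{\psi}_3^2] \le C\bar{\sigma}_W^2\bar{\sigma}_Y^2\bar{\mu}_\eta^2K \le Cr_n^{-2}K \to 0$. Then $\hat{\psi}_3 \xrightarrow{p} 0$ by M.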
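Next, consider $\hat{\psi}_4$. Note that for $i \ne j \ne k$ and $r \ne s \ne t$, we have $E[\tilde{W}_iP_{ik}\bar{\eta}_kP_{kj}\bar{y}_j\tilde{W}_rP_{rs}\bar{\eta}_sP_{st}\bar{y}_t] = 0$, except when $i = r$. Thus,
\[
E[\hat{\psi}_4^2] = \sum_i E[\tilde{W}_i^2]\Big(\sum_{j\ne i}\sum_{k\notin\{i,j\}} P_{ik}\bar{\eta}_kP_{kj}\bar{y}_j\Big)^2 \le \bar{\sigma}_W^2\sum_i\Big(\sum_{j\ne i}\sum_{k\notin\{i,j\}} P_{ik}\bar{\eta}_kP_{kj}\bar{y}_j\Big)^2.
\]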
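Note that for $i \ne j$,
\[
\sum_{k\notin\{i,j\}} P_{ik}\bar{\eta}_kP_{kj}\bar{y}_j = \sum_k P_{ik}\bar{\eta}_kP_{kj}\bar{y}_j - P_{ii}\bar{\eta}_iP_{ij}\bar{y}_j - P_{ij}\bar{\eta}_jP_{jj}\bar{y}_j.
\]
Therefore, for fixed $i$,
\[
\begin{aligned}
\sum_{j\ne i}\sum_{k\notin\{i,j\}} P_{ik}\bar{\eta}_kP_{kj}\bar{y}_j &= \sum_{j\ne i}\Big(\sum_k P_{ik}\bar{\eta}_kP_{kj}\bar{y}_j - P_{ii}\bar{\eta}_iP_{ij}\bar{y}_j - P_{ij}\bar{\eta}_jP_{jj}\bar{y}_j\Big) \\
&= \sum_k P_{ik}\bar{\eta}_k\breve{y}_k - \sum_k P_{ik}^2\bar{\eta}_k\bar{y}_i - P_{ii}\bar{\eta}_i\breve{y}_i - \sum_j P_{ij}\bar{\eta}_jP_{jj}\bar{y}_j + 2P_{ii}^2\bar{\eta}_i\bar{y}_i.
\end{aligned}
\]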
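Note that because $P$ is idempotent, we have $\sum_j\sum_k P_{jk}\bar{\eta}_j\breve{y}_j\bar{\eta}_k\breve{y}_k \le \sum_j \bar{\eta}_j^2\breve{y}_j^2 \le \bar{\mu}_\eta^2\sum_j \breve{y}_j^2 \le \bar{\mu}_\eta^2\sum_j \bar{y}_j^2 \le n\bar{\mu}_\eta^2\bar{\mu}_Y^2 \le C$. Then it follows that
\[
\sum_i\Big\{\sum_k P_{ik}\bar{\eta}_k\breve{y}_k\Big\}^2 = \sum_i\sum_j\sum_k P_{ij}\bar{\eta}_j\breve{y}_jP_{ik}\bar{\eta}_k\breve{y}_k = \sum_j\sum_k \bar{\eta}_j\breve{y}_j\bar{\eta}_k\breve{y}_k\sum_i P_{ij}P_{ik} = \sum_j\sum_k P_{jk}\bar{\eta}_j\breve{y}_j\bar{\eta}_k\breve{y}_k \le C.
\]
Also, using similar reasoning,
\[
\sum_i (P_{ii}\bar{\eta}_i\breve{y}_i)^2 \le \sum_i \bar{\eta}_i^2\breve{y}_i^2 \le n\bar{\mu}_\eta^2\bar{\mu}_Y^2 \le C, \qquad
\sum_i\Big(\sum_j P_{ij}\bar{\eta}_jP_{jj}\bar{y}_j\Big)^2 \le \sum_i \bar{\eta}_i^2P_{ii}^2\bar{y}_i^2 \le \sum_i \bar{\eta}_i^2\bar{y}_i^2 \le C,
\]
\[
\sum_i\Big(\bar{y}_i\sum_k P_{ik}^2\bar{\eta}_k\Big)^2 \le \bar{\mu}_Y^2\sum_{i,k,\ell} P_{ik}^2P_{i\ell}^2\bar{\eta}_k\bar{\eta}_\ell \le \bar{\mu}_Y^2\bar{\mu}_\eta^2\sum_{i,k,\ell} P_{ik}^2P_{i\ell}^2 \le K\bar{\mu}_\eta^2\bar{\mu}_Y^2 \le C, \qquad
\sum_i P_{ii}^4\bar{\eta}_i^2\bar{y}_i^2 \le n\bar{\mu}_\eta^2\bar{\mu}_Y^2 \le C.
\]
Then using the fact that $\big(\sum_{r=1}^{5} A_r\big)^2 \le 5\sum_{r=1}^{5} A_r^2$, it follows that $E[\hat{\psi}_4^2] \le \bar{\sigma}_W^2C \le C/r_n \to 0$, so $\hat{\psi}_4 \xrightarrow{p} 0$ by M.
Next, consider $\hat{\psi}_6$. Note that for $i \ne k$, $\sum_{j\notin\{i,k\}} \bar{w}_iP_{ik}P_{kj}\bar{y}_j = \bar{w}_iP_{ik}\breve{y}_k - \bar{w}_iP_{ik}^2\bar{y}_i - \bar{w}_iP_{ik}P_{kk}\bar{y}_k$. Then for fixed $k$,
\[
\begin{aligned}
\sum_{i\ne k}\sum_{j\notin\{i,k\}} \bar{w}_iP_{ik}P_{kj}\bar{y}_j &= \sum_i\big(\bar{w}_iP_{ik}\breve{y}_k - \bar{w}_iP_{ik}^2\bar{y}_i - \bar{w}_iP_{ik}P_{kk}\bar{y}_k\big) - \bar{w}_kP_{kk}\breve{y}_k + 2\bar{w}_kP_{kk}^2\bar{y}_k \\
&= \breve{w}_k\breve{y}_k - \sum_i \bar{w}_iP_{ik}^2\bar{y}_i - \breve{w}_kP_{kk}\bar{y}_k - \bar{w}_kP_{kk}\breve{y}_k + 2\bar{w}_kP_{kk}^2\bar{y}_k.
\end{aligned}
\]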
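Then using the fact that $\big(\sum_{r=1}^{5} A_r\big)^2 \le 5\sum_{r=1}^{5} A_r^2$ we have
\[
\begin{aligned}
E[\hat{\psi}_6^2] &= \sum_k E[\tilde{\eta}_k^2]\Big(\sum_{i\ne k}\sum_{j\notin\{i,k\}} \bar{w}_iP_{ik}P_{kj}\bar{y}_j\Big)^2 \\
&\le 5\bar{\sigma}_\eta^2\sum_k\Big(\breve{w}_k^2\breve{y}_k^2 + \sum_{i,j} P_{kj}^2P_{ki}^2\bar{w}_i\bar{y}_i\bar{w}_j\bar{y}_j + \breve{w}_k^2P_{kk}^2\bar{y}_k^2 + \bar{w}_k^2P_{kk}^2\breve{y}_k^2 + 4\bar{w}_k^2P_{kk}^4\bar{y}_k^2\Big) \\
&\le 5\bar{\sigma}_\eta^2\Big(\sum_k \breve{w}_k^2\breve{y}_k^2 + \bar{\mu}_W^2\bar{\mu}_Y^2\sum_{i,j,k} P_{kj}^2P_{ki}^2 + \bar{\mu}_Y^2\sum_k \breve{w}_k^2 + \bar{\mu}_W^2\sum_k \breve{y}_k^2 + 4n\bar{\mu}_W^2\bar{\mu}_Y^2\Big) \\
&\le 5\bar{\sigma}_\eta^2\Big(\sum_k \breve{w}_k^2\breve{y}_k^2 + 7n\bar{\mu}_W^2\bar{\mu}_Y^2\Big) \le C\sum_k \breve{w}_k^2\breve{y}_k^2 + Cn/n^2 \le C\sum_k \breve{w}_k^2\breve{y}_k^2 + o(1).
\end{aligned}
\]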
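Now let $\pi_n$ be such that $\Delta_n = \max_i |a_i - Z_i'\pi_n| \to 0$, let $\alpha_n = \pi_n/\sqrt{n}$, and note that $\max_{i\le n} |\bar{w}_i - Z_i'\alpha_n| = \Delta_n/\sqrt{n}$. Let $\bar{w} = (\bar{w}_1, \ldots, \bar{w}_n)'$. Then
\[
\begin{aligned}
|\bar{w}_i - \breve{w}_i| &= \big|\bar{w}_i - Z_i'(Z'Z)^{-1}Z'\bar{w}\big| = \big|\bar{w}_i - Z_i'\alpha_n - Z_i'(Z'Z)^{-1}Z'(\bar{w} - Z\alpha_n)\big| \\
&\le \Delta_n/\sqrt{n} + \Big(\sum_j P_{ij}^2\Big)^{1/2}\Big(\sum_j |\bar{w}_j - Z_j'\alpha_n|^2\Big)^{1/2} \\
&\le \Delta_n/\sqrt{n} + P_{ii}^{1/2}\sqrt{n}\max_{i\le n} |\bar{w}_i - Z_i'\alpha_n| = \Delta_n/\sqrt{n} + P_{ii}^{1/2}\Delta_n \le C\Delta_n.
\end{aligned}
\]
Then by T, $\max_{i\le n} |\breve{w}_i| \le \max_{i\le n} |\bar{w}_i| + C\Delta_n \to 0$, so that
\[
\sum_k \breve{w}_k^2\breve{y}_k^2 \le \Big(\max_{i\le n} |\breve{w}_i|\Big)^2\sum_k \breve{y}_k^2 = o(1)\sum_k \bar{y}_k^2 \to 0.
\]
Then we have $E[\hat{\psi}_6^2] \to 0$, so by M, $\hat{\psi}_6 \xrightarrow{p} 0$. The conclusion then follows by T. ∎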