Panel Data Analysis, Part II: Feasible Estimators
James J. Heckman
University of Chicago, Econ 312
This draft, May 26, 2006
1 Feasible GLS estimation
How do we do feasible GLS? To do feasible GLS estimation we follow a two-step process:

(1) Perform the OLS regression, obtain $\hat\beta$, and form the residuals $\hat u_{it}$.

(2) Estimate $\sigma_\alpha^2$ and $\sigma_u^2$ using the estimated residuals.
Step 2 of the above procedure is done as follows. Take

$$\hat u_{i\cdot} = \frac{1}{T}\sum_{t=1}^{T}\hat u_{it} = \hat\alpha_i + \frac{1}{T}\sum_{t=1}^{T}\tilde u_{it}.$$

Then we have (using the assumptions that $E[\alpha_i u_{it}] = 0$ and $E[u_{it}u_{is}] = 0$ for $t \neq s$)

$$E\left[\frac{1}{I}\sum_{i=1}^{I}(\hat u_{i\cdot})^2\right] = \sigma_\alpha^2 + \frac{\sigma_u^2}{T}. \qquad (*)$$

$\hat\alpha_i$ is unbiased for $\alpha_i$ but not consistent. Why? For fixed $T$, we cannot generate consistent estimates.
Impose the restriction $\sum_{i=1}^{I}\hat\alpha_i = 0$. Then

$$E\left[\sum_{i=1}^{I}\sum_{t=1}^{T}(\hat u_{it} - \hat u_{i\cdot})^2\right] = [I(T-1) - (K-1)]\,\sigma_u^2, \qquad (**)$$

where

$$\hat u_{i\cdot} = \frac{1}{T}\sum_{t=1}^{T}\hat u_{it}.$$
We use the residuals from the regression to estimate $\sigma_u^2$:

$$\hat\sigma_u^2 = \frac{\sum_{i=1}^{I}\sum_{t=1}^{T}(\hat u_{it} - \hat u_{i\cdot})^2}{I(T-1) - (K-1)}.$$
Then from equations $(*)$ and $(**)$ we have

$$\hat\sigma_\alpha^2 = \frac{1}{I}\sum_{i=1}^{I}(\hat u_{i\cdot})^2 - \frac{\hat\sigma_u^2}{T}$$

(but this may go negative; if so, set $\hat\sigma_\alpha^2$ to zero). Alternatively, simply use

$$\frac{1}{I}\sum_{i=1}^{I}(\hat u_{i\cdot})^2$$

to estimate $\sigma_\alpha^2 + \sigma_u^2/T$, which is always positive and consistent.
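The two-step procedure above can be sketched in a few lines of numpy. This is a minimal simulation check, not part of the original notes; the sample sizes, seed, and true values ($\beta = 2$, $\sigma_\alpha = 1.5$, $\sigma_u = 1$) are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
I, T = 500, 5                                 # persons, periods (illustrative)
beta = 2.0
x = rng.normal(size=(I, T))
alpha = rng.normal(scale=1.5, size=(I, 1))    # sigma_alpha = 1.5
u = rng.normal(scale=1.0, size=(I, T))        # sigma_u = 1.0
y = beta * x + alpha + u

# Step (1): pooled OLS (scalar regressor, no intercept), form residuals.
b_ols = (x.ravel() @ y.ravel()) / (x.ravel() @ x.ravel())
resid = y - b_ols * x

# Step (2): estimate sigma_u^2 from the within variation of the residuals,
# then sigma_alpha^2 from the person means (truncated at zero if negative).
K = 1  # one regressor
within = resid - resid.mean(axis=1, keepdims=True)
sig_u2 = (within ** 2).sum() / (I * (T - 1) - (K - 1))
sig_a2 = max((resid.mean(axis=1) ** 2).mean() - sig_u2 / T, 0.0)
```

With these draws the estimates land close to the true values $\sigma_u^2 = 1$ and $\sigma_\alpha^2 = 2.25$.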
Assuming normality for the error term, we get that the FGLS estimator is unbiased, i.e.

$$E(\hat\beta_{\text{FGLS}}) = E(\hat\beta_{\text{GLS}}).$$

Further, by a theorem of Taylor, there is only a 17% reduction in efficiency from estimating $\Sigma$ rather than knowing it.
Mundlak Problem: an example of a use of panel data to ferret out and identify a cross-sectional relationship. Mundlak posed the problem that $\alpha_i$ is correlated with $x_{it}$ (e.g., $\alpha_i$ is managerial ability; $x_{it}$ is inputs), so $E(\alpha_i x_{it}) \neq 0$, and we have a specification-error bias:

$$E\left[\hat\beta\right] = \beta + E\left[(X'X)^{-1}X'\alpha\right] \neq \beta \quad \text{since } E(X'\alpha) \neq 0.$$
OLS is inconsistent and biased. One way to eliminate the problem is to use the within estimator:

$$\left[I - \frac{\iota\iota'}{T}\right]y_i = \left[I - \frac{\iota\iota'}{T}\right]X_i\beta + \left[I - \frac{\iota\iota'}{T}\right]u_i,$$

i.e.,

$$y_{it} - y_{i\cdot} = [x_{it} - x_{i\cdot}]'\beta + [u_{it} - u_{i\cdot}].$$

On the transformed data, we get an estimator of $\beta$ that is unbiased and consistent. The estimator of the fixed effect is not consistent (we acquire an incidental-parameters problem), but by sweeping out the $\alpha_i$ as fixed parameters, we get a consistent estimator of $\beta$.
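A small simulation makes Mundlak's point concrete: when $\alpha_i$ is correlated with $x_{it}$, pooled OLS is biased upward, while the within transformation recovers $\beta$. The design below (loadings, sizes, seed) is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
I, T, beta = 1000, 4, 1.0
alpha = rng.normal(size=(I, 1))
x = 0.8 * alpha + rng.normal(size=(I, T))   # E(alpha * x) != 0: Mundlak's problem
y = beta * x + alpha + rng.normal(size=(I, T))

# Pooled OLS is biased because x is correlated with the fixed effect.
b_ols = (x.ravel() @ y.ravel()) / (x.ravel() @ x.ravel())

# The within transformation sweeps out alpha_i, restoring consistency.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_within = (xd.ravel() @ yd.ravel()) / (xd.ravel() @ xd.ravel())
```

Here `b_ols` is well above the true $\beta = 1$, while `b_within` is close to it.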
Notice, however, that if there exists a variable that stays constant over the spell for all persons, we cannot estimate the associated coefficient $\gamma$:

$$\hat\alpha_i = \alpha_i + z_i'\gamma,$$

where the $z_i$ are the variables, and $\gamma$ the associated coefficients, that stay fixed over spells (we can regress the estimated fixed effects on the $z_i$, provided that they stay constant over the spell and are not correlated with the $\alpha_i$).
In a cross-section context, without some other information, the model is not identified unless we can invoke IV estimation.

The F.E. estimator is a conditional version of the R.E. estimator. R.E. estimator: $\alpha_i$ and $u_{it}$ are both random; F.E. estimator: we condition on the values of $\alpha_i$.
How to test for the presence of bias? $H_0$: no bias in the OLS estimator. $H_1$: OLS and the between estimator are biased; the within estimator is unbiased. Compare $\hat\beta_W$ vs. $\hat\beta_B$:

$$\hat\beta_W = \beta + (W_{xx})^{-1}\sum_{i=1}^{I}\sum_{t=1}^{T}(x_{it}-x_{i\cdot})(u_{it}-u_{i\cdot}),$$

$$\hat\beta_B = \beta + (B_{xx})^{-1}\left[\sum_{i=1}^{I}\bar x_{i\cdot}(\alpha_i + \bar u_{i\cdot})\right],$$

where $W_{xx}$ and $B_{xx}$ are the within and between cross-product matrices of the $x$'s.
Under $H_0$,

$$\mathrm{COV}\left(\hat\beta_W, \hat\beta_B\right) = 0:$$

the two estimators are independently distributed under a normality assumption, so we can test $\hat\beta_W - \hat\beta_B = 0$ (just pool the standard errors).
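A sketch of the test under $H_0$ (no correlation between $\alpha_i$ and $x$), on simulated data. The pooled-variance $z$-statistic follows the "pool the standard errors" logic; the variance formulas are rough, ignoring degrees-of-freedom refinements.

```python
import numpy as np

rng = np.random.default_rng(2)
I, T, beta = 2000, 5, 1.0
alpha = rng.normal(size=(I, 1))             # under H0: alpha independent of x
x = rng.normal(size=(I, T))
y = beta * x + alpha + rng.normal(size=(I, T))

def slope_and_var(xv, yv):
    """OLS slope and a rough estimate of its sampling variance (scalar x)."""
    sxx = xv @ xv
    b = (xv @ yv) / sxx
    e = yv - b * xv
    return b, (e @ e) / (len(xv) - 1) / sxx

# Within estimator: deviations from person means.
xd = (x - x.mean(axis=1, keepdims=True)).ravel()
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
b_w, v_w = slope_and_var(xd, yd)

# Between estimator: person means.
b_b, v_b = slope_and_var(x.mean(axis=1), y.mean(axis=1))

# Under H0 the two estimators are independent, so the variances pool.
z = (b_w - b_b) / np.sqrt(v_w + v_b)
```

Under $H_0$, `z` is approximately standard normal; a large $|z|$ would reject no-bias.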
Strict Exogeneity Test. Basic idea: $E(\alpha_i \mid x_i) \neq 0$, where $x_i = (x_{i1}, \ldots, x_{iT})$. Failure of this is a failure of strict exogeneity in the time-series literature's sense. Regression function (scalar case):

$$E^*(\alpha_i \mid x_i) = \psi_0 + \psi_1 x_{i1} + \psi_2 x_{i2} + \psi_3 x_{i3} + \cdots,$$

where $E^*$ denotes the linear projection. Then, in Mundlak's problem, we get for $t = 1$:

$$y_{i1} = \beta x_{i1} + [\psi_0 + \psi_1 x_{i1} + \psi_2 x_{i2} + \cdots] + v_{i1}.$$
Then, we can test whether or not future and past values of $x$ enter the equation [if so, we get a violation of strict exogeneity in this setup]. Notice we can estimate $\psi_2$ (from the first equation), $\psi_1$ (from the second equation), and so forth, and hence can estimate $\beta$ [but we cannot separate out the intercepts in this equation, nor can we identify variables that don't vary over time]. This is just a control function in the sense of Heckman and Robb (1985, 1986).
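The projection argument can be checked by simulation: regress the period-1 outcome on all periods' $x$'s, and the future values enter with coefficients $\psi_2, \psi_3$. The data-generating numbers below ($\psi_s = 0.5$ for all $s$, $\beta = 1$) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
I, beta = 5000, 1.0
x = rng.normal(size=(I, 3))                       # T = 3 periods
alpha = 0.5 * x.sum(axis=1) + rng.normal(size=I)  # alpha loads on every period's x
y1 = beta * x[:, 0] + alpha + rng.normal(size=I)  # period-1 outcome

# Project y_i1 on (x_i1, x_i2, x_i3): under strict exogeneity the
# coefficients on the future values x_i2, x_i3 would be zero.
coef, *_ = np.linalg.lstsq(x, y1, rcond=None)
# coef[0] estimates beta + psi_1; coef[1], coef[2] estimate psi_2, psi_3.
```

Nonzero estimates on `x[:, 1]` and `x[:, 2]` signal the violation of strict exogeneity.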
Chamberlain's Strict Exogeneity Test. $y_{it} = x_{it}'\beta + u_{it}$, $i = 1, \ldots, I$, $t = 1, \ldots, T$. $x$ is strictly exogenous if $E(u_{it} \mid x_{i1}, \ldots, x_{iT}) = 0$; then the model can be fitted by OLS. We can test this: in a time-series setting, add an extraneous variable,

$$y_t = \beta x_t + \gamma x_{t+1} + u_t, \quad t = 1, \ldots, T.$$

We have strict exogeneity in the process if $\gamma = 0$ (assumption: $x$ is correlated over time, so $x_{t+1}$ is a future value of a variable that doesn't belong in the equation); we can do an exact test.
Consider a special error structure (one-factor setup): $u_{it} = \lambda_t \alpha_i + \varepsilon_{it}$, $\varepsilon_{it}$ i.i.d., with

$$E(\alpha_i \mid x_{i1}, x_{i2}, x_{i3}, \ldots, x_{iT}) = \sum_{s=1}^{T}\psi_s x_{is}.$$

Then if we relax the strict exogeneity assumption, we have

$$E(y_{it} \mid x_{i1}, \ldots, x_{iT}) = \beta x_{it} + \lambda_t \sum_{s=1}^{T}\psi_s x_{is}, \qquad E(\varepsilon_{it} \mid x_i) = 0.$$
Array the $y_{it}$ into a supervector: $y_i = X_i\beta + \mathrm{DIAG}\{\lambda_t\}\,\iota\,\alpha_i + \varepsilon_i$. In all the regressions, $\beta$ stays fixed, so we can test this assumption. When applying this test in particular economic situations, we must interpret the results with caution. For example, in the application of this test to the permanent income hypothesis, the significance of the coefficients on future values cannot be ruled out under the model.
Example: the Chamberlain test with $T = 3$ periods. Take a simple regression setting with $y_{it} = \beta_t x_{it} + \alpha_i + u_{it}$, $u_{it}$ i.i.d. Projecting $\alpha_i$ on the $x$'s, with $\eta_i$ the projection residual, we have:

$$y_{i1} = \beta_1 x_{i1} + \psi_1 x_{i1} + \psi_2 x_{i2} + \psi_3 x_{i3} + \eta_i + u_{i1}$$
$$y_{i2} = \beta_2 x_{i2} + \psi_1 x_{i1} + \psi_2 x_{i2} + \psi_3 x_{i3} + \eta_i + u_{i2}$$
$$y_{i3} = \beta_3 x_{i3} + \psi_1 x_{i1} + \psi_2 x_{i2} + \psi_3 x_{i3} + \eta_i + u_{i3}$$
For a factor structure, $u_{it} = \lambda_t \alpha_i + \varepsilon_{it}$, $\varepsilon_{it}$ i.i.d. Then:

$$y_{i1} = \beta_1 x_{i1} + \lambda_1(\psi_1 x_{i1} + \psi_2 x_{i2} + \psi_3 x_{i3}) + \lambda_1\eta_i + \varepsilon_{i1}$$
$$y_{i2} = \beta_2 x_{i2} + \lambda_2(\psi_1 x_{i1} + \psi_2 x_{i2} + \psi_3 x_{i3}) + \lambda_2\eta_i + \varepsilon_{i2}$$
$$y_{i3} = \beta_3 x_{i3} + \lambda_3(\psi_1 x_{i1} + \psi_2 x_{i2} + \psi_3 x_{i3}) + \lambda_3\eta_i + \varepsilon_{i3}$$
We can identify the reduced-form coefficient matrix

$$\Pi = \begin{bmatrix} \beta_1 + \lambda_1\psi_1 & \lambda_1\psi_2 & \lambda_1\psi_3 \\ \lambda_2\psi_1 & \beta_2 + \lambda_2\psi_2 & \lambda_2\psi_3 \\ \lambda_3\psi_1 & \lambda_3\psi_2 & \beta_3 + \lambda_3\psi_3 \end{bmatrix}.$$

Normalize: set $\lambda_1 \equiv 1$; then we can identify $\lambda_2$, $\lambda_3$, $\psi_1$, $\psi_2$, and $\psi_3$ (and hence the $\beta_t$ from the diagonal).
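Given a $\Pi$ matrix of this form, the identification argument can be verified numerically. The parameter values below are made up purely to illustrate the recovery formulas under the normalization $\lambda_1 = 1$.

```python
import numpy as np

# Reduced-form matrix Pi with Pi[t,s] = lam[t]*psi[s] off the diagonal
# and beta[t] + lam[t]*psi[t] on it (hypothetical numbers).
lam = np.array([1.0, 0.8, 1.3])    # normalization: lam_1 = 1
psi = np.array([0.5, 0.3, 0.2])
beta = np.array([1.0, 1.1, 0.9])
Pi = np.outer(lam, psi) + np.diag(beta)

# Recover the structural parameters from Pi alone:
lam2 = Pi[1, 2] / Pi[0, 2]         # (lam_2 psi_3)/(lam_1 psi_3)
lam3 = Pi[2, 1] / Pi[0, 1]         # (lam_3 psi_2)/(lam_1 psi_2)
psi1 = Pi[1, 0] / lam2             # (lam_2 psi_1)/lam_2
psi2, psi3 = Pi[0, 1], Pi[0, 2]    # lam_1 = 1, so these are psi_2, psi_3
lam_hat = np.array([1.0, lam2, lam3])
psi_hat = np.array([psi1, psi2, psi3])
beta_hat = np.diag(Pi) - lam_hat * psi_hat
```

The recovered `lam_hat`, `psi_hat`, and `beta_hat` reproduce the generating values exactly, confirming identification from the off-diagonal ratios plus the normalization.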
2 Maximum likelihood panel data estimators
Considering these models from a more general viewpoint, we can form different maximum likelihood estimators of the parameters of interest. Assume $y_{it} = x_{it}'\beta + \alpha_i + u_{it}$ and write $y_i = (y_{i1}, \ldots, y_{iT})$, $i = 1, \ldots, I$.
$u_i$ is an i.i.d. random vector with a distribution depending on $\tilde\theta = (\theta, \alpha_1, \ldots, \alpha_I) = (\theta, \alpha)$ (treat $\alpha$ as a parameter):

$$\mathcal{L} = \prod_{i=1}^{I} f\left(y_i \,\middle|\, x_i;\, \tilde\theta\right).$$

Maximize $\mathcal{L}$ with respect to $\tilde\theta$ to obtain $\hat{\tilde\theta}$.
$\hat\alpha_i \nrightarrow \alpha_i$ as $I \to \infty$ with $T$ fixed. In general, $\hat\theta \nrightarrow \theta$ as $I \to \infty$ because of this. It is not like the linear model: in general, the roots of these equations are interconnected and we have problems. We have a joint system of equations:

$$\frac{\partial \ln \mathcal{L}}{\partial \theta} = 0, \qquad \frac{\partial \ln \mathcal{L}}{\partial \alpha_i} = 0, \quad i = 1, \ldots, I.$$
This set of likelihood equations can be solved using three distinct concepts:
1. Marginal Likelihood;
2. Conditional Likelihood; and
3. Integrated Likelihood.
2.1 Marginal Likelihood (or Ancillary Likelihood)

Find (if possible) a statistic $m_i = m(y_i)$ whose distribution is independent of the $\alpha_i$, i.e. $f(m_i \mid \theta)$ does not involve $\alpha$:

$$\mathcal{L}_{\text{Marginal}} = \prod_{i=1}^{I} f(m_i \mid \theta) \;\Rightarrow\; \hat\theta.$$

Then we can form the ML estimators for $\theta$ (the parameters of interest) without worrying about the $\alpha_i$.
We say that $m$ is ancillary for $\alpha$ given $\theta$ with respect to the original model. (This is really b-ancillarity.) An example of this is the within estimator:

$$y_{it} = \beta x_{it} + \alpha_i + u_{it}, \quad u_{it} \text{ i.i.d. } N(0, \sigma_u^2).$$
Take $T = 2$: $y_{i2} - y_{i1}$ is called an ancillary statistic; its distribution is independent of $\alpha_i$:

$$y_{i2} - y_{i1} \sim N\left(\beta(x_{i2} - x_{i1}),\; 2\sigma_u^2\right).$$

Thus an example of the Marginal likelihood estimator is the first-difference estimator, which is almost identical to the "within" estimator. Here, the within estimator would also be a Marginal likelihood estimator.
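With $T = 2$ the first-difference and within estimators coincide exactly, which a short simulation confirms (the design below is illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
I, beta = 1000, 1.0
alpha = rng.normal(size=(I, 1))
x = 0.5 * alpha + rng.normal(size=(I, 2))   # T = 2 periods
y = beta * x + alpha + rng.normal(size=(I, 2))

# First-difference estimator: alpha_i drops out of y_i2 - y_i1.
dx, dy = x[:, 1] - x[:, 0], y[:, 1] - y[:, 0]
b_fd = (dx @ dy) / (dx @ dx)

# Within estimator on the same data; with T = 2 the two coincide.
xd = (x - x.mean(axis=1, keepdims=True)).ravel()
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
b_within = (xd @ yd) / (xd @ xd)
```

With $T = 2$, the demeaned values are just $\pm$ half the first differences, so the two slope formulas are algebraically identical.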
Because

$$E\left[(y_i - \iota y_{i\cdot})\, y_{i\cdot}\right] = 0,$$

we can always break up the distribution of $y_i$ into two pieces:

$$y_{it} = [y_{it} - y_{i\cdot}] + y_{i\cdot},$$

$$f(y_i \mid \theta, \alpha_i) = \underbrace{f(y_i - \iota y_{i\cdot} \mid \theta)}_{\text{independent of } \alpha_i:\ \text{Marginal Likelihood}} \;\times\; \underbrace{f(y_{i\cdot} \mid \theta, \alpha_i)}_{y_{i\cdot}\ \text{is a sufficient statistic for } \alpha_i}.$$
2.2 Maximum Likelihood Second Principle (Conditional Likelihood)

Find $s_i$, a sufficient statistic for $\alpha_i$, such that $f(y_i \mid s_i)$ is independent of $\alpha_i$. That is, find $s_i = s(y_i)$ so that

$$f(y_i \mid s_i, \theta, \alpha_i) = f(y_i \mid s_i, \theta).$$

We can then throw away the $\alpha_i$. E.g., with $T = 2$,

$$y_{i1} + y_{i2} \sim N\left(2\alpha_i + \beta(x_{i1} + x_{i2}),\; 2\sigma_u^2\right).$$
Transform the observations:

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \to \begin{pmatrix} y_2 - y_1 \\ y_2 + y_1 \end{pmatrix}, \qquad \mathrm{Cov}(y_2 - y_1,\; y_2 + y_1 \mid \alpha) = 0.$$

Then

$$f(y_1, y_2) = f(y_2 - y_1,\, y_1 + y_2) = f(y_2 - y_1 \mid \theta)\, f(y_1 + y_2 \mid \theta, \alpha),$$

but $f(y_2 - y_1 \mid y_2 + y_1, \alpha) = f(y_2 - y_1 \mid \theta)$, so the conditional likelihood function is the same as in the previous case.
2.3 Integrated L.F. or Random Effects Estimator

Pick a density for $\alpha$ (the other methods do not require this): a pdf $g(\alpha \mid \theta)$. For each person,

$$f(y_i \mid x_i, \theta) = \int f(y_i \mid x_i, \alpha, \theta)\, g(\alpha \mid \theta)\, d\alpha,$$

$$\mathcal{L} = \prod_{i=1}^{I} f(y_i \mid x_i, \theta).$$

Suppose it is normal: $\alpha \sim N(0, \sigma_\alpha^2)$.
When we integrate out $\alpha$ in the above using normality, we get a normal distribution with the equicorrelated covariance matrix

$$\Sigma = \sigma_u^2 I_T + \sigma_\alpha^2\,\iota\iota' = \begin{bmatrix} \sigma_\alpha^2 + \sigma_u^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_u^2 & \cdots & \sigma_\alpha^2 \\ \vdots & & \ddots & \vdots \\ \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 + \sigma_u^2 \end{bmatrix}.$$

The problem becomes one of estimating $(\beta$, $\sigma_u^2$, and the distribution function of $\alpha)$.
Two possible methods:

(a) Assume $g(\alpha \mid \theta)$ is a known finite-parameter distribution (a function of $\theta$) and estimate $(\beta, \sigma_u^2)$ (maybe $\theta$ too).

(b) Nonparametric estimation (e.g., Heckman-Singer): then estimate $\beta$, $\sigma_u^2$, and $G(\alpha)$.
Mundlak Point: the within estimator is the GLS estimator in all cases if $\alpha_i = \bar{x}_{i\cdot}'\gamma + \omega_i$. The more general point is that if we permit fixed effects to be functions of exogenous variables, the between and within estimators will in general differ. Lee (as cited in Judge et al.) shows how special the Mundlak point is.
Suppose $y_{it} = \beta x_{it} + \alpha_i + u_{it}$, $u_{it} \sim N(0, \sigma_u^2)$. If $\alpha_i = \gamma\bar{x}_{i\cdot}$, then Marginal = Conditional = Integrated = Within = $\hat\beta_{\text{MLE}}$ in a regression setting. Mundlak's point is this: suppose that $\alpha_i = \gamma\bar{x}_{i\cdot} + \omega_i$ (where $\omega_i$ is independent of $x$); then

$$y_{it} = \beta x_{it} + \gamma\bar{x}_{i\cdot} + \omega_i + u_{it}.$$
Now what is the random effects estimator?

$$y_{it} = \beta(x_{it} - \bar{x}_{i\cdot}) + \bar{x}_{i\cdot}(\beta + \gamma) + \omega_i + u_{it}.$$

Intuitively: you get information on $\beta$ only from the within variation. Apply GLS:

$$\tilde y_{it} = y_{it} - \lambda\bar{y}_{i\cdot}, \qquad \lambda = 1 - \sqrt{\frac{\sigma_u^2}{T\sigma_\omega^2 + \sigma_u^2}}$$

(refer to Section 3.2 of Part I).
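The GLS weight $\lambda$ can be checked directly: quasi-demeaning with this $\lambda$ whitens the equicorrelated covariance matrix of the error-components model. The values of $T$ and the variances below are arbitrary illustrative choices.

```python
import numpy as np

T, sig_u2, sig_a2 = 4, 1.0, 1.5     # sig_a2 plays the role of sigma_omega^2
iota = np.ones((T, 1))
Sigma = sig_u2 * np.eye(T) + sig_a2 * (iota @ iota.T)  # equicorrelated

# Quasi-demeaning: y_it - lam * ybar_i, i.e. P = I - (lam/T) * iota iota'.
lam = 1 - np.sqrt(sig_u2 / (sig_u2 + T * sig_a2))
P = np.eye(T) - (lam / T) * (iota @ iota.T)

# P Sigma P' should reduce to sig_u2 * I, i.e. spherical errors.
white = P @ Sigma @ P.T
```

The check works because $P$ scales the $\iota$-direction of $\Sigma$ (eigenvalue $\sigma_u^2 + T\sigma_\omega^2$) by $(1-\lambda)^2$ while leaving the orthogonal directions (eigenvalue $\sigma_u^2$) untouched.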
Thus, we get the GLS-transformed equation:

$$\tilde y_{it} = \beta\,\widetilde{(x_{it} - \bar{x}_{i\cdot})} + (\beta + \gamma)\,\tilde{\bar{x}}_{i\cdot} + \widetilde{(\omega_i + u_{it})}.$$

In general, the transform of the person mean is

$$\tilde{\bar{x}}_{i\cdot} = \bar{x}_{i\cdot}(1 - \lambda).$$