3. Error Components - Random Effects - Variance Components
3.1 The Problem

Quite often we have disaggregated data for a large number of individuals (a cross section), say N, observed over a number of periods (a time series), say T. The problem is to combine the NT observations in order to make our estimator more efficient. We will explore issues of unbiasedness, consistency and efficiency in the context of pooled cross section -- time series data. We will begin with the random effects model since, arguably, the dummy variables model is a special case (or is it the other way around?).

The classic paper in applied economics is Balestra and Nerlove's essay on the demand for natural gas. Most of the subsequent theoretical developments can be attributed to Swamy and Mehta. Other more recent names to look for are Baltagi and Avery.

Before going into details, a simple example (or choose the *.mcd file) will help you visualize what is going on.

For an R script with output and LIMDEP output, both dealing with random effects models, skip down to the bottom of this page.

3.2 Set Up

There are i=1,...,N individuals observed over t=1,...,T periods and we posit the relationship

$$y_{it} = x_{it}'\beta + u_{it} \qquad (1)$$
When we have the same number of observations for each person the experimental design is known as a balanced block design. When the blocks are unequal all of the following results hold, but only after some corrections.

Depending on the circumstance, we might also represent the model in one of two alternate forms

$$Y = X\beta + U$$

or

$$Y = e\alpha + Z\delta + U$$

where $e$ is an NTx1 vector of ones, $\alpha$ is the corresponding intercept, $Z$ holds the remaining k-1 columns of $X$, and $\delta$ holds the slope coefficients.

Note the following:
1) We presume that all individuals have the same response to changes in the independent variables. The coefficients to be estimated are equal across persons and periods. That is, they all have the same MPC in a consumption model.

2) We might think that there are random effects across individuals. That is, at all points in time, a change in an unobserved variable affects each individual differently, but the effect is fixed over all the periods. Also, there may be a random effect in each period that affects all individuals in the same way. Finally, there may be unobservables that are random through time and across individuals.

Thus, we can decompose our error term as follows

$$u_{it} = \mu_i + \lambda_t + \nu_{it}$$

$\mu_i$ represents the individual effect

$\lambda_t$ represents the time effect

$\nu_{it}$ represents the purely random or white noise effect

We make the following assumptions about the components of the error

$$E(\mu_i) = E(\lambda_t) = E(\nu_{it}) = 0$$

That is, each component of the error term has a mean of zero.

The variance of the individual effect is the same for all persons, although the realization of the disturbance may differ across persons. Further, there is no correlation between persons.

$$E(\mu_i \mu_j) = \begin{cases} \sigma_\mu^2 & i = j \\ 0 & i \neq j \end{cases}$$
The variance of the time effect is the same for all periods, but the realization differs from period to period. There is no serial correlation; if the time subscripts do not match then the expectation is zero.

$$E(\lambda_t \lambda_s) = \begin{cases} \sigma_\lambda^2 & t = s \\ 0 & t \neq s \end{cases}$$
The expectation of the white noise product is nonzero only when both the individual and time subscripts match.

$$E(\nu_{it} \nu_{js}) = \begin{cases} \sigma_\nu^2 & i = j \text{ and } t = s \\ 0 & \text{otherwise} \end{cases}$$
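To fix ideas, here is a minimal simulation sketch of this two-way error components process in R; the sample sizes, the variances, and the regression coefficients are arbitrary illustrative choices, not values from anything above. Later sketches reuse y, x, id, tim, N, and T from this block.

  ## Minimal simulation of u(i,t) = mu(i) + lambda(t) + nu(i,t)
  set.seed(42)
  N <- 50; T <- 10
  mu     <- rnorm(N, mean = 0, sd = 2)        # individual effects, variance 4
  lambda <- rnorm(T, mean = 0, sd = sqrt(2))  # time effects, variance 2
  nu     <- rnorm(N * T, mean = 0, sd = 1)    # white noise, variance 1
  id  <- rep(1:N, each = T)                   # run through t first, then increment i
  tim <- rep(1:T, times = N)
  u <- mu[id] + lambda[tim] + nu              # the composite disturbance
  x <- rnorm(N * T)                           # a single independent variable
  y <- 1 + 0.5 * x + u                        # alpha = 1, delta = 0.5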
Let us construct the covariance matrix for the population disturbance. We'll do this one piece at a time, starting with the individual effect. Stacking the T observations on person i, the individual effect contributes

$$\mu_i e_T$$

Given our assumption about the individual random effect, we can write the covariance matrix for the ith person's random effect as

$$E(\mu_i^2 e_T e_T') = \sigma_\mu^2\, e_T e_T'$$

$e_T$ is a column vector of ones of dimension Tx1. When we stack all of the individuals one over the other, the pattern of individual effects variances is

$$\sigma_\mu^2 \left(I_N \otimes e_T e_T'\right)$$
This is a great big matrix of dimension NTxNT. There are a total of N blocks on the main diagonal, each of which is TxT. Each block is filled with the common variance. Every other position is filled with a zero.

Similarly for the time effect, we first stack the observations, running first through the time subscript then incrementing the individual subscript, so that each person carries the same stacked vector of time effects $\lambda = (\lambda_1, \ldots, \lambda_T)'$.

Consider first the time effect covariance between person i and person j

$$E(\lambda \lambda') = \sigma_\lambda^2 I_T$$
This is a scalar diagonal matrix because the time subscripts match between the ith and jth persons.
For any given person we have

$$E(\lambda \lambda') = \sigma_\lambda^2 I_T$$

as well, since every person shares the same time effects.
Putting everyone back together

$$\sigma_\lambda^2 \left(e_N e_N' \otimes I_T\right)$$
This NTxNT matrix is composed of TxT scalar diagonal matrices everywhere.
Consider now the white noise term.

$$E(\nu \nu') = \sigma_\nu^2 I_{NT}$$

Anywhere that both the individual and time subscripts match we get a nonzero expectation.

The result is a big, NTxNT, scalar diagonal matrix.

Putting together the three pieces we get the patterned matrix

$$\Omega = E(UU') = \sigma_\mu^2 \left(I_N \otimes e_T e_T'\right) + \sigma_\lambda^2 \left(e_N e_N' \otimes I_T\right) + \sigma_\nu^2 I_{NT}$$
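As a quick check on this algebra, the following sketch builds Omega for a deliberately tiny panel with base R's kronecker(); the three variance values are assumed purely for illustration.

  ## Build the NT x NT matrix Omega from its three pieces (tiny panel)
  nn <- 3; tt <- 2                          # N and T for this check only
  s2.mu <- 4; s2.lam <- 2; s2.nu <- 1       # assumed variance components
  J.t <- matrix(1, tt, tt)                  # e_T e_T'
  J.n <- matrix(1, nn, nn)                  # e_N e_N'
  Omega <- s2.mu  * kronecker(diag(nn), J.t) +
           s2.lam * kronecker(J.n, diag(tt)) +
           s2.nu  * diag(nn * tt)
  Omega                                     # inspect the block pattern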
3.3 Estimation

Let us begin by redefining some variances

$$\sigma_1^2 = \sigma_\nu^2 + T\sigma_\mu^2, \qquad \sigma_2^2 = \sigma_\nu^2 + N\sigma_\lambda^2, \qquad \sigma_3^2 = \sigma_\nu^2 + T\sigma_\mu^2 + N\sigma_\lambda^2$$
The strategy that we employ is to go through the data a first time to obtain sets of residuals, which are then used to construct estimates of the parameters in the error covariance matrix. As a strategy, this is similar to what you did to correct for autocorrelation using the Cochrane-Orcutt technique.

Step 1.a

Construct a cross section regression by finding the mean of the T observations for each individual. That is,

$$\bar{y}_{i\cdot} = (\alpha + \mu_i) + \bar{z}_{i\cdot}'\delta + \bar{\nu}_{i\cdot}, \qquad \bar{y}_{i\cdot} = \frac{1}{T}\sum_{t=1}^{T} y_{it}$$
Notice several things: we have shifted the position of the disturbance $\mu_i$ so that it attaches to the intercept, and $\sum_t \lambda_t = 0$ so the time effects average out. One way to interpret this formulation would be to assert that the intercept is random. Parenthetically, P.A.V.B. Swamy extended this notion to make all of the regression coefficients random.
The estimator applied to this equation is known as the between estimator and is equivalent to applying least squares to

$$Q_1 Y = Q_1 X_s \delta + Q_1 U, \qquad Q_1 = \left(I_N \otimes \frac{1}{T} e_T e_T'\right) - \frac{1}{NT} e_{NT} e_{NT}'$$
Q1 is a matrix which puts the data in a form that provides one data point per person; we have only N observations now. Q1 is idempotent and has a trace of N-1. Each person's time mean is measured as a deviation from the grand mean. The result is that we lose the intercept. Hence, Xs refers to the set of independent variables excluding the column of ones for the intercept. An equivalent OLS formulation would be to use the original data, but include a dummy variable for each distinct person (and no overall intercept). Suppose that you have a cross section of time series on wages, the dependent variable, and schooling, the independent variable. Ability is an omitted variable that serves to shift the intercept across individuals. The question is whether the shift is random or fixed. If you specify a random effects model, ability is absorbed into the error term, so to the extent that ability drives schooling there will be a correlation between schooling and the error term. The testing procedure for discriminating between the least squares dummy variable model and the random effects model exploits this fact. (A small numerical check on Q1 appears below.)
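The two claims about Q1 (idempotency and a trace of N-1) are easy to verify numerically; a sketch for an arbitrary tiny panel:

  ## The 'between' transformation matrix Q1 for a small panel
  nn <- 3; tt <- 2
  e.t  <- matrix(1, tt, 1)                  # e_T
  e.nt <- matrix(1, nn * tt, 1)             # e_NT
  Q1 <- kronecker(diag(nn), e.t %*% t(e.t) / tt) - e.nt %*% t(e.nt) / (nn * tt)
  all.equal(Q1 %*% Q1, Q1)                  # idempotent: TRUE
  sum(diag(Q1))                             # trace = nn - 1 = 2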
Let us look at the residual sum of squares for the 'between' estimator. Writing the projection onto the between regressors as $P_1 = Q_1 X_s (X_s' Q_1 X_s)^{-1} X_s' Q_1$,

$$\mathrm{RSS}_1 = U'(Q_1 - P_1)U$$
Taking the trace and expected value we have

$$E(\mathrm{RSS}_1) = \mathrm{tr}\left[(Q_1 - P_1)\,\Omega\right] = \sigma_\mu^2\,\mathrm{tr}\left[(Q_1 - P_1)(I_N \otimes e_T e_T')\right] + \sigma_\lambda^2\,\mathrm{tr}\left[(Q_1 - P_1)(e_N e_N' \otimes I_T)\right] + \sigma_\nu^2\,\mathrm{tr}\left[Q_1 - P_1\right]$$
Let us consider the term involving $\sigma_\lambda^2$. We are particularly interested in the product of Q1 and the Kronecker product term.

$$Q_1\left(e_N e_N' \otimes I_T\right) = \left(I_N \otimes \tfrac{1}{T} e_T e_T'\right)\left(e_N e_N' \otimes I_T\right) - \tfrac{1}{NT}\left(e_N e_N' \otimes e_T e_T'\right)\left(e_N e_N' \otimes I_T\right)$$

With some manipulation we can show that this is zero (both terms reduce to $\tfrac{1}{T}(e_N e_N' \otimes e_T e_T')$), so the $\sigma_\lambda^2$ term drops out.
Now consider the $\sigma_\mu^2$ term, the relevant portion of which we reproduce here

$$\sigma_\mu^2\,\mathrm{tr}\left[(Q_1 - P_1)(I_N \otimes e_T e_T')\right]$$

Let us look specifically at the part involving the product of Q1 and the Kronecker product.

$$Q_1\left(I_N \otimes e_T e_T'\right) = \left(I_N \otimes e_T e_T'\right) - \frac{1}{N}\left(e_N e_N' \otimes e_T e_T'\right)$$

The trace of this we can see to be NT - T, or T(N-1).
We also want to take advantage of the following property

$$Q_1\left(I_N \otimes e_T e_T'\right)Q_1 = T\,Q_1\left(I_N \otimes \tfrac{1}{T} e_T e_T'\right)Q_1$$

If you multiply through by T/T you can see that we get $TQ_1Q_1$, because $Q_1(I_N \otimes \tfrac{1}{T}e_Te_T') = Q_1$. Since Q1 is idempotent this is just $TQ_1$. Therefore, when we put together the pieces involving $\sigma_\mu^2$ we can write

$$\sigma_\mu^2\,\mathrm{tr}\left[(Q_1 - P_1)(I_N \otimes e_T e_T')\right] = T\sigma_\mu^2\,\mathrm{tr}\left[Q_1 - P_1\right]$$

Therefore

$$E(\mathrm{RSS}_1) = \left(\sigma_\nu^2 + T\sigma_\mu^2\right)\mathrm{tr}\left[Q_1 - P_1\right] = (N-k)\left(\sigma_\nu^2 + T\sigma_\mu^2\right)$$

since $\mathrm{tr}(Q_1 - P_1) = (N-1) - (k-1) = N-k$.
So, using the residuals from the 'between' estimator,

$$s_1^2 = \frac{\mathrm{RSS}_1}{N-k}$$

is an unbiased estimator of $\sigma_1^2 = \sigma_\nu^2 + T\sigma_\mu^2$. The residuals here carry the subscript $\hat{u}_{i\cdot}$; the $\cdot$ indicates that we have already summed out the time effects.

Step 1.b
Construct a time series regression by finding the mean of the N observations at each point in time. That is,

$$\bar{y}_{\cdot t} = (\alpha + \lambda_t) + \bar{z}_{\cdot t}'\delta + \bar{\nu}_{\cdot t}, \qquad \bar{y}_{\cdot t} = \frac{1}{N}\sum_{i=1}^{N} y_{it}$$
Applying least squares to this equation is the 'within' estimator encountered in the analysis of variance. This estimation equation could be written as

$$Q_2 Y = Q_2 X_s \delta + Q_2 U$$
where Q2 is given by

$$Q_2 = \left(\frac{1}{N} e_N e_N' \otimes I_T\right) - \frac{1}{NT} e_{NT} e_{NT}'$$
Each observation is measured as a deviation of the mean across individuals for the tth period from the grand mean. The OLS equivalent to the 'within' estimator would be one which included a dummy variable for each time period (no intercept in that case).
Notice that we have shifted the position of the disturbance $\lambda_t$ so that it is closely associated with the intercept. Again there is the question of whether the appropriate model is fixed effects or random effects. If we incorrectly specify the random effects model, then the error term will be correlated with the RHS variables. Also, $\sum_i \mu_i = 0$. Note that the intercept can again be thought of as a random term with non-zero mean. From this regression we can save the residual sum of squares to construct

$$s_2^2 = \frac{\mathrm{RSS}_2}{T-k}$$

which we can show to be unbiased for $\sigma_2^2 = \sigma_\nu^2 + N\sigma_\lambda^2$ using the same methods as applied in step 1.a. The $\cdot$ indicates that we have summed out the individual effects.


Step 1.c

We will now use time, individual, and overall means to construct

$$\left(y_{it} - \bar{y}_{i\cdot} - \bar{y}_{\cdot t} + \bar{y}_{\cdot\cdot}\right) = \left(z_{it} - \bar{z}_{i\cdot} - \bar{z}_{\cdot t} + \bar{z}_{\cdot\cdot}\right)'\delta + \left(\nu_{it} - \bar{\nu}_{i\cdot} - \bar{\nu}_{\cdot t} + \bar{\nu}_{\cdot\cdot}\right)$$
There is no intercept in this model. It is equivalent to applying least squares to a model that has all the variables measured in their levels, but which has a set of dummies for individuals and a set of dummies for time periods. It is sometimes referred to as the least squares dummy variables model or, in ANOVA, the fully saturated model. The algebraic form taken by the saturated model is

$$y_{it} = \mu_i + \lambda_t + z_{it}'\delta + \nu_{it}$$

with the $\mu_i$ and $\lambda_t$ entering through the two sets of dummies.
Save the residuals from this model and construct

$$s_3^2 = \frac{\mathrm{RSS}_3}{(N-1)(T-1) - (k-1)}$$

which is also unbiased, in this case for $\sigma_\nu^2$.
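Here is a compact numerical sketch of steps 1.a through 1.c, run on the simulated panel from the first sketch (y, x, id, tim, N, and T are assumed to be in the workspace); the factors T and N undo the averaging in the group means.

  ## Step 1 sketch: variance components from the three auxiliary regressions
  k <- 2                                         # intercept plus one regressor
  ## 1.a between individuals: one observation per person
  b1 <- lm(tapply(y, id, mean) ~ tapply(x, id, mean))
  s1 <- T * sum(residuals(b1)^2) / (N - k)       # estimates s_nu^2 + T*s_mu^2
  ## 1.b between periods: one observation per period
  b2 <- lm(tapply(y, tim, mean) ~ tapply(x, tim, mean))
  s2 <- N * sum(residuals(b2)^2) / (T - k)       # estimates s_nu^2 + N*s_lam^2
  ## 1.c fully within: sweep out both sets of means
  ybar.i <- ave(y, id);  ybar.t <- ave(y, tim)
  xbar.i <- ave(x, id);  xbar.t <- ave(x, tim)
  b3 <- lm(I(y - ybar.i - ybar.t + mean(y)) ~ I(x - xbar.i - xbar.t + mean(x)) - 1)
  s3 <- sum(residuals(b3)^2) / ((N - 1) * (T - 1) - (k - 1))  # estimates s_nu^2
  c(s1 = s1, s2 = s2, s3 = s3)    # roughly 41, 101, and 1 for the simulated data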

Step 2.a

Form the coefficients

$$\theta_1 = 1 - \frac{\sigma_\nu}{\sigma_1}, \qquad \theta_2 = 1 - \frac{\sigma_\nu}{\sigma_2}, \qquad \theta_3 = \theta_1 + \theta_2 + \frac{\sigma_\nu}{\sigma_3} - 1$$

where $\sigma_3^2 = \sigma_1^2 + \sigma_2^2 - \sigma_\nu^2$ and, in practice, the unknown variances are replaced by $s_1^2$, $s_2^2$, and $s_3^2$.

Step 2.b

Construct the transformed variables

$$y_{it}^* = y_{it} - \theta_1 \bar{y}_{i\cdot} - \theta_2 \bar{y}_{\cdot t} + \theta_3 \bar{y}_{\cdot\cdot}$$

and similarly for each of the independent variables.
Step 2.c

Now you are ready to estimate the parameters of

$$y_{it}^* = x_{it}^{*\prime}\beta + u_{it}^*$$

by ordinary least squares; this is the feasible GLS (random effects) estimator.
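Continuing the running sketch (s1, s2, s3, and the group means from the step 1 block are assumed), step 2 might look as follows; the theta formulas are the scalar transformation implied by the variance definitions above.

  ## Step 2 sketch: scalar transformation, then OLS = feasible GLS
  sig.nu <- sqrt(s3)
  th1 <- 1 - sig.nu / sqrt(s1)
  th2 <- 1 - sig.nu / sqrt(s2)
  th3 <- th1 + th2 + sig.nu / sqrt(s1 + s2 - s3) - 1   # s1 + s2 - s3 = sigma_3^2
  ystar <- y - th1 * ybar.i - th2 * ybar.t + th3 * mean(y)
  xstar <- x - th1 * xbar.i - th2 * xbar.t + th3 * mean(x)
  re.fit <- lm(ystar ~ xstar)          # slope should be near the true 0.5
  summary(re.fit)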
3.4 Properties of the Estimator

We will consider two step estimators in general, of which

$$\hat{\beta} = \left(X'\hat{\Omega}^{-1}X\right)^{-1}X'\hat{\Omega}^{-1}Y$$

is a particular example.

THEOREM

Consider the model

$$Y = X\beta + U$$
We can regard this as a set of N equations, each having T observations. Assume that the disturbances in the different equations $U_1(t), \ldots, U_N(t)$ follow an N-dimensional continuous probability law, symmetric about zero. That is, $f(U_1(t), \ldots, U_N(t))$ is an even function.
Then

$$\hat{\beta} = \left(X'\hat{\Omega}^{-1}X\right)^{-1}X'\hat{\Omega}^{-1}Y$$

where $\hat{\Omega}^{-1}$ is an unbiased estimator of $\Omega^{-1}$, is itself unbiased. Also assume that U has a fourth moment and that $E(1 - r - w)^{-1}$ exists, where r and w are defined below in the proof.

Proof:

In part a. of the proof we show that the expectation of the estimator exists and in part b. we show that the estimator is unbiased.

a. Let h denote any vector of real numbers from NT dimensional space and consider the expectation of



Recall $Y = X\beta + U$, so we will make this substitution also


Recall the Cauchy-Schwarz inequality

$$\left[E(VW)\right]^2 \leq E(V^2)\,E(W^2)$$
For our problem we will adopt the following definitions


and


Substituting into


Recall from our unit on linear algebra that if A-B is positive semi-definite then $Z'(A-B)Z \geq 0$. For our problem we'll let


So upon taking the difference


Factoring the square root of $\hat{\Omega}^{-1}$ out of this expression gives us


The part in square brackets is idempotent so must be positive semi-definite. The square root of the inverse of the error covariance estimator, $\hat{\Omega}^{-1/2}$, is also positive semi-definite. Therefore A - B is positive semi-definite and we can conclude that


and that


Therefore


Now introduce the following definitions


so that for the model as stated at the start of the theorem we can write


The largest and smallest characteristic roots of the estimated error covariance matrix are


From two theorems of linear algebra


Substituting these results into


Finally


Since all three terms on the right are finite we can conclude that the expectation exists.

b. $\hat{\beta}$ is a continuous random variable since U is continuous. Therefore the probability that $\hat{\Omega}$ is singular is zero. We wish to demonstrate that $E(\hat{\beta} - \beta) = 0$.
We can write

$$\hat{\beta} - \beta = H(U)\,U, \qquad H(U) = \left(X'\hat{\Omega}^{-1}X\right)^{-1}X'\hat{\Omega}^{-1}$$

Note that $\hat{\Omega}$ is an even function of U, and therefore H(U) is also even. Now

$$H(U)\,U$$

is an odd function of U, and the density f(U) is symmetric about zero. So

$$E(\hat{\beta} - \beta) = \int H(U)\,U\,f(U)\,dU = 0$$

H(U) and f(U) are even, U is odd, so the integrand is odd. Therefore, $E(\hat{\beta} - \beta) = 0$.

3.5 Testing the Specification
A. Random Effects vs. OLS

The null hypothesis is that there are no random effects: $H_0: \sigma_\mu^2 = \sigma_\lambda^2 = 0$.

Our test statistic will be

$$LM = \frac{NT}{2(T-1)}\left[\frac{\sum_i \left(\sum_t \hat{u}_{it}\right)^2}{\sum_i \sum_t \hat{u}_{it}^2} - 1\right]^2 + \frac{NT}{2(N-1)}\left[\frac{\sum_t \left(\sum_i \hat{u}_{it}\right)^2}{\sum_i \sum_t \hat{u}_{it}^2} - 1\right]^2$$

where $\hat{u}$ denotes the set of residuals saved from applying OLS to the whole sample. In this case our test statistic is distributed as $\chi^2(2)$.
We note the following

$$\sum_{i=1}^{N}\left(\sum_{t=1}^{T} \hat{u}_{it}\right)^2 \qquad \text{and} \qquad \sum_{t=1}^{T}\left(\sum_{i=1}^{N} \hat{u}_{it}\right)^2$$

The first of these sums the OLS residuals over time for each individual, squares the N results, then adds them up. The sum can be thought of as an estimate of the numerator of $\sigma_1^2 = \sigma_\nu^2 + T\sigma_\mu^2$. The second sums the OLS residuals over individuals for each period, squares the T results, then adds them up. That sum can be thought of as an estimate of the numerator of $\sigma_2^2 = \sigma_\nu^2 + N\sigma_\lambda^2$. Under the null hypothesis $\sigma_\mu^2$ and $\sigma_\lambda^2$ are both zero, so the terms in square brackets in the test statistic are centered on zero.
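In practice the statistic is rarely computed by hand; a sketch using the plm package (assumed installed) on the simulated panel from the earlier sketches:

  ## Breusch-Pagan LM test for two-way random effects via plm
  library(plm)
  pdat <- pdata.frame(data.frame(y, x, id, tim), index = c("id", "tim"))
  pool <- plm(y ~ x, data = pdat, model = "pooling")
  plmtest(pool, effect = "twoways", type = "bp")   # chi-squared, 2 df under the null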

B. Fixed Effects vs. OLS

Here the null hypothesis is that the individual and time intercept shifts are jointly zero:

$$H_0: \mu_1 = \cdots = \mu_N = \lambda_1 = \cdots = \lambda_T = 0$$

The test statistic is

$$F = \frac{\left(\mathrm{RSS}_{OLS} - \mathrm{RSS}_{LSDV}\right)/(N+T-2)}{\mathrm{RSS}_{LSDV}/\left[(N-1)(T-1) - (k-1)\right]}$$

which is distributed as F with N+T-2 and (N-1)(T-1)-(k-1) degrees of freedom under the null.
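On the simulated data this F test amounts to fitting both models and comparing them with anova():

  ## F test of the joint significance of the individual and time dummies
  ols  <- lm(y ~ x)                             # pooled OLS
  lsdv <- lm(y ~ x + factor(id) + factor(tim))  # least squares dummy variables
  anova(ols, lsdv)                              # F test that all dummies are zero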
C. Random Effects vs Least Squares Dummy Variable Model

1. The REM assumes that, for example, individual effects are uncorrelated with the other regressors. In the example provided earlier, wages were regressed on schooling and we acknowledged that ability was an unobservable that could serve to shift the intercept. Now, if we had data on the entire population then LSDV would surely be the appropriate model. But since we are drawing only a sample, REM might be appropriate. That is, the intercept varies in a random fashion across individuals due to sampling. The problem is that the random effect attributable to ability might be correlated with schooling.

2. If the random effects are correlated with other regressors then the random effects estimator is inconsistent due to omitted variables. Recall we raised this possibility of specification error earlier in the discussion.

Ho: No correlation. LSDV and REM are both consistent, LSDV is not efficient. Therefore the REM is the better estimator.

H1: Correlation. REM is not consistent, so use LSDV.



Under Ho LSDV and REM will not differ systematically, so we look at

$$\hat{q} = \hat{\beta}_{LSDV} - \hat{\beta}_{REM}$$

We know

$$\mathrm{Var}(\hat{q}) = \mathrm{Var}(\hat{\beta}_{LSDV}) + \mathrm{Var}(\hat{\beta}_{REM}) - 2\,\mathrm{Cov}(\hat{\beta}_{LSDV}, \hat{\beta}_{REM})$$
A result due to Hausman is the following: the covariance between an efficient estimator and its difference from an inefficient estimator is zero,

$$\mathrm{Cov}(\hat{\beta}_{REM}, \hat{q}) = 0$$

Using this

$$\mathrm{Var}(\hat{q}) = \mathrm{Var}(\hat{\beta}_{LSDV}) - \mathrm{Var}(\hat{\beta}_{REM}) \equiv \Sigma$$
Note that the variances in $\Sigma$ exclude any terms corresponding to dummy variables and intercepts; it is based solely on the slope coefficients.
A Wald statistic is then

$$W = \hat{q}'\,\hat{\Sigma}^{-1}\hat{q}$$

which is asymptotically $\chi^2$ with degrees of freedom equal to the number of slope coefficients under Ho.
An example is provided in another section of the lecture notes.
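A sketch of the same comparison using plm's phtest on the simulated panel (pdat from the LM test sketch is assumed):

  ## Hausman test: two-way fixed effects vs. two-way random effects
  fe <- plm(y ~ x, data = pdat, model = "within", effect = "twoways")
  re <- plm(y ~ x, data = pdat, model = "random", effect = "twoways")
  phtest(fe, re)   # a small p-value favors the fixed effects (LSDV) model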

_______________________________________________________________________

Random Effects, R and LIMDEP

_______________________________________________________________________

The R-script is

## Example of a random effects model from Venables and Ripley, page 205
  library("nlme")
  library("regress")
  data(Oats)                         # split-plot oats experiment
  names(Oats) <- c("B","V","N","Y")  # Block, Variety, Nitrogen, Yield
  Oats$N <- as.factor(Oats$N)

  ## Using regress: random Block and Block-by-Variety effects
  oats.reg <- regress(Y~N+V,~B+I(B:V),identity=TRUE,print.level=1,data=Oats)
  summary(oats.reg)

  ## Using lme: random intercepts for Block and for Variety within Block
  oats.lme <- lme(Y~N+V,random=~1|B/V,data=Oats,method="REML")
  summary(oats.lme)

The corresponding output for the REGRESS command is

Maximised Residual Log Likelihood is -214.975

Linear Coefficients:
                  Estimate     Std. Error
(Intercept)     79.917        8.220
N0.2              19.500        4.250
N0.4              34.833        4.250
N0.6              44.000        4.250
VMarvellous   5.292        7.079
VVictory        -6.875        7.079

Variance Coefficients:
             Estimate  Std. Error
B           214.477  168.834
I(B:V)   109.693    67.711
I            162.559    32.191
 

The corresponding output for the LME command is

 

Linear mixed-effects model fit by REML
Data: Oats
AIC            BIC           logLik
586.0688   605.7756   -284.0344

Random effects:
Formula: ~1 | B
             (Intercept)
StdDev: 14.64549        # 14.64549^2 = 214.477

Formula: ~1 | V %in% B
                (Intercept)   Residual
StdDev:   10.47060    12.75034    # 10.4706^2 = 109.693 and 12.75034^2 = 162.559

Fixed effects: Y ~ N + V
                          Value     Std.Error  DF     t-value    p-value
(Intercept)     79.91667   8.219989   51    9.722235  0.0000
N0.2              19.50000   4.250113   51    4.588114  0.0000
N0.4              34.83333   4.250113   51    8.195861   0.0000
N0.6              44.00000   4.250113   51  10.352667   0.0000
VMarvellous   5.29167   7.077578   10    0.747666   0.4719
VVictory        -6.87500   7.077578   10   -0.971378   0.3543
 

Correlation:
                          (Intr)   N0.2    N0.4    N0.6   VMrvll
N0.2                -0.259
N0.4                -0.259   0.500
N0.6                -0.259   0.500   0.500
VMarvellous   -0.431   0.000   0.000   0.000
VVictory         -0.431   0.000   0.000    0.000 0.500

Standardized Within-Group Residuals:
     Min               Q1                    Med          Q3               Max
-1.84137227 -0.66274193 -0.06682795 0.63830229 1.66054158

Number of Observations: 72
Number of Groups:
B     V %in% B
6            18
 

Output from LIMDEP

Fixed Effects

+----------------------------------------------------+
| Ordinary    least squares regression               |
| Model was estimated Jan 19, 2006 at 10:04:23AM     |
| LHS=YIELD    Mean                 =   103.9722     |
|              Standard deviation   =   27.05913     |
| WTS=none     Number of observs.   =         72     |
| Model size   Parameters           =          6     |
|              Degrees of freedom   =         66     |
| Residuals    Sum of squares       =   30179.08     |
|              Standard error of e  =   21.38361     |
| Fit          R-squared            =   .4194761     |
|              Adjusted R-squared   =   .3754970     |
| Model test   F[  5,    66] (prob) =   9.54 (.0000) |
| Diagnostic   Log likelihood       =  -319.5402     |
|              Restricted(b=0)      =  -339.1178     |
|              Chi-sq [  5]  (prob) =  39.16 (.0000) |
| Info criter. LogAmemiya Prd. Crt. =   6.205292     |
|              Akaike Info. Criter. =   6.204905     |
| Autocorrel   Durbin-Watson Stat.  =   .8174727     |
|              Rho = cor[e,e(-1)]   =   .5912637     |
+----------------------------------------------------+

+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant      79.9166667     6.17291691    12.946   .0000
 N2            19.5000000     7.12787048     2.736   .0080     .25000000
 N3            34.8333333     7.12787048     4.887   .0000     .25000000
 N4            44.0000000     7.12787048     6.173   .0000     .25000000
 V1           -6.87500000     6.17291691    -1.114   .2694     .33333333
 V3            5.29166667     6.17291691      .857   .3944     .33333333

  

+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i)     |
| Estimates:  Var[e]              =   .255399D+03  |
|             Var[u]              =   .214681D+03  |
|             Corr[v(i,t),v(i,s)] =   .456689      |
| Lagrange Multiplier Test vs. Model (3) =   80.50 |
| ( 1 df, prob value =  .000000)                   |
| (High values of LM favor FEM/REM over CR model.) |
| Baltagi-Li form of LM Statistic =          80.50 |
+--------------------------------------------------+

+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 N2            19.5000000     5.32707415     3.661   .0003     .25000000
 N3            34.8333333     5.32707415     6.539   .0000     .25000000
 N4            44.0000000     5.32707415     8.260   .0000     .25000000
 Constant      79.3888889     7.06887251    11.231   .0000

Notice that Greene's Var[u] is quite close to the variance due to the Block effect in R's REGRESS output. For this data set LIMDEP's estimator was not able to find positive estimates of the variances when both random Block and Variety effects were included.