Wednesday, January 20, 2021

Criteria for model/lag selection (Applied Econometrics)

 ■ The commonly used criteria

a) R^2

b) R'^2 (adjusted R^2)

c) Akaike Information Criterion (AIC)

d) Schwarz Information Criterion (SIC)

e) Hannan-Quinn Information Criterion (HQIC)

f) ML criterion (will be discussed in upcoming blogs)

The explanations are:

a) R^2

R^2 = ESS ÷ TSS, where ESS is the explained sum of squares and TSS is the total sum of squares.

R^2 is only a measure of goodness of fit in the sample, not of the fit in the population. So it has nothing to do with the CLRM (classical linear regression model): the CLRM does not require R^2 to be high.

R^2 is a non-decreasing function of the number of regressors. When an additional regressor is included in the model, R^2 almost invariably increases and never decreases.

If we have two models, the first with one explanatory variable and the second with that variable plus an additional one, R^2 from the second model will be at least as large as R^2 from the first model.

We are then tempted to choose the second model, but this is not a good procedure, because R^2 imposes no penalty for additional regressors. To account for the number of regressors present in the model, we use the adjusted R^2, denoted R'^2.
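A minimal numpy sketch of the point above, assuming simulated data where y depends on x1 only and x2 is pure noise (the data, seed and function name `r_squared` are illustrative, not from the post). R^2 is computed as ESS ÷ TSS from an OLS fit that includes an intercept; adding the irrelevant x2 still does not lower R^2.

```python
import numpy as np

def r_squared(y, X):
    """R^2 = ESS / TSS for an OLS fit of y on X (X must include a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
    tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
    return ess / tss

# Simulated (hypothetical) data: y depends on x1 only; x2 is irrelevant noise.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

const = np.ones(n)
X_one = np.column_stack([const, x1])       # model 1: one explanatory variable
X_two = np.column_stack([const, x1, x2])   # model 2: adds the irrelevant x2

r2_one = r_squared(y, X_one)
r2_two = r_squared(y, X_two)
print(f"R^2 model 1: {r2_one:.4f}")
print(f"R^2 model 2: {r2_two:.4f}")  # not smaller, even though x2 is irrelevant
```

Because model 2 nests model 1, OLS can always reproduce model 1's fit by setting the coefficient on x2 to zero, which is why R^2 cannot fall.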

b) R'^2

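R'^2 = 1 - [Summation (Ui hat^2)/(n-k)] ÷ [Summation (Yi - Ybar)^2/(n-1)]

Thus, R'^2 = 1 - [RSS/(n-k)] ÷ [TSS/(n-1)]

                   = 1 - (1 - R^2) (n-1)/(n-k)

where n is the number of observations and k is the number of parameters estimated, including the intercept.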

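In general, R'^2 <= R^2, and R'^2 can even be negative.

Now we choose the model with the highest R'^2.

The degrees of freedom (n-k) are the penalty factor for adding regressors: each extra regressor raises k, which lowers n-k and raises (n-1)/(n-k), so R'^2 increases only if the new regressor reduces RSS enough to offset the lost degree of freedom.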
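The two forms of R'^2 above can be checked numerically. This is a small sketch with hypothetical numbers (n = 30, k = 4, RSS = 40, TSS = 100); the function names are illustrative, and k is assumed to count all estimated coefficients including the intercept.

```python
def adjusted_r2_from_r2(r2, n, k):
    """R'^2 = 1 - (1 - R^2)(n - 1)/(n - k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

def adjusted_r2_from_ss(rss, tss, n, k):
    """Equivalent form: R'^2 = 1 - [RSS/(n - k)] / [TSS/(n - 1)]."""
    return 1 - (rss / (n - k)) / (tss / (n - 1))

# Hypothetical numbers: 30 observations, 4 coefficients (incl. intercept).
n, k, rss, tss = 30, 4, 40.0, 100.0
r2 = 1 - rss / tss                      # ordinary R^2 = 0.6
adj1 = adjusted_r2_from_r2(r2, n, k)
adj2 = adjusted_r2_from_ss(rss, tss, n, k)
print(round(r2, 4), round(adj1, 4), round(adj2, 4))
```

Both forms give the same value (about 0.5538), which is below R^2 = 0.6 because of the (n-1)/(n-k) degrees-of-freedom adjustment.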

c) AIC

The AIC is defined as

AIC = [e^(2k/n)] × [Summation (Ui hat^2)/n]

AIC = [e^(2k/n)] × [RSS/n]

Taking natural logs on both sides

ln AIC = ln [e^(2k/n)] + ln [RSS/n]

ln AIC = 2k/n + ln (RSS/n)

This 2k/n is the penalty factor.

Compared to R'^2, the AIC imposes a harsher penalty (= 2k/n) for introducing more regressors. When we compare two models on the basis of AIC, the rule is to select the model with the lowest AIC.

AIC is also used for lag length selection.
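A minimal sketch of the lowest-AIC rule, using the log form ln AIC = 2k/n + ln(RSS/n). The RSS values and model labels are hypothetical: model B adds two regressors but lowers RSS only slightly, so its extra penalty outweighs the improvement in fit.

```python
import math

def ln_aic(rss, n, k):
    """ln AIC = 2k/n + ln(RSS/n), the log of AIC = e^(2k/n) * RSS/n."""
    return 2 * k / n + math.log(rss / n)

# Hypothetical comparison on the same sample of n = 100 observations.
n = 100
model_a = ln_aic(rss=250.0, n=n, k=3)
model_b = ln_aic(rss=245.0, n=n, k=5)   # two extra regressors, small RSS gain
best = "A" if model_a < model_b else "B"  # select the lowest AIC
print(round(model_a, 4), round(model_b, 4), best)
```

Comparing ln AIC instead of AIC gives the same ranking, since the natural log is an increasing function.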

d) SIC

SIC = [n^(k/n)] × [Summation (Ui hat^2)/n]

SIC = [n^(k/n)] × [RSS/n]

Taking natural logs on both sides

ln SIC = k/n ×ln(n) + ln (RSS/n)

The penalty factor is k/n × ln (n)

SIC imposes an even harsher penalty than AIC whenever ln(n) > 2, i.e. for n ≥ 8. As with AIC, the model with the lowest SIC is selected.
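The penalty comparison can be sketched directly. The values of n and k below are hypothetical; the code only compares the two penalty terms, 2k/n for AIC and (k/n) ln(n) for SIC.

```python
import math

def ln_sic(rss, n, k):
    """ln SIC = (k/n) ln(n) + ln(RSS/n)."""
    return (k / n) * math.log(n) + math.log(rss / n)

def aic_penalty(n, k):
    return 2 * k / n

def sic_penalty(n, k):
    return (k / n) * math.log(n)

# SIC's penalty exceeds AIC's once ln(n) > 2, i.e. from n = 8 onwards.
for n in (5, 7, 8, 50, 500):
    print(n, round(aic_penalty(n, 3), 4), round(sic_penalty(n, 3), 4))

# Hypothetical model with n = 100, k = 3, RSS = 250.
print(round(ln_sic(250.0, 100, 3), 4))
```

In typical econometric samples n is far above 8, so SIC tends to favour smaller models (or shorter lags) than AIC.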

e) HQIC

ln HQIC = ln (σ̂^2) + (2k/n) × ln [ln (n)], where σ̂^2 = RSS/n

ln HQIC = ln (RSS/n) + (2k/n) × ln [ln (n)]

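(2k/n) ln [ln (n)] is the penalty factor. For n ≥ 16 it is harsher than the AIC penalty (2k/n) but milder than the SIC penalty (k/n) ln (n). As with the other criteria, the model with the lowest HQIC is selected.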
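A short sketch of the HQIC formula and of how its penalty sits between the AIC and SIC penalties. The numbers (n = 100, k = 3, RSS = 250) are hypothetical; the formula assumes n > e so that ln(ln n) is positive.

```python
import math

def ln_hqic(rss, n, k):
    """ln HQIC = ln(RSS/n) + (2k/n) * ln(ln n), with sigma_hat^2 = RSS/n."""
    return math.log(rss / n) + (2 * k / n) * math.log(math.log(n))

def penalties(n, k):
    """Penalty terms of ln AIC, ln HQIC and ln SIC for the same n and k."""
    return {
        "AIC": 2 * k / n,
        "HQIC": (2 * k / n) * math.log(math.log(n)),
        "SIC": (k / n) * math.log(n),
    }

# Hypothetical values: 100 observations, 3 coefficients, RSS = 250.
print(round(ln_hqic(250.0, 100, 3), 4))
print({name: round(v, 4) for name, v in penalties(100, 3).items()})
```

For n = 100 the ordering of the penalties is AIC < HQIC < SIC.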

Moreover, there is no hard and fast rule for choosing among these criteria. In practice, the model (or lag length) with the lowest value of the chosen criterion is preferred, be it in ARDL or VAR lag selection.
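As an illustration of lag length selection, here is a minimal numpy sketch for a univariate autoregression: it fits AR(p) models for p = 1, ..., pmax by OLS and picks the lag with the lowest ln AIC, ln SIC and ln HQIC. The simulated AR(2) process and its coefficients are hypothetical, and all models are fitted on the same sample (the first pmax observations are dropped) so their criteria are comparable. This is a sketch of the idea, not how any particular software package implements ARDL or VAR lag selection.

```python
import math
import numpy as np

def lag_matrix(y, p, pmax):
    """Regressors [1, y_{t-1}, ..., y_{t-p}] for t = pmax, ..., T-1 (common sample)."""
    T = len(y)
    cols = [np.ones(T - pmax)]
    for lag in range(1, p + 1):
        cols.append(y[pmax - lag : T - lag])
    return np.column_stack(cols), y[pmax:]

def select_lag(y, pmax):
    scores = {"AIC": {}, "SIC": {}, "HQIC": {}}
    for p in range(1, pmax + 1):
        X, target = lag_matrix(y, p, pmax)
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        rss = float(np.sum((target - X @ beta) ** 2))
        n, k = len(target), p + 1          # k counts the intercept too
        base = math.log(rss / n)
        scores["AIC"][p] = base + 2 * k / n
        scores["SIC"][p] = base + (k / n) * math.log(n)
        scores["HQIC"][p] = base + (2 * k / n) * math.log(math.log(n))
    # For each criterion, choose the lag with the lowest value.
    chosen = {name: min(vals, key=vals.get) for name, vals in scores.items()}
    return chosen, scores

# Simulated AR(2) series with hypothetical coefficients 0.5 and 0.3.
rng = np.random.default_rng(1)
T = 400
y = np.zeros(T)
e = rng.normal(size=T)
for t in range(2, T):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + e[t]

chosen, scores = select_lag(y, pmax=6)
print(chosen)
```

Because the SIC penalty grows fastest with k and the AIC penalty slowest (for this n), the lag chosen by SIC is never longer than the one chosen by HQIC, which in turn is never longer than the one chosen by AIC.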

Thank you

Aditya Pokhrel
MBA, MA Economics, MPA