Wednesday, May 4, 2022

Bootstrap in Econometrics. Basic Intro. Without figures and equations.

Bootstrap in Econometrics.

Many of us are acquainted with the concepts of parametric, semiparametric, and nonparametric regression. We have applied them in the several pieces of research that we carry out and interpreted the results. The very basic task of obtaining analytical distributional approximations for a parametric regression seems fairly easy. But what if those analytical distributional approximations are difficult to obtain? Or, say, impossible to obtain? Do not panic. This blog is for you.

Do you remember when you first read about Monte Carlo inference, or, say, Monte Carlo simulation? I can. I read about the technique while going through a collection of central bank journal articles during a break in my office at Nepal Rastra Bank a year ago. The interesting thing is that back then I was oblivious to the fact that concepts like the bootstrap technique, kernel density estimation, and regression under restrictions such as monotonicity and concavity even existed. Now I fairly understand that my approach, though not everyone's, was naïve. ;-)

Who laid down this concept then? And why do we really bother? It was Efron (1979) whose work is cited most often, with related ideas appearing earlier in Barnard (1963) and Hartigan (1971). So the roots of the concept go back to before Armstrong stepped onto the moon. Not so new, then.

To define it, the bootstrap is basically a simulation-based technique that provides estimates of variability, confidence intervals, and critical values for tests. The pivotal idea is to create replications by treating the existing data set (size n) as a population from which samples (also of size n) are drawn. It is a method for estimating the distribution of an estimator or a test statistic by resampling one's data or a model estimated from the data. It amounts to treating the data as if they were the population for the purpose of evaluating the distribution of interest. Under mild conditions, the bootstrap yields an approximation to the distribution of an estimator that is at least as accurate as, and often more accurate than, the approximation obtained from first-order asymptotic theory. Put differently, the bootstrap provides a way to substitute computation for mathematical analysis when calculating the asymptotic distribution of an estimator or a statistic is difficult, and it often provides a practical way to improve upon first-order approximations.
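For readers who prefer code to words, here is a minimal sketch of that idea in Python. Everything in it is an assumption made for illustration: the data are simulated, and the names bootstrap_distribution, sample, boot_se and analytic_se are mine, not from any of the references. It resamples the data with replacement, recomputes the statistic each time, and compares the bootstrap standard error of the mean with the textbook formula s / sqrt(n).

```python
import math
import random
import statistics


def bootstrap_distribution(data, statistic, n_boot=2000, seed=1):
    """Recompute `statistic` on n_boot resamples of `data`.

    Each resample has the same size as the data and is drawn with
    replacement, so the data play the role of the population.
    """
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]


# Hypothetical data: 200 draws from a standard normal distribution.
data_rng = random.Random(42)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(200)]

# Bootstrap standard error of the mean: the spread of the resampled means.
boot_se = statistics.stdev(bootstrap_distribution(sample, statistics.fmean))

# Analytic standard error of the mean, for comparison.
analytic_se = statistics.stdev(sample) / math.sqrt(len(sample))

# The same recipe works for the median, where the formula is less handy.
median_se = statistics.stdev(bootstrap_distribution(sample, statistics.median))

print(f"bootstrap SE of the mean:   {boot_se:.4f}")
print(f"analytic SE of the mean:    {analytic_se:.4f}")
print(f"bootstrap SE of the median: {median_se:.4f}")
```

For the mean, the two standard errors should come out close to each other. The point of the last line is that swapping in another statistic costs nothing extra, which is the "computation instead of mathematical analysis" part.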

So what does Aditya mean, in layman's terms? Let's borrow a beautiful example laid down by Strummer (2022) [slight modifications].

Say we introduced a macro-econometric model. For a treatment group, we took 8 cities. The model was replicated in these 8 cities, each of which had specific problems with its local economy. In 5 of the cities, the model seemed to work quite well and helped revive the economy. But in the other 3 cities, the model even worsened the economic situation. Suppose the mean response to the model across the 8 cities comes out to 0.5. That is not a large improvement. Still, since most of the cities did better, one could argue that the model helped, even though for the others it did not help at all. Maybe using the model is better than not using any model at all.

Maybe the 5 cities did well because other variables pushed them that way, and maybe the 3 cities were pulled down by other factors that were not so kind to them. It could also be that the mean came out as 0.5 instead of 0 because of several things that were beyond the control of the city administrations. So now, how can we decide whether the model is effective or not? We can. We can replicate the experiment many times. If the experiment is repeated, we can keep track of each of the mean values and end up with a histogram. Mean values near zero are what we would expect to see often if the model does not improve anything, while mean values far from zero would be quite rare in that case. BUT, instead of replicating the experiment many times, we can use the bootstrap technique.
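To picture what "replicating the experiment many times" would look like, here is a small Monte Carlo sketch. It is purely hypothetical: it assumes a world in which the model does nothing (true effect 0) and each city's response is normal noise with a standard deviation of 1.3, a number I picked only for illustration. The function names are mine as well.

```python
import math
import random
import statistics


def repeated_experiment_means(n_cities=8, n_repeats=5000,
                              true_effect=0.0, noise_sd=1.3, seed=7):
    """Mean response of each simulated 8-city experiment.

    Assumption: city responses are independent normal draws centred on
    true_effect; 0.0 means the model does nothing.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_repeats):
        cities = [rng.gauss(true_effect, noise_sd) for _ in range(n_cities)]
        means.append(statistics.fmean(cities))
    return means


def text_histogram(values, width=0.5):
    """Print a rough histogram of `values` using bins of the given width."""
    counts = {}
    for v in values:
        left = math.floor(v / width) * width
        counts[left] = counts.get(left, 0) + 1
    for left in sorted(counts):
        bar = "#" * (counts[left] * 60 // len(values))
        print(f"{left:5.1f} to {left + width:5.1f} | {bar}")


means = repeated_experiment_means()
text_histogram(means)

# How often is the mean at least 0.5 away from zero when the model does nothing?
share_far = sum(abs(m) >= 0.5 for m in means) / len(means)
print(f"share of experiments with |mean| >= 0.5: {share_far:.2f}")
```

In real life we cannot rerun the 8-city experiment thousands of times, and we do not know the noise level either. That is exactly the gap the bootstrap tries to fill by reusing the one sample we do have.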

For this, from the 8 measured cities, we draw a sample of 8 cities with replacement (so the same city can be picked more than once). Now we have a bootstrapped dataset, and we calculate its mean. We get a different mean. Then we repeat this process until we obtain a histogram of the means from the bootstrapped datasets. We can calculate other statistics as well, such as the median, the standard deviation, etc. We bootstrap, say, a thousand times and use that collection of means to approximate the full distribution. The result might change slightly if we redo the bootstrapping. Then we calculate the standard deviation of the distribution shown in the histogram. We can take the interval that covers 95 percent of the bootstrapped means as a 95 percent confidence interval. In our case, if we find that the 95 percent confidence interval covers 0, the hypothesis that the macro-econometric model is not doing any good can't be rejected. We can calculate confidence intervals in other ways as well (to be discussed some other day).
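The steps above can be sketched in a few lines of Python. The 8 response values below are made up by me so that 5 are positive, 3 are negative, and the mean is 0.5; they are not from Strummer or any real data, and the function name bootstrap_means is also just an illustration. The interval is the simple percentile interval described in the text.

```python
import random
import statistics

# Hypothetical responses for the 8 cities: 5 improved, 3 worsened, mean 0.5.
responses = [1.8, 1.2, 1.5, 0.9, 2.0, -1.1, -0.8, -1.5]


def bootstrap_means(data, n_boot=1000, seed=0):
    """Means of n_boot resamples of the data, each drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]


means = bootstrap_means(responses)

# Standard deviation of the bootstrapped means (the bootstrap standard error).
boot_se = statistics.stdev(means)

# Percentile interval: with n=40 the first and last cut points are the
# 2.5th and 97.5th percentiles of the bootstrapped means.
cuts = statistics.quantiles(means, n=40)
low, high = cuts[0], cuts[-1]

print(f"sample mean:             {statistics.fmean(responses):.2f}")
print(f"bootstrap SE:            {boot_se:.2f}")
print(f"95% percentile interval: ({low:.2f}, {high:.2f})")
print("interval covers 0:", low <= 0.0 <= high)
```

Changing the seed or the number of resamples will move the numbers a little, which is the "result might change" remark in action. With only 8 cities the interval is wide, so do not be surprised if it covers 0.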

Though the bootstrap is often quite accurate, it can be inaccurate and misleading if it is used incorrectly. Examples include inference about a parameter that is on the boundary of the parameter set, inference about the maximum or minimum of random variables, and inference in the presence of weak instruments.
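The maximum case is easy to see on a computer. The sketch below uses a hypothetical Uniform(0, 1) sample of size 50 (my choice, purely for illustration) and counts how often a bootstrap resample has exactly the same maximum as the original sample. Each resample misses the largest observation with probability (1 - 1/n)^n, so a share of about 1 - (1 - 1/n)^n should hit it.

```python
import random


def share_hitting_sample_max(n=50, n_boot=5000, seed=3):
    """Share of bootstrap resamples whose maximum equals the sample maximum."""
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]  # hypothetical Uniform(0, 1) data
    sample_max = max(data)
    hits = sum(max(rng.choices(data, k=n)) == sample_max
               for _ in range(n_boot))
    return hits / n_boot


n = 50
share = share_hitting_sample_max(n=n)
theoretical = 1 - (1 - 1 / n) ** n  # chance a resample contains the largest point

print(f"simulated share:   {share:.3f}")
print(f"theoretical share: {theoretical:.3f}")
```

So well over half of the bootstrap distribution of the maximum piles up on one single value, whereas the true sampling distribution of the maximum of a continuous variable has no such spike. That mismatch is why the naive bootstrap misleads here.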

Oh my... long way. No equations and figures. This is just a basic idea of a very interesting topic in semiparametric regression. We will catch up in the next post.


Thank You

Aditya

P.S. Laud with your comments.

My references are some of the genius chaps - Strummer (2018), Horowitz (2022), and Yatchew (2003).




