A linear model of two variables x_2 and x_3 (Image by Author):

$$y = \beta_1 \mathbf{1} + \beta_2 x_2 + \beta_3 x_3 + \epsilon$$

In the above equation, y, 1, x_2, x_3, and ϵ are column vectors of size [n x 1], where n is the number of observations. In subsequent equations, we’ll drop the 1 (which is a vector of 1s) for brevity.

If one or more regression variables, say x_3, is endogenous, i.e., it is correlated with the error term ϵ, the Ordinary Least Squares (OLS) estimator is not consistent. The coefficient estimates it generates are biased away from the true values, and the bias does not shrink as the sample grows, putting into question the usefulness of the experiment.

One way to rescue the situation is to devise a way to effectively “break” x_3 into two parts:

- A chunk that is uncorrelated with ϵ, which we will add back into the model in place of x_3. This is the part of x_3 that is in fact exogenous.
- A second chunk that is correlated with ϵ, which we will cut out of the model. This is the part that is endogenous.

And one way to accomplish this goal is to identify a variable z_3, “an instrument for x_3”, with the following properties:

- It is correlated with x_3. That (to some extent) satisfies the first of the above two requirements, and
- It is uncorrelated with the error term, which takes care of the second requirement.

Replacing x_3 with z_3 yields the following model:

$$y = \beta_1 + \beta_2 x_2 + \beta_3 z_3 + \epsilon$$

[Figure: Log of wage regressed on a variety of variables (Image by Author)]

In the above model, we regress the natural log of wage instead of the raw wage, as wage data is often right-skewed and logging it can reduce the skew. Education is measured in years of schooling. College and city are boolean variables indicating whether the person went to college and whether they live in a city. Unemp contains the percentage unemployment rate in the county of residence.

Our X matrix holds these regression variables as its columns. Each variable is a column vector of size [n x 1], so X has n rows and one column per regression variable.

We’ll argue that education is endogenous. Years of schooling captures only what is taught in school or college. It leaves out aspects such as how well the person has grasped the material, their knowledge of topics outside the curriculum, and so on, all of which are unobserved and therefore captured in the error term ϵ.

We’ll propose two variables, mother’s number of years of schooling (meducation) and father’s number of years of schooling (feducation), as the IVs for the person’s education.

Our chosen IVs need to pass the relevance condition. If a regression of education on the rest of the variables in X plus meducation and feducation reveals (via an F-test) that meducation and feducation are jointly significant, the two IVs pass the relevance condition.

The error term ϵ is inherently unobservable, so the exogeneity condition for the IVs cannot be directly tested. Instead, we take it on faith that parents’ number of years of schooling is unlikely to be correlated with factors such as the child’s grasp of material, i.e., the factors that are hiding in the error term and making education endogenous.

Our regression model with IVs is the model above with education replaced by meducation and feducation.

The finite sample estimator of regression coefficients for the model containing IVs (Image by Author):

$$\hat{\beta}_{IV} = (Z'X)^{-1} Z'y$$

Here, Z is the matrix of regression variables for the model containing the IVs. Because two IVs stand in for a single endogenous variable, Z has one more column than X. Hence Z’X is not a square matrix and therefore not invertible.
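To make the relevance check and the non-square Z’X problem concrete, here is a minimal sketch in Python using only NumPy. It does not use the article's wage dataset: the data is simulated, and the data-generating process, the coefficient values, the `ability` variable, and the helper names `ols` and `rss` are all assumptions made for illustration. Because Z’X cannot be inverted when there are more instruments than endogenous variables, the sketch swaps in two-stage least squares (2SLS), a standard way to handle that case, rather than the estimator formula above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process (not the article's dataset).
ability = rng.normal(size=n)             # unobserved; ends up in the error term
meducation = rng.normal(12, 2, size=n)   # instrument 1: mother's schooling
feducation = rng.normal(12, 2, size=n)   # instrument 2: father's schooling
unemp = rng.normal(6, 1.5, size=n)       # an exogenous regressor

# education depends on the instruments AND on unobserved ability,
# which is what makes it endogenous.
education = (2 + 0.4 * meducation + 0.3 * feducation
             + 1.0 * ability + rng.normal(size=n))
eps = 0.5 * ability + rng.normal(scale=0.3, size=n)
log_wage = 1.0 + 0.08 * education - 0.02 * unemp + eps  # true effect: 0.08

ones = np.ones(n)
X = np.column_stack([ones, education, unemp])               # n x 3
Z = np.column_stack([ones, meducation, feducation, unemp])  # n x 4


def ols(A, b):
    """Least-squares coefficients for regressing b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]


def rss(A, b):
    """Residual sum of squares from regressing b on the columns of A."""
    resid = b - A @ ols(A, b)
    return resid @ resid


# Plain OLS: the education coefficient absorbs the effect of ability.
beta_ols = ols(X, log_wage)

# Relevance condition: F-test for joint significance of the two IVs
# in a regression of education on the exogenous variables plus the IVs.
rss_restricted = rss(np.column_stack([ones, unemp]), education)
rss_full = rss(Z, education)
q = 2  # number of restrictions (the two IV coefficients)
f_stat = ((rss_restricted - rss_full) / q) / (rss_full / (n - Z.shape[1]))

# Z'X is 4 x 3 here, so (Z'X)^-1 does not exist.
print("shape of Z'X:", (Z.T @ X).shape)

# 2SLS. Stage 1: project every column of X onto the columns of Z.
X_hat = Z @ ols(Z, X)
# Stage 2: regress log_wage on the projected regressors.
beta_2sls = ols(X_hat, log_wage)

print("OLS education coefficient: ", beta_ols[1])
print("2SLS education coefficient:", beta_2sls[1])
print("First-stage F statistic:   ", f_stat)
```

In this simulated setup the OLS coefficient on education should land well above the true value of 0.08, while the 2SLS coefficient should land close to it. The standard errors that a full treatment would report are omitted to keep the sketch short.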