Matchmaking and Regression
When the assignment isn’t so random
As I said previously, when we don’t have random assignment we must resort to econometric tools. The most fundamental econometric tool is regression.
Regression attempts to make all else equal by controlling for certain other observable variables that describe key ways in which treatment and control groups differ, thus removing selection bias.
Under a very specific set of circumstances regression succeeds at ridding us of selection bias (more on that in the coming weeks).
Even when regression does not successfully rid us of selection bias (which we will discover is pretty common), it still provides a nice foundation for other, “fancier” econometric tools.
Motivating example: private v. public college
We motivate our use of regression with a simple (to ask) question: “Is it worth it to go to a private four-year college over a public university?”
Private colleges are generally much more expensive than local public universities: an average of $29,000 in 2012-2013 for private vs. about $9,000 for local public universities.
So does one receive that much more in earnings from a private college to make it worthwhile to pursue a more expensive education?
While this is an easily asked question, it is far from easy to answer.
Simple comparisons of earnings between private and public college graduates tend to yield large gaps.
On average, graduates of “elite” private colleges (e.g. Harvard) tend to have higher SAT scores, higher high school GPAs, and perhaps more of other skills than graduates of local state universities (e.g. U-Mass).
Thus we fully expect that simple comparisons are contaminated by selection bias. That is, perhaps private college graduates simply have higher ability to begin with.
Thinking on the margin
We would like to compare individuals who are similar enough in terms of those clouding factors of ability, who applied to both private and public institutions, but who made opposing decisions for reasons other than those measures of ability.
There is much to be said about thinking along these lines. Economists are often concerned with finding variation in people’s choice into or out of treatment that is otherwise unrelated to our outcome of interest except through treatment. This is how we conduct plausible causal inference.
In order to get reliable causal inference we want to compare apples to apples, not apples to avocados.
One way that we could think about doing this is matching two groups of people, one that went to a private college and one that went to a public university, who are identical in every way that affects both college attendance and earnings-potential.
This is a particularly challenging issue in labor economics. How do we control for every single possible measure of ability, motivation, etc. that affects one’s choice of school and future earnings?
A truly impossible task, as not all measures of “ability” are observable.
One clever solution proposed by Stacy Berg Dale and Alan Krueger is to control for certain summary factors that help explain the key differences in ability between treatment and control groups: the characteristics of the colleges that they applied to and were accepted to.
Matching by college
Group A: Average private-public differential of -$5,000
Group B: Average private-public differential of $30,000
Group C: Only went to Private
Group D: Only went to Public
Group A suggests a negative treatment effect.
Group B suggests a sizable positive treatment effect.
A simple average yields a more reasonable $12,500.
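A weighted (by sample size) average weights each group's differential by that group's share of the students in groups A and B. Writing \(n_A\) and \(n_B\) for the number of students in each group, it is
\[ \frac{n_A}{n_A + n_B} \times (-5{,}000) + \frac{n_B}{n_A + n_B} \times 30{,}000 \]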
A weighted average places more emphasis on larger sample sizes, which may yield a more precise estimate.
If we simply compare average earnings of public and private college attendees in groups A and B we get a much larger estimate of $20,000.
This reflects selection bias in that students who apply to and are admitted to private schools tend to have higher earnings potential regardless of which school they attend.
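The arithmetic behind these comparisons can be sketched in a few lines of Python. The five students below are hypothetical (an assumption, not data from these notes), chosen only so that the group differentials match the figures above: -$5,000 for group A, $30,000 for group B, $12,500 for the simple average, and $20,000 for the pooled comparison. With these assumed group sizes (three students in A, two in B) the weighted average works out to $9,000; other group sizes would give a different figure.

```python
# Hypothetical five-student data set (an assumption for illustration),
# chosen so the group differentials match the figures quoted in the notes.
students = [
    # (group, attended_private, earnings in dollars)
    ("A", True, 110_000),
    ("A", True, 100_000),
    ("A", False, 110_000),
    ("B", True, 60_000),
    ("B", False, 30_000),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def private_public_gap(rows):
    """Average earnings of private attendees minus that of public attendees."""
    private = [y for _, is_private, y in rows if is_private]
    public = [y for _, is_private, y in rows if not is_private]
    return mean(private) - mean(public)


# Matched comparisons: compute the gap separately within each group.
by_group = {g: [r for r in students if r[0] == g] for g in ("A", "B")}
gaps = {g: private_public_gap(rows) for g, rows in by_group.items()}

# Simple average treats both groups equally.
simple_avg = mean(gaps.values())

# Weighted average gives each group a weight equal to its share of students.
n_total = len(students)
weighted_avg = sum(len(by_group[g]) * gaps[g] for g in gaps) / n_total

# Pooled comparison ignores the groups entirely (and picks up selection bias).
pooled_gap = private_public_gap(students)

print("within-group gaps:", gaps)
print("simple average:   ", simple_avg)
print("weighted average: ", weighted_avg)
print("pooled comparison:", pooled_gap)
```

The pooled comparison is larger than either average of the matched comparisons because, in this made-up data, group A students earn more regardless of where they went and are also more likely to have attended a private college.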
Regression as a matchmaker
We can think of regression as an automated matchmaker. It automatically creates a weighted average of matched comparisons similar to what we have just looked at.
There are three key components to a regression equation:
The dependent variable: This is also called the outcome variable. In our case, we are interested in earnings. We will denote this as \(Y_i\), the earnings of student \(i\).
The treatment variable: In our case, a dummy variable that indicates students who attended a private college. We will denote this \(P_i\), which equals 1 if student \(i\) attended a private college and 0 otherwise.
A set of controls: In our case, variables that identify the schools students applied to and were admitted to.
These are important. Think of regression as a dating website. The more information you provide, the better the match.
The regression equation
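The regression equation is a linear equation linking the dependent variable to the treatment and control variables:
\[ Y_i = \alpha + \beta P_i + \gamma A_i + e_i \]
Here \(Y_i\) is earnings, \(P_i\) is the private college dummy, \(A_i\) is a dummy for group A students, and \(e_i\) is the residual.
\(\alpha\) is the intercept (or constant term).
\(\beta\) is the causal effect of treatment.
\(\gamma\) is the effect of being a group A student.
In econometrics we often denote the estimate of a parameter by a “hat” (e.g. \(\hat{\beta}\)).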
There is no mechanical difference between the treatment variable and the control variables. We estimate their coefficients in exactly the same manner; which one you focus on is simply a matter of what question you are trying to answer.
Fitted values and residuals
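The fitted values are the values predicted by the estimated regression coefficients and the values of the treatment variable \(P_i\) and the control variable \(A_i\):
\[ \hat{Y}_i = \hat{\alpha} + \hat{\beta} P_i + \hat{\gamma} A_i \]
The final component of the regression equation is the residual. The residual is defined as the difference between the dependent variable and the fitted value:
\[ e_i = Y_i - \hat{Y}_i = Y_i - \hat{\alpha} - \hat{\beta} P_i - \hat{\gamma} A_i \]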
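As a concrete illustration, the sketch below computes fitted values and residuals for five hypothetical students. Both the data and the coefficient values are assumptions made for this example, not figures taken from these notes. Earnings are written as `earnings` (the dependent variable), `private` is the private college dummy, and `group_a` is a dummy for group A students.

```python
# Hypothetical data (assumed for illustration): one entry per student.
earnings = [110_000, 100_000, 110_000, 60_000, 30_000]  # dependent variable
private = [1, 1, 0, 1, 0]  # treatment dummy: 1 = attended a private college
group_a = [1, 1, 1, 0, 0]  # control dummy: 1 = group A student

# Illustrative coefficient values (assumed): intercept, treatment, group A.
alpha, beta, gamma = 40_000, 10_000, 60_000

# Fitted value: what the coefficients predict for each student.
fitted = [alpha + beta * p + gamma * a for p, a in zip(private, group_a)]

# Residual: actual earnings minus the fitted value.
residuals = [y - y_hat for y, y_hat in zip(earnings, fitted)]

# Sum of squared residuals, the quantity that least squares minimizes.
ssr = sum(e ** 2 for e in residuals)

for y, y_hat, e in zip(earnings, fitted, residuals):
    print(f"earnings={y:>7}  fitted={y_hat:>7}  residual={e:>7}")
print("sum of squared residuals:", ssr)
```

Each residual measures how far a student's actual earnings sit from what the equation predicts for someone with the same treatment status and group.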
Ordinary Least Squares
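The most common means of estimating a linear regression model like ours is ordinary least squares (OLS) estimation.
In order to estimate the parameters of the equation we select \(\alpha\), \(\beta\), and \(\gamma\) to minimize the sum of squared residuals (hence the name “least squares”).
So we minimize
\[ \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - \alpha - \beta P_i - \gamma A_i \right)^2 \]
where \(Y_i\) is earnings, \(P_i\) is the private college dummy, and \(A_i\) is the group A dummy. The minimizing values are the OLS estimates \(\hat{\alpha}\), \(\hat{\beta}\), and \(\hat{\gamma}\).
We will not delve into the mathematical method of solving the above minimization problem (for those of you who are calculus-savvy, think first order conditions). However, we will briefly touch on what the estimates look like in the coming class sections.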
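For readers who want to see the minimization carried out, here is a minimal sketch in Python (the notes themselves point to an R demonstration; Python is used here only for illustration). It solves the first order conditions, often called the normal equations, for an intercept, a private college dummy, and a group A dummy. The five observations are hypothetical, and the helper names `solve_normal_equations` and `ssr` are made up for this example.

```python
from fractions import Fraction

# Hypothetical data (assumed for illustration): one entry per student.
earnings = [110_000, 100_000, 110_000, 60_000, 30_000]
private = [1, 1, 0, 1, 0]  # 1 = attended a private college
group_a = [1, 1, 1, 0, 0]  # 1 = group A student

# Each row of the design matrix is [1, private dummy, group A dummy].
X = [[1, p, a] for p, a in zip(private, group_a)]


def solve_normal_equations(X, y):
    """Solve (X'X) b = X'y exactly by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y], built with exact fractions.
    M = [
        [Fraction(sum(row[i] * row[j] for row in X)) for j in range(k)]
        + [Fraction(sum(row[i] * yi for row, yi in zip(X, y)))]
        for i in range(k)
    ]
    for col in range(k):
        # Find a row with a nonzero entry in this column and swap it up.
        pivot = next(r for r in range(col, k) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]


def ssr(alpha, beta, gamma):
    """Sum of squared residuals for candidate coefficient values."""
    return sum(
        (y - alpha - beta * p - gamma * a) ** 2
        for y, p, a in zip(earnings, private, group_a)
    )


alpha_hat, beta_hat, gamma_hat = solve_normal_equations(X, earnings)
print("intercept:", alpha_hat)
print("private:  ", beta_hat)
print("group A:  ", gamma_hat)
print("SSR at the OLS estimates:", ssr(alpha_hat, beta_hat, gamma_hat))
```

Exact fractions are used so that the small system solves without rounding error. Moving any coefficient away from the solution, for example `ssr(alpha_hat, beta_hat + 1000, gamma_hat)`, gives a larger sum of squared residuals, which is what "least squares" means. In practice one would use a statistics package rather than hand-rolled elimination.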
Estimating the parameters
We could calculate the OLS estimates by hand, but we have wonderful technology to do that for us… Insert clever segue to R demonstration here.
We end up with the following estimates
Thus OLS suggests that the average causal effect of attending a private college rather than a public one is $10,000, which is not too far from the weighted average of matched comparisons we calculated earlier.