Anonymous:
On April 2, you talked about outliers and how to deal with them. One of the techniques you suggested is quantile regression, which “models the quantile of the distribution.” I was not clear about what quantile regression models, so I went back and listened again to the part of the lecture where you discussed it (from 00:37:25 to 00:43:14). I also checked Wikipedia.org, which says “quantile regression results in estimates approximating either the median or other quantiles of the response variable.” It is still not completely clear to me, though.
However, from what I understand, the quantile regression model connects all the same quantile of distribution of dependent variable (Y) at given level of independent variable (X), just like the classical regression model connect all the mean of the distribution of Y at given level of X. You also said that when we write the estimated quantile, there is no error term “because it is quantile”, the same way as we do not have error term for expected value of X. For example, the 50th percentile, or median, is written as Q_.5(Y) = beta0 + beta1*X.
So my question is: “What is the model that generates the data in the case of quantile regression?” The corresponding linear regression model is written as Y = beta0 + beta1*X + epsilon. Should the quantile regression model, say at the 50th percentile, be Y_.5 = beta0 + beta1*X + epsilon? What restrictions (or assumptions) do we need to put on epsilon? You said that the advantage of the quantile regression model is that it does not require normality or homoskedasticity. You also said that we can add an X^2 term to the quantile regression model to allow curvature. Therefore, the quantile regression model does not require any of the assumptions of the classical linear regression model, meaning linearity, normality, and constant variance, right?
My other question is: “What data are going to be generated from the quantile regression model?” For example, if I look at the 50th percentile, does that mean the quantile regression model is going to generate the bottom 50 percent of the data? And if I need a model that generates all of the data, do I have to use the 100th percentile?
Thank you,
You said, "However, from what I understand, the quantile regression model connects all the same quantile of distribution of dependent variable (Y) at given level of independent variable (X), just like the classical regression model connect all the mean of the distribution of Y at given level of X." Right. This gives you a new perspective on the ordinary regression model, right? It helps you understand the main point, which is that ordinary regression is a model for the mean. But you could just as easily model the median, or the 25th percentile.
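To make "connecting the same summary at each level of X" concrete, here is a minimal sketch in Python. The data are made up for illustration (five levels of X, with Y drawn from a right-skewed distribution whose spread grows with X), and the names `levels`, `y_by_x`, and `summary` are hypothetical. At each level of X it computes the mean, the 25th percentile, and the median of Y; joining the means across X is what ordinary regression targets, and joining the medians or the 25th percentiles is what quantile regression targets.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the made-up data are reproducible

# Made-up data (an assumption for illustration): at each level of X,
# Y is right-skewed, with a mean of 2*x and a spread that grows with x.
levels = [1, 2, 3, 4, 5]
y_by_x = {x: [rng.expovariate(1.0 / (2.0 * x)) for _ in range(2000)]
          for x in levels}

summary = {}
for x in levels:
    ys = y_by_x[x]
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q25, q50, q75 = statistics.quantiles(ys, n=4)
    summary[x] = {"mean": statistics.mean(ys), "q25": q25, "median": q50}
    print(x,
          round(summary[x]["mean"], 2),
          round(summary[x]["q25"], 2),
          round(summary[x]["median"], 2))
```

Each printed row is one level of X. Reading down a column traces one curve: the mean curve, the 25th-percentile curve, or the median curve.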
Then you said, "there is no error term “because it is quantile”, the same way as we do not have error term for expected value of X." Right, but you mean "expected value of Y" not X.
As far as the model goes, you students are all too hung up on epsilon, just like in the previous post. You really don't need it. The model is p(Y|X=x) = f(x; theta). It is just a statement that there is a distribution of Y for each X=x. There is no need for an error term. The error term probably causes more misunderstanding than anything else.
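As a sketch of how the mean and a quantile both come out of the same conditional distribution, with no epsilon anywhere: the notation below is assumed for illustration and is not from the lecture. F(y | x) denotes the cumulative distribution function of p(Y|X=x), Y is taken to be continuous, and the coefficients for the mean and for the q-th quantile are written differently because they are generally different numbers.

```latex
\begin{align*}
Y \mid X = x \;&\sim\; p(y \mid X = x) \\
\mathrm{E}(Y \mid X = x) \;&=\; \int y \, p(y \mid X = x)\, dy
   \;=\; \beta_0 + \beta_1 x
   && \text{(ordinary regression)} \\
Q_{q}(Y \mid X = x) \;&=\; \inf\{\, y : F(y \mid x) \ge q \,\}
   \;=\; \beta_0(q) + \beta_1(q)\, x
   && \text{(quantile regression)}
\end{align*}
```

In both lines the right-hand side is a property of the distribution at X=x, so neither equation carries an error term.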
Now, when we use quantile regression, we have to assume that the quantile being modeled, taken across these distributions p(Y|X=x), is a linear function of x. That is the assumption. So there is a linearity assumption or, more generally, a "correct functional specification" assumption. After all, if you assume a quadratic, it's still an assumption. Independence of the observations is also needed.
As for how the model produces data, it is simple. At each X=x, the model produces half of the data below the .5 quantile and half above. Similarly, it produces 20% of the data below the .2 quantile and 80% above. It all follows from p(Y|X=x) = f(x; theta). Please review the concept of a "probability distribution." That seems to me to be the source of your confusion.
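A small simulation may help here. This is only a sketch under an assumed model, not the one from the lecture: Y given X=x is exponential with mean 1 + 0.5*x, which is non-normal and heteroskedastic, yet every quantile is a straight line in x, namely Q_q(Y|X=x) = -ln(1-q) * (1 + 0.5*x). The function names `conditional_quantile`, `simulate`, and `fraction_below` are hypothetical. The data are drawn directly from the conditional distribution, with no epsilon, and the code then checks what fraction of all the data falls below each quantile line.

```python
import math
import random


def conditional_quantile(q, x):
    """q-th quantile of Y given X = x under the assumed model
    Y | X=x ~ Exponential with mean 1 + 0.5*x (linear in x for every q)."""
    return -math.log(1.0 - q) * (1.0 + 0.5 * x)


def simulate(n, seed=0):
    """Draw n (x, y) pairs: x uniform on [0, 10], then y from p(Y | X=x)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        mean = 1.0 + 0.5 * x
        y = rng.expovariate(1.0 / mean)  # expovariate takes the rate, 1/mean
        data.append((x, y))
    return data


def fraction_below(data, q):
    """Share of observations whose y lies below the q-th quantile line."""
    below = sum(1 for x, y in data if y < conditional_quantile(q, x))
    return below / len(data)


data = simulate(20000)
for q in (0.2, 0.5, 0.8):
    print(q, round(fraction_below(data, q), 3))
```

The same full data set is used for every q. The .5 line does not generate the bottom half of the data; it only marks where the middle of each conditional distribution sits, so the printed fractions should land close to .2, .5, and .8.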