Lab 10: Double LASSO

## Monte Carlo simulations

```{r}
library(hdm)
?rlassoEffect   # open the help page for the double LASSO estimator
n=100           # sample size
R=300           # number of Monte Carlo replications
rho=1           # controls the correlation between X1 and X2
beta1=0         # true coefficient on the main regressor
beta2=0.35      # true coefficient on the correlated control
```

Write a function for generating data:

```{r}
data_sim<-function(n,beta1,beta2,rho){
  X=matrix(rnorm(n*3),ncol=3)
  X[,2]<-rho*X[,1]+X[,2]   # make X2 correlated with X1
  Y=beta1*X[,1]+beta2*X[,2]+rnorm(n)
  list(Y=Y,X=X)            # return the data explicitly
}
```

Generate data on the main regressor (D), the potential controls, and the dependent variable:

```{r}
set.seed(5,sample.kind = "Rejection")
data<-data_sim(n,beta1,beta2,rho)
y=data$Y              # dependent variable
Controls=data$X[,-1]  # potential controls
D=data$X[,1]          # the main regressor for which the effect is estimated
```

Run double LASSO:

```{r}
Effect<-rlassoEffect(Controls,y,D,method="double selection")
summary(Effect)
```

Objects stored in the result:

```{r}
names(Effect)
```

Selected controls and the t-statistic on D:

```{r}
Effect$selection.index
Effect$t
```

We run the simulations using the setup from lab 9.

```{r}
rho=1
set.seed(6064,sample.kind = "Rejection")
T_Beta1_post=rep(0,R) # vector to store t-stats for the main regressor
for (r in 1:R){
  data<-data_sim(n,beta1,beta2,rho)
  Effect<-rlassoEffect(data$X[,-1],data$Y,data$X[,1],method="double selection")
  T_Beta1_post[r]=Effect$t
}
```

Plot the simulated distribution of the post-double-LASSO $t$-statistic against the $N(0,1)$ density:

```{r}
low=min(T_Beta1_post)
high=max(T_Beta1_post)
step=(high-low)/20
hist(T_Beta1_post,breaks=seq(low-2*step,high+2*step,step),xlab="t-statistic",
     main="Simulated distribution of the post-double-LASSO t-statistic vs N(0,1)",
     freq=FALSE,ylim=c(0,0.5))
# add a vertical line at 0, the center of the t-statistic when the true beta1 is 0
abline(v=0,col="blue")
# add the plot of the N(0,1) pdf
x=seq(-4,4,0.01)
lines(x,dnorm(x),col="red")
```

## Illustration of double LASSO with cross country growth data

The model is $\Delta\log (GDP_{it})=\alpha \cdot GDP_{i0}+U_i$. Hypothesis: $\alpha <0$, that is, less developed countries catch up with more developed ones.
```{r}
data("GrowthData")
?GrowthData
names(GrowthData)
```

The simple regression does not support the hypothesis:

```{r}
summary(lm(Outcome~gdpsh465,data=GrowthData))
```

An alternative model controls for institutional and technological characteristics: $\Delta\log (GDP_{it})=\alpha \cdot GDP_{i0}+X_i'\beta+U_i$. There are many potential controls relative to the number of observations:

```{r}
dim(GrowthData)
```

Let's set up the estimation:

```{r}
names(GrowthData)
y=as.vector(GrowthData$Outcome)   # dependent variable
D=as.vector(GrowthData$gdpsh465)  # main regressor: initial GDP
# drop the first three columns (Outcome, intercept, gdpsh465); the rest are controls
Controls=as.matrix(GrowthData)[,-c(1,2,3)]
```

We run OLS with all controls. The estimate is negative, but its standard error is large because the number of controls is large relative to the sample size.

```{r}
Full=lm(y~D+Controls)
head(coef(summary(Full)),2)  # intercept and the coefficient on D
```

Double LASSO (post-double-selection):

```{r}
Effect<-rlassoEffect(Controls,y,D,method="double selection")
summary(Effect)
```

Selected controls and their number:

```{r}
Effect$selection.index
sum(Effect$selection.index)
```

## The partialling out approach

```{r}
Effect_PO<-rlassoEffect(Controls,y,D,method="partialling out")
summary(Effect_PO)
```

Selected controls and their number:

```{r}
Effect_PO$selection.index
sum(Effect_PO$selection.index)
```
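
To see what the partialling out approach does, the three steps can be sketched by hand: residualize `y` on the controls with LASSO, residualize `D` on the controls with LASSO, then regress one set of residuals on the other. This is a minimal sketch, not the package's exact implementation. It assumes `rlasso()` from `hdm` with its default settings and the `residuals` component of its result; the names `res_y`, `res_D`, and `PO_manual` are introduced here for illustration only.

```{r}
library(hdm)
data("GrowthData")

# Same variables as in the estimation setup above
y <- as.vector(GrowthData$Outcome)
D <- as.vector(GrowthData$gdpsh465)
Controls <- as.matrix(GrowthData)[, -c(1, 2, 3)]

# Step 1: LASSO of y on the controls; keep the residuals
res_y <- rlasso(Controls, y)$residuals
# Step 2: LASSO of D on the controls; keep the residuals
res_D <- rlasso(Controls, D)$residuals
# Step 3: OLS of the outcome residuals on the main-regressor residuals
PO_manual <- lm(res_y ~ res_D)
coef(summary(PO_manual))["res_D", ]
```

The point estimate on `res_D` can be compared with the one reported by `summary(Effect_PO)`. The standard error printed by `lm` is the classical OLS one, so it need not coincide with the standard error reported by `rlassoEffect`.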