Stata Programs

  • The package lassopack implements the lasso (Tibshirani 1996), square-root lasso (Belloni et al. 2011), elastic net (Zou & Hastie 2005), ridge regression (Hoerl & Kennard 1970), adaptive lasso (Zou 2006) and post-estimation OLS. lassopack also supports logistic lasso.

  • pdslasso offers methods to facilitate causal inference in structural models. The package selects control variables and/or instruments from a large set of candidate variables in settings where the researcher wants to estimate the causal impact of one or more (possibly endogenous) variables of interest.

  • pystacked implements stacking regression (Wolpert, 1992) via scikit-learn’s sklearn.ensemble.StackingRegressor and sklearn.ensemble.StackingClassifier. Stacking is a way of combining predictions from multiple supervised machine learners (the “base learners”) into a final prediction to improve performance.

  • ddml implements Double/Debiased Machine Learning (DDML) for Stata. Five different estimators are supported, allowing for flexible estimation of causal effects of endogenous variables in settings with unknown functional forms and/or many exogenous variables. ddml is compatible with many existing supervised machine learning programs in Stata.
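
To make the idea of stacking concrete, here is a minimal sketch in plain Python. It is not how pystacked or scikit-learn implement it: the two toy base learners, the simulated data, the helper names and the unconstrained least-squares final learner are all assumptions chosen for illustration. Each base learner produces out-of-fold predictions, and the final learner then picks the weights that best combine them.

```python
import random

def fit_mean(xs, ys):
    """Base learner 1 (illustrative): ignore x and predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Base learner 2 (illustrative): least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def out_of_fold(fit, xs, ys, k=5):
    """Predict every point with a model that was fit without that point's fold."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, k):
            preds[i] = model(xs[i])
    return preds

def stack_weights(p1, p2, ys):
    """Final learner (assumed here): least squares of y on the two
    prediction columns, no intercept, solved via the 2x2 normal equations."""
    a11 = sum(u * u for u in p1)
    a12 = sum(u * v for u, v in zip(p1, p2))
    a22 = sum(v * v for v in p2)
    b1 = sum(u * y for u, y in zip(p1, ys))
    b2 = sum(v * y for v, y in zip(p2, ys))
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

# Simulated data (hypothetical): a curved relationship neither learner fits well.
random.seed(0)
xs = [random.uniform(-2, 2) for _ in range(200)]
ys = [1.0 + 0.5 * x + x ** 2 + random.gauss(0, 0.3) for x in xs]

p_mean = out_of_fold(fit_mean, xs, ys)
p_line = out_of_fold(fit_line, xs, ys)
w_mean, w_line = stack_weights(p_mean, p_line, ys)
stacked = [w_mean * u + w_line * v for u, v in zip(p_mean, p_line)]

print("weights:", w_mean, w_line)
print("MSE mean / line / stacked:", mse(p_mean, ys), mse(p_line, ys), mse(stacked, ys))
```

Because the weights (1, 0) and (0, 1) are both feasible for the final learner, the stacked out-of-fold error in this sketch can never exceed that of either base learner on the same data. Practical implementations often constrain the weights (for example to be non-negative), which this sketch does not do.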

I maintain a separate website for these packages: statalasso.github.io/

R Packages

ddml is an implementation of double/debiased machine learning estimators as proposed by Chernozhukov et al. (2018). The key feature of ddml is the straightforward estimation of nuisance parameters using (short-)stacking (Wolpert, 1992), which allows for multiple machine learners to increase robustness to the underlying data generating process.

Please see our package vignette for more information. The package is maintained by Thomas Wiemann.
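
As a rough illustration of the cross-fitting idea behind double/debiased machine learning, the following plain-Python sketch estimates the coefficient of a partially linear model, one of the settings covered in Chernozhukov et al. (2018). It does not use or reproduce the ddml package: the simulated data, the helper names and the single linear learner (a stand-in for the stacked machine learners used for the nuisance parameters) are assumptions made for this example.

```python
import random

def fit_line(xs, ys):
    """Stand-in nuisance learner (assumption): least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def cross_fit_residuals(xs, target, k=5):
    """Return target minus its prediction from x, where each prediction
    comes from a model fit on the other k-1 folds (cross-fitting)."""
    n = len(xs)
    resid = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit_line([xs[i] for i in train], [target[i] for i in train])
        for i in range(fold, n, k):
            resid[i] = target[i] - model(xs[i])
    return resid

# Simulated data (hypothetical): Y = theta*D + g(X) + e, D = m(X) + v.
random.seed(1)
n = 1000
theta_true = 0.5
xs = [random.uniform(-2, 2) for _ in range(n)]
ds = [x + random.gauss(0, 1) for x in xs]
ys = [theta_true * d + 2.0 * x + random.gauss(0, 1) for d, x in zip(ds, xs)]

# Step 1: partial X out of both the outcome and the variable of interest.
y_res = cross_fit_residuals(xs, ys)
d_res = cross_fit_residuals(xs, ds)

# Step 2: regress the outcome residuals on the treatment residuals.
theta_hat = sum(u * v for u, v in zip(d_res, y_res)) / sum(u * u for u in d_res)

print("true theta:", theta_true)
print("estimated theta:", theta_hat)
```

In this sketch the nuisance functions happen to be linear, so a linear learner suffices. The point of using flexible or stacked learners in their place is that the same two-step recipe still applies when those functions have unknown form.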