A grouped LASSO double selection estimator for multi-valued treatment effects in high dimensional sparse space

MA Jian, HU Yi, LIN Jianhao

Systems Engineering - Theory & Practice, 2018, Vol. 38, Issue (11): 2750-2761.

DOI: 10.12011/1000-6788(2018)11-2750-12
Article

Abstract

The Rubin causal model is a cornerstone of observational (non-experimental) data analysis. However, classical low-dimensional treatment effect estimators often perform poorly on high-dimensional data. This article combines the post-LASSO double selection method with grouped LASSO regression to construct the grouped LASSO double selection estimator (GLDSE) for multi-valued treatment effects in high-dimensional sparse space. Numerical simulations show that, for inference on the causal parameters, the empirical size of the GLDSE is close to its nominal level, whereas predictive LASSO regression exhibits substantial size distortions. A case study of the educational production function shows that the GLDSE selects five true control variables out of many candidate controls, and only one noise variable is selected by mistake.
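The abstract does not spell out the estimator, so the following is only a rough sketch under stated assumptions. It follows the post-double-selection recipe of Belloni, Chernozhukov and Hansen (2014), which has three steps: a LASSO of the outcome on the controls, a LASSO of the treatment on the controls, and then OLS with robust standard errors on the union of the selected controls. To this recipe it adds one assumed grouping step. The K treatment-level indicators are regressed on the controls jointly with a grouped LASSO in which each control's K coefficients form one group, so a control is kept if it predicts any treatment level. The group structure, the fixed penalty levels `lam_y` and `lam_d`, the proximal-gradient (ISTA) solver, the hypothetical helper names `group_lasso` and `gldse`, and the simulated design are all illustrative choices, not the authors' implementation.

```python
import numpy as np

# Hypothetical helpers illustrating the idea; not the authors' code.


def group_lasso(X, Y, lam, n_iter=1000):
    """Proximal-gradient (ISTA) solver for
        min_B  (1/(2n)) * ||Y - X B||_F^2 + lam * sum_j ||B[j, :]||_2.
    Row j of B (control j's coefficients in every column of Y) is one group;
    with a single column in Y this reduces to the ordinary LASSO."""
    n, p = X.shape
    Y = Y.reshape(n, -1)
    B = np.zeros((p, Y.shape[1]))
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        Z = B - step * (X.T @ (X @ B - Y) / n)  # gradient step on the squared loss
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        # group soft-thresholding: shrink each row's norm by step*lam, zeroing small rows
        B = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12)) * Z
    return B


def gldse(y, treat, X, lam_y, lam_d):
    """Assumed grouped-LASSO double selection for a treatment coded 0..K (0 = baseline)."""
    n = len(y)
    K = int(treat.max())
    D = (treat[:, None] == np.arange(1, K + 1)).astype(float)  # n x K level indicators
    Xc = X - X.mean(axis=0)  # centring stands in for an unpenalised intercept
    # Step 1: LASSO of the outcome on the controls
    sel_y = np.abs(group_lasso(Xc, y - y.mean(), lam_y)[:, 0]) > 0
    # Step 2 (assumed grouping): one grouped LASSO for all K indicators jointly
    sel_d = np.linalg.norm(group_lasso(Xc, D - D.mean(axis=0), lam_d), axis=1) > 0
    keep = np.flatnonzero(sel_y | sel_d)  # union of the two selected sets
    # Step 3: OLS of y on intercept, indicators and the selected controls
    Z = np.column_stack([np.ones(n), D, X[:, keep]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    bread = np.linalg.inv(Z.T @ Z)
    V = bread @ ((Z * resid[:, None] ** 2).T @ Z) @ bread  # HC0 robust covariance
    se = np.sqrt(np.diag(V))
    return coef[1:K + 1], se[1:K + 1], keep


# Made-up simulation design: 3 treatment levels, 50 candidate controls,
# X[:, 0] confounds treatment and outcome, X[:, 2] affects only the outcome.
rng = np.random.default_rng(0)
n, p = 1000, 50
X = rng.standard_normal((n, p))
logits = np.column_stack([np.zeros(n), X[:, 0] + 0.5 * X[:, 1], -X[:, 0]])
prob = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
# inverse-CDF draw of the multinomial-logit treatment level (0, 1 or 2)
treat = (rng.random((n, 1)) > prob.cumsum(axis=1)[:, :-1]).sum(axis=1)
y = 1.0 * (treat == 1) + 2.0 * (treat == 2) + 1.5 * X[:, 0] + X[:, 2] + rng.standard_normal(n)

est, se, keep = gldse(y, treat, X, lam_y=0.1, lam_d=0.05)
print("effects of levels 1, 2 vs 0:", est.round(2), "robust s.e.:", se.round(3))
print("selected controls:", keep)
```

In this made-up design the estimates should land near the simulated effects 1.0 and 2.0, and the confounder in column 0 should appear among the selected controls. A few noise columns may also be selected; that costs some precision but introduces no bias. A serious version would replace the hand-picked penalties with a data-driven rule, such as cross-validation or a plug-in penalty. The sketch fixes them only to stay short and dependency-free beyond NumPy.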

Key words

Rubin causal model / high-dimensional sparsity / grouped LASSO

Cite this article

MA Jian, HU Yi, LIN Jianhao. A grouped LASSO double selection estimator for multi-valued treatment effects in high dimensional sparse space. Systems Engineering - Theory & Practice, 2018, 38(11): 2750-2761. https://doi.org/10.12011/1000-6788(2018)11-2750-12
CLC number: F064.1

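Funding

National Natural Science Foundation of China (71503056); State-Sponsored Program for Senior Research Scholars, Visiting Scholars and Postdoctoral Researchers (201608440100); National Social Science Fund of China (18AJL004, 16CJL010)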