Credit scoring of small and micro enterprises using multi-source information transfer learning

FANG Kuangnan, LI Jingmao, FAN Xinyan, YU Lean

Systems Engineering - Theory & Practice, 2023, Vol. 43, Issue 5: 1320-1332. DOI: 10.12011/SETP2021-2526

Abstract

When building credit scoring models for new products or new business scenarios, financial institutions often face a "high-dimensional, small-sample" target data set, which leads to unsatisfactory model performance. We propose a credit risk measurement method for small and micro enterprises based on multi-source knowledge transfer learning. The method transfers knowledge from other data sources (source domains) to improve the predictive performance of the target-domain credit scoring model. Specifically, it first extracts multiple forms of knowledge from multiple source domains and then incorporates that knowledge into the construction of the target-domain model, which makes full use of the source-domain knowledge and improves the estimation accuracy of the target-domain model. In addition, the modeling process does not require the original data of any source domain, which greatly reduces the risk of privacy leakage during data transmission. Simulation studies and an analysis of real enterprise credit scoring data verify the feasibility of the proposed method and its good performance in variable selection, coefficient estimation, and classification prediction. Under privacy constraints, the method can effectively transfer source-domain knowledge to overcome the poor estimation caused by insufficient information in the target data set.
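
The general idea of transferring source-domain knowledge without sharing raw data can be illustrated with a minimal sketch. It is not the method proposed in the article, whose details are not given on this page. The sketch assumes that each source domain shares only a locally fitted coefficient vector, that these vectors are combined by a simple average, and that the target model is a logistic regression shrunk toward that average by a quadratic penalty. The function names, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_shrunk(X, y, prior, lam, n_iter=2000):
    """Minimise mean logistic loss + (lam / 2) * ||beta - prior||^2
    by gradient descent with a fixed 1/L step size."""
    n = X.shape[0]
    # L is an upper bound on the curvature of the objective.
    L = 0.25 * np.linalg.eigvalsh(X.T @ X / n).max() + lam
    beta = prior.astype(float).copy()
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / n + lam * (beta - prior)
        beta -= grad / L
    return beta

def simulate(n, beta, rng):
    """Draw Gaussian features and Bernoulli default labels (illustrative only)."""
    X = rng.standard_normal((n, beta.size))
    y = (rng.random(n) < sigmoid(X @ beta)).astype(float)
    return X, y

rng = np.random.default_rng(0)
p = 20
beta_true = np.zeros(p)
beta_true[:3] = [1.5, -1.0, 0.8]

# Assumption: each source domain fits its own model locally and shares
# only the fitted coefficient vector, never its raw data.
source_coefs = []
for _ in range(3):
    beta_k = beta_true + 0.1 * rng.standard_normal(p)  # similar, not identical
    Xs, ys = simulate(2000, beta_k, rng)
    source_coefs.append(fit_logistic_shrunk(Xs, ys, np.zeros(p), lam=1e-3))
prior = np.mean(source_coefs, axis=0)  # assumed aggregation rule: plain average

# Target domain: few samples relative to the number of variables.
Xt, yt = simulate(40, beta_true, rng)
beta_transfer = fit_logistic_shrunk(Xt, yt, prior, lam=0.5)
beta_alone = fit_logistic_shrunk(Xt, yt, np.zeros(p), lam=0.5)

print("error with transfer:   ", np.linalg.norm(beta_transfer - beta_true))
print("error without transfer:", np.linalg.norm(beta_alone - beta_true))
```

The two printed values compare the coefficient error of the target model with and without the source-domain prior. The tuning parameter `lam` controls how strongly the target estimate is pulled toward the source knowledge; in practice it would be chosen by cross-validation on the target data.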

Key words

multi-source knowledge / transfer learning / credit scoring / small and micro enterprise

Cite this article

FANG Kuangnan, LI Jingmao, FAN Xinyan, YU Lean. Credit scoring of small and micro enterprises using multi-source information transfer learning. Systems Engineering - Theory & Practice, 2023, 43(5): 1320-1332. https://doi.org/10.12011/SETP2021-2526
CLC number: F830.5

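Funding

National Natural Science Foundation of China, General Program (72071169); National Natural Science Foundation of China, Key Program (72233002)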