Forward variable selection method based on k-nearest neighbor mutual information and its application in soft sensor modeling of water quality parameters

WANG Wei, YANG Chunhua, HAN Jie, LI Wenting, LI Yonggang

Systems Engineering - Theory & Practice, 2022, Vol. 42, Issue (1): 253-261. DOI: 10.12011/SETP2020-2882

Abstract

Soft sensor technology predicts difficult-to-measure primary variables in real time by building a mathematical model between them and easy-to-measure auxiliary variables. To analyze the correlation and redundancy among auxiliary variables effectively and to select them precisely, this paper proposes a forward variable selection method based on k-nearest neighbor mutual information. The method selects relevant variables by maximizing the forward cumulative mutual information. At each step it also computes the mutual information between the newly added variable and the subset of variables already selected, in order to judge whether the new variable is redundant. Redundant variables are eliminated by setting a threshold on this redundancy mutual information, which yields the optimal subset of auxiliary input variables. Simulation results on a numerical case verify the feasibility and effectiveness of the proposed method, which selects auxiliary variables accurately while reducing algorithmic complexity. Finally, the method was successfully applied to input variable selection for a prediction model of effluent biochemical oxygen demand (BOD) in a wastewater treatment process, and the selected auxiliary variables effectively improved the prediction accuracy of the model.
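
The selection loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the Kraskov-Stögbauer-Grassberger (KSG) k-nearest neighbor estimator for the mutual information, a stopping rule that halts when the cumulative mutual information no longer increases, and an arbitrary redundancy threshold of 0.5 nats. The names `knn_mi`, `forward_select`, and `redundancy_threshold`, as well as the synthetic data, are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{i<n} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))

def cheb(a, b):
    """Chebyshev (max-norm) distance between two equal-length tuples."""
    return max(abs(p - q) for p, q in zip(a, b))

def knn_mi(xs, ys, k=3):
    """KSG (algorithm 1) estimate of I(X;Y) in nats; xs, ys are lists of tuples."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        dx = [cheb(xs[i], xs[j]) for j in range(n)]
        dy = [cheb(ys[i], ys[j]) for j in range(n)]
        # Distance to the k-th nearest neighbor in the joint space (excluding i).
        eps = sorted(max(dx[j], dy[j]) for j in range(n) if j != i)[k - 1]
        nx = sum(1 for j in range(n) if j != i and dx[j] < eps)
        ny = sum(1 for j in range(n) if j != i and dy[j] < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return max(0.0, digamma_int(k) + digamma_int(n) - total / n)

def columns(data, idx):
    """Rows of `data` restricted to the column indices in `idx`, as tuples."""
    return [tuple(row[j] for j in idx) for row in data]

def forward_select(data, target, k=3, redundancy_threshold=0.5):
    """Forward selection by cumulative MI with a redundancy check (sketch)."""
    ys = [(t,) for t in target]
    remaining = list(range(len(data[0])))
    selected, best_mi = [], 0.0
    while remaining:
        # Candidate that maximizes the cumulative MI I(selected + [c]; y).
        scores = {c: knn_mi(columns(data, selected + [c]), ys, k)
                  for c in remaining}
        cand = max(scores, key=scores.get)
        if scores[cand] <= best_mi:
            break  # assumed stopping rule: no further gain in cumulative MI
        remaining.remove(cand)
        # Redundancy: MI between the candidate and the already selected subset.
        if selected and knn_mi(columns(data, [cand]),
                               columns(data, selected), k) > redundancy_threshold:
            continue  # discard the candidate as redundant
        selected.append(cand)
        best_mi = scores[cand]
    return selected

# Synthetic data (assumption): y depends on x0 and x1, x2 is a near-copy
# of x0 (redundant), and x3 is unrelated noise.
rng = random.Random(0)
data, target = [], []
for _ in range(200):
    x0, x1, x3 = rng.random(), rng.random(), rng.random()
    x2 = x0 + rng.gauss(0.0, 0.01)
    data.append((x0, x1, x2, x3))
    target.append(x0 + x1 + rng.gauss(0.0, 0.05))

selected = forward_select(data, target)
print(selected)
```

The brute-force neighbor search is quadratic in the number of samples, which keeps the sketch dependency-free but is only suitable for small data sets; a spatial index would be the usual replacement. The threshold and `k` would need tuning on real process data.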

Key words

soft sensor / k-nearest neighbor mutual information / forward variable selection / correlation / redundancy

Cite this article

WANG Wei, YANG Chunhua, HAN Jie, LI Wenting, LI Yonggang. Forward variable selection method based on k-nearest neighbor mutual information and its application in soft sensor modeling of water quality parameters. Systems Engineering - Theory & Practice, 2022, 42(1): 253-261 https://doi.org/10.12011/SETP2020-2882
CLC number: TP274


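Funding

Major Program of the National Natural Science Foundation of China (61890932); Fundamental Research Funds for the Central Universities of Central South University (2021zzts0696)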