Abstract
Most existing text feature selection methods are serial and become time-inefficient when applied to massive Chinese text data sets. Improving the efficiency of text feature selection through parallelism has therefore become an active research topic in text mining. This paper combines a genetic algorithm with a parallel collaborative evolutionary algorithm to design a parallel collaborative evolutionary genetic algorithm (PCEGA) based on rough sets, and applies it to text feature selection. The method uses the genetic algorithm to search for features and parallel collaborative evolution to improve time efficiency, so that a more representative feature subset is obtained quickly. Experimental results show that the method is effective.
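The abstract does not give the details of PCEGA, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that a feature subset is encoded as a bit mask, that fitness combines the rough-set dependency degree with a small penalty on subset size (the weight `alpha` is hypothetical), and that collaboration is modelled as several subpopulations exchanging their best individuals in a ring. The subpopulations are evolved one after another here; in a parallel setting each could run in its own process. All function names and parameters are illustrative and are not taken from the paper.

```python
import random
from collections import defaultdict

def dependency(rows, labels, mask):
    """Rough-set dependency degree: share of rows whose equivalence class
    (under the selected features) carries a single decision label."""
    classes, counts = defaultdict(set), defaultdict(int)
    for row, y in zip(rows, labels):
        key = tuple(v for v, keep in zip(row, mask) if keep)
        classes[key].add(y)
        counts[key] += 1
    consistent = sum(counts[k] for k, ys in classes.items() if len(ys) == 1)
    return consistent / len(rows)

def fitness(rows, labels, mask, alpha=0.9):
    # Assumed fitness: mostly dependency degree, plus a small reward for short subsets.
    n = len(mask)
    return alpha * dependency(rows, labels, mask) + (1 - alpha) * (n - sum(mask)) / n

def evolve(pop, rows, labels, rng, generations=5, p_mut=0.1):
    """Plain GA on one subpopulation: elitism, tournament selection,
    one-point crossover, bit-flip mutation."""
    score = lambda m: fitness(rows, labels, m)
    for _ in range(generations):
        children = [max(pop, key=score)]  # keep the current best unchanged
        while len(children) < len(pop):
            a = max(rng.sample(pop, 2), key=score)
            b = max(rng.sample(pop, 2), key=score)
            cut = rng.randrange(1, len(a))
            child = a[:cut] + b[cut:]
            children.append(tuple(bit ^ (rng.random() < p_mut) for bit in child))
        pop = children
    return pop

def select_features(rows, labels, islands=3, pop_size=8, rounds=4, seed=0):
    rng = random.Random(seed)
    n = len(rows[0])
    pops = [[tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
            for _ in range(islands)]
    score = lambda m: fitness(rows, labels, m)
    for _ in range(rounds):
        # Each subpopulation is independent here, so this step could be parallelised.
        pops = [evolve(p, rows, labels, rng) for p in pops]
        # Collaboration step: the best of one subpopulation replaces the worst of the next.
        bests = [max(p, key=score) for p in pops]
        for i, p in enumerate(pops):
            worst = min(range(len(p)), key=lambda j: score(p[j]))
            p[worst] = bests[i - 1]
    return max((m for p in pops for m in p), key=score)

# Toy decision table: the label equals the second feature; the others are uninformative.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
labels = [0, 1, 0, 1]
best = select_features(rows, labels)
print(best, dependency(rows, labels, best))
```

For text data, each row would typically hold discretised term features for one document and each label its category; how the paper actually builds the decision table is not stated in the abstract.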
Key words
feature selection /
text mining /
genetic algorithm /
collaborative evolution /
rough sets
CLC number:
O231.1
Funding
National Natural Science Foundation of China (12CGL004); Youth Science Research Foundation of Lanzhou Jiaotong University (2011005)