TruData: A trust-based method for truth data discovery

YU Zu-kun, XU Jing-nan, ZHENG Xiao-lin, CHEN De-ren

Systems Engineering - Theory & Practice ›› 2013, Vol. 33 ›› Issue (9): 2404-2414. DOI: 10.12011/1000-6788(2013)9-2404
Research Article

Abstract

The Internet has become an important source of information, but not all of the information it provides is correct, and much of it is mutually conflicting. One reason is that many data sources obtain their data from unreliable providers or simply copy it from other sources; another is that sources introduce errors while processing data. As a result, different sources often provide contradictory information. Untrue information can lead users to wrong decisions and cause them serious losses, so determining which data are true is an important research topic. This work proposes a trust-based method, TruData, for finding the true data. TruData builds a trust network of data sources from the similarity (degree of agreement) among the sources, computes trust values for the sources and the data items over this network, and uses them to judge the truthfulness of Internet data. Experiments show that the method performs well at identifying true data on large data sets.
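The abstract does not give TruData's formulas, so the following is only a minimal sketch of the general idea it describes: derive a trust network from agreement between sources, propagate trust over it, and score each candidate value by the trust of the sources that assert it. The PageRank-style propagation, the trust-weighted vote, the damping value, and every function name and data value here are illustrative assumptions, not the authors' algorithm.

```python
from collections import defaultdict


def agreement(claims_a, claims_b):
    """Fraction of shared objects on which two sources give the same value."""
    shared = set(claims_a) & set(claims_b)
    if not shared:
        return 0.0
    same = sum(1 for obj in shared if claims_a[obj] == claims_b[obj])
    return same / len(shared)


def build_trust_network(claims):
    """Weighted graph: network[a][b] is the agreement between sources a and b."""
    sources = list(claims)
    return {
        a: {b: agreement(claims[a], claims[b]) for b in sources if b != a}
        for a in sources
    }


def source_trust(network, damping=0.85, iterations=50):
    """Assumed PageRank-style propagation over the agreement graph."""
    sources = list(network)
    n = len(sources)
    trust = {s: 1.0 / n for s in sources}
    for _ in range(iterations):
        updated = {}
        for s in sources:
            incoming = 0.0
            for t in sources:
                if t == s:
                    continue
                out_total = sum(network[t].values())
                if out_total > 0:
                    # t passes on its trust in proportion to its agreement with s
                    incoming += trust[t] * network[t][s] / out_total
            updated[s] = (1 - damping) / n + damping * incoming
        trust = updated
    return trust


def pick_values(claims, trust):
    """For each object, keep the value with the largest summed source trust."""
    scores = defaultdict(lambda: defaultdict(float))
    for source, items in claims.items():
        for obj, value in items.items():
            scores[obj][value] += trust[source]
    return {obj: max(vals, key=vals.get) for obj, vals in scores.items()}


# Hypothetical toy data: three sources describing two objects.
claims = {
    "s1": {"book1": "A", "book2": "X"},
    "s2": {"book1": "A", "book2": "X"},
    "s3": {"book1": "B", "book2": "X"},
}
trust = source_trust(build_trust_network(claims))
chosen = pick_values(claims, trust)
```

In this toy input, `s1` and `s2` agree with each other on everything and so end up with more trust than `s3`, which makes their value for `book1` win the vote. A pure agreement signal like this cannot tell independent corroboration from copying, which the abstract names as one cause of unreliable data, so a real system would need to treat copied agreement separately.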

Key words

trust / truth data discovery / conflict data / data mining

Cite this article

YU Zu-kun, XU Jing-nan, ZHENG Xiao-lin, CHEN De-ren. TruData: A trust-based method for truth data discovery. Systems Engineering - Theory & Practice, 2013, 33(9): 2404-2414 https://doi.org/10.12011/1000-6788(2013)9-2404
Chinese Library Classification (CLC) number: TP39

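Funding

National Natural Science Foundation of China (61003254, 70771018); Zhejiang Provincial Natural Science Foundation (Y1080130); Fundamental Research Funds for the Central Universities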
